CN113822371A - Training grouping model, and method and device for grouping time series data - Google Patents

Info

Publication number
CN113822371A
CN113822371A (application number CN202111163343.6A)
Authority
CN
China
Prior art keywords
sample
time sequence
time
distribution
samples
Prior art date
Legal status
Pending
Application number
CN202111163343.6A
Other languages
Chinese (zh)
Inventor
朱志博
刘子奇
张志强
周俊
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111163343.6A
Publication of CN113822371A
Legal status: Pending

Classifications

    • G06F18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Abstract

The embodiments of this specification provide methods for training a grouping model and for grouping time series data. The method for training the grouping model comprises the following steps: acquiring a time series sample set, wherein each sample comprises a sequence of n index values of a single business object over n periods; inputting each sample into the grouping model to obtain the predicted probability distribution of the sample belonging to K groups; and, in addition, inputting each sample into K decoding networks corresponding to the K groups to obtain K reconstructed samples. A total distribution loss can then be determined from the predicted probability distribution of each sample in the set and a preset prior distribution, and a total reconstruction loss from the K reconstructed samples corresponding to each sample and the predicted probability distribution. The grouping model and the K decoding networks are trained according to the total distribution loss and the total reconstruction loss. After training, microscopic time series samples can be grouped using the trained grouping model.

Description

Training grouping model, and method and device for grouping time series data
Technical Field
One or more embodiments of the present specification relate to the field of machine learning, and more particularly to training a grouping model for time series data, a method for grouping time series data using the grouping model, and corresponding devices.
Background
With the development of computer technology, machine learning has been applied to various technical fields for analyzing and predicting various business data. In an internet environment, time series data is a common source of analysis.
Time series data is a series of data points generated and recorded along a time line. In many scenarios, the time series data of interest is the result of aggregating more fine-grained data. For example, the time series of a shop's total sales volume over time may be an aggregate of the time series of sales volumes of all goods in the shop; the time series of a website's total traffic (or access volume) over time may be an aggregate of the time series of all user access behaviors on that website. In such cases, the fine-grained time series data may be referred to as microscopic time series data, and the aggregate of such fine-grained time series data may be referred to as macroscopic time series data.
In many cases, macroscopic deployment is performed by predicting macroscopic timing data. For example, some network platforms need to predict the access volume and traffic in different periods in the future, so that server clusters can be deployed better in advance for traffic changes; some social platforms need to predict popularity or number of reviews for some review objects, such as songs, movies, books, so that business strategy deployment can be better performed, e.g., page arrangement, user triage, user drainage, etc.
However, due to the complexity of its composition, macroscopic time series data is difficult to predict. The most direct approach is to build a time series prediction model directly on the macroscopic data. But because the macroscopic data is aggregated from microscopic time series data, this approach loses a large amount of microscopic information, such as the different latent regularities of the microscopic data; lacking this information, it may be difficult to build a suitable macroscopic time series model. Conversely, if a prediction model is built directly on the microscopic data and the macroscopic value is obtained by aggregation, the microscopic information is taken into account, but a good prediction effect still cannot be obtained, because the characteristics of macroscopic-level changes cannot be captured.
Thus, improved schemes are desired that allow for better processing of temporal data and thus better prediction of macroscopic temporal data.
Disclosure of Invention
One or more embodiments of the present specification describe methods and devices for training a grouping model and for grouping time series data, in which grouping of microscopic time series data is achieved intelligently by starting directly from the microscopic time series data itself and discovering the regularities of different microscopic time series through a hybrid model.
According to a first aspect, there is provided a method of training a grouping model, comprising:
acquiring a time sequence sample set, wherein any first time sequence sample comprises a sequence formed by n index values of a single business object in n time periods;
inputting the first time sequence sample into a grouping model to obtain the prediction probability distribution of the sample belonging to K groups;
inputting the first time sequence sample into K decoding networks corresponding to the K groups respectively to obtain K reconstructed samples;
determining total distribution loss according to the prediction probability distribution of each sample in the time sequence sample set and the preset prior distribution of the samples among K groups;
determining total reconstruction loss according to K reconstruction samples respectively corresponding to each sample and the prediction probability distribution;
and training the grouping model and the K decoding networks according to the total distribution loss and the total reconstruction loss.
In various embodiments, the single business object may be one of: a single user, a single commodity, a single shop, a single service item, a single product; the index value is one of: click volume, sales volume, flow volume.
According to one embodiment, the method further comprises preprocessing the first time-series sample according to a mean of the n index values.
According to one embodiment, the grouping model includes a coding network and a grouping network; inputting the first time sequence sample into the grouping model to obtain the prediction probability distribution of the sample belonging to K groups includes: inputting the first time sequence sample into the coding network to obtain the coding features of the first time sequence sample; and inputting the coding features into the grouping network to obtain the prediction probability distribution.
Further, in one embodiment of the above embodiment, the encoding network is a time-series based neural network; inputting the first time sequence sample into a coding network to obtain the coding characteristics of the first time sequence sample, which specifically includes: and sequentially inputting the n index values into the time sequence-based neural network to obtain the coding features.
In one embodiment of the above embodiment, the grouping network is implemented as a multi-layer perceptron (MLP); inputting the coding features into the grouping network to obtain the prediction probability distribution specifically includes: processing the coding features through the multi-layer perceptron MLP to obtain the prediction probability distribution.
In one embodiment, the K decoding networks have the same network structure and different network parameters.
According to one embodiment, the K decoding networks include an arbitrary first decoding network; inputting the first time sequence sample into the K decoding networks corresponding to the K groups respectively to obtain K reconstructed samples includes: inputting a first time sequence segment of the first time sequence sample into the first decoding network, the first decoding network predicting a second time sequence segment from the first time sequence segment, and forming a first reconstructed sample based on the second time sequence segment.
Further, in an embodiment, the first time segment includes the first s index values of the n index values; the second time sequence segment comprises the last s index values; wherein s is greater than n/2.
According to one embodiment, each index value in the first time sequence segment may be sequentially input to the first decoding network, and the first decoding network performs rolling prediction on a next index value to form a second time sequence segment.
According to another embodiment, the whole of the first time-sequence segment may be input to the first decoding network, and the first decoding network obtains a segment characterization vector of the first time-sequence segment and predicts a second time-sequence segment based on the segment characterization vector.
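The rolling-prediction variant can be sketched as follows. This is an illustrative sketch only: the `predict_next` callable is a hypothetical stand-in for the first decoding network, and for simplicity the sketch generates only the non-overlapping tail of the second time sequence segment; with s > n/2 as described above, the second segment would also include one-step-ahead predictions over the periods it shares with the first segment.

```python
def rolling_reconstruct(first_segment, n, predict_next):
    """Sketch of rolling (autoregressive) reconstruction.

    The decoder reads the first time sequence segment, then repeatedly
    predicts the next index value and appends it to its own input until
    n values are covered. The returned list is the predicted continuation.
    """
    history = list(first_segment)
    while len(history) < n:
        history.append(predict_next(history))
    return history[len(first_segment):]

# A stand-in "decoding network": a naive persistence forecast (repeat last value).
naive = lambda hist: hist[-1]
print(rolling_reconstruct([3.0, 4.0, 5.0], n=6, predict_next=naive))  # [5.0, 5.0, 5.0]
```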
According to an embodiment, determining the total distribution loss specifically includes: obtaining the overall posterior distribution of the samples among the K groups according to the prediction probability distributions of the samples in the time sequence sample set; and determining the total distribution loss according to the distribution difference between the preset prior distribution and the overall posterior distribution.
According to another embodiment, determining the total distribution loss specifically comprises: obtaining a sample distribution loss corresponding to the first time sequence sample according to the distribution difference between the prediction probability distribution of the first time sequence sample and the preset prior distribution; and obtaining the total distribution loss according to the sample distribution loss of each sample.
In one embodiment, the distribution difference is a KL divergence between distributions.
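Both variants of the distribution loss can be illustrated with a small sketch that uses the KL divergence as the distribution difference. All numbers below are made up for illustration; K = 2 and the uniform prior are assumptions, not values from this specification.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions over the K groups."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Predicted group distributions q(z|x) for three samples, K = 2 groups.
predicted = [[0.9, 0.1], [0.2, 0.8], [0.55, 0.45]]
prior = [0.5, 0.5]  # preset prior distribution over the K groups

# Variant 1: compare the overall posterior (average over samples) with the prior.
posterior = [sum(col) / len(predicted) for col in zip(*predicted)]
loss_aggregate = kl_divergence(posterior, prior)

# Variant 2: sum the per-sample divergences from the prior.
loss_per_sample = sum(kl_divergence(p, prior) for p in predicted)

print(loss_aggregate, loss_per_sample)
```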
According to one embodiment, determining the total reconstruction loss includes: for any first decoding network among the K decoding networks, determining the single-network single-sample reconstruction loss of the first decoding network for a first time sequence sample; taking the K probabilities corresponding to the K decoding networks in the prediction probability distribution of the first time sequence sample as weights, and weighted-summing the single-network single-sample reconstruction losses of the K decoding networks for the first time sequence sample to obtain the single-sample reconstruction loss of the first time sequence sample; and summing the single-sample reconstruction losses of each sample in the time sequence sample set to obtain the total reconstruction loss.
In one embodiment of the above embodiment, the single-network single-sample reconstruction loss is determined as follows: obtaining a first reconstructed sample of the first decoding network for the first timing sample; and determining the reconstruction loss of the single-network single sample according to the difference value between the predicted value of each time interval in the first reconstruction sample and the index value of the corresponding time interval in the first time sequence sample.
In another embodiment of the above embodiment, the single-network single-sample reconstruction loss is determined as follows: acquiring the mean and the variance of the index value distribution predicted by the first decoding network for each target time interval according to a plurality of target time intervals covered by the reconstructed samples; determining the distribution deviation of each target time interval according to the mean value and the variance of each target time interval and the index value of the target time interval in the first time sequence sample; determining the single-network single-sample reconstruction loss according to the summation of the distribution deviations of the plurality of target time periods.
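The probability-weighted reconstruction loss and its two single-network single-sample variants can be sketched as follows. The squared-error form of the first variant and the Gaussian negative log-likelihood form of the second variant are plausible readings of the text, not the specification's exact formulas; all numbers are illustrative.

```python
import math

def squared_error(reconstructed, original):
    """First variant: sum of squared per-period differences between the
    reconstructed sample and the original time sequence sample."""
    return sum((r - o) ** 2 for r, o in zip(reconstructed, original))

def gaussian_nll(means, variances, original):
    """Second variant: deviation of each true index value from the per-period
    Gaussian (mean, variance) predicted by the decoding network, summed."""
    return sum(0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for m, v, x in zip(means, variances, original))

# One sample, K = 2 decoding networks, and the sample's predicted q(z|x).
original = [1.0, 2.0, 3.0]
reconstructions = [[1.1, 2.1, 2.9], [0.0, 0.0, 0.0]]
q = [0.8, 0.2]

# Single-sample reconstruction loss: probability-weighted sum over the K networks.
single_sample_loss = sum(w * squared_error(rec, original)
                         for w, rec in zip(q, reconstructions))
print(single_sample_loss)
# The total reconstruction loss sums this quantity over the whole sample set.
```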
According to an embodiment, training the grouping model and the K decoding networks according to the total distribution loss and the total reconstruction loss specifically includes: combining the total distribution loss and the total reconstruction loss according to preset weights to obtain a total loss; and adjusting the parameters in the grouping model and the K decoding networks in the direction of decreasing total loss.
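The combination step can be sketched in one line; the weight `lam` is a hypothetical hyperparameter, since the specification says only that the two losses are combined "according to a preset weight".

```python
def total_loss(dist_loss, recon_loss, lam=0.5):
    """Combine the two losses with a preset weight lam (illustrative value)."""
    return lam * dist_loss + (1.0 - lam) * recon_loss

# Parameters would then be adjusted in the direction that decreases this value,
# e.g. by gradient descent: theta <- theta - lr * d(total_loss)/d(theta).
print(total_loss(0.4, 2.0))  # 1.2
```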
According to a second aspect, there is provided a method of grouping temporal data, comprising:
acquiring a time sequence sample to be tested, wherein the time sequence sample comprises a sequence formed by n index values of a single business object in n time periods;
inputting the time sequence sample to be tested into a grouping model, and outputting the prediction probability distribution of the time sequence sample belonging to K groups, wherein the grouping model is obtained by training according to the method of the first aspect;
and determining a target group with the highest probability value in the prediction probability distribution, and classifying the time sequence sample to be tested into the target group.
According to a third aspect, there is provided a method of predicting macroscopic temporal data, comprising:
acquiring N time sequence samples, wherein each time sequence sample comprises a sequence formed by n index values of a single business object in n time periods;
inputting each time sequence sample into a grouping model, and outputting the prediction probability distribution of the time sequence sample belonging to K groups, wherein the grouping model is obtained by training according to the method of the first aspect;
dividing each time sequence sample into a group with the highest probability value in the corresponding prediction probability distribution, thereby obtaining K groups of time sequence samples;
for each group of the K groups of time sequence samples, aggregating all the time sequence samples in the group to obtain a fusion time sequence sample corresponding to the group;
determining a prediction index value of a business object sub-group corresponding to the group in a target time period based on the fusion time sequence sample;
and determining the macroscopic index value of the total group of the business objects in the target period based on the prediction index values respectively corresponding to the K groups.
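The steps of the third aspect can be sketched end to end. The grouper and forecaster below are trivial stand-ins (assumptions for illustration) for the trained grouping model and the per-group time series prediction; element-wise summation is used as the aggregation.

```python
def predict_macro(samples, group_probs, forecast, K):
    """Sketch: group the micro samples, fuse each group, forecast, then sum.

    samples:     N micro time series samples (lists of n index values)
    group_probs: callable returning K group probabilities for a sample
                 (stands in for the trained grouping model)
    forecast:    callable predicting the target-period value of a fused series
    """
    # Assign each sample to its highest-probability group.
    groups = [[] for _ in range(K)]
    for s in samples:
        probs = group_probs(s)
        groups[max(range(K), key=lambda k: probs[k])].append(s)
    # Aggregate each group into one fused time series (element-wise sum).
    fused = [[sum(vals) for vals in zip(*g)] if g else None for g in groups]
    # Forecast each fused series and sum into the macroscopic index value.
    return sum(forecast(f) for f in fused if f is not None)

# Toy stand-ins: a grouper by mean level and a persistence forecast.
grouper = lambda s: [1.0, 0.0] if sum(s) / len(s) < 2 else [0.0, 1.0]
persist = lambda series: series[-1]
samples = [[1, 1, 1], [1, 2, 1], [5, 6, 7], [4, 4, 4]]
print(predict_macro(samples, grouper, persist, K=2))  # 13
```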
According to a fourth aspect, there is provided an apparatus for training a grouping model, comprising:
the acquisition unit is configured to acquire a time sequence sample set, wherein any first time sequence sample comprises a sequence formed by n index values of a single business object in n periods;
the grouping processing unit is configured to input the first time sequence sample into a grouping model to obtain the prediction probability distribution of the sample belonging to K groups;
the reconstruction processing unit is configured to input the first time sequence sample into K decoding networks corresponding to the K groups respectively to obtain K reconstructed samples;
the distribution loss determining unit is configured to determine total distribution loss according to the prediction probability distribution of each sample in the time sequence sample set and preset prior distribution of the samples among the K groups;
a reconstruction loss determining unit configured to determine a total reconstruction loss according to K reconstruction samples respectively corresponding to the samples and the prediction probability distribution;
and the training unit is configured to train the grouping model and the K decoding networks according to the total distribution loss and the total reconstruction loss.
According to a fifth aspect, there is provided an apparatus for grouping temporal data, comprising:
the acquisition unit is configured to acquire a time sequence sample to be tested, wherein the time sequence sample comprises a sequence formed by n index values of a single business object in n time periods;
the model processing unit is configured to input the time sequence sample to be tested into a grouping model and output the prediction probability distribution of the time sequence sample belonging to K groups, and the grouping model is obtained according to the device training of the fourth aspect;
and the grouping unit is configured to determine a target group with the highest probability value in the prediction probability distribution and classify the time sequence sample to be tested into the target group.
According to a sixth aspect, there is provided an apparatus for predicting macroscopic temporal data, comprising:
the acquisition unit is configured to acquire N time sequence samples, wherein each time sequence sample comprises a sequence formed by n index values of a single business object in n time periods;
a model processing unit configured to input each time-series sample into a grouping model and output a prediction probability distribution of the time-series sample belonging to K groups, the grouping model being obtained by training according to the apparatus of the fourth aspect;
the grouping unit is configured to divide each time sequence sample into groups with the highest probability value in the corresponding prediction probability distribution so as to obtain K groups of time sequence samples;
the aggregation unit is configured to aggregate the time sequence samples in each group of the K groups of time sequence samples to obtain fusion time sequence samples corresponding to the group;
the prediction unit is configured to determine a prediction index value of the business object sub-group corresponding to the group in a target time period based on the fusion time sequence sample;
and the determining unit is configured to determine a macroscopic index value of the total group of the business objects in the target period based on the prediction index values corresponding to the K groups respectively.
According to a seventh aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any one of the first to third aspects.
According to an eighth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of any one of the first to third aspects.
In the embodiments of the present specification, a grouping model is trained in order to group microscopic time series data. A hybrid model is trained synchronously with the grouping model; through the distribution of the microscopic time series data across the hybrid model, the deep regularities of the microscopic data are discovered, and microscopic time series with similar distributions and similar regularities are classified into one class. In this way, the splitting of the macroscopic time series is achieved intelligently and automatically based on the microscopic behavior regularities reflected by the hybrid model, without being limited to explicit attributes, thereby improving macroscopic prediction performance.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an implementation of grouping micro timing data according to one embodiment;
FIG. 2 illustrates a flow diagram of a method of training a grouping model, according to one embodiment;
FIG. 3 illustrates a process diagram for training a grouping model, according to one embodiment;
FIG. 4 illustrates a flow diagram of a method of grouping temporal data, according to one embodiment;
FIG. 5 illustrates a flow diagram of a method of predicting macroscopic temporal data, according to one embodiment;
FIG. 6 illustrates a schematic diagram of an apparatus for training a grouping model, according to one embodiment;
FIG. 7 is a diagram illustrating an apparatus for grouping temporal data according to one embodiment;
FIG. 8 shows a schematic structural diagram of an apparatus for predicting macroscopic time series data, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As previously mentioned, macroscopic time series data is difficult to predict due to the complexity of its composition. Whether prediction is made directly on the macroscopic data, or on the microscopic data followed by aggregation, it is difficult to achieve the desired accuracy and effect.
To this end, in one embodiment, in terms of prediction granularity, a middle course is taken between the two approaches of directly predicting macroscopic data and predicting microscopic data. In other words, a certain strategy can be used to split the macroscopic time series data into a plurality of sub-time-series, which are predicted separately and then summed to obtain the macroscopic prediction result. Compared with directly predicting macroscopic data, this scheme can use more useful information and mine the different regularities of the microscopic time series to assist prediction; compared with directly predicting microscopic data, it avoids the problem of being unable to capture macroscopic time series changes, and can greatly reduce the cost of time series modeling.
In the above embodiment, a proper split of the macroscopic time series data is crucial to the final prediction effect. Intuitively, the macroscopic time series data could be split according to some explicit attributes or indicators, such as geographic position or product type. However, such explicit attributes or indicators usually do not carry much microscopic time series information; the candidate attributes/indicators number in the hundreds or thousands, and finding a suitable one requires strong expert knowledge as well as a great deal of trial-and-error cost, which is time-consuming and labor-intensive.
In view of this, in the embodiments of the present specification, a time sequence splitting method for macroscopic time sequence prediction is provided, which has a technical idea that, directly starting from microscopic time sequence data itself, rules of different microscopic time sequence data are found through a hybrid model, data that obey the same microscopic behavior rules are classified into one class, and splitting of a macroscopic time sequence is intelligently achieved, so as to assist in improving prediction performance of the macroscopic time sequence.
FIG. 1 is a diagram illustrating an implementation of grouping microscopic time series data according to one embodiment. It can be understood that splitting the macroscopic time series is equivalent to grouping the microscopic time series data. According to the embodiments of the present specification, a grouping model is trained with the aid of a hybrid model formed by a plurality of decoding networks; the microscopic time series data can then be grouped using the trained grouping model. Accordingly, the implementation process of FIG. 1 can be divided into a training phase and a using phase of the grouping model, which are described separately below.
In the training phase, a desired number K of groups may be set, i.e., it is desired to split the macroscopic time series data into K sub-time-series, or equivalently to divide the microscopic time series data into K groups. Furthermore, a prior distribution of a large number of microscopic time series samples among the K groups may be assumed. On this basis, an auxiliary hybrid model can be constructed, which includes a plurality of decoding networks for reconstructing the microscopic time series samples. The number of decoding networks equals the desired number of groups K; that is, the hybrid model comprises K decoding networks, corresponding to the K groups.
In the training process, as shown in the upper part of FIG. 1, for a single microscopic time series sample x in the sample set, the sample x is input into the grouping model to be trained, yielding the probability distribution P, predicted by the grouping model, of sample x belonging to the K groups (corresponding to the K decoding networks). In addition, the microscopic time series sample x is input into the K decoding networks respectively for decoding and reconstruction. Each decoding network i reconstructs the sample x and outputs a reconstructed sample x^(i); the K decoding networks thus yield K reconstructed samples x^(1) to x^(K).
After performing the above probability prediction and sample reconstruction for each sample in the sample set, the overall prediction loss may be determined based on the predicted probability distribution P of each sample, the reconstructed sample, and the foregoing prior distribution, wherein the overall prediction loss includes the distribution loss of the predicted probability distribution and the prior distribution, and the reconstruction loss related to the sample reconstruction. Therefore, parameter optimization can be carried out on the grouping model and the K decoding networks based on the overall prediction loss, and training can be achieved.
After the training of the grouping model is completed, the grouping model can be used to group the microscopic time series data. As shown in the lower part of FIG. 1, for a single microscopic time series sample y to be grouped, the sample y is input into the trained grouping model, yielding the probability distribution, predicted by the grouping model, of sample y belonging to the K groups. The group with the highest probability in that distribution is selected as the target group, and sample y is assigned to the target group. In this way, grouping of individual microscopic time series samples is achieved. Further, a prediction of the macroscopic time series can be made based on the microscopic time series data thus grouped.
In the implementation process, the hybrid model is trained synchronously with the grouping model, the deep-layer rule of the micro time sequence data is found through the distribution of the micro time sequence data in the hybrid model, and the micro time sequence data with similar distribution and similar rule are classified into one class. Therefore, the splitting of the macro time sequence is intelligently and automatically realized based on the micro behavior law reflected by the mixed model without being limited by the obvious attribute, and the macro prediction performance is further improved.
The above implementation is described in detail below.
FIG. 2 illustrates a flow diagram of a method of training a grouping model, according to one embodiment; the method may be performed by any device, appliance, platform, or appliance cluster with computing and processing capability. FIG. 3 illustrates a process diagram for training a grouping model, according to one embodiment. The specific implementation of each step in the training process of the grouping model is described in detail below with reference to FIG. 2 and FIG. 3.
As shown in FIG. 2, first, in step 21, a time sequence sample set is acquired. The sample set includes a batch of, e.g., N, microscopic time series samples. A single microscopic time series sample is a sequence formed, in time order, by n index values of a single business object over n periods; for clarity and convenience it will hereinafter be referred to as the first time sequence sample. In FIG. 3, a single microscopic time series sample is denoted as x = [x1, x2, …, xn], where xi indicates the index value of the corresponding business object in period i.
Depending on the macroscopic object to be analyzed, a single business object may be a single user, a single commodity, a single shop, a single service item, a single product, etc., and the index values may be, for example, click volume, visit duration, sales volume, traffic volume, etc.; the n periods may be the 24 hours of a day, the 7 days of a week, the 30 days of a month, etc. Accordingly, in different embodiments, a single microscopic time series sample may have different meanings. For example, where the macroscopic object to be analyzed is the total sales volume of a shop, a single microscopic time series sample may be the sequence of sales volumes of an individual commodity in the shop over the periods; where the macroscopic object is the total sales volume of an e-commerce platform, a single microscopic time series sample may be the sequence of sales volumes of an individual shop on the platform over the periods; if the macroscopic object is the review volume of a review website, a single microscopic time series sample may be the sequence of review volumes of a single product on the website over the periods; and where the macroscopic object is the flow of a service product involving flowable resources (e.g., fund transfer-out and transfer-in of a payment product), a single microscopic time series sample may be the resource flow of a single user over the periods. It can be understood that those skilled in the art can set the corresponding microscopic time series samples according to the macroscopic object to be analyzed; there are many more possible examples, which are not enumerated here.
According to one embodiment, each microscopic time series sample x is preprocessed before being input into the models. Specifically, in one embodiment, the sample may be preprocessed according to the average of the n index values in a single microscopic time series sample, so that the numerical range of the processed index values is smaller than the original numerical range. In one example, the microscopic time series sample x may be preprocessed according to the following equation (1), scaling each value by the sample mean:

x̃_i = x_i / x̄,  where x̄ = (1/n)·Σ_{j=1..n} x_j    (1)

where x_i represents the original index value of the business object in time period i, x̄ represents the mean of the n original index values in sample x, and x̃_i is the preprocessed index value of time period i. Through the preprocessing shown in equation (1), the numerical range of the index values can be reduced, avoiding the training difficulty caused by excessively large values.
In another example, other preprocessing methods, such as normalization, may be used to normalize all index values to a predetermined interval.
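The mean-based preprocessing described above can be sketched in a few lines of Python. Note that the original equation (1) is rendered as an unreadable image placeholder in this text, so scaling by the sample mean is an assumption consistent with the surrounding description:

```python
import numpy as np

def preprocess(x):
    """Scale a micro time-series sample by its mean so the processed
    values fall in a smaller numeric range (an assumed form of eq. (1))."""
    x = np.asarray(x, dtype=float)
    return x / x.mean()

sample = np.array([100.0, 200.0, 300.0])
scaled = preprocess(sample)   # values now centered around 1
```

After this step the index values are dimensionless ratios around 1, which keeps gradient magnitudes comparable across business objects with very different absolute scales.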
For each of the above microscopic time series samples x, also called first time series samples, as shown in fig. 2, in step 22, the first time series sample is input into the grouping model to obtain the predicted probability distribution of the sample belonging to K groups. The number of groups K may be a value set in advance as needed. Letting any one of the K groups be denoted as z, the predicted probability distribution output by the grouping model can be denoted as q(z|x), i.e., the probability that a sample x belongs to each group z.
In one embodiment, the grouping model is implemented as an overall neural network model, and the prediction probability distribution is output by processing features in the input time-series samples x.
In another embodiment, as shown in FIG. 3, the grouping model includes an encoding network and a grouping network. Accordingly, the processing of the first time series sample x by the grouping model may include: inputting the first time series sample x into the encoding network to obtain an encoding feature u of the first time series sample x, where u = f_enc(x) and f_enc represents the encoding function of the encoding network; then inputting the encoding feature u into the grouping network to obtain the predicted probability distribution q(z|x).
In one example, the coding network is a timing-based neural network, such as a recurrent neural network RNN, a long short term memory network LSTM, or the like. In this case, the n index values in the first time-series sample are sequentially input to the time-series-based neural network in time order, and the neural network encodes the input sample based on time-series accumulation to obtain the encoding characteristic u.
In another example, the above coding network may also be a non-time-sequential neural network that can be used to process multidimensional inputs, e.g., a convolutional neural network CNN, or an attention-based neural network such as a Transformer or the like. In such a case, the index values in the first time-series sample may be input to the coding network together as a multi-dimensional input; the encoding network processes the multidimensional input, for example based on an attention mechanism, resulting in an encoding feature u.
The encoding feature u is then input into the grouping network for prediction of the grouping probabilities. In one embodiment, the grouping network is implemented as a multi-layer perceptron MLP. In this way, the multi-layer perceptron MLP processes the above encoding feature u to obtain the predicted probability distribution q(z|x), that is:
q(z|x)=MLP(u) (2)
in other examples, the packet network may be implemented by a more complex neural network, and is not limited herein.
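The two-stage grouping model (encoding network followed by grouping network) can be sketched as follows. This is a minimal NumPy illustration rather than the patent's implementation: a single linear-plus-tanh map stands in for the RNN/LSTM/Transformer encoding network f_enc, and a one-layer softmax perceptron stands in for the MLP grouping network; all layer sizes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

class GroupingModel:
    """Sketch of the grouping model: u = f_enc(x), then q(z|x) = MLP(u)."""
    def __init__(self, n, d, k):
        self.W_enc = rng.normal(size=(d, n)) * 0.1  # stand-in for f_enc
        self.W_grp = rng.normal(size=(k, d)) * 0.1  # stand-in for the MLP

    def forward(self, x):
        u = np.tanh(self.W_enc @ x)      # encoding feature u
        return softmax(self.W_grp @ u)   # predicted distribution q(z|x), eq. (2)

model = GroupingModel(n=24, d=8, k=3)
q = model.forward(rng.normal(size=24))   # one probability per group z
```

The output is a length-K probability vector that sums to 1; in training it is used both to weight the decoding networks and to compute the distribution loss described below.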
On the other hand, in step 23 of fig. 2, the first time-series samples x are input into K decoding networks corresponding to K packets, respectively, to obtain K reconstructed samples. As described above, K decoding networks are constructed corresponding to K preset groups to form a hybrid model. Typically, the K decoding networks have the same network structure, different network parameters. The following describes a process of sample reconstruction for the first time-sequence sample x by using any one of the decoding networks z as an example.
In one embodiment, a portion of a time-sequential slice of a first time-sequential sample x, referred to as a first time-sequential slice, may be input to a decoding network z that predicts another portion of the time-sequential slice from the first time-sequential slice, referred to as a second time-sequential slice, thereby forming a reconstructed sample x' based on the predicted second time-sequential slice. Thus, the reconstructed samples x' typically include prediction values for part of the segments in the n segments.
For example, in the example of FIG. 3, the first time series segment is x_1 to x_{n-m}; the second time series segment is x_2 to x_n, where m is generally chosen to be less than n/2. More typically, m may be set to 1, in which case the decoding network z predicts the second time series segment x_2 to x_n based on the first time series segment x_1 to x_{n-1}, and the predicted x_2 to x_n constitute the reconstructed sample x'.
In other examples, the first/second time series segments may have other arrangements. For example, in one specific example, the first time series segment may include the first s index values of the n index values corresponding to the n time periods; the second time series segment includes the last s index values, where s is greater than n/2. For example, when the n time periods are 12 months, s may be chosen as 9; in that case the decoding network predicts the index values of months 4-12 based on the index values of months 1-9.
The decoding network may predict the second time series segment in a number of ways.
In one embodiment, the decoding network z is a time-series-based neural network, such as an RNN or LSTM, and is arranged to predict on a rolling basis, period by period. Specifically, each index value in the first time series segment may be sequentially input into the decoding network z, and the decoding network z performs rolling prediction of the next index value, thereby forming the second time series segment.
For example, in one specific example, x_1 is input into a decoding network z implemented by an LSTM; the LSTM network obtains a hidden state h_1 according to the initial state and the input x_1, and predicts the index value x_2 of the next time period according to the hidden state h_1. Then the true x_2 is input into the LSTM network, which obtains the current hidden state h_2 according to the hidden state h_1 of the previous time period and the current input x_2, and predicts the index value x_3 of the next time period according to the current hidden state h_2. Continuing in this way, the second time series segment x_2 to x_n can be predicted based on the first time series segment x_1 to x_{n-1}, constituting the reconstructed sample x'. In another example, the first time series segment takes x_1 to x_{n-m}; then x_2 to x_{n-m+1} are predicted based on x_1 to x_{n-m} in the above manner; then, starting from x_{n-m+1}, the predicted values are input into the LSTM network one by one, and subsequent index values are predicted in turn until x_n, thereby forming the reconstructed sample x'.
In another embodiment, the decoding network z predicts the second time series segment based on an overall characterization of the first time series segment. In this case, the entire first time series segment may be input into the decoding network z, and the decoding network z obtains a segment characterization vector h of the first time series segment and then predicts the second time series segment based on the segment characterization vector h. In a specific example, the decoding network z may be implemented as a Transformer network, which obtains the overall segment characterization vector h of the first time series segment based on an attention mechanism; its decoding stage then predicts the second time series segment based on the vector h, forming the reconstructed sample x'. In yet another example, the decoding network z may be implemented as a seq2seq network, predicting the elements in the second time series segment one by one based on the segment characterization vector h of the first time series segment, constituting the reconstructed sample x'.
In further embodiments, the decoding network z may also form the reconstructed samples by other means, such as a self-encoding network. The present description does not limit the manner and process of decoding the network reconstructed samples.
In addition, it should be noted that the processing of the first time series sample x by the grouping model in step 22 and the processing of the sample x by the K decoding networks in step 23 may be performed in any relative order, for example sequentially or in parallel, and are not limited herein.
After steps 22 and 23 are performed for the N micro time series samples in the sample set, respectively, the loss in the whole prediction process can be determined, and the packet model and the K decoding networks are tuned and trained accordingly. The details are as follows.
In step 24, a total distribution loss L1 is determined according to the predicted probability distribution q (z | x) of each sample in the time-series sample set and the preset prior distribution of the samples among K packets.
As mentioned above, on the basis of setting the desired number K of groups, an a priori distribution, denoted as p (z), of a large number of micro-timing samples among the K groups can also be set. In a simpler scenario, the prior distribution can be assumed to be evenly distributed among the K packets. In the case of having more expert knowledge about the scene characteristics, other prior distribution forms may be set for the scene characteristics, for example, the distribution of the samples among the K groups is set to conform to a non-uniform distribution, and so on. Thus, based on the prior distribution p (z), and the predicted probability distribution q (z | x) of the grouping model for each sample, the total distribution loss associated with the sample distribution is determined.
In an embodiment, the prediction probability distributions of N samples in a sample set may be aggregated to obtain a total prediction distribution, or referred to as a total posterior distribution, denoted as q (z) of the N samples in K groups, and then the total distribution loss L1 is determined according to a distribution difference between a preset prior distribution p (z) and the total posterior distribution q (z).
In one example, the predicted probability distributions for N samples may be averaged to obtain an overall posterior distribution q (z), i.e.:
Figure BDA0003290589600000111
in determining the distribution difference between the prior distribution p(z) and the overall posterior distribution q(z), a number of calculation methods may be used. For example, the KL divergence between p(z) and q(z) can be calculated as the distribution difference, as follows:

KL(p(z) ‖ q(z)) = Σ_z p(z)·log( p(z) / q(z) )    (4)
in another example, the difference in distribution between p (z) and q (z) can also be calculated using cross entropy.
According to another embodiment, the distribution loss of the sample corresponding to the sample x can be calculated according to the distribution difference between the predicted probability distribution q (z | x) of the single microscopic time sequence sample x and the prior distribution p (z); then, the sample distribution losses of the respective samples in the sample set are integrated to obtain a total distribution loss L1. In determining the distribution difference between the predicted probability distribution q (z | x) and the prior distribution p (z), cross entropy, KL divergence, and other calculation methods may be similarly employed.
Thus, the total distribution loss L1 associated with the sample distribution is determined in a number of ways.
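As a concrete sketch, the averaged-posterior variant of L1 (equations (3)-(4)) can be computed as below. The divergence direction KL(p ‖ q) is an assumption, since the original equation is an unreadable image placeholder in this text:

```python
import numpy as np

def total_distribution_loss(q_zx, p_z):
    """L1: average per-sample predictions q(z|x) into q(z) (eq. (3)),
    then measure KL(p(z) || q(z)) against the prior (assumed eq. (4)).
    q_zx has shape (N, K); p_z has shape (K,)."""
    q_z = q_zx.mean(axis=0)                         # overall posterior
    return float(np.sum(p_z * np.log(p_z / q_z)))   # KL divergence

# two samples, K = 2 groups, uniform prior
q_zx = np.array([[0.7, 0.3],
                 [0.3, 0.7]])
p_z = np.array([0.5, 0.5])
l1 = total_distribution_loss(q_zx, p_z)  # 0: posterior matches the prior
```

Because the two per-sample distributions average to (0.5, 0.5), the loss vanishes; a batch that piles all mass onto one group would be penalized, which is what pushes the grouping model toward the preset prior.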
On the other hand, in step 25, a total reconstruction loss L2 is determined according to K reconstructed samples corresponding to the respective samples and the predicted probability distribution.
In one embodiment, the total reconstruction loss L2 may be obtained by multiple summations. Specifically, for any decoding network z among the K decoding networks, the single-network single-sample reconstruction loss L_x of the decoding network z for the first time series sample x is determined; taking the K probabilities corresponding to the K decoding networks in the predicted probability distribution of the first time series sample x as weights, the single-network single-sample reconstruction losses of the K decoding networks for the first time series sample x are weighted and summed to obtain the single-sample reconstruction loss of the first time series sample x; and the single-sample reconstruction losses of the N samples in the time series sample set are summed, the total reconstruction loss L2 being obtained according to the summation result. This calculation can be represented by the following equation:
L2 = Σ_x Σ_z q(z|x)·L_x    (5)
where L_x represents the single-network single-sample reconstruction loss of a single decoding network z for the first time series sample x. The single-network single-sample reconstruction loss may be determined in a variety of ways.
In one example, the first reconstructed sample x' output by the decoding network z for the first time series sample x is obtained, and the single-network single-sample reconstruction loss is determined according to the differences between the predicted value of each time period in the first reconstructed sample x' and the index value of the corresponding time period in the first time series sample x. Specifically, the single-network single-sample reconstruction loss L_x may be determined by the following equation (6):

L_x = Σ_i (x'_i − x_i)²    (6)

where x'_i is the predicted value of the decoding network z for time period i, and x_i is the true value of time period i in the first time series sample x. The summation over time periods i in the equation depends on the length of the time series segment covered by the reconstructed sample. For example, in the example of FIG. 3 where the reconstructed sample is x_2 to x_n, the time period i takes values 2 to n.
In another example, it is assumed that, for each time period i, the index values of the individual samples conform to a distribution, and the single-network single-sample reconstruction loss is determined from that distribution. Specifically, it can be assumed that, in each time period i, the index values of the samples conform to a Gaussian distribution; accordingly, the reconstructed sample output by the decoding network z is set to indicate the mean μ of the index-value distribution for each time period i, and the decoding network z also predicts the variance σ of the index-value distribution. On the basis of the mean and variance predicted by the decoding network z for time period i, the distribution deviation of time period i is determined according to the mean, the variance, and the index value x_i of time period i in the first time series sample x; furthermore, the single-network single-sample reconstruction loss L_x is determined according to the summation of the distribution deviations over the time periods covered by the reconstructed sample.
In one specific example, the single-network single-sample reconstruction loss L_x is determined as the Gaussian negative log-likelihood, by the following equation (7):

L_x = Σ_i [ (x_i − μ_i)² / (2σ_i²) + log σ_i ]    (7)

where x_i is the true value of time period i in the first time series sample x, μ_i is the mean predicted by the decoding network z for time period i based on the first time series sample, and σ_i is the predicted variance of time period i. The summation range of time period i is the length of the time series segment covered by the reconstructed sample.
Substituting the manner of determining the single-network single-sample reconstruction loss in equation (7) into equation (5) yields, in one specific example, the calculation for the total reconstruction loss L2, as represented by the following equation:

L2 = Σ_x Σ_z q(z|x)·Σ_i [ (x_i − μ_i^z)² / (2(σ_i^z)²) + log σ_i^z ]    (8)
in other embodiments, some variations and modifications can be made to the calculation in the above example, for example adjusting the summation order in equation (5), modifying the way L_x is computed, and so on, so as to determine the total reconstruction loss L2 through other calculations; the specific calculation process for determining L2 is not limited herein.
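The probability-weighted reconstruction loss of equation (5) with a Gaussian-likelihood L_x can be sketched as follows. The exact likelihood form (constant term dropped) and the array shapes are assumptions for illustration, since the original equations (7)-(8) are image placeholders in this text:

```python
import numpy as np

def gaussian_nll(x, mu, sigma):
    """Single-network single-sample loss L_x: negative log-likelihood of
    the true segment x under per-period Gaussians (constant term dropped)."""
    return float(np.sum((x - mu) ** 2 / (2 * sigma ** 2) + np.log(sigma)))

def total_reconstruction_loss(xs, mus, sigmas, q_zx):
    """L2: per-sample losses of the K decoding networks, weighted by the
    grouping probabilities q(z|x) and summed over the sample set.
    xs: (N, n) true segments; mus, sigmas: (N, K, n); q_zx: (N, K)."""
    total = 0.0
    for x, mu_k, sig_k, q in zip(xs, mus, sigmas, q_zx):
        total += sum(q[z] * gaussian_nll(x, mu_k[z], sig_k[z])
                     for z in range(len(q)))
    return total
```

A decoding network whose predicted means match the true values with unit variance contributes zero loss; the weighting by q(z|x) means each sample is mostly scored by the decoder of the group it is predicted to belong to.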
It should be further noted that the determination of the total distribution loss in step 24 and the determination of the total reconstruction loss in step 25 may be performed in any relative order, such as sequentially or in parallel, and are not limited herein.
On the basis of the above, in step 26, the grouping model and the K decoding networks can be trained according to the total distribution loss L1 and the total reconstruction loss L2. In one embodiment, the total loss L may be obtained by combining the total distribution loss L1 and the total reconstruction loss L2 according to a preset weight α. The total loss L can be expressed, for example, as:
L=L1+α*L2 (9)
further, the parameters in the above grouping model and the K decoding networks are adjusted in the direction of the total loss reduction. In this process, the packet model and the K decoding networks are jointly trained.
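One joint-training pass over steps 22-26 can be outlined schematically as below. All of the callables are assumed interfaces standing in for the grouping model, the K decoding networks, the two losses, and the parameter update; none are components defined by the patent:

```python
def train_pass(samples, forward_group, forward_decoders,
               dist_loss, recon_loss, update_params, alpha):
    """One batch of joint training (schematic of steps 22-26)."""
    q_all = [forward_group(x) for x in samples]          # step 22: q(z|x)
    recon_all = [forward_decoders(x) for x in samples]   # step 23: K recons
    l1 = dist_loss(q_all)                                # step 24: L1
    l2 = recon_loss(samples, recon_all, q_all)           # step 25: L2
    total = l1 + alpha * l2                              # eq. (9)
    update_params(total)                                 # step 26: descend
    return total
```

Because the same q(z|x) enters both L1 and L2, gradients from the total loss flow into the grouping model and all K decoding networks simultaneously, which is what "jointly trained" means here.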
Thus, through training of a plurality of batches of sample sets, the trained packet model and K decoding networks can be obtained. The training of the K decoding networks is to assist the training of the packet model, and after the training is completed, the decoding networks may not be used any more or may be used for other purposes. And for the trained grouping model, the micro time sequence samples can be directly grouped by using the grouping model.
Fig. 4 illustrates a flow diagram of a method of grouping time series data according to one embodiment, which may be performed by any device, apparatus, platform, or device cluster having computing and processing capabilities. As shown in fig. 4, the method may include the following steps.
in step 41, a time series sample to be tested is obtained, which includes a sequence of n index values of a single service object in n time periods. The time series sample to be measured can be denoted as sample y.
In step 42, the time series sample to be tested is input into the grouping model, and the prediction probability distribution of the time series sample to be tested belonging to K groups is output. The packet model is trained according to the method described above in connection with fig. 2.
Then, in step 43, the packet with the highest probability value in the above predicted probability distribution is determined as a target packet, and the time series sample y to be measured is classified into the target packet.
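Step 43 reduces to an argmax over the predicted distribution; a minimal sketch:

```python
import numpy as np

def assign_group(q):
    """Step 43: the group with the highest probability in q(z|y)
    becomes the target group of the sample under test."""
    return int(np.argmax(q))

target = assign_group(np.array([0.1, 0.7, 0.2]))  # sample y goes to group 1
```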
Through the method, grouping of the single microscopic time sequence samples y is achieved. Each of the large number of microscopic time series samples to be processed may be processed in the manner described above so as to be divided into K groups, respectively. Because the grouping model and the mixed model composed of K decoding networks are synchronously trained, the output of the grouping model reflects the distribution of the input micro time sequence samples among the K components of the mixed model, and the samples with similar distribution have similar deep-layer rules on the micro characteristics, so that the micro time sequence data with the similar deep-layer rules can be classified into one class through the processes.
The grouping of the micro time series data can be used for various scenes and purposes. For example, grouping each microscopic time series sample may be used to perform a distribution process on the corresponding business object, for example, distributing to K processing units (which may be K groups of customer service staff, K sub-servers) provided corresponding to K groups, and the like. Grouping a large number of micro time sequence samples can facilitate further analysis and processing of grouped data. Typically, the grouping can be used to split the macro timing data, so as to predict the macro index based on the grouped data.
Fig. 5 illustrates a flow diagram of a method for predicting macro timing data, which may be performed by any device, apparatus, platform, cluster of apparatuses having computing and processing capabilities, according to an embodiment. As shown in fig. 5, the method includes the following steps.
In step 51, N time series samples are obtained, wherein each time series sample comprises a sequence of N index values of a single service object in N time periods.
At step 52, each time series sample is input into a grouping model, and the grouping model outputs a prediction probability distribution of the time series sample belonging to K groups; wherein the packet model is trained according to the method described above in connection with fig. 2.
Next, in step 53, each time series sample is divided into groups with the highest probability value in the corresponding prediction probability distribution, so as to obtain K groups of time series samples. In this way, a division of the N time-sequential samples into K groups is achieved.
Next, in step 54, for each group j of the K groups of time series samples, the time series samples in the group j are aggregated to obtain a fused time series sample corresponding to the group j. In one embodiment, the aggregation may be to sum the time-series samples in one packet to obtain a fused time-series sample. In another example, the average value of each time-series sample in a group can be used as the fused time-series sample corresponding to the group.
Then, in step 55, the prediction index value of the business object sub-population corresponding to the group in the target time period may be determined based on the fused time series sample. The prediction process may be implemented by a time-series prediction model. The algorithm structure, the construction method, and the training method of the time sequence prediction model are not limited herein.
Finally, in step 56, based on the prediction index values respectively corresponding to the K groups, a macro index value of the total group of the business objects in the target time period is determined, so as to realize macro prediction.
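Steps 53-56 can be sketched end to end as follows. Sum aggregation is the example given in step 54; `group_forecast` is a stand-in for the unspecified time-series prediction model of step 55, and summing the group forecasts into the macro value in step 56 is an assumption for illustration:

```python
import numpy as np

def predict_macro(samples, q_zx, k, group_forecast):
    """Split N micro samples into K groups by argmax over q(z|x) (step 53),
    sum each group into a fused series (step 54), forecast each fused
    series (step 55), and combine into the macro index value (step 56)."""
    groups = np.argmax(q_zx, axis=1)
    macro = 0.0
    for j in range(k):
        members = samples[groups == j]
        if len(members) == 0:
            continue                      # empty group contributes nothing
        fused = members.sum(axis=0)       # fused time series of group j
        macro += group_forecast(fused)    # predicted index of sub-population j
    return macro

samples = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
q_zx = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])
# toy forecaster: carry the last observed value forward
macro = predict_macro(samples, q_zx, k=2, group_forecast=lambda s: float(s[-1]))
```

Here groups are {sample 0} and {samples 1, 2}; their fused series end at 2 and 10, so the toy macro forecast is 12.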
Because the grouping of the microcosmic time sequence samples is carried out based on the deep layer rule of the microcosmic samples, and the macro prediction is carried out based on the subsequences grouped in such a way, the rule of the microcosmic time sequence can be better mined as assistance, and the prediction performance is improved.
According to an embodiment of another aspect, an apparatus for training a packet model is provided. Fig. 6 is a schematic structural diagram of an apparatus for training a packet model according to an embodiment, which may be deployed in any device, platform, or device cluster having data storage, computation, and processing capabilities. As shown in fig. 6, the training apparatus 600 includes:
an obtaining unit 61, configured to obtain a time-series sample set, where an arbitrary first time-series sample includes a sequence of n index values of a single service object in n time periods;
a grouping processing unit 62 configured to input the first time sequence sample into a grouping model, and obtain a predicted probability distribution of the sample belonging to K groups;
a reconstruction processing unit 63 configured to input the first time sequence samples into K decoding networks corresponding to the K packets, respectively, to obtain K reconstructed samples;
a distribution loss determining unit 64 configured to determine a total distribution loss according to the predicted probability distribution of each sample in the time series sample set and a preset prior distribution of the samples among the K packets;
a reconstruction loss determining unit 65 configured to determine a total reconstruction loss according to K reconstruction samples respectively corresponding to the samples and the prediction probability distribution;
a training unit 66 configured to train the packet model and the K decoding networks according to the total distribution loss and the total reconstruction loss.
The specific implementation of the training apparatus may refer to the training method described in conjunction with fig. 2.
According to an embodiment of yet another aspect, an apparatus for grouping temporal data is provided. Fig. 7 is a schematic structural diagram of an apparatus for grouping time series data according to an embodiment, which may be deployed in any device, platform, or device cluster having data storage, computation, and processing capabilities. As shown in fig. 7, the grouping apparatus 700 includes:
the acquiring unit 71 is configured to acquire a time sequence sample to be detected, where the time sequence sample includes a sequence of n index values of a single service object in n time periods;
a model processing unit 72 configured to input the time sequence sample to be tested into a grouping model, and output a prediction probability distribution of the time sequence sample belonging to K groups, wherein the grouping model is obtained by training according to the apparatus shown in fig. 6;
and the grouping unit 73 is configured to determine a target group with the highest probability value in the predicted probability distribution and to classify the time sequence sample to be tested into the target group.
The specific implementation of the grouping apparatus may refer to the grouping method described in conjunction with fig. 4.
According to an embodiment of yet another aspect, an apparatus for predicting macroscopic temporal data is provided. Fig. 8 is a schematic structural diagram of an apparatus for predicting macro data, which may be deployed in any device, platform, or device cluster having data storage, computation, and processing capabilities, according to an embodiment. As shown in fig. 8, the prediction apparatus 800 includes:
an obtaining unit 81 configured to obtain N time series samples, where each time series sample includes a sequence of N index values of a single service object in N time periods;
a model processing unit 82 configured to input each time-series sample into a grouping model, which is trained according to the apparatus shown in fig. 6, and output a predicted probability distribution of which each time-series sample belongs to K groups;
a grouping unit 83 configured to divide each time series sample into groups with the highest probability value in the corresponding prediction probability distribution, so as to obtain K groups of time series samples;
an aggregation unit 84 configured to aggregate, for each of the K groups of time-series samples, each time-series sample in the group to obtain a fused time-series sample corresponding to the group;
the prediction unit 85 is configured to determine a prediction index value of the service object sub-group corresponding to the group in the target time period based on the fusion time sequence sample;
the determining unit 86 is configured to determine a macro index value of the total group of the business objects in the target period based on the prediction index values corresponding to the K groups, respectively.
The specific implementation manner of the prediction apparatus may refer to the prediction method described in conjunction with fig. 5.
Through the above apparatuses, the hybrid model is trained synchronously with the grouping model, the deep-layer laws of the micro time series data are discovered through the distribution of the micro time series data over the components of the hybrid model, and micro time series data with similar distributions and similar laws are classified into one class. Thus, the splitting of the macro time series is realized intelligently and automatically based on the micro behavior laws reflected by the hybrid model, without being limited to explicit attributes, thereby further improving macro prediction performance.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2, fig. 4, or fig. 5.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in conjunction with fig. 2, fig. 4, or fig. 5.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (25)

1. A method of training a packet model, comprising:
acquiring a time sequence sample set, wherein any first time sequence sample comprises a sequence formed by n index values of a single service object in n time periods;
inputting the first time sequence sample into a grouping model to obtain the prediction probability distribution of the sample belonging to K groups;
inputting the first time sequence samples into K decoding networks corresponding to K groups respectively to obtain K reconstructed samples;
determining total distribution loss according to the prediction probability distribution of each sample in the time sequence sample set and the preset prior distribution of the samples among K groups;
determining total reconstruction loss according to K reconstruction samples respectively corresponding to each sample and the prediction probability distribution;
and training the grouping model and the K decoding networks according to the total distribution loss and the total reconstruction loss.
2. The method of claim 1, wherein the single business object is one of: a single user, a single commodity, a single shop, a single service item, a single product; the index value is one of: click volume, sales volume, flow volume.
3. The method of claim 1, further comprising preprocessing the first time series sample according to a mean of the n index values.
4. The method of claim 1, wherein the packet model comprises a coded network and a packet network; inputting the first time sequence sample into a grouping model to obtain the prediction probability distribution of the sample belonging to K groups, including:
inputting the first time sequence sample into a coding network to obtain the coding characteristics of the first time sequence sample;
and inputting the coding characteristics into the packet network to obtain the prediction probability distribution.
5. The method of claim 4, wherein the encoding network is a time sequence-based neural network; and inputting the first time sequence sample into the encoding network to obtain the encoding feature of the first time sequence sample specifically comprises:
sequentially inputting the n index values into the time sequence-based neural network to obtain the encoding feature.
6. The method of claim 4, wherein the grouping network is implemented as a multi-layer perceptron (MLP); and inputting the encoding feature into the grouping network to obtain the prediction probability distribution specifically comprises:
processing the encoding feature through the multi-layer perceptron (MLP) to obtain the prediction probability distribution.
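Claims 5 and 6 together describe feeding the n index values one by one into a recurrent network and passing the resulting encoding feature through an MLP. A sketch under assumed dimensions, using a GRU cell as a stand-in for the unspecified time sequence-based network:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, h, K = 12, 16, 3
cell = nn.GRUCell(input_size=1, hidden_size=h)   # stand-in time sequence-based network
mlp = nn.Sequential(nn.Linear(h, 32), nn.ReLU(), nn.Linear(32, K))  # grouping network

x = torch.randn(1, n)                  # one time sequence sample
state = torch.zeros(1, h)
for t in range(n):                     # sequentially input the n index values
    state = cell(x[:, t:t+1], state)   # final state serves as the encoding feature
probs = torch.softmax(mlp(state), dim=-1)   # prediction probability distribution
```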
7. The method of claim 1, wherein the K decoding networks have the same network structure but different network parameters.
8. The method of claim 1, wherein the K decoding networks include an arbitrary first decoding network; and inputting the first time sequence sample into the K decoding networks corresponding to the K groups respectively to obtain K reconstructed samples comprises:
inputting a first time sequence segment of the first time sequence sample into the first decoding network, the first decoding network predicting a second time sequence segment from the first time sequence segment, and forming a first reconstructed sample based on the second time sequence segment.
9. The method of claim 8, wherein the first time sequence segment comprises the first s of the n index values, and the second time sequence segment comprises the last s index values, wherein s is greater than n/2.
10. The method of claim 8, wherein inputting the first time sequence segment of the first time sequence sample into the first decoding network, the first decoding network predicting the second time sequence segment from the first time sequence segment, comprises:
sequentially inputting each index value in the first time sequence segment into the first decoding network, the first decoding network performing rolling prediction of the next index value to form the second time sequence segment.
11. The method of claim 8, wherein inputting the first time sequence segment of the first time sequence sample into the first decoding network, the first decoding network predicting the second time sequence segment from the first time sequence segment, comprises:
inputting the first time sequence segment as a whole into the first decoding network, the first decoding network obtaining a segment characterization vector of the first time sequence segment and predicting the second time sequence segment based on the segment characterization vector.
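The rolling variant of claim 10 can be sketched as follows under claim 9's segmentation (first s values in, last s values predicted, s > n/2). The recurrent cell and the teacher-forcing policy inside the first segment are illustrative assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, s, h = 12, 8, 16                 # s > n/2, as in claim 9
cell = nn.GRUCell(1, h)             # assumed recurrent first decoding network
out = nn.Linear(h, 1)               # predicts the next index value

x = torch.randn(1, n)               # a first time sequence sample
state = torch.zeros(1, h)
preds, inp = [], x[:, 0:1]
for t in range(1, n):
    state = cell(inp, state)
    nxt = out(state)                # rolling prediction of the next index value
    preds.append(nxt)
    # consume true values inside the first segment, then roll on predictions
    inp = x[:, t:t+1] if t < s else nxt
second_segment = torch.cat(preds[-s:], dim=1)   # predicted last s index values
```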
12. The method of claim 1, wherein determining the total distribution loss comprises:
obtaining a total posterior distribution of the samples over the K groups according to the prediction probability distributions of the samples in the time sequence sample set;
and determining the total distribution loss according to a distribution difference between the preset prior distribution and the total posterior distribution.
13. The method of claim 1, wherein determining the total distribution loss comprises:
obtaining a sample distribution loss corresponding to the first time sequence sample according to a distribution difference between the prediction probability distribution of the first time sequence sample and the preset prior distribution;
and obtaining the total distribution loss according to the sample distribution losses of the respective samples.
14. The method according to claim 12 or 13, wherein the distribution difference is a KL divergence between distributions.
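Claims 12 and 14 combine as follows: average the per-sample prediction distributions into a total posterior and measure its KL divergence from the preset prior. The uniform prior and the direction of the divergence are assumptions; the claims fix neither.

```python
import torch

K = 3
# per-sample prediction probability distributions for a small batch (made up)
probs = torch.softmax(torch.randn(8, K), dim=-1)
posterior = probs.mean(dim=0)              # total posterior over the K groups
prior = torch.full((K,), 1.0 / K)          # e.g. a uniform preset prior (assumed)
# KL(prior || posterior) as the distribution difference
kl = torch.sum(prior * torch.log(prior / posterior))
```

KL divergence is non-negative and is zero exactly when the batch's posterior matches the prior, so minimizing it pushes the grouping toward the preset group proportions.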
15. The method of claim 1, wherein determining the total reconstruction loss comprises:
for any first decoding network among the K decoding networks, determining a single-network single-sample reconstruction loss of the first decoding network for the first time sequence sample;
taking the K probabilities corresponding to the K decoding networks in the prediction probability distribution of the first time sequence sample as weights, weighted-summing the single-network single-sample reconstruction losses of the K decoding networks for the first time sequence sample to obtain a single-sample reconstruction loss of the first time sequence sample;
and summing the single-sample reconstruction losses of the samples in the time sequence sample set to obtain the total reconstruction loss.
16. The method of claim 15, wherein determining the single-network single-sample reconstruction loss of the first decoding network for the first time sequence sample comprises:
obtaining a first reconstructed sample of the first decoding network for the first time sequence sample;
and determining the single-network single-sample reconstruction loss according to the differences between the predicted values of the time periods in the first reconstructed sample and the index values of the corresponding time periods in the first time sequence sample.
17. The method of claim 15, wherein determining the single-network single-sample reconstruction loss of the first decoding network for the first time sequence sample comprises:
for a plurality of target time periods covered by the reconstructed sample, acquiring the mean and variance of the index value distribution predicted by the first decoding network for each target time period;
determining a distribution deviation for each target time period according to its mean and variance and the index value of that target time period in the first time sequence sample;
and determining the single-network single-sample reconstruction loss by summing the distribution deviations over the plurality of target time periods.
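The probability-weighted aggregation of claims 15–16 can be written compactly. Squared error is used for the single-network single-sample loss (one option claim 16 admits); all tensors here are made-up stand-ins:

```python
import torch

B, K, n = 8, 3, 12
x = torch.randn(B, n)                       # original time sequence samples
recons = torch.randn(B, K, n)               # K reconstructed samples per input
probs = torch.softmax(torch.randn(B, K), dim=-1)   # prediction distributions

# single-network single-sample losses: per-period squared differences (claim 16)
per_net = ((recons - x.unsqueeze(1)) ** 2).mean(dim=-1)   # (B, K)
per_sample = (probs * per_net).sum(dim=-1)  # probability-weighted sum (claim 15)
total_recon = per_sample.sum()              # total reconstruction loss
```

Because each decoder's error is weighted by the probability of its group, gradients flow mainly into the decoder a sample is likely to belong to, which is what lets the groups specialize.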
18. The method of claim 1, wherein training the grouping model and the K decoding networks according to the total distribution loss and the total reconstruction loss comprises:
combining the total distribution loss and the total reconstruction loss according to a preset weight to obtain a total loss;
and adjusting the parameters of the grouping model and the K decoding networks in a direction that reduces the total loss.
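Claim 18's update step, reduced to a toy example so the mechanics are visible: the two losses are stand-in quadratics, `alpha` is the preset combination weight (its value is assumed), and Adam is one possible optimizer, not one the claim names.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)   # stand-in trainable parameter
opt = torch.optim.Adam([w], lr=0.1)

alpha = 0.5                                   # preset combination weight (assumed)
dist_loss = (w - 2.0) ** 2                    # stand-in total distribution loss
recon_loss = (w + 1.0) ** 2                   # stand-in total reconstruction loss
total = alpha * dist_loss + (1 - alpha) * recon_loss
opt.zero_grad()
total.backward()
opt.step()       # parameters move in the direction that reduces the total loss
```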
19. A method of grouping time sequence data, comprising:
acquiring a time sequence sample to be tested, wherein the time sequence sample comprises a sequence formed by n index values of a single business object in n time periods;
inputting the time sequence sample to be tested into a grouping model, and outputting a prediction probability distribution of the time sequence sample over K groups, wherein the grouping model is trained according to the method of claim 1;
and determining a target group with the highest probability value in the prediction probability distribution, and classifying the time sequence sample to be tested into the target group.
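At inference time, claim 19 reduces to an argmax over the predicted distribution. With a made-up distribution for K = 3:

```python
import torch

# prediction probability distribution of one sample under test (values made up)
probs = torch.tensor([0.2, 0.7, 0.1])
target_group = int(torch.argmax(probs))   # group with the highest probability
# the sample is classified into group index 1 here
```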
20. A method of predicting macroscopic time sequence data, comprising:
acquiring N time sequence samples, wherein each time sequence sample comprises a sequence formed by n index values of a single business object in n time periods;
inputting each time sequence sample into a grouping model, and outputting a prediction probability distribution of the time sequence sample over K groups, wherein the grouping model is trained according to the method of claim 1;
dividing each time sequence sample into the group with the highest probability value in its prediction probability distribution, thereby obtaining K groups of time sequence samples;
for each of the K groups of time sequence samples, aggregating the time sequence samples in the group to obtain a fused time sequence sample corresponding to the group;
determining, based on the fused time sequence sample, a predicted index value in a target time period for the business object sub-group corresponding to the group;
and determining a macroscopic index value of the total population of business objects in the target time period based on the predicted index values respectively corresponding to the K groups.
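The aggregate-then-forecast-then-sum pipeline of claim 20 can be sketched with toy data. The per-group forecaster here is a naive last-change extrapolation, used only so the flow is runnable; the claim leaves the forecasting method open.

```python
import numpy as np

# Hypothetical setup: N = 6 samples over n = 4 periods, already in K = 2 groups.
samples = np.arange(24, dtype=float).reshape(6, 4)
groups = np.array([0, 1, 0, 1, 0, 1])   # group assignment per sample

macro = 0.0
for k in range(2):
    fused = samples[groups == k].sum(axis=0)   # fuse the group into one sequence
    # naive target-period forecast (assumed): extrapolate the last per-period change
    pred = fused[-1] + (fused[-1] - fused[-2])
    macro += pred                              # sum the K sub-group predictions
```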
21. An apparatus for training a grouping model, comprising:
an acquisition unit configured to acquire a time sequence sample set, wherein any first time sequence sample comprises a sequence formed by n index values of a single business object in n time periods;
a grouping processing unit configured to input the first time sequence sample into a grouping model to obtain a prediction probability distribution of the sample over K groups;
a reconstruction processing unit configured to input the first time sequence sample into K decoding networks corresponding to the K groups respectively, to obtain K reconstructed samples;
a distribution loss determining unit configured to determine a total distribution loss according to the prediction probability distribution of each sample in the time sequence sample set and a preset prior distribution of the samples over the K groups;
a reconstruction loss determining unit configured to determine a total reconstruction loss according to the K reconstructed samples and the prediction probability distribution corresponding to each sample;
and a training unit configured to train the grouping model and the K decoding networks according to the total distribution loss and the total reconstruction loss.
22. An apparatus for grouping time sequence data, comprising:
an acquisition unit configured to acquire a time sequence sample to be tested, wherein the time sequence sample comprises a sequence formed by n index values of a single business object in n time periods;
a model processing unit configured to input the time sequence sample to be tested into a grouping model and output a prediction probability distribution of the time sequence sample over K groups, wherein the grouping model is trained by the apparatus of claim 21;
and a grouping unit configured to determine a target group with the highest probability value in the prediction probability distribution and classify the time sequence sample to be tested into the target group.
23. An apparatus for predicting macroscopic time sequence data, comprising:
an acquisition unit configured to acquire N time sequence samples, wherein each time sequence sample comprises a sequence formed by n index values of a single business object in n time periods;
a model processing unit configured to input each time sequence sample into a grouping model and output a prediction probability distribution of the time sequence sample over K groups, wherein the grouping model is trained by the apparatus of claim 21;
a grouping unit configured to divide each time sequence sample into the group with the highest probability value in its prediction probability distribution, thereby obtaining K groups of time sequence samples;
an aggregation unit configured to aggregate, for each of the K groups of time sequence samples, the time sequence samples in the group to obtain a fused time sequence sample corresponding to the group;
a prediction unit configured to determine, based on the fused time sequence sample, a predicted index value in a target time period for the business object sub-group corresponding to the group;
and a determining unit configured to determine a macroscopic index value of the total population of business objects in the target time period based on the predicted index values respectively corresponding to the K groups.
24. A computer-readable storage medium, having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any one of claims 1-20.
25. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-20.
CN202111163343.6A 2021-09-30 2021-09-30 Training packet model, and method and device for grouping time sequence data Pending CN113822371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111163343.6A CN113822371A (en) 2021-09-30 2021-09-30 Training packet model, and method and device for grouping time sequence data

Publications (1)

Publication Number Publication Date
CN113822371A true CN113822371A (en) 2021-12-21

Family

ID=78920107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111163343.6A Pending CN113822371A (en) 2021-09-30 2021-09-30 Training packet model, and method and device for grouping time sequence data

Country Status (1)

Country Link
CN (1) CN113822371A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840342A (en) * 2022-05-13 2022-08-02 支付宝(杭州)信息技术有限公司 Resource allocation method, device and equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998030059A1 (en) * 1997-01-03 1998-07-09 Telecommunications Research Laboratories Method for real-time traffic analysis on packet networks
US20030099236A1 (en) * 2001-11-27 2003-05-29 The Board Of Trustees Of The University Of Illinois Method and program product for organizing data into packets
CN101243459A (en) * 2005-08-12 2008-08-13 微软公司 Adaptive coding and decoding of wide-range coefficients
CN101243611A (en) * 2005-08-12 2008-08-13 微软公司 Efficient coding and decoding of transform blocks
CN109474907A (en) * 2017-09-07 2019-03-15 通用汽车环球科技运作有限责任公司 Wireless device and vehicle using network packet segmentation and the asymmetrical video communication method for protecting agreement and with this method
CN110009384A (en) * 2019-01-07 2019-07-12 阿里巴巴集团控股有限公司 Predict the method and device of operational indicator
CN110022291A (en) * 2017-12-22 2019-07-16 罗伯特·博世有限公司 Abnormal method and apparatus in the data flow of communication network for identification
CN112784981A (en) * 2021-01-20 2021-05-11 清华大学 Training sample set generation method, and training method and device for deep generation model
CN113011532A (en) * 2021-04-30 2021-06-22 平安科技(深圳)有限公司 Classification model training method and device, computing equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WU Yuying; SUN Ping; HE Xijun; JIANG Guorui: "A New Product Sales Forecasting Model Based on Transfer Learning", Systems Engineering, no. 06, 28 June 2018 (2018-06-28) *
XIAO Hanxiong; CHEN Xiuhong; TIAN Jin: "Feature-Clustering Adaptive Variable-Group Sparse Autoencoder Network and Image Recognition", Computer Engineering and Science, no. 10, 15 October 2018 (2018-10-15) *
HU Weijun; LI Kefei: "Challenges and Countermeasures of Video Streaming over Packet Networks", Telecommunications Science, no. 12, 15 December 2003 (2003-12-15) *

Similar Documents

Publication Publication Date Title
US11657322B2 (en) Method and system for scalable multi-task learning with convex clustering
WO2022063151A1 (en) Method and system for relation learning by multi-hop attention graph neural network
US20190272553A1 (en) Predictive Modeling with Entity Representations Computed from Neural Network Models Simultaneously Trained on Multiple Tasks
CN110503531A (en) The dynamic social activity scene recommended method of timing perception
CN112990486A (en) Method and system for generating combined features of machine learning samples
WO2019200480A1 (en) Method and system for model auto-selection using an ensemble of machine learning models
CN111080360B (en) Behavior prediction method, model training method, device, server and storage medium
JP7245961B2 (en) interactive machine learning
CN111047078B (en) Traffic characteristic prediction method, system and storage medium
CN113255908B (en) Method, neural network model and device for service prediction based on event sequence
CN110781401A (en) Top-n project recommendation method based on collaborative autoregressive flow
WO2021077226A1 (en) Method and system for individual demand forecasting
US20150088789A1 (en) Hierarchical latent variable model estimation device, hierarchical latent variable model estimation method, supply amount prediction device, supply amount prediction method, and recording medium
Chen et al. DCAP: Deep cross attentional product network for user response prediction
US20210383275A1 (en) System and method for utilizing grouped partial dependence plots and game-theoretic concepts and their extensions in the generation of adverse action reason codes
CN113822371A (en) Training packet model, and method and device for grouping time sequence data
Ikhlasse et al. Multimodal cloud resources utilization forecasting using a Bidirectional Gated Recurrent Unit predictor based on a power efficient Stacked denoising Autoencoders
CN114219562A (en) Model training method, enterprise credit evaluation method and device, equipment and medium
US20210350272A1 (en) System and method for utilizing grouped partial dependence plots and shapley additive explanations in the generation of adverse action reason codes
US11682069B2 (en) Extending finite rank deep kernel learning to forecasting over long time horizons
EP2541409B1 (en) Parallelization of large scale data clustering analytics
CN115049458A (en) Commodity pushing method and device based on user crowd modeling, medium and equipment
US11295229B1 (en) Scalable generation of multidimensional features for machine learning
Tiwari et al. BanditPAM++: Faster $ k $-medoids Clustering
Maknickiene et al. Investigation of Prediction Capabilities using RNN Ensembles.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination