CN113112030A - Method and system for training model and method and system for predicting sequence data


Publication number: CN113112030A
Authority: CN (China)
Prior art keywords: data, sequence, hidden state, machine learning, hidden
Legal status: Granted
Application number: CN202110497221.4A
Other languages: Chinese (zh)
Other versions: CN113112030B (en)
Inventors: 姚权铭, 时鸿志
Current Assignee: 4Paradigm Beijing Technology Co Ltd
Original Assignee: 4Paradigm Beijing Technology Co Ltd
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN202110497221.4A
Publication of CN113112030A
Application granted; publication of CN113112030B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Abstract

A method and system for training a model and a method and system for predicting sequence data are provided. The method and system for training the model obtain a sequence training sample set and train a machine learning model based on it, where the machine learning model is a hidden Markov model comprising two hidden state layers: a first hidden state layer comprising a personalized hidden state of each of a plurality of objects, and a second hidden state layer comprising a plurality of shared hidden states shared by the plurality of objects. The method and system for predicting sequence data acquire a sequence prediction sample of an object and perform prediction on it using the machine learning model, which is trained in advance to predict the next sequence data following a series of chronologically ordered sequence data, so as to provide a prediction result for the next sequence data following that series.

Description

Method and system for training model and method and system for predicting sequence data
The present application is a divisional application of the patent application entitled "Method and system for training a model and method and system for predicting sequence data", filed on April 28, 2019 with application number 201910349922.6.
Technical Field
The present application relates generally to the field of artificial intelligence and, more particularly, to a method and system for training a machine learning model for predicting sequence data and a method and system for predicting sequence data using a machine learning model.
Background
With the advent of massive data, artificial intelligence technology has developed rapidly. Machine learning is an inevitable product of artificial intelligence reaching a certain stage of development; it is dedicated to mining valuable latent information from massive data by computational means.
Mining the laws behind continuously generated sequence data (e.g., mobile location data, music listening sequences, etc.) by modeling it with machine learning is very important for various application scenarios. For example, personalized sequence behavior is ubiquitous in our daily lives, and modeling such behavior matters for many applications. Modeling trajectory data (one kind of sequence data) helps to understand the movement patterns of users, which can improve ride-sharing services and traffic; modeling music listening sequences helps to reveal the sequential regularities behind people's behavior, which in turn improves the accuracy of content recommendation; modeling the order in which a user purchases goods helps to analyze the user's preferences, facilitating targeted advertising; there are many such scenarios, not limited to these. In all of these application scenarios, an important characteristic is that the sequence pattern reflected by the sequence data is highly personalized, and different objects may follow completely different sequence regularities, so a model that effectively learns personalized sequence data is needed.
Hidden Markov Models (HMMs) are one of the models used to model sequence data, and are often used for sequence modeling because they not only capture sequence patterns but also discover the hidden states behind those patterns. However, sequence modeling with HMMs often runs into problems. For example, if we train one HMM for each object, a reliable HMM cannot be learned because the data of a single object is very limited; if we use the data of all objects to train one HMM for all of them, the trained model loses its personalization. Researchers have proposed grouping objects according to the similarity of their sequence data and training an HMM for each group, but this approach still forces different objects (those in the same group) to share one HMM, so the model remains insufficiently personalized, and the trained model therefore has difficulty meeting the accuracy requirements when used to predict object sequence data.
Disclosure of Invention
The present invention aims to solve the problem that existing HMM models cannot simultaneously handle the scarcity of training data and the diversity of sequence patterns across different objects, so as to, for example, improve the prediction accuracy of sequence data in scenarios involving the prediction of object sequence data (such as sequence behaviors).
According to an exemplary embodiment of the present application, there is provided a method of training a machine learning model for predicting sequence data, the method may include: obtaining a set of sequential training samples, wherein the set of sequential training samples comprises a plurality of sequential training samples for each of a plurality of subjects, and each sequential training sample comprises a plurality of sequence data arranged in a chronological order; training the machine learning model based on the set of sequence training samples, wherein the machine learning model is a hidden Markov model comprising two hidden state layers, wherein a first hidden state layer comprises a personalized hidden state for each of the plurality of objects and a second hidden state layer comprises a plurality of shared hidden states shared by the plurality of objects.
According to another exemplary embodiment of the present application, there is provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform a method of training a machine learning model for predicting sequence data as described above.
According to another exemplary embodiment of the application, a system is provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform a method of training a machine learning model for predicting sequence data as described above.
According to another exemplary embodiment of the present application, there is provided a system for training a machine learning model for predicting sequence data, the system may include: a training sample acquisition device configured to acquire a set of sequential training samples, wherein the set of sequential training samples includes a plurality of sequential training samples for each of a plurality of subjects, and each sequential training sample includes a plurality of sequence data arranged in chronological order; training means configured to train the machine learning model based on the set of sequence training samples, wherein the machine learning model is a hidden Markov model comprising two hidden state layers, wherein a first hidden state layer comprises a personalized hidden state for each of the plurality of objects and a second hidden state layer comprises a plurality of shared hidden states shared by the plurality of objects.
According to another exemplary embodiment of the present application, there is provided a method of predicting sequence data using a machine learning model, the method may include: obtaining a sequence prediction sample of a subject, wherein the sequence prediction sample comprises a plurality of time-ordered sequence data of the subject; performing prediction on the sequence prediction sample using the machine learning model to provide a prediction result regarding next sequence data following the plurality of sequence data, wherein the machine learning model is trained in advance to predict next sequence data following a series of sequence data arranged in time order, and the machine learning model is a hidden Markov model including two hidden state layers, wherein a first hidden state layer includes a personalized hidden state of each object of a plurality of objects, and a second hidden state layer includes a plurality of shared hidden states shared by the plurality of objects.
According to another exemplary embodiment of the present application, there is provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform a method of predicting sequence data using a machine learning model as described above.
According to another exemplary embodiment of the application, a system is provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the method of predicting sequence data using a machine learning model as described above.
According to another exemplary embodiment of the present application, there is provided a system for predicting sequence data using a machine learning model, which may include: a prediction sample acquisition device configured to acquire a sequence prediction sample of a subject, wherein the sequence prediction sample includes a plurality of sequence data of the subject arranged in time series; a prediction device configured to perform prediction on the sequence prediction sample using the machine learning model to provide a prediction result on the next sequence data following the plurality of sequence data, wherein the machine learning model is trained in advance to predict the next sequence data following a series of sequence data arranged in time order, and the machine learning model is a hidden Markov model including two hidden state layers, wherein a first hidden state layer includes a personalized hidden state of each of a plurality of objects, and a second hidden state layer includes a plurality of shared hidden states shared by the plurality of objects.
According to the method and system for training a machine learning model of the exemplary embodiments of the present application, a hidden Markov model comprising two hidden state layers can be trained. Because the first hidden state layer of the hidden Markov model comprises a personalized hidden state of each of a plurality of objects and the second hidden state layer comprises a plurality of shared hidden states shared by the objects, the scarcity of training data can be overcome while the diversity of the sequence patterns of different objects is preserved, so that the trained hidden Markov model can provide more accurate sequence data prediction results for different objects.
Since the method of predicting sequence data using a machine learning model according to an exemplary embodiment of the present application predicts sequence data using the above-described hidden Markov model including two hidden state layers, it provides personalized prediction of sequence data for different objects and can therefore improve prediction accuracy.
Drawings
These and/or other aspects and advantages of the present application will become more apparent and more readily appreciated from the following detailed description of the embodiments of the present application, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a block diagram illustrating a system for training a machine learning model for predicting sequence data according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of a hidden Markov model sharing a hidden state according to an exemplary embodiment of the present application;
FIG. 3 is a flow diagram illustrating a method of training a machine learning model for predicting sequence data according to an exemplary embodiment of the present application;
FIG. 4 is a block diagram illustrating a system for predicting sequence data using a machine learning model according to an exemplary embodiment of the present application;
FIG. 5 is a flowchart illustrating a method of predicting sequence data using a machine learning model according to an exemplary embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, exemplary embodiments of the present application will be described in further detail below with reference to the accompanying drawings and detailed description.
Fig. 1 is a block diagram illustrating a system 100 for training a machine learning model for predicting sequence data (hereinafter, simply referred to as "model training system" for convenience of description) according to an exemplary embodiment of the present application. As shown in FIG. 1, the model training system 100 may include a training sample acquisition device 110 and a training device 120.
Specifically, the training sample acquiring means 110 may acquire a set of sequence training samples. Here, the set of sequence training samples may include a plurality of sequence training samples for each of a plurality of subjects, and each sequence training sample may include a plurality of sequence data arranged in time order. As an example, the plurality of sequence data may relate to behavioral data of the object at different points in time. Here, the behavior data may include continuous feature data reflecting the behavior of the object, for example, position data of the object (for example, position data of a user riding), and the like, but is not limited thereto. Alternatively, the behavior data may include discrete characteristic data reflecting the behavior of the object, such as, but not limited to, a content ID of content accepted by the object (e.g., the content may be various types of content such as music, video, advertisements, images, etc.). As another example, the plurality of sequence data may relate to status data of the subject at different points in time, such as physiological status data of the user (e.g., blood pressure, blood glucose, etc.), prices of goods, stocks, and so forth.
For example, in a scenario in which the movement trajectory of an object (for example, a user or a vehicle) is predicted, the training sample acquisition means 110 may acquire, for each of a plurality of objects, a series of historical position data arranged in time order to constitute the above-described sequence training samples; in a content recommendation scenario, the training sample acquiring means 110 may acquire, for each of a plurality of users, the content IDs of a series of historically accepted contents arranged in time order to constitute the sequence training samples; in a scenario involving the prediction of the physiological state of subjects (human or animal), the training sample acquisition means 110 may acquire a time-ordered series of historical physiological state data of each of a plurality of subjects to constitute the sequence training samples; in a scenario involving commodity or stock price prediction, the training sample acquiring device 110 may acquire a chronological series of historical price data of each commodity or stock to construct the sequence training samples.
In the present application, the object may be a living person or an inanimate object (for example, a machine, a commodity, a stock, or the like). Furthermore, sequence data may describe the appearance or properties of an object in a particular respect at different points in time, and is not limited to behavior or state data.
Specifically, as an example, the training sample acquiring device 110 may acquire a set of historical data records of a plurality of objects and construct the sequence training sample set based on that set. Alternatively, the training sample acquiring device 110 may directly acquire, from the outside, a sequence training sample set generated by another device. When the training sample acquiring apparatus 110 itself constructs the sequence training sample set, it may acquire the historical data records manually, semi-automatically or automatically, and may process the acquired historical data records so that they have a suitable format or form. Here, the training sample acquiring device 110 may receive historical data records manually input by a user through an input device (e.g., a workstation); or it may acquire the historical data record set from a data source in a fully automatic manner, for example by systematically requesting the data source to send the records through a timer mechanism implemented in software, firmware, hardware, or a combination thereof; or it may perform the acquisition semi-automatically with human intervention, for example requesting the historical data record set upon receiving a specific user input. Each time historical data records are acquired, the training sample acquiring device 110 may preferably store the captured data in a non-volatile memory. As an example, a data warehouse may be used to store the acquired historical data records as well as the processed historical data records.
Here, when constructing the sequence training sample set, for the plurality of chronologically arranged historical data records of each object, if the time interval between two adjacent historical data records satisfies a preset condition, the training sample acquiring device 110 may segment the chronologically arranged historical data records of that object at that point, thereby obtaining a plurality of sequence training samples of the object. For example, the preset condition may be that the time interval between the two adjacent historical data records is greater than a predetermined time threshold, but is not limited thereto; for example, the preset condition may also be that the time interval between the two adjacent historical data records is within a predetermined time range. Here, as an example, each historical data record may include a plurality of data attribute fields, such as an object identification field, an object behavior data field, a behavior occurrence time field, and the like. The training sample acquiring device 110 may first obtain the plurality of historical data records of each object according to the object identification field in the acquired historical data record set of the plurality of objects, then arrange the historical data records of each object in time order, and perform slicing whenever the time interval between two adjacent historical data records in the arranged records is greater than the preset threshold, so that the time interval between any two adjacent historical data records in each sliced subset is less than or equal to the preset threshold.
To represent the slicing process more intuitively, assume that a historical data record of an object is defined as a tuple r_n = <u_n, t_n, e_n>, where u_n is the object (user) ID, e_n is the historical sequence data, and t_n is the timestamp corresponding to e_n. Here, e_n may be either continuous or discrete data. As an example, when the historical data record relates to object behavior, in a scenario involving movement location prediction, e_n may be, for example, the position data of the object and may be represented as a two-dimensional continuous vector e_n = (lo, la), where lo represents longitude and la represents latitude. As another example, in a scenario involving content recommendation, e_n may be, for example, the singer ID of the music the user listens to. As another example, in a scenario involving prediction of the physiological state of a subject (human or animal), e_n may be data reflecting the physiological state of the subject, for example a blood pressure value or a blood glucose value. As another example, in a scenario involving price prediction of a commodity or stock, e_n may be the price of the commodity or stock. However, the kind or representation form of the historical data record is not limited to the above examples. In this case, assume that R = {r_n} is the collection of historical data records of the plurality of objects. If x = {r_{n_1}, r_{n_2}, ..., r_{n_N}} is a subset of R whose records all belong to the same object and satisfy t_{n_{k+1}} - t_{n_k} ≤ Δt for every pair of adjacent records (where Δt > 0), and the records of that object immediately before r_{n_1} and immediately after r_{n_N} (if any) are separated from x in time by more than Δt, then the sequence data in x may be used as a sequence training sample of that object.
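A minimal Python sketch of this slicing rule follows (an illustration only, not the patent's implementation; the record layout (u, t, e) and the helper name build_sequence_samples are assumptions):

```python
from collections import defaultdict

def build_sequence_samples(records, delta_t):
    """Split each object's chronologically ordered records r_n = (u_n, t_n, e_n)
    wherever the gap between adjacent timestamps exceeds delta_t."""
    by_object = defaultdict(list)
    for u, t, e in records:
        by_object[u].append((t, e))

    samples = defaultdict(list)                    # object id -> list of sequence samples
    for u, recs in by_object.items():
        recs.sort(key=lambda r: r[0])              # order by timestamp
        current = [recs[0][1]]
        for (t_prev, _), (t_cur, e_cur) in zip(recs, recs[1:]):
            if t_cur - t_prev > delta_t:           # gap too large: start a new sample
                samples[u].append(current)
                current = []
            current.append(e_cur)
        samples[u].append(current)
    return samples
```

For example, records = [(1, 0.0, 'rock'), (1, 5.0, 'rock'), (1, 500.0, 'folk')] with delta_t = 60 would yield two sequence training samples for object 1: ['rock', 'rock'] and ['folk'].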
After obtaining a set of sequence training samples (including a plurality of sequence training samples for each of a plurality of subjects) in the manner described above, the training apparatus 120 may train the machine learning model based on the set of sequence training samples. In the present application, the machine learning model may be a hidden markov model including two hidden state layers, wherein a first hidden state layer includes a personalized hidden state of each of the plurality of objects, and a second hidden state layer includes a plurality of shared hidden states shared by the plurality of objects.
In order to facilitate understanding of the hidden Markov model including two hidden state layers proposed in the present application (hereinafter also referred to as the shared-hidden-state hidden Markov model), a brief description of the classical Hidden Markov Model (HMM) is first given here. An HMM assumes that the sequence data of an object is controlled by a number of hidden states and that the transitions between these hidden states follow the Markov assumption, i.e., the probability that the object is in the next hidden state depends only on the current hidden state. Assuming that M is the number of hidden states, a classical HMM includes three parameter sets:
(1) an M-dimensional vector π ∈ R^M, where π_m = p(z_1 = m) defines the initial probability that an object initially occupies the m-th hidden state, z_1 denoting the initial hidden state;
(2) an M×M transition probability matrix A = {a_ij}, which defines the transition probabilities between the M hidden states following the Markov assumption, where a_ij = p(z_n = j | z_{n-1} = i) represents the probability of the object moving from the i-th hidden state to the j-th hidden state;
(3) a parameter set D = {d_m}, m = 1, 2, ..., M, which defines the probability distributions of the M hidden states over the observation space, where d_m is the probability distribution of the m-th hidden state over the observation space.
Next, a hidden markov model sharing a hidden state according to an exemplary embodiment of the present application will be described with reference to fig. 2.
As shown in fig. 2, compared to a classical HMM, the hidden Markov model with shared hidden states in the present application may include two hidden state layers: a first hidden state layer may include a personalized hidden state for each of the plurality of objects, and a second hidden state layer may include a plurality of shared hidden states shared by the plurality of objects. Because the first hidden state layer includes the personalized hidden states of each of the plurality of objects, the personalized sequence pattern of each object is preserved; and because the plurality of objects share the plurality of shared hidden states in the second hidden state layer, the problem of scarce training data is effectively alleviated.
The shared-hidden-state hidden Markov model in the present application follows and reflects objective laws in practical application scenarios. On the one hand, many behaviors are shared: for example, many people gather at the same hotspots, and groups with similar interests often listen to the same type of music; such patterns are typically shared across users and are unlikely to be determined by any single user. On the other hand, user sequence behavior patterns are extremely diverse. For example, two users may both work at site A and often return home after work, yet their homes are likely not in the same region, so it is not appropriate to use a single transition model to predict where they go after leaving site A. As another example, user 1 likes rock music and ballads, while user 2 likes rock music and rap; without personalized information, we can hardly predict what music each will listen to after rock music. In the present application, the first hidden state layer ensures the personalized sequence pattern of each object, while the second hidden state layer allows the plurality of objects to share hidden states that are unlikely to be influenced by any single object.
The shared-hidden-state hidden Markov model of the present application is described in further detail below with reference to fig. 2. For convenience of description, it is assumed that the first hidden state layer involves three objects and that each object has three personalized hidden states (in the first hidden state layer of fig. 2, the first three circles represent the three personalized hidden states of the first object, the middle three circles represent those of the second object, and the last three circles represent those of the third object), but it should be clear that the present application does not limit the number of objects or the number of hidden states in any way.
Referring to fig. 2, the number of personalized hidden states of each object in the first hidden state layer is less than the number of shared hidden states in the second hidden state layer. As an example, in a scenario where the movement location of an object (user or vehicle) is predicted, the personalized hidden states may include, for example, that the object is located in a work area, a living area, a rest area, and the like, while the shared hidden states may include hotspots in the observation space that are shared by the objects, such as a shopping mall, a restaurant, a leisure center, and the like. As another example, in a content recommendation scenario, the personalized hidden states may include types of content commonly accepted by a particular user, e.g., ballad music, rock music, rap music, etc., while the shared hidden states may include types of content accepted by most users, e.g., soothing music, rhythmic music, etc. As another example, in a scenario involving the prediction of the physiological state of a subject (human or animal), the personalized hidden states may be physiological state index intervals (e.g., blood pressure ranges) common to a particular subject, and the shared hidden states may be physiological state index intervals common to subjects of the same kind. As another example, in a scenario involving price prediction of a commodity or stock, the personalized hidden states may be the usual price intervals of a single commodity or stock, and the shared hidden states may be price intervals of commodities or stocks of the same type. Although fig. 2 shows three objects sharing 8 shared hidden states, it should be clear that the present application does not limit the number of shared hidden states, as long as it is greater than the number of personalized hidden states of each object.
As shown in FIG. 2, each shared hidden state in the second hidden state layer corresponds to a probability distribution (denoted by d in fig. 2, e.g., d_1 to d_8). As an example, when the behavior data described above is continuous feature data reflecting the behavior of the object (e.g., position data), the probability distribution corresponding to each shared hidden state may include a Gaussian distribution, but is not limited thereto. As another example, when the behavior data includes discrete feature data reflecting the behavior of the object (e.g., content IDs), the probability distribution corresponding to each shared hidden state may include a multinomial distribution, but is not limited thereto.
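Purely as an illustration of the two kinds of emission distributions just mentioned (the class names and parameterizations below are assumptions, not part of the patent):

```python
import numpy as np

class GaussianEmission:
    """Distribution of a shared hidden state over a continuous observation
    space, e.g. (longitude, latitude) position data."""
    def __init__(self, mean, cov):
        self.mean = np.asarray(mean)          # center of the hotspot
        self.cov = np.asarray(cov)            # spread of the hotspot

    def pdf(self, e):
        d = np.asarray(e) - self.mean
        inv = np.linalg.inv(self.cov)
        norm = np.sqrt(((2 * np.pi) ** len(self.mean)) * np.linalg.det(self.cov))
        return float(np.exp(-0.5 * d @ inv @ d) / norm)

class MultinomialEmission:
    """Distribution of a shared hidden state over a discrete observation
    space, e.g. content IDs."""
    def __init__(self, probs):
        self.probs = np.asarray(probs)        # probs[k] = probability of content k

    def pdf(self, e):
        return float(self.probs[e])           # e is a content index
```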
According to an exemplary embodiment of the present application, the model parameters of the hidden state-shared markov model may include a personalized parameter set for each object and a shared parameter set shared by the plurality of objects. Specifically, the personalized parameter set includes a probability of a personalized hidden state of each object in the first hidden state layer, a transition probability between personalized hidden states of each object, and an emission probability of each object from a personalized hidden state to a shared hidden state, and the shared parameter set includes a set of probability distributions corresponding to each shared hidden state.
Next, the training process of the model will be described in detail. Before that, for convenience of description, the above-mentioned personalized parameter set and shared parameter set are first written out explicitly.
Specifically, assuming that the number of shared hidden states in the second hidden state layer is M, the shared parameter set shared by the plurality of objects may be represented as D = {d_m} (where 1 ≤ m ≤ M), and each d_m is the probability distribution corresponding to the m-th shared hidden state, i.e., the probability distribution of the m-th shared hidden state over the observation space.
Furthermore, the personalized parameter set of each object u may be denoted as Φ_u = {π_u, A_u, B_u}. Denote the personalized hidden states of object u by z_n and the shared hidden states by c_n. Here, π_u is the probability of the personalized hidden states of object u in the first hidden state layer: if the object has K personalized hidden states, then π_i^u is the probability that object u is initially in the i-th personalized hidden state, where z_1 is the initial personalized hidden state of the object and 1 ≤ i ≤ K. A_u = {a_ij^u} is the transition probability matrix between the K personalized hidden states of object u, where a_ij^u represents the transition probability of object u from the i-th personalized hidden state to the j-th personalized hidden state. B_u = {b_im^u} is the emission probability matrix, where b_im^u represents the emission probability from the i-th personalized hidden state in the first hidden state layer to the m-th shared hidden state in the second hidden state layer.
In general, in an actual scenario, an object occupies only a few distribution states in the observation space; for example, users mostly move between a few areas (e.g., home and office), or each user tends to listen to only a few types of music in the collection. Therefore, if, during training, the personalized hidden states in the first hidden state layer of the hidden Markov model of the present application emit to a small number of shared hidden states in the second hidden state layer with a highly concentrated probability distribution, the trained model will be easier to interpret (in other words, it will better fit the objective laws of the actual scenario).
To this end, according to an exemplary implementation of the present application, the training device 120 may construct the objective function for training the machine learning model to include a loss function and a regularization term, where the regularization term is used to constrain the concentration degree of the emission probability distribution of each object from the personalized hidden states to the shared hidden states. Since entropy can measure uncertainty or diversity, the regularization term here may, as an example, include a constraint term related to the entropy of the emission probabilities of each object from the personalized hidden states to the shared hidden states. For example, the constraint term can be constructed as

λ Σ_u Σ_i Ω(b_i^u),  where  Ω(b_i^u) = Σ_m b_im^u log b_im^u,

where λ is a real number greater than 0 and b_im^u indicates the emission probability of the u-th object from the i-th personalized hidden state to the m-th shared hidden state, with u, i and m being positive integers greater than 0. Maximizing this term (the negative entropy of each emission distribution) drives each emission distribution to concentrate on a few shared hidden states.
Although the entropy-related constraint term is taken as an example of the above regularization term, it should be noted that the regularization term is not limited to the entropy-related constraint term; it may be any function term that can constrain the concentration degree of the emission probability distribution from the personalized hidden states to the shared hidden states. Alternatively, the objective function for training the shared-hidden-state hidden Markov model may not include a regularization term for constraining the concentration degree of the emission probability distribution of each object from the personalized hidden states to the shared hidden states (in this case, λ = 0). Alternatively, the objective function may include, in addition to the above regularization term, other regularization terms that constrain the complexity of the model. Further, the above entropy-related constraint term is not limited to the form given above, but may be constructed as any combination of function terms with respect to entropy.
As an example, the objective function according to an exemplary embodiment of the present application may be constructed as follows:

L(Φ, D) = Σ_u Σ_{x ∈ J_u} log p(x; Φ_u, D) + λ Σ_u Σ_i Σ_m b_im^u log b_im^u    (1)

where log p(x; Φ_u, D) is the loss function (the log-likelihood of sequence training sample x under the model), x is a sequence training sample of an object (i.e., a sequence consisting of sequence data observed in the observation space) with x = {e_1, e_2, ..., e_N} (where N is the length of the sequence), J_u is the set of sequence training samples of object u, and λ > 0 is the constraint coefficient of the constraint term.
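For illustration, the objective of equation (1) could be evaluated as follows given per-sample log-likelihoods; log_likelihood stands for the computation of equation (2) below, and the data structures follow the earlier sketches (all assumptions for exposition):

```python
import numpy as np

def objective(model, samples_by_object, lam, log_likelihood):
    """L(Phi, D) of equation (1): per-object data log-likelihood plus the
    negative-entropy regularizer on each emission matrix B_u."""
    total = 0.0
    for u, samples in samples_by_object.items():
        for x in samples:                         # x = [e_1, ..., e_N]
            total += log_likelihood(model, u, x)
        B = np.clip(model.per_object[u].B, 1e-12, 1.0)
        total += lam * np.sum(B * np.log(B))      # lambda * sum_i sum_m b_im log b_im
    return total
```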
As shown in FIG. 2, each sequence x = {e_1, e_2, ..., e_N} consisting of sequence data observed in the observation space has two hidden state sequences corresponding to it, namely the personalized hidden state sequence z = {z_1, z_2, ..., z_N} and the shared hidden state sequence c = {c_1, c_2, ..., c_N}. Therefore, in equation (1) of the above objective function, p(x; Φ_u, D) can be expressed as follows:

p(x; Φ_u, D) = Σ_z Σ_c π_{z_1}^u ( Π_{n=2}^N a_{z_{n-1} z_n}^u ) Π_{n=1}^N b_{z_n c_n}^u d_{c_n}(e_n)    (2)

where d_{c_n}(e_n) denotes the probability of observing e_n under the probability distribution d_{c_n} corresponding to the shared hidden state c_n.
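Equation (2) can be evaluated efficiently by noting that, for a given personalized state i, the shared layer can be marginalized into an effective emission probability p(e_n | z_n = i) = Σ_m b_im^u d_m(e_n), after which the standard forward recursion applies. The sketch below (an assumed implementation, reusing the data structures from the earlier sketches) computes log p(x; Φ_u, D) in this way:

```python
import numpy as np

def log_likelihood(model, u, x):
    """Forward algorithm for the shared-hidden-state HMM: the shared layer is
    marginalized into an effective emission matrix before the usual recursion."""
    p = model.per_object[u]                        # pi (K,), A (K, K), B (K, M)
    dists = model.shared_dists                     # [d_1, ..., d_M]
    # emis[n, i] = p(e_n | z_n = i) = sum_m b_im * d_m(e_n)
    emis = np.array([[float(sum(b_im * d.pdf(e) for b_im, d in zip(row, dists)))
                      for row in p.B] for e in x])
    log_p, alpha = 0.0, p.pi * emis[0]
    for n in range(len(x)):
        if n > 0:
            alpha = (alpha @ p.A) * emis[n]
        scale = alpha.sum()                        # rescaling avoids numerical underflow
        log_p += np.log(scale)
        alpha = alpha / scale
    return log_p
```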
By continuously optimizing the above objective function using the sequence training samples, the personalized parameter sets Φ_u = {π_u, A_u, B_u} and the shared parameter set D = {d_m} can finally be determined. If the plurality of sequence data in a sequence training sample relate to behavior data of an object at different points in time, the shared-hidden-state hidden Markov model of the present application is trained to predict the next behavior data of the object following a chronologically ordered series of historical behavior data of the object. Alternatively, if the plurality of sequence data in a sequence training sample relate to state data of the object at different points in time, the shared-hidden-state hidden Markov model is trained to predict the next state data of the object following a chronologically ordered series of historical state data of the object.
For example, if the behavior data is the location data of an object, the machine learning model is trained to predict the location data of the object at the next point in time given a chronologically ordered series of historical location data of the object. If the behavior data is the content ID of content accepted by a user, the machine learning model is trained to predict the content ID of the content that the user will accept at the next point in time given a chronologically ordered series of content IDs of the user's historically accepted content. If the state data is physiological state data of a subject, the machine learning model is trained to predict the physiological state data of the subject at the next point in time given a chronologically ordered series of historical physiological state data of the subject. If the state data is price data of a commodity or stock, the machine learning model is trained to predict the price data of the commodity or stock at the next point in time given a chronologically ordered series of historical price data of the commodity or stock.
Next, a process of training a hidden markov model sharing a hidden state using the above objective function will be described in detail.
Specifically, the training device 120 may determine a lower bound of the objective function based on the Jensen inequality, using the personalized hidden state sequence and the shared hidden state sequence corresponding to each sequence training sample, and determine the model parameters of the model by maximizing the lower bound of the objective function.
First, the training device 120 can use the personalized hidden state sequence z and the shared hidden state sequence c to find a lower bound of the objective based on the Jensen inequality; this lower bound is then optimized to update the model parameters, a new lower bound is found, and so on until convergence. Here, the lower bound of the objective function L(Φ, D) may be determined as follows:

L(Φ, D) ≥ L'_1(Φ, D) = Σ_u Σ_{x ∈ J_u} Σ_z Σ_c p(z, c | x) log [ p(x, z, c; Φ_u, D) / p(z, c | x) ] + λ Σ_u Σ_i Σ_m b_im^u log b_im^u    (3)

During concrete model training, the personalized parameter sets {π_u}, {A_u} and {B_u} and the shared parameter set D are first initialized. Then, for each input sequence training sample of an object u, the posterior probability p(z, c | x) of the personalized hidden state sequence and the shared hidden state sequence corresponding to the training sample is updated (hereinafter, for convenience of description, the step of updating p(z, c | x) is referred to as the E-step), and the model parameters {π_u}, {A_u}, {B_u} and D are updated by maximizing L'_1(Φ, D) (hereinafter, for convenience of description, the step of updating the model parameters is referred to as the M-step). The training device 120 may repeat the E-step and the M-step until the objective function L(Φ, D) reaches its maximum; the model parameters at that point are the model parameters of the finally trained model.
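A minimal sketch of this alternating procedure (illustrative only; e_step and m_step stand for the posterior update and the parameter updates described below, and the fixed iteration count is an assumed simplification):

```python
def train(model, samples_by_object, lam, e_step, m_step, n_iters=50):
    """EM-style training loop for the shared-hidden-state hidden Markov model:
    alternate between updating posteriors (E-step) and model parameters (M-step)."""
    for _ in range(n_iters):
        posteriors = {
            # E-step: posterior p(z, c | x) for every training sequence of object u
            u: [e_step(model, u, x) for x in samples]
            for u, samples in samples_by_object.items()
        }
        # M-step: update {pi_u}, {A_u}, {B_u} and the shared distributions D by
        # maximizing the lower bound L'_1(Phi, D) with the posteriors held fixed
        m_step(model, samples_by_object, posteriors, lam)
    return model
```

In practice, the loop would also monitor the objective L(Φ, D) and stop once it no longer improves, as described above.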
As mentioned above, in the M-step the model parameters {π_u}, {A_u}, {B_u} and D are updated by maximizing L'_1(Φ, D); the M-step is described in detail below.
First, in the M-step, the training device 120 may transform the lower bound L'_1(Φ, D) into a form that includes a function term affected only by the probabilities of the personalized hidden states, a function term affected only by the transition probabilities, a function term affected only by the emission probabilities, and a function term affected only by the shared parameter set, and then determine the corresponding model parameters by maximizing each function term separately. Specifically, L'_1(Φ, D) may, for example, be rewritten (up to terms that do not depend on the model parameters) as the sum of the following four function terms:

Σ_u Σ_{x ∈ J_u} Σ_i γ_1(i) log π_i^u    (4)

Σ_u Σ_{x ∈ J_u} Σ_{n=2}^N Σ_i Σ_j ξ_n(i, j) log a_ij^u    (5)

Σ_u Σ_{x ∈ J_u} Σ_{n=1}^N Σ_i Σ_m ρ_n(i, m) log b_im^u + λ Σ_u Σ_i Σ_m b_im^u log b_im^u    (6)

Σ_u Σ_{x ∈ J_u} Σ_{n=1}^N Σ_i Σ_m ρ_n(i, m) log d_m(e_n)    (7)

Here, three auxiliary variables ξ_n(i, j), γ_n(i) and ρ_n(i, m) are defined to estimate the posterior p(z, c | x), where ξ_n(i, j) = p(z_{n-1} = i, z_n = j | x), γ_n(i) = p(z_n = i | x) and ρ_n(i, m) = p(z_n = i, c_n = m | x), with n = 1, 2, ..., N. In the M-step, ξ_n(i, j), γ_n(i) and ρ_n(i, m) are substituted for p(z, c | x), so that L'_1(Φ, D) takes the form of the above terms (4) to (7), in which the function term (4) is affected only by the probabilities of the personalized hidden states, the function term (5) only by the transition probabilities, the function term (6) only by the emission probabilities, and the function term (7) only by the shared parameter set. The training device 120 may then determine the corresponding model parameters Φ and D by maximizing each of these function terms separately.
Since the above function terms (4), (5) and (7) are concave without any additional terms, the training apparatus 120 can determine their maxima based on the conventional Baum-Welch algorithm and thereby determine the corresponding model parameters; how to do so is clear to a person skilled in the art, so the detailed description is omitted here. The function term (6), however, is not always concave, since it is affected by the above-mentioned constraint term, and therefore its maximum cannot be determined based on the conventional Baum-Welch algorithm. For this function term affected by the emission probabilities, the present application proposes a manner in which the emission probabilities can be determined by converting the problem of maximizing the function term into a one-dimensional nonlinear equation problem under a convex difference programming (DCP, i.e., difference-of-convex programming) framework. This manner is described below.
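Before turning to that conversion, the closed-form maximizers of terms (4) and (5) can be illustrated. The sketch below shows the familiar Baum-Welch-style re-estimation of π_u and A_u from the auxiliary variables for one object (the array shapes are assumptions for exposition):

```python
import numpy as np

def update_pi_and_A(gammas, xis):
    """Baum-Welch-style M-step updates of pi_u and A_u for one object u.

    gammas: list over training sequences; gammas[s][n, i] = gamma_n(i), shape (N, K).
    xis:    list over training sequences; xis[s][n, i, j] = xi_{n+1}(i, j), shape (N-1, K, K).
    """
    pi = sum(g[0] for g in gammas)                # expected initial-state counts
    pi = pi / pi.sum()                            # maximizer of function term (4)
    A = sum(xi_s.sum(axis=0) for xi_s in xis)     # expected transition counts
    A = A / A.sum(axis=1, keepdims=True)          # maximizer of function term (5)
    return pi, A
```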
For simplicity, for fixed i and u, let ρ_m = Σ_{x ∈ J_u} Σ_{n=1}^N ρ_n(i, m) and b = {b_m} with b_m = b_im^u. The problem of finding b for each i and u (i.e., the maximization problem of the function term (6)) can then be converted into a minimization problem:

min_b f(b) = -Σ_m ρ_m log b_m - λ Σ_m b_m log b_m,  subject to  Σ_m b_m = 1, b_m ≥ 0,    (8)

where ρ_m has been estimated in the E-step. With λ > 0, (8) is in general a non-convex optimization problem.
To optimize such a non-convex function of b with a convergence guarantee, (8) can be decomposed into the convex term -Σ_m ρ_m log b_m and the concave part -λ Σ_m b_m log b_m, so as to meet the formal requirements of the DCP framework.
DCP is a general and powerful framework for solving non-convex problems. According to it, the concave part is locally linearized at the current iterate b^(t), and the resulting convex upper bound f^(t+1)(b) is minimized instead, where f^(t+1)(b) is expressed as follows:

f^(t+1)(b) = -Σ_m ρ_m log b_m - λ Σ_m (log b_m^(t) + 1) b_m,  subject to  Σ_m b_m = 1.    (9)

How to solve (9) efficiently is the key to a fast solution of the non-convex problem. To achieve this goal, the present application converts (9) into a one-dimensional nonlinear equation problem, i.e., there exists η such that:

Σ_m ρ_m / (η - λ (log b_m^(t) + 1)) = 1.    (10)

The optimal solution of equation (9) can then be obtained from the η in (10) as b_m = ρ_m / (η - λ (log b_m^(t) + 1)). Equation (10) is a simple one-dimensional nonlinear equation problem that can be solved, for example, using Newton's method. Specifically, the process of solving (8) under the DCP framework is as follows: first, initialize b^(1); then, for t = 1, ..., T, with the current b^(t), convert equation (9) into equation (10) and obtain b^(t+1) by solving (10) with Newton's method. The b^(T) obtained by repeating these operations is taken as the emission probability vector at which the function term (6) is maximized.
The system for training a machine learning model for predicting sequence data, the structure of the machine learning model, and so on according to exemplary embodiments of the present application have been described in detail above with reference to fig. 1 and 2. In one aspect, since the machine learning model of the present application includes two hidden state layers (where the first hidden state layer includes a personalized hidden state for each object and the second hidden state layer includes a plurality of shared hidden states shared by a plurality of objects), it can not only overcome the scarcity of training data but also preserve the diversity of the sequence patterns of different objects. In another aspect, the objective function used for training the machine learning model is constructed to include a regularization term for constraining the concentration degree of each object's emission probability distribution from the personalized hidden states to the shared hidden states, so that the trained machine learning model is easier to interpret and better conforms to objective laws. In addition, during model training, the emission probabilities are determined by converting the problem of maximizing the corresponding function term into a one-dimensional nonlinear equation problem under the DCP framework, so that the emission probabilities can be solved quickly and the model training speed can be improved.
It should be noted that, although the model training system 100 is described above as being divided into devices (e.g., the training sample acquiring device 110 and the training device 120) for respectively performing corresponding processes, it is clear to those skilled in the art that the processes performed by the devices may be performed without any specific device division by the model training system 100 or without explicit delimitation between the devices. Furthermore, the model training system 100 described above with reference to fig. 1 is not limited to include the above-described devices, but some other devices (e.g., a storage device, a data processing device, etc.) may be added as needed, or the above devices may be combined.
Fig. 3 is a flowchart illustrating a method of training a machine learning model for predicting sequence data (hereinafter, simply referred to as "model training method" for convenience of description) according to an exemplary embodiment of the present application.
Here, as an example, the model training method shown in fig. 3 may be performed by the model training system 100 shown in fig. 1, may also be implemented entirely in software by a computer program or instructions, and may also be performed by a specifically configured computing system or computing device, for example, by a system including at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the model training method described above. For convenience of description, it is assumed that the model training method shown in fig. 3 is performed by the model training system 100 shown in fig. 1, and that the model training system 100 may have the configuration shown in fig. 1.
Referring to fig. 3, in step S310, the training sample acquiring device 110 may acquire a set of sequence training samples. Here, the set of sequence training samples may include a plurality of sequence training samples for each of a plurality of objects, and each sequence training sample includes a plurality of sequence data arranged in time order. As an example, the plurality of sequence data may relate to behavior data of the object at different points in time, in which case the machine learning model is trained to predict the next behavior data of the object following a time-ordered series of historical behavior data of the object. As another example, where the plurality of sequence data relate to state data of the object at different points in time, the machine learning model is trained to predict the next state data of the object following a chronological series of historical state data of the object. The sequence data, behavior data, state data, etc. have been described above with reference to fig. 1 and will not be described again here; the relevant contents described with reference to fig. 1 also apply.
Specifically, in step S310, the training sample acquiring device 110 may acquire the historical data record sets of the plurality of subjects, and construct the sequence training sample set based on the historical data record sets of the plurality of subjects. Here, for a plurality of time-sequentially arranged historical data records of each object, if a time interval between two adjacent historical data records satisfies a preset condition, segmentation is performed, and then a plurality of sequence training samples of the object are obtained. For example, if the time interval between any two adjacent historical data records is greater than a predetermined time threshold, the segmentation is performed. Since the way of obtaining the sequence training sample of each object by the way of segmentation has already been introduced in the description of fig. 1, it is not repeated here.
Next, in step S320, the training device 120 may train the machine learning model based on the sequence training sample set obtained in step S310. Here, the machine learning model is a hidden markov model including two hidden state layers. Specifically, a personalized hidden state of each object in the plurality of objects may be included in the first hidden state layer, and a plurality of shared hidden states shared by the plurality of objects may be included in the second hidden state layer.
According to an example embodiment, each shared hidden state may correspond to a probability distribution. As described above, the sequence data can include behavior data of the object. As an example, the behavior data may include continuous feature data reflecting the behavior of the object, in which case the probability distribution corresponding to each shared hidden state may include a Gaussian distribution, but is not limited thereto. As another example, the behavior data may include discrete feature data reflecting the behavior of the object, in which case the probability distribution corresponding to each shared hidden state may include, but is not limited to, a multinomial distribution. Here, the continuous feature data may include position data of the object, in which case the machine learning model may be trained to predict the next position data of the object given a series of historical position data of the object (i.e., the machine learning model is trained to predict the movement position of the object). As an example, the discrete feature data may include the content ID of content accepted by the object, in which case the machine learning model may be trained to predict the content ID of the next content that the object will accept given a series of content IDs of the object's historically accepted content. It should be noted that the continuous feature data and the discrete feature data may include different types of data of the object for different application scenarios.
In a hidden markov model of the present application comprising two hidden state layers, the number of personalized hidden states for each object in the first hidden state layer may be less than the number of the plurality of shared hidden states in the second hidden state layer. Furthermore, the model parameters of the hidden markov model may include a personalized parameter set for each object and a shared parameter set shared by the plurality of objects. In particular, the personalized parameter set may include a probability of a personalized hidden state for each object in the first hidden state layer, a transition probability between personalized hidden states for each object, and an emission probability from a personalized hidden state to a shared hidden state for each object, and the shared parameter set may include a set of probability distributions corresponding to each shared hidden state.
Further, the objective function used to train the machine learning model may be constructed to include a loss function and a regularization term. Here, the regularization term is used to constrain the concentration degree of the emission probability distribution of each object from the personalized hidden states to the shared hidden states. By constructing the objective function to include such a constraint term, the trained model can be made easier to interpret, i.e., made to better conform to the objective laws of the actual scenario. As an example, the regularization term here may include a constraint term related to the entropy of the emission probabilities of each object from the personalized hidden states to the shared hidden states. For example, the constraint term can be constructed as

λ Σ_u Σ_i Ω(b_i^u),  where  Ω(b_i^u) = Σ_m b_im^u log b_im^u,

where λ is a real number greater than 0 and b_im^u indicates the emission probability of the u-th object among the plurality of objects from the i-th personalized hidden state to the m-th shared hidden state, with u, i and m each being positive integers greater than 0.
The above descriptions of the machine learning model of the present application mentioned in the descriptions of fig. 1 and 2 are all adapted to fig. 3, and therefore, are not repeated here.
In step S320, the training device 120 may determine the lower bound of the objective function based on the Jensen inequality, using the personalized hidden state sequence and the shared hidden state sequence corresponding to each sequence training sample, and determine the model parameters by maximizing the lower bound of the objective function. Specifically, in step S320, the training device 120 may transform the lower bound of the objective function into a form that includes a function term affected only by the probabilities of the personalized hidden states, a function term affected only by the transition probabilities, a function term affected only by the emission probabilities, and a function term affected only by the shared parameter set, and determine the corresponding model parameters by maximizing each function term separately. In particular, for the function term affected by the emission probabilities, the training device 120 may determine the emission probabilities by converting the problem of maximizing that function term into a one-dimensional nonlinear equation problem under the DCP framework. How to determine the model parameters has already been described with reference to fig. 1 and is not repeated here.
In addition, the contents mentioned above in describing each device included in the model training system with reference to fig. 1 are all applicable here, so regarding the relevant details involved in the above steps, reference may be made to the corresponding description of fig. 1, and no further description is given here.
Because it uses a model with two hidden state layers, the above-described model training method according to the exemplary embodiments of the present application can not only overcome the scarcity of training data but also preserve the diversity of the sequence patterns of different objects, so that the trained model can provide more accurate prediction of sequence data. Furthermore, by including in the objective function a regularization term for constraining the emission probabilities, the trained model can be made easier to interpret.
Hereinafter, a process of predicting sequence data using the machine learning model trained as described above will be described with reference to fig. 4 and 5.
Fig. 4 is a block diagram illustrating a system for predicting sequence data using a machine learning model (hereinafter, simply referred to as "prediction system" for convenience of description) 400 according to an exemplary embodiment of the present application.
Referring to fig. 4, the prediction system 400 may include a prediction sample acquisition device 410 and a prediction device 420. In particular, the prediction sample acquisition device 410 may be configured to acquire a sequence prediction sample of a subject. Here, the sequence prediction sample includes a plurality of sequence data of the object arranged in time series. The prediction means 420 may perform prediction on the sequence prediction sample acquired by the prediction sample acquisition means 410 using a machine learning model to provide a prediction result on the next sequence data after the plurality of sequence data.
Here, the machine learning model may be trained in advance to predict a next sequence data after a series of sequence data arranged in time series, and may be a hidden markov model including two hidden state layers. In particular, a personalized hidden state for each of a plurality of objects may be included in a first hidden state layer, and a plurality of shared hidden states shared by the plurality of objects may be included in a second hidden state layer. The machine learning model is the hidden markov model sharing the hidden states mentioned in the description of fig. 1 to 3, and the training process thereof may be as described with reference to fig. 3, and will not be described herein again.
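To make the two-layer structure concrete, the following is a small sketch of one assumed generative reading of the model: at each time step a personalized hidden state emits a shared hidden state, and the shared state's distribution emits the observation. The parameter names and the Gaussian observation model are assumptions for this example only.

```python
import numpy as np

def sample_sequence(pi, A, B, means, covs, T, seed=0):
    """Sample one object's sequence from the assumed two-layer model.

    pi : (I,) initial personalized-state probabilities; A : (I, I) transitions;
    B  : (I, M) emission probabilities to shared states;
    means, covs : per-shared-state Gaussian parameters; T : sequence length.
    """
    rng = np.random.default_rng(seed)
    obs = []
    z = rng.choice(len(pi), p=pi)               # initial personalized hidden state
    for _ in range(T):
        s = rng.choice(B.shape[1], p=B[z])      # shared hidden state emitted by z
        obs.append(rng.multivariate_normal(means[s], covs[s]))  # observation
        z = rng.choice(len(pi), p=A[z])         # next personalized hidden state
    return np.array(obs)
```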
As an example, the plurality of sequence data may relate to behavior data of the object at different time points (for example, movement position data of the object, the object's content-click behavior, and the like), or may relate to state data of the object at different time points (for example, physiological state data of a living body, the price of a commodity, the trading price of a stock, and the like). Specifically, the behavior data may include both continuous feature data reflecting the behavior of the object and discrete feature data reflecting the behavior of the object. For example, the continuous feature data may include location data of the object, and the discrete feature data may include the content ID of the content accepted by the object.
For example, in a scenario in which the movement trajectory of an object (for example, a user or a vehicle) is predicted, the prediction sample acquisition device 410 may acquire a series of historical position data of the object arranged in time order to constitute the above-described sequence prediction sample; in a content recommendation scenario, the prediction sample acquisition device 410 may acquire the content IDs of a series of historically accepted contents of the user arranged in time order to constitute the above-described sequence prediction sample; in a scenario involving the prediction of a physiological state of a subject (human or animal), the prediction sample acquisition device 410 may acquire a time-ordered series of historical physiological state data of the subject to constitute the above-described sequence prediction sample; and in a scenario involving price prediction of a commodity or a stock, the prediction sample acquisition device 410 may acquire a chronological series of historical price data of the commodity or the stock to constitute the above-mentioned sequence prediction sample.
In the hidden Markov model with shared hidden states of the present application, each shared hidden state may correspond to a probability distribution. If the behavior data is continuous feature data reflecting the behavior of the object, the probability distribution corresponding to each shared hidden state may include a Gaussian distribution, but is not limited thereto. If the behavior data is discrete feature data reflecting the behavior of the object, the probability distribution corresponding to each shared hidden state may include a multinomial distribution, but is not limited thereto.
As described above with reference to fig. 1-3, the number of personalized hidden states for each object in the first hidden state layer may be less than the number of the plurality of shared hidden states in the second hidden state layer. Further, the model parameters of the machine learning model described above may include a personalized parameter set for each object and a shared parameter set shared by the plurality of objects. In particular, the personalized parameter set may include a probability of a personalized hidden state for each object in the first hidden state layer, a transition probability between personalized hidden states for each object, and an emission probability from a personalized hidden state to a shared hidden state for each object, and the shared parameter set may include a set of probability distributions corresponding to each shared hidden state.
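As a purely illustrative way to organize these two parameter sets in code, they could be held in containers such as the following, with I personalized states per object and M shared states (I typically smaller than M, as noted above); all field names and shapes are assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PersonalizedParams:
    """Per-object parameter set (field names are illustrative)."""
    pi: np.ndarray  # (I,)   probabilities of the object's personalized hidden states
    A: np.ndarray   # (I, I) transition probabilities between personalized states
    B: np.ndarray   # (I, M) emission probabilities from personalized to shared states

@dataclass
class SharedParams:
    """Parameter set shared by all objects: one distribution per shared state."""
    means: np.ndarray  # (M, d)    Gaussian means, for continuous behavior data
    covs: np.ndarray   # (M, d, d) Gaussian covariances
    # for discrete behavior data, an (M, V) multinomial table could be used instead
```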
As described above, the prediction sample acquisition device 410 may acquire a sequence prediction sample of the subject. Specifically, the prediction sample acquisition device 410 may acquire a plurality of historical data records of the subject, arrange the historical data records in time order, and construct the sequence prediction sample based on the arranged historical data records. Here, if the time interval between two adjacent records among the arranged historical data records meets a preset condition, the arranged records are segmented at that point, and the sequence prediction sample of the subject is obtained from the resulting segment.
As an example, each of the plurality of historical data records for the object may include a plurality of data attribute fields, such as an object identification field, an object behavior data field, a behavior occurrence time field, and the like. For example, the object behavior data field may include location data of the object at the time point indicated by the behavior occurrence time field (e.g., the location data may be represented by a vector including longitude and latitude). Alternatively, the object behavior data field may include a content ID of content that the object accepts at a time point corresponding to the time indicated by the behavior occurrence time field (e.g., a news ID of news that the user clicks, or a singer ID of music that the user listens to, etc.). It should be noted that the present application does not limit the type of sequence data as long as it is a series of data that appears in chronological order. Further, the present application also does not limit the type of the behavior data as long as it is data reflecting a series of behaviors of the object that proceed in time series.
Further, the sequence data records of the object may be data records generated on-line, data records generated and stored in advance, or data records received from an external data source (for example, a server, a database, or the like) through an input device or a transmission medium. The data records may be stored, for example, in the form of data tables in a local storage medium or a cloud computing platform (including but not limited to public and private clouds, etc.) with a data storage function. As for the manner of acquisition, the above-mentioned historical data records may be input to the prediction sample acquisition device 410 through an input device, may be automatically generated by the prediction sample acquisition device 410 from acquired data, or may be acquired by the prediction sample acquisition device 410 from a network (e.g., a storage medium such as a data warehouse on the network); in addition, an intermediate data exchange device such as a server may assist the prediction sample acquisition device 410 in acquiring the corresponding data from an external data source. The acquired historical data records may further be converted into an easily processed format, for example, tabular data. According to an exemplary embodiment of the present application, the plurality of historical data records of the object mentioned above may refer to a series of sequence data with a certain continuity (e.g., continuity of behavior in time), for example, the content IDs of the contents that a user successively clicks after opening a news App and before exiting it.
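A minimal sketch of this sample construction might look as follows; the field names 'behavior' and 'time', the assumption that all records belong to one object, and the 30-minute threshold standing in for the "preset condition" mentioned above are all choices made only for the example.

```python
from datetime import timedelta

def build_sequence_sample(records, max_gap=timedelta(minutes=30)):
    """Sort one object's historical records by time, split the sequence wherever
    the gap between adjacent records exceeds max_gap, and return the most
    recent segment as the sequence prediction sample."""
    if not records:
        return []
    records = sorted(records, key=lambda r: r['time'])
    segments, current = [], [records[0]]
    for prev, cur in zip(records, records[1:]):
        if cur['time'] - prev['time'] > max_gap:  # gap too large: start a new segment
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return [r['behavior'] for r in segments[-1]]
```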
As described above, the prediction device 420 can perform prediction on a sequence prediction sample using the machine learning model to provide a prediction result regarding the next sequence data following the plurality of sequence data. Specifically, the prediction device 420 may first determine the personalized parameter set of the subject among the trained model parameters of the machine learning model, then determine, using the determined personalized parameter set and the shared parameter set, the probability of occurrence of each candidate next sequence data after the plurality of sequence data, and determine the next sequence data after the plurality of sequence data based on the determined probabilities. Here, as illustrated by the schematic diagram shown in FIG. 2, the prediction device 420 may first determine a personalized hidden state sequence of the object based on the probabilities of the object's personalized hidden states (e.g., π_u shown in FIG. 2) and the transition probabilities between personalized hidden states (e.g., A_u shown in FIG. 2); next, it may determine the shared hidden state sequence corresponding to the personalized hidden state sequence based on the determined personalized hidden state sequence and the object's emission probabilities from personalized hidden states to shared hidden states (e.g., B_u shown in FIG. 2); finally, it may determine the probability of occurrence of each candidate next sequence data after the plurality of sequence data based on the determined shared hidden state sequence and the shared parameter set (e.g., D shown in FIG. 2).
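For illustration, the sketch below scores candidate next observations with a simple forward-style filter over the two layers, reusing the parameter names from the earlier sketches and assuming Gaussian shared states; it is a simplified stand-in, not the exact procedure of the specification.

```python
import numpy as np
from scipy.stats import multivariate_normal

def score_candidates(pi, A, B, means, covs, observed, candidates):
    """Return an unnormalized probability for each candidate next observation.

    pi, A, B    : one object's personalized parameter set (see earlier sketch)
    means, covs : shared Gaussian parameters, one per shared hidden state
    observed    : (T, d) sequence data observed so far
    candidates  : (K, d) candidate next observations
    """
    def shared_likelihoods(x):
        # p(x | shared hidden state m) for every shared state m
        return np.array([multivariate_normal.pdf(x, means[m], covs[m])
                         for m in range(len(means))])

    alpha = np.asarray(pi, dtype=float)
    for x in observed:
        alpha = alpha * (B @ shared_likelihoods(x))  # weight by current observation
        alpha = alpha / alpha.sum()
        alpha = alpha @ A                            # step to the next time point
    return np.array([float(alpha @ (B @ shared_likelihoods(c))) for c in candidates])
```

The candidate with the highest score would then be taken as the predicted next sequence data, in line with the selection illustrated in the position example that follows.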
Here, assuming that the above-described plurality of sequence data relate to position data of the object at different points in time, for example, that the object is located at positions 1 to 5 (the positions may be represented, for example, by latitude and longitude) at the first to fifth points in time, respectively, the prediction device 420 may predict the probability of the object appearing at each candidate next position according to the prediction process described above. For example, assuming that there are three candidate positions (candidate position 1 to candidate position 3, corresponding, for example, to building 1, building 2, and building 3, respectively), the prediction device 420 may calculate the probabilities that the object is next located at candidate position 1 to candidate position 3, respectively. Then, the prediction device 420 may determine the next sequence data after the plurality of sequence data based on the determined probabilities. For example, the prediction device 420 may select the candidate position with the highest calculated probability among candidate positions 1 to 3 as the next sequence data; assuming that candidate position 3 is predicted with the highest probability, the prediction device 420 may determine the position data of building 3 as the next sequence data.
For example, if the behavior data is position data of an object, the prediction device 420 may predict the position data of the object at the next point in time using a chronological series of historical position data of the object by the machine learning model. After predicting the location to which the user or vehicle will move next using the machine learning model, for example, the prediction system 400 can provide the prediction to a ride service provider, which can then deploy the vehicle (e.g., a shared bicycle) to that location to better provide ride service to the user.
If the behavior data is the content ID of content accepted by the user, the prediction device 420 may use the machine learning model to predict, from a chronological series of the content IDs of the user's historically accepted contents, the content ID of the content that the user will accept at the next point in time. After predicting the content ID of the content that the user is likely to accept next, the prediction system 400 may provide the prediction result to a content service provider, which may then recommend the content corresponding to that content ID to the user, thereby facilitating accurate content recommendation.
If the state data is physiological state data of the subject, the prediction means 420 may predict the physiological state data of the subject at a next point in time using a chronological sequence of historical physiological state data of the subject using the machine learning model. For example, after predicting the user's next physiological state data, the prediction system 400 may provide the prediction to a healthcare provider, which may then instruct the user to take countermeasures in advance against changes in the physiological state based on the prediction.
If the state data is price data of the goods or stocks, the predicting device 420 may predict price data of the goods or stocks at a next time point using a time-series of historical price data of the goods or stocks by the machine learning model. After predicting the price of the good or stock at the next point in time, for example, the prediction system 400 may provide the prediction to the user to assist the user in making decisions, such as helping the user decide whether to purchase the good or stock.
It should be noted that although only four application scenarios involving sequence data prediction are listed above, it is clear to those skilled in the art that the scenarios to which the prediction system 400 can be applied are not limited to the above four application scenarios, but can be applied to any scenario involving the generation of sequence data of an object.
The prediction system according to the exemplary embodiment can predict sequence data using a hidden markov model including two hidden state layers, thereby effectively providing personalized sequence data prediction for different objects, and improving the accuracy of prediction.
In addition, it should be noted that although the prediction system 400 is described above as being divided into devices (e.g., the prediction sample acquisition device 410 and the prediction device 420) that respectively perform the corresponding processing, it is clear to those skilled in the art that this processing may equally be performed by the prediction system without any specific division into devices or any explicit demarcation between them. Furthermore, the prediction system 400 described above with reference to fig. 4 is not limited to the above-described devices; other devices (e.g., a storage device, a data processing device, etc.) may be added as needed, or the above devices may be combined. Also, as an example, the model training system 100 described above with reference to fig. 1 and the prediction system 400 may be combined into one system or may be systems independent of each other, which is not limited in this application.
Fig. 5 is a flowchart illustrating a method of predicting sequence data using a machine learning model (hereinafter, simply referred to as "prediction method" for convenience of description) according to an exemplary embodiment of the present application.
Here, as an example, the prediction method shown in fig. 5 may be performed by the prediction system 400 shown in fig. 4, may also be implemented entirely in software by a computer program or instructions, and may also be performed by a specifically configured computing system or computing device, for example, by a system including at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the prediction method described above. For convenience of description, it is assumed that the prediction method shown in fig. 5 is performed by the prediction system 400 shown in fig. 4, and that the prediction system 400 may have the configuration shown in fig. 4.
Referring to fig. 5, in step S510, the prediction sample acquisition means 410 may acquire a sequence prediction sample of the subject. Here, the sequence prediction sample may include a plurality of sequence data of the object arranged in time series. As an example, the plurality of sequence data may relate to behavior data of the object at different points in time, or the plurality of sequence data may relate to status data of the object at different points in time. Here, the behavior data may include continuous feature data reflecting behavior of the object, for example, the continuous feature data may include position data of the object, but is not limited thereto. Alternatively, the behavior data may include discrete feature data reflecting behavior of the object, e.g., without limitation, the discrete feature data including a content ID of content accepted by the object.
Specifically, in step S510, the prediction sample acquisition device 410 may acquire a plurality of historical data records of the subject, arrange the historical data records in time order, and construct the sequence prediction sample based on the arranged historical data records. Here, if the time interval between two adjacent records among the arranged historical data records meets a preset condition, the arranged records are segmented at that point, and the sequence prediction sample of the subject is obtained from the resulting segment.
Next, at step S520, the prediction device 420 may perform prediction on the sequence prediction sample using a machine learning model to provide a prediction result regarding the next sequence data after the plurality of sequence data. Here, the machine learning model may be trained in advance to predict the next sequence data after a series of sequence data arranged in time order, and the machine learning model is a hidden Markov model including two hidden state layers, where the first hidden state layer includes a personalized hidden state of each of a plurality of objects, and the second hidden state layer includes a plurality of shared hidden states shared by the plurality of objects. Here, each shared hidden state may correspond to a probability distribution. As described above, the behavior data may include continuous feature data reflecting the behavior of the object, in which case the probability distribution corresponding to each shared hidden state may include a Gaussian distribution, but is not limited thereto. The behavior data may also include discrete feature data reflecting the behavior of the object, in which case the probability distribution corresponding to each shared hidden state may include a multinomial distribution, but is not limited thereto. Further, in the above machine learning model, the number of personalized hidden states of each object in the first hidden state layer may be less than the number of the plurality of shared hidden states in the second hidden state layer.
For example, if the above behavior data is position data of the object, the prediction device 420 may, at step S520, use the machine learning model to predict the position data of the object at the next time point from a chronological series of historical position data of the object. If the above behavior data is the content ID of content accepted by the user, the prediction device 420 may, at step S520, use the machine learning model to predict, from a chronological series of the content IDs of the user's historically accepted contents, the content ID of the content that the user will accept at the next point in time. If the above state data is physiological state data of the subject, the prediction device 420 may, at step S520, use the machine learning model to predict the physiological state data of the subject at the next time point from a chronological series of historical physiological state data of the subject. If the above state data is price data of a commodity or a stock, the prediction device 420 may, at step S520, use the machine learning model to predict the price data of the commodity or the stock at the next time point from a chronological series of historical price data of the commodity or the stock.
According to an exemplary embodiment, the model parameters of the above machine learning model may include a personalized parameter set for each object and a shared parameter set shared by the plurality of objects. In particular, the personalized parameter set may include a probability of a personalized hidden state for each object in the first hidden state layer, a transition probability between personalized hidden states for each object, and an emission probability from a personalized hidden state to a shared hidden state for each object, the shared parameter set including a set of probability distributions corresponding to each shared hidden state.
Specifically, in step S520, the prediction device 420 may first determine an individualized parameter set for the subject in the model parameters of the machine learning model, then determine a probability of occurrence of each next candidate sequence data after the plurality of sequence data using the determined individualized parameter set and the shared parameter set for the subject, and finally determine the next sequence data after the plurality of sequence data based on the determined probabilities. For example, in determining the probability of each next candidate sequence data occurring after the plurality of sequence data, the prediction device 420 may first determine a personalized hidden state sequence for the subject based on the probability of the personalized hidden state of the subject and the transition probability between personalized hidden states. Then, the predicting device 420 may determine a shared hidden state sequence corresponding to the personalized hidden state sequence according to the determined personalized hidden state sequence and the emission probability of the object from the personalized hidden state to the shared hidden state, and finally, the predicting device 420 may determine a probability of occurrence of each next candidate sequence data after the plurality of sequence data according to the determined shared hidden state sequence and the set of shared parameters.
Since the prediction method shown in fig. 5 can be performed by the prediction system 400 shown in fig. 4, for the relevant details involved in the above steps, reference may be made to the corresponding description of fig. 4, and details are not repeated here.
The prediction method according to the exemplary embodiment described above predicts sequence data using a hidden Markov model including two hidden state layers, so that personalized sequence data prediction can be effectively provided for different objects and the accuracy of sequence data prediction can thus be improved.
The model training apparatus and the model training method, and the prediction system and the prediction method according to the exemplary embodiments of the present application have been described above with reference to fig. 1 to 5.
However, it should be understood that: the systems and devices shown in fig. 1 and 4, respectively, may be configured as software, hardware, firmware, or any combination thereof that performs the specified functions. For example, the systems or devices may correspond to application specific integrated circuits, to pure software code, or to modules combining software and hardware. Further, one or more functions implemented by these systems or apparatuses may also be performed collectively by components in a physical entity device (e.g., a processor, a client, or a server, etc.).
Further, the above method may be implemented by instructions recorded on a computer-readable storage medium, for example, according to an exemplary embodiment of the present application, there may be provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the steps of: obtaining a set of sequential training samples, wherein the set of sequential training samples comprises a plurality of sequential training samples for each of a plurality of subjects, and each sequential training sample comprises a plurality of sequence data arranged in a chronological order; training the machine learning model based on the set of sequence training samples, wherein the machine learning model is a hidden Markov model comprising two hidden state layers, wherein a first hidden state layer comprises a personalized hidden state for each of the plurality of objects and a second hidden state layer comprises a plurality of shared hidden states shared by the plurality of objects.
Further, according to another exemplary embodiment of the present application, a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the steps of: obtaining a sequence prediction sample of a subject, wherein the sequence prediction sample comprises a plurality of time-ordered sequence data of the subject; performing prediction on the sequence prediction sample using the machine learning model to provide a prediction result regarding next sequence data following the plurality of sequence data, wherein the machine learning model is trained in advance to predict next sequence data following a series of sequence data arranged in time order, and the machine learning model is a hidden Markov model including two hidden state layers, wherein a first hidden state layer includes a personalized hidden state of each object of a plurality of objects, and a second hidden state layer includes a plurality of shared hidden states shared by the plurality of objects.
The instructions stored in the computer-readable storage medium can be executed in an environment deployed in a computer device such as a client, a host, a proxy device, a server, etc., and it should be noted that the instructions can also perform more specific processing when the above steps are performed, and the content of the further processing is mentioned in the process described with reference to fig. 3 and 5, so that the further processing will not be described again here to avoid repetition.
It should be noted that the model training system and the prediction system according to the exemplary embodiments of the present disclosure may fully rely on the execution of a computer program or instructions to implement the respective functions, i.e., respective devices correspond to respective steps in the functional architecture of the computer program, so that the entire system is called by a specialized software package (e.g., lib library) to implement the respective functions.
On the other hand, when the systems and apparatuses shown in fig. 1 and 4 are implemented in software, firmware, middleware or microcode, program code or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that at least one processor or at least one computing device may perform the corresponding operations by reading and executing the corresponding program code or code segments.
For example, according to an exemplary embodiment of the present application, a system may be provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the steps of: obtaining a set of sequential training samples, wherein the set of sequential training samples comprises a plurality of sequential training samples for each of a plurality of subjects, and each sequential training sample comprises a plurality of sequence data arranged in a chronological order; training the machine learning model based on the set of sequence training samples, wherein the machine learning model is a hidden Markov model comprising two hidden state layers, wherein a first hidden state layer comprises a personalized hidden state for each of the plurality of objects and a second hidden state layer comprises a plurality of shared hidden states shared by the plurality of objects.
For example, according to another exemplary embodiment of the present application, a system may be provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the steps of: obtaining a sequence prediction sample of a subject, wherein the sequence prediction sample comprises a plurality of time-ordered sequence data of the subject; performing prediction on the sequence prediction sample using the machine learning model to provide a prediction result regarding next sequence data following the plurality of sequence data, wherein the machine learning model is trained in advance to predict next sequence data following a series of sequence data arranged in time order, and the machine learning model is a hidden Markov model including two hidden state layers, wherein a first hidden state layer includes a personalized hidden state of each object of a plurality of objects, and a second hidden state layer includes a plurality of shared hidden states shared by the plurality of objects.
In particular, the above-described system may be deployed in a server or client or on a node in a distributed network environment. Further, the system may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing the set of instructions. In addition, the system may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). In addition, all components of the system may be connected to each other via a bus and/or a network.
The system here need not be a single system, but can be any collection of devices or circuits capable of executing the above instructions (or instruction sets), either individually or in combination. The system may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with local or remote systems (e.g., via wireless transmission).
In the system, the at least one computing device may comprise a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the at least one computing device may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like. The computing device may execute instructions or code stored in one of the storage devices, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory device may be integrated with the computing device, for example, by having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage device may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The storage device and the computing device may be operatively coupled or may communicate with each other, such as through I/O ports, network connections, etc., so that the computing device can read instructions stored in the storage device.
While exemplary embodiments of the present application have been described above, it should be understood that the above description is exemplary only, and not exhaustive, and that the present application is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present application. Therefore, the protection scope of the present application shall be subject to the scope of the claims.

Claims (10)

1. A method of training a machine learning model for predicting sequence data, comprising:
obtaining a set of sequential training samples, wherein the set of sequential training samples comprises a plurality of sequential training samples for each of a plurality of subjects, and each sequential training sample comprises a plurality of sequence data arranged in a chronological order;
training the machine learning model based on the set of sequence training samples,
wherein the machine learning model is a hidden Markov model comprising two hidden state layers, wherein a first hidden state layer comprises a personalized hidden state for each of the plurality of objects and a second hidden state layer comprises a plurality of shared hidden states shared by the plurality of objects.
2. The method of claim 1, wherein the plurality of sequence data relate to behavioral data of the subject at different points in time, the machine learning model being trained to predict, for a chronological series of historical behavioral data of the subject, a next behavioral data of the subject after the series of historical behavioral data; or
The plurality of sequence data relate to state data of the subject at different points in time, the machine learning model being trained to predict, for a chronological series of historical state data of the subject, a next state data of the subject after the series of historical state data.
3. The method of claim 1, wherein the step of obtaining a set of sequence training samples comprises:
acquiring a historical data record set of the plurality of objects;
and constructing the sequence training sample set based on the historical data record sets of the objects, wherein for a plurality of historical data records of each object arranged in time sequence, if the time interval between two adjacent historical data records meets a preset condition, segmentation is carried out, and then a plurality of sequence training samples of the object are obtained.
4. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 3.
5. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 3.
6. A system for training a machine learning model for predicting sequence data, comprising:
a training sample acquisition device configured to acquire a set of sequential training samples, wherein the set of sequential training samples includes a plurality of sequential training samples for each of a plurality of subjects, and each sequential training sample includes a plurality of sequence data arranged in chronological order;
training means configured to train the machine learning model based on the set of sequence training samples,
wherein the machine learning model is a hidden Markov model comprising two hidden state layers, wherein a first hidden state layer comprises a personalized hidden state for each of the plurality of objects and a second hidden state layer comprises a plurality of shared hidden states shared by the plurality of objects.
7. A method of predicting sequence data using a machine learning model, comprising:
obtaining a sequence prediction sample of a subject, wherein the sequence prediction sample comprises a plurality of time-ordered sequence data of the subject;
performing, with the machine learning model, a prediction for the sequence prediction sample to provide a prediction result for a next sequence data after the plurality of sequence data,
wherein the machine learning model is trained in advance to predict a next sequence data following a series of sequence data arranged in time series, and is a hidden Markov model including two hidden state layers, wherein a first hidden state layer includes a personalized hidden state of each of a plurality of objects, and a second hidden state layer includes a plurality of shared hidden states shared by the plurality of objects.
8. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method of claim 7.
9. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the method of claim 7.
10. A system for predicting sequence data using a machine learning model, comprising:
a prediction sample acquisition device configured to acquire a sequence prediction sample of a subject, wherein the sequence prediction sample includes a plurality of sequence data of the subject arranged in time series;
a prediction device configured to perform prediction for the sequence prediction sample using the machine learning model to provide a prediction result regarding next sequence data following the plurality of sequence data,
wherein the machine learning model is trained in advance to predict a next sequence data following a series of sequence data arranged in time series, and is a hidden Markov model including two hidden state layers, wherein a first hidden state layer includes a personalized hidden state of each of a plurality of objects, and a second hidden state layer includes a plurality of shared hidden states shared by the plurality of objects.
CN202110497221.4A 2019-04-28 2019-04-28 Method and system for training model and method and system for predicting sequence data Active CN113112030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110497221.4A CN113112030B (en) 2019-04-28 2019-04-28 Method and system for training model and method and system for predicting sequence data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910349922.6A CN110097193B (en) 2019-04-28 2019-04-28 Method and system for training model and method and system for predicting sequence data
CN202110497221.4A CN113112030B (en) 2019-04-28 2019-04-28 Method and system for training model and method and system for predicting sequence data

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201910349922.6A Division CN110097193B (en) 2019-04-28 2019-04-28 Method and system for training model and method and system for predicting sequence data

Publications (2)

Publication Number Publication Date
CN113112030A true CN113112030A (en) 2021-07-13
CN113112030B CN113112030B (en) 2023-12-26

Family

ID=67446102

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910349922.6A Active CN110097193B (en) 2019-04-28 2019-04-28 Method and system for training model and method and system for predicting sequence data
CN202110497221.4A Active CN113112030B (en) 2019-04-28 2019-04-28 Method and system for training model and method and system for predicting sequence data

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910349922.6A Active CN110097193B (en) 2019-04-28 2019-04-28 Method and system for training model and method and system for predicting sequence data

Country Status (1)

Country Link
CN (2) CN110097193B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457329B (en) * 2019-08-16 2022-05-06 第四范式(北京)技术有限公司 Method and device for realizing personalized recommendation
CN110852442B (en) * 2019-10-29 2022-03-15 支付宝(杭州)信息技术有限公司 Behavior identification and model training method and device
CN111191834A (en) * 2019-12-26 2020-05-22 北京摩拜科技有限公司 User behavior prediction method and device and server
CN111597121B (en) * 2020-07-24 2021-04-27 四川新网银行股份有限公司 Precise test method based on historical test case mining
CN111881355B (en) * 2020-07-28 2023-03-10 北京深演智能科技股份有限公司 Object recommendation method and device, storage medium and processor
CN112199095B (en) * 2020-10-16 2022-04-26 深圳大学 Encryption API (application program interface) use analysis method and system
CN112785371A (en) * 2021-01-11 2021-05-11 上海钧正网络科技有限公司 Shared device position prediction method, device and storage medium
CN113509726B (en) * 2021-04-16 2023-12-05 超参数科技(深圳)有限公司 Interaction model training method, device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050119883A1 (en) * 2000-07-13 2005-06-02 Toshiyuki Miyazaki Speech recognition device and speech recognition method
CN103201754A (en) * 2010-11-18 2013-07-10 索尼公司 Data processing device, data processing method, and program
CN105181898A (en) * 2015-09-07 2015-12-23 李岩 Atmospheric pollution monitoring and management method as well as system based on high-density deployment of sensors
CN107657280A (en) * 2013-03-15 2018-02-02 英特尔公司 Real-time continuous interactive study and detection

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101840699B (en) * 2010-04-30 2012-08-15 中国科学院声学研究所 Voice quality evaluation method based on pronunciation model
US9836576B1 (en) * 2012-11-08 2017-12-05 23Andme, Inc. Phasing of unphased genotype data
CN103035236B (en) * 2012-11-27 2014-12-17 河海大学常州校区 High-quality voice conversion method based on modeling of signal timing characteristics
CN104021390B (en) * 2013-03-01 2018-01-02 佳能株式会社 Model generating means, pattern recognition apparatus and its method
JP6679898B2 (en) * 2015-11-24 2020-04-15 富士通株式会社 KEYWORD DETECTION DEVICE, KEYWORD DETECTION METHOD, AND KEYWORD DETECTION COMPUTER PROGRAM
CN106845319A (en) * 2015-12-03 2017-06-13 佳能株式会社 Hand-written register method, hand-written recognition method and its device
CN105931271B (en) * 2016-05-05 2019-01-18 华东师范大学 A kind of action trail recognition methods of the people based on variation BP-HMM
CN106503267A (en) * 2016-12-07 2017-03-15 电子科技大学 A kind of personalized recommendation algorithm suitable for user preference dynamic evolution
CN108615525B (en) * 2016-12-09 2020-10-09 中国移动通信有限公司研究院 Voice recognition method and device
CN108241872A (en) * 2017-12-30 2018-07-03 北京工业大学 The adaptive Prediction of Stock Index method of Hidden Markov Model based on the multiple features factor
CN108648748B (en) * 2018-03-30 2021-07-13 沈阳工业大学 Acoustic event detection method under hospital noise environment
CN109413587A (en) * 2018-09-20 2019-03-01 广州纳斯威尔信息技术有限公司 User trajectory prediction technique based on WiFi log
CN109326277B (en) * 2018-12-05 2022-02-08 四川长虹电器股份有限公司 Semi-supervised phoneme forced alignment model establishing method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050119883A1 (en) * 2000-07-13 2005-06-02 Toshiyuki Miyazaki Speech recognition device and speech recognition method
CN103201754A (en) * 2010-11-18 2013-07-10 索尼公司 Data processing device, data processing method, and program
CN107657280A (en) * 2013-03-15 2018-02-02 英特尔公司 Real-time continuous interactive study and detection
CN105181898A (en) * 2015-09-07 2015-12-23 李岩 Atmospheric pollution monitoring and management method as well as system based on high-density deployment of sensors

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGZHI SHI et al.: "Discovering Periodic Patterns for Large Scale Mobile Traffic Data: Method and Applications", 《IEEE TRANSACTIONS ON MOBILE COMPUTING》, vol. 17
SHINJI WATANABE et al.: "Selection of Shared-State Hidden Markov Model Structure Using Bayesian Criterion", 《IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS》, vol. 88, no. 1
PAN MIAN: "Research on radar high-resolution range profile target recognition technology", 《中国博士学位论文全文数据库信息科技辑》 (China Doctoral Dissertations Full-text Database, Information Science and Technology Series)

Also Published As

Publication number Publication date
CN113112030B (en) 2023-12-26
CN110097193A (en) 2019-08-06
CN110097193B (en) 2021-03-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant