CN115438787A

CN115438787A - Training method and device of behavior prediction system

Info

Publication number: CN115438787A
Application number: CN202211174369.5A
Authority: CN
Inventors: 赵科科
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2022-12-06

Abstract

The embodiment of the specification provides a training method and a training device for a behavior prediction system. The method comprises the following steps: acquiring a first training sample, which comprises a plurality of historical objects of a preset behavior made by a first user, statistical behavior characteristics aiming at the historical objects, a target object and a behavior label indicating whether the first user makes the preset behavior on the target object; inputting a first training sample into a behavior prediction system comprising an attention network and a prediction network; determining a first degree of association between each history object and a target object by using an attention network, determining a second degree of association between each history object and the corresponding statistical behavior characteristic thereof, and determining an attention weight corresponding to each history object based on the first degree of association and the second degree of association; processing the target object, each historical object and the corresponding attention weight by using a prediction network to obtain a behavior prediction result; and training a behavior prediction system according to the behavior prediction result and the behavior label.

Description

Training method and device of behavior prediction system

Technical Field

One or more embodiments of the present disclosure relate to the field of machine learning technologies, and in particular, to a training method and apparatus for a behavior prediction system.

Background

With the development of science and technology and the progress of society, more and more service platforms have emerged, providing users with various services to meet their needs in life and work. To deliver an experience tailored to each individual user, many service platforms use machine learning to predict user behavior and customize personalized service schemes according to the prediction results. For example, a video platform may predict the probability that a given user clicks on various videos, and thereby determine the categories and order of the video stream pushed to that user. As another example, a shopping platform may predict a user's preference for a particular item to decide whether to recommend the item to the user.

Clearly, the more accurate the prediction of user behavior, the better. However, current approaches to predicting user behavior are limited in form and struggle to meet the growing demands of practical applications. A scheme is therefore needed that can effectively improve the accuracy of user behavior prediction, thereby serving users better and improving the user experience.

Disclosure of Invention

One or more embodiments of the present disclosure describe a training method and apparatus for a behavior prediction system, which can better learn user interests, thereby effectively improving accuracy of a prediction result for user behavior.

According to a first aspect, there is provided a method of training a behavior prediction system, comprising: obtaining a first training sample, wherein the first training sample comprises a plurality of historical objects on which a first user has made a predetermined behavior, statistical behavior features for each of the historical objects, a target object, and a behavior label indicating whether the first user makes the predetermined behavior on the target object after the plurality of historical objects; inputting the first training sample into a behavior prediction system comprising an attention network and a prediction network; determining, by using the attention network, a first degree of association between each historical object and the target object and a second degree of association between each historical object and its corresponding statistical behavior features, and determining an attention weight corresponding to each historical object based on the first degree of association and the second degree of association; processing the target object, the historical objects, and the attention weights corresponding to the historical objects by using the prediction network to obtain a behavior prediction result; and training the behavior prediction system according to the behavior prediction result and the behavior label.

In one embodiment, the first training sample comprises a historical behavior matrix, a first dimension of the historical behavior matrix corresponding to object identification and a second dimension corresponding to statistical behavior features.

In one embodiment, obtaining a first training sample comprises: acquiring a plurality of behavior records of the first user making the predetermined behavior for a plurality of times; and carrying out statistical processing on the behavior records to obtain the historical objects involved in the behavior records and the statistical behavior characteristics.

In one embodiment, the behavior prediction system further comprises an embedding layer, the method further comprising: processing the identification of each object in the first training sample by using the embedding layer to obtain a corresponding embedding vector; determining a first association degree between each history object and a target object, wherein the determining comprises: and calculating a first similarity between the embedded vector of each history object and the embedded vector of the target object as the first association degree.

In one embodiment, the behavior prediction system further includes an object coding layer and a feature coding layer, and the method further includes: processing the embedded vectors of the historical objects by using the object coding layer to obtain corresponding object coding vectors; processing the statistical behavior characteristics by utilizing the characteristic coding layer to obtain corresponding characteristic coding vectors; determining a second association degree between each historical object and the corresponding statistical behavior feature thereof, wherein the determining the second association degree comprises the following steps: and calculating a second similarity between the object coding vector and the feature coding vector as the second correlation degree.

In a specific embodiment, the statistical behavior feature includes a plurality of statistical features corresponding to a plurality of statistical terms, and the feature encoding layer includes a plurality of feature encoding layers corresponding to the plurality of statistical terms; wherein, processing the statistical behavior characteristics by using the characteristic coding layer to obtain corresponding characteristic coding vectors comprises: correspondingly processing the plurality of statistical characteristics by using the plurality of characteristic coding layers to obtain a plurality of characteristic coding vectors; wherein calculating a second similarity between the object coding vector and the feature coding vector as the second degree of association comprises: and respectively calculating a plurality of second similarity degrees between the object code vector and the plurality of feature code vectors as a plurality of second association degrees.

In a particular embodiment, the statistical behavior features comprise at least one of: number of behaviors, behavior period, object category.

In one embodiment, the statistical behavior features comprise a behavior period, and the prediction network comprises a period interest characterization layer, a period interest interaction layer, and a prediction layer; wherein processing the target object, the historical objects, and the attention weights corresponding to the historical objects by using the prediction network to obtain a behavior prediction result comprises: using the period interest characterization layer to perform, for the historical objects having the same behavior period, a weighted summation of their embedded vectors with their corresponding attention weights, so as to obtain a period interest characterization vector for that behavior period; processing the period interest characterization vectors of the respective behavior periods by using the period interest interaction layer to obtain a first comprehensive interest characterization vector; and processing the first comprehensive interest characterization vector and the embedded vector of the target object by using the prediction layer to obtain the behavior prediction result.
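The grouped weighted summation performed by the period interest characterization layer can be illustrated with a minimal sketch. The function name `period_interest_vectors` and the toy inputs are hypothetical and chosen only for illustration; the period interest interaction layer (for example, a time sequence network) is deliberately left out.

```python
from collections import defaultdict


def period_interest_vectors(embeddings, weights, periods):
    """Weighted sum of embedded vectors, grouped by behavior period.

    embeddings: one embedded vector (list of floats) per historical object
    weights:    one attention weight per historical object
    periods:    one behavior period index per historical object
    """
    dim = len(embeddings[0])
    grouped = defaultdict(lambda: [0.0] * dim)
    for emb, w, p in zip(embeddings, weights, periods):
        acc = grouped[p]
        for k in range(dim):
            acc[k] += w * emb[k]
    # Order by period index so the result could feed a sequence model.
    return [grouped[p] for p in sorted(grouped)]


# Toy data (assumed): three historical objects, two behavior periods.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [0.5, 0.25, 0.25]
periods = [1, 1, 2]
vectors = period_interest_vectors(embeddings, weights, periods)
```

Here the first two objects share period 1 and are pooled into one vector, while the third object forms the vector for period 2.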

In a specific embodiment, acquiring the behavior period includes: for each historical object, acquiring a plurality of behavior moments at which the first user made the predetermined behavior within a predetermined duration; and determining the duration subinterval, within the predetermined duration, to which the last of the behavior moments belongs, taking the sequence number of that subinterval as the corresponding behavior period, and including it in the statistical behavior features.

In a specific embodiment, the time-interval interest interaction layer is implemented as a time sequence network, wherein the processing, by using the time-interval interest interaction layer, the time-interval interest characterization vectors in each behavior time interval to obtain a first comprehensive interest characterization vector includes: and sequentially taking the time interval interest characterization vectors in each behavior time interval as the input of the time sequence network to obtain the first comprehensive interest characterization vector.

In a specific embodiment, processing the first comprehensive interest characterization vector and the embedded vector of the target object by using the prediction layer to obtain the behavior prediction result includes: using the prediction layer to fuse the first comprehensive interest characterization vector with the embedded vector of the target object to obtain a fusion vector, and applying a linear transformation and/or a nonlinear transformation to the fusion vector to obtain the behavior prediction result.

In a more specific embodiment, the fusion processing includes concatenation, summation, or element-wise multiplication.
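The three fusion options, followed by a linear transformation and a nonlinear transformation, might look like the sketch below. The function names `fuse` and `predict`, the single linear layer, and the sigmoid activation are assumptions made for illustration; the specification does not fix the concrete form of the prediction layer.

```python
import math


def fuse(interest, target, mode="concat"):
    """Fuse the comprehensive interest vector with the target embedding."""
    if mode == "concat":
        return interest + target  # list concatenation
    if mode == "sum":
        return [a + b for a, b in zip(interest, target)]
    if mode == "multiply":
        return [a * b for a, b in zip(interest, target)]  # element-wise
    raise ValueError(f"unknown fusion mode: {mode}")


def predict(fused, weight, bias):
    """Assumed prediction head: one linear layer followed by a sigmoid."""
    z = sum(w * x for w, x in zip(weight, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


fused = fuse([1.0, 2.0], [3.0, 4.0], mode="multiply")
probability = predict(fused, weight=[0.1, -0.1], bias=0.0)
```

Concatenation doubles the input width of the linear layer, whereas summation and element-wise multiplication require both vectors to have the same dimension.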

In one embodiment, the statistical behavior features comprise object categories, and the prediction network comprises a category interest characterization layer, a category interest interaction layer and a prediction layer; wherein, processing the target object, the historical objects and the attention weights corresponding to the target object and the historical objects by using the prediction network to obtain a behavior prediction result comprises: utilizing the class interest representation layer to perform weighted summation on the corresponding embedded vectors by utilizing the attention weights corresponding to historical objects with the same object class to obtain class interest representation vectors under the object class; processing category interest representation vectors under all object categories by utilizing the category interest interaction layer to obtain a second comprehensive interest representation vector; and processing the second comprehensive interest representation vector and the embedded vector of the target object by utilizing the prediction layer to obtain the behavior prediction result.

In a specific embodiment, processing the second comprehensive interest representation vector and the embedded vector of the target object by using the prediction layer to obtain the behavior prediction result includes: using the prediction layer to fuse the second comprehensive interest representation vector with the embedded vector of the target object to obtain a fusion vector, and applying a linear transformation and/or a nonlinear transformation to the fusion vector to obtain the behavior prediction result.

According to a second aspect, there is provided a training apparatus for a behavior prediction system, comprising: an acquisition unit configured to acquire a first training sample, which includes a plurality of historical objects on which a first user has made a predetermined behavior and statistical behavior features for each historical object, and which further includes a target object and a behavior label indicating whether the first user makes the predetermined behavior on the target object after the plurality of historical objects; a processing unit configured to process the first training sample with a behavior prediction system comprising an attention network and a prediction network, the processing unit comprising the following subunits: a weight determination subunit configured to determine, by using the attention network, a first degree of association between each historical object and the target object and a second degree of association between each historical object and its corresponding statistical behavior features, and to determine an attention weight corresponding to each historical object based on the first degree of association and the second degree of association; and a prediction subunit configured to process the target object, the historical objects, and the attention weights corresponding to the historical objects by using the prediction network to obtain a behavior prediction result; and a training unit configured to train the behavior prediction system according to the behavior prediction result and the behavior label.

According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.

According to a fourth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor which, when executing the executable code, implements the method of the first aspect.

By adopting the method and the device provided by the embodiment of the specification, the collected historical behavior data are not arranged in a row according to the time sequence to form a sequence, but are organized to form a sparse matrix form, so that the effective information is fully reserved, the storage space and the calculated amount are greatly reduced, further, the newly designed behavior prediction system is utilized to process the behavior data in the sparse matrix form, and the accuracy of the user behavior prediction result is effectively improved.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 shows a comparison schematic diagram of behavioral data collection for active and inactive users according to one embodiment;

FIG. 2 illustrates a sample sequence of behaviors;

FIG. 3 illustrates behavioral data in the form of a sparse matrix, according to one embodiment;

FIG. 4 illustrates a training architecture diagram of a behavior prediction system, according to one embodiment;

FIG. 5 illustrates a flow diagram of a training method of a behavior prediction system, according to one embodiment;

FIG. 6 illustrates a schematic diagram of a training device of a behavior prediction system according to one embodiment.

Detailed Description

The scheme provided by the specification is described below with reference to the accompanying drawings. It should be noted that the following embodiments are all within the scope of the embodiments described in the present specification, and the embodiments obtained by reasonably combining the embodiments described below are not intended to be inventive.

As mentioned earlier, current approaches to predicting user behavior are relatively limited. In recent years, much work in the industry has focused on mining users' historical behavior data in the form of sequences. Although the related models perform well, analysis reveals two significant problems:

1) From the perspective of the model, the data length that a sequence model can process is limited. Sequence processing layers such as a recurrent neural network (RNN) or a long short-term memory (LSTM) network have difficulty processing long sequences, and truncating overlong sequences discards information. In addition, the model must process behavior sequences according to a given maximum length (e.g., 200), whereas in practice the richness of sample information varies greatly. The model therefore handles very sparse samples poorly (e.g., new or inactive users, who have little historical behavior data) and wastes storage and computing resources.

2) Representing behavior data as a fixed-length sequence is unreasonable in terms of time span, because in practice the behavior sequence patterns of active users differ greatly from those of inactive users. Referring to FIG. 1, when a fixed-length behavior sequence (10 behaviors in the figure) is taken, most of the data captured for an inactive user is behavior data from 1 year ago, whereas for an active user only behavior data within the last week is captured and behavior data before that week is discarded.

Based on the above observation and analysis, the inventor proposes a scheme that collected historical behavior data is not arranged in a behavior sequence according to a time sequence (see fig. 2), but is organized in a sparse matrix form, for example, see fig. 3, where o and u respectively represent an object (object) and a user (user), and elements in the matrix are accumulated times, so that while effective information is fully retained, storage space and calculation amount are greatly reduced, and further, a model structure for processing behavior data in the sparse matrix form is proposed, and accuracy of a user behavior prediction result is effectively improved.

FIG. 4 illustrates a training architecture diagram of a behavior prediction system, according to one embodiment. As shown in FIG. 4, a training sample is first obtained in which historical behavior data is organized into a historical behavior matrix. The row and column dimensions of the historical behavior matrix correspond to the identifiers of the business objects (denoted $Id_i$) and to the statistical behavior features for the historical objects (denoted $f_j$, e.g., the cumulative number of times); in FIG. 4, m and n are both positive integers, typically greater than 1. The training sample is then input into the behavior prediction system 400, where the attention network 410 determines a first degree of association between each historical object and the target object (denoted $Id^t$) in the training sample, determines a second degree of association between each historical object and each statistical behavior feature, and determines the attention weight corresponding to each historical object based on the first and second degrees of association. The prediction network 420 then processes the target object $Id^t$, each historical object $Id_i$, and the corresponding attention weights to obtain a behavior prediction result. Thereafter, the behavior prediction result and the corresponding behavior label $y$ are used to determine the training loss, which is used to update the model parameters of the behavior prediction system.
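The specification does not name the loss function. Assuming a binary behavior label (1 = behavior made, 0 = not made) and a prediction result interpreted as a probability, one common choice is binary cross-entropy, sketched below; the function name and the clipping constant `eps` are illustrative assumptions.

```python
import math


def binary_cross_entropy(y_pred, y_true, eps=1e-12):
    """Binary cross-entropy for one sample (an assumed loss, not confirmed by the text)."""
    # Clip the prediction to avoid log(0).
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


loss_confident = binary_cross_entropy(0.9, 1)
loss_wrong = binary_cross_entropy(0.1, 1)
```

A prediction close to the label yields a small loss, and the gradient of this loss with respect to the model parameters would drive the parameter update.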

The implementation steps of the above scheme are described in detail below with reference to more examples.

Fig. 5 is a flow chart illustrating a training method of a behavior prediction system according to an embodiment, where an execution subject of the method may be any device, platform, server, or equipment cluster with computing and processing capabilities. As shown in fig. 5, the method comprises the steps of:

First, in step S510, a first training sample is obtained, which includes a number of historical objects on which a first user has made a predetermined behavior, statistical behavior features for each of the historical objects, a target object, and a behavior label indicating whether the first user makes the predetermined behavior on the target object after the number of historical objects.

It is to be understood that the first training sample may be any one of a set of training samples. In one embodiment, for a given user, a plurality of behavior records of the user making the predetermined behavior before a certain historical moment are acquired, together with a target object pushed to the user after that historical moment. The behavior records are then statistically analyzed to obtain several historical objects involved in the records (herein, "several" means one or more) and the statistical behavior features corresponding to each historical object. In addition, the behavior label is determined from collected behavior data indicating whether the user made the predetermined behavior on the target object. A corresponding training sample can thus be constructed.

In a specific embodiment, a plurality of behavior records that the certain user performed the predetermined behavior within a predetermined time period (e.g., within 1 month, within 1 year, etc.) before the certain historical time may be obtained. In this way, time alignment of different users on historical behaviors can be achieved, and therefore usability of the sample and accuracy of subsequent prediction are improved.

In a particular embodiment, the behavior label may be a binary label; for example, a label value of 1 indicates that the predetermined behavior is made, and a label value of 0 indicates that it is not made.

On the other hand, in one embodiment, the historical objects and the target object generally belong to the same kind of business object, and the predetermined behavior is adapted to the business object. Illustratively, if the business object is an advertisement, the predetermined behavior may be a click; if the business object is a user, the predetermined behavior may be following or unfollowing; if the business object is a commodity, the predetermined behavior may be searching, browsing, adding to favorites, adding to a shopping cart, creating an order, purchasing, and the like; if the business object is a video, the predetermined behavior may be the playing time reaching a preset percentage of the total duration; if the business object is an app, the predetermined behavior may be registration; if the business object is content information, the predetermined behavior may be liking, forwarding, and the like; and if the business object is an official account, the predetermined behavior may be subscribing, following, unsubscribing, and the like.

In one embodiment, the statistical behavior characteristics determined for each historical object may include one or more of: an accumulated number of times a predetermined action is made, an action period, an object category of a history object, and the like. In a specific embodiment, the number of behavior records corresponding to each history object is determined as the corresponding accumulated number.

In a specific embodiment, the first training sample is constructed based on a plurality of behavior records collected within the predetermined time period, each behavior record includes a historical time when the predetermined behavior is made, and the predetermined time period is divided into a plurality of time periods (or time sub-intervals) in advance, for example, the predetermined time period is 1 year, and the plurality of time periods are 12 months or 4 quarters; therefore, for each history object, the last history time when the predetermined behavior is made for the last time for the history object can be determined according to the behavior records, and the time period of the last history time in the multiple time periods can be determined as the corresponding behavior time period, and exemplarily, the sequence number of the time period can be taken as the corresponding behavior time period.

In a specific embodiment, the historical objects are commodities, and correspondingly, the object categories are commodity categories, such as electronic products, mother and baby products, household products, daily consumables, and the like. In another embodiment, the historical object is content information, and accordingly, the object category may include science and technology category, entertainment category, sports category, social category, and the like.

In this way, the raw behavior data undergoes feature statistics and light aggregation, which removes redundancy from the raw data while retaining the usable behavior information, thereby saving storage space and computation and improving the accuracy of subsequent prediction results.
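The statistical processing described above (accumulated number of times, behavior period taken from the last behavior moment, and object category) can be sketched as follows. The record layout `(object_id, time, category)`, the equal-width subintervals, the 1-based period numbering, and all names are assumptions made for illustration.

```python
def aggregate_records(records, window_start, window_length, num_periods):
    """Aggregate raw behavior records into per-object statistical features.

    records: iterable of (object_id, time, category); `time` uses the same
             unit as `window_start` and `window_length` (e.g., days).
    Returns {object_id: (behavior_period, count, category)}.
    """
    period_length = window_length / num_periods
    stats = {}
    for obj, t, cat in records:
        entry = stats.setdefault(obj, {"count": 0, "last_time": t, "category": cat})
        entry["count"] += 1
        entry["last_time"] = max(entry["last_time"], t)

    features = {}
    for obj, entry in stats.items():
        index = int((entry["last_time"] - window_start) // period_length)
        index = min(index, num_periods - 1)  # a time at the window end stays in the last period
        # 1-based sequence number of the subinterval holding the last behavior moment.
        features[obj] = (index + 1, entry["count"], entry["category"])
    return features


# Toy data (assumed): a 360-day window split into 12 periods of 30 days.
records = [
    ("o1", 5, "electronics"),
    ("o1", 95, "electronics"),
    ("o2", 200, "household"),
]
features = aggregate_records(records, window_start=0, window_length=360, num_periods=12)
```

Object `o1` appears twice with its last behavior on day 95, so it is assigned period 4 and a count of 2; each object occupies one entry regardless of how many raw records it had.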

Further, in one embodiment, the historical objects and statistical behavior features included in the first training sample are organized in matrix form, in which the row dimension corresponds to the object identifiers and the column dimension corresponds to the statistical behavior features, or vice versa. For example, see the historical behavior matrix shown in FIG. 4, where the element in the ith row and jth column represents the jth statistical behavior feature of the ith historical object.

In another embodiment, the plurality of historical objects and statistical behavior features included in the first training sample are organized in a key-vector form, which can be regarded as an extension of the key-value form, and is denoted as:

$$\{b, v_b : b \in B_u\} \tag{1}$$

In formula (1), $B_u$ represents the set of historical object identifiers (or index set) of the first user (user $u$), and $v_b$ represents the statistical feature vector formed from the statistical behavior features of historical object $b$; illustratively, $v_b = [t_b, c_b, \kappa_b]$, where $t_b$, $c_b$, and $\kappa_b$ respectively represent the behavior period, the accumulated number of times, and the object category described above.

Based on this, for a batch of training samples used in a single training iteration, the key-vector form can be represented as a sparse matrix in which each non-zero element $x_{u,b}$ is represented by a triplet $(u, b, x_{u,b})$. Similarly, behavior data in key-vector form can be represented by a triple of tensors $(U, B, X)$, where $U$ and $B$ are the row and column index vectors, respectively, and $X$ is the matrix composed of all statistical feature vectors. This representation requires no repetition or padding, which effectively saves computation and storage space.
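As a rough illustration of how a batch in key-vector form could be flattened into the index vectors and the feature matrix, consider the sketch below. The nested-dictionary input layout and the function name `to_index_tensors` are hypothetical; plain Python lists stand in for tensors.

```python
def to_index_tensors(batch):
    """Flatten {user: {object: feature_vector}} into (U, B, X).

    U[k] and B[k] are the row (user) and column (object) indices of the
    k-th stored entry, and X[k] is its statistical feature vector.
    """
    U, B, X = [], [], []
    for user, key_vectors in batch.items():
        for obj, v_b in key_vectors.items():
            U.append(user)
            B.append(obj)
            X.append(list(v_b))
    return U, B, X


# Toy batch (assumed): user 0 has two historical objects, user 1 has one.
# Each vector follows the order [behavior period, count, category id].
batch = {
    0: {10: [4, 2, 1], 11: [7, 1, 3]},
    1: {10: [12, 5, 1]},
}
U, B, X = to_index_tensors(batch)
```

Only the three observed (user, object) pairs are stored, so a user with few historical objects contributes few rows and no padding is introduced.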

In the above, the content and the construction method of the first training sample are described. After the first training sample is obtained, step S520 is executed, and the processing of the first training sample by using the behavior prediction system specifically includes:

In step S521, a first degree of association between each historical object and the target object and a second degree of association between each historical object and its corresponding statistical behavior features are determined by using the attention network in the behavior prediction system, and an attention weight corresponding to each historical object is determined based on the first degree of association and the second degree of association.

It can be understood that the behavior prediction system further includes an embedding layer for the business object, for determining an object embedding vector, and further for calculating the first relevance and the second relevance. In one embodiment, before performing this step, the embedding layer may be used to separately process the identifiers of the objects involved in the first training sample to obtain corresponding embedded vectors, where the objects include a number of history objects and target objects.

In another embodiment, the embedding layer is designed to include a first embedding layer and a second embedding layer, based on which, before executing this step, the object identifications of the respective history objects can be processed by the first embedding layer to obtain corresponding embedding vectors, and the object features of the target object can be processed by the second embedding layer to obtain the embedding vectors of the target object. Illustratively, the target object is a commodity, and the object characteristics thereof may include cost, place of production, commodity category, target group, and the like.

Based on the object embedding vectors obtained above, on the one hand, for the calculation of the first degree of association, in this step, a first degree of similarity between the embedding vector of each history object and the embedding vector of the target object may be calculated as a corresponding first degree of association. It is understood that the inter-vector similarity calculation described herein can be implemented by calculating a dot product or calculating a cosine similarity, etc. In this way, a first degree of association may be calculated.
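The two similarity options mentioned above, dot product and cosine similarity, can be sketched as follows. The function names and the toy vectors are illustrative assumptions; in a real system the embedded vectors would come from the embedding layer.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def first_association(history_embeddings, target_embedding, similarity=dot):
    """First degree of association of each historical object with the target."""
    return [similarity(e, target_embedding) for e in history_embeddings]


history = [[1.0, 0.0], [0.0, 2.0]]
target = [1.0, 1.0]
first_degrees = first_association(history, target)
```

Passing `similarity=cosine` makes the result insensitive to the norms of the embedded vectors, which may or may not be desirable depending on how the embeddings are trained.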

On the other hand, for the calculation of the second degree of association, in the present step, the second degree of similarity may be calculated as the above-described second degree of association based on the embedding vector of each history object and the feature encoding vector of the statistical behavior feature. In a specific embodiment, the attention network further includes a feature encoding layer, and accordingly, the feature encoding layer may be used to process the statistical behavior features to obtain feature encoding vectors. It can be understood that the feature coding layer uses the statistical behavior features as input, and performs linear transformation processing by using the parameter matrix, and/or performs nonlinear transformation processing by using the activation function, so as to output corresponding feature coding vectors. In a more specific embodiment, the statistical behavior feature comprises a plurality of statistical features corresponding to a plurality of statistical terms, and the feature encoding layer comprises a plurality of feature encoding layers corresponding to the plurality of statistical terms; in this way, the plurality of statistical behavior features are processed correspondingly by the plurality of feature encoding layers to obtain a plurality of feature encoding vectors, and second similarity calculation is performed on each of the embedded vectors of the history objects and the plurality of feature encoding vectors to obtain a plurality of second similarities as a plurality of second degrees of association.

In a specific embodiment, the feature encoding vector is designed to have the same dimension as the object embedding vector, and in this case, the inter-vector similarity calculation can be performed directly by using the embedding vector and the feature encoding vector, so as to obtain a corresponding second similarity. In another specific embodiment, the dimension of the embedded vector is greater than the dimension of the feature coding vector, and at this time, an object coding layer may be introduced into the attention network, so that the object coding layer is first used to perform dimension reduction on the embedded vector to obtain an object coding vector, and then the object coding vector and the feature coding vector are used to perform inter-vector similarity calculation to obtain a corresponding second similarity.

In this way, the second degree of association can be calculated.
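Illustratively, the case in which the embedded vector has a larger dimension than the feature encoding vector can be sketched as follows. This is a minimal sketch under stated assumptions: the parameter matrices, the tanh activation, the dot-product similarity, and the statistical terms "count" and "period" are hypothetical choices made only for illustration.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def feature_encode(weight, bias, feature):
    """Feature encoding layer: linear transformation followed by tanh."""
    return [math.tanh(z + b) for z, b in zip(matvec(weight, feature), bias)]

# Hypothetical 4-dimensional embedded vector of one history object.
embedding = [1.0, 0.0, -1.0, 2.0]
# Hypothetical object coding layer reducing 4 dimensions to 2.
object_weight = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
object_code = matvec(object_weight, embedding)

# One feature encoding layer per statistical term (hypothetical parameters).
feature_layers = {
    "count": ([[0.1], [0.2]], [0.0, 0.0]),
    "period": ([[0.5], [-0.5]], [0.0, 0.0]),
}
statistical_features = {"count": [3.0], "period": [2.0]}

# One second degree of association per statistical term.
second_assoc = {
    term: dot(object_code, feature_encode(w, b, statistical_features[term]))
    for term, (w, b) in feature_layers.items()
}
```

In a trained system the matrices above would be optimizable parameters rather than fixed values.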

From the above, for each historical object, a first degree of association between the historical object and the target object and a second degree of association between the historical object and each statistical behavior feature can be obtained. Based on the above, the attention scores of the historical objects can be respectively calculated, and then the attention scores corresponding to the historical objects are normalized to obtain the attention weights.

For each history object, a plurality of association calculation items are determined based on the first degree of association and the second degrees of association involved with that history object, and the attention score of the history object is determined based on these calculation items. In one embodiment, the first degree of association and the second degrees of association involved are themselves included in the plurality of association calculation items. In another embodiment, the first degree of association and the second degrees of association involved are cross-multiplied, and the resulting products are included in the plurality of association calculation items; illustratively, the product between the first degree of association and each second degree of association is calculated, and the product among a predetermined number (such as 2 or 3) of second degrees of association selected from the plurality of second degrees of association is calculated.

In one embodiment, the plurality of association calculation items are summed, and the summed result is used as the corresponding attention score. In another embodiment, the plurality of association calculation items can be jointly input into an attention scoring layer set in the attention network to obtain the attention score of the history object.

In this way, several attention scores corresponding to the several history objects can be obtained. Further, in one embodiment, the attention scores may be normalized by using a softmax function, or by using a simple proportion calculation (dividing each attention score by the sum of all attention scores), etc., to obtain the attention weights.
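Illustratively, the summation embodiment and the two normalization options can be sketched as follows. This is a minimal sketch; it assumes that the raw degrees of association and their pairwise products are all kept as calculation items, and that the proportion calculation is applied to non-negative scores with a positive sum. The function names are hypothetical.

```python
import math
from itertools import combinations

def calculation_items(first, seconds, cross=True):
    """Collect the calculation items for one history object."""
    items = [first, *seconds]
    if cross:
        # Products of the first degree with each second degree.
        items += [first * s for s in seconds]
        # Products among pairs of second degrees (predetermined number = 2).
        items += [a * b for a, b in combinations(seconds, 2)]
    return items

def attention_score(first, seconds, cross=True):
    """Attention score as the sum of the calculation items."""
    return sum(calculation_items(first, seconds, cross))

def softmax(scores):
    """Numerically stable softmax normalization."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def proportion(scores):
    """Simple proportion normalization; assumes a positive sum."""
    total = sum(scores)
    return [s / total for s in scores]

scores = [attention_score(1.0, [2.0, 3.0]), attention_score(0.5, [1.0, 1.0])]
weights = softmax(scores)
```

Softmax accepts scores of any sign, while the proportion calculation is only meaningful when the scores are non-negative.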

From the above, by using the attention network in the behavior prediction system, several attention weights corresponding to several historical objects in the first training sample can be obtained.

Thereafter, in step S522, the target object, each history object, and their corresponding attention weights are processed by using the prediction network in the behavior prediction system, and a behavior prediction result is obtained. It is to be understood that the behavior prediction result includes a prediction probability that the first user made a predetermined behavior with respect to the target object.

In one embodiment, each historical object and the corresponding attention weight thereof are processed by a prediction network to obtain an interest characterization vector for the first user, and a behavior prediction result is determined based on the interest characterization vector and an embedded vector of the target object.

In one embodiment, for several history objects, several embedded vectors thereof may be simply weighted and summed with several attention weights thereof, thereby taking the resulting vector as the interest characterization vector of the first user.
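Illustratively, the weighted summation can be sketched as follows; the function name `weighted_sum` and the numerical values are hypothetical.

```python
def weighted_sum(embeddings, weights):
    """Sum the embedded vectors, each scaled by its attention weight."""
    dim = len(embeddings[0])
    return [sum(w * e[i] for e, w in zip(embeddings, weights)) for i in range(dim)]

# Two history objects with attention weights 0.25 and 0.75.
interest_vector = weighted_sum([[1.0, 0.0], [0.0, 2.0]], [0.25, 0.75])
```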

In another embodiment, multi-dimensional segmentation and fusion of user interests are considered, so that the resulting vector characterizes the user's interests more richly and accurately. Specifically, the history objects are grouped according to the statistical behavior features, so that the history objects in the same group have the same statistical behavior feature; it can be understood that, in the case where the statistical behavior features include a plurality of features, one feature can be selected from them to group the history objects. Then, for each group obtained by the grouping, the embedded vectors of the history objects in the group are weighted and summed by using their attention weights to obtain the interest characterization vector corresponding to the group. Then, fusion processing is performed on the interest characterization vectors corresponding to the groups to obtain a comprehensive interest characterization vector of the first user.

In a specific embodiment, the grouping can be performed according to the behavior periods in the statistical behavior features, and accordingly, a period interest characterization layer, a period interest interaction layer and a first prediction layer are arranged in the prediction network; based on this, the method can comprise the following steps: using the period interest characterization layer, performing weighted summation on the embedded vectors of the history objects having the same behavior period by using their corresponding attention weights, to obtain a period interest characterization vector under that behavior period; processing the period interest characterization vectors under the respective behavior periods by using the period interest interaction layer to obtain a first comprehensive interest characterization vector; and processing the first comprehensive interest characterization vector and the embedded vector of the target object by using the first prediction layer to obtain the behavior prediction result.

Further, in a more specific embodiment, the period interest interaction layer is implemented as a time sequence network, and accordingly, the period interest characterization vectors under the respective behavior periods may be input into the time sequence network in order to obtain the first comprehensive interest characterization vector. Illustratively, the time sequence network may be implemented by a recurrent neural network (RNN), a long short-term memory (LSTM) network, a gated recurrent unit (GRU), or the like. In another more specific embodiment, the above-described period interest interaction layer may be implemented as a multi-layer perceptron. In this way, a first comprehensive interest characterization vector fusing the interest information of the user in different periods can be obtained. Illustratively, assuming that the user purchases apples and pears in two successive historical periods, the first comprehensive interest characterization vector contains, in addition to these two pieces of information, the information that the user is interested in fruits such as apples and pears.
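Illustratively, grouping by behavior period followed by a time sequence network can be sketched as follows. This is a minimal sketch; a plain recurrent cell with a tanh activation stands in for the RNN, LSTM, or GRU options, and all weights, period numbers, and function names are hypothetical.

```python
import math
from collections import defaultdict

def period_interest_vectors(embeddings, weights, periods):
    """Weighted sum of embedded vectors within each behavior period."""
    dim = len(embeddings[0])
    grouped = defaultdict(lambda: [0.0] * dim)
    for embedding, weight, period in zip(embeddings, weights, periods):
        acc = grouped[period]
        for i in range(dim):
            acc[i] += weight * embedding[i]
    # Order the groups by period number so they can be fed in sequence.
    return dict(sorted(grouped.items()))

def rnn_fuse(vectors, w_in, w_hidden):
    """Feed the period vectors in order through a plain recurrent cell."""
    hidden = [0.0] * len(w_hidden)
    for x in vectors:
        hidden = [
            math.tanh(
                sum(a * xi for a, xi in zip(w_in[j], x))
                + sum(b * hj for b, hj in zip(w_hidden[j], hidden))
            )
            for j in range(len(w_hidden))
        ]
    return hidden

period_vectors = period_interest_vectors(
    embeddings=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    weights=[0.5, 0.3, 0.2],
    periods=[1, 2, 1],
)
first_comprehensive = rnn_fuse(
    list(period_vectors.values()),
    w_in=[[1.0, 0.0], [0.0, 1.0]],
    w_hidden=[[0.5, 0.0], [0.0, 0.5]],
)
```

The final hidden state serves as the first comprehensive interest characterization vector in this sketch; it depends on the order in which the period vectors are supplied.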

In a more specific embodiment, the first prediction layer is used to perform fusion processing on the first comprehensive interest characterization vector and the embedded vector of the target object to obtain a fusion vector, and linear transformation and/or nonlinear transformation processing is performed on the fusion vector to obtain the behavior prediction result. In one example, the fusion processing includes concatenation, summation, or element-wise multiplication. In one example, the linear transformation and/or nonlinear transformation processing may be implemented by a deep neural network (DNN).
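Illustratively, the three fusion options and a subsequent transformation can be sketched as follows. This is a minimal sketch; a single linear transformation followed by a sigmoid stands in for a deep neural network, and the function names `fuse` and `predict` and the mode labels are hypothetical.

```python
import math

def fuse(interest, target, mode="concat"):
    """Fuse the interest vector with the target object's embedded vector."""
    if mode == "concat":
        return list(interest) + list(target)
    if mode == "sum":
        return [a + b for a, b in zip(interest, target)]
    if mode == "mul":
        return [a * b for a, b in zip(interest, target)]
    raise ValueError(f"unknown fusion mode: {mode}")

def predict(fused, weights, bias):
    """Linear transformation followed by a sigmoid, giving a probability."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))

fused = fuse([1.0, 2.0], [3.0, 4.0], mode="concat")
probability = predict(fused, weights=[0.1, -0.1, 0.2, -0.1], bias=0.0)
```

Concatenation doubles the input dimension of the following layer, whereas summation and element-wise multiplication require the two vectors to have the same dimension.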

Therefore, a first comprehensive interest characterization vector that integrates the interests of the user in different periods can be obtained, which effectively improves the accuracy of the prediction result.

In another specific embodiment, the grouping may be performed according to the object categories in the statistical behavior features, and accordingly, a category interest characterization layer, a category interest interaction layer, and a second prediction layer are set in the prediction network; based on this, the method can comprise the following steps: using the category interest characterization layer, performing weighted summation on the embedded vectors of the history objects having the same object category by using their corresponding attention weights, to obtain a category interest characterization vector under that object category; processing the category interest characterization vectors under the respective object categories by using the category interest interaction layer to obtain a second comprehensive interest characterization vector; and processing the second comprehensive interest characterization vector and the embedded vector of the target object by using the second prediction layer to obtain the behavior prediction result.

In a more specific embodiment, the above-mentioned category interest interaction layer may be implemented as a multi-layer perceptron, which inputs a plurality of category interest characterization vectors under a plurality of object categories, whereby the resulting second composite interest characterization vector fuses the user's cross-preferences between different object categories.
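Illustratively, grouping by object category followed by a perceptron can be sketched as follows. This is a minimal sketch; a single ReLU layer stands in for the multi-layer perceptron, the category vectors are concatenated in a fixed category order before being input, and the category names, weights, and function names are hypothetical.

```python
def category_interest_vectors(embeddings, weights, categories, all_categories):
    """Weighted sum of embedded vectors within each object category."""
    dim = len(embeddings[0])
    vectors = {c: [0.0] * dim for c in all_categories}
    for embedding, weight, category in zip(embeddings, weights, categories):
        for i in range(dim):
            vectors[category][i] += weight * embedding[i]
    return vectors

def perceptron_fuse(vectors, all_categories, weight, bias):
    """Concatenate the category vectors in fixed order, then apply one ReLU layer."""
    x = [v for c in all_categories for v in vectors[c]]
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weight, bias)
    ]

all_categories = ["fruit", "dairy"]
category_vectors = category_interest_vectors(
    embeddings=[[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
    weights=[0.5, 0.25, 0.25],
    categories=["fruit", "fruit", "dairy"],
    all_categories=all_categories,
)
second_comprehensive = perceptron_fuse(
    category_vectors,
    all_categories,
    weight=[[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, -1.0, 0.0]],
    bias=[0.0, 0.0],
)
```

Because every output unit sees the vectors of all categories at once, the layer can combine interests across categories.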

In a more specific embodiment, the second prediction layer is used to perform fusion processing on the second comprehensive interest characterization vector and the embedded vector of the target object to obtain a fusion vector, and linear transformation and/or nonlinear transformation processing is performed on the fusion vector to obtain the behavior prediction result. Illustratively, the fusion processing includes summation, concatenation, averaging, element-wise multiplication, and the like.

According to an example, the prediction network includes the period interest characterization layer, the period interest interaction layer, the category interest characterization layer, the category interest interaction layer, and the prediction layer; in this case, the prediction layer may be used to perform fusion processing on the first comprehensive interest characterization vector, the second comprehensive interest characterization vector, and the embedded vector of the target object to obtain a fusion vector, and to perform linear transformation and/or nonlinear transformation processing on the fusion vector to obtain a behavior prediction result. Furthermore, the behavior prediction system further includes a user characterization layer, which takes user features of the first user (such as occupation, permanent residence, and the like) as input and outputs a user characterization vector.

From the above, the behavior prediction result output by the behavior prediction system can be obtained. Then, in step S530, the behavior prediction system is trained according to the behavior prediction result and the behavior label. Specifically, a training loss, such as cross entropy loss, may be calculated according to the behavior prediction result and the behavior label, and then a training gradient may be calculated based on the training loss, so as to update the network parameters in the behavior prediction system by using a back propagation method based on the training gradient. It will be appreciated that this network parameter refers to an optimizable parameter.
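Illustratively, one training step with a cross entropy loss can be sketched as follows. This is a minimal sketch; it updates only a single hypothetical linear output layer by gradient descent, whereas in the system described the gradient would be propagated back through all optimizable parameters. The learning rate and function names are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(p, y, eps=1e-12):
    """Binary cross entropy between prediction p and label y (0 or 1)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def train_step(w, b, x, y, lr=0.1):
    """One gradient-descent update of a linear layer with sigmoid output."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = cross_entropy(p, y)
    grad = p - y  # derivative of the loss with respect to the pre-activation
    w_new = [wi - lr * grad * xi for wi, xi in zip(w, x)]
    b_new = b - lr * grad
    return w_new, b_new, loss

w1, b1, loss1 = train_step([0.0, 0.0], 0.0, x=[1.0, 2.0], y=1)
w2, b2, loss2 = train_step(w1, b1, x=[1.0, 2.0], y=1)
```

Repeating the step on the same sample lowers the loss, which is the behavior expected of a correct gradient update.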

From the above, training of the behavior prediction system can be achieved. By repeatedly executing the method flow indicated in fig. 5 based on the training sample set, multiple iterative updates of the behavior prediction system may be implemented until a convergence criterion is reached, e.g., the fluctuation of the training loss over the validation set is less than a predetermined threshold, or the iteration round reaches a predetermined number of times, etc. Therefore, the trained behavior prediction system can be obtained and applied to user behavior prediction in actual service scenes.
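Illustratively, the two convergence criteria mentioned above can be sketched as follows. This is a minimal sketch; the window size, the threshold, and the function name `should_stop` are hypothetical, and the fluctuation is measured here as the range of the most recent validation losses.

```python
def should_stop(val_losses, iteration, max_iterations, window=3, threshold=1e-3):
    """Stop when the iteration budget is used up or the loss has stabilized."""
    if iteration >= max_iterations:
        return True
    if len(val_losses) < window:
        return False
    recent = val_losses[-window:]
    # Fluctuation measured as the range of the last `window` losses.
    return max(recent) - min(recent) < threshold
```

For example, `should_stop([0.3, 0.3002, 0.3001], 3, 10)` evaluates to true because the last three losses vary by less than the threshold.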

In summary, with the training method of the behavior prediction system disclosed in the embodiments of the present specification, the collected historical behavior data is not arranged in chronological order into a single sequence, but is organized in the form of a sparse matrix, so that effective information is fully retained while the storage space and the calculation amount are greatly reduced; further, the newly designed behavior prediction system is used to process the behavior data in sparse matrix form, thereby effectively improving the accuracy of the user behavior prediction result.

Corresponding to the training method, the embodiment of the specification also discloses a training device. FIG. 6 illustrates a schematic diagram of a training device of a behavior prediction system according to one embodiment. As shown in fig. 6, the apparatus 600 includes:

An obtaining unit 610, configured to obtain a first training sample, which includes a number of history objects on which a first user has made a predetermined behavior and a statistical behavior feature for each of the history objects, and also includes a target object and a behavior label indicating whether the first user makes the predetermined behavior on the target object after the number of history objects.

A processing unit 620, configured to process the first training sample with a behavior prediction system comprising an attention network and a prediction network; the processing unit 620 comprises the following subunits: a weight determining subunit 621, configured to determine, by using the attention network, a first degree of association between each history object and the target object and a second degree of association between each history object and its corresponding statistical behavior feature, and to determine, based on the first degree of association and the second degree of association, an attention weight corresponding to each history object; and a prediction subunit 622, configured to process the target object, each history object, and the corresponding attention weights by using the prediction network to obtain a behavior prediction result.

A training unit 630 configured to train the behavior prediction system according to the behavior prediction result and the behavior label.

In one embodiment, the obtaining unit 610 is specifically configured to: acquire a plurality of behavior records of the first user making the predetermined behavior a plurality of times; and perform statistical processing on the behavior records to obtain the historical objects involved in the behavior records and the statistical behavior features.
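Illustratively, the statistical processing of behavior records can be sketched as follows. This is a minimal sketch; the record layout (object identifier, behavior moment, object category) and the function name `build_history` are hypothetical.

```python
def build_history(records):
    """Aggregate behavior records into per-object statistical features."""
    stats = {}
    for object_id, moment, category in records:
        entry = stats.setdefault(
            object_id, {"count": 0, "last_moment": moment, "category": category}
        )
        entry["count"] += 1
        entry["last_moment"] = max(entry["last_moment"], moment)
    return stats

# Hypothetical records: (object identifier, behavior moment in days, category).
records = [
    ("apple", 2, "fruit"),
    ("milk", 5, "dairy"),
    ("apple", 14, "fruit"),
    ("apple", 9, "fruit"),
]
history = build_history(records)
```

Each history object appears once in the result regardless of how many records mention it, so the storage grows with the number of distinct objects rather than with the number of behaviors.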

In one embodiment, the behavior prediction system further includes an embedding layer, and the processing unit 620 further includes an embedding subunit 623 configured to process the identifiers of the respective objects in the first training sample by using the embedding layer to obtain corresponding embedding vectors; the weight determining subunit 621 is specifically configured to: and calculating a first similarity between the embedded vector of each history object and the embedded vector of the target object as the first association degree.

In one embodiment, the behavior prediction system further includes an object coding layer and a feature coding layer, and the processing unit 620 further includes an encoding subunit 624 configured to process the embedded vectors of the respective historical objects by using the object coding layer to obtain corresponding object coding vectors; processing the statistical behavior characteristics by utilizing the characteristic coding layer to obtain corresponding characteristic coding vectors; the weight determining subunit 621 is specifically configured to: and calculating a second similarity between the object coding vector and the feature coding vector as the second correlation degree.

In a specific embodiment, the statistical behavior feature includes a plurality of statistical features corresponding to a plurality of statistical terms, and the feature encoding layer includes a plurality of feature encoding layers corresponding to the plurality of statistical terms; the encoding subunit 624 is specifically configured to: correspondingly processing the plurality of statistical characteristics by utilizing the plurality of characteristic coding layers to obtain a plurality of characteristic coding vectors; the weight determining subunit 621 is specifically configured to: and respectively calculating a plurality of second similarity degrees between the object code vector and the plurality of feature code vectors as a plurality of second association degrees.

In one embodiment, the statistical behavior feature comprises at least one of: number of actions, action period, object category.

In one embodiment, the statistical behavior feature comprises a behavior period, and the prediction network comprises a period interest characterization layer, a period interest interaction layer and a prediction layer; the prediction subunit 622 is specifically configured to: use the period interest characterization layer to perform weighted summation on the embedded vectors of the historical objects having the same behavior period by using their corresponding attention weights, so as to obtain a period interest characterization vector under that behavior period; process the period interest characterization vectors under the respective behavior periods by using the period interest interaction layer to obtain a first comprehensive interest characterization vector; and process the first comprehensive interest characterization vector and the embedded vector of the target object by using the prediction layer to obtain the behavior prediction result.

In a specific embodiment, the obtaining unit 610 is further configured to: for each historical object, acquire a plurality of behavior moments at which the first user made the predetermined behavior a plurality of times within a predetermined duration; and determine the duration subinterval, within the predetermined duration, to which the last of the behavior moments belongs, and take the interval sequence number of that duration subinterval as the corresponding behavior period and include it in the statistical behavior feature.
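Illustratively, the determination of the behavior period can be sketched as follows. This is a minimal sketch; it assumes equal-width duration subintervals numbered from 1, and the function name and parameters are hypothetical.

```python
def behavior_period(moments, window_start, window_length, num_intervals):
    """Sequence number (from 1) of the subinterval holding the last moment."""
    offset = max(moments) - window_start
    if not 0 <= offset <= window_length:
        raise ValueError("last behavior moment lies outside the duration")
    width = window_length / num_intervals
    # A moment exactly at the end of the duration falls in the last subinterval.
    return min(int(offset // width), num_intervals - 1) + 1

# A 30-day duration split into 3 subintervals of 10 days each.
period = behavior_period([2, 14, 25], window_start=0, window_length=30, num_intervals=3)
```

Only the last behavior moment determines the period, so an object acted on at days 2, 14, and 25 is assigned to the third subinterval.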

In a specific embodiment, the period interest interaction layer is implemented as a time sequence network, and the prediction subunit 622 is further configured to: sequentially take the period interest characterization vectors under the respective behavior periods as the input of the time sequence network to obtain the first comprehensive interest characterization vector.

In a specific embodiment, the prediction subunit 622 is further configured to: use the prediction layer to perform fusion processing on the first comprehensive interest characterization vector and the embedded vector of the target object to obtain a fusion vector, and perform linear transformation and/or nonlinear transformation processing on the fusion vector to obtain the behavior prediction result.

In one example, the fusion processing includes concatenation, summation, or element-wise multiplication.

In one embodiment, the statistical behavior features comprise object categories, and the prediction network comprises a category interest characterization layer, a category interest interaction layer and a prediction layer; the prediction subunit 622 is specifically configured to: use the category interest characterization layer to perform weighted summation on the embedded vectors of the historical objects having the same object category by using their corresponding attention weights, so as to obtain a category interest characterization vector under that object category; process the category interest characterization vectors under the respective object categories by using the category interest interaction layer to obtain a second comprehensive interest characterization vector; and process the second comprehensive interest characterization vector and the embedded vector of the target object by using the prediction layer to obtain the behavior prediction result.

In a specific embodiment, the prediction subunit 622 is further configured to: use the prediction layer to perform fusion processing on the second comprehensive interest characterization vector and the embedded vector of the target object to obtain a fusion vector, and perform linear transformation and/or nonlinear transformation processing on the fusion vector to obtain the behavior prediction result.

According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 5.

According to an embodiment of still another aspect, there is also provided a computing device including a memory and a processor, the memory having stored therein executable code, and the processor implementing the method described in conjunction with fig. 5 when executing the executable code.

Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The objects, technical solutions, and advantages of the present invention have been described in further detail in the above embodiments. It should be understood that the above embodiments are only specific examples of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements, and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims

1. A method of training a behavior prediction system, comprising:

acquiring a first training sample, wherein the first training sample comprises a plurality of historical objects on which a first user has made a predetermined behavior, statistical behavior features for each historical object, a target object, and a behavior label indicating whether the first user makes the predetermined behavior on the target object after the plurality of historical objects;

inputting the first training sample into a behavior prediction system comprising an attention network and a prediction network; wherein:

determining a first degree of association between each historical object and a target object by using the attention network, determining a second degree of association between each historical object and the corresponding statistical behavior characteristic thereof, and determining an attention weight corresponding to each historical object based on the first degree of association and the second degree of association;

processing the target object, each historical object, and the attention weight corresponding to each historical object by using the prediction network to obtain a behavior prediction result;

and training the behavior prediction system according to the behavior prediction result and the behavior label.

2. The method of claim 1, wherein the first training sample comprises a historical behavior matrix, a first dimension of the historical behavior matrix corresponding to object identification and a second dimension corresponding to statistical behavior features.

3. The method of claim 1, wherein obtaining a first training sample comprises:

acquiring a plurality of behavior records of the predetermined behavior made by the first user for a plurality of times;

and carrying out statistical processing on the behavior records to obtain the historical objects involved in the behavior records and the statistical behavior characteristics.

4. The method of claim 1, wherein the behavior prediction system further comprises an embedding layer, the method further comprising:

processing the identification of each object in the first training sample by using the embedding layer to obtain a corresponding embedding vector;

determining a first degree of association between each history object and a target object, wherein the determining comprises:

and calculating a first similarity between the embedded vector of each history object and the embedded vector of the target object as the first association degree.

5. The method of claim 1 or 4, wherein the behavior prediction system further comprises an object coding layer and a feature coding layer, the method further comprising:

processing the embedded vectors of the historical objects by using the object coding layer to obtain corresponding object coding vectors;

processing the statistical behavior characteristics by utilizing the characteristic coding layer to obtain corresponding characteristic coding vectors;

determining a second association degree between each historical object and the corresponding statistical behavior feature thereof, wherein the determining the second association degree comprises the following steps:

and calculating a second similarity between the object coding vector and the feature coding vector as the second correlation degree.

6. The method of claim 5, wherein the statistical behavior feature comprises a plurality of statistical features corresponding to a plurality of statistical terms, and the feature coding layer comprises a plurality of feature coding layers corresponding to the plurality of statistical terms; wherein, processing the statistical behavior characteristics by using the characteristic coding layer to obtain corresponding characteristic coding vectors comprises:

correspondingly processing the plurality of statistical characteristics by using the plurality of characteristic coding layers to obtain a plurality of characteristic coding vectors;

wherein calculating a second similarity between the object coding vector and the feature coding vector as the second degree of association comprises:

and respectively calculating a plurality of second similarity degrees between the object code vector and the plurality of feature code vectors as a plurality of second association degrees.

7. The method according to any one of claims 1-4 and 6, wherein the statistical behavior feature comprises at least one of: number of actions, action period, object category.

8. The method of claim 1, wherein the statistical behavior feature comprises a behavior period, and the prediction network comprises a period interest characterization layer, a period interest interaction layer, and a prediction layer; wherein processing the target object, each historical object, and the attention weight corresponding to each historical object by using the prediction network to obtain a behavior prediction result comprises:

using the period interest characterization layer to perform weighted summation on the embedded vectors of the historical objects having the same behavior period by using their corresponding attention weights, so as to obtain a period interest characterization vector under that behavior period;

processing the period interest characterization vectors under the respective behavior periods by using the period interest interaction layer to obtain a first comprehensive interest characterization vector;

and processing the first comprehensive interest characterization vector and the embedded vector of the target object by using the prediction layer to obtain the behavior prediction result.

9. The method of claim 8, wherein the obtaining of the behavioral period comprises:

for each historical object, acquiring a plurality of behavior moments at which the first user made the predetermined behavior a plurality of times within a predetermined duration;

and determining the duration subinterval, within the predetermined duration, to which the last of the behavior moments belongs, and taking the interval sequence number of that duration subinterval as the corresponding behavior period and including it in the statistical behavior feature.

10. The method of claim 8, wherein the period interest interaction layer is implemented as a time sequence network, and wherein processing the period interest characterization vectors under the respective behavior periods by using the period interest interaction layer to obtain a first comprehensive interest characterization vector comprises:

sequentially taking the period interest characterization vectors under the respective behavior periods as the input of the time sequence network to obtain the first comprehensive interest characterization vector.

11. The method of claim 8, wherein processing the first comprehensive interest characterization vector and the embedded vector of the target object by using the prediction layer to obtain the behavior prediction result comprises:

using the prediction layer to perform fusion processing on the first comprehensive interest characterization vector and the embedded vector of the target object to obtain a fusion vector, and performing linear transformation and/or nonlinear transformation processing on the fusion vector to obtain the behavior prediction result.

12. The method of claim 11, wherein the fusion processing comprises concatenation, summation, or element-wise multiplication.

13. The method of claim 1, wherein the statistical behavior features include object categories, and the prediction network includes a category interest characterization layer, a category interest interaction layer, and a prediction layer; wherein processing the target object, each historical object, and the attention weight corresponding to each historical object by using the prediction network to obtain a behavior prediction result comprises:

using the category interest characterization layer to perform weighted summation on the embedded vectors of the historical objects having the same object category by using their corresponding attention weights, so as to obtain a category interest characterization vector under that object category;

processing the category interest characterization vectors under the respective object categories by using the category interest interaction layer to obtain a second comprehensive interest characterization vector;

and processing the second comprehensive interest characterization vector and the embedded vector of the target object by using the prediction layer to obtain the behavior prediction result.

14. The method of claim 13, wherein processing the second comprehensive interest characterization vector and the embedded vector of the target object by using the prediction layer to obtain the behavior prediction result comprises:

using the prediction layer to perform fusion processing on the second comprehensive interest characterization vector and the embedded vector of the target object to obtain a fusion vector, and performing linear transformation and/or nonlinear transformation processing on the fusion vector to obtain the behavior prediction result.

15. A training apparatus of a behavior prediction system, comprising:

an acquisition unit configured to acquire a first training sample, wherein the first training sample comprises a plurality of historical objects on which a first user has made a predetermined behavior, statistical behavior characteristics for each historical object, a target object, and a behavior label indicating whether the first user makes the predetermined behavior on the target object after the plurality of historical objects;

a processing unit configured to process the first training sample with a behavior prediction system comprising an attention network and a prediction network; the processing unit comprises the following sub-units:

a weight determination subunit configured to determine, by using the attention network, a first degree of association between each historical object and the target object and a second degree of association between each historical object and its corresponding statistical behavior characteristics, and to determine an attention weight corresponding to each historical object based on the first degree of association and the second degree of association;

a prediction subunit configured to process the target object, each historical object and its corresponding attention weight by using the prediction network to obtain a behavior prediction result;

and a training unit configured to train the behavior prediction system according to the behavior prediction result and the behavior label.
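The roles of the weight determination subunit and the training unit can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two degrees of association are computed as dot products, they are combined by addition, the weights are normalized with a softmax, the statistical behavior characteristics are assumed to be already embedded in the same dimension as the historical objects, and the training loss is binary cross-entropy. The claim itself fixes none of these choices.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_weights(history, target, stat_features):
    """Combine two association scores per historical object into a weight."""
    scores = []
    for h, s in zip(history, stat_features):
        first = dot(h, target)   # assumed: association with the target object
        second = dot(h, s)       # assumed: association with its own statistics
        scores.append(first + second)
    # Numerically stable softmax over the combined scores.
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def bce_loss(prediction, label):
    """Binary cross-entropy between the prediction and the behavior label."""
    eps = 1e-12
    p = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# Hypothetical embeddings for two historical objects.
history = [[1.0, 0.0], [0.0, 1.0]]
target = [1.0, 0.0]
stats = [[0.0, 0.0], [0.0, 0.0]]

weights = attention_weights(history, target, stats)
print(weights)             # first object receives the larger weight
print(bce_loss(0.5, 1))    # loss for an uninformative prediction
```

In a full system the loss would be back-propagated through both the prediction network and the attention network so that the two are trained jointly.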

16. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed in a computer, causes the computer to perform the method of any one of claims 1-14.

17. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that when executed by the processor implements the method of any of claims 1-14.