CN111222722B - Method, neural network model and device for business prediction for business object - Google Patents

Method, neural network model and device for business prediction for business object

Info

Publication number
CN111222722B
CN111222722B (application CN202010329614.XA)
Authority
CN
China
Prior art keywords
vector
ith
level
matrix
transformation
Prior art date
Legal status
Active
Application number
CN202010329614.XA
Other languages
Chinese (zh)
Other versions
CN111222722A (en)
Inventor
辛超
崔卿
向彪
周俊
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010329614.XA
Publication of CN111222722A
Application granted
Publication of CN111222722B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The embodiments of this specification provide a method and a neural network model for business prediction for a business object. In the method, an initial feature matrix corresponding to the business object is first obtained, the initial feature matrix comprising N original vectors corresponding to N features of the business object. Multi-stage processing is then performed on the initial feature matrix. In each stage, for the ith feature vector to be processed at that stage, the corresponding ith original vector in the initial feature matrix and each current-stage feature vector are respectively linearly transformed to obtain the ith transformation vector and each current-stage transformation vector; the fusion results of the ith transformation vector with each current-stage transformation vector are then weighted and combined according to the degrees of correlation between them, thereby determining the next-stage feature vector of the ith feature vector. A characterization vector corresponding to the business object is obtained from the matrix produced by the last stage of processing, and business prediction is performed for the business object according to the characterization vector.

Description

Method, neural network model and device for business prediction for business object
Technical Field
One or more embodiments of the present specification relate to the field of machine learning, and more particularly, to a method and apparatus for business prediction for business objects.
Background
With the development of computer technology, machine learning has been applied in many technical fields to analyze and predict various kinds of business data. For example, a user's category can be predicted from the user's attribute features, so that customized, personalized services can be provided for that user; the degree of recommendation between a user and an item can be predicted from the combined information of the user and the item, so that suitable items can be recommended to the user; and, as another example, traffic peaks can be anticipated from predictions of when users visit a website, so that the network environment can be provisioned in advance.
In prediction scenarios for various business objects, rich feature data of different dimensions are usually introduced in order to improve the prediction accuracy of the model as much as possible. Features of different dimensions describe different information about the business scenario from different angles. In most cases, the fitting target of the model is not in a simple linear relationship with the individual basic features, so a model trained only on the basic features can express no more than a linear combination of the feature information, and its expressive power is limited. It is therefore desirable to combine features effectively in order to improve the expressive power of the model. Traditional feature combination is designed manually by engineers according to business experience; it is costly, extends poorly to new business scenarios, and is limited by the engineers' understanding of the business.
An improved scheme is therefore desired that combines business features more effectively, avoids the modeling limitations on high-order features, and improves the expressive power of the model and thus its prediction accuracy.
Disclosure of Invention
One or more embodiments of the present disclosure describe a method and a neural network model for service prediction of a service object, which can perform more effective high-order combination on features of the service object, and improve prediction accuracy.
According to a first aspect, there is provided a method for business prediction for a business object, comprising:
acquiring an initial feature matrix corresponding to a first service object, wherein the initial feature matrix comprises N original vectors obtained by coding feature values of N features of the first service object;
performing multi-stage feature cross processing on the initial feature matrix to obtain multi-stage processing matrices, wherein each stage of processing comprises: for any ith feature vector in the current-level matrix to be processed, performing linear transformation respectively on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; according to the degrees of correlation between the ith transformation vector and each current-level transformation vector, performing weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determining the feature vector of the ith feature vector in the next-level processing matrix based on the combination result;
obtaining a representation vector corresponding to the first business object according to the last-stage processing matrix in the multi-stage processing matrix;
and performing service prediction on the first service object according to the characterization vector.
According to one embodiment, the first business object may be one of the following entity objects: a user, a merchant, a commodity, an item to be recommended; correspondingly, the N features include attribute features of the entity object.
According to another embodiment, the first business object may be a business event, the business event comprising one of: payment events, purchase events, recommendation events, login events; correspondingly, the N characteristics include respective attribute characteristics of each participant of the business event.
In one embodiment, the performing linear transformation respectively on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix specifically includes: performing linear transformation respectively on the ith original vector and on each current-level feature vector by using the current-level parameter matrix corresponding to the current level, to obtain the ith transformation vector and each current-level transformation vector.
In another embodiment, the performing linear transformation respectively on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix specifically includes: performing linear transformation on the ith original vector by using a first parameter matrix to obtain the ith transformation vector; and performing linear transformation on each current-level feature vector by using a second parameter matrix to obtain each current-level transformation vector.
According to one embodiment, the weighted combination is performed by: determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors; determining each weight factor corresponding to each current-level transformation vector according to each correlation degree; respectively carrying out fusion operation on the ith transformation vector and each current-level transformation vector to obtain each fusion vector; and performing weighted combination on each fusion vector according to each weight factor to obtain a combination result.
Further, the correlation may be determined by: calculating cosine similarity of the ith transformation vector and each current-level transformation vector as the correlation degree; or, calculating the inner product result of the ith transformation vector and each current-stage transformation vector as the correlation; or calculating the vector distance between the ith transformation vector and each current-stage transformation vector, and determining the correlation according to the vector distance.
In one embodiment, the fusing operation comprises one of: multiplying by bit, summing and averaging.
According to an embodiment, the determining the feature vector of the ith feature vector in the next-stage processing matrix based on the combination result specifically includes: on the basis of the combination result, adding an offset vector and the ith feature vector as a feature vector in a next-stage processing matrix.
In one embodiment, the characterization vector is obtained by: pooling the final-stage processing matrix to obtain the characterization vector, wherein the pooling comprises one of the following steps: maximal pooling, average pooling, attention-based pooling.
According to a second aspect, there is provided a neural network model for business prediction for a business object, comprising:
an input layer, configured to acquire an initial feature matrix corresponding to a first service object, wherein the initial feature matrix comprises N original vectors obtained by encoding feature values of N features of the first service object;
the multi-level cross processing layer is configured to perform multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; each level of cross processing layer is configured to, for any ith feature vector in the current-level matrix to be processed, respectively perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; and, according to the degrees of correlation between the ith transformation vector and each current-level transformation vector, perform weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determine the feature vector of the ith feature vector in the next-level processing matrix based on the combination result;
the pooling layer is used for obtaining a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrices;
and the output layer is used for carrying out service prediction on the first service object according to the characterization vector.
According to a third aspect, there is provided an apparatus for performing traffic prediction for a traffic object, comprising:
an obtaining unit, configured to obtain an initial feature matrix corresponding to a first service object, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object;
the multiple cross processing units are configured to perform multi-stage feature cross processing on the initial feature matrix to obtain multi-stage processing matrices; each cross processing unit is configured to, for any ith feature vector in the current-level matrix to be processed, respectively perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; and, according to the degrees of correlation between the ith transformation vector and each current-level transformation vector, perform weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determine the feature vector of the ith feature vector in the next-level processing matrix based on the combination result;
the pooling unit is configured to obtain a representation vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrix;
and the prediction unit is configured to perform service prediction on the first service object according to the characterization vector.
According to a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fifth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and the processor, when executing the executable code, implements the method of the first aspect.
According to the method, the apparatus and the neural network model provided by the embodiments of this specification, nonlinear fusion between feature vectors and attention-based weighted combination are adopted in the multi-stage feature cross processing, so that sufficient cross combination operations are performed between the features and more expressive high-order features are obtained. Performing business prediction based on these high-order features further improves the prediction accuracy. Moreover, the attention-based combination provides a basis for interpreting the business prediction results.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 illustrates a method of business prediction for a business object, according to one embodiment;
FIG. 3 illustrates the feature cross processing of stage l, according to one embodiment;
FIG. 4 illustrates the process steps of fusing transform vectors and performing weighted combination of the fusion results in one embodiment;
FIG. 5 illustrates a schematic structural diagram of a neural network model according to one embodiment;
fig. 6 shows a schematic block diagram of a traffic prediction apparatus according to an embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As previously mentioned, in order to improve the feature expression capability of a model for business objects, it is desirable to combine features more effectively. In one approach, a multi-layer neural network is used to perform high-order combination of features in the hidden layers of the network. For example, high-order cross combination between features can be done through a Deep & Cross Network (DCN).
Specifically, the original input to the deep cross network consists of N feature items, each represented by a d-dimensional feature vector $x_0^i$, i = 1, 2, ..., N. The DCN first concatenates the N feature vectors to obtain an original input vector $X_0$:

$$X_0 = \left[ x_0^1; x_0^2; \ldots; x_0^N \right] \qquad (1)$$

That is, the original input vector $X_0$ is the concatenation of the feature vectors of all feature items.
Then, at each feature cross processing layer, feature cross combination is performed, where the combination at layer $l$ satisfies the following formula:

$$X_{l+1} = X_0 X_l^{T} W_l + b_l + X_l \qquad (2)$$

where $X_l$ is the input of layer $l$, $X_{l+1}$ is the output of layer $l$, and $W_l$, $b_l$ are the network parameters of layer $l$.

According to formula (2), the output $X_{l+1}$ of layer $l$ of the DCN contains all possible combinations of the original features from first order up to order $l+2$. By stacking feature crossing layers, the DCN can therefore achieve feature combinations of arbitrary finite order.
However, if the bias term $b_l$ in formula (2) is ignored, careful analysis reveals the following pattern in the layer-to-layer transformation:

$$X_1 = X_0 X_0^{T} W_0 + X_0 = \alpha_1 X_0 \qquad (3)$$

where the coefficient $\alpha_1 = X_0^{T} W_0 + 1$ is a scalar. Similarly,

$$X_{l+1} = X_0 X_l^{T} W_l + X_l = \alpha_{l+1} X_0 \qquad (4)$$

where the coefficient $\alpha_{l+1} = \alpha_l \left( X_0^{T} W_l + 1 \right)$.

It can be seen that the DCN's modeling of high-order feature combinations degrades to a scaling of the original features $X_0$; although the scaling coefficient is related to the input features, the expressive power is relatively limited.
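To make the degradation of formulas (3) and (4) concrete, the NumPy sketch below (an illustration only; the vector length and random parameters are assumptions, not values from the patent) applies the cross layer of formula (2) without the bias term three times and checks that every output remains a scalar multiple of the original input $X_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8                         # length of the concatenated input vector (assumed)
x0 = rng.normal(size=dim)       # original input vector X_0

def dcn_cross_layer(x0, xl, w):
    """DCN cross layer of formula (2) with the bias term dropped:
    X_{l+1} = X_0 (X_l . w_l) + X_l, since X_0 X_l^T w_l collapses to a scalar times X_0."""
    return x0 * (xl @ w) + xl

xl = x0
for l in range(3):
    w = rng.normal(size=dim)                # layer-l weight vector W_l
    xl = dcn_cross_layer(x0, xl, w)
    alpha = (xl @ x0) / (x0 @ x0)           # best-fit scalar coefficient alpha_{l+1}
    print(l, np.allclose(xl, alpha * x0))   # True: every output stays collinear with X_0
```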
In order to further improve the feature expression capability of the neural network model for business prediction and improve the prediction accuracy of the neural network model, according to the embodiment of the invention, a further feature cross combination mode is provided to avoid the linear degradation of high-order feature combination.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. As shown in the figure, the neural network model of this embodiment first arranges the original features of the business object into a feature matrix $X_0$, rather than into a single feature vector. When each cross processing layer performs its level of feature cross processing, each feature vector in the current matrix is fused with the original feature matrix, and the fusion results are weighted and combined based on an attention mechanism. Both the fusion and the attention-based combination are nonlinear operations, so the feature matrix output by the last cross processing layer contains various high-order combinations of the feature vectors and does not degrade to a linear scaling of the original vectors. The feature expression capability of the neural network model is thereby enhanced, and its prediction accuracy for the business object is improved.
The whole process of business prediction for business objects under the above concept is described in detail below.
FIG. 2 illustrates a method of business prediction for a business object, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. In one embodiment, the method may be performed by a neural network model that may be deployed in any device, apparatus, platform, cluster of devices having computing, processing capabilities. As shown in fig. 2, the method for traffic prediction at least comprises the following steps.
First, in step 21, an initial feature matrix corresponding to a first service object to be predicted is obtained, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object.
In one embodiment, the first business object corresponds to a single entity object, and the N features include attribute features of the entity object.
For example, in one example, the entity object is a user. At this time, the above-mentioned N characteristics may be attribute characteristics of the user, such as basic attribute characteristics of age, sex, registration time, education level, and the like, and behavior attribute characteristics such as recent browsing history, recent shopping history, and the like.
In another example, the entity object may be a merchant. At this time, the above-mentioned N characteristics may be attribute characteristics of the merchant, such as merchant category, registration time, commodity quantity, sales volume, number of people concerned, and the like.
In other examples, the entity object may also be a commodity, or an item to be recommended (e.g., an article to be pushed, music, a movie, etc.). Correspondingly, the N characteristics include attribute characteristics of the corresponding goods or articles.
In another embodiment, the first business object to be predicted is a business event, and the business event may be, for example, a payment event, a purchase event, a recommendation event, a login event, and the like. Correspondingly, the N characteristics include respective attribute characteristics of each participant of the business event.
For example, in one example, the first business object is a recommended event involving a first user and a first item. Accordingly, the N characteristics may include a user attribute characteristic of the first user and an item attribute characteristic of the first item.
For example, in yet another example, the first business object is a payment event involving two users, a first user and a second user. Accordingly, the N features may include respective user attribute features of the first user and the second user. Examples of N features in the case of other business events are not enumerated one by one.
For the N features of the various business objects exemplified above, the feature values may each be encoded as a d-dimensional vector, forming N d-dimensional vectors. The encoding of the feature values may take many forms. For example, the feature values of some feature items may be one-hot encoded; in another example, the feature values may be mapped to d-dimensional vectors using a look-up table. In yet another example, a predetermined word embedding tool (e.g., word2vec) may be used to convert textual feature values into d-dimensional vectors.
Thus, the obtained N features correspond to N d-dimensional vectors, which form an N x d matrix, called the initial feature matrix. The initial feature matrix corresponds to $X_0$ in FIG. 1.
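As an illustration of how such an initial feature matrix can be assembled, the sketch below (the feature names, vocabulary sizes and random look-up tables are hypothetical; in a real model the tables would be learned parameters) encodes N categorical feature values with embedding look-up tables and stacks them into an N x d matrix $X_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                    # embedding dimension (assumed)
# Hypothetical categorical features of a user object and their vocabulary sizes.
vocab_sizes = {"age_bucket": 8, "gender": 3, "city": 100, "education": 6}
# One look-up table per feature; each row is the d-dimensional vector for one feature value.
tables = {name: rng.normal(size=(size, d)) for name, size in vocab_sizes.items()}

def initial_feature_matrix(feature_values):
    """Map each feature value to its d-dimensional vector and stack them into an N x d matrix."""
    rows = [tables[name][idx] for name, idx in feature_values.items()]
    return np.stack(rows)                # initial feature matrix X_0, shape (N, d)

X0 = initial_feature_matrix({"age_bucket": 2, "gender": 1, "city": 42, "education": 3})
print(X0.shape)                          # (4, 4): N = 4 features, each encoded into d = 4 dimensions
```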
Next, in step 22, multi-stage feature cross processing is performed on the initial feature matrix to obtain multi-stage processing matrices. The process of any one stage of the feature cross processing, denoted stage $l$, is described below.
FIG. 3 illustrates the feature cross processing of stage $l$, i.e. a sub-process of step 22 in FIG. 2. It can be understood that the stage-$l$ feature cross processing takes the processing matrix $X_l$ output by the previous stage as the current-level matrix to be processed, performs feature cross processing on it, and outputs the next-level processing matrix $X_{l+1}$. For simplicity of description, the cross processing is described below for the ith feature vector $x_l^i$ in the current-level matrix $X_l$ (corresponding to the ith feature in the original feature matrix).

As shown in FIG. 3, the cross processing for the ith feature vector $x_l^i$ comprises the following steps. In step 31, linear transformations are applied respectively to the ith original vector $x_0^i$ at the corresponding position in the initial feature matrix and to each current-level feature vector in the current-level matrix (which may be denoted $x_l^j$), to obtain the ith transformation vector and each current-level transformation vector. Then, in step 32, the fusion results of the ith transformation vector with each current-level transformation vector are weighted and combined according to the degrees of correlation between the ith transformation vector and each current-level transformation vector. In step 33, based on the combination result of the weighted combination, the feature vector $x_{l+1}^i$ of the ith feature vector $x_l^i$ in the next-level processing matrix $X_{l+1}$ is determined. The manner in which these steps are performed is described in detail below.
First, in step 31, linear transformations are applied respectively to the ith original vector $x_0^i$ and to each current-level feature vector $x_l^j$, giving the ith transformation vector and each current-level transformation vector. The linear transformation may be implemented with a parameter matrix: a parameter matrix W is applied to each current-level feature vector and to the ith original vector, thereby obtaining each current-level transformation vector and the ith transformation vector.

In one embodiment, the parameter matrix W is a single parameter matrix shared across all stages of the multi-stage feature cross processing.

In another embodiment, the parameter matrix W differs from stage to stage. For the current stage $l$, the corresponding stage-$l$ parameter matrix $W_l$ is applied to each current-level feature vector $x_l^j$ and to the ith original vector $x_0^i$, giving each current-level transformation vector $W_l x_l^j$ and the ith transformation vector $W_l x_0^i$.

In yet another embodiment, different parameter matrices may be used for the ith original vector $x_0^i$ and for each current-level feature vector $x_l^j$. For example, a first parameter matrix $W_a$ may be used to linearly transform the ith original vector $x_0^i$, giving the ith transformation vector $W_a x_0^i$; and a second parameter matrix $W_b$ may be used to linearly transform each current-level feature vector $x_l^j$, giving each current-level transformation vector $W_b x_l^j$. The first parameter matrix $W_a$ and the second parameter matrix $W_b$ may be the same or different across stages.

It should be understood that the values of the elements in the above parameter matrices are determined by training the neural network model.

For convenience of description, each obtained current-level transformation vector is denoted $W x_l^j$ and the ith transformation vector is denoted $W x_0^i$ below, where the parameter matrix W covers the cases of the above embodiments.
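The sketch below illustrates the three parameterizations of step 31 (random matrices stand in for trained parameters; the names W, W_l, W_a and W_b follow the notation above and are otherwise assumptions). In every variant, q_i is the ith transformation vector and each row of K is one current-level transformation vector.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 4, 6
X0 = rng.normal(size=(N, d))       # initial feature matrix
Xl = rng.normal(size=(N, d))       # current-level matrix at stage l
i = 1                              # index of the feature vector being processed

# Variant 1: a single parameter matrix W shared by all stages.
W = rng.normal(size=(d, d))
q_i = W @ X0[i]                    # ith transformation vector  W x_0^i
K = Xl @ W.T                       # row j is the current-level transformation vector W x_l^j

# Variant 2: a per-stage parameter matrix W_l.
W_l = rng.normal(size=(d, d))
q_i = W_l @ X0[i]
K = Xl @ W_l.T

# Variant 3: separate matrices for the original vector and the current-level vectors.
W_a = rng.normal(size=(d, d))      # first parameter matrix, applied to the ith original vector
W_b = rng.normal(size=(d, d))      # second parameter matrix, applied to each current-level vector
q_i = W_a @ X0[i]
K = Xl @ W_b.T
print(q_i.shape, K.shape)          # (6,) (4, 6)
```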
Next, in step 32, according to the respective degrees of correlation between the ith transformation vector $W x_0^i$ and each current-level transformation vector $W x_l^j$, the fusion results obtained by fusing the ith transformation vector $W x_0^i$ with each current-level transformation vector $W x_l^j$ are weighted and combined.
Fig. 4 shows the process steps of fusing the transform vectors and performing a weighted combination of the fusion results in one embodiment, i.e., the sub-steps of step 32 above.
As shown in FIG. 4, in step 321, the respective degrees of correlation between the ith transformation vector $W x_0^i$ and each current-level transformation vector $W x_l^j$ are determined. Specifically, a correlation function f may be introduced to compute the degree of correlation $e_{ij}$ between the ith transformation vector $W x_0^i$ and the jth current-level transformation vector $W x_l^j$:

$$e_{ij} = f\left( W x_0^i,\ W x_l^j \right) \qquad (5)$$

The correlation function f may adopt various correlation measures. In one example, f computes the cosine similarity between the ith transformation vector and the jth current-level transformation vector. In another example, f computes their inner product (vector dot product) as the correlation. In yet another example, f computes the vector distance, such as the Euclidean distance, between the ith transformation vector and the jth current-level transformation vector, and determines the correlation from the distance so that the correlation is inversely related to the distance. The correlation function may also take other forms.
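A minimal sketch of the three options for the correlation function f in formula (5) follows; the mapping from distance to correlation (1/(1+distance)) and the small epsilon are assumed choices, since the text only requires that the correlation be inversely related to the distance.

```python
import numpy as np

def correlation(q, k, mode="dot"):
    """Degree of correlation e_ij between the ith transformation vector q
    and the jth current-level transformation vector k (step 321)."""
    if mode == "cosine":            # cosine similarity
        return float(q @ k) / (np.linalg.norm(q) * np.linalg.norm(k) + 1e-12)
    if mode == "dot":               # inner product (vector dot product)
        return float(q @ k)
    if mode == "distance":          # inversely related to the Euclidean distance
        return 1.0 / (1.0 + float(np.linalg.norm(q - k)))
    raise ValueError(f"unknown mode: {mode}")

q = np.array([1.0, 0.0, 2.0])
k = np.array([0.5, 1.0, 1.5])
print(correlation(q, k, "cosine"), correlation(q, k, "dot"), correlation(q, k, "distance"))
```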
Then, in step 322, the weight factors corresponding to the respective current-level transformation vectors are determined from the respective degrees of correlation.

In one embodiment, the correlation determined above between each current-level transformation vector and the ith transformation vector is used directly as the corresponding weight factor.

In another embodiment, the correlations corresponding to the respective current-level transformation vectors are normalized, and the normalized values are used as the weight factors.

More specifically, in one example, the weight factor $a_{ij}$ corresponding to the jth current-level transformation vector is obtained by proportional normalization:

$$a_{ij} = \frac{e_{ij}}{\sum_{k} e_{ik}} \qquad (6)$$

In another example, the weight factor $a_{ij}$ corresponding to the jth current-level transformation vector is obtained by normalization with a softmax function:

$$a_{ij} = \frac{\exp(e_{ij})}{\sum_{k} \exp(e_{ik})} \qquad (7)$$
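The sketch below implements step 322 for one ith feature vector (a minimal illustration; the mode names are assumptions). Note that the proportional normalization of formula (6) presumes non-negative correlations, whereas the softmax of formula (7) handles arbitrary values.

```python
import numpy as np

def weight_factors(e, mode="softmax"):
    """Turn the correlations e_i1..e_iN into weight factors a_i1..a_iN (step 322)."""
    e = np.asarray(e, dtype=float)
    if mode == "identity":              # use the correlations directly as weights
        return e
    if mode == "proportional":          # formula (6); assumes non-negative correlations
        return e / e.sum()
    if mode == "softmax":               # formula (7)
        z = np.exp(e - e.max())         # subtract the max for numerical stability
        return z / z.sum()
    raise ValueError(f"unknown mode: {mode}")

print(weight_factors([1.0, 2.0, 0.5], "proportional"))   # weights proportional to the correlations
print(weight_factors([1.0, 2.0, 0.5], "softmax"))        # softmax weights summing to 1
```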
In addition, in step 323, the ith transformation vector $W x_0^i$ is fused with each current-level transformation vector $W x_l^j$ respectively, giving the fusion vectors $A_{ij}$:

$$A_{ij} = W x_0^i \circ W x_l^j \qquad (8)$$

where $\circ$ denotes the fusion operation.

In one example, the fusion operation is a bit-wise (element-wise) multiplication of the two vectors. In other examples, other fusion methods may be used, such as summation or averaging. It should be understood that the fusion operation must leave the dimension of the fused vector unchanged.
Next, in step 324, the fusion vectors $A_{ij}$ obtained in step 323 are weighted and combined according to the weight factors $a_{ij}$ obtained in step 322, giving the combination result C:

$$C = \sum_{j} a_{ij} A_{ij} \qquad (9)$$
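Steps 323 and 324 together can be sketched as follows (element-wise multiplication is used as the fusion operation; sum or average would work equally well, and all shapes are assumptions). Each row of K is one current-level transformation vector, q_i is the ith transformation vector, and weights holds the factors a_ij from step 322.

```python
import numpy as np

def fuse(q, k, mode="elementwise"):
    """Fusion of two d-dimensional vectors (step 323); the result keeps dimension d."""
    if mode == "elementwise":       # bit-wise (element-wise) multiplication
        return q * k
    if mode == "sum":
        return q + k
    if mode == "average":
        return (q + k) / 2.0
    raise ValueError(f"unknown mode: {mode}")

def attention_combine(q_i, K, weights):
    """Weighted combination C = sum_j a_ij * A_ij of formula (9)."""
    A = np.stack([fuse(q_i, k) for k in K])     # fusion vectors A_ij, shape (N, d)
    return weights @ A                          # combination result C, shape (d,)

rng = np.random.default_rng(0)
q_i, K = rng.normal(size=6), rng.normal(size=(4, 6))
weights = np.full(4, 0.25)                      # e.g. uniform weight factors
print(attention_combine(q_i, K, weights).shape) # (6,)
```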
Through steps 321 to 324 above, step 32 in FIG. 3 is performed. Returning to FIG. 3, after step 32, in step 33 the feature vector $x_{l+1}^i$ of the ith feature vector in the next-level processing matrix $X_{l+1}$ is determined based on the combination result C of the weighted combination.

In one embodiment, the combination result C is used directly as the next-level feature vector $x_{l+1}^i$ of the ith feature vector $x_l^i$.

In another embodiment, the ith feature vector $x_l^i$ itself is added to the combination result C to give its next-level feature vector $x_{l+1}^i$, namely:

$$x_{l+1}^i = C + x_l^i \qquad (10)$$

In yet another embodiment, an offset vector is further added to the combination result C and the ith feature vector to give the next-level feature vector $x_{l+1}^i$, namely:

$$x_{l+1}^i = C + x_l^i + b \qquad (11)$$

where $b$ is the offset vector.
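Putting formulas (5) through (11) together, the following sketch implements one complete stage of feature cross processing and then stacks m stages. It commits to one concrete set of choices (a shared parameter matrix, inner-product correlation, softmax weights, element-wise fusion, residual connection plus offset vector); the other options described above can be substituted freely, and the dimensions and random parameters are assumptions.

```python
import numpy as np

def cross_stage(X0, Xl, W, bias=None):
    """One stage of feature cross processing: X_l -> X_{l+1} (formulas (5)-(11)).
    X0, Xl: (N, d) initial and current-level matrices; W: (d, d) parameter matrix."""
    Q = X0 @ W.T                        # ith transformation vectors W x_0^i (as rows)
    K = Xl @ W.T                        # current-level transformation vectors W x_l^j (as rows)
    X_next = np.empty_like(Xl)
    for i in range(X0.shape[0]):
        e = K @ Q[i]                                # correlations e_ij, formula (5)
        a = np.exp(e - e.max()); a /= a.sum()       # softmax weight factors a_ij, formula (7)
        A = Q[i] * K                                # fusion vectors A_ij (element-wise), formula (8)
        C = a @ A                                   # combination result, formula (9)
        X_next[i] = C + Xl[i]                       # add the ith feature vector itself, formula (10)
        if bias is not None:
            X_next[i] += bias                       # add the offset vector, formula (11)
    return X_next

# Usage: stack m stages to obtain the multi-stage processing matrices X_1 ... X_m.
rng = np.random.default_rng(0)
N, d, m = 5, 8, 3
X0 = rng.normal(size=(N, d))
Xl = X0
for _ in range(m):
    W = 0.1 * rng.normal(size=(d, d))               # per-stage parameter matrix
    Xl = cross_stage(X0, Xl, W, bias=np.zeros(d))
print(Xl.shape)                                     # (5, 8): same shape as X_0, but nonlinear in X_0
```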
Thus, through the process of FIG. 3, feature cross processing is performed on any feature vector $x_l^i$ in the current-level matrix $X_l$ to obtain its next-level feature vector $x_{l+1}^i$. Performing this feature cross processing on every feature vector in the current-level matrix yields the next-level processing matrix $X_{l+1}$.

It can be understood that FIG. 3 shows the feature cross processing of an arbitrary stage $l$, which is a sub-step of step 22 in FIG. 2. Returning to step 22 of FIG. 2, each stage of the multi-stage feature cross processing is performed as shown in FIG. 3, so that the multi-stage processing yields the corresponding processing matrices $X_1, X_2, \ldots, X_m$, where m is the number of stages of feature cross processing. Since each stage applies nonlinear fusion between vectors and attention-based weighted combination to each feature vector, as shown in formulas (9) to (11), sufficient cross combination operations are performed between the features, and the resulting high-order features do not depend linearly on the original feature matrix $X_0$.
Then, in step 23 of FIG. 2, a characterization vector corresponding to the first service object is obtained according to the last-stage processing matrix among the multi-stage processing matrices. In this step, the resulting matrix containing the high-order features is processed into vector form for subsequent prediction.
Specifically, in one embodiment, the characterization vector is obtained by pooling the last-stage processing matrix. The pooling may be, for example, max pooling, average pooling, or attention-based pooling. In another embodiment, the last-stage processing matrix may be turned into the characterization vector in other ways, such as by concatenating its vectors.
Then, in step 24, business prediction is performed for the first service object according to the above characterization vector. Specifically, a prediction function such as softmax may be applied to the characterization vector to obtain the business prediction result. In one embodiment, the business prediction result is a classification result, i.e., a predicted category of the first service object, such as a user category, a merchant category, or a binary result of whether a payment event is safe. In another embodiment, the business prediction result is a regression value, i.e., a predicted score for the first service object, such as a favorable-rating score of an item, a security score of a payment event, or a recommendation score of a recommendation event.
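Steps 23 and 24 can be sketched as below (the attention-pooling query vector, the output weights and the task names are assumptions; in the model they would be trained parameters).

```python
import numpy as np

def pool(Xm, mode="mean", query=None):
    """Collapse the last-stage processing matrix (N, d) into a characterization vector (d,)."""
    if mode == "max":
        return Xm.max(axis=0)
    if mode == "mean":
        return Xm.mean(axis=0)
    if mode == "attention":                 # attention pooling with a learned query vector
        s = Xm @ query
        w = np.exp(s - s.max()); w /= w.sum()
        return w @ Xm
    raise ValueError(f"unknown mode: {mode}")

def predict(h, W_out, b_out, task="classification"):
    """Apply a prediction head to the characterization vector h."""
    logits = W_out @ h + b_out
    if task == "classification":            # e.g. user category, or whether a payment event is safe
        z = np.exp(logits - logits.max())
        return z / z.sum()                  # class probabilities via softmax
    return float(logits[0])                 # regression score, e.g. a recommendation score

rng = np.random.default_rng(0)
Xm = rng.normal(size=(5, 8))                # last-stage processing matrix
h = pool(Xm, "attention", query=rng.normal(size=8))
print(predict(h, rng.normal(size=(2, 8)), np.zeros(2)))   # two class probabilities summing to 1
```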
Reviewing the above process: in the multi-stage feature cross processing, nonlinear fusion between feature vectors and attention-based weighted combination are adopted, so that sufficient cross combination operations are performed between the features and more expressive high-order features are obtained. Performing business prediction based on these high-order features further improves the prediction accuracy. Moreover, the attention-based combination provides a basis for interpreting the business prediction results.
According to an embodiment of another aspect, a neural network model for business prediction of business objects is provided. Fig. 5 illustrates a schematic structural diagram of a neural network model that may be deployed in any device, platform, or cluster of devices having data storage, computing, processing capabilities, according to one embodiment. As shown in fig. 5, the neural network model 500 for business prediction of business objects includes:
the input layer 51 is configured to obtain an initial feature matrix corresponding to a first service object, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object;
a multi-level cross processing layer 52, configured to perform multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; each level of cross processing layer is configured to, for any ith feature vector in the current-level matrix to be processed, respectively perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; and, according to the degrees of correlation between the ith transformation vector and each current-level transformation vector, perform weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determine the feature vector of the ith feature vector in the next-level processing matrix based on the combination result;
the pooling layer 53 is used for obtaining a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrix;
and the output layer 54 is configured to perform service prediction on the first service object according to the characterization vector.
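For a structural view of model 500, the class below mirrors its four parts as a NumPy forward pass only (no training loop; the class name, layer shapes and the choice of average pooling and softmax output are assumptions consistent with the embodiments above).

```python
import numpy as np

class CrossAttentionNet:
    """Sketch of the model in FIG. 5: the input layer 51 supplies X_0, the stacked
    cross processing layers 52 compute X_1..X_m, the pooling layer 53 averages the
    last matrix, and the output layer 54 produces class probabilities."""

    def __init__(self, d, n_stages, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.Ws = [0.1 * rng.normal(size=(d, d)) for _ in range(n_stages)]  # per-stage W_l
        self.bs = [np.zeros(d) for _ in range(n_stages)]                    # per-stage offset vectors
        self.W_out = 0.1 * rng.normal(size=(n_classes, d))
        self.b_out = np.zeros(n_classes)

    def forward(self, X0):
        Xl = X0
        for W, b in zip(self.Ws, self.bs):                  # multi-level cross processing layer
            Q, K = X0 @ W.T, Xl @ W.T
            X_next = np.empty_like(Xl)
            for i in range(X0.shape[0]):
                e = K @ Q[i]
                a = np.exp(e - e.max()); a /= a.sum()
                X_next[i] = a @ (Q[i] * K) + Xl[i] + b      # formulas (5)-(11)
            Xl = X_next
        h = Xl.mean(axis=0)                                 # pooling layer (average pooling)
        logits = self.W_out @ h + self.b_out                # output layer
        z = np.exp(logits - logits.max())
        return z / z.sum()                                  # predicted class probabilities

model = CrossAttentionNet(d=8, n_stages=3, n_classes=2)
probs = model.forward(np.random.default_rng(1).normal(size=(5, 8)))
print(probs, probs.sum())                                   # two probabilities summing to 1.0
```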
According to one embodiment, the first business object is one of the following entity objects: the user, the merchant, the commodity and the article to be recommended. Correspondingly, the N features include attribute features of the entity object.
According to another embodiment, the first business object is a business event, and the business event includes one of the following: payment events, purchase events, recommendation events, login events. Correspondingly, the N characteristics include respective attribute characteristics of each participant of the business event.
In one embodiment, the linear transformation performed in each level of the cross processing layers 52 specifically includes: performing linear transformation respectively on the ith original vector and on each current-level feature vector by using the current-level parameter matrix corresponding to the current level, to obtain the ith transformation vector and each current-level transformation vector.
In another embodiment, the linear transformation in each level of the interleaving layer specifically includes: performing linear transformation on the ith original vector by using a first parameter matrix to obtain an ith transformation vector; and performing linear transformation on each level of feature vector by using a second parameter matrix to obtain each level of transformation vector.
In one embodiment, the weighted combination performed in each level of the interleaving layer specifically includes:
determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors;
determining each weight factor corresponding to each current-level transformation vector according to each correlation degree;
respectively carrying out fusion operation on the ith transformation vector and each current-level transformation vector to obtain each fusion vector;
and performing weighted combination on each fusion vector according to each weight factor to obtain a combination result.
More specifically, in each example, the correlation is determined by: calculating cosine similarity of the ith transformation vector and each current-level transformation vector as the correlation degree; or, calculating the inner product result of the ith transformation vector and each current-stage transformation vector as the correlation; or calculating the vector distance between the ith transformation vector and each current-stage transformation vector, and determining the correlation according to the vector distance.
In one embodiment, the fusing operation includes one of: multiplying by bit, summing and averaging.
According to one embodiment, each level of the interleaving layer is specifically configured to: on the basis of the combination result, adding an offset vector and the ith feature vector as a feature vector in a next-stage processing matrix.
According to one embodiment, the pooling layer 53 may be implemented by several fully connected layers. In a specific example, the pooling layer 53 may pool the last-stage processing matrix to obtain the characterization vector, where the pooling includes one of: maximal pooling, average pooling, attention-based pooling.
Through the neural network model, the characteristics of the business object are subjected to more effective cross combination processing, and higher-order characteristics with more expressive power are obtained, so that the accuracy of business prediction is improved.
According to an embodiment of yet another aspect, an apparatus for business prediction of a business object is provided, which may be implemented as any device, platform or cluster of devices having data storage, computing, processing capabilities. Fig. 6 shows a schematic block diagram of a traffic prediction apparatus according to an embodiment. As shown in fig. 6, the prediction apparatus 600 includes:
an obtaining unit 61, configured to obtain an initial feature matrix corresponding to a first service object, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object;
a plurality of cross processing units 62, configured to perform multi-stage feature cross processing on the initial feature matrix to obtain multi-stage processing matrices; each cross processing unit is configured to, for any ith feature vector in the current-level matrix to be processed, respectively perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; and, according to the degrees of correlation between the ith transformation vector and each current-level transformation vector, perform weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determine the feature vector of the ith feature vector in the next-level processing matrix based on the combination result;
the pooling unit 63 is configured to obtain a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrices;
and the prediction unit 64 is configured to perform service prediction on the first service object according to the characterization vector.
Through the device, the characteristics of the business object are subjected to more effective cross combination processing, and high-order characteristics with more expressive power are obtained, so that the accuracy of business prediction is improved.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2 to 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor implementing the method described in conjunction with fig. 2-4 when executing the executable code.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (22)

1. A method of business prediction for a business object, comprising:
acquiring an initial feature matrix corresponding to a first service object, wherein the initial feature matrix comprises N original vectors obtained by coding feature values of N features of the first service object;
performing multi-stage feature cross processing on the initial feature matrix to obtain multi-stage processing matrices, wherein each stage of processing comprises: for any ith feature vector in the current-level matrix to be processed, performing linear transformation respectively on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; according to each weight factor determined based on each degree of correlation between the ith transformation vector and each current-level transformation vector, performing weighted combination based on an attention mechanism on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determining the feature vector of the ith feature vector in the next-level processing matrix based on the combination result, wherein the fusion results have the same dimension as the ith transformation vector;
obtaining a representation vector corresponding to the first business object according to the last-stage processing matrix in the multi-stage processing matrix;
and performing service prediction on the first service object according to the characterization vector.
2. The method of claim 1, wherein,
the first business object is one of the following entity objects: a user, a merchant, a commodity, an item to be recommended; the N features include attribute features of the entity object.
3. The method of claim 1, wherein,
the first business object is a business event, and the business event comprises one of the following: payment events, purchase events, recommendation events, login events; the N characteristics comprise respective attribute characteristics of each participant of the business event.
4. The method of claim 1, wherein the performing linear transformation on the ith original vector of the corresponding position in the initial feature matrix and each level feature vector in the level matrix respectively comprises:
respectively performing linear transformation on the ith original vector and each current-level feature vector by using the current-level parameter matrix corresponding to the current level, to obtain the ith transformation vector and each current-level transformation vector.
5. The method of claim 1, wherein the performing linear transformation on the ith original vector of the corresponding position in the initial feature matrix and each level feature vector in the level matrix respectively comprises:
performing linear transformation on the ith original vector by using a first parameter matrix to obtain an ith transformation vector; and performing linear transformation on each level of feature vector by using a second parameter matrix to obtain each level of transformation vector.
6. The method according to claim 1, wherein the weighted combination of the fusion results of the ith transform vector and the respective current-level transform vectors according to respective weight factors determined based on respective correlation degrees between the ith transform vector and the respective current-level transform vectors comprises:
determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors;
determining each weight factor corresponding to each current-level transformation vector according to each correlation degree;
respectively carrying out fusion operation on the ith transformation vector and each current-level transformation vector to obtain each fusion vector;
and performing weighted combination on each fusion vector according to each weight factor to obtain a combination result.
7. The method of claim 6, wherein determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors comprises:
calculating cosine similarity between the ith transformation vector and each current-level transformation vector as the correlation degree; or,
calculating the inner product of the ith transformation vector and each current-level transformation vector as the correlation degree; or,
and calculating the vector distance between the ith transformation vector and each current-stage transformation vector, and determining the correlation degree according to the vector distance.
8. The method of claim 6, wherein the fusion operation comprises one of:
multiplying by bit, summing and averaging.
9. The method of claim 1, wherein determining the eigenvector of the ith eigenvector in the next-level processing matrix based on the combined result comprises:
on the basis of the combination result, adding an offset vector and the ith feature vector as a feature vector in a next-stage processing matrix.
10. The method of claim 1, wherein obtaining the characterization vector corresponding to the first service object according to the last processing matrix in the multi-stage processing matrices comprises:
pooling the final-stage processing matrix to obtain the characterization vector, wherein the pooling comprises one of the following steps: maximal pooling, average pooling, attention-based pooling.
11. A neural network model for business prediction for a business object, comprising:
an input layer, configured to acquire an initial feature matrix corresponding to a first service object, wherein the initial feature matrix comprises N original vectors obtained by encoding feature values of N features of the first service object;
the multi-level cross processing layer is configured to perform multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; each level of cross processing layer is configured to, for any ith feature vector in the current-level matrix to be processed, respectively perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; and, according to each weight factor determined based on each degree of correlation between the ith transformation vector and each current-level transformation vector, perform weighted combination based on an attention mechanism on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determine the feature vector of the ith feature vector in the next-level processing matrix based on the combination result, wherein the fusion results have the same dimension as the ith transformation vector;
the pooling layer is used for obtaining a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrices;
and the output layer is used for carrying out service prediction on the first service object according to the characterization vector.
12. The neural network model of claim 11,
the first business object is one of the following entity objects: a user, a merchant, a commodity, an item to be recommended; the N features include attribute features of the entity object.
13. The neural network model of claim 11,
the first business object is a business event, and the business event comprises one of the following: payment events, purchase events, recommendation events, login events; the N characteristics comprise respective attribute characteristics of each participant of the business event.
14. The neural network model of claim 11, wherein the linear transformation performed in each level of the cross-processing layer specifically comprises:
respectively performing linear transformation on the ith original vector and each current-level feature vector by using the current-level parameter matrix corresponding to the current level, to obtain the ith transformation vector and each current-level transformation vector.
15. The neural network model of claim 11, wherein the linear transformation performed in each level of the cross-processing layer specifically comprises:
performing linear transformation on the ith original vector by using a first parameter matrix to obtain an ith transformation vector; and performing linear transformation on each level of feature vector by using a second parameter matrix to obtain each level of transformation vector.
16. The neural network model of claim 11, wherein the weighted combination performed in each level of the cross-processing layer specifically comprises:
determining respective degrees of correlation between the ith transformation vector and the respective current-level transformation vectors;
determining the weight factors corresponding to the respective current-level transformation vectors according to the degrees of correlation;
performing a fusion operation on the ith transformation vector with each current-level transformation vector respectively, to obtain respective fusion vectors;
and performing weighted combination of the fusion vectors according to the weight factors to obtain the combination result.
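The four enumerated steps can be written as one routine parameterized by the choices that claims 17 and 18 leave open; turning the correlation degrees into weight factors by softmax is an assumption, since the claim only requires that the weights be determined from the correlations:

```python
import numpy as np

def weighted_combination(q_i, K, correlation, fuse):
    """The four steps for one target vector (illustrative sketch).

    q_i : (d,) ith transformation vector
    K   : (N, d) current-level transformation vectors
    correlation, fuse : callables chosen according to claims 17 and 18
    """
    scores = np.array([correlation(q_i, k_j) for k_j in K])  # step 1: correlation degrees
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                  # step 2: weight factors (softmax assumed)
    fused = np.stack([fuse(q_i, k_j) for k_j in K])           # step 3: fusion vectors, same dimension as q_i
    return weights @ fused                                    # step 4: weighted combination result
```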
17. The neural network model of claim 16, wherein determining the respective degrees of correlation specifically comprises:
calculating the cosine similarity between the ith transformation vector and each current-level transformation vector as the degree of correlation; or
calculating the inner product of the ith transformation vector and each current-level transformation vector as the degree of correlation; or
calculating the vector distance between the ith transformation vector and each current-level transformation vector, and determining the degree of correlation according to the vector distance.
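Possible concrete forms of the three claimed correlation measures, written so they plug into the sketch above; the mapping from vector distance to a correlation degree is assumed, as the claim only states that the degree is determined from the distance:

```python
import numpy as np

def cosine_correlation(u, v):
    # cosine similarity, with a small constant to avoid division by zero
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def inner_product_correlation(u, v):
    return float(u @ v)

def distance_correlation(u, v):
    # a decreasing function of the Euclidean distance (exact mapping not specified in the claim)
    return float(-np.linalg.norm(u - v))
```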
18. The neural network model of claim 16, wherein the fusion operation comprises one of:
element-wise multiplication, summation, and averaging.
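The three claimed fusion operations each take two vectors of the same dimension and return a vector of that dimension; "element-wise multiplication" is the Hadamard product:

```python
def fuse_elementwise_product(u, v):
    return u * v            # Hadamard (element-wise) product

def fuse_sum(u, v):
    return u + v            # element-wise sum

def fuse_average(u, v):
    return (u + v) / 2.0    # element-wise average
```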
19. The neural network model of claim 11, wherein each level of the cross-processing layer is specifically configured to:
add an offset vector and the ith feature vector to the combination result, and use the resulting vector as the corresponding feature vector in the next-level processing matrix.
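A one-line sketch of claim 19, treating the offset vector as a learned per-level parameter (an assumption not stated in the claim):

```python
def next_level_vector(combination, offset, x_i):
    # combination result + offset vector + residual connection to the ith feature vector
    return combination + offset + x_i
```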
20. The neural network model of claim 11, wherein the pooling layer is specifically configured to:
perform pooling on the last-level processing matrix to obtain the characterization vector, wherein the pooling comprises one of: max pooling, average pooling, and attention-based pooling.
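Sketches of the three claimed pooling options over the last-level processing matrix M of shape (N, d); the attention-based variant is shown with a learned query vector q, which is one common but assumed parameterization:

```python
import numpy as np

def max_pool(M):
    return M.max(axis=0)            # column-wise maximum over the N feature vectors

def average_pool(M):
    return M.mean(axis=0)           # column-wise average

def attention_pool(M, q):
    scores = M @ q                  # relevance of each row to a learned query vector q (assumed form)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ M                    # attention-weighted sum of the rows
```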
21. An apparatus for business prediction for a business object, comprising:
an obtaining unit, configured to obtain an initial feature matrix corresponding to a first business object, wherein the initial feature matrix comprises N original vectors obtained by encoding feature values of N features of the first business object;
a plurality of cross-processing units, configured to perform multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; wherein each cross-processing unit is configured to: for any ith feature vector in the current-level matrix to be processed, perform linear transformation respectively on the ith original vector at the corresponding position in the initial feature matrix and on each feature vector of the current-level matrix, to obtain an ith transformation vector and respective current-level transformation vectors; and, according to weight factors determined based on the degrees of correlation between the ith transformation vector and the respective current-level transformation vectors, perform an attention-based weighted combination of the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector, and determine, based on the combination result, the feature vector corresponding to the ith feature vector in the next-level processing matrix, wherein each fusion result has the same dimension as the ith transformation vector;
a pooling unit, configured to obtain a characterization vector corresponding to the first business object according to the last-level processing matrix among the multi-level processing matrices;
and a prediction unit, configured to perform business prediction on the first business object according to the characterization vector.
22. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-10.
CN202010329614.XA 2020-04-24 2020-04-24 Method, neural network model and device for business prediction for business object Active CN111222722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010329614.XA CN111222722B (en) 2020-04-24 2020-04-24 Method, neural network model and device for business prediction for business object

Publications (2)

Publication Number Publication Date
CN111222722A CN111222722A (en) 2020-06-02
CN111222722B CN111222722B (en) 2020-07-24

Family

ID=70831712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010329614.XA Active CN111222722B (en) 2020-04-24 2020-04-24 Method, neural network model and device for business prediction for business object

Country Status (1)

Country Link
CN (1) CN111222722B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255908B (en) * 2021-05-27 2023-04-07 支付宝(杭州)信息技术有限公司 Method, neural network model and device for service prediction based on event sequence

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11348018B2 (en) * 2017-12-19 2022-05-31 Aspen Technology, Inc. Computer system and method for building and deploying models predicting plant asset failure
CN110751261A (en) * 2018-07-23 2020-02-04 第四范式(北京)技术有限公司 Training method and system and prediction method and system of neural network model
CN110046304B (en) * 2019-04-18 2022-12-13 腾讯科技(深圳)有限公司 User recommendation method and device
CN110263973B (en) * 2019-05-15 2024-02-02 创新先进技术有限公司 Method and device for predicting user behavior
CN110929206B (en) * 2019-11-20 2023-04-07 腾讯科技(深圳)有限公司 Click rate estimation method and device, computer readable storage medium and equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751285A (en) * 2018-07-23 2020-02-04 第四范式(北京)技术有限公司 Training method and system and prediction method and system of neural network model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks; Weiping Song et al.; CIKM '19; 2019-08-23; full text *
Deep Spatio-Temporal Neural Networks for Click-Through Rate Prediction; Wentao Ouyang et al.; KDD '19; 2019-07-19; full text *

Also Published As

Publication number Publication date
CN111222722A (en) 2020-06-02

Similar Documents

Publication Publication Date Title
CN109785062B (en) Hybrid neural network recommendation system based on collaborative filtering model
CN112800342B (en) Recommendation method, system, computer device and storage medium based on heterogeneous information
CN110598118A (en) Resource object recommendation method and device and computer readable medium
CN111008335B (en) Information processing method, device, equipment and storage medium
WO2022152161A1 (en) Training and prediction of hybrid graph neural network model
CN111737578A (en) Recommendation method and system
CN113255908B (en) Method, neural network model and device for service prediction based on event sequence
CN112633927B (en) Combined commodity mining method based on knowledge graph rule embedding
CN111177577B (en) Group project recommendation method, intelligent terminal and storage device
CN115859199A (en) Medical insurance fraud detection method and embedded vector generation method, device and medium thereof
CN111222722B (en) Method, neural network model and device for business prediction for business object
CN115482141A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110502701B (en) Friend recommendation method, system and storage medium introducing attention mechanism
CN113779380A (en) Cross-domain recommendation method, device and equipment, and content recommendation method, device and equipment
JP7414357B2 (en) Text processing methods, apparatus, devices and computer readable storage media
CN114491086A (en) Clothing personalized matching recommendation method and system, electronic equipment and storage medium
CN114996566A (en) Intelligent recommendation system and method for industrial internet platform
CN112734519B (en) Commodity recommendation method based on convolution self-encoder network
CN114817758A (en) Recommendation system method based on NSGC-GRU integrated model
CN112132345A (en) Method and device for predicting user information of electric power company, electronic equipment and storage medium
CN113850616A (en) Customer life cycle value prediction method based on depth map neural network
CN112559640A (en) Training method and device of atlas characterization system
CN117859139A (en) Multi-graph convolution collaborative filtering
CN111445282B (en) Service processing method, device and equipment based on user behaviors
CN117252665B (en) Service recommendation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant