CN111222722B - Method, neural network model and device for business prediction for business object - Google Patents

Method, neural network model and device for business prediction for business object

Info

Publication number
CN111222722B
CN111222722B (application CN202010329614.XA)
Authority
CN
China
Prior art keywords
vector
ith
level
matrix
transformation
Prior art date
Legal status
Active
Application number
CN202010329614.XA
Other languages
Chinese (zh)
Other versions
CN111222722A (en)
Inventor
辛超
崔卿
向彪
周俊
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010329614.XA
Publication of CN111222722A
Application granted
Publication of CN111222722B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The embodiments of this specification provide a method and a neural network model for business prediction for a business object. In the method, an initial feature matrix corresponding to the business object is first obtained, the initial feature matrix comprising N original vectors corresponding to N features of the business object. Multi-stage processing is then performed on the initial feature matrix. In each stage, for the ith feature vector to be processed at that stage, the corresponding ith original vector in the initial feature matrix and each current-stage feature vector are respectively linearly transformed to obtain the ith transformation vector and each current-stage transformation vector; the fusion results of the ith transformation vector with each current-stage transformation vector are then weighted and combined according to the degrees of correlation between them, thereby determining the next-stage feature vector of the ith feature vector. A characterization vector corresponding to the business object is obtained from the matrix produced by the last stage of processing, and business prediction is performed for the business object according to the characterization vector.

Description

Method, neural network model and device for business prediction for business object
Technical Field
One or more embodiments of the present specification relate to the field of machine learning, and more particularly, to a method and apparatus for business prediction for business objects.
Background
With the development of computer technology, machine learning has been applied in many technical fields to analyze and predict various kinds of business data. For example, a user's category can be predicted from the user's attribute features, so that customized, personalized services can be provided for that user; the degree of recommendation between a user and an item can be predicted from the combined information of the user and the item, so that suitable items can be recommended to the user; and, as another example, traffic peaks can be anticipated from predictions of when users visit a website, so that the network environment can be provisioned in advance.
In prediction scenarios for various business objects, rich feature data of different dimensions are usually introduced in order to improve the prediction accuracy of the model as much as possible. Features of different dimensions describe different information about the business scenario from different angles. In most cases, the fitting target of the model is not in a simple linear relationship with the individual basic features, so a model trained only on the basic features can express no more than a linear combination of the feature information, and its expressive power is limited. It is therefore desirable to combine features effectively in order to improve the expressive power of the model. Traditional feature combination is designed manually by engineers according to business experience; it is costly, extends poorly to new business scenarios, and is limited by the engineers' understanding of the business.
An improved scheme is therefore desired that combines business features more effectively, avoids the modeling limitations on high-order features, and improves the expressive power of the model and thus its prediction accuracy.
Disclosure of Invention
One or more embodiments of the present disclosure describe a method and a neural network model for service prediction of a service object, which can perform more effective high-order combination on features of the service object, and improve prediction accuracy.
According to a first aspect, there is provided a method for business prediction for a business object, comprising:
acquiring an initial feature matrix corresponding to a first service object, wherein the initial feature matrix comprises N original vectors obtained by coding feature values of N features of the first service object;
performing multi-stage feature cross processing on the initial feature matrix to obtain multi-stage processing matrices, wherein each stage of processing comprises: for any ith feature vector in the current-level matrix to be processed, performing linear transformation respectively on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; according to the degrees of correlation between the ith transformation vector and each current-level transformation vector, performing weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determining the feature vector of the ith feature vector in the next-level processing matrix based on the combination result;
obtaining a representation vector corresponding to the first business object according to the last-stage processing matrix in the multi-stage processing matrix;
and performing service prediction on the first service object according to the characterization vector.
According to one embodiment, the first business object may be one of the following entity objects: a user, a merchant, a commodity, an item to be recommended; correspondingly, the N features include attribute features of the entity object.
According to another embodiment, the first business object may be a business event, the business event comprising one of: payment events, purchase events, recommendation events, login events; correspondingly, the N characteristics include respective attribute characteristics of each participant of the business event.
In one embodiment, the performing linear transformation respectively on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix specifically includes: performing linear transformation respectively on the ith original vector and on each current-level feature vector by using the current-level parameter matrix corresponding to the current level, to obtain the ith transformation vector and each current-level transformation vector.
In another embodiment, the performing linear transformation respectively on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix specifically includes: performing linear transformation on the ith original vector by using a first parameter matrix to obtain the ith transformation vector; and performing linear transformation on each current-level feature vector by using a second parameter matrix to obtain each current-level transformation vector.
According to one embodiment, the weighted combination is performed by: determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors; determining each weight factor corresponding to each current-level transformation vector according to each correlation degree; respectively carrying out fusion operation on the ith transformation vector and each current-level transformation vector to obtain each fusion vector; and performing weighted combination on each fusion vector according to each weight factor to obtain a combination result.
Further, the correlation may be determined by: calculating cosine similarity of the ith transformation vector and each current-level transformation vector as the correlation degree; or, calculating the inner product result of the ith transformation vector and each current-stage transformation vector as the correlation; or calculating the vector distance between the ith transformation vector and each current-stage transformation vector, and determining the correlation according to the vector distance.
In one embodiment, the fusing operation comprises one of: multiplying by bit, summing and averaging.
According to an embodiment, the determining the feature vector of the ith feature vector in the next-stage processing matrix based on the combination result specifically includes: on the basis of the combination result, adding an offset vector and the ith feature vector as a feature vector in a next-stage processing matrix.
In one embodiment, the characterization vector is obtained by: pooling the final-stage processing matrix to obtain the characterization vector, wherein the pooling comprises one of the following steps: maximal pooling, average pooling, attention-based pooling.
According to a second aspect, there is provided a neural network model for business prediction for a business object, comprising:
an input layer, configured to acquire an initial feature matrix corresponding to a first service object, wherein the initial feature matrix comprises N original vectors obtained by encoding feature values of N features of the first service object;
the multi-level cross processing layer is configured to perform multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; each level of cross processing layer is configured to, for any ith feature vector in the current-level matrix to be processed, respectively perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; and, according to the degrees of correlation between the ith transformation vector and each current-level transformation vector, perform weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determine the feature vector of the ith feature vector in the next-level processing matrix based on the combination result;
the pooling layer is used for obtaining a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrices;
and the output layer is used for carrying out service prediction on the first service object according to the characterization vector.
According to a third aspect, there is provided an apparatus for performing traffic prediction for a traffic object, comprising:
an obtaining unit, configured to obtain an initial feature matrix corresponding to a first service object, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object;
the multiple cross processing units are configured to perform multi-stage feature cross processing on the initial feature matrix to obtain multi-stage processing matrices; each cross processing unit is configured to, for any ith feature vector in the current-level matrix to be processed, respectively perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; and, according to the degrees of correlation between the ith transformation vector and each current-level transformation vector, perform weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determine the feature vector of the ith feature vector in the next-level processing matrix based on the combination result;
the pooling unit is configured to obtain a representation vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrix;
and the prediction unit is configured to perform service prediction on the first service object according to the characterization vector.
According to a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fifth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and the processor, when executing the executable code, implements the method of the first aspect.
According to the method, the apparatus and the neural network model provided by the embodiments of this specification, nonlinear fusion between feature vectors and attention-based weighted combination are adopted in the multi-stage feature cross processing, so that sufficient cross combination operations are performed between the features and more expressive high-order features are obtained. Performing business prediction based on these high-order features further improves the prediction accuracy. Moreover, the attention-based combination provides a basis for interpreting the business prediction results.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 illustrates a method of business prediction for a business object, according to one embodiment;
FIG. 3 illustrates the feature cross processing of stage l, according to one embodiment;
FIG. 4 illustrates the process steps of fusing transform vectors and performing weighted combination of the fusion results in one embodiment;
FIG. 5 illustrates a schematic structural diagram of a neural network model according to one embodiment;
fig. 6 shows a schematic block diagram of a traffic prediction apparatus according to an embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As previously mentioned, in order to improve the feature expression capability of a model for business objects, it is desirable to combine features more effectively. In one approach, a multi-layer neural network is used to perform high-order combination of features in the hidden layers of the network. For example, high-order cross combination between features can be done through a Deep & Cross Network (DCN).
Specifically, the original input to the deep cross network consists of N feature items, each represented by a d-dimensional feature vector $x_0^i$, i = 1, 2, ..., N. The DCN first concatenates the N feature vectors to obtain an original input vector $X_0$:

$$X_0 = \left[ x_0^1; x_0^2; \ldots; x_0^N \right] \qquad (1)$$

That is, the original input vector $X_0$ is the concatenation of the feature vectors of all feature items.
Then, at each feature cross processing layer, feature cross combination is performed, where the combination at layer $l$ satisfies the following formula:

$$X_{l+1} = X_0 X_l^{T} W_l + b_l + X_l \qquad (2)$$

where $X_l$ is the input of layer $l$, $X_{l+1}$ is the output of layer $l$, and $W_l$, $b_l$ are the network parameters of layer $l$.

According to formula (2), the output $X_{l+1}$ of layer $l$ of the DCN contains all possible combinations of the original features from first order up to order $l+2$. By stacking feature crossing layers, the DCN can therefore achieve feature combinations of arbitrary finite order.
However, if the bias term $b_l$ in formula (2) is ignored, careful analysis reveals the following pattern in the layer-to-layer transformation:

$$X_1 = X_0 X_0^{T} W_0 + X_0 = \alpha_1 X_0 \qquad (3)$$

where the coefficient $\alpha_1 = X_0^{T} W_0 + 1$ is a scalar. Similarly,

$$X_{l+1} = X_0 X_l^{T} W_l + X_l = \alpha_{l+1} X_0 \qquad (4)$$

where the coefficient $\alpha_{l+1} = \alpha_l \left( X_0^{T} W_l + 1 \right)$.

It can be seen that the DCN's modeling of high-order feature combinations degrades to a scaling of the original features $X_0$; although the scaling coefficient is related to the input features, the expressive power is relatively limited.
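To make the degradation of formulas (3) and (4) concrete, the NumPy sketch below (an illustration only; the vector length and random parameters are assumptions, not values from the patent) applies the cross layer of formula (2) without the bias term three times and checks that every output remains a scalar multiple of the original input $X_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8                         # length of the concatenated input vector (assumed)
x0 = rng.normal(size=dim)       # original input vector X_0

def dcn_cross_layer(x0, xl, w):
    """DCN cross layer of formula (2) with the bias term dropped:
    X_{l+1} = X_0 (X_l . w_l) + X_l, since X_0 X_l^T w_l collapses to a scalar times X_0."""
    return x0 * (xl @ w) + xl

xl = x0
for l in range(3):
    w = rng.normal(size=dim)                # layer-l weight vector W_l
    xl = dcn_cross_layer(x0, xl, w)
    alpha = (xl @ x0) / (x0 @ x0)           # best-fit scalar coefficient alpha_{l+1}
    print(l, np.allclose(xl, alpha * x0))   # True: every output stays collinear with X_0
```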
In order to further improve the feature expression capability of the neural network model for business prediction and improve the prediction accuracy of the neural network model, according to the embodiment of the invention, a further feature cross combination mode is provided to avoid the linear degradation of high-order feature combination.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. As shown in the figure, the neural network model of this embodiment first arranges the original features of the business object into a feature matrix $X_0$, rather than into a single feature vector. When each cross processing layer performs its level of feature cross processing, each feature vector in the current matrix is fused with the original feature matrix, and the fusion results are weighted and combined based on an attention mechanism. Both the fusion and the attention-based combination are nonlinear operations, so the feature matrix output by the last cross processing layer contains various high-order combinations of the feature vectors and does not degrade to a linear scaling of the original vectors. The feature expression capability of the neural network model is thereby enhanced, and its prediction accuracy for the business object is improved.
The whole process of business prediction for business objects under the above concept is described in detail below.
FIG. 2 illustrates a method of business prediction for a business object, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. In one embodiment, the method may be performed by a neural network model that may be deployed in any device, apparatus, platform, cluster of devices having computing, processing capabilities. As shown in fig. 2, the method for traffic prediction at least comprises the following steps.
First, in step 21, an initial feature matrix corresponding to a first service object to be predicted is obtained, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object.
In one embodiment, the first business object corresponds to a single entity object, and the N features include attribute features of the entity object.
For example, in one example, the entity object is a user. At this time, the above-mentioned N characteristics may be attribute characteristics of the user, such as basic attribute characteristics of age, sex, registration time, education level, and the like, and behavior attribute characteristics such as recent browsing history, recent shopping history, and the like.
In another example, the entity object may be a merchant. At this time, the above-mentioned N characteristics may be attribute characteristics of the merchant, such as merchant category, registration time, commodity quantity, sales volume, number of people concerned, and the like.
In other examples, the entity object may also be a commodity, or an item to be recommended (e.g., an article to be pushed, music, a movie, etc.). Correspondingly, the N characteristics include attribute characteristics of the corresponding goods or articles.
In another embodiment, the first business object to be predicted is a business event, and the business event may be, for example, a payment event, a purchase event, a recommendation event, a login event, and the like. Correspondingly, the N characteristics include respective attribute characteristics of each participant of the business event.
For example, in one example, the first business object is a recommended event involving a first user and a first item. Accordingly, the N characteristics may include a user attribute characteristic of the first user and an item attribute characteristic of the first item.
For example, in yet another example, the first business object is a payment event involving two users, a first user and a second user. Accordingly, the N features may include respective user attribute features of the first user and the second user. Examples of N features in the case of other business events are not enumerated one by one.
For the N features of the various business objects exemplified above, the feature values may each be encoded as a d-dimensional vector, forming N d-dimensional vectors. The encoding of the feature values may take many forms. For example, the feature values of some feature items may be one-hot encoded; in another example, the feature values may be mapped to d-dimensional vectors using a look-up table. In yet another example, a predetermined word embedding tool (e.g., word2vec) may be used to convert textual feature values into d-dimensional vectors.
Thus, the obtained N features correspond to N d-dimensional vectors, which form an N x d matrix, called the initial feature matrix. The initial feature matrix corresponds to $X_0$ in FIG. 1.
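As an illustration of how such an initial feature matrix can be assembled, the sketch below (the feature names, vocabulary sizes and random look-up tables are hypothetical; in a real model the tables would be learned parameters) encodes N categorical feature values with embedding look-up tables and stacks them into an N x d matrix $X_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                    # embedding dimension (assumed)
# Hypothetical categorical features of a user object and their vocabulary sizes.
vocab_sizes = {"age_bucket": 8, "gender": 3, "city": 100, "education": 6}
# One look-up table per feature; each row is the d-dimensional vector for one feature value.
tables = {name: rng.normal(size=(size, d)) for name, size in vocab_sizes.items()}

def initial_feature_matrix(feature_values):
    """Map each feature value to its d-dimensional vector and stack them into an N x d matrix."""
    rows = [tables[name][idx] for name, idx in feature_values.items()]
    return np.stack(rows)                # initial feature matrix X_0, shape (N, d)

X0 = initial_feature_matrix({"age_bucket": 2, "gender": 1, "city": 42, "education": 3})
print(X0.shape)                          # (4, 4): N = 4 features, each encoded into d = 4 dimensions
```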
Next, in step 22, multi-stage feature cross processing is performed on the initial feature matrix to obtain multi-stage processing matrices. The process of any one stage of the feature cross processing, denoted stage $l$, is described below.
FIG. 3 illustrates the feature cross processing of stage $l$, i.e. a sub-process of step 22 in FIG. 2. It can be understood that the stage-$l$ feature cross processing takes the processing matrix $X_l$ output by the previous stage as the current-level matrix to be processed, performs feature cross processing on it, and outputs the next-level processing matrix $X_{l+1}$. For simplicity of description, the cross processing is described below for the ith feature vector $x_l^i$ in the current-level matrix $X_l$ (corresponding to the ith feature in the original feature matrix).

As shown in FIG. 3, the cross processing for the ith feature vector $x_l^i$ comprises the following steps. In step 31, linear transformations are applied respectively to the ith original vector $x_0^i$ at the corresponding position in the initial feature matrix and to each current-level feature vector in the current-level matrix (which may be denoted $x_l^j$), to obtain the ith transformation vector and each current-level transformation vector. Then, in step 32, the fusion results of the ith transformation vector with each current-level transformation vector are weighted and combined according to the degrees of correlation between the ith transformation vector and each current-level transformation vector. In step 33, based on the combination result of the weighted combination, the feature vector $x_{l+1}^i$ of the ith feature vector $x_l^i$ in the next-level processing matrix $X_{l+1}$ is determined. The manner in which these steps are performed is described in detail below.
First, in step 31, linear transformations are applied respectively to the ith original vector $x_0^i$ and to each current-level feature vector $x_l^j$, giving the ith transformation vector and each current-level transformation vector. The linear transformation may be implemented with a parameter matrix: a parameter matrix W is applied to each current-level feature vector and to the ith original vector, thereby obtaining each current-level transformation vector and the ith transformation vector.

In one embodiment, the parameter matrix W is a single parameter matrix shared across all stages of the multi-stage feature cross processing.

In another embodiment, the parameter matrix W differs from stage to stage. For the current stage $l$, the corresponding stage-$l$ parameter matrix $W_l$ is applied to each current-level feature vector $x_l^j$ and to the ith original vector $x_0^i$, giving each current-level transformation vector $W_l x_l^j$ and the ith transformation vector $W_l x_0^i$.

In yet another embodiment, different parameter matrices may be used for the ith original vector $x_0^i$ and for each current-level feature vector $x_l^j$. For example, a first parameter matrix $W_a$ may be used to linearly transform the ith original vector $x_0^i$, giving the ith transformation vector $W_a x_0^i$; and a second parameter matrix $W_b$ may be used to linearly transform each current-level feature vector $x_l^j$, giving each current-level transformation vector $W_b x_l^j$. The first parameter matrix $W_a$ and the second parameter matrix $W_b$ may be the same or different across stages.

It should be understood that the values of the elements in the above parameter matrices are determined by training the neural network model.

For convenience of description, each obtained current-level transformation vector is denoted $W x_l^j$ and the ith transformation vector is denoted $W x_0^i$ below, where the parameter matrix W covers the cases of the above embodiments.
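The sketch below illustrates the three parameterizations of step 31 (random matrices stand in for trained parameters; the names W, W_l, W_a and W_b follow the notation above and are otherwise assumptions). In every variant, q_i is the ith transformation vector and each row of K is one current-level transformation vector.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 4, 6
X0 = rng.normal(size=(N, d))       # initial feature matrix
Xl = rng.normal(size=(N, d))       # current-level matrix at stage l
i = 1                              # index of the feature vector being processed

# Variant 1: a single parameter matrix W shared by all stages.
W = rng.normal(size=(d, d))
q_i = W @ X0[i]                    # ith transformation vector  W x_0^i
K = Xl @ W.T                       # row j is the current-level transformation vector W x_l^j

# Variant 2: a per-stage parameter matrix W_l.
W_l = rng.normal(size=(d, d))
q_i = W_l @ X0[i]
K = Xl @ W_l.T

# Variant 3: separate matrices for the original vector and the current-level vectors.
W_a = rng.normal(size=(d, d))      # first parameter matrix, applied to the ith original vector
W_b = rng.normal(size=(d, d))      # second parameter matrix, applied to each current-level vector
q_i = W_a @ X0[i]
K = Xl @ W_b.T
print(q_i.shape, K.shape)          # (6,) (4, 6)
```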
Next, in step 32, according to the respective degrees of correlation between the ith transformation vector $W x_0^i$ and each current-level transformation vector $W x_l^j$, the fusion results obtained by fusing the ith transformation vector $W x_0^i$ with each current-level transformation vector $W x_l^j$ are weighted and combined.
Fig. 4 shows the process steps of fusing the transform vectors and performing a weighted combination of the fusion results in one embodiment, i.e., the sub-steps of step 32 above.
As shown in FIG. 4, in step 321, the respective degrees of correlation between the ith transformation vector $W x_0^i$ and each current-level transformation vector $W x_l^j$ are determined. Specifically, a correlation function f may be introduced to compute the degree of correlation $e_{ij}$ between the ith transformation vector $W x_0^i$ and the jth current-level transformation vector $W x_l^j$:

$$e_{ij} = f\left( W x_0^i,\ W x_l^j \right) \qquad (5)$$

The correlation function f may adopt various correlation measures. In one example, f computes the cosine similarity between the ith transformation vector and the jth current-level transformation vector. In another example, f computes their inner product (vector dot product) as the correlation. In yet another example, f computes the vector distance, such as the Euclidean distance, between the ith transformation vector and the jth current-level transformation vector, and determines the correlation from the distance so that the correlation is inversely related to the distance. The correlation function may also take other forms.
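A minimal sketch of the three options for the correlation function f in formula (5) follows; the mapping from distance to correlation (1/(1+distance)) and the small epsilon are assumed choices, since the text only requires that the correlation be inversely related to the distance.

```python
import numpy as np

def correlation(q, k, mode="dot"):
    """Degree of correlation e_ij between the ith transformation vector q
    and the jth current-level transformation vector k (step 321)."""
    if mode == "cosine":            # cosine similarity
        return float(q @ k) / (np.linalg.norm(q) * np.linalg.norm(k) + 1e-12)
    if mode == "dot":               # inner product (vector dot product)
        return float(q @ k)
    if mode == "distance":          # inversely related to the Euclidean distance
        return 1.0 / (1.0 + float(np.linalg.norm(q - k)))
    raise ValueError(f"unknown mode: {mode}")

q = np.array([1.0, 0.0, 2.0])
k = np.array([0.5, 1.0, 1.5])
print(correlation(q, k, "cosine"), correlation(q, k, "dot"), correlation(q, k, "distance"))
```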
Then, in step 322, the weight factors corresponding to the respective current-level transformation vectors are determined from the respective degrees of correlation.

In one embodiment, the correlation determined above between each current-level transformation vector and the ith transformation vector is used directly as the corresponding weight factor.

In another embodiment, the correlations corresponding to the respective current-level transformation vectors are normalized, and the normalized values are used as the weight factors.

More specifically, in one example, the weight factor $a_{ij}$ corresponding to the jth current-level transformation vector is obtained by proportional normalization:

$$a_{ij} = \frac{e_{ij}}{\sum_{k} e_{ik}} \qquad (6)$$

In another example, the weight factor $a_{ij}$ corresponding to the jth current-level transformation vector is obtained by normalization with a softmax function:

$$a_{ij} = \frac{\exp(e_{ij})}{\sum_{k} \exp(e_{ik})} \qquad (7)$$
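The sketch below implements step 322 for one ith feature vector (a minimal illustration; the mode names are assumptions). Note that the proportional normalization of formula (6) presumes non-negative correlations, whereas the softmax of formula (7) handles arbitrary values.

```python
import numpy as np

def weight_factors(e, mode="softmax"):
    """Turn the correlations e_i1..e_iN into weight factors a_i1..a_iN (step 322)."""
    e = np.asarray(e, dtype=float)
    if mode == "identity":              # use the correlations directly as weights
        return e
    if mode == "proportional":          # formula (6); assumes non-negative correlations
        return e / e.sum()
    if mode == "softmax":               # formula (7)
        z = np.exp(e - e.max())         # subtract the max for numerical stability
        return z / z.sum()
    raise ValueError(f"unknown mode: {mode}")

print(weight_factors([1.0, 2.0, 0.5], "proportional"))   # weights proportional to the correlations
print(weight_factors([1.0, 2.0, 0.5], "softmax"))        # softmax weights summing to 1
```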
In addition, in step 323, the ith transformation vector $W x_0^i$ is fused with each current-level transformation vector $W x_l^j$ respectively, giving the fusion vectors $A_{ij}$:

$$A_{ij} = W x_0^i \circ W x_l^j \qquad (8)$$

where $\circ$ denotes the fusion operation.

In one example, the fusion operation is a bit-wise (element-wise) multiplication of the two vectors. In other examples, other fusion methods may be used, such as summation or averaging. It should be understood that the fusion operation must leave the dimension of the fused vector unchanged.
Next, in step 324, the fusion vectors $A_{ij}$ obtained in step 323 are weighted and combined according to the weight factors $a_{ij}$ obtained in step 322, giving the combination result C:

$$C = \sum_{j} a_{ij} A_{ij} \qquad (9)$$
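Steps 323 and 324 together can be sketched as follows (element-wise multiplication is used as the fusion operation; sum or average would work equally well, and all shapes are assumptions). Each row of K is one current-level transformation vector, q_i is the ith transformation vector, and weights holds the factors a_ij from step 322.

```python
import numpy as np

def fuse(q, k, mode="elementwise"):
    """Fusion of two d-dimensional vectors (step 323); the result keeps dimension d."""
    if mode == "elementwise":       # bit-wise (element-wise) multiplication
        return q * k
    if mode == "sum":
        return q + k
    if mode == "average":
        return (q + k) / 2.0
    raise ValueError(f"unknown mode: {mode}")

def attention_combine(q_i, K, weights):
    """Weighted combination C = sum_j a_ij * A_ij of formula (9)."""
    A = np.stack([fuse(q_i, k) for k in K])     # fusion vectors A_ij, shape (N, d)
    return weights @ A                          # combination result C, shape (d,)

rng = np.random.default_rng(0)
q_i, K = rng.normal(size=6), rng.normal(size=(4, 6))
weights = np.full(4, 0.25)                      # e.g. uniform weight factors
print(attention_combine(q_i, K, weights).shape) # (6,)
```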
Through steps 321 to 324 above, step 32 in FIG. 3 is performed. Returning to FIG. 3, after step 32, in step 33 the feature vector $x_{l+1}^i$ of the ith feature vector in the next-level processing matrix $X_{l+1}$ is determined based on the combination result C of the weighted combination.

In one embodiment, the combination result C is used directly as the next-level feature vector $x_{l+1}^i$ of the ith feature vector $x_l^i$.

In another embodiment, the ith feature vector $x_l^i$ itself is added to the combination result C to give its next-level feature vector $x_{l+1}^i$, namely:

$$x_{l+1}^i = C + x_l^i \qquad (10)$$

In yet another embodiment, an offset vector is further added to the combination result C and the ith feature vector to give the next-level feature vector $x_{l+1}^i$, namely:

$$x_{l+1}^i = C + x_l^i + b \qquad (11)$$

where $b$ is the offset vector.
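Putting formulas (5) through (11) together, the following sketch implements one complete stage of feature cross processing and then stacks m stages. It commits to one concrete set of choices (a shared parameter matrix, inner-product correlation, softmax weights, element-wise fusion, residual connection plus offset vector); the other options described above can be substituted freely, and the dimensions and random parameters are assumptions.

```python
import numpy as np

def cross_stage(X0, Xl, W, bias=None):
    """One stage of feature cross processing: X_l -> X_{l+1} (formulas (5)-(11)).
    X0, Xl: (N, d) initial and current-level matrices; W: (d, d) parameter matrix."""
    Q = X0 @ W.T                        # ith transformation vectors W x_0^i (as rows)
    K = Xl @ W.T                        # current-level transformation vectors W x_l^j (as rows)
    X_next = np.empty_like(Xl)
    for i in range(X0.shape[0]):
        e = K @ Q[i]                                # correlations e_ij, formula (5)
        a = np.exp(e - e.max()); a /= a.sum()       # softmax weight factors a_ij, formula (7)
        A = Q[i] * K                                # fusion vectors A_ij (element-wise), formula (8)
        C = a @ A                                   # combination result, formula (9)
        X_next[i] = C + Xl[i]                       # add the ith feature vector itself, formula (10)
        if bias is not None:
            X_next[i] += bias                       # add the offset vector, formula (11)
    return X_next

# Usage: stack m stages to obtain the multi-stage processing matrices X_1 ... X_m.
rng = np.random.default_rng(0)
N, d, m = 5, 8, 3
X0 = rng.normal(size=(N, d))
Xl = X0
for _ in range(m):
    W = 0.1 * rng.normal(size=(d, d))               # per-stage parameter matrix
    Xl = cross_stage(X0, Xl, W, bias=np.zeros(d))
print(Xl.shape)                                     # (5, 8): same shape as X_0, but nonlinear in X_0
```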
Thus, through the process of FIG. 3, feature cross processing is performed on any feature vector $x_l^i$ in the current-level matrix $X_l$ to obtain its next-level feature vector $x_{l+1}^i$. Performing this feature cross processing on every feature vector in the current-level matrix yields the next-level processing matrix $X_{l+1}$.

It can be understood that FIG. 3 shows the feature cross processing of an arbitrary stage $l$, which is a sub-step of step 22 in FIG. 2. Returning to step 22 of FIG. 2, each stage of the multi-stage feature cross processing is performed as shown in FIG. 3, so that the multi-stage processing yields the corresponding processing matrices $X_1, X_2, \ldots, X_m$, where m is the number of stages of feature cross processing. Since each stage applies nonlinear fusion between vectors and attention-based weighted combination to each feature vector, as shown in formulas (9) to (11), sufficient cross combination operations are performed between the features, and the resulting high-order features do not depend linearly on the original feature matrix $X_0$.
Then, in step 23 of FIG. 2, a characterization vector corresponding to the first service object is obtained according to the last-stage processing matrix among the multi-stage processing matrices. In this step, the resulting matrix containing the high-order features is processed into vector form for subsequent prediction.
Specifically, in one embodiment, the characterization vector is obtained by pooling the last-stage processing matrix. The pooling may be, for example, max pooling, average pooling, or attention-based pooling. In another embodiment, the last-stage processing matrix may be turned into the characterization vector in other ways, such as by concatenating its vectors.
Then, in step 24, business prediction is performed for the first service object according to the above characterization vector. Specifically, a prediction function such as softmax may be applied to the characterization vector to obtain the business prediction result. In one embodiment, the business prediction result is a classification result, i.e., a predicted category of the first service object, such as a user category, a merchant category, or a binary result of whether a payment event is safe. In another embodiment, the business prediction result is a regression value, i.e., a predicted score for the first service object, such as a favorable-rating score of an item, a security score of a payment event, or a recommendation score of a recommendation event.
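Steps 23 and 24 can be sketched as below (the attention-pooling query vector, the output weights and the task names are assumptions; in the model they would be trained parameters).

```python
import numpy as np

def pool(Xm, mode="mean", query=None):
    """Collapse the last-stage processing matrix (N, d) into a characterization vector (d,)."""
    if mode == "max":
        return Xm.max(axis=0)
    if mode == "mean":
        return Xm.mean(axis=0)
    if mode == "attention":                 # attention pooling with a learned query vector
        s = Xm @ query
        w = np.exp(s - s.max()); w /= w.sum()
        return w @ Xm
    raise ValueError(f"unknown mode: {mode}")

def predict(h, W_out, b_out, task="classification"):
    """Apply a prediction head to the characterization vector h."""
    logits = W_out @ h + b_out
    if task == "classification":            # e.g. user category, or whether a payment event is safe
        z = np.exp(logits - logits.max())
        return z / z.sum()                  # class probabilities via softmax
    return float(logits[0])                 # regression score, e.g. a recommendation score

rng = np.random.default_rng(0)
Xm = rng.normal(size=(5, 8))                # last-stage processing matrix
h = pool(Xm, "attention", query=rng.normal(size=8))
print(predict(h, rng.normal(size=(2, 8)), np.zeros(2)))   # two class probabilities summing to 1
```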
Reviewing the above process: in the multi-stage feature cross processing, nonlinear fusion between feature vectors and attention-based weighted combination are adopted, so that sufficient cross combination operations are performed between the features and more expressive high-order features are obtained. Performing business prediction based on these high-order features further improves the prediction accuracy. Moreover, the attention-based combination provides a basis for interpreting the business prediction results.
According to an embodiment of another aspect, a neural network model for business prediction of business objects is provided. Fig. 5 illustrates a schematic structural diagram of a neural network model that may be deployed in any device, platform, or cluster of devices having data storage, computing, processing capabilities, according to one embodiment. As shown in fig. 5, the neural network model 500 for business prediction of business objects includes:
the input layer 51 is configured to obtain an initial feature matrix corresponding to a first service object, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object;
a multi-level cross processing layer 52, configured to perform multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; each level of cross processing layer is configured to, for any ith feature vector in the current-level matrix to be processed, respectively perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; and, according to the degrees of correlation between the ith transformation vector and each current-level transformation vector, perform weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determine the feature vector of the ith feature vector in the next-level processing matrix based on the combination result;
the pooling layer 53 is used for obtaining a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrix;
and the output layer 54 is configured to perform service prediction on the first service object according to the characterization vector.
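For a structural view of model 500, the class below mirrors its four parts as a NumPy forward pass only (no training loop; the class name, layer shapes and the choice of average pooling and softmax output are assumptions consistent with the embodiments above).

```python
import numpy as np

class CrossAttentionNet:
    """Sketch of the model in FIG. 5: the input layer 51 supplies X_0, the stacked
    cross processing layers 52 compute X_1..X_m, the pooling layer 53 averages the
    last matrix, and the output layer 54 produces class probabilities."""

    def __init__(self, d, n_stages, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.Ws = [0.1 * rng.normal(size=(d, d)) for _ in range(n_stages)]  # per-stage W_l
        self.bs = [np.zeros(d) for _ in range(n_stages)]                    # per-stage offset vectors
        self.W_out = 0.1 * rng.normal(size=(n_classes, d))
        self.b_out = np.zeros(n_classes)

    def forward(self, X0):
        Xl = X0
        for W, b in zip(self.Ws, self.bs):                  # multi-level cross processing layer
            Q, K = X0 @ W.T, Xl @ W.T
            X_next = np.empty_like(Xl)
            for i in range(X0.shape[0]):
                e = K @ Q[i]
                a = np.exp(e - e.max()); a /= a.sum()
                X_next[i] = a @ (Q[i] * K) + Xl[i] + b      # formulas (5)-(11)
            Xl = X_next
        h = Xl.mean(axis=0)                                 # pooling layer (average pooling)
        logits = self.W_out @ h + self.b_out                # output layer
        z = np.exp(logits - logits.max())
        return z / z.sum()                                  # predicted class probabilities

model = CrossAttentionNet(d=8, n_stages=3, n_classes=2)
probs = model.forward(np.random.default_rng(1).normal(size=(5, 8)))
print(probs, probs.sum())                                   # two probabilities summing to 1.0
```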
According to one embodiment, the first business object is one of the following entity objects: the user, the merchant, the commodity and the article to be recommended. Correspondingly, the N features include attribute features of the entity object.
According to another embodiment, the first business object is a business event, and the business event includes one of the following: payment events, purchase events, recommendation events, login events. Correspondingly, the N characteristics include respective attribute characteristics of each participant of the business event.
In one embodiment, the linear transformation performed in each level of the cross processing layers 52 specifically includes: performing linear transformation respectively on the ith original vector and on each current-level feature vector by using the current-level parameter matrix corresponding to the current level, to obtain the ith transformation vector and each current-level transformation vector.
In another embodiment, the linear transformation in each level of the interleaving layer specifically includes: performing linear transformation on the ith original vector by using a first parameter matrix to obtain an ith transformation vector; and performing linear transformation on each level of feature vector by using a second parameter matrix to obtain each level of transformation vector.
In one embodiment, the weighted combination performed in each level of the interleaving layer specifically includes:
determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors;
determining each weight factor corresponding to each current-level transformation vector according to each correlation degree;
respectively carrying out fusion operation on the ith transformation vector and each current-level transformation vector to obtain each fusion vector;
and performing weighted combination on each fusion vector according to each weight factor to obtain a combination result.
More specifically, in each example, the correlation is determined by: calculating cosine similarity of the ith transformation vector and each current-level transformation vector as the correlation degree; or, calculating the inner product result of the ith transformation vector and each current-stage transformation vector as the correlation; or calculating the vector distance between the ith transformation vector and each current-stage transformation vector, and determining the correlation according to the vector distance.
In one embodiment, the fusing operation includes one of: multiplying by bit, summing and averaging.
According to one embodiment, each level of the interleaving layer is specifically configured to: on the basis of the combination result, adding an offset vector and the ith feature vector as a feature vector in a next-stage processing matrix.
According to one embodiment, the pooling layer 53 may be implemented by several fully connected layers. In a specific example, the pooling layer 53 may pool the last-stage processing matrix to obtain the characterization vector, where the pooling includes one of: maximal pooling, average pooling, attention-based pooling.
Through the neural network model, the characteristics of the business object are subjected to more effective cross combination processing, and higher-order characteristics with more expressive power are obtained, so that the accuracy of business prediction is improved.
According to an embodiment of yet another aspect, an apparatus for business prediction of a business object is provided, which may be implemented as any device, platform or cluster of devices having data storage, computing, processing capabilities. Fig. 6 shows a schematic block diagram of a traffic prediction apparatus according to an embodiment. As shown in fig. 6, the prediction apparatus 600 includes:
an obtaining unit 61, configured to obtain an initial feature matrix corresponding to a first service object, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object;
a plurality of cross processing units 62, configured to perform multi-stage feature cross processing on the initial feature matrix to obtain multi-stage processing matrices; each cross processing unit is configured to, for any ith feature vector in the current-level matrix to be processed, respectively perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; and, according to the degrees of correlation between the ith transformation vector and each current-level transformation vector, perform weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determine the feature vector of the ith feature vector in the next-level processing matrix based on the combination result;
the pooling unit 63 is configured to obtain a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrices;
and the prediction unit 64 is configured to perform service prediction on the first service object according to the characterization vector.
Through the device, the characteristics of the business object are subjected to more effective cross combination processing, and high-order characteristics with more expressive power are obtained, so that the accuracy of business prediction is improved.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2 to 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor implementing the method described in conjunction with fig. 2-4 when executing the executable code.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (22)

1. A method of business prediction for a business object, comprising:
acquiring an initial feature matrix corresponding to a first service object, wherein the initial feature matrix comprises N original vectors obtained by coding feature values of N features of the first service object;
performing multi-stage feature cross processing on the initial feature matrix to obtain multi-stage processing matrices, wherein each stage of processing comprises: for any ith feature vector in the current-level matrix to be processed, performing linear transformation respectively on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; according to each weight factor determined based on each degree of correlation between the ith transformation vector and each current-level transformation vector, performing weighted combination based on an attention mechanism on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determining the feature vector of the ith feature vector in the next-level processing matrix based on the combination result, wherein the fusion results have the same dimension as the ith transformation vector;
obtaining a representation vector corresponding to the first business object according to the last-stage processing matrix in the multi-stage processing matrix;
and performing service prediction on the first service object according to the characterization vector.
2. The method of claim 1, wherein,
the first business object is one of the following entity objects: a user, a merchant, a commodity, an item to be recommended; the N features include attribute features of the entity object.
3. The method of claim 1, wherein,
the first business object is a business event, and the business event comprises one of the following: payment events, purchase events, recommendation events, login events; the N characteristics comprise respective attribute characteristics of each participant of the business event.
4. The method of claim 1, wherein the performing linear transformation on the ith original vector of the corresponding position in the initial feature matrix and each level feature vector in the level matrix respectively comprises:
respectively performing linear transformation on the ith original vector and each current-level feature vector by using the current-level parameter matrix corresponding to the current level, to obtain the ith transformation vector and each current-level transformation vector.
5. The method of claim 1, wherein the performing linear transformation on the ith original vector of the corresponding position in the initial feature matrix and each level feature vector in the level matrix respectively comprises:
performing linear transformation on the ith original vector by using a first parameter matrix to obtain an ith transformation vector; and performing linear transformation on each level of feature vector by using a second parameter matrix to obtain each level of transformation vector.
6. The method according to claim 1, wherein the weighted combination of the fusion results of the ith transform vector and the respective current-level transform vectors according to respective weight factors determined based on respective correlation degrees between the ith transform vector and the respective current-level transform vectors comprises:
determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors;
determining each weight factor corresponding to each current-level transformation vector according to each correlation degree;
respectively carrying out fusion operation on the ith transformation vector and each current-level transformation vector to obtain each fusion vector;
and performing weighted combination on each fusion vector according to each weight factor to obtain a combination result.
7. The method of claim 6, wherein determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors comprises:
calculating cosine similarity between the ith transformation vector and each current-level transformation vector as the correlation degree; or,
calculating the inner product of the ith transformation vector and each current-level transformation vector as the correlation degree; or,
and calculating the vector distance between the ith transformation vector and each current-stage transformation vector, and determining the correlation degree according to the vector distance.
8. The method of claim 6, wherein the fusion operation comprises one of:
multiplying by bit, summing and averaging.
9. The method of claim 1, wherein determining the eigenvector of the ith eigenvector in the next-level processing matrix based on the combined result comprises:
on the basis of the combination result, adding an offset vector and the ith feature vector as a feature vector in a next-stage processing matrix.
10. The method of claim 1, wherein obtaining the characterization vector corresponding to the first service object according to the last processing matrix in the multi-stage processing matrices comprises:
pooling the final-stage processing matrix to obtain the characterization vector, wherein the pooling comprises one of the following steps: maximal pooling, average pooling, attention-based pooling.
11. A neural network model for business prediction for a business object, comprising:
an input layer, configured to acquire an initial feature matrix corresponding to a first service object, wherein the initial feature matrix comprises N original vectors obtained by encoding feature values of N features of the first service object;
the multi-level cross processing layer is configured to perform multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; each level of cross processing layer is configured to, for any ith feature vector in the current-level matrix to be processed, respectively perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, to obtain an ith transformation vector and each current-level transformation vector; and, according to each weight factor determined based on each degree of correlation between the ith transformation vector and each current-level transformation vector, perform weighted combination based on an attention mechanism on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, and determine the feature vector of the ith feature vector in the next-level processing matrix based on the combination result, wherein the fusion results have the same dimension as the ith transformation vector;
the pooling layer is used for obtaining a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrices;
and the output layer is used for carrying out service prediction on the first service object according to the characterization vector.
12. The neural network model of claim 11,
the first business object is one of the following entity objects: a user, a merchant, a commodity, an item to be recommended; the N features include attribute features of the entity object.
13. The neural network model of claim 11,
the first business object is a business event, and the business event comprises one of the following: payment events, purchase events, recommendation events, login events; the N characteristics comprise respective attribute characteristics of each participant of the business event.
14. The neural network model of claim 11, wherein the linear transformation performed in each level of the cross-processing layer specifically comprises:
respectively performing linear transformation on the ith original vector and each current-level feature vector by using the current-level parameter matrix corresponding to the current level, to obtain the ith transformation vector and each current-level transformation vector.
15. The neural network model of claim 11, wherein the linear transformation performed in each level of the cross-processing layer specifically comprises:
performing linear transformation on the ith original vector by using a first parameter matrix to obtain an ith transformation vector; and performing linear transformation on each level of feature vector by using a second parameter matrix to obtain each level of transformation vector.
16. The neural network model of claim 11, wherein the weighted combination performed in each level of the cross-processing layer specifically comprises:
determining respective degrees of correlation between the ith transformation vector and the respective current-level transformation vectors;
determining the weight factors corresponding to the respective current-level transformation vectors according to the degrees of correlation;
performing a fusion operation on the ith transformation vector with each current-level transformation vector respectively, to obtain respective fusion vectors;
and performing weighted combination of the fusion vectors according to the weight factors to obtain the combination result.
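The four enumerated steps can be written as one routine parameterized by the choices that claims 17 and 18 leave open; turning the correlation degrees into weight factors by softmax is an assumption, since the claim only requires that the weights be determined from the correlations:

```python
import numpy as np

def weighted_combination(q_i, K, correlation, fuse):
    """The four steps for one target vector (illustrative sketch).

    q_i : (d,) ith transformation vector
    K   : (N, d) current-level transformation vectors
    correlation, fuse : callables chosen according to claims 17 and 18
    """
    scores = np.array([correlation(q_i, k_j) for k_j in K])  # step 1: correlation degrees
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                  # step 2: weight factors (softmax assumed)
    fused = np.stack([fuse(q_i, k_j) for k_j in K])           # step 3: fusion vectors, same dimension as q_i
    return weights @ fused                                    # step 4: weighted combination result
```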
17. The neural network model of claim 16, wherein determining the respective degrees of correlation specifically comprises:
calculating the cosine similarity between the ith transformation vector and each current-level transformation vector as the degree of correlation; or
calculating the inner product of the ith transformation vector and each current-level transformation vector as the degree of correlation; or
calculating the vector distance between the ith transformation vector and each current-level transformation vector, and determining the degree of correlation according to the vector distance.
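Possible concrete forms of the three claimed correlation measures, written so they plug into the sketch above; the mapping from vector distance to a correlation degree is assumed, as the claim only states that the degree is determined from the distance:

```python
import numpy as np

def cosine_correlation(u, v):
    # cosine similarity, with a small constant to avoid division by zero
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def inner_product_correlation(u, v):
    return float(u @ v)

def distance_correlation(u, v):
    # a decreasing function of the Euclidean distance (exact mapping not specified in the claim)
    return float(-np.linalg.norm(u - v))
```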
18. The neural network model of claim 16, wherein the fusion operation comprises one of:
element-wise multiplication, summation, and averaging.
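The three claimed fusion operations each take two vectors of the same dimension and return a vector of that dimension; "element-wise multiplication" is the Hadamard product:

```python
def fuse_elementwise_product(u, v):
    return u * v            # Hadamard (element-wise) product

def fuse_sum(u, v):
    return u + v            # element-wise sum

def fuse_average(u, v):
    return (u + v) / 2.0    # element-wise average
```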
19. The neural network model of claim 11, wherein each level of the cross-processing layer is specifically configured to:
add an offset vector and the ith feature vector to the combination result, and use the resulting vector as the corresponding feature vector in the next-level processing matrix.
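A one-line sketch of claim 19, treating the offset vector as a learned per-level parameter (an assumption not stated in the claim):

```python
def next_level_vector(combination, offset, x_i):
    # combination result + offset vector + residual connection to the ith feature vector
    return combination + offset + x_i
```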
20. The neural network model of claim 11, wherein the pooling layer is specifically configured to:
perform pooling on the last-level processing matrix to obtain the characterization vector, wherein the pooling comprises one of: max pooling, average pooling, and attention-based pooling.
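Sketches of the three claimed pooling options over the last-level processing matrix M of shape (N, d); the attention-based variant is shown with a learned query vector q, which is one common but assumed parameterization:

```python
import numpy as np

def max_pool(M):
    return M.max(axis=0)            # column-wise maximum over the N feature vectors

def average_pool(M):
    return M.mean(axis=0)           # column-wise average

def attention_pool(M, q):
    scores = M @ q                  # relevance of each row to a learned query vector q (assumed form)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ M                    # attention-weighted sum of the rows
```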
21. An apparatus for business prediction for a business object, comprising:
an obtaining unit, configured to obtain an initial feature matrix corresponding to a first business object, wherein the initial feature matrix comprises N original vectors obtained by encoding feature values of N features of the first business object;
a plurality of cross-processing units, configured to perform multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; wherein each cross-processing unit is configured to: for any ith feature vector in the current-level matrix to be processed, perform linear transformation respectively on the ith original vector at the corresponding position in the initial feature matrix and on each feature vector of the current-level matrix, to obtain an ith transformation vector and respective current-level transformation vectors; and, according to weight factors determined based on the degrees of correlation between the ith transformation vector and the respective current-level transformation vectors, perform an attention-based weighted combination of the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector, and determine, based on the combination result, the feature vector corresponding to the ith feature vector in the next-level processing matrix, wherein each fusion result has the same dimension as the ith transformation vector;
a pooling unit, configured to obtain a characterization vector corresponding to the first business object according to the last-level processing matrix among the multi-level processing matrices;
and a prediction unit, configured to perform business prediction on the first business object according to the characterization vector.
22. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-10.
CN202010329614.XA 2020-04-24 2020-04-24 Method, neural network model and device for business prediction for business object Active CN111222722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010329614.XA CN111222722B (en) 2020-04-24 2020-04-24 Method, neural network model and device for business prediction for business object

Publications (2)

Publication Number Publication Date
CN111222722A CN111222722A (en) 2020-06-02
CN111222722B CN111222722B (en) 2020-07-24

Family

ID=70831712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010329614.XA Active CN111222722B (en) 2020-04-24 2020-04-24 Method, neural network model and device for business prediction for business object

Country Status (1)

Country Link
CN (1) CN111222722B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255908B (en) * 2021-05-27 2023-04-07 支付宝(杭州)信息技术有限公司 Method, neural network model and device for service prediction based on event sequence

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11348018B2 (en) * 2017-12-19 2022-05-31 Aspen Technology, Inc. Computer system and method for building and deploying models predicting plant asset failure
CN110751261A (en) * 2018-07-23 2020-02-04 第四范式(北京)技术有限公司 Training method and system and prediction method and system of neural network model
CN110046304B (en) * 2019-04-18 2022-12-13 腾讯科技(深圳)有限公司 User recommendation method and device
CN110263973B (en) * 2019-05-15 2024-02-02 创新先进技术有限公司 Method and device for predicting user behavior
CN110929206B (en) * 2019-11-20 2023-04-07 腾讯科技(深圳)有限公司 Click rate estimation method and device, computer readable storage medium and equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751285A (en) * 2018-07-23 2020-02-04 第四范式(北京)技术有限公司 Training method and system and prediction method and system of neural network model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks; Weiping Song et al.; CIKM '19; 2019-08-23; full text *
Deep Spatio-Temporal Neural Networks for Click-Through Rate Prediction; Wentao Ouyang et al.; KDD '19; 2019-07-19; full text *

Also Published As

Publication number Publication date
CN111222722A (en) 2020-06-02

Similar Documents

Publication Publication Date Title
CN109785062B (en) Hybrid neural network recommendation system based on collaborative filtering model
CN112800342B (en) Recommendation method, system, computer device and storage medium based on heterogeneous information
CN110598118A (en) Resource object recommendation method and device and computer readable medium
CN111008335B (en) Information processing method, device, equipment and storage medium
WO2022152161A1 (en) Training and prediction of hybrid graph neural network model
CN111737578A (en) Recommendation method and system
CN113255908B (en) Method, neural network model and device for service prediction based on event sequence
CN112633927B (en) Combined commodity mining method based on knowledge graph rule embedding
CN111177577B (en) Group project recommendation method, intelligent terminal and storage device
CN115859199A (en) Medical insurance fraud detection method and embedded vector generation method, device and medium thereof
CN111222722B (en) Method, neural network model and device for business prediction for business object
CN115482141A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110502701B (en) Friend recommendation method, system and storage medium introducing attention mechanism
CN113779380A (en) Cross-domain recommendation method, device and equipment, and content recommendation method, device and equipment
JP7414357B2 (en) Text processing methods, apparatus, devices and computer readable storage media
CN114491086A (en) Clothing personalized matching recommendation method and system, electronic equipment and storage medium
CN114996566A (en) Intelligent recommendation system and method for industrial internet platform
CN112734519B (en) Commodity recommendation method based on convolution self-encoder network
CN114817758A (en) Recommendation system method based on NSGC-GRU integrated model
CN112132345A (en) Method and device for predicting user information of electric power company, electronic equipment and storage medium
CN113850616A (en) Customer life cycle value prediction method based on depth map neural network
CN112559640A (en) Training method and device of atlas characterization system
CN117859139A (en) Multi-graph convolution collaborative filtering
CN111445282B (en) Service processing method, device and equipment based on user behaviors
CN117252665B (en) Service recommendation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant