CN111222722B - Method, neural network model and device for business prediction for business object - Google Patents
- Publication number
- CN111222722B CN111222722B CN202010329614.XA CN202010329614A CN111222722B CN 111222722 B CN111222722 B CN 111222722B CN 202010329614 A CN202010329614 A CN 202010329614A CN 111222722 B CN111222722 B CN 111222722B
- Authority
- CN
- China
- Prior art keywords
- vector
- ith
- level
- matrix
- transformation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The embodiments of this specification provide a method and a neural network model for business prediction for a business object. In the method, an initial feature matrix corresponding to the business object is first obtained, comprising N original vectors corresponding to N features of the business object. Multi-level processing is then performed on the initial feature matrix. In each level, for the i-th feature vector to be processed at that level, the corresponding i-th original vector in the initial feature matrix and each current-level feature vector are linearly transformed, respectively, to obtain an i-th transform vector and current-level transform vectors; the fusion results of the i-th transform vector with each current-level transform vector are then weighted and combined according to the correlations between the i-th transform vector and the current-level transform vectors, thereby determining the next-level feature vector of the i-th feature vector. A characterization vector corresponding to the business object is obtained from the matrix produced by the last level of processing, and business prediction for the business object is performed according to the characterization vector.
Description
Technical Field
One or more embodiments of the present specification relate to the field of machine learning, and more particularly, to a method and apparatus for business prediction for business objects.
Background
With the development of computer technology, machine learning has been applied in many technical fields to analyze and predict various kinds of business data. For example, a user's category can be predicted from the user's attribute features, so that customized, personalized services can be provided; the degree of recommendation between a user and an item can be predicted by integrating information on both, so that suitable items can be recommended to the user; and traffic peaks can be anticipated based on predictions of when users visit a website, so that the network environment can be provisioned in advance.
In prediction scenarios for various business objects, rich feature data of different dimensions are usually introduced to make model prediction as accurate as possible. Features of different dimensions describe different information about the business scenario from different angles. In most cases, the fitting target of the model is not in a simple linear relationship with the individual basic features, so a model trained only on the basic features can express only a linear combination of feature information, and its expressive power is limited. It is therefore desirable to combine features effectively to improve the expressive power of the model. Traditional feature combination is designed manually by engineers based on business experience; it is costly, extends poorly to new businesses, and is limited by the engineers' understanding of the business.
Therefore, an improved scheme is desired that combines business features more effectively, avoids the modeling limitations on high-order features, enhances the expressive power of the model, and improves its prediction accuracy.
Disclosure of Invention
One or more embodiments of the present disclosure describe a method and a neural network model for business prediction for a business object, which can combine the features of the business object into higher orders more effectively and improve prediction accuracy.
According to a first aspect, there is provided a method for business prediction for a business object, comprising:
acquiring an initial feature matrix corresponding to a first business object, wherein the initial feature matrix comprises N original vectors obtained by encoding the feature values of N features of the first business object;
performing multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices, wherein each level of processing comprises: for any i-th feature vector in the current-level matrix to be processed, performing linear transformation on the i-th original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, respectively, to obtain an i-th transform vector and current-level transform vectors; according to the correlations between the i-th transform vector and the current-level transform vectors, performing weighted combination of the fusion results obtained by fusing the i-th transform vector with each current-level transform vector, and determining, based on the combination result, the feature vector of the i-th feature vector in the next-level processing matrix;
obtaining a characterization vector corresponding to the first business object according to the last-level processing matrix among the multi-level processing matrices;
and performing business prediction on the first business object according to the characterization vector.
According to one embodiment, the first business object may be one of the following entity objects: a user, a merchant, a commodity, an item to be recommended; correspondingly, the N features include attribute features of the entity object.
According to another embodiment, the first business object may be a business event, the business event comprising one of: payment events, purchase events, recommendation events, login events; correspondingly, the N characteristics include respective attribute characteristics of each participant of the business event.
In one embodiment, performing the linear transformation on the i-th original vector at the corresponding position in the initial feature matrix and on each current-level feature vector specifically comprises: using the current-level parameter matrix corresponding to the current level, performing linear transformation on the i-th original vector and on each current-level feature vector, respectively, to obtain the i-th transform vector and the current-level transform vectors.
In another embodiment, performing the linear transformation on the i-th original vector at the corresponding position in the initial feature matrix and on each current-level feature vector specifically comprises: performing linear transformation on the i-th original vector using a first parameter matrix to obtain the i-th transform vector; and performing linear transformation on each current-level feature vector using a second parameter matrix to obtain the current-level transform vectors.
According to one embodiment, the weighted combination is performed by: determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors; determining each weight factor corresponding to each current-level transformation vector according to each correlation degree; respectively carrying out fusion operation on the ith transformation vector and each current-level transformation vector to obtain each fusion vector; and performing weighted combination on each fusion vector according to each weight factor to obtain a combination result.
Further, the correlation may be determined by: calculating cosine similarity of the ith transformation vector and each current-level transformation vector as the correlation degree; or, calculating the inner product result of the ith transformation vector and each current-stage transformation vector as the correlation; or calculating the vector distance between the ith transformation vector and each current-stage transformation vector, and determining the correlation according to the vector distance.
In one embodiment, the fusion operation comprises one of: bitwise multiplication, summation, averaging.
According to an embodiment, determining the feature vector of the i-th feature vector in the next-level processing matrix based on the combination result specifically comprises: adding an offset vector and the i-th feature vector to the combination result, the sum serving as its feature vector in the next-level processing matrix.
In one embodiment, the characterization vector is obtained by pooling the last-level processing matrix, wherein the pooling comprises one of: maximum pooling, average pooling, attention-based pooling.
According to a second aspect, there is provided a neural network model for business prediction for a business object, comprising:
an input layer, configured to obtain an initial feature matrix corresponding to a first business object, wherein the initial feature matrix comprises N original vectors obtained by encoding the feature values of N features of the first business object;
a multi-level cross processing layer, configured to perform multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; each level of cross processing layer is configured to: for any i-th feature vector in the current-level matrix to be processed, perform linear transformation on the i-th original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, respectively, to obtain an i-th transform vector and current-level transform vectors; according to the correlations between the i-th transform vector and the current-level transform vectors, perform weighted combination of the fusion results obtained by fusing the i-th transform vector with each current-level transform vector, and determine, based on the combination result, the feature vector of the i-th feature vector in the next-level processing matrix;
the pooling layer is used for obtaining a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrices;
and the output layer is used for carrying out service prediction on the first service object according to the characterization vector.
According to a third aspect, there is provided an apparatus for performing traffic prediction for a traffic object, comprising:
an obtaining unit, configured to obtain an initial feature matrix corresponding to a first service object, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object;
multiple cross processing units, configured to perform multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; each cross processing unit is configured to: for any i-th feature vector in the current-level matrix to be processed, perform linear transformation on the i-th original vector at the corresponding position in the initial feature matrix and on each current-level feature vector in the current-level matrix, respectively, to obtain an i-th transform vector and current-level transform vectors; according to the correlations between the i-th transform vector and the current-level transform vectors, perform weighted combination of the fusion results obtained by fusing the i-th transform vector with each current-level transform vector, and determine, based on the combination result, the feature vector of the i-th feature vector in the next-level processing matrix;
the pooling unit is configured to obtain a representation vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrix;
and the prediction unit is configured to perform service prediction on the first service object according to the characterization vector.
According to a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fifth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and the processor, when executing the executable code, implements the method of the first aspect.
According to the method, apparatus, and neural network model provided by the embodiments of this specification, nonlinear fusion and attention-based weighted combination are applied among feature vectors in the multi-level feature cross processing, so that the features undergo sufficient cross-combination operations, yielding high-order features with stronger expressive power. Performing business prediction on the basis of these high-order features further improves the accuracy of the business prediction. Moreover, the attention-based combination provides a basis and possibility for the interpretability of the business prediction results.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 illustrates a method of business prediction for a business object, according to one embodiment;
FIG. 3 illustrates the feature cross processing at the l-th level according to one embodiment;
FIG. 4 illustrates the process steps of fusing transform vectors and performing weighted combination of the fusion results in one embodiment;
FIG. 5 illustrates a schematic structural diagram of a neural network model according to one embodiment;
fig. 6 shows a schematic block diagram of a traffic prediction apparatus according to an embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As previously mentioned, in order to improve the model's ability to express the features of a business object, it is desirable to combine features more effectively. In one approach, a multi-layer neural network is used to combine features into high orders in the hidden layers of the network. For example, high-order cross combinations between features can be obtained through a Deep & Cross Network (DCN).
Specifically, the original input to the deep cross network consists of N feature items, each represented by a d-dimensional feature vector x_i^0, i = 1, 2, …, N. The DCN first concatenates the N feature vectors to obtain the original input vector X_0:

X_0 = [x_1^0; x_2^0; …; x_N^0]    (1)

That is, the original input vector X_0 is the concatenation of the feature vectors of all the feature items.
Then, at each feature cross processing layer, feature cross combination is performed, where the combination at layer l satisfies the following formula:

X_{l+1} = X_0 (X_l^T w_l) + b_l + X_l    (2)

where X_l is the input of layer l, X_{l+1} is the output of layer l, and w_l, b_l are the network parameters of layer l.
According to formula (2) above, the output X_l of the l-th DCN layer contains all possible combinations of the original features from first order up to order l+1. By stacking feature cross layers, a DCN can thus achieve feature combinations of arbitrary finite order.
However, if the bias term b_l in formula (2) is ignored, careful analysis reveals the following pattern from layer to layer:

X_1 = X_0 (X_0^T w_0) + X_0 = (X_0^T w_0 + 1) X_0 = α_1 X_0    (3)

and similarly,

X_{l+1} = X_0 (X_l^T w_l) + X_l = α_l (X_0^T w_l + 1) X_0 = α_{l+1} X_0    (4)

It can be seen that the DCN's modeling of high-order feature combinations degrades to a scalar multiple α of the original feature vector X_0. Although the scaling factor α depends on the input features, the expressive power remains relatively limited.
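This degradation can be checked numerically. Below is a minimal NumPy sketch (illustrative only, not part of the patent) of stacked DCN cross layers with the bias term dropped, verifying that the output stays collinear with the original input x_0:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                      # dimension of the concatenated input vector
x0 = rng.normal(size=d)    # original input vector X_0

def dcn_cross_layer(x0, xl, w):
    """One DCN cross layer without the bias term: x_{l+1} = x0 * (xl . w) + xl."""
    return x0 * (xl @ w) + xl

xl = x0.copy()
for _ in range(3):         # stack three cross layers with random parameter vectors
    w = rng.normal(size=d)
    xl = dcn_cross_layer(x0, xl, w)

# xl is collinear with x0: the elementwise ratio xl / x0 is a single scalar alpha
alpha = xl / x0
print(np.allclose(alpha, alpha[0]))   # → True
```

However many layers are stacked, the output only rescales x_0, which is exactly the limitation analyzed above.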
In order to further improve the feature expression capability of the neural network model for business prediction and to improve its prediction accuracy, embodiments of the present invention provide an improved feature cross-combination scheme that avoids the linear degradation of high-order feature combinations.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. As shown in the figure, the neural network model of this embodiment first arranges the original features of the business object into a feature matrix X_0 rather than a single concatenated feature vector. When each cross processing layer performs its level of feature cross processing, it fuses each feature vector in the matrix with the original feature matrix, and weights and combines the fusion results based on an attention mechanism. Both the fusion and the attention-based combination are nonlinear operations, so the feature matrix output by the last cross processing layer contains diverse high-order combinations of the feature vectors and does not degrade to a linear scaling of the original vectors. This enhances the expressive power of the neural network model and improves its prediction accuracy for the business object.
The whole process of business prediction for business objects under the above concept is described in detail below.
FIG. 2 illustrates a method of business prediction for a business object, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. In one embodiment, the method may be performed by a neural network model that may be deployed in any device, apparatus, platform, cluster of devices having computing, processing capabilities. As shown in fig. 2, the method for traffic prediction at least comprises the following steps.
First, in step 21, an initial feature matrix corresponding to a first service object to be predicted is obtained, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object.
In one embodiment, the first business object corresponds to a single entity object, and the N features include attribute features of the entity object.
For example, in one example, the entity object is a user. At this time, the above-mentioned N characteristics may be attribute characteristics of the user, such as basic attribute characteristics of age, sex, registration time, education level, and the like, and behavior attribute characteristics such as recent browsing history, recent shopping history, and the like.
In another example, the entity object may be a merchant. At this time, the above-mentioned N characteristics may be attribute characteristics of the merchant, such as merchant category, registration time, commodity quantity, sales volume, number of people concerned, and the like.
In other examples, the entity object may also be a commodity, or an item to be recommended (e.g., an article to be pushed, music, a movie, etc.). Correspondingly, the N characteristics include attribute characteristics of the corresponding goods or articles.
In another embodiment, the first business object to be predicted is a business event, and the business event may be, for example, a payment event, a purchase event, a recommendation event, a login event, and the like. Correspondingly, the N characteristics include respective attribute characteristics of each participant of the business event.
For example, in one example, the first business object is a recommended event involving a first user and a first item. Accordingly, the N characteristics may include a user attribute characteristic of the first user and an item attribute characteristic of the first item.
For example, in yet another example, the first business object is a payment event involving two users, a first user and a second user. Accordingly, the N features may include respective user attribute features of the first user and the second user. Examples of N features in the case of other business events are not enumerated one by one.
For the N features of the various business objects exemplified above, the feature values may each be encoded as a d-dimensional vector, forming N d-dimensional vectors. The encoding of the feature values may take many forms. For example, in one example, one-hot encoding may be used for the feature values of some feature items; in another example, the feature values may be mapped to d-dimensional vectors using a lookup table. In one example, a predetermined word embedding tool (e.g., word2vec) may also be used to convert textual feature values into d-dimensional vectors.
Thus, the N features correspond to N d-dimensional vectors, which together form an N×d matrix, called the initial feature matrix. The initial feature matrix corresponds to X_0 in FIG. 1.
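As an illustration of this encoding step, here is a minimal NumPy sketch that builds an N×d initial feature matrix from categorical feature values via lookup tables; the feature names, vocabularies, and random tables are hypothetical stand-ins for embeddings that would be learned during training:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimension

# Hypothetical categorical features of a user object and their vocabularies.
vocab = {"gender": ["male", "female"], "age_bucket": ["<25", "25-40", ">40"]}
# One lookup table (embedding matrix) per feature; random here, learned in practice.
tables = {name: rng.normal(size=(len(vals), d)) for name, vals in vocab.items()}

def encode(sample):
    """Map each feature value to its d-dimensional vector, forming the N x d matrix X_0."""
    rows = [tables[name][vocab[name].index(value)] for name, value in sample.items()]
    return np.stack(rows)            # shape (N, d)

X0 = encode({"gender": "female", "age_bucket": "25-40"})
print(X0.shape)                      # → (2, 4)
```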
Next, in step 22, multi-level feature cross processing is performed on the initial feature matrix to obtain multi-level processing matrices. The process of feature cross processing at an arbitrary level, denoted level l, is described below.
FIG. 3 illustrates the feature cross processing at level l, i.e., the sub-process of step 22 in FIG. 2. It is understood that the l-th level of feature cross processing takes the processing matrix X_l output by the previous level as the current-level matrix to be processed, performs feature cross processing on it, and outputs the next-level processing matrix X_{l+1}. For simplicity of description, the following focuses on the i-th feature vector x_i^l in the current-level matrix X_l (which corresponds to the i-th original vector x_i^0 in the initial feature matrix X_0).
As shown in FIG. 3, the cross processing for the i-th feature vector x_i^l comprises the following steps. In step 31, the i-th original vector x_i^0 at the corresponding position in the initial feature matrix and each current-level feature vector x_j^l in the current-level matrix are linearly transformed, respectively, to obtain an i-th transform vector and current-level transform vectors. Then, in step 32, the fusion results obtained by fusing the i-th transform vector with each current-level transform vector are weighted and combined according to the correlations between the i-th transform vector and the current-level transform vectors. In step 33, based on the combination result of the weighted combination, the feature vector x_i^{l+1} of the i-th feature vector in the next-level processing matrix X_{l+1} is determined. Each of these steps is described in detail below.
First, in step 31, the i-th original vector x_i^0 and each current-level feature vector x_j^l are linearly transformed, respectively, to obtain the i-th transform vector and the current-level transform vectors. The linear transformation may be implemented with a parameter matrix: specifically, a parameter matrix W may be applied to each current-level feature vector x_j^l and to the i-th original vector x_i^0, thereby obtaining the current-level transform vectors and the i-th transform vector.
In one embodiment, the parameter matrix W may be a unified parameter matrix in a multi-level feature intersection process.
In another embodiment, the parameter matrix W may differ from level to level. For the current level l, the corresponding level-l parameter matrix W_l is applied to each current-level feature vector x_j^l and to the i-th original vector x_i^0, respectively, yielding the current-level transform vectors and the i-th transform vector.
In yet another embodiment, different parameter matrices may be used to linearly transform the i-th original vector x_i^0 and the current-level feature vectors x_j^l, respectively. For example, a first parameter matrix W^(1) may be applied to the i-th original vector x_i^0 to obtain the i-th transform vector, and a second parameter matrix W^(2) may be applied to each current-level feature vector x_j^l to obtain the current-level transform vectors. The first parameter matrix W^(1) and the second parameter matrix W^(2) may differ across levels or be shared.
It should be understood that the values of the elements in the above various parameter matrices can be determined by training the neural network model.
For convenience of description, each obtained current-level transform vector is denoted v_j = W x_j^l, and the i-th transform vector is denoted u_i = W x_i^0, where the parameter matrix W covers the cases of the above embodiments.
Next, in step 32, the fusion results obtained by fusing the i-th transform vector u_i with each current-level transform vector v_j are weighted and combined according to the correlations between u_i and the v_j.
Fig. 4 shows the process steps of fusing the transform vectors and performing a weighted combination of the fusion results in one embodiment, i.e., the sub-steps of step 32 above.
As shown in FIG. 4, in step 321, the correlations between the i-th transform vector u_i and the current-level transform vectors v_j are determined. Specifically, a correlation function f may be introduced to compute the correlation e_ij between the i-th transform vector and the j-th current-level transform vector:

e_ij = f(u_i, v_j)    (5)
The correlation function f may adopt various calculation methods. In one example, f computes the cosine similarity between the i-th transform vector and the j-th current-level transform vector. In another example, f computes their inner product (vector dot product) as the correlation. In yet another example, f computes a vector distance, such as the Euclidean distance, between the two transform vectors and determines the correlation from that distance, such that the correlation is inversely related to the distance. Other forms of correlation calculation are also possible.
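The three correlation choices above can be sketched in a few lines of NumPy; this is an illustrative helper (the function name and the distance-to-correlation mapping 1/(1+dist) are assumptions, chosen only so that the correlation decreases as the distance grows):

```python
import numpy as np

def correlation(q, k, mode="dot"):
    """Correlation e_ij between transform vectors q (i-th) and k (j-th current-level)."""
    if mode == "dot":        # inner product of the two transform vectors
        return float(q @ k)
    if mode == "cosine":     # cosine similarity
        return float(q @ k / (np.linalg.norm(q) * np.linalg.norm(k)))
    if mode == "distance":   # inversely related to the Euclidean distance (assumed mapping)
        return 1.0 / (1.0 + np.linalg.norm(q - k))
    raise ValueError(mode)
```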
Then, in step 322, each weight factor corresponding to each current-level transformation vector is determined according to each correlation.
In one embodiment, the correlation between each of the current-level transformation vectors and the ith transformation vector determined above is directly used as the corresponding weighting factor.
In another embodiment, the correlation corresponding to each of the present-level transformation vectors is normalized, and the normalized value is used as a weight factor.
More specifically, in one example, the weight factor a_ij corresponding to the j-th current-level transform vector may be obtained by proportional normalization:

a_ij = e_ij / Σ_k e_ik    (6)
In another example, the weight factor corresponding to the j-th current-level transform vector may be obtained by normalization with the softmax function:

a_ij = exp(e_ij) / Σ_k exp(e_ik)    (7)
In addition, in step 323, the i-th transform vector u_i is fused with each current-level transform vector v_j to obtain the fusion vectors A_ij:

A_ij = u_i ⊙ v_j    (8)

where ⊙ denotes the fusion operation.
In one example, the above described fusion operation is a bitwise multiplication of two vectors.
In another example, other fusion methods may be used, such as summation, averaging, etc. It is to be understood that the fusion operation herein needs to make the fused vector dimension unchanged.
Next, in step 324, based on the weight factors a_ij obtained in step 322, the fusion vectors A_ij obtained in step 323 are weighted and combined to obtain the combination result C:

C = Σ_j a_ij A_ij    (9)
Through steps 321 to 324 above, step 32 in FIG. 3 is completed. Returning to FIG. 3, after step 32, in step 33, the feature vector x_i^{l+1} of the i-th feature vector in the next-level processing matrix X_{l+1} is determined based on the combination result C of the weighted combination.
In one embodiment, the combination result C is directly used as the next-level feature vector x_i^{l+1} of the i-th feature vector.
In another embodiment, the i-th feature vector x_i^l itself is added on the basis of the combination result C to form its next-level feature vector, namely:

x_i^{l+1} = C + x_i^l    (10)
In yet another embodiment, an offset vector b is added, together with the i-th feature vector, on the basis of the combination result C to form the next-level feature vector, namely:

x_i^{l+1} = C + b + x_i^l    (11)
Thus, through the process of FIG. 3, feature cross processing is performed on any feature vector x_i^l in the current-level matrix X_l, obtaining the next-level feature vector x_i^{l+1}. Performing this feature cross processing on each feature vector in the current-level matrix yields the next-level processing matrix X_{l+1}.
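Putting steps 31 to 33 together, one level of the feature cross processing can be sketched end to end. This is a minimal NumPy rendition under assumed choices (a single shared parameter matrix W, inner-product correlation, softmax normalization, bitwise-multiplication fusion, and the residual-plus-offset update), not the patent's definitive implementation:

```python
import numpy as np

def cross_level(X0, Xl, W, b):
    """One level of feature cross processing (steps 31-33).

    X0: (N, d) initial feature matrix; Xl: (N, d) current-level matrix;
    W: (d, d) shared parameter matrix; b: (d,) offset vector.
    Returns the next-level processing matrix.
    """
    Q = X0 @ W                        # i-th transform vectors u_i = W x_i^0 (as rows)
    K = Xl @ W                        # current-level transform vectors v_j = W x_j^l
    E = Q @ K.T                       # correlations e_ij as inner products
    A = np.exp(E - E.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True) # softmax weight factors a_ij
    # fusion by bitwise multiplication: F[i, j] = u_i * v_j, then weighted combination
    F = Q[:, None, :] * K[None, :, :]            # shape (N, N, d)
    C = (A[:, :, None] * F).sum(axis=1)          # combination result per feature vector
    return C + b + Xl                 # offset vector plus residual connection

rng = np.random.default_rng(0)
N, d = 3, 4
X0 = rng.normal(size=(N, d))
X1 = cross_level(X0, X0, rng.normal(size=(d, d)), np.zeros(d))
print(X1.shape)  # → (3, 4)
```

Stacking calls to `cross_level` (feeding each output back in as `Xl` while keeping `X0` fixed) produces the multi-level processing matrices described above.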
It is to be understood that FIG. 3 illustrates the feature cross processing of an arbitrary level l, a sub-step of step 22 in FIG. 2. Returning to step 22 of FIG. 2, each level of the multi-level feature cross processing is performed as shown in FIG. 3, so that the levels respectively produce the corresponding processing matrices X_1, X_2, …, X_m, where m is the number of levels of feature cross processing. Since each level applies, to each feature vector, nonlinear fusion between vectors and attention-based weighted combination, such as shown in formulas (9) to (11), the features undergo sufficient cross-combination operations, and the resulting high-order features do not depend linearly on the original feature matrix X_0.
Then, in step 23 of fig. 2, a characterization vector corresponding to the first service object is obtained according to the last stage processing matrix in the multi-stage processing matrix. In this step, the resulting matrix containing the high-order features is processed into a vector form for subsequent prediction.
Specifically, in one embodiment, the characterization vector is obtained by pooling the last-stage processing matrix. The pooling may include maximum pooling, average pooling, attention-based pooling, and the like. In another embodiment, the last-stage processing matrix may be processed into the characterization vector in other manners, such as vector concatenation.
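As one hedged illustration of this step (the function and mode names are assumptions, not from the patent; "concat" corresponds to the vector-concatenation alternative):

```python
import numpy as np

def to_characterization_vector(Xm, mode="mean"):
    """Collapse the last-stage (N, d) processing matrix into a characterization
    vector for subsequent prediction."""
    if mode == "max":
        return Xm.max(axis=0)          # maximum pooling over the N features
    if mode == "mean":
        return Xm.mean(axis=0)         # average pooling
    if mode == "concat":
        return Xm.reshape(-1)          # concatenate rows into one long vector
    raise ValueError(f"unknown pooling mode: {mode}")
```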
Then, in step 24, business prediction is performed on the first business object according to the above characterization vector. Specifically, a prediction function such as softmax may be applied to the characterization vector to obtain a business prediction result. In one embodiment, the business prediction result may be a classification result, i.e., the predicted classification of the first business object, such as a user category, a merchant category, or a binary result of whether a payment event is safe. In another embodiment, the business prediction result may be a regression value, i.e., a predicted score for the first business object, such as a rating score of an item, a security score of a payment event, or a recommendation score of a recommendation event.
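A minimal sketch of such a prediction head follows; the output weight matrix W_out, the task flag, and the function name are illustrative assumptions rather than the patent's definition:

```python
import numpy as np

def business_predict(h, W_out, task="classify"):
    """Apply a prediction function to the characterization vector h.

    h: (d,) characterization vector; W_out: (d, k) output weights for k classes,
    or (d, 1) for a regression score.
    """
    logits = h @ W_out
    if task == "classify":
        e = np.exp(logits - logits.max())
        return e / e.sum()             # softmax class probabilities
    return float(logits.squeeze())     # regression value (e.g. a rating score)
```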
Reviewing the above process: in the multi-stage feature cross processing, nonlinear fusion between feature vectors and attention-based weighted combination are adopted, so that sufficient cross-combination operations are performed between features and high-order features with more expressive power are obtained. Performing business prediction based on these high-order features can further improve prediction accuracy. Moreover, the attention-based combination provides a basis and a possibility for interpreting the business prediction results.
According to an embodiment of another aspect, a neural network model for business prediction of business objects is provided. Fig. 5 illustrates a schematic structural diagram of a neural network model that may be deployed in any device, platform, or cluster of devices having data storage, computing, processing capabilities, according to one embodiment. As shown in fig. 5, the neural network model 500 for business prediction of business objects includes:
the input layer 51 is configured to obtain an initial feature matrix corresponding to a first service object, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object;
a multi-level cross processing layer 52, configured to perform multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; each level of cross processing layer is configured to, for any ith feature vector in the current-level matrix to be processed, perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each feature vector in the current-level matrix respectively, to obtain an ith transformation vector and each current-level transformation vector; perform, according to the correlation degrees between the ith transformation vector and each current-level transformation vector, weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively; and determine, based on the combination result, the feature vector of the ith feature vector in the next-level processing matrix;
the pooling layer 53 is used for obtaining a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrix;
and the output layer 54 is configured to perform service prediction on the first service object according to the characterization vector.
According to one embodiment, the first business object is one of the following entity objects: a user, a merchant, a commodity, or an item to be recommended. Correspondingly, the N features include attribute features of the entity object.
According to another embodiment, the first business object is a business event, which includes one of the following: a payment event, a purchase event, a recommendation event, or a login event. Correspondingly, the N features include attribute features of each participant of the business event.
In one embodiment, the linear transformation performed in each level of the multi-level cross processing layer 52 specifically includes: performing linear transformation on the ith original vector and each current-level feature vector respectively, using the current-level parameter matrix corresponding to the current level, to obtain the ith transformation vector and each current-level transformation vector.
In another embodiment, the linear transformation in each level of the cross processing layer specifically includes: performing linear transformation on the ith original vector by using a first parameter matrix to obtain the ith transformation vector; and performing linear transformation on each current-level feature vector by using a second parameter matrix to obtain each current-level transformation vector.
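The two embodiments above differ only in which parameter matrices produce the transformation vectors; the second, two-matrix variant might be sketched as follows (shapes and names are assumptions):

```python
import numpy as np

def transform_two_matrices(x0_i, Xl, W1, W2):
    """Two-matrix variant: a first parameter matrix W1 transforms the ith
    original vector, and a separate second parameter matrix W2 transforms every
    current-level feature vector. Shapes: x0_i (d,), Xl (N, d), W1/W2 (d, d)."""
    q = x0_i @ W1          # ith transformation vector
    K = Xl @ W2            # current-level transformation vectors
    return q, K
```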
In one embodiment, the weighted combination performed in each level of the cross processing layer specifically includes:
determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors;
determining each weight factor corresponding to each current-level transformation vector according to each correlation degree;
respectively carrying out fusion operation on the ith transformation vector and each current-level transformation vector to obtain each fusion vector;
and performing weighted combination on each fusion vector according to each weight factor to obtain a combination result.
More specifically, in different examples, the correlation degree is determined by: calculating the cosine similarity between the ith transformation vector and each current-level transformation vector as the correlation degree; or calculating the inner product of the ith transformation vector and each current-level transformation vector as the correlation degree; or calculating the vector distance between the ith transformation vector and each current-level transformation vector and determining the correlation degree from the vector distance.
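These three options can be sketched as follows; the mapping from vector distance to correlation by negation is one possible choice, not specified by the patent, and the function name is assumed:

```python
import numpy as np

def correlation_degrees(q, K, mode="inner"):
    """Correlation degrees between the ith transformation vector q (d,) and the
    current-level transformation vectors K (N, d)."""
    if mode == "inner":
        return K @ q                                   # inner-product correlation
    if mode == "cosine":
        return (K @ q) / (np.linalg.norm(K, axis=1) * np.linalg.norm(q) + 1e-12)
    if mode == "distance":
        return -np.linalg.norm(K - q, axis=1)          # closer vectors → higher correlation
    raise ValueError(f"unknown mode: {mode}")
```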
In one embodiment, the fusion operation includes one of: element-wise (bit-wise) multiplication, summation, and averaging.
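These fusion options might be sketched as follows (names assumed); each option preserves the dimension of the ith transformation vector, as the scheme requires:

```python
import numpy as np

def fuse(q, k, mode="mul"):
    """Fuse the ith transformation vector q with one current-level
    transformation vector k; the result has the same dimension as q."""
    if mode == "mul":
        return q * k            # element-wise (bit-wise) multiplication
    if mode == "sum":
        return q + k            # element-wise summation
    if mode == "avg":
        return (q + k) / 2.0    # element-wise averaging
    raise ValueError(f"unknown mode: {mode}")
```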
According to one embodiment, each level of the interleaving layer is specifically configured to: on the basis of the combination result, adding an offset vector and the ith feature vector as a feature vector in a next-stage processing matrix.
According to one embodiment, the pooling layer 53 may be implemented by several fully connected layers. In a specific example, the pooling layer 53 may pool the last-stage processing matrix to obtain the characterization vector, where the pooling includes one of: maximal pooling, average pooling, attention-based pooling.
Through the neural network model, the characteristics of the business object are subjected to more effective cross combination processing, and higher-order characteristics with more expressive power are obtained, so that the accuracy of business prediction is improved.
According to an embodiment of yet another aspect, an apparatus for business prediction of a business object is provided, which may be implemented as any device, platform or cluster of devices having data storage, computing, processing capabilities. Fig. 6 shows a schematic block diagram of a traffic prediction apparatus according to an embodiment. As shown in fig. 6, the prediction apparatus 600 includes:
an obtaining unit 61, configured to obtain an initial feature matrix corresponding to a first service object, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object;
a plurality of cross processing units 62, configured to perform multi-stage feature cross processing on the initial feature matrix to obtain multi-stage processing matrices; each cross processing unit is configured to, for any ith feature vector in the current-level matrix to be processed, perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each feature vector in the current-level matrix respectively, to obtain an ith transformation vector and each current-level transformation vector; perform, according to the correlation degrees between the ith transformation vector and each current-level transformation vector, weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively; and determine, based on the combination result, the feature vector of the ith feature vector in the next-level processing matrix;
the pooling unit 63 is configured to obtain a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrices;
and the prediction unit 64 is configured to perform service prediction on the first service object according to the characterization vector.
Through the device, the characteristics of the business object are subjected to more effective cross combination processing, and high-order characteristics with more expressive power are obtained, so that the accuracy of business prediction is improved.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2 to 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor implementing the method described in conjunction with fig. 2-4 when executing the executable code.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.
Claims (22)
1. A method of business prediction for a business object, comprising:
acquiring an initial feature matrix corresponding to a first service object, wherein the initial feature matrix comprises N original vectors obtained by coding feature values of N features of the first service object;
performing multi-stage feature cross processing on the initial feature matrix to obtain multi-stage processing matrices, wherein each stage of processing comprises: for any ith feature vector in the current-level matrix to be processed, performing linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each feature vector in the current-level matrix respectively, to obtain an ith transformation vector and each current-level transformation vector; according to each weight factor determined based on each correlation degree between the ith transformation vector and each current-level transformation vector, performing attention-mechanism-based weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, wherein each fusion result has the same dimension as the ith transformation vector; and determining, based on the combination result, the feature vector of the ith feature vector in the next-stage processing matrix;
obtaining a representation vector corresponding to the first business object according to the last-stage processing matrix in the multi-stage processing matrix;
and performing service prediction on the first service object according to the characterization vector.
2. The method of claim 1, wherein,
the first business object is one of the following entity objects: a user, a merchant, a commodity, an item to be recommended; the N features include attribute features of the entity object.
3. The method of claim 1, wherein,
the first business object is a business event, and the business event comprises one of the following: payment events, purchase events, recommendation events, login events; the N characteristics comprise respective attribute characteristics of each participant of the business event.
4. The method of claim 1, wherein the performing linear transformation on the ith original vector of the corresponding position in the initial feature matrix and each level feature vector in the level matrix respectively comprises:
performing linear transformation on the ith original vector and each current-level feature vector respectively, using the current-level parameter matrix corresponding to the current level, to obtain the ith transformation vector and each current-level transformation vector.
5. The method of claim 1, wherein the performing linear transformation on the ith original vector of the corresponding position in the initial feature matrix and each level feature vector in the level matrix respectively comprises:
performing linear transformation on the ith original vector by using a first parameter matrix to obtain an ith transformation vector; and performing linear transformation on each level of feature vector by using a second parameter matrix to obtain each level of transformation vector.
6. The method according to claim 1, wherein the weighted combination of the fusion results of the ith transform vector and the respective current-level transform vectors according to respective weight factors determined based on respective correlation degrees between the ith transform vector and the respective current-level transform vectors comprises:
determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors;
determining each weight factor corresponding to each current-level transformation vector according to each correlation degree;
respectively carrying out fusion operation on the ith transformation vector and each current-level transformation vector to obtain each fusion vector;
and performing weighted combination on each fusion vector according to each weight factor to obtain a combination result.
7. The method of claim 6, wherein determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors comprises:
calculating the cosine similarity of the ith transformation vector and each current-level transformation vector as the correlation degree; or,
calculating the inner product of the ith transformation vector and each current-level transformation vector as the correlation degree; or,
and calculating the vector distance between the ith transformation vector and each current-stage transformation vector, and determining the correlation degree according to the vector distance.
8. The method of claim 6, wherein the fusion operation comprises one of:
element-wise (bit-wise) multiplication, summation, and averaging.
9. The method of claim 1, wherein determining the eigenvector of the ith eigenvector in the next-level processing matrix based on the combined result comprises:
on the basis of the combination result, adding an offset vector and the ith feature vector as a feature vector in a next-stage processing matrix.
10. The method of claim 1, wherein obtaining the characterization vector corresponding to the first service object according to the last processing matrix in the multi-stage processing matrices comprises:
pooling the final-stage processing matrix to obtain the characterization vector, wherein the pooling comprises one of the following steps: maximal pooling, average pooling, attention-based pooling.
11. A neural network model for business prediction for a business object, comprising:
the system comprises an input layer and a service object, wherein the input layer is used for acquiring an initial feature matrix corresponding to a first service object, and the initial feature matrix comprises N original vectors obtained by coding feature values of N features of the first service object;
the multi-level cross processing layer is used for performing multi-level feature cross processing on the initial feature matrix to obtain multi-level processing matrices; each level of cross processing layer is configured to, for any ith feature vector in the current-level matrix to be processed, perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and on each feature vector in the current-level matrix respectively, to obtain an ith transformation vector and each current-level transformation vector; according to each weight factor determined based on each correlation degree between the ith transformation vector and each current-level transformation vector, perform attention-mechanism-based weighted combination on the fusion results obtained by fusing the ith transformation vector with each current-level transformation vector respectively, wherein each fusion result has the same dimension as the ith transformation vector; and determine, based on the combination result, the feature vector of the ith feature vector in the next-level processing matrix;
the pooling layer is used for obtaining a characterization vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrices;
and the output layer is used for carrying out service prediction on the first service object according to the characterization vector.
12. The neural network model of claim 11,
the first business object is one of the following entity objects: a user, a merchant, a commodity, an item to be recommended; the N features include attribute features of the entity object.
13. The neural network model of claim 11,
the first business object is a business event, and the business event comprises one of the following: payment events, purchase events, recommendation events, login events; the N characteristics comprise respective attribute characteristics of each participant of the business event.
14. The neural network model of claim 11, wherein the linear transformation performed in each level of the cross-processing layer specifically comprises:
performing linear transformation on the ith original vector and each current-level feature vector respectively, using the current-level parameter matrix corresponding to the current level, to obtain the ith transformation vector and each current-level transformation vector.
15. The neural network model of claim 11, wherein the linear transformation performed in each level of the cross-processing layer specifically comprises:
performing linear transformation on the ith original vector by using a first parameter matrix to obtain an ith transformation vector; and performing linear transformation on each level of feature vector by using a second parameter matrix to obtain each level of transformation vector.
16. The neural network model of claim 11, wherein the weighted combination performed in each level of the cross-processing layer specifically comprises:
determining respective degrees of correlation between the ith transform vector and the respective present-level transform vectors;
determining each weight factor corresponding to each current-level transformation vector according to each correlation degree;
respectively carrying out fusion operation on the ith transformation vector and each current-level transformation vector to obtain each fusion vector;
and performing weighted combination on each fusion vector according to each weight factor to obtain a combination result.
17. The neural network model of claim 16, wherein determining the respective correlation degrees specifically comprises:
calculating the cosine similarity of the ith transformation vector and each current-level transformation vector as the correlation degree; or,
calculating the inner product of the ith transformation vector and each current-level transformation vector as the correlation degree; or,
and calculating the vector distance between the ith transformation vector and each current-stage transformation vector, and determining the correlation degree according to the vector distance.
18. The neural network model of claim 16, wherein the fusion operation comprises one of:
element-wise (bit-wise) multiplication, summation, and averaging.
19. The neural network model of claim 11, wherein each level of intersection processing layer is specifically configured to:
on the basis of the combination result, adding an offset vector and the ith feature vector as a feature vector in a next-stage processing matrix.
20. The neural network model of claim 11, wherein the pooling layer is specifically configured to:
pooling the final-stage processing matrix to obtain the characterization vector, wherein the pooling comprises one of the following steps: maximal pooling, average pooling, attention-based pooling.
21. An apparatus for business prediction for a business object, comprising:
an obtaining unit, configured to obtain an initial feature matrix corresponding to a first service object, where the initial feature matrix includes N original vectors obtained by encoding feature values of N features of the first service object;
the multiple cross processing units are configured to perform multi-stage feature cross processing on the initial feature matrix to obtain a multi-stage processing matrix; each cross processing unit is configured to perform linear transformation on the ith original vector at the corresponding position in the initial feature matrix and each ith feature vector in the current-level matrix to obtain an ith transformation vector and each current-level transformation vector for any ith feature vector in the current-level matrix to be processed; according to weight factors determined based on the correlation degrees between the ith transformation vector and the corresponding transformation vectors, performing weighted combination based on an attention mechanism on the fusion results obtained by fusing the ith transformation vector and the corresponding transformation vectors, and determining the feature vector of the ith feature vector in a next-stage processing matrix based on the combination results, wherein the fusion results have the same dimension as the ith transformation vector;
the pooling unit is configured to obtain a representation vector corresponding to the first service object according to the last-stage processing matrix in the multi-stage processing matrix;
and the prediction unit is configured to perform service prediction on the first service object according to the characterization vector.
22. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010329614.XA CN111222722B (en) | 2020-04-24 | 2020-04-24 | Method, neural network model and device for business prediction for business object |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111222722A CN111222722A (en) | 2020-06-02 |
CN111222722B true CN111222722B (en) | 2020-07-24 |
Family
ID=70831712
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010329614.XA Active CN111222722B (en) | 2020-04-24 | 2020-04-24 | Method, neural network model and device for business prediction for business object |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111222722B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255908B (en) * | 2021-05-27 | 2023-04-07 | 支付宝(杭州)信息技术有限公司 | Method, neural network model and device for service prediction based on event sequence |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110751285A (en) * | 2018-07-23 | 2020-02-04 | 第四范式(北京)技术有限公司 | Training method and system and prediction method and system of neural network model |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11348018B2 (en) * | 2017-12-19 | 2022-05-31 | Aspen Technology, Inc. | Computer system and method for building and deploying models predicting plant asset failure |
CN110751261A (en) * | 2018-07-23 | 2020-02-04 | 第四范式(北京)技术有限公司 | Training method and system and prediction method and system of neural network model |
CN110046304B (en) * | 2019-04-18 | 2022-12-13 | 腾讯科技(深圳)有限公司 | User recommendation method and device |
CN110263973B (en) * | 2019-05-15 | 2024-02-02 | 创新先进技术有限公司 | Method and device for predicting user behavior |
CN110929206B (en) * | 2019-11-20 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Click rate estimation method and device, computer readable storage medium and equipment |
- 2020-04-24 CN CN202010329614.XA patent/CN111222722B/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110751285A (en) * | 2018-07-23 | 2020-02-04 | 第四范式(北京)技术有限公司 | Training method and system and prediction method and system of neural network model |
Non-Patent Citations (2)
Title |
---|
Weiping Song et al., "AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks," CIKM '19, 2019-08-23 (full text) * |
Wentao Ouyang et al., "Deep Spatio-Temporal Neural Networks for Click-Through Rate Prediction," KDD '19, 2019-07-19 (full text) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109785062B (en) | Hybrid neural network recommendation system based on collaborative filtering model | |
CN112800342B (en) | Recommendation method, system, computer device and storage medium based on heterogeneous information | |
CN110598118A (en) | Resource object recommendation method and device and computer readable medium | |
CN111008335B (en) | Information processing method, device, equipment and storage medium | |
WO2022152161A1 (en) | Training and prediction of hybrid graph neural network model | |
CN111737578A (en) | Recommendation method and system | |
CN113255908B (en) | Method, neural network model and device for service prediction based on event sequence | |
CN112633927B (en) | Combined commodity mining method based on knowledge graph rule embedding | |
CN111177577B (en) | Group project recommendation method, intelligent terminal and storage device | |
CN115859199A (en) | Medical insurance fraud detection method and embedded vector generation method, device and medium thereof | |
CN111222722B (en) | Method, neural network model and device for business prediction for business object | |
CN115482141A (en) | Image processing method, image processing device, electronic equipment and storage medium | |
CN110502701B (en) | Friend recommendation method, system and storage medium introducing attention mechanism | |
CN113779380A (en) | Cross-domain recommendation method, device and equipment, and content recommendation method, device and equipment | |
JP7414357B2 (en) | Text processing methods, apparatus, devices and computer readable storage media | |
CN114491086A (en) | Clothing personalized matching recommendation method and system, electronic equipment and storage medium | |
CN114996566A (en) | Intelligent recommendation system and method for industrial internet platform | |
CN112734519B (en) | Commodity recommendation method based on convolution self-encoder network | |
CN114817758A (en) | Recommendation system method based on NSGC-GRU integrated model | |
CN112132345A (en) | Method and device for predicting user information of electric power company, electronic equipment and storage medium | |
CN113850616A (en) | Customer life cycle value prediction method based on depth map neural network | |
CN112559640A (en) | Training method and device of atlas characterization system | |
CN117859139A (en) | Multi-graph convolution collaborative filtering | |
CN111445282B (en) | Service processing method, device and equipment based on user behaviors | |
CN117252665B (en) | Service recommendation method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||