CN113159101A - Model obtaining method, user processing method, device and electronic equipment - Google Patents


Info

Publication number: CN113159101A
Authority: CN (China)
Prior art keywords: training data, sub, model, feature representation, user
Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202110201867.3A
Other languages: Chinese (zh)
Inventors: 熊涛, 江曼, 洪星芸
Current assignee: Beijing Sankuai Online Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Beijing Sankuai Online Technology Co Ltd
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN202110201867.3A
Publication of CN113159101A
Current legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches


Abstract

The embodiment of the invention provides a model obtaining method, a user processing method, a device and electronic equipment. Training data comprising a plurality of types of sub-training data corresponding to a sample user, with different sub-training data corresponding to different themes, are input into a model to be trained; based on a first self-attention layer included in the model to be trained, the theme weight corresponding to each sub-training data is determined according to the feature representation of each sub-training data; the feature representation of the training data is obtained according to the feature representation of each sub-training data and the corresponding theme weight; and the model to be trained is trained according to the feature representation of the training data to obtain a user processing model. Therefore, a plurality of sub-models do not need to be split out and trained separately: the sub-training data under a plurality of themes are fused through theme weights for training, and the finally required user processing model can be obtained based on the data of the plurality of themes, so that the training process can be simplified to a certain extent and the model maintenance cost reduced.

Description

Model obtaining method, user processing method, device and electronic equipment
Technical Field
The invention belongs to the technical field of networks, and particularly relates to a model acquisition method, a user processing method, a device and electronic equipment.
Background
Currently, it is often necessary to obtain models to perform a wide variety of processing on users. For example, a credit evaluation model may be obtained to perform credit evaluation on a user, or an order-rate prediction model may be obtained to predict a user's order rate, and so on. Training data are generally used when obtaining such a model, but the training data may include data of a plurality of themes whose coverage of the sample users differs. Therefore, how to obtain a model using data of multiple themes has become a problem of great concern.
In the prior art, data of different themes are often used to separately train corresponding sub-models, and a final model is then obtained based on the plurality of sub-models. In this mode, the training process is complicated, and because the sub-models are split out and trained separately, multiple models need to be maintained simultaneously afterwards, so the maintenance cost is high.
Disclosure of Invention
The invention provides a model acquisition method, a user processing method, a device and electronic equipment, and aims to solve the problems of a complicated training process and high maintenance cost.
In a first aspect, the present invention provides a method for model acquisition, the method comprising:
inputting training data into a model to be trained; the training data comprises a plurality of types of sub-training data corresponding to the sample user, and different sub-training data correspond to different subjects;
determining a theme weight corresponding to each sub-training data according to the feature representation of each sub-training data based on a first self-attention layer included in the model to be trained;
acquiring feature representation of the training data according to the feature representation of each sub-training data and the corresponding theme weight;
and training the model to be trained according to the feature representation of the training data to obtain a user processing model.
In a second aspect, the present invention provides a user processing method, including:
inputting user related data of a user to be predicted into a pre-trained user processing model; the user related data comprises a plurality of sub related data corresponding to the user to be predicted, and different sub related data correspond to different subjects;
acquiring the output of the user processing model to obtain the target information of the user to be predicted;
wherein the user process model is generated based on the model acquisition method of any one of claims 1 to 8.
In a third aspect, the present invention provides a model acquisition apparatus, comprising:
the input module is used for inputting training data into a model to be trained; the training data comprises a plurality of types of sub-training data corresponding to the sample user, and different sub-training data correspond to different subjects;
a first determining module, configured to determine, based on a first self-attention layer included in the model to be trained, a theme weight corresponding to each sub-training data according to a feature representation of each sub-training data;
the first acquisition module is used for acquiring the feature representation of the training data according to the feature representation of each sub-training data and the corresponding theme weight;
and the training module is used for training the model to be trained according to the feature representation of the training data so as to obtain a user processing model.
In a fourth aspect, the present invention provides a user processing apparatus, comprising:
the input module is used for inputting the user related data of the user to be predicted into the pre-trained user processing model; the user related data comprises a plurality of sub related data corresponding to the user to be predicted, and different sub related data correspond to different subjects;
the acquisition module is used for acquiring the output of the user processing model so as to obtain the target information of the user to be predicted;
wherein the user processing model is generated based on the model obtaining device.
In a fifth aspect, the present invention provides an electronic device, comprising: a processor, a memory and a computer program stored on the memory and executable on the processor, the processor implementing the above method when executing the program.
In a sixth aspect, the present invention provides a readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the above-described method.
In the embodiment of the invention, training data are input into a preset model to be trained, where the training data comprise a plurality of types of sub-training data corresponding to a sample user and different sub-training data correspond to different themes. Based on a first self-attention layer included in the model to be trained, the theme weight corresponding to each sub-training data is determined according to the feature representation of each sub-training data; the feature representation of the training data is obtained according to the feature representation of each sub-training data and the corresponding theme weight; and the model to be trained is trained according to the feature representation of the training data to obtain a user processing model. Therefore, a plurality of sub-models do not need to be split out and trained separately: the sub-training data under a plurality of themes are fused through theme weights for training, so that the finally required user processing model can be obtained based on the data of the plurality of themes. Thus, the training process can be simplified to a certain extent, and the subsequent model maintenance cost is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating steps of a model acquisition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a mapping provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of a theme weight process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a process provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a model structure provided by an embodiment of the invention;
FIG. 6 is a schematic diagram of another model structure provided by the embodiment of the invention;
FIG. 7 is a flowchart illustrating steps of a method for processing a user according to an embodiment of the present invention;
fig. 8 is a structural diagram of a model acquisition apparatus according to an embodiment of the present invention;
FIG. 9 is a block diagram of a user processing device according to an embodiment of the present invention;
fig. 10 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of steps of a model obtaining method according to an embodiment of the present invention, and as shown in fig. 1, the method may include:
step 101, inputting training data into a model to be trained; the training data comprises a plurality of types of sub-training data corresponding to the sample user, and different sub-training data correspond to different subjects.
In the embodiment of the present invention, the model to be trained may be selected according to actual requirements; for example, it may be a neural network model. The sample user may be a user selected from the platform according to actual needs, and the training data may be user-related data of the sample user; the user-related data may include a plurality of sub-related data corresponding to the sample user, and each sub-related data is one sub-training data. The themes corresponding to the sub-training data may be divided in advance. For example, the training data may be divided into sub-training data under different themes according to data attributes. The data attributes can be set according to actual requirements, for example, according to the data acquisition channel, according to the type of the data, and so on. Taking the example that the user processing model is used for predicting the credit degree of a user, the themes corresponding to the sub-training data may include: transaction browsing behavior in the platform, fulfillment behavior of loans and repayments in the platform, default behavior, bank operation behavior, and the like. Accordingly, the sub-training data may correspond to: transaction browsing behavior data, fulfillment behavior data, default behavior data, and bank operation behavior data. The coverage of users by data under different themes differs; for example, there tend to be more users with transaction browsing behavior and fewer users with default behavior, so the coverage of transaction browsing behavior data tends to be greater than that of default behavior data. Alternatively, taking the example that the user processing model is used for predicting a user's order rate, the themes corresponding to the sub-training data may include the user's merchant browsing behavior, order placing behavior, order canceling behavior, and the like in the platform, and the sub-training data may correspondingly include merchant browsing behavior data, order placing behavior data, and order canceling behavior data.
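By way of a purely illustrative sketch (the theme names, feature dimensions, and label encoding below are assumptions, not taken from this disclosure), a sample user's training data under a plurality of themes might be organized as follows in Python:

```python
# Hypothetical per-theme sub-training data for one sample user; the theme
# names, dimensions, and label encoding are illustrative assumptions.
import torch

sub_training_data = {
    "transaction_browsing": torch.randn(64),  # high-coverage theme
    "loan_fulfillment":     torch.randn(16),
    "default_behavior":     torch.randn(8),   # low-coverage theme
    "bank_operations":      torch.randn(32),
}
label = torch.tensor([1.0])  # e.g. 1.0 = creditworthy, 0.0 = not (assumed)
```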
And 102, determining the theme weight corresponding to each sub-training data according to the feature representation of each sub-training data based on the first self-attention layer included in the model to be trained.
In the embodiment of the present invention, the specific structure of the model to be trained may be set according to actual requirements; in particular, the model to be trained may include a first self-attention layer. After the sub-training data are input into the model to be trained, the model can obtain the feature representation of each sub-training data and then assign a theme weight to each sub-training data. The feature representation of a sub-training data may be the feature vector obtained after vectorizing that sub-training data. The first self-attention layer can determine the importance of each sub-training data according to the characteristics of each sub-training data in the training data, thereby determining the theme weight corresponding to each sub-training data.
And 103, acquiring the feature representation of the training data according to the feature representation of each sub-training data and the corresponding theme weight.
In the embodiment of the invention, the training data is composed of a plurality of sub-training data, so that the feature representation of the training data can be generated based on the feature representation of each sub-training data and the corresponding theme weight. For example, the sub-training data in the training data may be fused according to the feature representation of each sub-training data and the corresponding theme weight, so as to obtain the feature representation of the training data.
And 104, training the model to be trained according to the feature representation of the training data to obtain a user processing model.
In the embodiment of the invention, the sub-training data under a plurality of topics are fused through the topic weights and are used as the integral feature representation of the training data. Therefore, the model to be trained is trained through the characteristic representation of the training data, so that the model to be trained can fully learn the information of the data of various themes and the cross information among different themes to a certain extent, and the prediction effect of the user processing model obtained through training can be further improved to a certain extent.
In summary, in the model obtaining method provided by the embodiment of the present invention, training data are input into a preset model to be trained, where the training data include a plurality of types of sub-training data corresponding to a sample user and different sub-training data correspond to different themes. Based on a first self-attention layer included in the model to be trained, the theme weight corresponding to each sub-training data is determined according to the feature representation of each sub-training data; the feature representation of the training data is obtained according to the feature representation of each sub-training data and the corresponding theme weight; and the model to be trained is trained according to the feature representation of the training data to obtain a user processing model. Therefore, a plurality of sub-models do not need to be split out and trained separately: the sub-training data under a plurality of themes are fused through the theme weights for training, so that the finally required user processing model can be obtained based on the data of the plurality of themes. Thus, the training process can be simplified to a certain extent, and the subsequent model maintenance cost is reduced.
Meanwhile, because the coverage of data of different themes differs, the degree to which data of different themes influence the model output also differs, and in a mode that splits the data of different themes and builds sub-models separately, the cross information between data of different themes cannot be utilized, so the prediction effect of a model trained that way is poor. In the embodiment of the invention, the sub-training data under a plurality of themes are fused through the theme weights for training, so that the model can fully learn the information of the data of the plurality of themes and the cross information among different themes, and the prediction effect of the trained model can be further improved to a certain extent.
Optionally, the model obtaining method in the embodiment of the present invention may further include the following steps:
and step S21, determining the target time length of the data generation time of each sub-training data from the current time.
In this step, the pre-recorded data generation time of each sub-training data may be read first, and the difference between the data generation time and the current time may then be calculated as the target duration. The data generation time may be the data acquisition time of the sub-training data, and the sub-training data may carry the target duration.
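A minimal sketch of step S21, assuming each sub-training datum carries a recorded generation (acquisition) timestamp; the function name and the day granularity are illustrative:

```python
# Target duration: elapsed time from a datum's generation time to now.
import time
from typing import Optional

def target_duration_days(generation_ts: float, now_ts: Optional[float] = None) -> float:
    now = now_ts if now_ts is not None else time.time()
    return max(0.0, (now - generation_ts) / 86400.0)  # seconds -> days
```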
Step S22, determining the freshness weight of each sub-training data according to the target duration of each sub-training data; wherein the freshness weight is negatively correlated with the target duration, so that fresher data receives a greater weight.
In an actual scene, the generation periods of data of different themes differ. For example, browsing behavior data are often updated quickly, while, due to query cost and other constraints, a customer's bank credit data are often updated more slowly. This results in different target durations for different sub-training data. Newer data tend to be more trustworthy, while older data tend to be less trustworthy. Therefore, in the embodiment of the present invention, weighting may be performed according to the duration from the acquisition time of each theme's data to the present: data of a theme whose duration is longer is given a lower weight, and fresher data is given a greater weight; that is, the freshness weight is determined according to the target duration.
Correspondingly, the step of obtaining the feature representation of the training data according to the feature representation of each sub-training data and the corresponding theme weight may specifically include:
step 1031, obtaining the feature representation of the training data according to the feature representation, the corresponding theme weight and the freshness weight of each sub-training data.
In this step, the theme weight and the freshness weight corresponding to the sub-training data may represent the degree of contribution of the feature representation of the sub-training data to the feature representation of the training data, and the larger the theme weight and the freshness weight corresponding to the sub-training data are, the larger the proportion of the feature representation of the sub-training data in the feature representation of the training data may be.
In an actual scene, newer data are more credible and older data less so. In the embodiment of the invention, the freshness weight of each sub-training data is determined according to the target duration from its data generation time to the current time, and the freshness weights are further combined in the process of obtaining the feature representation of the training data. In this way, a higher freshness weight is allocated to the newer sub-training data, increasing its proportion in the feature representation of the training data, so that the finally obtained feature representation of the training data is more representative. Because this feature representation is subsequently used for training, the user processing model obtained by the final training trusts the latest data more when predicting, so the latest data have a greater effect on model prediction. To a certain extent, the user processing model can therefore capture the latest feature representation of the user to be predicted more sensitively, improving prediction precision and reducing the probability of misjudgment. Further, by reducing the probability of misjudgment, the loss caused by misjudgment can be reduced to some extent.
Optionally, the model to be trained in the embodiment of the present invention may further include a full connection layer. Accordingly, before obtaining the feature representation of the training data according to the feature representation of each sub-training data and the corresponding theme weight, the following steps may be further performed:
step S31, vectorize each of the sub-training data of any of the sample users to obtain an initial feature representation of each of the sub-training data.
In this step, a preset vectorization tool may be selected according to actual requirements to vectorize the sub-training data. The preset vectorization tool may be an algorithm for converting discrete variables into continuous vector representations. Further, since the dimensions of the sub-training data of different themes are not consistent, the dimensions of the acquired initial feature representations of the sub-training data may differ.
And step S32, mapping the initial feature representation of each sub-training data to a specified dimension by using the full connection layer to obtain the feature representation of each sub-training data.
In the embodiment of the present invention, the fully connected layer may in essence be a local fully connected neural network. By passing through a local fully connected neural network, embedding of the initial feature representations of the sub-training data of different themes can be realized; that is, the initial feature representations of the sub-training data of different themes are mapped to a specified dimension, so that they are characterized in a feature space of the same dimension.
Further, the specified dimension may be set according to actual requirements and is not limited in the embodiment of the present invention. For example, denote the specified dimension by h, and assume the initial feature representation corresponding to the sub-training data of the i-th theme is (x_{i1}, x_{i2}, …, x_{im}), where m is the dimension of that initial feature representation. The mapping then converts (x_{i1}, x_{i2}, …, x_{im}) into an h-dimensional feature representation. Fig. 2 is a mapping schematic diagram provided by an embodiment of the present invention. As shown in fig. 2, for j = 1 to h, the weighted sum of the elements of the initial feature representation and their corresponding j-th-dimension weights is calculated to obtain the j-th element of the mapped feature representation, that is, e_{ij} = x_{i1}·w_{i1j} + x_{i2}·w_{i2j} + … + x_{im}·w_{imj}. The mapped feature representation may be written as (e_{i1}, e_{i2}, …, e_{ih}), and the weights include w_{i11} to w_{im1} for dimension 1, w_{i12} to w_{im2} for dimension 2, …, and w_{i1h} to w_{imh} for dimension h. These weights correspond to the connection weights in the fully connected layer, that is, they are parameters of the fully connected layer and can therefore be continuously optimized as the model is trained.
In the embodiment of the invention, each sub-training data of any sample user is vectorized to obtain its initial feature representation, and the initial feature representation of each sub-training data is then mapped to the specified dimension by the fully connected layer to obtain its feature representation. By mapping the feature representations of all sub-training data to the same specified dimension, subsequent calculation is facilitated to a certain extent; moreover, the problem of excessive model processing complexity caused by an overly large dimension is avoided to a certain extent, as is the problem of the model learning too little information because the dimension is too small.
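The following sketch illustrates steps S31-S32 under the assumption that the model is built in PyTorch: each theme's initial feature representation, with its own dimension m, is mapped to the shared specified dimension h by a per-theme fully connected layer whose connection weights play the role of the w_{ilj} parameters described above.

```python
# Per-theme fully connected mapping to the specified dimension h; a sketch
# under assumed names, not this disclosure's reference implementation.
import torch
import torch.nn as nn

class ThemeEmbedding(nn.Module):
    def __init__(self, theme_dims: dict, h: int):
        super().__init__()
        # one local fully connected layer per theme: R^{m_i} -> R^h
        self.proj = nn.ModuleDict(
            {theme: nn.Linear(m, h) for theme, m in theme_dims.items()}
        )

    def forward(self, sub_data: dict) -> dict:
        # each output element e_ij is a weighted sum of the initial features
        # under the layer's connection weights
        return {theme: self.proj[theme](x) for theme, x in sub_data.items()}
```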
Optionally, the model to be trained in the embodiment of the present invention may further include a second self-attention layer. Correspondingly, the step of determining the freshness weight of each of the sub-training data according to the target duration of each of the sub-training data may specifically include:
step S22a, regarding any one of the sub-training data, using the second self-attention layer, and taking the target duration as an input of a preset freshness weight formula, and taking an output of the preset freshness weight formula as a freshness weight of the sub-training data; the preset freshness weight formula comprises a target coefficient, and the target coefficient is a parameter in the second self-attention layer.
In this step, the specific content of the preset freshness weight formula can be set according to actual requirements, and the preset freshness weight formula can be set in the second self-attention layer. For example, the preset freshness weight formula can be expressed as:
(The preset freshness weight formula appears only as an image in the source; it computes the freshness weight f_i from the target duration t_i using the target coefficients v_i and b_i.)
where f_i represents the freshness weight of the sub-training data, t_i represents the target duration of the sub-training data, and v_i and b_i represent the target coefficients. The target coefficients, as parameters in the second self-attention layer, can be continuously adjusted and optimized in the parameter-tuning step of the training process; after the training of the model to be trained is completed, the finally obtained user processing model will have learned suitable values for the target coefficients. Meanwhile, the sub-training data of each theme may differ across training rounds, so adjusting the target coefficients in different rounds allows them to be tuned and optimized for the different themes.
In the embodiment of the invention, the freshness weight is determined for each sub-training data based on the preset freshness weight formula by utilizing the second self-attention layer in the model to be trained, so that the accuracy of the freshness weight can be ensured to a certain extent, and the accuracy of the feature representation of the subsequent training data generated based on the freshness weight is further ensured.
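Because the formula itself is shown only as an image in the source, the sketch below assumes one plausible form consistent with the surrounding description: a sigmoid of the target duration with learnable coefficients v_i and b_i, arranged so that the freshness weight decreases as the target duration grows.

```python
# Second self-attention layer (freshness weighting); the sigmoid form is an
# assumption that keeps f_i in (0, 1) and decreasing in t_i.
import torch
import torch.nn as nn

class FreshnessWeight(nn.Module):
    def __init__(self, themes: list):
        super().__init__()
        self.v = nn.ParameterDict({t: nn.Parameter(torch.ones(1)) for t in themes})
        self.b = nn.ParameterDict({t: nn.Parameter(torch.zeros(1)) for t in themes})

    def forward(self, durations: dict) -> dict:
        # durations: {theme: scalar tensor t_i}; returns {theme: f_i}
        return {
            t: torch.sigmoid(-(self.v[t] * ti + self.b[t]))
            for t, ti in durations.items()
        }
```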
Optionally, the step of obtaining the feature representation of the training data according to the feature representation, the corresponding theme weight, and the freshness weight of each sub-training data may specifically include:
step 1031a, for any one of the sub-training data, calculating a first product between the feature representation of the sub-training data and the theme weight corresponding to the sub-training data.
For example, assume that the theme weight corresponding to the sub-training data of the i-th theme is a_i. The first product can then be written as a_i · (e_{i1}, e_{i2}, …, e_{ih}).
Step 1031b, calculating a second product between the first product and the freshness weight of the sub-training data.
For example, assume that the freshness weight corresponding to the sub-training data of the i-th theme is f_i. The second product can then be written as f_i · a_i · (e_{i1}, e_{i2}, …, e_{ih}). Thus, by weighting the feature representation (e_{i1}, e_{i2}, …, e_{ih}) of the sub-training data of the i-th theme by both the freshness weight and the theme weight, the second product can, to a certain extent, more accurately represent the freshness of the i-th theme's sub-training data and the information richness of the data under that theme.
And step 1031c, splicing the second products of the sub-training data to obtain the feature representation of the training data.
Specifically, the second products of the sub-training data may be concatenated in sequence to obtain the feature representation of the training data. For example, assuming there are k themes, the feature representation of the training data obtained after splicing may be [f_1·a_1·(e_{11}, e_{12}, …, e_{1h})] [f_2·a_2·(e_{21}, e_{22}, …, e_{2h})] … [f_k·a_k·(e_{k1}, e_{k2}, …, e_{kh})]. Of course, other splicing orders may be used, as long as the second products of all sub-training data are present in the feature representation of the training data.
In the embodiment of the invention, a first product between the feature representation of the sub-training data and the theme weight corresponding to the sub-training data is calculated, then a second product between the first product and the freshness weight of the sub-training data is calculated, and finally the second products of all the sub-training data are spliced into the feature representation of the training data. Therefore, the sub-training data are weighted through the theme weight and the freshness weight, the weighting result is spliced to be used as the feature representation of the training data, the obtained feature representation of the training data can have information of representing different sub-training data with emphasis to a certain degree, and the training effect based on the feature representation of the training data can be ensured to a certain degree.
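Steps 1031a to 1031c can be sketched in a few lines, again assuming PyTorch tensors and illustrative names:

```python
# Weight each theme's feature representation by its theme weight a_i and
# freshness weight f_i (the first and second products), then concatenate
# the second products into the feature representation of the training data.
import torch

def fuse(feats: dict, theme_w: dict, fresh_w: dict, themes: list) -> torch.Tensor:
    second_products = [fresh_w[t] * theme_w[t] * feats[t] for t in themes]
    return torch.cat(second_products)  # shape (k * h,) for k themes
```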
Optionally, the step of determining the theme weight corresponding to each piece of sub-training data according to the feature representation of each piece of sub-training data based on the first self-attention layer may specifically include:
step 1021, for any one of the sub-training data, determining a self-attention weight corresponding to each dimension feature in the feature representation based on the first self-attention layer.
In this step, the first self-attention layer may essentially be a local fully connected network using a self-attention mechanism. The first self-attention layer may determine, based on the self-attention mechanism, the self-attention weight corresponding to each dimension feature in the feature representation of the sub-training data. For example, for the feature representation (e_{i1}, e_{i2}, …, e_{ih}) of a sub-training data, the corresponding self-attention weights may be represented as (W'_{i1}, W'_{i2}, …, W'_{ih}). The self-attention weights may be parameters set in the first self-attention layer and are continuously adjusted and optimized during the training of the model.
And 1022, calculating a feature weighted sum corresponding to the sub-training data according to each dimension feature and the self-attention weight corresponding to each dimension feature.
By way of example, e_{i1}·W'_{i1} + e_{i2}·W'_{i2} + … + e_{ih}·W'_{ih} may be calculated to obtain the feature weighted sum.
And 1023, determining the theme weight corresponding to each sub-training data according to the feature weighted sum corresponding to each sub-training data.
In the embodiment of the invention, theme data that have a larger influence on the model prediction result can be given a larger weight, and theme data with a lower influence on the model prediction result can be given a smaller weight. In one implementation, the feature weighted sum may be directly used as the theme weight corresponding to the sub-training data. In this way, the theme weight is obtained without extra operations, which simplifies the processing flow to a certain extent. For example, fig. 3 is a schematic diagram of theme weight processing provided by an embodiment of the present invention; as shown in fig. 3, the theme weight a_i may be obtained by weighted summation over each dimension feature and its corresponding self-attention weight.
Further, in another implementation manner, the model to be trained may further include a softmax layer. When determining the theme weight corresponding to each sub-training data according to the feature weighted sum corresponding to each sub-training data, the feature weighted sums may be input to the softmax layer, the softmax layer may be used to determine the normalized value corresponding to each sub-training data, and the normalized value corresponding to each sub-training data may be determined as its theme weight. The softmax layer may be used to map inputs to real numbers within a specified interval (e.g., 0 to 1). In this way, by setting a softmax layer and normalizing the feature weighted sum corresponding to each sub-training data to serve as its theme weight, the finally obtained theme weights are normalized within a specific range. Moreover, because the softmax layer scales its outputs according to the relative sizes of its inputs, the theme weights of the sub-training data can be made more balanced to a certain extent.
For example, taking a user processing model used for credit evaluation, when the training data contain sub-training data with weak financial attributes whose theme weight is low, processing by the softmax layer can raise the theme weight of that weak-financial-attribute sub-training data to a certain extent, so that the model obtains more sufficient information and pays more attention to the data with weak financial attributes; this can ensure the prediction effect of the later model to a certain extent and improve the risk evaluation capability of the model. It should be noted that, in the embodiment of the present invention, each freshness weight may also be input into a softmax layer, that is, a softmax layer may be connected to the second self-attention layer and used to determine the normalized value corresponding to each freshness weight, and each freshness weight may then be updated to its corresponding normalized value.
In the embodiment of the invention, the self-attention weight corresponding to each dimension feature in the feature representation is determined through a self-attention mechanism, then the feature weighted sum corresponding to the sub-training data is calculated according to each dimension feature and the self-attention weight corresponding to each dimension feature, and the theme weight corresponding to each sub-training data is determined according to the feature weighted sum corresponding to each sub-training data. Therefore, the characteristics of the sub-training data can be fully combined, more important characteristics are given more weight, and the accuracy of the theme weight corresponding to each sub-training data can be further ensured to a certain extent.
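A sketch of the first self-attention layer (steps 1021 to 1023), including the optional softmax normalization over themes described above; the parameterization, one learnable weight W'_{ij} per feature dimension per theme, is an assumption consistent with the text:

```python
# First self-attention layer: per-dimension self-attention weights, a
# feature weighted sum per theme, and a softmax over themes so that the
# theme weights a_i fall in a normalized range.
import torch
import torch.nn as nn

class ThemeAttention(nn.Module):
    def __init__(self, themes: list, h: int):
        super().__init__()
        # (W'_{i1}, ..., W'_{ih}) for each theme i
        self.attn = nn.ParameterDict(
            {t: nn.Parameter(torch.randn(h)) for t in themes}
        )

    def forward(self, feats: dict) -> dict:
        # feature weighted sum: sum_j e_ij * W'_ij for each theme
        raw = {t: (e * self.attn[t]).sum() for t, e in feats.items()}
        order = list(raw)
        normed = torch.softmax(torch.stack([raw[t] for t in order]), dim=0)
        return {t: normed[k] for k, t in enumerate(order)}
```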
Optionally, when the model to be trained is trained according to the feature representation of the training data, the feature representation of the training data may be processed to obtain the output of the model to be trained, and the model parameters in the model to be trained may be adjusted based on that output so as to make the model to be trained converge; the model parameters may include the parameters in each layer included in the model to be trained. In a specific embodiment, the feature representation of the training data may be input to subsequent layers for processing. For example, assuming a multi-layer fully connected neural network (MLP) is connected after the layer that generates the feature representation of the training data, the MLP may process the feature representation, and the output layer finally produces the output. The output layer may adopt a preset logistic regression function: y = 1 / (1 + exp(-(W^T · Y + b))), where Y represents the output of the last layer of the MLP, and W and b are preset algorithm parameters (W may also be denoted by another symbol, such as A).
For example, fig. 4 is a schematic processing diagram provided by an embodiment of the present invention, and as shown in fig. 4, for data of N topics, after weighting processing is performed based on the topic weights and the freshness weights, a final output Y can be obtained. Further, fig. 5 is a schematic structural diagram of a model provided in an embodiment of the present invention, and as shown in fig. 5, in an implementation manner, the model to be trained may include an input layer, a full connection layer, a first self-attention layer, a second self-attention layer, a softmax layer, an MLP layer, and an output layer. Wherein the respective layers may perform the respective operations in the foregoing description. In this implementation, the model adopts a parallel structure, so that the processing efficiency can be improved to a certain extent. Further, fig. 6 is another schematic structural diagram of a model provided in an embodiment of the present invention, and as shown in fig. 6, in another implementation, the model to be trained may include a plurality of input layers, a plurality of fully connected layers, a first self-attention layer, a second self-attention layer, a softmax layer connected after the first self-attention layer, a softmax layer connected after the second self-attention layer, an MLP layer, and an output layer. Wherein, the input layers can be respectively used for inputting data of different subjects, and each layer can execute the corresponding operation in the foregoing description. In the implementation mode, the model adopts a serial structure, so that the information quantity learned in each link can be improved to a certain extent, and the model training effect is further improved.
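Pulling the sketches above together (ThemeEmbedding, FreshnessWeight, ThemeAttention, and fuse are assumed to be in scope), an end-to-end model along the lines of the serial structure of fig. 6 could look as follows; the MLP width and depth are arbitrary assumptions:

```python
# End-to-end sketch: per-theme embedding, theme and freshness weighting,
# concatenation, an MLP, and a logistic output y = 1/(1 + exp(-(W^T·Y + b))).
import torch
import torch.nn as nn

class UserProcessingModel(nn.Module):
    def __init__(self, theme_dims: dict, h: int = 16, hidden: int = 32):
        super().__init__()
        self.themes = list(theme_dims)
        self.embed = ThemeEmbedding(theme_dims, h)  # fully connected layer
        self.fresh = FreshnessWeight(self.themes)   # second self-attention layer
        self.attn = ThemeAttention(self.themes, h)  # first self-attention + softmax
        self.mlp = nn.Sequential(
            nn.Linear(len(self.themes) * h, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # logistic-regression output layer
        )

    def forward(self, sub_data: dict, durations: dict) -> torch.Tensor:
        feats = self.embed(sub_data)
        a = self.attn(feats)       # theme weights
        f = self.fresh(durations)  # freshness weights
        return self.mlp(fuse(feats, a, f, self.themes))
```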
Further, whether the model to be trained has currently converged can be determined based on the output and the label corresponding to the training data. For example, the current loss value of the model to be trained is determined based on the output and the label corresponding to the training data, and when the loss value does not meet a preset requirement, it is determined that the model to be trained has not converged. Accordingly, the parameters in the model to be trained may be adjusted, for example by stochastic gradient descent, and the adjusted model may continue to be trained. In this way, through multiple rounds of iterative training, the model to be trained gradually tends to converge. Accordingly, after the model to be trained converges, the training may be stopped, and the converged model may be used as the user processing model. In the embodiment of the invention, the model parameters in each layer of the model to be trained are continuously adjusted and optimized during training, which helps ensure that the user processing model obtained by the final training processes users accurately.
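A minimal training-loop sketch for the model above, assuming a binary label, binary cross-entropy loss, and stochastic gradient descent; the duration values, learning rate, and epoch count are placeholders:

```python
# Iterative training toward convergence; reuses sub_training_data and label
# from the earlier sketch. All hyperparameters here are assumptions.
import torch

theme_dims = {t: x.numel() for t, x in sub_training_data.items()}
model = UserProcessingModel(theme_dims)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.BCELoss()
durations = {t: torch.tensor(30.0) for t in theme_dims}  # assumed ages in days

for epoch in range(100):  # in practice, stop once the loss meets a threshold
    optimizer.zero_grad()
    y_hat = model(sub_training_data, durations)
    loss = loss_fn(y_hat, label)
    loss.backward()
    optimizer.step()
```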
In an existing implementation, the sub-training data of all themes are often directly spliced together, and modeling is performed on the spliced full-scale data. In this way, the model sometimes over-focuses on the themes that are strongly related to the desired prediction result. For example, in a scenario where the user processing model is used for risk assessment, the model often screens out the first N themes with the strongest financial attributes during training and learns based on the data of those N themes only, so the model cannot fully utilize the data of all themes, focuses only on data with strong financial attributes during prediction, and consequently predicts poorly. In the embodiment of the invention, the sub-training data of each theme are processed in a differentiated way according to the theme weights, so the processing effect of a user processing model trained with the sub-training data of a plurality of themes can be ensured to a certain extent.
Fig. 7 is a flowchart of steps of a user processing method according to an embodiment of the present invention, and as shown in fig. 7, the method may include:
step 201, inputting user related data of a user to be predicted into a pre-trained user processing model; the user related data comprises a plurality of sub related data corresponding to the user to be predicted, and different sub related data correspond to different subjects.
In the embodiment of the present invention, for the types of sub-related data and the specific types of corresponding themes, reference may be made to the description of the sub-training data, which is not repeated here. Specifically, based on the first self-attention layer in the user processing model, the theme weight corresponding to each sub-related data may be determined according to the feature representation of each sub-related data; the feature representation of the user-related data may then be determined according to the feature representation of each sub-related data and its corresponding theme weight; and finally the feature representation of the user-related data may be processed to obtain the output of the user processing model. In this way, the data of different themes are fused by self-attention weighting, and the output is obtained based on the fused feature representation of the user-related data, so that the user processing model can, to a certain extent, make full use of the information in the user-related data of the user to be predicted, ensuring the prediction effect.
Step 202, obtaining the output of the user processing model to obtain the target information of the user to be predicted. The user processing model is generated based on the model obtaining method.
In the embodiment of the present invention, the target information may be set according to actual requirements, and the target information may be used to represent one or more of the following: the user's order rate, credit level, and probability of default, etc.
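Continuing the sketches above, inference with the trained model (steps 201 and 202) then reduces to a single forward pass; the data below is synthetic:

```python
# Predict target information (e.g., a default probability) for a user to be
# predicted from per-theme sub-related data; a usage sketch.
import torch

model.eval()
with torch.no_grad():
    user_related_data = {t: torch.randn(d) for t, d in theme_dims.items()}
    user_durations = {t: torch.tensor(3.0) for t in theme_dims}
    target_info = model(user_related_data, user_durations)
print(float(target_info))  # value in (0, 1) given the logistic output layer
```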
In summary, in the user processing method provided by the embodiment of the present invention, the user-related data of the user to be predicted are input into a pre-trained user processing model, where the user-related data include a plurality of sub-related data corresponding to the user to be predicted and different sub-related data correspond to different themes, and the output of the user processing model is taken as the target information of the user to be predicted. Because the user processing model is obtained by fusing, through theme weights, the sub-training data under a plurality of themes for training, it can fully learn the information of the data of the plurality of themes and the cross information among different themes during training; therefore, when the user processing model predicts based on the sub-related data of the plurality of themes of the user to be predicted, the accuracy of the predicted target information can be ensured to a certain extent. Moreover, a plurality of sub-models do not need to be split out and trained; the sub-training data under a plurality of themes are fused through theme weights for training, so the user processing model can be obtained based on the data of the plurality of themes. Thus, the training process can be simplified to a certain extent, and the subsequent model maintenance cost is reduced.
Optionally, the user processing model may perform the following operations on the input user-related data: determining a theme weight corresponding to each piece of sub-related data according to the feature representation of each piece of sub-related data based on a first self-attention layer included in the user processing model; acquiring the feature representation of the user-related data according to the feature representation of each sub-related data and the corresponding theme weight; and processing the feature representation of the user-related data to obtain the output of the user processing model.
Further, the user processing method may further include: determining the target duration from the data generation time of each piece of sub-related data to the current time; and determining the freshness weight of each piece of sub-related data according to the target duration of each piece of sub-related data, wherein the freshness weight is negatively correlated with the target duration;
the obtaining the feature representation of the user-related data according to the feature representation of each piece of sub-related data and the corresponding theme weight may include: and acquiring the characteristic representation of the user related data according to the characteristic representation, the corresponding theme weight and the freshness weight of each piece of sub-related data.
Optionally, the user processing model further includes a second self-attention layer; determining a freshness weight of each of the sub-related data according to a target duration of each of the sub-related data, which may include, for any one of the sub-related data, using the second self-attention layer, taking the target duration as an input of a preset freshness weight formula, and taking an output of the preset freshness weight formula as the freshness weight of the sub-related data; wherein the preset freshness weight formula comprises a target coefficient, and the target coefficient is a parameter in the second self-attention layer.
Further, the obtaining the feature representation of the user-related data according to the feature representation, the corresponding theme weight, and the freshness weight of each piece of sub-related data may include: for any one of the sub-related data, calculating a first product between the feature representation of the sub-related data and the topic weight corresponding to the sub-related data; calculating a second product between the first product and a freshness weight of the sub-related data; and splicing the second products of the sub-related data to obtain the feature representation of the user-related data.
Further, the user processing model further comprises a full connection layer; before determining the feature representation of the user-related data according to the feature representation of each piece of sub-related data and the theme weight corresponding to the sub-related data, the following operations may be further performed: vectorizing each of the sub-related data of any of the sample users to obtain an initial feature representation of each of the sub-related data; and mapping the initial feature representation of each sub-related data to a specified dimension by using the full connection layer to obtain the feature representation of each sub-related data.
Further, the operation of determining the theme weight corresponding to each piece of sub-related data based on the feature representation of each piece of sub-related data by the first self-attention layer may specifically include: for any of the sub-related data, determining a self-attention weight corresponding to each dimension feature in the feature representation based on the first self-attention layer; calculating a feature weighted sum corresponding to the sub-correlation data according to each dimension feature and the self-attention weight corresponding to each dimension feature; and determining the theme weight corresponding to each sub-related data according to the feature weighted sum corresponding to each sub-related data.
Further, the user processing model may further include a softmax layer; the determining the theme weight corresponding to each piece of sub-related data according to the feature weighted sum corresponding to each piece of sub-related data may specifically include: inputting the feature weighted sum corresponding to each sub-related data into the softmax layer, and determining a normalization processing value corresponding to each sub-related data by using the softmax layer; and determining the normalization processing value corresponding to each sub-related data as the theme weight corresponding to each sub-related data. The specific implementation manner and the achievable effect of each operation described above may refer to the foregoing related description, and are not described herein again.
Fig. 8 is a block diagram of a model acquisition apparatus according to an embodiment of the present invention, where the apparatus 30 may include:
an input module 301, configured to input training data into a model to be trained; the training data comprises a plurality of types of sub-training data corresponding to the sample user, and different sub-training data correspond to different subjects;
a first determining module 302, configured to determine, based on a first self-attention layer included in the model to be trained, a theme weight corresponding to each piece of sub-training data according to a feature representation of each piece of sub-training data;
a first obtaining module 303, configured to obtain feature representations of the training data according to the feature representations of the sub-training data and corresponding theme weights;
a training module 304, configured to train the model to be trained according to the feature representation of the training data, so as to obtain a user processing model.
Optionally, the apparatus 30 further includes:
the second determining module is used for determining the target duration from the data generation time of each sub-training data to the current time;
a third determining module, configured to determine a freshness weight of each of the sub-training data according to a target duration of each of the sub-training data; wherein the freshness weight is negatively correlated with the target duration;
the first obtaining module 303 is specifically configured to:
and acquiring the feature representation of the training data according to the feature representation, the corresponding theme weight and the freshness weight of each sub-training data.
Optionally, the model to be trained further includes a second self-attention layer; the third determining module is specifically configured to:
for any one of the sub-training data, utilizing the second self-attention layer, taking the target duration as an input of a preset freshness weight formula, and taking an output of the preset freshness weight formula as a freshness weight of the sub-training data;
wherein the preset freshness weight formula comprises a target coefficient, and the target coefficient is a parameter in the second self-attention layer.
Optionally, the first obtaining module 303 is further specifically configured to:
for any of the sub-training data, calculating a first product between a feature representation of the sub-training data and a theme weight corresponding to the sub-training data;
calculating a second product between the first product and a freshness weight of the sub-training data;
and splicing the second products of the sub-training data to obtain the feature representation of the training data.
Optionally, the model to be trained further includes a full connection layer; the device further comprises:
a second obtaining module, configured to perform vectorization on each sub-training data of any sample user to obtain an initial feature representation of each sub-training data;
and the mapping module is used for mapping the initial feature representation of each sub-training data to a specified dimension by using the full connection layer to obtain the feature representation of each sub-training data.
Optionally, the first determining module 302 is specifically configured to:
for any one of the sub-training data, determining a self-attention weight corresponding to each dimension feature in the feature representation based on the first self-attention layer;
calculating a feature weighted sum corresponding to the sub-training data according to each dimension feature and the self-attention weight corresponding to each dimension feature;
and determining the theme weight corresponding to each sub-training data according to the feature weighted sum corresponding to each sub-training data.
Optionally, the model to be trained further includes a softmax layer; the first determining module 302 is further specifically configured to:
inputting the feature weighted sum corresponding to each sub-training data into the softmax layer, and determining a normalization processing value corresponding to each sub-training data by using the softmax layer;
and determining the normalization processing value corresponding to each sub-training data as the theme weight corresponding to each sub-training data.
Optionally, the self-attention weight is a parameter in the first self-attention layer; the training module 304 is specifically configured to:
processing the feature representation of the training data to obtain the output of the model to be trained;
based on the output of the model to be trained, adjusting model parameters in the model to be trained so as to make the model to be trained converge; wherein the model parameters include parameters in each layer included in the model to be trained.
In summary, the model obtaining apparatus provided by the embodiment of the present invention inputs training data into a preset model to be trained, where the training data include a plurality of types of sub-training data corresponding to a sample user and different sub-training data correspond to different themes. Based on a first self-attention layer included in the model to be trained, the theme weight corresponding to each sub-training data is determined according to the feature representation of each sub-training data; the feature representation of the training data is obtained according to the feature representation of each sub-training data and the corresponding theme weight; and the model to be trained is trained according to the feature representation of the training data to obtain a user processing model. Therefore, a plurality of sub-models do not need to be split out and trained separately: the sub-training data under a plurality of themes are fused through the theme weights for training, so that the finally required user processing model can be obtained based on the data of the plurality of themes. Thus, the training process can be simplified to a certain extent, and the subsequent model maintenance cost is reduced.
Fig. 9 is a block diagram of a user processing device according to an embodiment of the present invention, where the device 40 may include:
an input module 401, configured to input user-related data of a user to be predicted into a pre-trained user processing model; the user-related data comprises a plurality of sub-related data corresponding to the user to be predicted, and different sub-related data correspond to different themes;
an obtaining module 402, configured to obtain the output of the user processing model to obtain the target information of the user to be predicted; wherein the user processing model is generated based on the model obtaining apparatus described above.
Optionally, the target information is used to characterize one or more of the following: the ordering rate, the credit level and the default probability of the user.
In summary, the user processing apparatus provided in the embodiment of the present invention inputs the user-related data of a user to be predicted into a pre-trained user processing model, where the user-related data includes multiple sub-related data corresponding to the user to be predicted and different sub-related data correspond to different themes; the output of the user processing model is the target information of the user to be predicted. Because the user processing model is trained by fusing, through theme weights, the sub-training data under multiple themes, it can fully learn both the information within the data of each theme and the cross information among different themes during training. Therefore, when the user processing model predicts based on the sub-related data of the multiple themes of the user to be predicted, the accuracy of the predicted target information can be ensured to a certain extent.
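An end-to-end toy inference call might look like the following; every name, shape, and the single linear layer standing in for the trained user processing model are illustrative assumptions:

    import torch
    import torch.nn as nn

    K, d = 3, 16                                        # three themes, 16-dim sub-features (assumed)
    user_model = nn.Linear(K * d, 1)                    # stand-in for the trained user processing model
    sub_related = [torch.randn(d) for _ in range(K)]    # one sub-related tensor per theme
    fused = torch.cat(sub_related, dim=-1)              # fusion simplified: weights omitted in this toy
    user_model.eval()
    with torch.no_grad():
        target_info = torch.sigmoid(user_model(fused))  # e.g., a predicted default probability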
The present invention also provides an electronic device, referring to Fig. 10, including: a processor 501, a memory 502, and a computer program 5021 stored on the memory and executable on the processor, where the program, when executed by the processor, implements the methods of the foregoing embodiments.
The invention also provides a readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of the foregoing embodiments.
As for the device embodiments, since they are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a sequencing device according to the present invention. The present invention may also be embodied as an apparatus or device program for carrying out a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A method of model acquisition, the method comprising:
inputting training data into a model to be trained; the training data comprises a plurality of types of sub-training data corresponding to a sample user, and different sub-training data correspond to different themes;
determining a theme weight corresponding to each sub-training data according to the feature representation of each sub-training data based on a first self-attention layer included in the model to be trained;
acquiring feature representation of the training data according to the feature representation of each sub-training data and the corresponding theme weight;
and training the model to be trained according to the feature representation of the training data to obtain a user processing model.
2. The method of claim 1, further comprising:
determining, for each sub-training data, a target duration between the data generation time of the sub-training data and the current time;
determining the freshness weight of each sub-training data according to the target duration of each sub-training data; wherein the freshness weight is positively correlated with the target duration;
the obtaining the feature representation of the training data according to the feature representation of each sub-training data and the corresponding theme weight includes:
and acquiring the feature representation of the training data according to the feature representation, the corresponding theme weight and the freshness weight of each sub-training data.
3. The method according to claim 2, characterized in that the model to be trained further comprises a second self-attention layer; determining a freshness weight of each of the sub-training data according to the target duration of each of the sub-training data, including:
for any one of the sub-training data, utilizing the second self-attention layer, taking the target duration as an input of a preset freshness weight formula, and taking an output of the preset freshness weight formula as a freshness weight of the sub-training data;
wherein the preset freshness weight formula comprises a target coefficient, and the target coefficient is a parameter in the second self-attention layer.
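For illustration only (the claim does not disclose the formula itself), one monotone form with a learnable target coefficient, sketched in PyTorch under stated assumptions:

    import torch
    import torch.nn as nn

    class FreshnessWeightSketch(nn.Module):
        # The target coefficient is a parameter of the second self-attention layer;
        # the sigmoid form is an assumption chosen so that, for a positive coefficient,
        # the weight is positively correlated with the target duration as claim 2 requires.
        def __init__(self):
            super().__init__()
            self.coef = nn.Parameter(torch.tensor(1.0))

        def forward(self, target_duration: torch.Tensor) -> torch.Tensor:
            return torch.sigmoid(self.coef * target_duration)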
4. The method according to claim 2, wherein the obtaining the feature representation of the training data according to the feature representation, the corresponding topic weight and the freshness weight of each of the sub-training data comprises:
for any of the sub-training data, calculating a first product between a feature representation of the sub-training data and a theme weight corresponding to the sub-training data;
calculating a second product between the first product and a freshness weight of the sub-training data;
and concatenating the second products of the sub-training data to obtain the feature representation of the training data.
5. The method of claim 1, wherein the model to be trained further comprises a fully connected layer; before the obtaining the feature representation of the training data according to the feature representation of each sub-training data and the corresponding theme weight, the method further includes:
vectorizing each sub-training data of any sample user to obtain an initial feature representation of each sub-training data;
and mapping the initial feature representation of each sub-training data to a specified dimension by using the fully connected layer to obtain the feature representation of each sub-training data.
6. The method according to any one of claims 1 to 5, wherein the determining, based on a first self-attention layer included in the model to be trained, a theme weight corresponding to each of the sub-training data according to the feature representation of each of the sub-training data includes:
for any one of the sub-training data, determining a self-attention weight corresponding to each dimension feature in the feature representation based on the first self-attention layer;
calculating a feature weighted sum corresponding to the sub-training data according to each dimension feature and the self-attention weight corresponding to each dimension feature;
and determining the theme weight corresponding to each sub-training data according to the feature weighted sum corresponding to each sub-training data.
7. The method of claim 6, wherein the model to be trained further comprises a softmax layer; determining the theme weight corresponding to each sub-training data according to the feature weighted sum corresponding to each sub-training data includes:
inputting the feature weighted sum corresponding to each sub-training data into the softmax layer, and determining a normalization processing value corresponding to each sub-training data by using the softmax layer;
and determining the normalization processing value corresponding to each sub-training data as the theme weight corresponding to each sub-training data.
8. The method of claim 7, wherein the self-attention weight is a parameter in the first self-attention layer; the training the model to be trained according to the feature representation of the training data comprises:
processing the feature representation of the training data to obtain the output of the model to be trained;
based on the output of the model to be trained, adjusting model parameters in the model to be trained so that the model to be trained converges; wherein the model parameters include parameters in each layer included in the model to be trained.
9. A method for user processing, the method comprising:
inputting user-related data of a user to be predicted into a pre-trained user processing model; the user-related data comprises a plurality of sub-related data corresponding to the user to be predicted, and different sub-related data correspond to different themes;
acquiring the output of the user processing model to obtain the target information of the user to be predicted;
wherein the user processing model is generated based on the model acquisition method of any one of claims 1 to 8.
10. The method of claim 9, wherein the target information is used to characterize one or more of: the ordering rate, the credit level and the default probability of the user.
11. A model acquisition apparatus, characterized in that the apparatus comprises:
the input module is used for inputting training data into a model to be trained; the training data comprises a plurality of types of sub-training data corresponding to a sample user, and different sub-training data correspond to different themes;
a first determining module, configured to determine, based on a first self-attention layer included in the model to be trained, a theme weight corresponding to each sub-training data according to a feature representation of each sub-training data;
the first acquisition module is used for acquiring the feature representation of the training data according to the feature representation of each sub-training data and the corresponding theme weight;
and the training module is used for training the model to be trained according to the feature representation of the training data so as to obtain a user processing model.
12. A user processing apparatus, characterized in that the apparatus comprises:
the input module is used for inputting the user-related data of the user to be predicted into the pre-trained user processing model; the user-related data comprises a plurality of sub-related data corresponding to the user to be predicted, and different sub-related data correspond to different themes;
the acquisition module is used for acquiring the output of the user processing model so as to obtain the target information of the user to be predicted;
wherein the user processing model is generated based on the model acquisition apparatus of claim 11.
13. An electronic device, comprising:
processor, memory and computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to one or more of claims 1-10 when executing the program.
14. A readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of one or more of claims 1-10.
CN202110201867.3A 2021-02-23 2021-02-23 Model obtaining method, user processing method, device and electronic equipment Withdrawn CN113159101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110201867.3A CN113159101A (en) 2021-02-23 2021-02-23 Model obtaining method, user processing method, device and electronic equipment


Publications (1)

Publication Number Publication Date
CN113159101A 2021-07-23

Family

ID=76883628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110201867.3A Withdrawn CN113159101A (en) 2021-02-23 2021-02-23 Model obtaining method, user processing method, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113159101A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345302A (en) * 2018-09-27 2019-02-15 腾讯科技(深圳)有限公司 Machine learning model training method, device, storage medium and computer equipment
CN112241715A (en) * 2020-10-23 2021-01-19 北京百度网讯科技有限公司 Model training method, expression recognition method, device, equipment and storage medium
CN112270547A (en) * 2020-10-27 2021-01-26 上海淇馥信息技术有限公司 Financial risk assessment method and device based on feature construction and electronic equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210723