CN113052238A - Model training and service distribution method, device and equipment based on user classification

Info

Publication number: CN113052238A
Application number: CN202110319026.2A
Authority: CN
Inventors: 陈李龙; 王娜; 强锋; 刘华杰
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2021-03-25
Filing date: 2021-03-25
Publication date: 2021-06-29

Abstract

The embodiments of this specification provide a method, an apparatus and a device for model training and service distribution based on user classification. The method comprises: acquiring user sample data, which comprises labeled data and unlabeled data, the labeled data corresponding to a user category based on the user's service processing records; determining a user category probability of the unlabeled data from the neighboring labeled data corresponding to the unlabeled data; constructing a neighbor similarity regularization feature based on the user category probability; generating classification regularization features corresponding to at least two information categories by using the user sample data; and training with the neighbor similarity regularization feature and the classification regularization features combined to obtain a user classification model, which is used to classify users so that corresponding services can be distributed. The method reduces the time and resources consumed in labeling a large amount of data, improves the generalization of the classification model, and improves the user's service processing experience.

Description

Model training and service distribution method, device and equipment based on user classification

Technical Field

The embodiments of this specification relate to the technical field of artificial intelligence, and in particular to a method, an apparatus and a device for model training and service distribution based on user classification.

Background

With the development of various industries, service types are being subdivided ever more finely. These services may be services provided to users, or services that users are required to process in time, and different types of users need to acquire different services. If the services a user is likely to acquire are judged in advance from the user's relevant information, the data and resources for those services can be prepared ahead of time, which effectively improves the efficiency of subsequent service processing and improves the user experience.

At present, when predicting the services a user will acquire, a large amount of sample data is usually collected in advance and used to train a machine learning model, and the trained model is then used to predict the services required by different users. However, the sample data usually has to be labeled after it is acquired. To ensure the accuracy of the model, the volume of sample data is generally very large, so labeling it not only consumes considerable time and resources but also places high demands on the professional knowledge of the labeler, which affects the actual training effect of the model. Therefore, a method for training a model quickly and accurately, so as to ensure the effect of predicting user services, is needed.

Disclosure of Invention

An embodiment of the specification aims to provide a method, a device and equipment for model training and business distribution based on user classification, so as to solve the problem of how to quickly and accurately train a model to ensure the business prediction effect of a user.

In order to solve the foregoing technical problem, an embodiment of the present specification provides a model training method based on user classification, including: acquiring user sample data; the user sample data comprises tagged data and untagged data; the labeled data corresponds to a user category based on the service processing record of the user; determining a user class probability of non-labeled data by neighboring labeled data corresponding to the non-labeled data; the neighboring tagged data comprises tagged data having a difference with the non-tagged data that is less than a specified difference threshold; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing a near neighbor similarity regularization feature based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user classes; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of user sample data; training by integrating the neighbor similarity regularization features and the classification regularization features to obtain a user classification model; the user classification model is used for determining user classification according to user information so as to distribute services of users corresponding to the user classification.

An embodiment of the present specification further provides a model training apparatus based on user classification, including: the user sample data acquisition module is used for acquiring user sample data; the user sample data comprises tagged data and untagged data; the labeled data corresponds to a user category based on the service processing record of the user; a user category probability determination module for determining a user category probability of the non-labeled data by neighboring labeled data corresponding to the non-labeled data; the neighboring tagged data comprises tagged data having a difference with the non-tagged data that is less than a specified difference threshold; the user category probability comprises the probability that the unlabeled data belongs to each user category; a neighbor similarity regularization feature construction module, configured to construct neighbor similarity regularization features based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user classes; the classification regularization feature generation module is used for generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of user sample data; the user classification model acquisition module is used for integrating the neighbor similarity regularization characteristics and the classification regularization characteristics to train to obtain a user classification model; the user classification model is used for determining user classification according to user information so as to distribute services of users corresponding to the user classification.

The embodiment of the present specification further provides a model training device based on user classification, including a memory and a processor; the memory is used to store computer program instructions; the processor is used to execute the computer program instructions to implement the steps of: acquiring user sample data; the user sample data comprises labeled data and unlabeled data; the labeled data corresponds to a user category based on the service processing record of the user; determining a user category probability of the unlabeled data by neighboring labeled data corresponding to the unlabeled data; the neighboring labeled data comprises labeled data whose difference from the unlabeled data is less than a specified difference threshold; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing a neighbor similarity regularization feature based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user categories; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of user sample data; training by integrating the neighbor similarity regularization features and the classification regularization features to obtain a user classification model; the user classification model is used for determining a user category according to user information so as to distribute the services corresponding to that user category.

In order to solve the above technical problem, an embodiment of the present specification further provides a service allocation method based on user classification, including: acquiring user characteristic information of a target user; inputting the user characteristic information into a user classification model to obtain a user category; the user classification model is obtained by the following method: after user sample data containing tag data and non-tag data are obtained, determining user category probability according to the tag data and the non-tag data; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing a near neighbor similarity regularization feature based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user classes; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of user sample data; training by integrating the neighbor similarity regularization features and the classification regularization features to obtain a user classification model; and distributing the service corresponding to the user category to the target user.

An embodiment of the present specification further provides a service allocation apparatus based on user classification, including: the user characteristic information acquisition module is used for acquiring user characteristic information of a target user; the user category acquisition module is used for inputting the user characteristic information into a user classification model to obtain a user category; the user classification model is obtained by the following method: after user sample data containing tag data and non-tag data are obtained, determining user category probability according to the tag data and the non-tag data; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing a near neighbor similarity regularization feature based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user classes; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of user sample data; training by integrating the neighbor similarity regularization features and the classification regularization features to obtain a user classification model; and the service distribution module is used for distributing the service corresponding to the user category to the target user.

The embodiment of the present specification further provides a service allocation device based on user classification, which includes a memory and a processor; the memory to store computer program instructions; the processor to execute the computer program instructions to implement the steps of: acquiring user characteristic information of a target user; inputting the user characteristic information into a user classification model to obtain a user category; the user classification model is obtained by the following method: after user sample data containing tag data and non-tag data are obtained, determining user category probability according to the tag data and the non-tag data; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing a near neighbor similarity regularization feature based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user classes; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of user sample data; training by integrating the neighbor similarity regularization features and the classification regularization features to obtain a user classification model; and distributing the service corresponding to the user category to the target user.

As can be seen from the technical solutions provided by the embodiments of the present specification, after user sample data is obtained, the embodiments of the present specification label only part of the data, and further, under the condition that the relevance between the labeled data and the unlabeled data is considered, sequentially determine the user class probability and the neighbor similarity regularization feature, and implement training on a user classification model in combination with the classification regularization feature corresponding to the information class, so that the user classification model can be used to complete user classification, and further, corresponding services are allocated to the user according to the user class. By the method, time and resources consumed by marking a large amount of data are reduced, the incidence relation among the data is fully mined, the generalization effect of the classification model is optimized, and the service processing experience of a user is improved.

Drawings

In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the specification, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flow chart of a method for model training based on user classification according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a model training process according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a service allocation process according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of a service allocation method based on user classification according to an embodiment of the present disclosure;

FIG. 5 is a block diagram of a model training apparatus based on user classification according to an embodiment of the present disclosure;

FIG. 6 is a block diagram of a service distribution apparatus based on user classification according to an embodiment of the present disclosure;

FIG. 7 is a block diagram of a model training device based on user classification in an embodiment of the present disclosure;

FIG. 8 is a block diagram of a service distribution device based on user classification according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without any creative effort shall fall within the protection scope of the present specification.

In order to solve the above technical problem, the model training method based on user classification in the embodiment of the present specification is introduced first. The method may be executed by a model training device based on user classification, which includes but is not limited to a server, an industrial personal computer, a PC and the like. As shown in FIG. 1, the model training method based on user classification may include the following implementation steps.

S110: acquiring user sample data; the user sample data comprises tagged data and untagged data; the tagged data corresponds to a user category based on a business process record of the user.

The user sample data may be sample data used to train the model. It may be basic information about the user, such as education level, gender and age, or it may be historical information about the services the user has processed. Specifically, the user sample data may include labeled data and unlabeled data. In the field of machine learning, modeling based on partly labeled and partly unlabeled data is fast and accurate, and has good application value.

The labeled data is data labeled with a corresponding label. The label may be a user category, and specifically, the user category may be set based on a business process record of a user. The user category is a result obtained by classifying the user according to the service processing condition of the user, and specifically, the user category may include a positive category and a negative category, which respectively indicate a condition of a large historical service processing amount and a condition of a small historical service processing amount.

Preferably, in order to reduce the time consumed by labeling, the labeled data may make up only a small proportion of the user sample data as a whole.

In some embodiments, labeling the user sample data may proceed as follows: after the part of the user sample data that needs to be labeled is determined, the test users corresponding to that user sample data are determined. The service processing records of the test users within a preset test time are then acquired, and labels are added to the user sample data corresponding to the test users based on those records. The preset test time can be set according to the requirements of the practical application, for example 3 days, 7 days, one month or three months.

To illustrate with one specific example, assume that the added tags include a positive class tag and a negative class tag. The positive type label can indicate that the user has more frequent business processing records, and the negative type label can indicate that the user has almost no business processing records in the near future. Accordingly, the preset test time may be set to 3 months. After the service processing record of the test user in three months is obtained, if the test user has the service processing record in three months, marking a positive label; and if the test user does not have the service processing record within three months, marking the negative type label.
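The labeling rule in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `label_test_user`, the use of 90 days as an approximation of three months, and the sample dates are assumptions that do not appear in the specification.

```python
from datetime import date, timedelta


def label_test_user(record_dates, window_end, window_days=90):
    """Return 'positive' if any service record falls inside the test window.

    window_days=90 approximates the three-month test time (an assumption).
    """
    window_start = window_end - timedelta(days=window_days)
    has_record = any(window_start <= d <= window_end for d in record_dates)
    return "positive" if has_record else "negative"


# Hypothetical test users and record dates, for illustration only.
window_end = date(2021, 3, 25)
labels = {
    "user_a": label_test_user([date(2021, 2, 10)], window_end),
    "user_b": label_test_user([date(2020, 6, 1)], window_end),
    "user_c": label_test_user([], window_end),
}
```

Only the users selected for labeling go through this step; the remaining user sample data stays unlabeled.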

The above embodiment is only a specific example designed in combination with the label category, and in practical application, data may be marked in other ways based on different types of labels, which is not described herein again.

S120: determining a user class probability of non-labeled data by neighboring labeled data corresponding to the non-labeled data; the neighboring tagged data comprises tagged data having a difference with the non-tagged data that is less than a specified difference threshold; the user category probability includes a probability that the unlabeled data belongs to each user category.

After the tagged data and non-tagged data are obtained, the user category probability of the non-tagged data can be determined using the tagged data. In the case where the labels of the labeled data have identified respective user categories, the probability that the unlabeled data corresponds to each user category may be determined based on the degree of similarity between the unlabeled data and the labeled data. The user class probability is used to indicate the probability that the non-tag data belongs to each user class.

In particular, the user category probability may be determined from neighboring labeled data. The neighboring labeled data is the labeled data whose difference from the unlabeled data is smaller than a specified difference threshold. The specified difference threshold represents the maximum degree of difference between data in the same category.

In some embodiments, the difference between labeled data and unlabeled data may be determined by a K-nearest neighbor similarity algorithm. The idea of the algorithm is that a sample is assigned to the category to which most of its K most similar samples (that is, its nearest samples in the feature space) belong. By calculating the Euclidean distance between sample data, the samples can be classified according to the distances between them. The specific implementation may be set based on the actual application and is not described further here.
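As a rough illustration of selecting neighboring labeled data, the following sketch computes Euclidean distances and keeps the labeled samples that fall under a difference threshold. The function names, the toy feature vectors and the threshold value are hypothetical; the specification leaves the concrete implementation open.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighboring_labeled(x_u, labeled, threshold):
    """Return the labeled samples closer to x_u than threshold, nearest first.

    labeled is a list of (feature_vector, label) pairs.
    """
    scored = [(euclidean(x_u, x), x, y) for x, y in labeled]
    near = [item for item in scored if item[0] < threshold]
    near.sort(key=lambda item: item[0])
    return [(x, y) for _, x, y in near]


# Toy data: two labeled samples lie within distance 2.0 of the unlabeled one.
labeled = [
    ((0.0, 0.0), "positive"),
    ((3.0, 4.0), "negative"),
    ((1.0, 0.0), "positive"),
]
neighbors = neighboring_labeled((0.0, 1.0), labeled, threshold=2.0)
```

A fixed neighbor count K could be applied instead of, or in addition to, the threshold by truncating the sorted list.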

After the neighboring labeled data of the unlabeled data is determined, the number of neighboring labeled data corresponding to each user category can be counted to determine the user category probability. Assuming the user categories include a positive category and a negative category, the user category probability correspondingly includes a positive category probability and a negative category probability. The positive category probability is $p_+(x^u) = k_+ / k$, where $k_+$ is the number of positive samples among the neighboring labeled data and $k$ is the number of neighboring labeled data; the negative category probability is $p_-(x^u) = k_- / k$, where $k_-$ is the number of negative samples among the neighboring labeled data.
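The counting step can be sketched as follows, reading each user category probability as the share of neighboring labeled samples that carry that category. The function name and the uniform fallback for an unlabeled sample with no neighbors are assumptions, not part of the specification.

```python
from collections import Counter


def user_class_probability(neighbor_labels, classes=("positive", "negative")):
    """Estimate per-category probabilities from the labels of the neighbors."""
    k = len(neighbor_labels)
    if k == 0:
        # Assumption: with no neighbors, fall back to a uniform distribution.
        return {c: 1.0 / len(classes) for c in classes}
    counts = Counter(neighbor_labels)
    return {c: counts[c] / k for c in classes}


# Three of four neighbors are positive, so the positive probability is 0.75.
probs = user_class_probability(["positive", "positive", "negative", "positive"])
```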

In some embodiments, before the user category probability is obtained, the user sample data may be preprocessed. And the preprocessing comprises the steps of constructing the user sample data into original features respectively corresponding to each user, and completing the original features based on preset feature fields.

Because the data tables corresponding to the sample data may be acquired separately for different data types, after the sample data is obtained, the data columns in the different data tables can be joined by user identifier to obtain the original features corresponding to each user. To ensure that the original features can be used effectively in the subsequent process, the missing values in the original features can then be completed. The preset feature fields are the fill values prescribed by the different completion rules. For example, a missing value of a numerical feature may be completed with the value "0", and a missing value of a non-numerical feature may be completed with the value "unknown". In practical applications, other preset feature fields may be used to complete the original features as required; this is not limited to the above examples.
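A minimal sketch of this preprocessing, using plain dictionaries in place of data tables, is shown below. The table contents and the field names (`age`, `gender`, `transfer_count`) are hypothetical, and treating `None` as the missing-value marker is an assumption.

```python
def build_original_features(tables, numeric_fields, text_fields):
    """Join per-type tables by user id, then complete missing values.

    tables is a list of dicts mapping user_id -> {field: value}.
    Missing numeric fields become 0; missing non-numeric fields become "unknown".
    """
    merged = {}
    for table in tables:
        for user_id, row in table.items():
            merged.setdefault(user_id, {}).update(row)
    for row in merged.values():
        for field in numeric_fields:
            if row.get(field) is None:
                row[field] = 0
        for field in text_fields:
            if row.get(field) is None:
                row[field] = "unknown"
    return merged


# Hypothetical tables: basic user information and business information.
basic = {"u1": {"age": 30, "gender": "F"}, "u2": {"age": None, "gender": None}}
business = {"u1": {"transfer_count": 4}}
features = build_original_features(
    [basic, business], ["age", "transfer_count"], ["gender"]
)
```

User `u2` has no row in the business table and no basic values, so every field is filled by the completion rules.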

In some embodiments, the user sample data may be in the form of constructed features, so that the user sample data can be better applied in a subsequent model training process, that is, the user sample data may be user information features.

When obtaining the user information features, the original features may first be constructed from the collected user information; the specific manner of constructing the original features is described in the above embodiments and is not repeated here. The category features in the original features are then encoded. The category features comprise at least one of education level and gender, and the encoding may be One-Hot encoding.
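One-Hot encoding of a category feature can be illustrated as follows. The helper `one_hot_encode` and the `field=value` column naming are assumptions made for this sketch; any standard encoder would serve the same purpose.

```python
def one_hot_encode(rows, field):
    """Replace one categorical field with one 0/1 column per observed category."""
    categories = sorted({row[field] for row in rows})
    encoded = []
    for row in rows:
        out = {key: value for key, value in row.items() if key != field}
        for category in categories:
            out[f"{field}={category}"] = 1 if row[field] == category else 0
        encoded.append(out)
    return encoded


# Hypothetical original features with a completed "unknown" gender value.
rows = [
    {"age": 30, "gender": "F"},
    {"age": 41, "gender": "M"},
    {"age": 25, "gender": "unknown"},
]
encoded = one_hot_encode(rows, "gender")
```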

Time series historical characteristics and time series aggregation characteristics can be further constructed on the basis of different characteristics of the user sample data corresponding to time.

The time series historical feature represents a feature formed by the user sample data in a corresponding time interval. Specifically, it may be constructed as $F_{his} = [\mathrm{feature}_{time},\ time = 1, 2, 3, 4, 5, 6]$, which denotes the features constructed from the user sample data of each of the previous first, second, third, fourth, fifth and sixth months.
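The construction of the time series historical feature can be sketched as picking one value per past month. The input layout (a mapping from "months ago" to a value) and filling an absent month with 0 are assumptions for illustration.

```python
def time_series_history(monthly_values, months=(1, 2, 3, 4, 5, 6)):
    """Build the historical feature vector, one entry per past month.

    monthly_values maps months-ago (1 = previous month) to a feature value.
    Assumption: a month with no data contributes 0.
    """
    return [monthly_values.get(month, 0) for month in months]


# Hypothetical monthly transaction counts; month 5 has no data.
f_his = time_series_history({1: 5, 2: 3, 3: 0, 4: 7, 6: 2})
```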

The time series aggregation feature represents a feature obtained from a time interval and an aggregation function. Specifically, it may be obtained as $F_{agg} = [f(\mathrm{feature})_{time},\ time = 1\text{-}3, 1\text{-}6, 1\text{-}9, 1\text{-}12]$, where $F_{agg}$ is the time series aggregation feature and $f()$ takes, in turn, Mean() (average), Max() (maximum), Min() (minimum) and Std() (standard deviation). The time period may take the previous 1 month, the previous 3 months, the previous 6 months and the previous 12 months, so that the sample data features in a given time interval are aggregated by a given function. The feature classes obtained in practical applications are not limited to the specific examples described above.
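A small sketch of the aggregation follows, using the windows 1-3, 1-6, 1-9 and 1-12 that appear in the expression for the aggregation feature. The feature naming, the zero fill for absent months, and the choice of the population standard deviation (`statistics.pstdev`) for Std() are assumptions; the specification does not say which standard deviation is meant.

```python
import statistics


def time_series_aggregate(monthly_values, windows=(3, 6, 9, 12)):
    """Aggregate the values of months 1..w with mean, max, min and std."""
    features = {}
    for w in windows:
        values = [monthly_values.get(month, 0) for month in range(1, w + 1)]
        features[f"mean_1-{w}"] = statistics.mean(values)
        features[f"max_1-{w}"] = max(values)
        features[f"min_1-{w}"] = min(values)
        # Assumption: population standard deviation.
        features[f"std_1-{w}"] = statistics.pstdev(values)
    return features


# Hypothetical series: the value for "m months ago" is simply m.
monthly = {month: float(month) for month in range(1, 13)}
f_agg = time_series_aggregate(monthly)
```

With four windows and four aggregation functions, each base feature yields sixteen aggregated features.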

After the original features, the time series historical features and the time series aggregation features are obtained, the features can be integrated to obtain features used for a subsequent model training process, and therefore model training is facilitated.

S130: constructing a near neighbor similarity regularization feature based on the user category probability; the neighbor similarity regularization features include data constructed based on correspondences between unlabeled data and labeled data of different user classes.

After obtaining the user category probability, a neighbor similarity regularization feature may be constructed based on the user category probability. The neighbor similarity regularization feature is feature data constructed in consideration of the correlation between the non-labeled data and the labeled data, thereby sufficiently learning potential distribution information between the non-labeled data and the neighbor labeled data.

In particular, in the case where the user categories include a positive category and a negative category, the neighbor similarity regularization feature may be determined by a formula [formula image not reproduced in this text], where $R_{ns}$ is the neighbor similarity regularization feature, $X^U$ is the unlabeled data, $|X^U|$ is the number of unlabeled data, $p_+(x^u)$ is the user category probability that the unlabeled data $x^u$ belongs to the positive category, $p_-(x^u)$ is the user category probability that the unlabeled data belongs to the negative category, a further symbol (also not reproduced) denotes the classification result obtained after inputting the sample data into a sub-classifier, the sub-classifier being used to determine a user category, $\omega_+$ is the positive category, and $\omega_-$ is the negative category.

Preferably, when the user sample data is acquired based on different information categories, a neighbor similarity regularization feature may be constructed for each information category. Specifically, in the above formula, the sub-classifier f may be the sub-classifier corresponding to one of the information categories; accordingly, each information category has its own sub-classifier. Acquiring the neighbor similarity regularization features under the different information categories separately distinguishes, as far as possible, the differences between corresponding data under different views, improves the discriminability of the data, and benefits model training.

The information categories may be categories corresponding to different characteristics of the user sample data. In some embodiments, the information categories may include a user basic information category, a business information category, and a transaction information category. The user basic information category may be a category corresponding to information about the user, for example the user's education level, age and gender. The business information category may be the category to which a service belongs; for example, in the financial field there may be different types of services such as withdrawal, transfer and inquiry services. The transaction information category may represent the specific interaction information of a user when processing a service, such as the service processing time and the specific service processing flow. In practical applications, other information categories may also be determined as required; this is not limited to the above examples.

S140: generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of the user sample data.

After the neighbor similarity regularization features are constructed, classification regularization features corresponding to different information categories can be generated by using the user sample data. The classification regularization features are features obtained by taking into account the mutual influence among the features of different information categories.

In this embodiment, a sub-classifier is set for each information category, and each sub-classifier classifies users according to the information of its own information category. For each sub-classifier to achieve its own technical effect, the outputs of the sub-classifiers under different information categories should differ; this prevents the sub-classifiers from becoming identical and improves the generalization of the model. Correspondingly, since the outputs of the sub-classifiers are not fully consistent, the degree to which the results of different sub-classifiers influence one another also needs to be considered when optimizing the sub-classifiers. The classification regularization feature is constructed to capture the difference between the results of the sub-classifiers.

In particular, the classification regularization feature may be obtained by a formula [formula image not reproduced in this text], where $R_{vm}$ is the classification regularization feature, $X^U$ is the unlabeled data, $|X^U|$ is the number of unlabeled data, $f_1$ is the sub-classifier corresponding to the user basic information category, $f_2$ is the sub-classifier corresponding to the business information category, and $f_3$ is the sub-classifier corresponding to the transaction information category; the sub-classifiers are used to determine a user category. The formula is constructed for the case where the information categories are the user basic information category, the business information category and the transaction information category; in practical applications, the formula can be adjusted according to the number and the specific types of the information categories.

S150: training a user classification model by combining the neighbor similarity regularization features and the classification regularization features; the user classification model is used for determining a user category according to user information, so that services corresponding to the user category can be allocated to the user.

Because the neighbor similarity regularization features and the classification regularization features can both reflect the classification effect of the sub-classifiers, the models corresponding to the sub-classifiers can be optimized based on the values of the neighbor similarity regularization features and the classification regularization features to obtain the final user classification model.

In some implementations, the training process may compute an objective function based on the empirical loss, the L2 regularization loss, the neighbor similarity regularization features and the classification regularization features. The empirical loss is a loss derived from the labeled data and reflects how well the sub-classifiers fit the labeled data. The model may be further optimized based on the magnitude of the empirical loss. The L2 regularization loss limits the overfitting caused by an excessive number of parameters and thereby improves the generalization capability of the model.

The objective function may be a function for evaluating the effect of the model. Specifically, the objective function may be calculated using a formula in which L is the objective function, R_emp is the empirical loss, R_ns is the neighbor similarity regularization feature, R_vm is the classification regularization feature, R_reg is the L2 regularization loss, and α, β and γ are hyperparameters.
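One plausible reading of the objective is a weighted sum of the four terms. The sketch below assumes that form; the pairing of α, β and γ with particular terms, the function names `l2_penalty` and `objective`, and all numeric values are illustrative assumptions rather than the formula of this embodiment.

```python
import numpy as np

def l2_penalty(weights):
    """L2 regularization loss: sum of squared parameters over all sub-classifiers."""
    return float(sum(np.sum(np.square(w)) for w in weights))

def objective(r_emp, r_ns, r_vm, r_reg, alpha, beta, gamma):
    """Assumed weighted sum: L = R_emp + alpha*R_ns + beta*R_vm + gamma*R_reg."""
    return r_emp + alpha * r_ns + beta * r_vm + gamma * r_reg

# Hypothetical parameter vectors of two sub-classifiers.
weights = [np.array([0.5, -0.5]), np.array([1.0])]
r_reg = l2_penalty(weights)

# Hypothetical values for the remaining terms and the hyperparameters.
loss = objective(r_emp=0.4, r_ns=0.2, r_vm=0.1, r_reg=r_reg,
                 alpha=1.0, beta=0.5, gamma=0.01)
```

The hyperparameters control how strongly each regularization term influences training relative to the empirical loss on the labeled data.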

After the value of the objective function is obtained, the sub-classifiers may be optimized based on that value; specifically, the corresponding optimization problem may be solved by gradient descent on the objective function. The steps of obtaining the objective function corresponding to the sub-classifiers and optimizing the sub-classifiers based on the objective function are repeated until a preset number of iterations is reached, or until the difference between the loss values of the test results before and after optimization is less than a preset threshold. The optimization of the sub-classifiers is then complete, and the final user classification model is constructed from the sub-classifiers. Based on the above example, in the case that the information categories include the basic information category, the business information category and the transaction information category, the optimized sub-classifiers are f₁, f₂ and f₃, which correspond to the information categories described above.
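The two stopping conditions described above can be sketched as a plain gradient descent loop. This is a minimal illustration, not the training procedure of this embodiment: the function `optimize`, the learning rate, and the quadratic stand-in objective are all assumptions, and the loss difference is computed on the objective itself rather than on separate test results.

```python
import numpy as np

def optimize(loss_fn, grad_fn, params, lr=0.1, max_iters=1000, tol=1e-6):
    """Gradient descent that stops when the preset iteration count is reached
    or when the loss changes by less than a preset threshold."""
    prev_loss = loss_fn(params)
    for step in range(1, max_iters + 1):
        params = params - lr * grad_fn(params)   # gradient descent update
        cur_loss = loss_fn(params)
        if abs(prev_loss - cur_loss) < tol:      # loss difference below threshold
            return params, step
        prev_loss = cur_loss
    return params, max_iters                     # preset iteration count reached

# Stand-in objective (hypothetical): a quadratic with its minimum at w = 3.
def loss_fn(w):
    return float(np.sum((w - 3.0) ** 2))

def grad_fn(w):
    return 2.0 * (w - 3.0)

w_opt, steps = optimize(loss_fn, grad_fn, np.array([0.0]))
```

In an actual training run, `params` would hold the parameters of all sub-classifiers and `loss_fn` would evaluate the full objective function.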

The user classification model may be a model obtained by combining the optimized sub-classifiers. Specifically, the user classification model may be obtained using a formula in which F(x) is the user classification model, f_v is a sub-classifier, x is user information, ω₁ is a first user category, and ω₂ is a second user category. The above example is only the formula for the case in which the sub-classifiers correspond to the three information categories described above. In practical applications, the formula for generating the user classification model can be determined according to the specific information categories; it is not limited to the above example and is not described herein again.
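As a rough illustration of combining sub-classifiers into a single model F(x), the sketch below averages the sub-classifier scores and thresholds the result to choose between ω₁ and ω₂. Averaging, the 0.5 threshold, the string labels, and the stand-in sub-classifiers are all assumptions made for this example; the combination rule of this embodiment may differ.

```python
def classify(x_views, sub_classifiers, threshold=0.5):
    """Assumed combination rule: average the sub-classifier scores f_v(x)
    and map the mean to the first or second user category."""
    scores = [f(x) for f, x in zip(sub_classifiers, x_views)]
    mean_score = sum(scores) / len(scores)
    return "omega_1" if mean_score >= threshold else "omega_2"

# Hypothetical sub-classifiers for the basic, business and transaction views.
# Each returns a score in [0, 1] from a single numeric feature.
f1 = lambda basic: min(1.0, basic / 100.0)
f2 = lambda business: 1.0 if business > 3 else 0.0
f3 = lambda transaction: min(1.0, transaction / 10.0)

category_a = classify([90, 5, 2], [f1, f2, f3])   # scores 0.9, 1.0, 0.2
category_b = classify([10, 1, 3], [f1, f2, f3])   # scores 0.1, 0.0, 0.3
```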

In some embodiments, after the user classification model is trained, the user classification model may be tested using test data. Specifically, the test may be carried out using a formula: the corresponding test data x is input, and the test result is compared with the actual category corresponding to the test data, thereby verifying the accuracy of the user classification model.

After the user classification model is obtained, the data to be analyzed can be input into the user classification model, and the corresponding service is determined according to the output user classification result, so that the service is distributed to the user.

The flow of the above method is described with a specific example in conjunction with fig. 2 and 3.

Fig. 2 is a schematic flow chart corresponding to the model training process. After a small number of labeled samples and a large number of unlabeled samples are obtained, the empirical loss can be calculated from the labeled samples alone. The labeled samples and the unlabeled samples are then combined to construct the corresponding neighbor information from a basic information view, a transaction information view and a product information view, and the neighbor information is used to determine the neighbor similarity regularization term under each view. In addition, the information under all views is combined to determine a view diversity regularization term. The user classification model can then be trained by combining the empirical loss, the view diversity regularization term, the neighbor similarity regularization terms and the L2 regularization.

Fig. 3 is a schematic diagram corresponding to a specific training and testing process. After data is obtained from a data warehouse, the data is preprocessed and corresponding features are constructed through feature engineering. The features are divided into training samples and test samples. The user classification model is trained with the training samples, and a prediction result of the user classification model is obtained with the test samples. The accuracy of the model can be judged from the prediction result, and the model can be optimized accordingly.
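The split into training and test samples and the accuracy check can be sketched as follows. This is a minimal example under stated assumptions: the helper names `train_test_split` and `accuracy`, the 20% test ratio, and the toy data are hypothetical and are not taken from this embodiment.

```python
import numpy as np

def train_test_split(features, labels, test_ratio=0.2, seed=0):
    """Shuffle the samples and divide them into training and test samples."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(len(features) * test_ratio)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])

def accuracy(predicted, actual):
    """Fraction of test samples whose predicted category equals the actual one."""
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))

# Toy feature matrix (10 users, 2 features) and toy labels.
features = np.arange(20, dtype=float).reshape(10, 2)
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
x_train, y_train, x_test, y_test = train_test_split(features, labels)

# Hypothetical predictions compared against actual categories.
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```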

Based on the above description of the embodiments and examples, it can be seen that, after user sample data is obtained, the method labels only part of the data, determines the user category probability and then the neighbor similarity regularization features while taking into account the association between labeled data and unlabeled data, and trains the user classification model in combination with the classification regularization features corresponding to the information categories. The user classification model can then be used to classify users, and corresponding services are allocated to users according to their user categories. The method reduces the time and resources consumed by labeling a large amount of data, fully exploits the associations among the data, improves the generalization of the classification model, and improves the service processing experience of users.

Based on the model training method based on user classification corresponding to fig. 1, a service allocation method based on user classification in the embodiment of the present specification is described below. The execution subject of the service allocation method based on user classification may be a service allocation device based on user classification. As shown in fig. 4, the service allocation method based on user classification may include the following implementation steps.

S410: acquiring user characteristic information of the target user.

The target user may be a user to whom a service needs to be allocated, or a user whose required services need to be predicted. The user characteristic information is information corresponding to the target user. The user characteristic information may be set according to the requirements of the user classification model; for example, it may be identity information of the user or information about historical services processed by the user, which is not limited herein.

S420: inputting the user characteristic information into a user classification model to obtain a user category; the user classification model is obtained by the following method: after user sample data containing labeled data and unlabeled data is obtained, determining user category probability according to the labeled data and the unlabeled data; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing neighbor similarity regularization features based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user categories; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of user sample data; and training by combining the neighbor similarity regularization features and the classification regularization features to obtain the user classification model.

After obtaining the user characteristic information, the user characteristic information may be input into a user classification model. The user classification model can determine the corresponding class of the user according to the user characteristic information, so that the corresponding user class can be output. For a specific introduction and obtaining method of the user classification model, reference may be made to the introduction in the embodiment corresponding to fig. 1, and details are not described here.

S430: allocating the service corresponding to the user category to the target user.

Based on different user categories, corresponding services can be preset, and after the user category corresponding to the target user is determined, the services corresponding to the user category can be distributed to the target user.

The correspondence between user categories and services may be specified directly by administrators or obtained by training on historical data. The specific way of obtaining it may be set according to the actual application and is not described herein again.
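Steps S410 to S430 can be sketched as a lookup from the predicted user category to a preset list of services. The mapping contents, the service names, the function `allocate_services`, and the stand-in model below are hypothetical and serve only to illustrate the flow.

```python
# Hypothetical, administrator-specified correspondence between
# user categories and services.
SERVICES_BY_CATEGORY = {
    "omega_1": ["service_a", "service_b"],
    "omega_2": ["service_c"],
}

def allocate_services(user_features, model, services_by_category):
    """S420: obtain the user category from the model.
    S430: return the services preset for that category."""
    category = model(user_features)
    return category, services_by_category.get(category, [])

# Stand-in for a trained user classification model (assumption).
def toy_model(user_features):
    return "omega_1" if user_features["score"] >= 0.5 else "omega_2"

# S410: user characteristic information of a target user (toy example).
target_user = {"score": 0.8}
category, services = allocate_services(target_user, toy_model, SERVICES_BY_CATEGORY)
```

A category with no preset services yields an empty list here; how such a case should be handled in practice is left open.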

Based on the model training method based on user classification corresponding to fig. 1, a model training apparatus based on user classification according to an embodiment of the present specification is described below. As shown in fig. 5, the model training apparatus based on user classification includes the following modules.

A user sample data obtaining module 510, configured to obtain user sample data; the user sample data comprises labeled data and unlabeled data; the labeled data corresponds to a user category based on a service processing record of the user.

A user category probability determining module 520, configured to determine a user category probability of unlabeled data through neighboring labeled data corresponding to the unlabeled data; the neighboring labeled data comprises labeled data whose difference from the unlabeled data is less than a specified difference threshold; the user category probability includes a probability that the unlabeled data belongs to each user category.

A neighbor similarity regularization feature construction module 530 for constructing neighbor similarity regularization features based on the user category probabilities; the neighbor similarity regularization features include data constructed based on correspondences between unlabeled data and labeled data of different user classes.

A classification regularization feature generation module 540, configured to generate classification regularization features corresponding to at least two information categories using the user sample data; the information categories correspond to categories of different characteristics of the user sample data.

A user classification model obtaining module 550, configured to synthesize the neighbor similarity regularization feature and the classification regularization feature to obtain a user classification model; the user classification model is used for determining user classification according to user information so as to distribute services of users corresponding to the user classification.
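As an illustration of what the user category probability determining module 520 computes, the sketch below selects the labeled samples whose difference from an unlabeled sample is below a threshold and uses the share of each category among them as the probability. Measuring the difference by Euclidean distance, falling back to a uniform distribution when no neighbor is found, and the function name are assumptions made for this example.

```python
import numpy as np

def neighbor_class_probability(x_unlabeled, labeled_x, labeled_y, classes, max_difference):
    """Estimate the probability that an unlabeled sample belongs to each category
    from the labeled samples whose difference is below max_difference."""
    # Difference measured as Euclidean distance (assumption).
    differences = np.linalg.norm(labeled_x - x_unlabeled, axis=1)
    neighbor_labels = labeled_y[differences < max_difference]
    if neighbor_labels.size == 0:
        # No neighboring labeled data: fall back to a uniform distribution (assumption).
        return {c: 1.0 / len(classes) for c in classes}
    return {c: float(np.mean(neighbor_labels == c)) for c in classes}

# Toy labeled data: four users with two features each, categories 1 and 0.
labeled_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [1.0, 0.0]])
labeled_y = np.array([1, 1, 0, 0])

probs = neighbor_class_probability(
    np.array([0.0, 0.0]), labeled_x, labeled_y, classes=[0, 1], max_difference=1.5
)
```

In this toy case three labeled samples fall within the threshold, two of category 1 and one of category 0, so the estimated probabilities are 2/3 and 1/3.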

Based on the service allocation method based on user classification corresponding to fig. 4, a service allocation apparatus based on user classification according to an embodiment of the present specification is described below. As shown in fig. 6, the service allocation apparatus based on user classification includes the following modules.

The user characteristic information obtaining module 610 is configured to obtain user characteristic information of a target user.

A user category obtaining module 620, configured to input the user characteristic information into a user classification model to obtain a user category; the user classification model is obtained by the following method: after user sample data containing labeled data and unlabeled data is obtained, determining user category probability according to the labeled data and the unlabeled data; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing neighbor similarity regularization features based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user categories; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of user sample data; and training by combining the neighbor similarity regularization features and the classification regularization features to obtain the user classification model.

A service allocating module 630, configured to allocate the service corresponding to the user category to the target user.

Based on the model training method based on user classification corresponding to fig. 1, an embodiment of the present specification provides a model training device based on user classification. As shown in FIG. 7, the user classification based model training device may include a memory and a processor.

In this embodiment, the memory may be implemented in any suitable manner. For example, the memory may be a read-only memory, a mechanical hard disk, a solid state disk, a USB flash drive, or the like. The memory may be used to store computer program instructions.

In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The processor may execute the computer program instructions to perform the steps of: acquiring user sample data; the user sample data comprises labeled data and unlabeled data; the labeled data corresponds to a user category based on the service processing record of the user; determining a user category probability of unlabeled data through neighboring labeled data corresponding to the unlabeled data; the neighboring labeled data comprises labeled data whose difference from the unlabeled data is less than a specified difference threshold; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing neighbor similarity regularization features based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user categories; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of user sample data; training by combining the neighbor similarity regularization features and the classification regularization features to obtain a user classification model; the user classification model is used for determining a user category according to user information, so that services corresponding to the user category can be allocated to the user.

Based on the service allocation method based on user classification corresponding to fig. 4, an embodiment of the present specification provides a service allocation device based on user classification. As shown in fig. 8, the service allocation device based on user classification may include a memory and a processor.

In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The processor may execute the computer program instructions to perform the steps of: acquiring user characteristic information of a target user; inputting the user characteristic information into a user classification model to obtain a user category; the user classification model is obtained by the following method: after user sample data containing labeled data and unlabeled data is obtained, determining user category probability according to the labeled data and the unlabeled data; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing neighbor similarity regularization features based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user categories; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of user sample data; training by combining the neighbor similarity regularization features and the classification regularization features to obtain the user classification model; and allocating the service corresponding to the user category to the target user.

In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor or a switch) or an improvement in software (an improvement to a method flow). However, as technology has developed, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is now mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained simply by programming the method flow into an integrated circuit using one of the above hardware description languages.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present specification may be embodied, in essence or in part, in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in some parts of the embodiments of the present specification.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The specification is operational with numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

While the specification has been described with examples, those skilled in the art will appreciate that there are numerous variations and permutations of the specification that do not depart from the spirit of the specification, and it is intended that the appended claims include such variations and modifications that do not depart from the spirit of the specification.

Claims

1. A model training method based on user classification is characterized by comprising the following steps:

acquiring user sample data; the user sample data comprises tagged data and untagged data; the labeled data corresponds to a user category based on the service processing record of the user;

determining a user class probability of non-labeled data by neighboring labeled data corresponding to the non-labeled data; the neighboring tagged data comprises tagged data having a difference with the non-tagged data that is less than a specified difference threshold; the user category probability comprises the probability that the unlabeled data belongs to each user category;

constructing a near neighbor similarity regularization feature based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user classes;

generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different characteristics of user sample data;

training by integrating the neighbor similarity regularization features and the classification regularization features to obtain a user classification model; the user classification model is used for determining user classification according to user information so as to distribute services of users corresponding to the user classification.

2. The method of claim 1, wherein prior to determining a user class probability for unlabeled data by neighboring labeled data corresponding to the unlabeled data, further comprising:

acquiring the neighboring labeled data corresponding to the unlabeled data based on a K-nearest-neighbor similarity algorithm.

3. The method of claim 1, wherein the user categories include a positive category and a negative category; constructing a neighbor similarity regularization feature according to the unlabeled data and the labeled data based on the user category probability, including:

using a formula in which the sample data is input into a sub-classifier for determining the user category, ω₊ is the positive category, and ω₋ is the negative category.

4. The method of claim 1, in which the neighbor similarity regularization features comprise features that respectively correspond to respective categories of information.

5. The method of claim 1, wherein said generating classification regularization features corresponding to at least two information classes using said user sample data comprises:

using formulas

obtaining the classification regularization features, where R_vm is the classification regularization feature, X^U is the unlabeled data, |X^U| is the number of unlabeled data, f₁ is the sub-classifier corresponding to the user basic information category, f₂ is the sub-classifier corresponding to the business information category, and f₃ is the sub-classifier corresponding to the transaction information category; the sub-classifiers are used to determine a user category.

6. The method of claim 1, wherein the synthesizing of the neighbor similarity regularization features and the classification regularization features to obtain a user classification model comprises:

obtaining experience loss according to the labeled data;

calculating an objective function based on the empirical loss, the L2 regularization loss, the neighbor similarity regularization features and the classification regularization features;

optimizing the sub-classifiers according to the objective function; the sub-classifiers are used for determining user categories corresponding to user sample data under different information categories; the sub-classifiers are further used for constructing the neighbor similarity regularization features and the classification regularization features;

and synthesizing the optimized sub-classifiers to obtain a user classification model.

7. The method of claim 6, wherein said optimizing sub-classifiers according to the objective function comprises:

and repeating the steps of obtaining the objective function corresponding to the sub-classifiers and optimizing the sub-classifiers based on the objective function until the preset iteration times are reached or the difference of the loss values of the test results before and after optimization is smaller than a preset threshold value.

8. The method of claim 6, wherein calculating an objective function based on the empirical loss, the L2 regularization loss, the neighbor similarity regularization features and the classification regularization features comprises:

using formulas

9. The method of claim 6, wherein synthesizing the optimized sub-classifiers results in a user classification model comprising:

using formulas

obtaining a user classification model, wherein F(x) is the user classification model, f_v is a sub-classifier, x is user information, ω₁ is a first user category, and ω₂ is a second user category.

10. The method of claim 1, wherein the information categories include at least one of a user basic information category, a business information category, and a transaction information category.

11. The method of claim 1, wherein prior to determining a user class probability for unlabeled data by neighboring labeled data corresponding to the unlabeled data, further comprising:

preprocessing the user sample data; the preprocessing comprises the following steps: constructing original features respectively corresponding to each user based on the user sample data; and completing the original features based on preset feature fields.

12. The method of claim 1, in which the user sample data comprises user information characteristics; the acquiring user sample data includes:

constructing original features based on the user sample data;

encoding the categorical features in the original features; the categorical features comprise at least one of education level and gender;

constructing time series historical characteristics through user sample data in different time intervals;

constructing a time sequence aggregation characteristic by using an average value, a maximum value, a minimum value and a standard deviation of user sample data in time distribution;

and synthesizing the original characteristics, the time sequence historical characteristics and the time sequence aggregation characteristics to obtain the user information characteristics.

13. The method of claim 1, wherein before said obtaining user sample data, further comprising:

determining a test user corresponding to part of user sample data;

acquiring a service processing record of the test user within a preset test time;

and adding a label to the user sample data corresponding to the test user based on the service processing record.

14. A model training apparatus based on user classification, comprising:

the user sample data acquisition module is used for acquiring user sample data; the user sample data comprises tagged data and untagged data; the labeled data corresponds to a user category based on the service processing record of the user;

a user category probability determination module for determining a user category probability of the non-labeled data by neighboring labeled data corresponding to the non-labeled data; the neighboring tagged data comprises tagged data having a difference with the non-tagged data that is less than a specified difference threshold; the user category probability comprises the probability that the unlabeled data belongs to each user category;

a neighbor similarity regularization feature construction module, configured to construct neighbor similarity regularization features based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user classes;

a classification regularization feature generation module, configured to generate classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different features of the user sample data;

a user classification model acquisition module, configured to train on the combined neighbor similarity regularization features and classification regularization features to obtain a user classification model; the user classification model is used for determining a user category according to user information, so as to distribute the service corresponding to that user category to the user.
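The user category probability determined by the second module can be sketched as follows. Using Euclidean distance as the "difference", counting neighbor labels as the probability estimate, and falling back to a uniform distribution when no neighbor is found are all assumptions made for this illustration, not details taken from the claims.

```python
import math


def neighbor_class_probabilities(x, labeled, categories, threshold):
    """Estimate P(category | x) from labeled samples closer than threshold."""
    # Neighboring labeled data: labeled samples whose distance to the
    # unlabeled sample x is below the specified difference threshold.
    neighbors = [
        label
        for features, label in labeled
        if math.dist(x, features) < threshold
    ]
    if not neighbors:
        # Assumed fallback: no neighbor means no evidence, so use a uniform prior.
        return {c: 1.0 / len(categories) for c in categories}
    # Fraction of neighbors that belong to each user category.
    return {c: neighbors.count(c) / len(neighbors) for c in categories}


labeled = [
    ([0.0, 0.0], "A"),
    ([0.1, 0.0], "A"),
    ([0.0, 0.2], "B"),
    ([5.0, 5.0], "B"),
]
probs = neighbor_class_probabilities([0.0, 0.1], labeled, ["A", "B"], threshold=0.5)
far_probs = neighbor_class_probabilities([10.0, 10.0], labeled, ["A", "B"], threshold=0.5)
```

In this example three labeled samples fall within the threshold, two of category A and one of category B, so the unlabeled sample is assigned a probability of 2/3 for A and 1/3 for B.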

15. A model training device based on user classification, comprising a memory and a processor;

the memory is configured to store computer program instructions;

the processor is configured to execute the computer program instructions to implement the following steps:

acquiring user sample data; the user sample data comprises labeled data and unlabeled data; the labeled data corresponds to a user category based on the service processing record of the user;

determining a user category probability of the unlabeled data from neighboring labeled data corresponding to the unlabeled data; the neighboring labeled data comprises labeled data whose difference from the unlabeled data is less than a specified difference threshold; the user category probability comprises the probability that the unlabeled data belongs to each user category;

constructing a neighbor similarity regularization feature based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user categories;

generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different features of the user sample data;

training on the combined neighbor similarity regularization features and classification regularization features to obtain a user classification model; the user classification model is used for determining a user category according to user information, so as to distribute the service corresponding to that user category to the user.

16. A service distribution method based on user classification, comprising:

acquiring user characteristic information of a target user;

inputting the user characteristic information into a user classification model to obtain a user category; the user classification model is obtained by the following method: after user sample data containing labeled data and unlabeled data are acquired, determining a user category probability according to the labeled data and the unlabeled data; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing a neighbor similarity regularization feature based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user categories; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different features of the user sample data; training on the combined neighbor similarity regularization features and classification regularization features to obtain the user classification model;

and distributing the service corresponding to the user category to the target user.
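The distribution flow of this claim (classify the target user, then look up the service for the resulting category) can be sketched as below. The category names, the service names, the default service and the stand-in `toy_classifier` are hypothetical; in practice the classifier would be the trained user classification model.

```python
# Hypothetical mapping from user category to the service prepared for it.
SERVICE_BY_CATEGORY = {
    "wealth_management": "investment_advice",
    "credit": "loan_preapproval",
}


def toy_classifier(user_features):
    """Stand-in for the trained user classification model (illustration only)."""
    return "wealth_management" if user_features[0] >= 0.5 else "credit"


def distribute_service(user_features, classify, service_by_category,
                       default="general_service"):
    """Classify the target user and return (category, distributed service)."""
    category = classify(user_features)
    # Fall back to a default service for categories without a mapping (assumption).
    return category, service_by_category.get(category, default)


result_a = distribute_service([0.8, 0.1], toy_classifier, SERVICE_BY_CATEGORY)
result_b = distribute_service([0.2, 0.9], toy_classifier, SERVICE_BY_CATEGORY)
```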

17. A service distribution apparatus based on user classification, comprising:

a user characteristic information acquisition module, configured to acquire user characteristic information of a target user;

a user category acquisition module, configured to input the user characteristic information into a user classification model to obtain a user category; the user classification model is obtained by the following method: after user sample data containing labeled data and unlabeled data are acquired, determining a user category probability according to the labeled data and the unlabeled data; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing a neighbor similarity regularization feature based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user categories; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different features of the user sample data; training on the combined neighbor similarity regularization features and classification regularization features to obtain the user classification model;

and a service distribution module, configured to distribute the service corresponding to the user category to the target user.

18. A service distribution device based on user classification, comprising a memory and a processor;

the memory is configured to store computer program instructions;

the processor is configured to execute the computer program instructions to implement the following steps:

acquiring user characteristic information of a target user;

inputting the user characteristic information into a user classification model to obtain a user category; the user classification model is obtained by the following method: after user sample data containing labeled data and unlabeled data are acquired, determining a user category probability according to the labeled data and the unlabeled data; the user category probability comprises the probability that the unlabeled data belongs to each user category; constructing a neighbor similarity regularization feature based on the user category probability; the neighbor similarity regularization features comprise data constructed based on correspondence between unlabeled data and labeled data of different user categories; generating classification regularization features corresponding to at least two information categories by using the user sample data; the information categories correspond to categories of different features of the user sample data; training on the combined neighbor similarity regularization features and classification regularization features to obtain the user classification model;

and distributing the service corresponding to the user category to the target user.