CN112990989B - Value prediction model input data generation method, device, equipment and medium - Google Patents

Value prediction model input data generation method, device, equipment and medium

Info

Publication number
CN112990989B
Authority
CN
China
Prior art keywords: target, value, similar, user, feature
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110531498.4A
Other languages
Chinese (zh)
Other versions
CN112990989A (en)
Inventor
刘志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Original Assignee
Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Application filed by Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Priority to CN202110531498.4A
Publication of CN112990989A
Application granted
Publication of CN112990989B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Abstract

The application relates to the technical field of big data, and in particular to a value prediction model input data generation method, device, equipment and medium. The method comprises the following steps: acquiring target historical service data corresponding to a target user; extracting, from the target historical service data, target feature values corresponding to a plurality of preset target features; when there are target features for which no target feature value was extracted, taking those target features as missing features, and acquiring similar users corresponding to the target user and the similarity between the target user and each similar user; extracting similar feature values corresponding to the missing features from the similar service data corresponding to the similar users; calculating the missing feature value corresponding to each missing feature according to the similarities and the similar feature values of the similar users; and obtaining value prediction model input data according to the calculated missing feature values and the extracted target feature values. With this method, the accuracy of the generated model input data can be improved.

Description

Value prediction model input data generation method, device, equipment and medium
Technical Field
The application relates to the technical field of big data, in particular to a value prediction model input data generation method, device, equipment and medium.
Background
With the development of computer technology, services traditionally handled offline are gradually moving online, so the volume of online data keeps growing. For companies, analyzing and processing this large volume of online data to obtain useful data is becoming increasingly important.
For example, by analyzing online data, a company can predict a user's value level in a future time period from the user's value level in a historical time period, and can then carry out business activities matched to each level of user in that future period, which improves the efficiency of business execution.
In the traditional method, the future value level of a user is predicted directly from the historical data corresponding to that user. When the user's historical data is incomplete, the value level predicted from it is inaccurate.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a value prediction model input data generation method, device, apparatus, and medium capable of improving accuracy of model input data acquisition.
A value prediction model input data generation method comprises the following steps:
acquiring target historical service data corresponding to a target user;
extracting preset target characteristic values corresponding to a plurality of target characteristics from target historical service data;
when there are target features for which no target feature value was extracted, taking those target features as missing features, and acquiring similar users corresponding to the target user and the similarity between the target user and the similar users;
extracting similar characteristic values corresponding to the missing characteristics from similar service data corresponding to similar users;
calculating to obtain a missing characteristic value corresponding to the missing characteristic according to the similarity and the similar characteristic value of the similar user;
and obtaining value prediction model input data according to the calculated missing characteristic value and the extracted target characteristic value.
In one embodiment, a target feature for which a target feature value is extracted from the target historical service data is taken as a normal feature; acquiring the similar users corresponding to the target user and the similarity between the target user and the similar users comprises:
extracting preset similar characteristic values corresponding to all target characteristics from similar service data corresponding to similar users;
determining a similar characteristic mean value according to each similar characteristic value;
determining a normal characteristic mean value according to normal characteristic values corresponding to all normal characteristics in the target historical service data;
determining a similarity difference of each target feature according to the similar feature mean value and the similar feature value corresponding to each target feature;
determining a target difference of each target feature according to the normal feature mean value and the target feature value corresponding to each target feature;
and determining the similarity between the target user and the similar user according to the similarity difference and the target difference corresponding to each target feature.
In one embodiment, calculating the missing feature value corresponding to the missing feature according to the similarity and the similar feature values of the similar users includes:
determining a similar adjustment value corresponding to each similar user according to the similar feature mean value, the similar feature value and the similarity respectively corresponding to each similar user;
and calculating the missing feature value corresponding to the missing feature according to the normal feature mean value and the similar adjustment value respectively corresponding to each similar user.
In one embodiment, the method further comprises:
preprocessing target historical service data; the preprocessing includes at least one of data verification, data cleansing, and data normalization.
A method of user value prediction, the method comprising:
obtaining model input data according to the value prediction model input data generation method of any of the above embodiments;
inputting the model input data into a pre-constructed value prediction model, and obtaining the predicted value of the target user in a future time period according to the value prediction model; the value prediction model is constructed according to the training characteristics and the training values, and the training characteristics and the training values are obtained from historical business data corresponding to more than one user respectively.
In one embodiment, a method of constructing a value prediction model includes:
acquiring historical service data corresponding to more than one user respectively;
respectively extracting training characteristics from each historical service data;
extracting value calculation features from the training features;
acquiring the feature weight of each value calculation feature;
determining the training value of each user according to the characteristic weight and the value calculation characteristic;
and training the prediction model according to the training characteristics and the training value, and stopping the training of the prediction model when the training ending condition is met to obtain the value prediction model.
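The step that turns feature weights and value calculation features into a training value can be sketched as follows. The text does not state how the weights and features are combined, so the weighted sum used here is an assumption, and the feature names and weights are hypothetical.

```python
from typing import Dict


def training_value(value_features: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Combine value calculation features into one training value.

    Assumption: the combination is a plain weighted sum; the source only
    says the value is determined "according to" the weights and features.
    """
    return sum(weights[name] * value_features[name] for name in weights)


# Hypothetical, already-normalized value calculation features for one user.
features = {"premium": 0.8, "profit": 0.5, "purchase_frequency": 1.0}
# Hypothetical feature weights.
weights = {"premium": 0.5, "profit": 0.3, "purchase_frequency": 0.2}

label = training_value(features, weights)
```

The resulting label would then be paired with the user's training features to fit the prediction model.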
A user request processing method comprises the following steps:
receiving a user request, wherein the user request carries user data;
processing the user data by the user value prediction method described above to obtain the user value;
acquiring a service strategy corresponding to the user value;
and processing the user request according to the service strategy.
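The request processing steps above can be sketched as a lookup from predicted value to service strategy. The thresholds, strategy names and the `predict_value` callable are hypothetical placeholders, not taken from the source.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical (threshold, strategy) pairs, checked from highest to lowest.
STRATEGIES: List[Tuple[float, str]] = [
    (0.8, "priority_service"),
    (0.5, "standard_service"),
    (0.0, "basic_service"),
]


def select_strategy(user_value: float) -> str:
    """Return the first strategy whose threshold the user value reaches."""
    for threshold, name in STRATEGIES:
        if user_value >= threshold:
            return name
    # Values below every threshold fall back to the lowest tier.
    return STRATEGIES[-1][1]


def handle_request(user_data: Dict[str, float],
                   predict_value: Callable[[Dict[str, float]], float]) -> Dict[str, object]:
    """Predict the user value, then pick the matching service strategy."""
    value = predict_value(user_data)
    return {"user_value": value, "strategy": select_strategy(value)}


# A stub predictor stands in for the value prediction model.
result = handle_request({"premium": 0.9}, lambda data: 0.9)
```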
A value prediction model input data generation apparatus, the apparatus comprising:
the acquisition module is used for acquiring target historical service data corresponding to a target user;
the extraction module is used for extracting preset target characteristic values corresponding to a plurality of target characteristics from the target historical service data;
the similarity acquisition module is used for taking the target features of which the target feature values are not extracted as missing features when the target features of which the target feature values are not extracted exist, and acquiring similar users corresponding to the target users and the similarity between the target users and the similar users;
the similar data extraction module is used for extracting similar characteristic values corresponding to the missing characteristics from similar service data corresponding to similar users;
the calculation module is used for calculating to obtain a missing characteristic value corresponding to the missing characteristic according to the similarity of the similar users and the similar characteristic value;
and the generating module is used for obtaining value prediction model input data according to the calculated missing characteristic value and the extracted target characteristic value.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method when the processor executes the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
In the value prediction model input data generation method, target historical service data corresponding to a target user is acquired, and target feature values corresponding to a plurality of preset target features are extracted from it. After extraction, the method checks whether any target feature has no extracted target feature value (that is, whether the target feature value of any target feature is a missing value), so that the accuracy of the target historical service data is checked before the user value is predicted. When such a target feature exists, it is taken as a missing feature. Similar users corresponding to the target user, and the similar service data corresponding to each similar user, are then acquired, and the missing feature values in the target historical service data are completed from the similar service data and the similarity between each similar user and the target user. The target historical service data is thereby completed and its accuracy improved.
Drawings
FIG. 1 is a diagram of an application scenario of a method for generating value prediction model input data in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a method for generating value prediction model input data in one embodiment;
FIG. 3 is a flow diagram illustrating a method for user value prediction in one embodiment;
FIG. 4 is a schematic flow chart illustrating a method for constructing a value prediction model according to an embodiment;
FIG. 5 is an overall schematic diagram of model training and prediction in one embodiment;
FIG. 6 is a block diagram of an embodiment of a value prediction model input data generation apparatus;
FIG. 7 is a block diagram showing the structure of a user value prediction apparatus according to an embodiment;
FIG. 8 is a block diagram of a user request processing device in one embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The value prediction model input data generation method provided by the application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The server 104 acquires target historical service data corresponding to a target user from the terminal 102; extracts, from the target historical service data, target feature values corresponding to a plurality of preset target features; when there are target features for which no target feature value was extracted, takes those target features as missing features and acquires similar users corresponding to the target user and the similarity between the target user and the similar users; extracts similar feature values corresponding to the missing features from the similar service data corresponding to the similar users; calculates the missing feature value corresponding to each missing feature according to the similarities and the similar feature values of the similar users; and obtains value prediction model input data according to the calculated missing feature values and the extracted target feature values. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.
In one embodiment, as shown in FIG. 2, a value prediction model input data generation method is provided. The method is described here as applied to the server 104 in FIG. 1; in other embodiments, it can also be applied to the terminal 102. The method includes the following steps:
Step 202, obtaining target historical service data corresponding to the target user.
The target user is a user whose value is to be predicted. The target historical service data is the data corresponding to the target user over a historical period of time. For example, it may include attribute information of the target user, historical transaction data of the target user, historical behavior information of the target user, and the like. The attribute information may be the name, gender and geographic location of the target user. The historical transaction data may be the transaction records generated by the target user, such as the products purchased, the purchase frequency and the purchase price. The behavior information may describe the behavior of the user in a transaction scenario or in other, non-transaction scenarios, such as whether a transaction of the target user succeeded.
Furthermore, the server can also perform cleaning processing on the crawled target historical service data, such as removing error data in the target historical service data or data which does not conform to a standard format, so that the target historical service data conforms to subsequent data processing requirements, and the accuracy of data processing is improved. And the server can also normalize the crawled target historical business data so as to enable data calculation among data of different dimensions.
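The normalization mentioned above can be illustrated with min-max scaling, which maps every feature to the range [0, 1] so that features of different dimensions become comparable. The source does not name a normalization method, so min-max scaling is an assumption, and the feature names and values are hypothetical.

```python
from typing import List


def min_max_normalize(column: List[float]) -> List[float]:
    """Scale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no scale information; map it to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical features on very different scales.
monthly_income = [5000.0, 10000.0, 20000.0]
purchase_count = [1.0, 2.0, 3.0]

normalized = {
    "monthly_income": min_max_normalize(monthly_income),
    "purchase_count": min_max_normalize(purchase_count),
}
```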
Step 204, extracting preset target characteristic values corresponding to a plurality of target characteristics from the target historical service data.
The target features are preset features, extracted from the target historical service data, that characterize the target user. The target feature value is the specific numerical value of a target feature. For example, the target features may be the target user's purchase interval, purchase frequency, number of consecutive years of purchase, premium, profit, house property value, vehicle value, monthly income, and the proportion of income spent on premiums, among others.
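A minimal sketch of step 204, assuming the target historical service data has already been reduced to a flat mapping from feature name to number. The feature names are hypothetical; `None` marks a feature whose value could not be extracted.

```python
from typing import Dict, Optional, Sequence

# Hypothetical preset target features; the names are illustrative only.
TARGET_FEATURES: Sequence[str] = (
    "purchase_frequency",
    "premium",
    "monthly_income",
    "house_value",
)


def extract_target_values(record: Dict[str, float],
                          features: Sequence[str] = TARGET_FEATURES
                          ) -> Dict[str, Optional[float]]:
    """Return one entry per preset feature; None means nothing was extracted."""
    return {name: record.get(name) for name in features}


# This user's history has no house value.
history = {"purchase_frequency": 4.0, "premium": 1200.0, "monthly_income": 9000.0}
values = extract_target_values(history)
```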
Step 206, when there are target features for which no target feature value was extracted, taking those target features as missing features, and acquiring similar users corresponding to the target user and the similarity between the target user and the similar users.
The missing feature refers to a target feature which cannot be extracted from the target historical service data to obtain a corresponding target feature value. Specifically, the target historical service data does not include the target feature, for example, the target historical service data does not include the property value of the target user, so the server cannot extract the target feature value of the property value from the target historical service data.
In one embodiment, more than one target feature may be predetermined, and the server extracts a target feature value corresponding to each predetermined target feature from the target historical service data. When the target feature values corresponding to all predetermined target features are successfully extracted, the target historical service data is qualified data, and the value of the target user can be predicted from it. When the target feature values corresponding to all predetermined target features cannot be successfully extracted, the target historical service data is unqualified data, that is, part of the data is missing. In that case, a value predicted for the target user from the incomplete target historical service data would be inaccurate.
In one embodiment, the server extracts a target feature value corresponding to each target feature from the target historical service data, and divides the target features into missing features and normal features according to whether the target feature value of each target feature is a missing value. Specifically, a target feature for which no target feature value can be extracted from the target historical service data is taken as a missing feature, and a target feature for which a target feature value can be extracted is taken as a normal feature. When the server determines that the target features include a missing feature, the data characterizing that feature is absent from the target historical service data. After extracting the target feature values corresponding to more than one target feature from the target historical service data, the method further comprises: when the server determines that the target features include missing features, acquiring similar users corresponding to the target user and similar service data corresponding to the similar users; extracting similar data corresponding to the missing features from the similar service data; and determining the missing feature values corresponding to the missing features according to the similar data.
Wherein the similar user is a user similar to the target user, such as the similar user may be a user with similar transaction behavior with the target user. Specifically, when the server judges that the target feature has the missing feature, similar users similar to the target user are obtained, and the missing feature value of the target user is estimated according to similar service data of the similar users. It should be noted that the number of similar users may be one or more, and is not limited herein.
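The division into normal and missing features described for step 206 can be sketched as a simple partition. This assumes extracted values are held in a mapping where `None` marks a value that could not be extracted; the feature names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def split_features(values: Dict[str, Optional[float]]) -> Tuple[List[str], List[str]]:
    """Partition feature names into (normal, missing)."""
    normal = [name for name, v in values.items() if v is not None]
    missing = [name for name, v in values.items() if v is None]
    return normal, missing


extracted = {"premium": 1200.0, "monthly_income": None, "house_value": None}
normal, missing = split_features(extracted)

# Similar users only need to be looked up when something is missing.
needs_imputation = bool(missing)
```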
Step 208, extracting similar feature values corresponding to the missing features from the similar service data corresponding to the similar users.
And the server acquires similar service data corresponding to similar users so as to supplement missing data in the target historical service data according to the similar service data. Specifically, the server may extract a similar feature value corresponding to the missing feature from the similar service data, and then supplement the missing feature value of the missing feature with the similar feature value, so as to obtain the missing feature value of the missing feature.
Step 210, calculating the missing feature value corresponding to the missing feature according to the similarity and the similar feature values of the similar users.
In one embodiment, when there are multiple similar users, the method further includes calculating the similarity between each similar user and the target user. The server then calculates the missing feature value corresponding to the missing feature according to the similarity and the similar feature value of each similar user.
Step 212, model input data is obtained according to the calculated missing feature value and the extracted target feature value.
In one embodiment, when an insurance company collects user (client) data, it may collect only basic information about the user and fail to obtain value information such as monthly income, liabilities, house value and vehicle value. Data on other aspects of the user may be collected through specific business scenarios, but the resulting user data forms a sparse matrix, and in many cases the collected target historical service data of the target user is incomplete. Inputting such a sparse matrix into a random forest prediction model to predict the user value reduces the accuracy of the model. Therefore, in this embodiment, the missing data of the target user can be completed from the data of similar users. Specifically, the unknown data of the target user may be estimated by weighted voting according to the correlation coefficient, thereby filling the sparse matrix; for example, the target historical service data may be completed by local matrix voting.
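Step 212 can be sketched as merging the extracted values with the calculated missing values into one dense row in a fixed feature order. The function and feature names are hypothetical, and the imputed values are assumed to have been computed beforehand.

```python
from typing import Dict, List, Optional


def build_model_input(extracted: Dict[str, Optional[float]],
                      imputed: Dict[str, float],
                      feature_order: List[str]) -> List[float]:
    """Fill gaps in the extracted values with imputed ones, in a fixed order."""
    row: List[float] = []
    for name in feature_order:
        value = extracted.get(name)
        if value is None:
            # Raises KeyError if a missing feature was never imputed,
            # so a sparse row cannot reach the model unnoticed.
            value = imputed[name]
        row.append(value)
    return row


order = ["premium", "house_value"]
model_input = build_model_input(
    {"premium": 0.4, "house_value": None},
    {"house_value": 0.69},
    order,
)
```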
In one embodiment, a target feature for which a target feature value is extracted from the target historical service data is taken as a normal feature. Acquiring the similar users corresponding to the target user and the similarity between the target user and the similar users includes: extracting, from the similar service data corresponding to the similar users, the similar feature values corresponding to the preset target features; determining a similar feature mean from the similar feature values, and determining a normal feature mean from the normal feature values corresponding to the normal features in the target historical service data; determining a similarity difference for each target feature from the similar feature mean and the similar feature value corresponding to that target feature; determining a target difference for each target feature from the normal feature mean and the target feature value corresponding to that target feature; and determining the similarity between the target user and the similar user from the similarity difference and the target difference corresponding to each target feature.
Specifically, the server extracts, from the similar service data corresponding to similar user i, the similar feature value $r_{i,j}$ corresponding to each target feature j. All similar feature values of that user are then averaged to obtain the similar feature mean $\bar{r}_i$. Assume that the information of the similar users forms a matrix whose rows are the similar users i and whose columns are the target features j, where $r_{i,j}$ is the similar feature value of similar user i on target feature j. The similar feature mean corresponding to similar user i can then be calculated by the following formula (1), where $m$ is the number of target features.
$$\bar{r}_i = \frac{1}{m} \sum_{j=1}^{m} r_{i,j} \tag{1}$$
The server extracts, from the target historical service data of the target user a (which contains the missing feature), the normal feature value $r_{a,j}$ corresponding to each normal feature, and then calculates the normal feature mean $\bar{r}_a$ from the normal feature values. The similarity difference of each target feature, $r_{i,j} - \bar{r}_i$, is determined from the similar feature mean $\bar{r}_i$ and the similar feature value $r_{i,j}$ corresponding to that target feature. The target difference of each target feature, $r_{a,j} - \bar{r}_a$, is determined from the normal feature mean $\bar{r}_a$ and the target feature value $r_{a,j}$ corresponding to that target feature. Finally, the similarity between the target user and the similar user is determined from the similarity difference and the target difference corresponding to each target feature. Specifically, the similarity $w_{a,i}$ between the target user a and the similar user i is the correlation coefficient of the two sets of differences, as shown in formula (2), where the sums run over the target features j for which the target user a has a feature value.
$$w_{a,i} = \frac{\sum_{j} (r_{a,j} - \bar{r}_a)(r_{i,j} - \bar{r}_i)}{\sqrt{\sum_{j} (r_{a,j} - \bar{r}_a)^2}\,\sqrt{\sum_{j} (r_{i,j} - \bar{r}_i)^2}} \tag{2}$$
The magnitude of the calculated $w_{a,i}$ can be used to evaluate the degree of similarity between the target user and the similar user. Specifically, the larger the similarity value, the more similar the two users are, and the higher the weight given to that similar user when it votes on the missing feature value of the target user.
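A minimal sketch of the similarity calculation, assuming formula (2) takes the form of a Pearson correlation coefficient between the target differences and the similarity differences. Each user's mean is taken over that user's own known values, and the sums run only over features known for both users; the function name and data layout are hypothetical.

```python
import math
from typing import Dict, Optional


def pearson_similarity(target: Dict[str, Optional[float]],
                       similar: Dict[str, Optional[float]]) -> float:
    """Correlation of the two users' deviations from their own means.

    None marks a missing value. Returns 0.0 when the correlation is
    undefined (no shared features or zero variance).
    """
    known_t = [v for v in target.values() if v is not None]
    known_s = [v for v in similar.values() if v is not None]
    if not known_t or not known_s:
        return 0.0
    mean_t = sum(known_t) / len(known_t)   # normal feature mean
    mean_s = sum(known_s) / len(known_s)   # similar feature mean
    shared = [k for k, v in target.items()
              if v is not None and similar.get(k) is not None]
    num = sum((target[k] - mean_t) * (similar[k] - mean_s) for k in shared)
    den_t = math.sqrt(sum((target[k] - mean_t) ** 2 for k in shared))
    den_s = math.sqrt(sum((similar[k] - mean_s) ** 2 for k in shared))
    den = den_t * den_s
    return num / den if den else 0.0


# The target user is missing f4; the similar user has all four features.
target_user = {"f1": 1.0, "f2": 2.0, "f3": 3.0, "f4": None}
similar_user = {"f1": 2.0, "f2": 4.0, "f3": 6.0, "f4": 8.0}
w = pearson_similarity(target_user, similar_user)
```

Returning 0.0 for the undefined cases means such a similar user simply contributes nothing to a later weighted vote; this is a design choice of the sketch, not something the source specifies.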
In one embodiment, calculating the missing feature value corresponding to the missing feature according to the similarity and the similar feature values of the similar users includes: determining a similar adjustment value corresponding to each similar user according to the similar feature mean, the similar feature value and the similarity respectively corresponding to that similar user; and calculating the missing feature value corresponding to the missing feature according to the normal feature mean and the similar adjustment values respectively corresponding to the similar users.
Specifically, for the target user a with unknown information (a missing feature), the missing feature value $p_{a,j}$ estimated from the similar users may be calculated by weight voting. In one embodiment, the predicted value $p_{a,j}$ of the target feature j of the target user a is the normal feature mean of the normal features corresponding to the known information of the target user a, plus the sum, over the n similar users, of the difference between each similar user's feature value on target feature j and that similar user's similar feature mean, multiplied by the similarity of that similar user. The specific calculation process is shown in formula (3), where $\bar{r}_a$ is the normal feature mean of the target user a, $r_{i,j}$ and $\bar{r}_i$ are the similar feature value and the similar feature mean of similar user i, and $w_{a,i}$ is the similarity between the target user a and similar user i.
$$p_{a,j} = \bar{r}_a + \sum_{i=1}^{n} w_{a,i}\,(r_{i,j} - \bar{r}_i) \tag{3}$$
In this embodiment, the target historical service data of the target user is completed using the known similar service data of the similar users, which makes the target historical service data more accurate. The missing feature value of the target user is obtained by a vote weighted by the similarity between each similar user and the target user, which supplements the matrix information of the target user and turns a sparse matrix into a dense one, thereby improving the accuracy of the subsequent user value prediction by the value prediction model.
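A minimal sketch of the weighted vote follows. It adds the similarity-weighted deviations of the similar users to the target user's own mean, as the text describes. Dividing by the sum of the absolute similarities is an assumption borrowed from common collaborative-filtering practice; the text does not state whether formula (3) normalizes. The name `impute_missing` and the tuple layout are hypothetical.

```python
def impute_missing(target_mean, neighbours):
    """Estimate a missing feature value for the target user.

    neighbours: iterable of (similarity, feature_value, neighbour_mean),
    one tuple per similar user.
    """
    neighbours = list(neighbours)
    # Each similar user votes with its deviation from its own mean.
    weighted = sum(sim * (value - mean) for sim, value, mean in neighbours)
    # Assumption: normalise by total absolute similarity (not stated in the text).
    norm = sum(abs(sim) for sim, _, _ in neighbours)
    if norm == 0:
        return target_mean
    return target_mean + weighted / norm
```

With no usable similar users the sketch falls back to the target user's own mean, which keeps the feature matrix dense without inventing a deviation.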
In one embodiment, the method further comprises: preprocessing target historical service data; the preprocessing includes at least one of data verification, data cleansing, and data normalization.
In one embodiment, the data verification may include verifying the accuracy of the data in the target historical service data. The data cleansing may include identifying erroneous data in the user data and removing it. Specifically, because of data entry errors, differing representations across source systems, inconsistencies between data, and the like, existing data contains various kinds of dirty data, which mainly takes the form of illegal values, non-standard entries, inconsistent values, duplicated data, and the like. The data cleansing functions include removing unnecessary fields, cleaning format and content, filling missing values, correcting logic errors, verifying data authenticity, and the like.
First, the determined customer information data is extracted from Oracle to the Hive platform, the database tables and data to be cleaned are selected and extracted, and information authenticity is checked according to specific check rules, such as province-code verification of the identity card number and number-segment verification of the mobile phone number, so that only real and valid data is retained. In one embodiment, data cleansing includes:
1. The identity card number must be 15 or 18 digits long, must pass region code verification, must pass birth date verification, must pass check digit verification, and must not contain abnormal digit strings such as '0000'. When the identity card number fails any of these requirements, it is set to null.
2. If the identity card number or mobile phone number of the user is the same as that of the field personnel, the identity card number or mobile phone number is also set to null.
3. Only names consisting purely of Chinese characters, or purely of letters and spaces, are kept; names mixing Chinese and English are removed.
4. If the mobile phone number is not 11 digits long, fails verification against the given rules for irregular numbers, or contains irregular digit strings such as '000000', it is set to null.
5. Names that are 3 or more characters long and contain the characters 'equal to' are removed.
7. Where 3 different clients use the same identity card number and mobile phone number, the records are rejected.
8. Names containing "company" are rejected.
In one embodiment, the data cleansing rules include:
a. Cleansing rules (admission rules), including null value verification, identity card number verification, mobile phone number verification, and the like. When a field does not conform to the configured rule, it is given a new value according to a specified default value. When a field is empty or Null, it is replaced with the character string 'null'. Each original record finally yields a corresponding new record according to the data cleansing rules; if the data of the row is valid, it proceeds to the subsequent ID unification step, and otherwise it is filtered out.
b. Empty value check: judge whether the field value is empty, and if so, set the field to the default character string 'null'.
c. Null value check: judge whether the field value is Null, and if so, set the field to the default character string Null.
d. Identity card number check: judge whether the identity card number is legal, including whether the region code verification passes, whether the birth date verification passes, whether the last digit (the check digit) is correct, and whether the length is valid. Specifically: when the province code of the identity card number is incorrect, the identity card number is set to null; when the identity card number does not match the regular expression, it is set to null; when the check digit is incorrect, it is set to null; when the identity card number contains '0000', it is set to null.
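The identity card number checks in rule d can be sketched as follows. The check digit weights and character table follow the published scheme for 18-digit resident identity numbers (GB 11643-1999). The province code set, the assumption that 15-digit numbers carry a two-digit year in the 1900s, and the function name `clean_id_number` are illustrative choices rather than details given in the text.

```python
import re
from datetime import datetime

# Assumption: standard two-digit province-level prefixes; the text does not list them.
PROVINCE_CODES = {
    "11", "12", "13", "14", "15", "21", "22", "23",
    "31", "32", "33", "34", "35", "36", "37",
    "41", "42", "43", "44", "45", "46",
    "50", "51", "52", "53", "54", "61", "62", "63", "64", "65",
    "71", "81", "82",
}
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"


def clean_id_number(raw):
    """Return the ID number if it passes every rule, otherwise None (nulled)."""
    if raw is None:
        return None
    value = raw.strip().upper()
    # Length and character rule: 15 digits, or 17 digits plus a digit or X.
    if not re.fullmatch(r"\d{15}|\d{17}[\dX]", value):
        return None
    if "0000" in value:  # abnormal digit string named in the text
        return None
    if value[:2] not in PROVINCE_CODES:
        return None
    # Birth date: YYYYMMDD in 18-digit numbers, YYMMDD in 15-digit numbers.
    birth = value[6:14] if len(value) == 18 else "19" + value[6:12]
    try:
        datetime.strptime(birth, "%Y%m%d")
    except ValueError:
        return None
    if len(value) == 18:
        total = sum(int(digit) * weight for digit, weight in zip(value[:17], WEIGHTS))
        if CHECK_CHARS[total % 11] != value[17]:
            return None
    return value
```

Returning `None` mirrors the "set to null" outcome in the rules; a pipeline following rule a could then substitute the configured default string.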
Mobile phone number check: judge whether the mobile phone number is legal, for example by checking its length, whether it starts with 1, and whether it is an abnormal number such as 1111111111. Specifically, the mobile phone number is set to null when its length is not equal to 11, when it does not start with 1, or when it contains any of the following digit strings: '000000', '11111111', '22222222', '33333333', '44444444', '5555555555', '66666666', '77777777', '88888888', '99999999', '23456789', '12345678', '01234567', '34567890', '456789', '1380013800'.
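The mobile phone number rules translate almost directly into code. In the sketch below the blocked digit strings are copied from the list above; the function name `clean_mobile_number` and the extra requirement that the value consist only of ASCII digits are assumptions added for illustration.

```python
# Digit strings taken from the rule list in the text.
BLOCKED_SUBSTRINGS = (
    "000000", "11111111", "22222222", "33333333", "44444444",
    "5555555555", "66666666", "77777777", "88888888", "99999999",
    "23456789", "12345678", "01234567", "34567890", "456789",
    "1380013800",
)


def clean_mobile_number(raw):
    """Return the mobile number if it passes every rule, otherwise None (nulled)."""
    if raw is None:
        return None
    value = raw.strip()
    # Assumption: only ASCII digits are acceptable (not stated in the text).
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return None
    if not value.startswith("1"):
        return None
    if any(pattern in value for pattern in BLOCKED_SUBSTRINGS):
        return None
    return value
```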
Handling of information identical to agent information: when a record in the user basic information summary table does not belong to an agent but uses the agent's information, the corresponding information is set to null.
In one embodiment, the step of normalizing the data comprises ID normalization. Because a client may be reached through multiple entry points and channels, the same client may be assigned multiple IDs in different systems. Also, when a user transacts business several times, the user may be treated as two clients because the information provided differs. To analyze the value of a client, ID normalization is therefore needed to bring together the client's data from all systems and all time periods; this is referred to as client ID unification. Specifically, the data unification rule includes: obtaining the user basic information summary table in the server, generating a new user ID for each user as the unique identifier of the user, and storing the new user ID in the first field of the summary table, where one client ID may correspond to multiple records, but each record belongs to only one client ID.
ID unification: a field that uniquely identifies the user according to specified rules is added to each record, and the primary key of each record in the source table is kept in a separate field so that the source table can be traced back; the processed result data is finally stored in the Hive data warehouse. Specifically, the rules currently used to identify a user are as follows: the first two characters of the name plus the certificate number determine a user; name plus mobile phone number determine a user; certificate number plus mobile phone number determine a user; name plus bank card number determine a user; name plus WeChat ID determine a user; mobile phone number plus bank card number determine a user; mobile phone number plus WeChat ID determine a user; name plus device ID determine a user; mobile phone number plus device ID determine a user; and so on. These rules are not limiting.
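One way to apply these pairing rules is to treat every rule match as a link between two records and to merge linked records with a union-find structure. The sketch below does this under stated assumptions: the text does not say that matches are merged transitively, and the field names (`name`, `id_no`, `mobile`, `bank_card`, `wechat`, `device_id`), the `U000001` ID format, and the function `assign_user_ids` are all hypothetical.

```python
# Field pairs from the text; "name_prefix" is the first two characters of the name.
RULES = (
    ("name_prefix", "id_no"), ("name", "mobile"), ("id_no", "mobile"),
    ("name", "bank_card"), ("name", "wechat"), ("mobile", "bank_card"),
    ("mobile", "wechat"), ("name", "device_id"), ("mobile", "device_id"),
)


def assign_user_ids(records):
    """Return one unified user ID per record, in input order."""
    parent = list(range(len(records)))

    def find(index):
        # Path-halving lookup of the representative record.
        while parent[index] != index:
            parent[index] = parent[parent[index]]
            index = parent[index]
        return index

    first_seen = {}
    for index, record in enumerate(records):
        fields = dict(record)
        name = record.get("name")
        fields["name_prefix"] = name[:2] if name else None
        for left, right in RULES:
            left_value, right_value = fields.get(left), fields.get(right)
            if left_value and right_value:
                key = (left, right, left_value, right_value)
                if key in first_seen:
                    # Assumption: records matching under any rule are merged.
                    parent[find(index)] = find(first_seen[key])
                else:
                    first_seen[key] = index

    ids_by_root = {}
    result = []
    for index in range(len(records)):
        root = find(index)
        result.append(ids_by_root.setdefault(root, f"U{len(ids_by_root) + 1:06d}"))
    return result
```

Each record receives exactly one unified ID while one ID may cover several records, which matches the one-to-many relationship described for the summary table. The source-table primary key would be carried alongside in a real pipeline and is omitted here for brevity.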
As shown in fig. 3, a flow chart of a user value prediction method is provided, and the method includes:
step 302, model input data is obtained according to the value prediction model input data generation method provided in any of the above embodiments.
Step 304, inputting model input data into a pre-constructed value prediction model, and obtaining the predicted value of the target user in a future time period according to the value prediction model; the value prediction model is constructed according to the training characteristics and the training values, and the training characteristics and the training values are obtained from historical business data corresponding to more than one user respectively.
The value prediction model is a pre-trained model and can be used for predicting the value of the target user. Specifically, the server acquires historical service data corresponding to a plurality of users respectively, then extracts training features from the historical service data, and determines a training value according to the training features so as to train a model according to the training features and the training value to obtain a value prediction model. It is understood that the value prediction model may be a decision tree model, a random forest model, a regression model, or a machine learning model, and the like, and is not limited herein.
It should be noted that the training features and the target features may be the same features or different features, which is not limited herein. The training value of a user is calculated from all or some of the features extracted from the training features.
The user value prediction method obtains target historical service data corresponding to a target user, and extracts preset target characteristic values corresponding to a plurality of target characteristics from the target historical service data. And after the target characteristic values are obtained, whether the target characteristic values corresponding to the target characteristics have missing values or not is judged, so that the accuracy of the target historical service data is judged before the user value is predicted. And only when the target characteristic values corresponding to the target characteristics are judged to have no missing values, the accurate target characteristic values are input into a pre-constructed value prediction model, the prediction value of the target user in the future time period is obtained according to the value prediction model, and the accuracy of value prediction of the target user is improved. The value prediction model is constructed according to the training characteristics and the training values, and the training characteristics and the training values are obtained from historical business data corresponding to more than one user respectively. Before the value of the target user is predicted, the accuracy of the value of the target user obtained according to target characteristic prediction in the following process is guaranteed by verifying the target characteristic value extracted from the target historical service data.
In one embodiment, as shown in fig. 4, a flow chart of a method for constructing a value prediction model is provided, and the method includes:
step 402, obtaining historical service data corresponding to more than one user respectively.
The historical service data may be data corresponding to a historical period of time. It may include attribute information of the user, historical transaction data of the user, historical behavior information of the user, and the like. The user attribute information may be the name, gender, and geographic location of the user. The historical transaction data may be the transaction records generated by the user during transactions, such as the products purchased, the purchase frequency, and the purchase price. The behavior information may be the behavior of the user in a transaction scenario or in other non-transaction scenarios, such as whether a transaction of the user succeeded.
Specifically, the server crawls historical business data of a plurality of users from a business system. Furthermore, the server can also perform cleaning processing on the crawled historical service data, such as removing error data in the historical service data or data which does not conform to a standard format, so that the historical service data conforms to subsequent data processing requirements, and the accuracy of data processing is improved. And the server can also normalize the crawled historical service data so as to enable data calculation among data of different dimensions.
Step 404, extracting training features from each historical service data.
The training features are features extracted from the historical service data to characterize the user. The training features may be the same as the target features; for example, they may include the interval between product purchases, purchase frequency, number of consecutive years of purchase, premium, profit, property value, vehicle value, monthly income, premium-to-income ratio, and the like.
Step 406, extracting value calculation features from the training features.
Wherein the value calculation feature is a part or all of the features extracted from the training features for calculating the training value of the user. Specifically, the server extracts premium characteristics and profit characteristics from the training characteristics and then calculates a training value based on the premium characteristics and the profit characteristics.
Step 408, obtain feature weights for each value calculation feature.
The feature weight may be predetermined, and the feature weight is set in advance for the value calculation feature. In one embodiment, the server acquires historical business data of a plurality of users and value labels corresponding to the users, analyzes and mines the historical business data of the users to find out the value calculation features with the maximum relevance with the value labels of the users from the historical business data, determines the influence of the value calculation features on the value labels of the users, and determines the feature weight of the value calculation features according to the influence.
Step 410, determining the training value of each user according to the feature weight and the value calculation feature.
Specifically, the server multiplies the feature weight by the corresponding value calculation feature, and sums up to obtain the training value of each user.
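As a minimal illustration of this weighted sum, the sketch below computes a training value from a user's value calculation features. The feature names, the example weights, and the function name `training_value` are hypothetical.

```python
def training_value(features, weights):
    """Weighted sum of the value calculation features.

    Only features that have a weight take part; other training features
    are ignored.
    """
    return sum(weight * features[name] for name, weight in weights.items())


# Hypothetical example: normalised premium and profit features.
example = training_value(
    {"premium": 0.8, "profit": 0.5, "purchase_frequency": 3},
    {"premium": 0.6, "profit": 0.4},
)
```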
Step 412, training the prediction model according to the training features and the training value, and stopping the training when the training end condition is met, to obtain the value prediction model.
Specifically, the training characteristics and the training value are used as a training set and used for training a prediction model, and when the training precision of the obtained prediction model meets a preset precision condition or the iteration number of the training reaches a preset number, the training of the prediction model is stopped, so that the value prediction model is obtained.
In one embodiment, after the training features are obtained, extracting some or all of the features from the training features to obtain key features, and training the prediction model according to the key features extracted from the training features and the training value. Specifically, the step of extracting key features from the training features includes: extracting a feature vector corresponding to each training feature; and extracting key features from the training features according to the vector feature value of each feature vector. Specifically, the top-ranked key features can be obtained according to the scores of the vector feature values.
In an embodiment, the description will be given by taking an example that the server acquires historical service data corresponding to the user in 2015-2018 and predicts the value of the user in 2019 according to the historical service data.
Firstly, data normalization is carried out on the data of the premium characteristic value and the profit characteristic value according to the distribution situation of the premium characteristic value and the profit characteristic value data in the historical business data. And determining a premium weight for the premium characteristic and a profit weight for the profit characteristic based on the business experience. And then fitting the premium characteristic value and the premium weight and the profit characteristic value and the profit weight to construct a premium profit comprehensive consideration expression. And calculating a score according to the comprehensive consideration expression, and determining the user value according to the score.
Specifically, the users are comprehensively ranked and stratified according to the scores calculated by the comprehensive consideration expression. For example, users may be graded by score rank: the top 10% of users are assigned a first-level training value, users at 20%-50% a second-level training value, users at 50%-70% a third-level training value, users at 70%-90% a fourth-level training value, and users at 90%-100% a fifth-level training value.
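The rank-based grading can be sketched as follows. The example in the text leaves the 10%-20% band unassigned, so this sketch assumes the second level starts directly after the top 10%. The boundary tuple `TIER_BOUNDS` and the function name `assign_tiers` are illustrative.

```python
# Upper rank-fraction bounds of levels 1 to 4; level 5 is everything beyond.
# Assumption: level 2 covers 10%-50% (the text's example says 20%-50%).
TIER_BOUNDS = (0.10, 0.50, 0.70, 0.90)


def assign_tiers(scores):
    """Map each user to a level (1 = highest value) by score rank."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    total = len(ranked)
    tiers = {}
    for position, user in enumerate(ranked):
        fraction = (position + 1) / total  # share of users ranked at or above
        tiers[user] = 1 + sum(fraction > bound for bound in TIER_BOUNDS)
    return tiers
```

The resulting level per user can then serve as the class label when the prediction model is trained as a multi-class classifier.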
Then, training a prediction model according to training characteristics extracted from historical service data and a training value obtained by calculation according to a comprehensive consideration expression, carrying out model training, optimizing and iterating to obtain a stable algorithm model, and storing the obtained value prediction model. In one embodiment, a prediction model can be constructed by using a random forest algorithm, and the results of a plurality of decision trees are subjected to weighted aggregation, so that the algorithm is more stable, and the risk of overfitting is reduced.
Furthermore, when the prediction model is constructed with the random forest algorithm, parameter tuning mainly concerns three parameters.
The first is the maximum number of features, that is, the maximum number of features a single decision tree in the random forest is allowed to use. Increasing it generally improves the fitting performance of the model, since more options can be considered at each node. However, it also reduces the diversity of the individual trees and slows the algorithm down, so an optimal maximum number of features needs to be selected. Typically, for models with fewer than 200 features, the maximum number of features may be set between 35% and 75% of the total number of features, and is then adjusted according to how the model fits: decreased when the model overfits and increased when it underfits.
The second is the number of trees. The number of trees affects the model in two ways, fitting ability and computation speed: the larger the number, the better the fitting ability and the lower the computation speed. For low sample diversity (which depends on the number of features and label classes), typically no more than two hundred trees are used.
The third is the minimum number of samples per leaf node. This parameter controls the complexity of the model and helps ensure its robustness. In one embodiment of the application, for a scenario with few business classes and many training samples, the minimum number of samples per leaf node may be set to a large value (larger than 50).
Specifically, a grid search with cross-validation can be performed to select the optimal parameters and obtain the optimal model. Based on general business knowledge, a small range of candidate values is listed for each hyperparameter within the specified parameter range, and the Cartesian product (all combinations) of these value ranges forms the set of hyperparameter combinations. Each hyperparameter combination is then evaluated with cross-validation, and the model with the lowest error is taken as the optimal model.
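The grid search procedure can be sketched with the standard library alone. The sketch enumerates the Cartesian product of the candidate values and averages a score over k contiguous folds. The callback `fit_and_score`, which would train a random forest on the training indices and return a validation score (higher is better), is a hypothetical placeholder; the names `k_fold_indices` and `grid_search` are likewise illustrative. In practice a library routine such as scikit-learn's `GridSearchCV` covers the same ground.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_indices, valid_indices) for k contiguous folds."""
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        valid = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, valid
        start += size


def grid_search(param_grid, fit_and_score, n_samples, k=5):
    """Return (best_params, best_mean_score) over all parameter combinations."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    # Cartesian product of the candidate values of every hyperparameter.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(
            fit_and_score(params, train, valid)
            for train, valid in k_fold_indices(n_samples, k)
        )
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A grid built from the three parameters discussed above (maximum number of features, number of trees, minimum samples per leaf) stays small when each range lists only a few candidate values, which keeps the number of cross-validated fits manageable.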
As shown in FIG. 5, an overall schematic of model training and prediction is provided. The value prediction process shown in FIG. 5 includes two stages, namely training and prediction, and is described using the example of training the model on 2015-2018 historical business data to predict user value in 2019. The part starting from the upper left corner of FIG. 5 corresponds to the feature extraction stage. Specifically, the server obtains the 2015-2018 historical service data (which may include user policy data, user claim data, user report data, and the like) and performs data processing (data preprocessing, such as data verification, data cleansing, and data normalization) on it. Feature engineering is then performed on the processed data: for example, target feature values corresponding to a plurality of preset target features are extracted from the processed historical service data, and feature screening is then performed on the target features to obtain the screened features related to the value of the user.
Continuing to refer to fig. 5, starting from the lower left corner in fig. 5, the server obtains the historical service data corresponding to the year 2015-. Finally, a multi-class classification problem is set up with the screened features as inputs and the value categories (targets) of the users as learning labels; the model is trained to obtain a logistic regression model, a decision tree model, or the like, and the value categories of the customers (users) in 2019 are predicted with the trained model.
A user request processing method comprises the following steps: receiving a user request, wherein the user request carries user data; processing the user data by the user value prediction method illustrated in any one of the above embodiments to obtain a user value; acquiring a service strategy corresponding to the user value; and processing the user request according to the service strategy.
In a specific application scenario, users can be classified by value, and users of different values can be assigned different services and handled with different service policies. The service policy may be a service script policy, a service content policy, or the like.
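The request processing flow (receive the request, predict the user value, look up the matching service policy, process the request) can be sketched as below. The request layout, the policy names, the fallback policy, and the function `handle_request` are hypothetical; `predict_value` stands in for the user value prediction method.

```python
def handle_request(request, predict_value, policies, default_policy="standard"):
    """Choose a service policy for a request based on the predicted user value."""
    user_value = predict_value(request["user_data"])
    # Assumption: unknown value levels fall back to a default policy.
    policy = policies.get(user_value, default_policy)
    return {"user_value": user_value, "policy": policy}


# Hypothetical mapping from value level to service policy.
EXAMPLE_POLICIES = {1: "dedicated_agent", 5: "self_service"}
```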
In a specific application scenario, a business department wishes to model the value of users so that users can be divided by value and different service policies can be adopted for users of different values. This matches services to users of different levels, improves the matching degree between users and service policies, reduces unnecessary computer matching processes, and saves computer resources, while also improving user experience and service accuracy.
In a specific application scenario, user ID normalization is first performed on the obtained historical service data of the users, so that the historical service data of a user in different business systems is linked and given a unique identifier. A time interval is then selected and features are extracted by normalized user ID. The target features are converted into value labels according to business logic. A random forest model is constructed, and hyperparameter tuning is performed with the grid search with cross-validation (GridSearchCV) method to obtain the optimal model. The user value generated by the model is stored in the user data analysis application platform as a user label. Differentiated service or differentiated claim settlement is carried out based on the user labels: when a user makes contact, the customer service agent can query the value label of the user through the user data analysis platform and, with the label as a reference, use different scripts and service policies to provide personalized, targeted service.
It should be understood that although the steps in the flow charts of fig. 2-4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least some of the steps in fig. 2-4 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 6, there is provided a value prediction model input data generation apparatus 600 comprising:
An obtaining module 602, configured to obtain target historical service data corresponding to a target user.
An extracting module 604, configured to extract target feature values corresponding to a plurality of preset target features from the target historical service data.
A similarity obtaining module 606, configured to, when there is a target feature for which no target feature value is extracted, take that target feature as a missing feature, and obtain a similar user corresponding to the target user and the similarity between the target user and the similar user.
A similar data extracting module 608, configured to extract a similar feature value corresponding to the missing feature from similar service data corresponding to the similar user.
A calculating module 610, configured to calculate a missing feature value corresponding to the missing feature according to the similarity of the similar user and the similar feature value.
A generating module 612, configured to obtain value prediction model input data according to the calculated missing feature value and the extracted target feature values.
In an embodiment, the similarity obtaining module 606 is further configured to extract, from similar service data corresponding to similar users, a preset similar feature value corresponding to each target feature; determining a similar characteristic mean value according to each similar characteristic value; determining a normal characteristic mean value according to normal characteristic values corresponding to all normal characteristics in the target historical service data; determining a similarity difference value of each target feature according to the similarity feature mean value and the similarity feature value corresponding to each target feature; determining a target difference value of each target characteristic according to the target characteristic mean value and a target characteristic value corresponding to each target characteristic; and determining the similarity between the target user and the similar user according to the similarity difference corresponding to each target feature and the target difference.
In one embodiment, the calculation module 610 is further configured to: determining a similarity adjustment value corresponding to each similar user according to the similarity characteristic mean value, the similarity characteristic value and the similarity degree respectively corresponding to each similar user; and calculating to obtain a missing characteristic value corresponding to the missing characteristic according to the normal characteristic mean value and the similar adjustment value respectively corresponding to each similar user.
In one embodiment, the value prediction model input data generation apparatus further includes a preprocessing module 614, where the preprocessing module 614 is configured to preprocess the target historical business data; the preprocessing includes at least one of data verification, data cleansing, and data normalization.
In one embodiment, as shown in FIG. 7, there is provided a user value prediction apparatus 700, comprising:
an input data obtaining module 702 is configured to obtain model input data according to the value prediction model input data generation method in the foregoing embodiment.
The prediction module 704 is used for inputting the model input data into a pre-constructed value prediction model and obtaining the predicted value of the target user in the future time period according to the value prediction model; the value prediction model is constructed according to the training characteristics and the training values, and the training characteristics and the training values are obtained from historical business data corresponding to more than one user respectively.
In one embodiment, the user value prediction apparatus 700 further includes a model building module 706, where the model building module 706 is configured to obtain historical service data corresponding to more than one user; respectively extracting training characteristics from each historical service data; extracting value calculation features from the training features; acquiring the feature weight of each value calculation feature; determining the training value of each user according to the characteristic weight and the value calculation characteristic; and training the prediction model according to the training characteristics and the training value, and stopping the training of the prediction model when the training ending condition is met to obtain the value prediction model.
In one embodiment, as shown in FIG. 8, a user request processing apparatus 800 is provided, comprising:
A request receiving module 802, configured to receive a user request, where the user request carries user data.
A value calculating module 804, configured to process the user data by using the user value prediction method in the foregoing embodiments to obtain the user value.
A policy matching module 806, configured to obtain a service policy corresponding to the user value.
A processing module 808, configured to process the user request according to the service policy.
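The four modules above form a simple pipeline: receive the request, compute the user value, look up a service policy, and handle the request accordingly. The sketch below illustrates that flow only. The policy tiers, thresholds, and the names `UserRequest`, `POLICY_TIERS`, `match_policy`, and `process_request` are all hypothetical, and the value predictor is passed in as a plain function because the text does not fix its interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class UserRequest:
    """A user request carrying user data (hypothetical structure)."""
    user_id: str
    user_data: Dict[str, float]


# Hypothetical policy table: (minimum user value, policy name), highest tier first.
POLICY_TIERS: List[Tuple[float, str]] = [
    (0.8, "priority_agent"),
    (0.5, "standard_agent"),
    (0.0, "self_service"),
]


def match_policy(user_value: float) -> str:
    """Return the first policy whose threshold the user value reaches."""
    for threshold, policy in POLICY_TIERS:
        if user_value >= threshold:
            return policy
    return POLICY_TIERS[-1][1]  # fall back to the lowest tier


def process_request(
    request: UserRequest,
    predict_value: Callable[[Dict[str, float]], float],
) -> str:
    """Value calculation -> policy matching -> request processing."""
    user_value = predict_value(request.user_data)
    policy = match_policy(user_value)
    # Real processing would dispatch to a handler; here we only report the routing.
    return f"{request.user_id} -> {policy}"


# Usage with a stand-in predictor that simply reads one field.
result = process_request(UserRequest("u1", {"premium": 0.9}), lambda d: d["premium"])
```

Keeping the predictor as an injected function means the routing logic can be tested without a trained model.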
For the specific limitations of the above apparatuses, reference may be made to the limitations of the corresponding methods above, which are not repeated here. The modules in the above apparatuses may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the business data used for predicting user value. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a value prediction model input data generation method.
Those skilled in the art will appreciate that the architecture shown in FIG. 9 is merely a block diagram of part of the structure related to the disclosed solution and does not limit the computer devices to which the disclosed solution applies; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program: acquiring target historical service data corresponding to a target user; extracting, from the target historical service data, target feature values corresponding to a plurality of preset target features; when there are target features for which no target feature value was extracted, taking those target features as missing features, and obtaining similar users corresponding to the target user and the similarity between the target user and each similar user; extracting similar feature values corresponding to the missing features from the similar service data corresponding to the similar users; calculating the missing feature value corresponding to each missing feature according to the similarity and the similar feature values of the similar users; and obtaining value prediction model input data according to the calculated missing feature values and the extracted target feature values.
In one embodiment, when the processor executes the computer program, the step of obtaining a similar user corresponding to the target user and the similarity between the target user and the similar user further comprises: extracting, from the similar service data corresponding to the similar user, similar feature values corresponding to the preset target features; determining a similar feature mean from the similar feature values; determining a normal feature mean from the normal feature values corresponding to the normal features in the target historical service data; determining a similar difference for each target feature from the similar feature mean and the similar feature value corresponding to that target feature; determining a target difference for each normal feature from the normal feature mean and the normal feature value corresponding to that normal feature; and determining the similarity between the target user and the similar user from the similar differences and the target differences.
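The similarity steps above (subtract each user's own mean, then compare the resulting differences) resemble a mean-centred correlation. The sketch below is one possible reading, not the formula of the embodiment, which this passage does not state: it assumes a Pearson-style ratio, computes the similar feature mean over all of the similar user's features, computes the normal feature mean over the target user's extracted features only, and compares differences on the features the target user actually has. The function name `similarity` and the feature names are hypothetical.

```python
from math import sqrt
from typing import Dict, Iterable, Optional


def mean(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values)


def similarity(target: Dict[str, Optional[float]], similar: Dict[str, float]) -> float:
    """Pearson-style similarity over the target user's normal features.

    `target` maps each target feature to its value, or None when missing.
    `similar` maps every target feature to the similar user's value.
    """
    # Normal features: target features whose values were extracted.
    normal = {k: v for k, v in target.items() if v is not None}
    normal_mean = mean(normal.values())      # normal feature mean
    similar_mean = mean(similar.values())    # similar feature mean (all features)

    # Differences from each user's own mean.
    target_diff = {k: v - normal_mean for k, v in normal.items()}
    similar_diff = {k: similar[k] - similar_mean for k in normal}

    numerator = sum(target_diff[k] * similar_diff[k] for k in normal)
    denominator = sqrt(sum(d * d for d in target_diff.values())) * sqrt(
        sum(d * d for d in similar_diff.values())
    )
    # Assumption: treat a degenerate (constant) profile as "no similarity".
    return numerator / denominator if denominator else 0.0


target_user = {"premium": 1.0, "claims": 2.0, "visits": 3.0, "renewals": None}
neighbour = {"premium": 2.0, "claims": 4.0, "visits": 6.0, "renewals": 4.0}
score = similarity(target_user, neighbour)
```

Under this reading the score lies in [-1, 1], so it can be used directly as a signed weight when combining neighbours.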
In one embodiment, when the processor executes the computer program, the step of calculating the missing feature value corresponding to the missing feature according to the similarity and the similar feature values of the similar users further comprises: determining a similar adjustment value for each similar user from the similar feature mean, the similar feature value, and the similarity corresponding to that similar user; and calculating the missing feature value corresponding to the missing feature from the normal feature mean and the similar adjustment values of the similar users.
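One way to read the calculation above is as a neighbourhood-style estimate: start from the target user's normal feature mean and add a similarity-weighted average of how far each similar user's value sits from that user's own mean. The sketch below assumes exactly that form, with each similar adjustment value taken as similarity times (similar feature value minus similar feature mean) and the sum normalised by the total absolute similarity. The normalisation and the names `impute_missing` and `build_model_input` are assumptions for illustration, not details given in the text.

```python
from typing import Dict, List, Optional, Tuple

# (similarity, similar feature value for the missing feature, similar feature mean)
Neighbour = Tuple[float, float, float]


def impute_missing(normal_mean: float, neighbours: List[Neighbour]) -> float:
    """Estimate a missing feature value from similar users (assumed formula)."""
    # Similar adjustment value per similar user.
    adjustments = [sim * (value - sim_mean) for sim, value, sim_mean in neighbours]
    total_weight = sum(abs(sim) for sim, _, _ in neighbours)
    if total_weight == 0:
        return normal_mean  # no usable neighbours: fall back to the user's own mean
    return normal_mean + sum(adjustments) / total_weight


def build_model_input(
    target: Dict[str, Optional[float]],
    neighbours_by_feature: Dict[str, List[Neighbour]],
) -> Dict[str, float]:
    """Combine extracted target feature values with imputed missing values."""
    normal_values = [v for v in target.values() if v is not None]
    normal_mean = sum(normal_values) / len(normal_values)
    return {
        name: value
        if value is not None
        else impute_missing(normal_mean, neighbours_by_feature.get(name, []))
        for name, value in target.items()
    }


target_user = {"premium": 1.0, "claims": 2.0, "visits": 3.0, "renewals": None}
neighbours = {"renewals": [(1.0, 4.0, 4.0), (0.5, 5.0, 3.0)]}
model_input = build_model_input(target_user, neighbours)
```

With the sample data the normal feature mean is 2.0, the adjustments are 0.0 and 1.0, and the total weight is 1.5, so the imputed value is 2.0 + 1.0 / 1.5.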
In one embodiment, the processor, when executing the computer program, further performs the steps of: preprocessing target historical service data; the preprocessing includes at least one of data verification, data cleansing, and data normalization.
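The three preprocessing options named above can be illustrated with a small sketch. The text does not define what each step does, so the rules here are assumptions: verification keeps only records that have the required fields with a numeric amount, cleansing drops negative amounts and exact repeats, and normalization rescales amounts to [0, 1] by min-max scaling. The record schema and the names `REQUIRED_FIELDS`, `verify`, `cleanse`, `normalize`, and `preprocess` are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]

REQUIRED_FIELDS = ("user_id", "amount")  # hypothetical schema


def verify(records: List[Record]) -> List[Record]:
    """Data verification: keep records with all required fields and a numeric amount."""
    return [
        r
        for r in records
        if all(f in r for f in REQUIRED_FIELDS)
        and isinstance(r["amount"], (int, float))
        and not isinstance(r["amount"], bool)
    ]


def cleanse(records: List[Record]) -> List[Record]:
    """Data cleansing (assumed rules): drop negative amounts and exact repeats."""
    seen = set()
    cleaned = []
    for r in records:
        key = (r["user_id"], r["amount"])
        if r["amount"] < 0 or key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned


def normalize(records: List[Record]) -> List[Record]:
    """Data normalization: min-max scale amounts to [0, 1]."""
    amounts = [r["amount"] for r in records]
    if not amounts:
        return []
    low, span = min(amounts), max(amounts) - min(amounts)
    return [
        {**r, "amount": (r["amount"] - low) / span if span else 0.0}
        for r in records
    ]


def preprocess(records: List[Record]) -> List[Record]:
    return normalize(cleanse(verify(records)))


raw = [
    {"user_id": "u1", "amount": 100},
    {"user_id": "u1", "amount": 100},    # exact repeat
    {"user_id": "u2", "amount": -5},     # invalid amount
    {"user_id": "u3"},                   # missing field
    {"user_id": "u4", "amount": 300},
    {"user_id": "u5", "amount": "n/a"},  # non-numeric amount
]
clean = preprocess(raw)
```

Since the embodiment requires only at least one of the three steps, each function is usable on its own; `preprocess` simply chains all three.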
In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program: according to the value prediction model input data generation method in the embodiment, model input data are obtained; inputting the model input data into a pre-constructed value prediction model, and obtaining the predicted value of the target user in a future time period according to the value prediction model; the value prediction model is constructed according to the training characteristics and the training values, and the training characteristics and the training values are obtained from historical business data corresponding to more than one user respectively.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring historical service data corresponding to more than one user respectively; respectively extracting training characteristics from each historical service data; extracting value calculation features from the training features; acquiring the feature weight of each value calculation feature; determining the training value of each user according to the characteristic weight and the value calculation characteristic; and training the prediction model according to the training characteristics and the training value, and stopping the training of the prediction model when the training ending condition is met to obtain the value prediction model.
In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program: receiving a user request, wherein the user request carries user data; processing the user data by the user value prediction method in the embodiment to obtain the user value; acquiring a service strategy corresponding to the user value; and processing the user request according to the service strategy.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, performs the following steps: acquiring target historical service data corresponding to a target user; extracting, from the target historical service data, target feature values corresponding to a plurality of preset target features; when there are target features for which no target feature value was extracted, taking those target features as missing features, and obtaining similar users corresponding to the target user and the similarity between the target user and each similar user; extracting similar feature values corresponding to the missing features from the similar service data corresponding to the similar users; calculating the missing feature value corresponding to each missing feature according to the similarity and the similar feature values of the similar users; and obtaining value prediction model input data according to the calculated missing feature values and the extracted target feature values.
In one embodiment, when the computer program is executed by the processor, the step of obtaining a similar user corresponding to the target user and the similarity between the target user and the similar user further comprises: extracting, from the similar service data corresponding to the similar user, similar feature values corresponding to the preset target features; determining a similar feature mean from the similar feature values; determining a normal feature mean from the normal feature values corresponding to the normal features in the target historical service data; determining a similar difference for each target feature from the similar feature mean and the similar feature value corresponding to that target feature; determining a target difference for each normal feature from the normal feature mean and the normal feature value corresponding to that normal feature; and determining the similarity between the target user and the similar user from the similar differences and the target differences.
In one embodiment, when the computer program is executed by the processor, the step of calculating the missing feature value corresponding to the missing feature according to the similarity and the similar feature values of the similar users further comprises: determining a similar adjustment value for each similar user from the similar feature mean, the similar feature value, and the similarity corresponding to that similar user; and calculating the missing feature value corresponding to the missing feature from the normal feature mean and the similar adjustment values of the similar users.
In one embodiment, the computer program when executed by the processor further performs the steps of: preprocessing target historical service data; the preprocessing includes at least one of data verification, data cleansing, and data normalization.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program that when executed by the processor performs the steps of: according to the value prediction model input data generation method in the embodiment, model input data are obtained; inputting the model input data into a pre-constructed value prediction model, and obtaining the predicted value of the target user in a future time period according to the value prediction model; the value prediction model is constructed according to the training characteristics and the training values, and the training characteristics and the training values are obtained from historical business data corresponding to more than one user respectively.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring historical service data corresponding to more than one user respectively; respectively extracting training characteristics from each historical service data; extracting value calculation features from the training features; acquiring the feature weight of each value calculation feature; determining the training value of each user according to the characteristic weight and the value calculation characteristic; and training the prediction model according to the training characteristics and the training value, and stopping the training of the prediction model when the training ending condition is met to obtain the value prediction model.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program that when executed by the processor performs the steps of: receiving a user request, wherein the user request carries user data; processing the user data by the user value prediction method in the embodiment to obtain the user value; acquiring a service strategy corresponding to the user value; and processing the user request according to the service strategy.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them should be considered within the scope of this specification as long as it involves no contradiction.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. A method of generating value prediction model input data, the method comprising:
acquiring target historical service data corresponding to a target user;
extracting preset target characteristic values corresponding to a plurality of target characteristics from the target historical service data;
when target features with target feature values not extracted exist, the target features with the target feature values not extracted are used as missing features, and similar users corresponding to the target users and the similarity between the target users and the similar users are obtained; the similar user and the target user have similar transaction behaviors;
extracting similar characteristic values corresponding to the missing characteristics from similar service data corresponding to the similar users, including: extracting preset similar characteristic values corresponding to the target characteristics from the similar service data corresponding to the similar users; determining a similar characteristic mean value according to each similar characteristic value;
calculating to obtain a missing feature value corresponding to the missing feature according to the similarity of the similar users and the similar feature value, including: determining a normal characteristic mean value according to normal characteristic values corresponding to all normal characteristics in the target historical service data; determining the weight of the corresponding similar user according to the similarity of the similar users; determining a similar adjustment value corresponding to each similar user according to the similar feature mean value, the similar feature value and the weight corresponding to each similar user; calculating to obtain a missing feature value corresponding to the missing feature according to the normal feature mean value and the similar adjustment value corresponding to each similar user;
and obtaining value prediction model input data according to the calculated missing characteristic value and the extracted target characteristic value.
2. The value prediction model input data generation method according to claim 1, characterized in that a target feature for which a target feature value is extracted from the target historical service data is taken as a normal feature; the obtaining of the similar user corresponding to the target user and the similarity between the target user and the similar user includes:
determining a similar difference value of each target feature according to the similar feature mean value and the similar feature value corresponding to each target feature;
determining a target difference value of each normal feature according to the normal feature mean value and the normal feature value corresponding to each normal feature;
and determining the similarity between the target user and the similar user according to the similar difference value corresponding to each target feature and the target difference value.
3. The value prediction model input data generation method of claim 1 or 2, further comprising:
preprocessing the target historical service data; the preprocessing comprises at least one of data verification, data cleaning and data normalization.
4. A method for predicting user value, the method comprising:
obtaining model input data according to the value prediction model input data generation method of any one of claims 1 to 3;
inputting the model input data into a pre-constructed value prediction model, and obtaining the predicted value of the target user in a future time period according to the value prediction model; the value prediction model is constructed according to training features and training values, and the training features and the training values are obtained from historical business data corresponding to more than one user respectively.
5. The user value prediction method according to claim 4, wherein the value prediction model is constructed by a method comprising:
acquiring historical service data corresponding to more than one user respectively;
respectively extracting training characteristics from each historical service data;
extracting value calculation features from the training features;
obtaining a feature weight of each value calculation feature;
determining the training value of each user according to the feature weight and the value calculation feature;
and training a prediction model according to the training characteristics and the training value, and stopping training the prediction model when a training ending condition is met to obtain the value prediction model.
6. A method for processing a user request, the method comprising:
receiving a user request, wherein the user request carries user data;
processing the user data by the user value prediction method of claim 4 or 5 to obtain a user value;
acquiring a service strategy corresponding to the user value;
and processing the user request according to the service strategy.
7. A value prediction model input data generation apparatus, the apparatus comprising:
the acquisition module is used for acquiring target historical service data corresponding to a target user;
the extraction module is used for extracting preset target characteristic values corresponding to a plurality of target characteristics from the target historical service data;
the similarity acquisition module is used for taking the target features of which the target feature values are not extracted as missing features when the target features of which the target feature values are not extracted exist, and acquiring similar users corresponding to the target users and the similarity between the target users and the similar users; the similar user and the target user have similar transaction behaviors;
a similar data extracting module, configured to extract a similar feature value corresponding to the missing feature from similar service data corresponding to the similar user, where the similar data extracting module is configured to: extracting preset similar characteristic values corresponding to the target characteristics from the similar service data corresponding to the similar users; determining a similar characteristic mean value according to each similar characteristic value;
a calculating module, configured to calculate, according to the similarity of the similar users and the similar feature value, a missing feature value corresponding to the missing feature, where the calculating module is configured to: determine a normal feature mean value according to normal feature values corresponding to all normal features in the target historical service data; determine the weight of the corresponding similar user according to the similarity of the similar users; determine a similar adjustment value corresponding to each similar user according to the similar feature mean value, the similar feature value and the weight corresponding to each similar user; and calculate a missing feature value corresponding to the missing feature according to the normal feature mean value and the similar adjustment value corresponding to each similar user;
and the generating module is used for obtaining value prediction model input data according to the calculated missing characteristic value and the extracted target characteristic value.
8. The apparatus of claim 7, wherein the similarity obtaining module is further configured to determine a similarity difference value of each target feature according to the similar feature mean and the similar feature value corresponding to each target feature; determining a target difference value of each normal feature according to the normal feature mean value and the normal feature value corresponding to each normal feature; and determining the similarity between the target user and the similar user according to the similarity difference corresponding to each target feature and the target difference.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN202110531498.4A 2021-05-17 2021-05-17 Value prediction model input data generation method, device, equipment and medium Active CN112990989B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110531498.4A CN112990989B (en) 2021-05-17 2021-05-17 Value prediction model input data generation method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN112990989A CN112990989A (en) 2021-06-18
CN112990989B true CN112990989B (en) 2021-07-30

Family

ID=76336636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110531498.4A Active CN112990989B (en) 2021-05-17 2021-05-17 Value prediction model input data generation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN112990989B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657945A (en) * 2021-08-27 2021-11-16 建信基金管理有限责任公司 User value prediction method, device, electronic equipment and computer storage medium
CN115455708B (en) * 2022-09-19 2023-12-19 贵州航天云网科技有限公司 Multi-model local modeling method based on vector discrimination

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694830A (en) * 2020-06-12 2020-09-22 复旦大学 Missing data completion method based on deep ensemble learning
CN112241916A (en) * 2020-10-22 2021-01-19 北京大学 Personal credit risk default early warning method, device, equipment and storage medium
CN112269937A (en) * 2020-11-16 2021-01-26 加和(北京)信息科技有限公司 Method, system and device for calculating user similarity

Also Published As

Publication number Publication date
CN112990989A (en) 2021-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant