CN113034179A - User classification method, related device and equipment - Google Patents

Publication number
CN113034179A
Authority
CN
China
Prior art keywords
user
data
consumption data
historical
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110276092.6A
Other languages
Chinese (zh)
Inventor
陈友洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN202110276092.6A priority Critical patent/CN113034179A/en
Publication of CN113034179A publication Critical patent/CN113034179A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification

Abstract

The application discloses a user classification method and a related apparatus and device. The user classification method comprises: acquiring multiple types of historical consumption data of each user within a first preset time period; predicting, based on the multiple types of historical consumption data of each user, corresponding predicted consumption data within a second preset time period; removing abnormal data from the historical consumption data and the predicted consumption data of each user; and clustering the historical consumption data and the predicted consumption data from which the abnormal data have been removed, to obtain the user type of each user. With this scheme, users can be classified accurately, so that an operation strategy suited to each type of user can be adopted and operation efficiency is improved.

Description

User classification method, related device and equipment
Technical Field
The present application relates to the field of user classification technologies, and in particular, to a user classification method, and a related apparatus and device.
Background
With the rapid development of internet technology, more and more businesses are shifting from product-centric business models to customer-centric ones. How to win and retain customers and maximize customer value has become particularly important.
With the rapid development of information technology and the advent of the big data era, enterprises can use data analysis techniques to segment customers on the basis of massive data, thereby improving decision quality and carrying out refined operations in which different operation strategies are adopted for users of different value.
At present, users are grouped directly by value according to fixed intervals. This approach is coarse and can classify users inaccurately, which easily leads to decision errors.
Disclosure of Invention
The application provides a user classification method, a related device and equipment, and solves the problem of inaccurate user classification in the prior art.
The application provides a user classification method, which comprises: acquiring multiple types of historical consumption data of each user within a first preset time period; predicting, based on the multiple types of historical consumption data of each user, corresponding predicted consumption data within a second preset time period; removing abnormal data from the historical consumption data and the predicted consumption data of each user; and clustering the historical consumption data and the predicted consumption data from which the abnormal data have been removed, to obtain the user type of each user.
The step of clustering the historical consumption data and the predicted consumption data from which the abnormal data have been removed to obtain the user type of each user comprises: aggregating each type of historical consumption data and predicted consumption data of each user to obtain each type of aggregated data of each user; grading each type of aggregated data of each user; and clustering the grading results of each type of aggregated data of each user to obtain the user type of each user.
The step of clustering the grading results of each type of aggregated data of each user to obtain the user type of each user comprises: determining the number of user types by the elbow method based on the grading results of the aggregated data; and clustering the grading results of each type of aggregated data of each user by the k-means clustering method based on the number of user types, to obtain the user type of each user.
The step of removing abnormal data from the historical consumption data and the predicted consumption data of each user comprises: plotting the historical consumption data and the predicted consumption data to obtain box plots of the historical consumption data and the predicted consumption data; and determining data that fall outside the normal range of a box plot to be abnormal data.
The step of removing abnormal data from the historical consumption data and the predicted consumption data further comprises: in response to the difference between an item of abnormal data of a given type and the threshold of the normal range of that type being not greater than a preset value, adjusting the abnormal data to a value within the normal range; otherwise, removing the abnormal data.
The historical consumption data comprise a historical consumption frequency and a historical consumption amount, and the step of predicting the corresponding predicted consumption data within the second preset time period based on the multiple types of historical consumption data of each user comprises: predicting the consumption frequency of each user within the second preset time period with a first regression model, based on the historical consumption frequency of each user within the first preset time period; and predicting the consumption amount of each user within the second preset time period with a second regression model, based on the historical consumption amount of each user within the first preset time period.
The historical consumption data further comprise the interval between each user's most recent consumption and the current time.
The present application further provides a user classification apparatus, including: the acquisition module is used for acquiring various types of historical consumption data of each user in a first preset time period; the prediction module is used for predicting corresponding predicted consumption data in a second preset time period based on various types of historical consumption data of various users; the removing module is used for removing various types of historical consumption data of various users and abnormal data in the predicted consumption data; and the clustering module is used for clustering the historical consumption data and the predicted consumption data after the abnormal data are removed to obtain the user type of each user.
The present application further provides an electronic device, which includes a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement any of the above-described user classification methods.
The present application also provides a computer readable storage medium having stored thereon program instructions that, when executed by a processor, implement any of the above-described user classification methods.
According to the scheme, predicted consumption data are obtained from the historical consumption data, which extends the time span and the amount of consumption data available for reference, increases the reference value of the consumption data, and improves the accuracy of user classification. Further, abnormal data in the historical consumption data and the predicted consumption data are removed, and clustering is performed on the data that remain, which further increases the reference value of the consumption data and the reliability of user classification. Finally, the users are classified by clustering to obtain the user type of each user, which determines the classification result for each user and makes the classification more useful as a reference.
Drawings
FIG. 1 is a flow chart illustrating an embodiment of a method for classifying users of the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a user classification method according to the present application;
FIG. 3 is a schematic diagram of an embodiment of a boxplot of historical consumption data and predicted consumption data for each user in the embodiment of FIG. 2;
FIG. 4 is a block diagram of an embodiment of a user classification apparatus of the present application;
FIG. 5 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 6 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it. Further, "a plurality" herein means two or more.
Referring to FIG. 1, FIG. 1 is a flowchart illustrating a user classification method according to an embodiment of the present application.
Step S11: acquiring multiple types of historical consumption data of each user within a first preset time period.
The first preset time period is a historical time period, preferably the period closest to the current time. Specifically, the first preset time period may be the most recent month before the current time, the most recent week before the current time, and the like. The specific duration may be set based on the practical application and is not limited herein.
In consideration of the timeliness of the data, this embodiment selects the historical consumption data within the period closest to the current time for user classification, so that the historical consumption data reflect the user's current consumption state reasonably accurately, the interference of stale data is reduced, and the accuracy of user classification is improved.
The multiple types of historical consumption data may be data related to the consumption behavior of each user, such as the interval between the user's most recent consumption and the current time, the average amount per purchase, the total consumption amount within the first preset time period, the consumption frequency within the first preset time period, the time-of-consumption characteristics, and the like. The specific types may be set based on actual requirements and are not limited herein.
Step S12: predicting, based on the multiple types of historical consumption data of each user, corresponding predicted consumption data within a second preset time period.
After the multiple types of historical consumption data of each user are obtained, the predicted consumption data corresponding to each type of historical consumption data within the second preset time period are predicted from the historical consumption data of that type. The second preset time period is a future time period, preferably a period immediately following the current time. Specifically, the second preset time period may be the month following the current time, the week following the current time, and the like. The specific duration may be set based on the actual application and is not limited herein. In a specific application scenario, the duration of the second preset time period may be equal to the duration of the first preset time period, so as to facilitate data aggregation.
The present embodiment uses the historical consumption data and the predicted consumption data as the consumption data of each user to classify each user.
Step S13: removing abnormal data from the multiple types of historical consumption data and predicted consumption data of each user.
The historical consumption data and predicted consumption data of each user may contain consumption data that have no long-term reference value. For example, a user may on one occasion spend an amount far larger than the user's everyday consumption and never do so again in the following period; such consumption data have low reference value. Removing abnormal data from the multiple types of historical consumption data and predicted consumption data of each user therefore safeguards the reference value of the consumption data.
In a specific application scenario, the anomaly data may be data that differs from other data by more than an anomaly threshold. In another specific application scenario, the abnormal data may also be data with a value exceeding a normal range. The specific method for determining the abnormal data is not limited herein. The specific threshold of the abnormal threshold and the upper and lower limits of the normal range may be set based on actual requirements, and are not limited herein.
In a specific application scenario, abnormal data in the historical consumption data can be removed, and then corresponding predicted consumption data can be predicted based on the historical consumption data with the abnormal data removed.
Step S14: clustering the historical consumption data and the predicted consumption data from which the abnormal data have been removed, to obtain the user type of each user.
In a specific application scenario, the historical consumption data and predicted consumption data from which the abnormal data have been removed are clustered to obtain a plurality of cluster categories, and the user type of each user is determined based on the cluster categories and their number, so that the users are classified. The specific user types may be set based on actual classification requirements and are not limited herein. In a specific application scenario, the clustering method may be the k-means clustering method, the HAC hierarchical clustering method, the maximum-minimum distance clustering algorithm, and the like, and is not limited herein.
In this way, the user classification method of this embodiment obtains multiple types of historical consumption data of each user within a first preset time period, predicts the corresponding predicted consumption data within a second preset time period based on the historical consumption data of each user, removes abnormal data from the historical consumption data and predicted consumption data of each user, and then clusters the remaining historical consumption data and predicted consumption data to obtain the user type of each user. Because predicted consumption data are obtained from the historical consumption data, the time span and the amount of consumption data available for reference are extended, the reference value of the consumption data is increased, and the accuracy of user classification is improved. Further, because abnormal data in the historical consumption data and the predicted consumption data are removed before clustering, the reference value of the consumption data and the reliability of user classification are further increased. Finally, the users are classified by clustering to obtain the user type of each user, which determines the classification result for each user and makes the classification more useful as a reference.
Referring to fig. 2, fig. 2 is a flowchart illustrating another embodiment of a user classification method according to the present application.
Step S21: the method comprises the steps of obtaining multiple types of historical consumption data of each user in a first preset time period, wherein the historical consumption data comprise historical consumption frequency, historical consumption amount and interval duration of each user from the current time of the latest consumption.
The method comprises the steps of obtaining multiple types of historical consumption data of each user in a first preset time period, wherein the multiple types of historical consumption data comprise historical consumption frequency, historical consumption amount and interval duration of each user from the current time of the latest consumption.
In a specific application scenario, the consumption frequency, the consumption amount and the interval duration between the last consumption and the current time of each user in the last month can be acquired, so that the users are classified based on the user data. The interval duration between the last consumption of the user and the current time can be used for describing the current consumption loyalty of the user, the consumption frequency of the user can be used for representing the consumption activity of the user, and the consumption amount of the user can be used for representing the consumption capacity of the user. Therefore, the consumption state of the user can be more comprehensively reflected by the 3 types of consumption data, and further, each user can be accurately classified according to the 3 types of consumption data.
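The three quantities described above can be derived from raw transaction records. The following is a minimal sketch, assuming each transaction is a `(user_id, date, amount)` tuple; the function name `consumption_features` and the field names are hypothetical and do not come from the application.

```python
from datetime import date

def consumption_features(transactions, period_start, now):
    """Compute frequency, amount and recency per user inside [period_start, now].

    transactions: iterable of (user_id, date, amount) tuples (assumed layout).
    """
    raw = {}
    for user_id, day, amount in transactions:
        if not (period_start <= day <= now):
            continue  # ignore records outside the first preset time period
        entry = raw.setdefault(user_id, {"frequency": 0, "amount": 0.0, "last": day})
        entry["frequency"] += 1
        entry["amount"] += amount
        entry["last"] = max(entry["last"], day)
    return {
        user_id: {
            "frequency": entry["frequency"],              # consumption activity
            "amount": entry["amount"],                    # consumption capacity
            "recency_days": (now - entry["last"]).days,   # days since last consumption
        }
        for user_id, entry in raw.items()
    }

transactions = [
    ("u1", date(2021, 3, 1), 100.0),
    ("u1", date(2021, 3, 10), 50.0),
    ("u2", date(2021, 3, 5), 20.0),
    ("u2", date(2021, 1, 5), 999.0),  # outside the window, ignored
]
features = consumption_features(transactions, date(2021, 2, 15), date(2021, 3, 15))
print(features["u1"])  # {'frequency': 2, 'amount': 150.0, 'recency_days': 5}
```

Users with no consumption in the window do not appear in the result; how such users are handled is left open here.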
Step S22: predicting the consumption frequency of each user in a second preset time period by utilizing a first regression model based on the historical consumption frequency of each user in a first preset time period; and predicting the consumption amount of each user in a second preset time period by using a second regression model based on the historical consumption amount of each user in the first preset time period.
After the historical consumption frequency, the historical consumption amount and the interval between each user's most recent consumption and the current time are obtained, the predicted consumption frequency of each user within the second preset time period is predicted by the first regression model from the historical consumption frequency of that user within the first preset time period, and the predicted consumption amount of each user within the second preset time period is predicted by the second regression model from the historical consumption amount of that user within the first preset time period. The predicted consumption data are thereby obtained; in this embodiment they comprise the predicted consumption frequency and the predicted consumption amount.
The second preset time period is a future time period different from the past first preset time period. The first regression model and the second regression model are prediction models obtained after training based on corresponding training data, specifically may be linear regression models, polynomial regression models, and the like, and may be set based on practical applications, which is not limited herein.
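As an illustration of the regression step, the sketch below fits a one-variable linear regression by ordinary least squares and uses it to predict a user's consumption frequency in the next period. The training pairs (frequency in one period, frequency in the following period) are invented for the example, and the helper names `fit_linear` and `predict` are hypothetical; the application does not specify the model's features or training data.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

# Assumed training data: frequency in an earlier period vs. the period after it.
previous_period = [2, 4, 6, 8]
following_period = [3, 5, 7, 9]

frequency_model = fit_linear(previous_period, following_period)
predicted_frequency = predict(frequency_model, 10)
print(predicted_frequency)  # 11.0
```

A second model for the consumption amount could be fitted in the same way on amount data; a polynomial regression would replace `fit_linear` with a higher-order fit.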
In other embodiments, the interval between the current time and the user's first future consumption can also be predicted by a corresponding regression model, based on the interval between the user's most recent consumption and the current time. In that application scenario, the predicted consumption data comprise the predicted consumption frequency, the predicted consumption amount, and the interval between the current time and the first future consumption.
Step S23: plotting the historical consumption data and the predicted consumption data to obtain box plots of the historical consumption data and the predicted consumption data, and determining data that fall outside the normal range of a box plot to be abnormal data.
Each type of historical consumption data and predicted consumption data of the users is plotted on a coordinate axis, yielding a box plot for each type of consumption data.
In a specific application scenario, a box plot of consumption amount can be drawn from the historical consumption amount and the predicted consumption amount of each user. In another specific application scenario, a box plot of consumption frequency can be drawn from the historical consumption frequency and the predicted consumption frequency of each user. In yet another specific application scenario, a box plot of interval duration may be drawn from the interval between each user's most recent consumption and the current time. Abnormal data are thereby determined separately for each type of consumption data.
Referring to FIG. 3, FIG. 3 is a schematic diagram of an embodiment of a box plot of the historical consumption data and predicted consumption data of the users in the embodiment of FIG. 2.
In this embodiment, the box plot is obtained by plotting the data points corresponding to the historical consumption frequency and the predicted consumption frequency of each user on a coordinate axis.
Outliers 11 in the box plot are then identified. Specifically, data points greater than the upper bound (Qu + 1.5 × IQR) or less than the lower bound (Ql - 1.5 × IQR) may be taken as outliers 11, that is, the consumption frequencies corresponding to the outliers 11 are determined to be abnormal data. Here IQR is the interquartile range, the difference between the upper quartile Qu and the lower quartile Ql of the box plot. Because the IQR spans the middle half of the data, it is reasonably robust and is not affected by abnormal data. Qu and Ql are obtained by sorting all data points in the box plot in ascending order and taking the values at the 75% and 25% positions, respectively.
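The box-plot rule above can be sketched in a few lines. The quartiles here use linear interpolation between neighbouring sorted values, which is one common convention and an assumption of this sketch; the text only says the values at the 25% and 75% positions are taken. The function names and sample data are illustrative.

```python
def quartile(sorted_values, fraction):
    """Value at the given fractional position, interpolating linearly (assumed convention)."""
    position = fraction * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight

def boxplot_bounds(values):
    """Return (Ql - 1.5 * IQR, Qu + 1.5 * IQR)."""
    ordered = sorted(values)
    ql = quartile(ordered, 0.25)
    qu = quartile(ordered, 0.75)
    iqr = qu - ql
    return ql - 1.5 * iqr, qu + 1.5 * iqr

def find_outliers(values):
    lower, upper = boxplot_bounds(values)
    return [v for v in values if v < lower or v > upper]

frequencies = [4, 5, 5, 6, 6, 7, 7, 8, 30]  # made-up consumption frequencies
print(boxplot_bounds(frequencies))  # (2.0, 10.0)
print(find_outliers(frequencies))   # [30]
```

Here Ql = 5 and Qu = 7, so IQR = 2 and the normal range is 2 to 10; the value 30 is the only outlier.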
The box plots of the consumption amount of each user and of the interval between the most recent consumption and the current time are drawn, and their outliers determined, in the same way as for the consumption frequency in the above embodiment; the details are not repeated here.
Step S24: in response to the difference between an item of abnormal data of a given type and the threshold of the normal range of that type being not greater than a preset value, adjusting the abnormal data to a value within the normal range; otherwise, removing the abnormal data.
After the abnormal data in the historical consumption data and predicted consumption data of each user have been determined from the box plots, each item of abnormal data is compared with the threshold of the normal range of its type. If the difference is not greater than a preset value, the abnormal data are adjusted to a value within the normal range; otherwise, the abnormal data are removed. The thresholds are the upper and lower limits of the normal range; for example, when the normal range is 0-10, the thresholds are 0 and 10. The preset value may be set based on the practical application and is not limited herein.
In a specific application scenario, the normal range may be the range of data in the box plot that is less than the upper bound (Qu + 1.5 × IQR) and greater than the lower bound (Ql - 1.5 × IQR), and the thresholds are the values of the upper and lower bounds.
In a specific application scenario, the normal range is 2-10, the thresholds are 2 and 10, and the preset value is 3. The abnormal data 16 are removed, because the difference between 16 and the threshold 10 of the normal range exceeds the preset value 3. The abnormal data 12 are adjusted by capping to 10, the value in the normal range closest to 12, because the difference between 12 and the threshold 10 does not exceed the preset value 3.
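The cap-or-remove rule can be written as a single function. This is a sketch under the assumption that values exactly on a threshold count as normal; the function name `treat_anomaly` is hypothetical, and `None` is used here to signal removal.

```python
def treat_anomaly(value, lower, upper, preset):
    """Cap a mild anomaly to the nearest threshold, or return None to remove it."""
    if lower <= value <= upper:
        return value  # already inside the normal range (boundary handling is an assumption)
    nearest_threshold = upper if value > upper else lower
    if abs(value - nearest_threshold) <= preset:
        return nearest_threshold  # capping: move to the closest value in the normal range
    return None  # too far from the normal range: remove

# Example from the text: normal range 2-10, preset value 3.
print(treat_anomaly(16, 2, 10, 3))  # None (removed)
print(treat_anomaly(12, 2, 10, 3))  # 10 (capped)
```

Capping mild anomalies instead of dropping them keeps more of each user's data, which matches the stated aim of preserving data integrity.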
Step S25: aggregating each type of historical consumption data and predicted consumption data of each user to obtain each type of aggregated data of each user, grading each type of aggregated data of each user, and clustering the grading results of each type of aggregated data of each user to obtain the user type of each user.
After the abnormal data have been removed, each type of historical consumption data and predicted consumption data of each user is aggregated to obtain each type of aggregated data of that user. The aggregation may be performed by adding the historical consumption data and the predicted consumption data of each user according to a certain weight. The weight may be set based on the actual application, for example 1:1 or 1:2, and is not limited herein.
In a specific application scenario, when the historical consumption frequency and the predicted consumption frequency are aggregated, the number of consumptions in the historical consumption frequency and the number of consumptions in the predicted consumption frequency are added according to the weight, the duration of the first preset time period and the duration of the second preset time period are added according to the weight, and the weighted number of consumptions is divided by the weighted duration to obtain the aggregated consumption frequency. Likewise, when the historical consumption amount and the predicted consumption amount are aggregated, the two amounts are added according to the weight, the durations of the first and second preset time periods are added according to the weight, and the weighted amount is divided by the weighted duration to obtain the aggregated consumption amount.
In a specific application scenario, the weight is 1:1, the first and second preset time periods are each one month, and a certain user has a historical consumption frequency of 10 times per month, a historical consumption amount of 1000 yuan, a predicted consumption frequency of 12 times per month, and a predicted consumption amount of 1200 yuan. After aggregation, the user's aggregated consumption frequency is (10 + 12) times / (1 + 1) months = 11 times per month, and the aggregated consumption amount is (1000 + 1200) yuan / (1 + 1) months = 1100 yuan per month.
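The weighted aggregation can be checked with a short sketch. It assumes that the same weights are applied to the quantities and to the period durations, which is how the worked example reads; the function name and parameter names are hypothetical.

```python
def aggregate(historical, predicted, hist_months, pred_months, w_hist=1.0, w_pred=1.0):
    """Weighted total divided by weighted duration, giving a per-month rate."""
    weighted_total = w_hist * historical + w_pred * predicted
    weighted_duration = w_hist * hist_months + w_pred * pred_months
    return weighted_total / weighted_duration

# Example from the text: weight 1:1, both periods one month long.
aggregated_frequency = aggregate(10, 12, 1, 1)    # consumptions per month
aggregated_amount = aggregate(1000, 1200, 1, 1)   # yuan per month
print(aggregated_frequency, aggregated_amount)    # 11.0 1100.0
```

With a 1:2 weight in favour of the prediction, `aggregate(10, 12, 1, 1, 1.0, 2.0)` would give (10 + 24) / (1 + 2), about 11.33 consumptions per month.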
After each type of aggregated data of each user is obtained, each type of aggregated data is graded as high-level or low-level aggregated data. In a specific application scenario, after the 3 types of aggregated data of each user are obtained, a threshold is set for each type, and each type of aggregated data is graded by comparing it with the threshold of that type.
In a specific application scenario, a certain user has an aggregated consumption amount of 1100 yuan, a consumption frequency of 5 times per month, and an interval of 10 days between the most recent consumption and the current time, while the consumption amount threshold is 1000 yuan, the consumption frequency threshold is 6 times per month, and the interval threshold is 5 days. The user's consumption amount is then at a high level, the consumption frequency is at a low level, and the interval between the most recent consumption and the current time is at a low level, since an interval longer than the threshold indicates a less recent consumer.
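The grading step reduces each user to three high/low labels. The sketch below reproduces the example above; it assumes strict comparisons (a value equal to its threshold is graded low) and that a shorter interval is the better grade, both of which are choices of this sketch. The dictionary keys are hypothetical names.

```python
def grade(aggregated, thresholds):
    """Grade each type of aggregated data as 'high' or 'low' against its threshold."""
    return {
        "amount": "high" if aggregated["amount"] > thresholds["amount"] else "low",
        "frequency": "high" if aggregated["frequency"] > thresholds["frequency"] else "low",
        # For the interval, shorter is better, so the comparison is reversed.
        "recency_days": "high" if aggregated["recency_days"] < thresholds["recency_days"] else "low",
    }

user = {"amount": 1100, "frequency": 5, "recency_days": 10}
thresholds = {"amount": 1000, "frequency": 6, "recency_days": 5}
grades = grade(user, thresholds)
print(grades)  # {'amount': 'high', 'frequency': 'low', 'recency_days': 'low'}
```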
After each type of aggregated data is graded, the grading results of each type of aggregated data of each user are clustered. Specifically, the number of user types is determined by the elbow method based on the grading results of the aggregated data, and the grading results of each type of aggregated data of each user are then clustered by the k-means clustering method based on that number of user types, so that the clustering result of each user is obtained.
In this embodiment, since there are 3 fixed types of aggregated data and each grading result is fixed as either high or low, the number of cluster centers can be determined to be 8 by the elbow method. The aggregated data is clustered into 8 categories by the k-means clustering method, and the user type of each user is determined based on the category in which the aggregated data of that user falls.
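The reason 8 centers emerge can be illustrated with a small pure-Python sketch. Three high/low grades give 2^3 = 8 distinct grade vectors, so the within-cluster sum of squares (the quantity inspected by the elbow method) falls to zero at k = 8 and stays flat afterwards. The encoding of grades as 0/1, the deterministic initialization, and all function names are assumptions of this sketch; a production system would more likely use a library implementation of k-means.

```python
from itertools import product


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (centers, within-cluster sum of squares)."""
    # Assumption: initialise deterministically with the first k distinct points.
    centers = []
    for p in points:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if empty.
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia


# Encode low = 0 and high = 1 for the three types of aggregated data.
grade_vectors = list(product((0, 1), repeat=3))   # 8 possible combinations
points = grade_vectors * 3                        # three users per combination

# Elbow curve: within-cluster sum of squares for each candidate k.
inertias = {k: kmeans(points, k)[1] for k in range(1, 10)}
for k, value in inertias.items():
    print(k, value)
```

On this toy data the curve reaches zero at k = 8, which is consistent with the 8 user types used in this embodiment.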
Refer to Table 1, which shows the correspondence between user types and grades in the embodiment of fig. 2.
TABLE 1
[Table 1 is available only as images in the source publication (Figure BDA0002976691800000101 and Figure BDA0002976691800000111).]
In this embodiment, based on the grade of each type of aggregated data of a user, users are divided into 8 user types: important value users, important recall users, important deep-cultivation users, important retention users, general maintenance users, potential users, churn-prone users, and new users. The user type to which a user belongs can therefore be determined from the grading results of the aggregated data of that user, and corresponding operation strategies can then be formulated manually for each user type, so that operation effectiveness and efficiency are improved.
In the above manner, the user classification method of this embodiment identifies abnormal data in the historical consumption data and the predicted consumption data by means of a box plot. In response to the difference between abnormal data of each type and the normal-range threshold of the corresponding type being not greater than a preset value, the abnormal data is adjusted to data within the normal range; otherwise, the abnormal data is removed. This reduces the influence of abnormal data on user classification, preserves the integrity of the data to a certain extent, and improves the reliability of the user classification result. This embodiment also aggregates the historical consumption data and the predicted consumption data of each user after the abnormal data is removed, grades each type of aggregated data of each user separately, and clusters the grading results of each type of aggregated data of each user to obtain the user type of each user. Grading each type of aggregated data separately distinguishes the consumption behavior of different users, so that the user type is determined and the accuracy of user classification is improved.
This embodiment divides users by value using a clustering method based on grading, which provides a reliable basis for refined operation decisions. The different user types obtained in this embodiment quantify the differences in behavioral features and basic attributes among users, so that user profiles can be characterized in depth.
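The adjust-or-remove handling of abnormal data can be sketched as follows. Several details here are assumptions rather than statements of this application: the normal range is taken as the usual box-plot whiskers (1.5 times the interquartile range beyond the quartiles), "adjusting to data within the normal range" is taken to mean clipping to the nearest bound, and the function names and sample values are hypothetical.

```python
from statistics import quantiles


def box_plot_bounds(values, whisker=1.5):
    """Lower and upper limits of the box-plot normal range (assumed 1.5 * IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def clean(values, preset):
    """Clip near-range outliers to the bound; drop outliers farther than `preset`."""
    lower, upper = box_plot_bounds(values)
    cleaned = []
    for v in values:
        if lower <= v <= upper:
            cleaned.append(v)            # normal data is kept unchanged
        elif v > upper and v - upper <= preset:
            cleaned.append(upper)        # slightly above range: adjust
        elif v < lower and lower - v <= preset:
            cleaned.append(lower)        # slightly below range: adjust
        # otherwise the abnormal value is removed
    return cleaned


# Hypothetical monthly consumption frequencies of 11 users.
data = [10, 11, 12, 13, 14, 15, 16, 17, 18, 27, 100]
print(box_plot_bounds(data))    # (5.0, 25.0)
print(clean(data, preset=5))    # 27 is adjusted to 25.0, 100 is removed
```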
Referring to fig. 4, fig. 4 is a schematic framework diagram of an embodiment of the user classification device of the present application. The user classification device 40 comprises an acquisition module 41, a prediction module 42, a culling module 43 and a clustering module 44. The acquisition module 41 is configured to acquire multiple types of historical consumption data of each user in a first preset time period; the prediction module 42 is configured to predict, based on each type of historical consumption data of each user, corresponding predicted consumption data in a second preset time period; the culling module 43 is configured to remove abnormal data from each type of historical consumption data and predicted consumption data of each user; and the clustering module 44 is configured to cluster the historical consumption data and the predicted consumption data after the abnormal data is removed, so as to obtain the user type of each user.
The clustering module 44 is further configured to aggregate each type of historical consumption data and predicted consumption data of each user, respectively, to obtain each type of aggregated data of each user; grade each type of aggregated data of each user separately; and cluster the grading results of each type of aggregated data of each user to obtain the user type of each user.
The clustering module 44 is further configured to determine the number of user types by the elbow method for the historical consumption data and the predicted consumption data after the abnormal data is removed; and cluster the grading results of each type of aggregated data of each user by the k-means clustering method based on the number of user types, to obtain the user type of each user.
The culling module 43 is further configured to plot the historical consumption data and the predicted consumption data to obtain box plots of the historical consumption data and the predicted consumption data; and determine data outside the normal range in the box plots as abnormal data.
The culling module 43 is further configured to adjust the abnormal data to data within the normal range in response to the difference between abnormal data of each type and the normal-range threshold of the corresponding type being not greater than a preset value; and otherwise remove the abnormal data.
The prediction module 42 is further configured to predict, by using a first regression model, the consumption frequency of each user in the second preset time period based on the historical consumption frequency of each user in the first preset time period; and predict, by using a second regression model, the consumption amount of each user in the second preset time period based on the historical consumption amount of each user in the first preset time period.
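The prediction step can be illustrated with a minimal sketch. The form of the first and second regression models is not specified here, so an ordinary least-squares trend line is used purely as a stand-in; the assumption that the first preset time period is split into equal sub-periods, the function names, and the sample values are all hypothetical.

```python
def fit_line(ys):
    """Least-squares line through (1, ys[0]), (2, ys[1]), ...; needs >= 2 points."""
    n = len(ys)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def predict_next(ys):
    """Extrapolate the fitted line one sub-period beyond the history."""
    slope, intercept = fit_line(ys)
    return slope * (len(ys) + 1) + intercept


# Stand-in for the first regression model: consumption frequency per sub-period.
predicted_frequency = predict_next([8, 9, 10])
# Stand-in for the second regression model: consumption amount per sub-period.
predicted_amount = predict_next([800, 900, 1000])
print(predicted_frequency, predicted_amount)
```

Two separate fits are used so that frequency and amount are predicted independently, mirroring the separation into a first and a second regression model.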
According to the scheme, the users can be accurately classified, so that corresponding operation strategies can be adopted for different types of users, and the operation efficiency is improved.
Referring to fig. 5, fig. 5 is a schematic framework diagram of an embodiment of the electronic device of the present application. The electronic device 50 comprises a memory 51 and a processor 52 coupled to each other, the processor 52 being configured to execute program instructions stored in the memory 51 to implement the steps of any of the embodiments of the user classification method described above. In a specific implementation scenario, the electronic device 50 may include, but is not limited to, a microcomputer or a server; the electronic device 50 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 52 is configured to control itself and the memory 51 to implement the steps of any of the embodiments of the user classification method described above. The processor 52 may also be referred to as a CPU (Central Processing Unit). The processor 52 may be an integrated circuit chip having signal processing capabilities. The processor 52 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 52 may be jointly implemented by multiple integrated circuit chips.
According to the scheme, the users can be accurately classified, so that corresponding operation strategies can be adopted for different types of users, and the operation efficiency is improved.
Referring to fig. 6, fig. 6 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application. The computer readable storage medium 60 stores program instructions 601 capable of being executed by a processor, the program instructions 601 for implementing the steps of any of the user classification method embodiments described above.
According to the scheme, the users can be accurately classified, so that corresponding operation strategies can be adopted for different types of users, and the operation efficiency is improved.
In the several embodiments provided in the present application, it should be understood that the disclosed method and device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules or units is only a logical functional division, and other divisions are possible in an actual implementation; for instance, units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical, or in another form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

Claims (10)

1. A user classification method, characterized by comprising the following steps:
acquiring various types of historical consumption data of each user in a first preset time period;
predicting corresponding predicted consumption data in a second preset time period based on the historical consumption data of each type of each user;
rejecting abnormal data in the historical consumption data and the predicted consumption data of each type of each user;
and clustering the historical consumption data and the predicted consumption data after the abnormal data are removed to obtain the user type of each user.
2. The method for classifying users according to claim 1, wherein the step of clustering the historical consumption data and the predicted consumption data after removing abnormal data to obtain the user type of each user comprises:
respectively aggregating each type of historical consumption data and predicted consumption data of each user to obtain each type of aggregated data of each user;
grading each type of aggregated data of each user separately;
and clustering the grading results of each type of aggregated data of each user to obtain the user type of each user.
3. The method according to claim 2, wherein the step of clustering the grading results of each type of aggregated data of each user to obtain the user type of each user comprises:
determining the number of user types by an elbow method based on the grading results of the aggregated data;
and clustering the grading results of each type of aggregated data of each user by a k-means clustering method based on the number of user types, to obtain the user type of each user.
4. The method for classifying users according to claim 1, wherein the step of rejecting abnormal data in the historical consumption data and the predicted consumption data of each type of each user comprises:
plotting the historical consumption data and the predicted consumption data to obtain box plots of the historical consumption data and the predicted consumption data;
and determining data outside the normal range in the box plots as the abnormal data.
5. The method for classifying users according to claim 1, wherein the step of rejecting abnormal data in the historical consumption data and the predicted consumption data comprises:
in response to the difference between the abnormal data of each type and the normal-range threshold of the corresponding type being not greater than a preset value, adjusting the abnormal data to data within the normal range; otherwise, rejecting the abnormal data.
6. The method for classifying users according to any one of claims 1 to 5, wherein the historical consumption data includes a historical consumption frequency and a historical consumption amount,
the step of predicting corresponding predicted consumption data within a second preset time period based on the various types of historical consumption data of the users comprises:
predicting the consumption frequency of each user in the second preset time period by using a first regression model based on the historical consumption frequency of each user in the first preset time period;
and predicting the consumption amount of each user in the second preset time period by using a second regression model based on the historical consumption amount of each user in the first preset time period.
7. The method of claim 6, wherein the historical consumption data further comprises a time interval between the last consumption of each of the users and the current time.
8. A user classification apparatus, comprising:
the acquisition module is used for acquiring various types of historical consumption data of each user in a first preset time period;
the prediction module is used for predicting corresponding predicted consumption data in a second preset time period based on the historical consumption data of each type of each user;
the culling module is used for removing abnormal data in the historical consumption data and the predicted consumption data of each type of each user;
and the clustering module is used for clustering the historical consumption data and the predicted consumption data after the abnormal data are removed to obtain the user type of each user.
9. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the method of classifying a user according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which program instructions are stored, which program instructions, when executed by a processor, implement the method of classifying a user according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110276092.6A CN113034179A (en) 2021-03-15 2021-03-15 User classification method, related device and equipment

Publications (1)

Publication Number Publication Date
CN113034179A true CN113034179A (en) 2021-06-25

Family

ID=76468802

Country Status (1)

Country Link
CN (1) CN113034179A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951446A (en) * 2014-03-25 2015-09-30 阿里巴巴集团控股有限公司 Big data processing method and platform
CN109993582A (en) * 2019-04-01 2019-07-09 东北大学 A kind of multi objective customer segmentation method based on RFMCA model
CN110135876A (en) * 2018-02-09 2019-08-16 北京京东尚科信息技术有限公司 The method and device of Method for Sales Forecast
CN110689355A (en) * 2019-09-03 2020-01-14 浙江数链科技有限公司 Client classification method, device, computer equipment and storage medium
CN111784403A (en) * 2020-07-08 2020-10-16 广州市景心科技股份有限公司 User category analysis method and device based on online shopping mall and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination