CN109740685B - User loss characteristic analysis method, prediction method, device, equipment and medium - Google Patents

User loss characteristic analysis method, prediction method, device, equipment and medium

Info

Publication number
CN109740685B
CN109740685B · Application CN201910018617.9A
Authority
CN
China
Prior art keywords
user
users
binary selection
selection tree
lost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910018617.9A
Other languages
Chinese (zh)
Other versions
CN109740685A (en)
Inventor
肖源 (Xiao Yuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd filed Critical Wuhan Douyu Network Technology Co Ltd
Priority to CN201910018617.9A priority Critical patent/CN109740685B/en
Publication of CN109740685A publication Critical patent/CN109740685A/en
Application granted granted Critical
Publication of CN109740685B publication Critical patent/CN109740685B/en

Abstract

The invention provides a method for analyzing user churn (loss) characteristics, comprising the following steps: acquiring user information of churned and non-churned users, and extracting m user features; randomly selecting n user features from the m user features to obtain C(m, n) feature combinations; for each feature combination, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; validating the C(m, n) binary selection tree models to obtain the X models ranking highest in churn-prediction accuracy; and counting the user features in the X binary selection tree models to obtain the Y most frequent features as the user churn features. The invention further provides a user churn prediction method based on the binary selection tree models, which can screen out the main factors behind user churn and quickly judge whether a user is at risk of churning, as well as a corresponding device, equipment and medium.

Description

User loss characteristic analysis method, prediction method, device, equipment and medium
Technical Field
The invention relates to the field of internet live streaming, and in particular to a user churn (user loss) feature analysis method, a user churn prediction method, and a corresponding device, equipment and medium.
Background
With the development of the internet, live streaming has become more and more popular. For every live-streaming platform, daily active users (the number of users active each day) is a key index watched by the platform: it reflects the growth or decline of the whole platform, and high daily activity brings more user spending and advertising revenue, which is currently the main way live-streaming platforms monetize their traffic.
As various new entertainment platforms keep appearing, users' entertainment preferences gradually shift, so user churn happens all the time. Because the total time a user spends on entertainment is roughly constant, once users convert to other platforms, the current platform suffers an economic loss. Therefore, besides using ongoing entertainment content to increase user stickiness, the platform should also be able to identify the main causes of user churn.
At present, user churn is mainly analyzed through questionnaire surveys, but because churned users no longer pay attention to the platform, this approach is not very effective. A model is therefore needed to analyze the main causes of user churn and to predict whether a user will churn, so that retention work can be done in advance and the platform's daily active users remain stable.
Disclosure of Invention
Technical problem to be solved
In view of the technical problems above, the invention provides a user churn feature analysis method, a prediction method, a device, equipment and a medium, which at least partially solve these problems.
(II) technical scheme
One aspect of the present invention provides a method for analyzing user churn characteristics, including: acquiring user information of churned and non-churned users in a training sample, and extracting m user features from the user information; randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2; for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy; and counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
Optionally, calculating the determinant value of each user feature in the feature combination to obtain n determinant values includes: for a user feature A in the feature combination, dividing the training sample into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A; counting the number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A; calculating the determinant value of the user feature A by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users in the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample; and calculating the determinant values of the other user features in the same way to obtain the n determinant values.
Optionally, in the binary selection tree model, the user feature with the largest determinant value is located at the topmost node of the model, the user feature with the smallest determinant value is located at the bottommost node, and the remaining user features are arranged at the nodes in between, from top to bottom in descending order of their determinant values.
Optionally, inputting user information samples into the C(m, n) binary selection tree models respectively and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy includes: for each of the C(m, n) binary selection tree models, inputting the user features of every user in the user information sample into the binary selection tree model and outputting a churn result for each user; comparing the output churn result of each user with the user's known churn status, and counting, for each binary selection tree model, the number of users whose output result matches their known status; and ranking these user counts to obtain the X binary selection tree models with the highest accuracy.
Optionally, the m user features include at least one of: age, sex, marital status, car ownership, whether the user has children, working age, annual income, number of days since becoming a member, gift consumption within a certain period, recharge amount within a certain period, number of followed anchors, number of days the followed anchors broadcast within a certain period, number of barrages (bullet comments) sent within a certain period, and whether the user has churned.
Another aspect of the present invention provides a method for predicting user churn, including: acquiring user information of churned and non-churned users in a training sample, and extracting m user features from the user information; randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2; for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy; and inputting the user features of a user to be tested into each of the X binary selection tree models, each of which outputs whether the user churns: if, among the X models, the number of models indicating that the user will not churn is larger than the number indicating that the user will churn, the user to be tested is predicted not to churn; otherwise the user to be tested is predicted to churn.
Optionally, calculating the determinant value of each user feature in the feature combination to obtain n determinant values includes: for a user feature A in the feature combination, dividing the training sample into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A; counting the number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A; calculating the determinant value of the user feature A by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users in the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample; and calculating the determinant values of the other user features in the same way to obtain the n determinant values.
Another aspect of the present invention provides an apparatus for analyzing user churn characteristics, the apparatus including: a feature acquisition module for acquiring user information of churned and non-churned users in a training sample and extracting m user features from the user information; a feature combination generation module for randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2; a model building module for calculating, for each of the C(m, n) feature combinations, the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; a verification module for inputting user information samples into the C(m, n) binary selection tree models respectively and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy; and a statistics module for counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
Another aspect of the present invention provides an electronic device, including: a processor; and a memory storing a computer-executable program which, when executed by the processor, causes the processor to execute the user churn feature analysis method and the user churn prediction method of the present invention.
In another aspect, the present invention further provides a computer-readable medium on which a computer program is stored; when executed by a processor, the program implements the user churn feature analysis method and the user churn prediction method of the present invention.
(III) advantageous effects
The invention provides a user churn feature analysis method and a user churn prediction method. User features are decomposed and combined into feature combinations, and for each feature combination a binary selection tree model is built from the determinant values of its user features. Binary selection tree models built in this way can screen out the main factors behind user churn and quickly judge whether a user is at risk of churning, so the platform can take preventive action in advance and keep its daily active users stable.
Drawings
For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
fig. 1 schematically shows a flow chart of a user churn feature analysis method according to an embodiment of the present invention.
FIG. 2 schematically shows a structural diagram of a binary selection tree model according to an embodiment of the present invention.
Fig. 3 schematically shows a flow chart of a user churn prediction method according to an embodiment of the present invention.
Fig. 4 schematically shows a block diagram of a user churn feature analysis apparatus according to an embodiment of the present invention.
Fig. 5 schematically shows a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It is to be understood that such description is merely illustrative and not intended to limit the scope of the present invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The invention provides a user churn feature analysis method and a user churn prediction method, which screen out the main features (factors) of user churn and quickly judge whether a user is at risk of churning by selectively decomposing and combining the features in user information and training tree decision models. A decision tree is a basic classification and regression method: a model trained on known samples can predict and classify new samples, which matches the way people judge categories. For example, to judge whether a student is a good student, one first looks at the student's conduct; if the conduct is poor, the answer is directly no; if the conduct is good, one still cannot be sure and must next look at the student's academic performance; if that is also good, one then looks at athletics, and so on. Such a judgment process forms a decision tree, and training a decision tree model is the process of generating such a tree from known data. The invention therefore builds tree decision models from the user features contained in platform user information to analyze the main features (factors) of user churn and to predict whether current platform users will churn in the future.
The embodiment of the invention provides an analysis method for user loss characteristics.
Fig. 1 schematically shows a flow chart of a user churn feature analysis method according to an embodiment of the present invention.
As shown in fig. 1, the method comprises:
S101, acquiring user information of churned and non-churned users in the training sample, and extracting m user features from the user information.
To construct the decision tree models, training sample data must first be selected, so 2 million users of the platform are selected as the training sample S, in which the ratio of churned to non-churned users is random. The specific numbers of training samples, churned users and non-churned users can be chosen according to actual requirements and are not limited here.
Whether a user has churned or not, some information is retained during platform registration and participation in entertainment. This information can serve as a measure of a user's characteristics, and users can be analyzed on this basis. Typical user information mainly comprises the user features shown in Table 1:
Table 1 (rendered as an image in the original) lists the user features contained in typical user information: uid (the user's unique identifier), age, sex, marital status, car ownership, whether the user has children, working age, annual income, number of days since becoming a member, gift consumption within a period, recharge amount within a period, number of followed anchors, number of days the followed anchors broadcast within a period, number of barrages sent within a period, and whether the user has churned.
Therefore, the user features listed in Table 1 must be obtained for each of the 2 million users, where uid is the user's unique identifier and does not take part in the feature comparison when the decision tree models are subsequently built. The user information is not limited to the user features listed in Table 1; in another embodiment of the invention, other user features, such as home ownership, may also be acquired.
S102, randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2.
The m acquired user features are decomposed and combined into feature combinations. The user features are screened by permutation and combination: each time, n user features are selected to form one feature combination, where n ≥ m/2. The total number of feature combinations over the m user features is:

C(m, n) = m! / (n!·(m - n)!)
When the acquired user features are those shown in Table 1, each user record supplies 14 user features. If each feature combination contains 7 user features, the total number of feature combinations is:

C(14, 7) = 14! / (7!·7!) = 3432
The number of user features in each feature combination is not limited to 7 and is determined by actual requirements; in another embodiment of the invention it may be, for example, 5, 6, 8 or 9.
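To make the combinatorics of this step concrete, here is a minimal Python sketch that enumerates the C(14, 7) = 3432 combinations with itertools. The feature-name list is illustrative: the description only names age, sex, marry, car, children, work_life, income, duration and reward, so the remaining names are assumed stand-ins for the Table 1 fields.

```python
from itertools import combinations

# Feature names are illustrative; only age, sex, marry, car, children,
# work_life, income, duration and reward appear in the text, the rest
# are assumed placeholders for the remaining Table 1 fields.
M_FEATURES = ["age", "sex", "marry", "car", "children", "work_life",
              "income", "member_days", "reward", "recharge",
              "anchor_count", "broadcast_days", "barrage_count", "duration"]

N = 7  # user features per combination, with n >= m/2
feature_combos = list(combinations(M_FEATURES, N))
assert len(feature_combos) == 3432  # C(14, 7)
```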
S103, for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values.
After the user features in the acquired user information have been decomposed and combined into feature combinations, each feature combination is trained into a corresponding binary selection tree model by way of a probability selection tree: the user features in the combination form a tree structure from top to bottom. The tree structure is determined by the determinant value of each user feature. A determinant value is a number computed, after the sample has been divided into several subsets according to the values of a user feature, from the counts of users corresponding to the different values of that feature in each subset. The computation is described below using a combination of 7 user features (age, sex, marry, car, children, work_life and income) as an example.
First, for a user feature A in the feature combination, the training sample is divided into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A. For example, the ages of users who watch live streams generally run from 20 to 40: if every year of age is its own bracket, the feature age takes 21 different values and the training sample is divided into 21 different subsets; if the split is at age 30, the feature age takes two values (greater than 30, and less than or equal to 30) and the training sample is divided into 2 different subsets. As another example, the user feature marry takes 2 different values, yes and no, so the training sample is divided into 2 different subsets. Different user features need not take the same number of values.
Second, for each subset s_i, the number of users it contains and the numbers of churned and non-churned users corresponding to each value of the user feature A are counted. Specifically, suppose the user feature A is age and the training sample is divided by three age groups (under 20, from 20 to 30 inclusive, and over 30) into 3 subsets. For the 1st subset, count the total number of users and the numbers of non-churned and churned users among the training samples under 20; the users in the 2nd and 3rd subsets (from 20 to 30, and over 30) are counted in the same way.
Third, after the counts for each subset are obtained, the determinant value of the user feature A is calculated by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users in the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample S, and p_1 is the frequency of churned users in the training sample. The determinant values of the other user features in the feature combination are calculated in the same way, giving the n determinant values of the feature combination.
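A minimal Python sketch of this computation follows, under the assumption that the determinant value is the information-gain quantity written above (sample entropy minus the weighted entropy of the k subsets), which is what the variable definitions describe; the function name and the (non-churned, churned) count layout are illustrative, not part of the patent.

```python
import math

def determinant_value(subsets, p0, p1):
    """Determinant value T_A of one user feature.

    subsets -- list of (non_churned, churned) user counts, one pair
               (s_ij, s_iq) per subset s_i induced by the feature's values
    p0, p1  -- frequencies of non-churned / churned users in the sample
    """
    total = sum(a + b for a, b in subsets)          # |s|
    t = -(p0 * math.log2(p0) + p1 * math.log2(p1))  # sample entropy
    for not_churned, churned in subsets:
        size = not_churned + churned                # |s_i|
        for count in (not_churned, churned):
            if count:  # treat 0 * log2(0) as 0
                t += (size / total) * (count / size) * math.log2(count / size)
    return t

# Example: a binary feature such as marry splits 2 million users
# into 2 subsets (counts here are made up for illustration).
print(determinant_value([(900_000, 300_000), (500_000, 300_000)],
                        p0=0.7, p1=0.3))
```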
Finally, according to the n determinant values of the feature combination, a binary selection tree model composed of the n user features in the combination is constructed. Specifically, the n determinant values are sorted from largest to smallest: the user feature with the largest determinant value sits at the topmost node, the one with the smallest at the bottommost node, and the features in between decrease from top to bottom. This generates a binary selection tree model ordered by determinant value, whose structure is shown schematically in fig. 2.
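Under the same assumptions, constructing one binary selection tree model then reduces to ordering the combination's features by descending determinant value. A sketch, reusing determinant_value from above and an assumed precomputed mapping subsets_by_feature:

```python
def build_selection_tree(combo, subsets_by_feature, p0, p1):
    """Return the combination's features ordered top node -> bottom node."""
    return sorted(combo,
                  key=lambda f: determinant_value(subsets_by_feature[f],
                                                  p0, p1),
                  reverse=True)  # largest determinant value at the top
```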
Each of the C(m, n) feature combinations is processed in this way, yielding C(m, n) binary selection tree models.
S104, inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy.
Not all of the C(m, n) generated binary selection tree models are highly accurate, so the accurate ones must be screened out. Accuracy here means the following: the user features of a number of users are input into a binary selection tree model, and for each user the model outputs whether that user churns; since each user's actual churn status is known in advance, the output for each user is compared with that status, and the number of users for whom the model's output matches reality (a non-churned user predicted as non-churned, or a churned user predicted as churned) is counted. The larger this count, the more accurate the binary selection tree model.
Therefore, a user information sample must be selected to validate the C(m, n) binary selection tree models. Specifically, for each of the C(m, n) binary selection tree models, the user features of every user in the user information sample are input into the model, and a churn result is output for each user; the output churn result is compared with each user's known churn status, and the number of users whose output matches their known status is counted for each binary selection tree model; these user counts are then ranked to obtain the X binary selection tree models with the highest accuracy.
In this embodiment of the invention, 1 million user information samples are selected and the 1500 most accurate binary selection tree models are kept; the invention does not limit these specific numbers.
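The validation step can be sketched as follows; the predict(user) method and the sample layout are assumed interfaces rather than anything specified by the patent, and x = 1500 mirrors the embodiment above.

```python
def rank_models(models, users, churned_labels, x=1500):
    """Keep the x trees whose outputs match the known labels most often.

    models         -- binary selection tree models with an assumed
                      predict(user) -> bool (True = churns) method
    users          -- validation users (e.g. 1 million feature records)
    churned_labels -- known churn status, aligned with `users`
    """
    def correct(model):
        return sum(model.predict(u) == y
                   for u, y in zip(users, churned_labels))
    return sorted(models, key=correct, reverse=True)[:x]
```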
S105, counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
Among the X most accurate binary selection tree models, each model contains n user features, and the n features of different models are not exactly the same. All user features across the X models are counted. For example, if one binary selection tree model consists of the 7 user features age, sex, marry, car, children, work_life and income, and another consists of the 7 user features marry, car, children, work_life, income, duration and reward, then the 4 features age, sex, duration and reward each have a count of 1, and the 5 features marry, car, children, work_life and income each have a count of 2.
The counts of all user features are sorted from largest to smallest, and the Y user features with the largest counts are selected as the user churn features, from which the main causes of the platform's user churn are analyzed.
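Counting features across the X retained models is a plain frequency count; a sketch, assuming each model exposes the features it was built from as model.features:

```python
from collections import Counter

def top_churn_features(top_models, y):
    """Return the y user features appearing most often in the top models."""
    counts = Counter(f for model in top_models for f in model.features)
    return [feature for feature, _ in counts.most_common(y)]
```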
Another embodiment of the present invention provides a user churn prediction method based on the user churn feature analysis.
Fig. 3 schematically shows a flow chart of a user churn prediction method according to an embodiment of the present invention.
As shown in fig. 3, the method comprises:
S201, acquiring user information of churned and non-churned users in the training sample, and extracting m user features from the user information.
S202, randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2.
The m acquired user features are decomposed and combined into feature combinations by permutation and combination: each time, n user features are selected to form one feature combination, where n ≥ m/2. The total number of feature combinations over the m user features is:

C(m, n) = m! / (n!·(m - n)!)
S203, for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values.
Specifically, for a user feature A in a feature combination, the training sample is divided into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A.
The number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A are then counted.
The determinant value of the user feature A is calculated by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users in the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample.
The determinant values of the other user features are calculated in the same way to obtain n determinant values. The n determinant values are ranked from largest to smallest: the user feature with the largest determinant value sits at the topmost node, the one with the smallest at the bottommost node, and the features in between decrease from top to bottom, generating a binary selection tree model ordered by determinant value.
S204, inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy.
Specifically, for each of the C(m, n) binary selection tree models, the user features of every user in the user information sample are input into the model, and a churn result is output for each user; the output churn result is compared with each user's known churn status, and the number of users whose output matches their known status is counted for each model; these user counts are then ranked to obtain the X binary selection tree models with the highest accuracy.
S205, inputting the user features of the user to be tested into each of the X binary selection tree models, each of which outputs whether the user churns. If, among the X binary selection tree models, the number of models indicating that the user will not churn is larger than the number indicating that the user will churn, the user to be tested is predicted not to churn; otherwise, the user to be tested is predicted to churn.
The user to be tested has more user features than any single binary selection tree model uses (m > n), so when the features are input into a model, only the user features contained in that model are input. Because the prediction of a single binary selection tree model is inaccurate, the features of the user to be tested are input into all X top-ranked binary selection tree models, and the outputs of all the models are combined into the final prediction, which improves prediction accuracy.
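Step S205's majority vote can be sketched as below, with the same assumed model interface as in the ranking sketch; a tie resolves toward predicting churn, matching the "otherwise" branch of the step.

```python
def predict_churn(user, top_models):
    """Predict churn for one user by majority vote of the X best trees.

    Each tree only reads the features it was built from, so the full
    m-feature `user` record can be passed to every model.
    """
    votes_churn = sum(1 for model in top_models if model.predict(user))
    votes_stay = len(top_models) - votes_churn
    return votes_stay <= votes_churn  # True = predicted to churn
```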
In summary, the embodiments of the invention provide a user churn feature analysis method and a user churn prediction method. User features extracted from platform user information are decomposed and combined into feature combinations; a binary selection tree model is built from the determinant values of the user features in each combination; and the models are used to analyze user churn features and to predict whether current platform users will churn in the future. The binary selection tree models can screen out the main factors behind user churn and quickly judge whether a user is at risk of churning, so the platform can take preventive action in advance and keep its daily active users stable.
Fig. 4 schematically shows a block diagram of an analysis apparatus 400 for user churn characteristics according to an embodiment of the present invention.
As shown in fig. 4, the apparatus 400 includes a feature obtaining module 410, a feature combination generating module 420, a model building module 430, a verifying module 440, and a statistical module 450.
The feature obtaining module 410 is configured to obtain user information of users who have been lost and users who have not been lost in the training sample, and extract m user features in the user information.
A feature combination generation module 420 for randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2.
A model building module 430 for calculating, for each of the C(m, n) feature combinations, the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values.
A verification module 440 for inputting user information samples into the C(m, n) binary selection tree models respectively and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy.
And a statistics module 450 for counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
It should be understood that the feature obtaining module 410, the feature combination generating module 420, the model building module 430, the verifying module 440, and the statistics module 450 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present invention, at least one of the feature obtaining module 410, the feature combination generating module 420, the model building module 430, the verifying module 440, and the statistics module 450 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or any other reasonable manner of integrating or packaging a circuit, as hardware or firmware, or as a suitable combination of software, hardware, and firmware implementations. Alternatively, at least one of the feature obtaining module 410, the feature combination generating module 420, the model building module 430, the verifying module 440, and the counting module 450 may be at least partially implemented as a computer program module that, when executed by a computer, may perform the functions of the respective modules.
The present invention provides an electronic device, as shown in fig. 5, the electronic device 500 includes a processor 510 and a memory 520. The electronic device 500 may perform a method according to the embodiment of the invention shown in fig. 1 and 3.
In particular, processor 510 may include, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 510 may also include on-board memory for caching purposes. Processor 510 may be a single processing unit or a plurality of processing units for performing different actions of a method flow according to embodiments of the disclosure.
The memory 520, for example, can be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, a readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the readable storage medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The memory 520 may include a computer program 521, which computer program 521 may include code/computer-executable instructions that, when executed by the processor 510, cause the processor 510 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 521 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in computer program 521 may include at least one program module, for example modules 521A, 521B, and so on. It should be noted that the division and number of modules are not fixed; those skilled in the art may use suitable program modules or combinations of modules according to the actual situation, and when these program modules are executed by the processor 510, the processor 510 may perform the method according to the embodiments of the present disclosure or any variation thereof.
The present disclosure also provides a computer-readable medium, which may be embodied in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer readable medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, a computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, optical fiber cable, radio frequency signals, etc., or any suitable combination of the foregoing.
While the present disclosure has been shown and described with reference to certain exemplary embodiments thereof, various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents. Accordingly, the scope of the present disclosure should not be limited to the above-described embodiments, but should be defined not only by the appended claims, but also by equivalents thereof.

Claims (8)

1. A method for analyzing user churn characteristics is characterized by comprising the following steps:
acquiring user information of churned and non-churned users in a training sample, and extracting m user features from the user information;
randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2;
for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; wherein calculating the determinant value of each user feature in the feature combination to obtain n determinant values comprises:
for a user feature A in a feature combination, dividing the training sample into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A;

counting the number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A;

calculating the determinant value of the user feature A by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users of the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample;

and calculating the determinant values of the other user features in the same way to obtain the n determinant values;
inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy;
and counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
2. The method according to claim 1, wherein in the binary selection tree model, the user feature with the largest determinant value is located at the topmost node of the binary selection tree model, the user feature with the smallest determinant value is located at the bottommost node of the binary selection tree model, and the other user features are arranged at the nodes between the topmost node and the bottommost node in sequence from top to bottom according to the descending order of their determinant values.
3. The method of claim 1, wherein inputting user information samples into the C(m, n) binary selection tree models respectively and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy comprises:
for each of the C(m, n) binary selection tree models, inputting the user features of every user in the user information sample into the binary selection tree model, and outputting a churn result for the corresponding user;
comparing the output churn result of each user with the user's known churn status, and counting, for each binary selection tree model, the number of users whose output result matches their known status;
and ranking these user counts to obtain the X binary selection tree models with the highest accuracy.
4. The method of claim 1, wherein the m user features comprise at least one of: age, sex, marital status, car ownership, whether the user has children, working age, annual income, number of days since becoming a member, gift consumption within a certain period, recharge amount within a certain period, number of followed anchors, number of days the followed anchors broadcast within a certain period, number of barrages (bullet comments) sent within a certain period, and whether the user has churned.
5. A method for predicting user churn, comprising:
acquiring user information of churned and non-churned users in a training sample, and extracting m user features from the user information;
randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2;
for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; wherein calculating the determinant value of each user feature in the feature combination to obtain n determinant values comprises:
for a user feature A in a feature combination, dividing the training sample into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A;

counting the number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A;

calculating the determinant value of the user feature A by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users of the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample;

and calculating the determinant values of the other user features in the same way to obtain the n determinant values;
inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy;
and inputting the user features of the user to be tested into each of the X binary selection tree models, each of which outputs whether the user churns; if, among the X binary selection tree models, the number of models indicating that the user will not churn is larger than the number indicating that the user will churn, the user to be tested is predicted not to churn; otherwise the user to be tested is predicted to churn.
6. An apparatus for analyzing user churn characteristics, the apparatus comprising:
a feature acquisition module for acquiring user information of churned and non-churned users in a training sample and extracting m user features from the user information;
a feature combination generation module for randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2;
a model building module for calculating, for each of the C(m, n) feature combinations, the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; wherein calculating the determinant value of each user feature in the feature combination to obtain n determinant values comprises:
for a user feature A in a feature combination, dividing the training sample into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A;

counting the number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A;

calculating the determinant value of the user feature A by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users of the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample;

and calculating the determinant values of the other user features in the same way to obtain the n determinant values;
a verification module for inputting user information samples into the C(m, n) binary selection tree models respectively and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy;
and a statistics module for counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
7. An electronic device, comprising:
a processor;
a memory storing a computer executable program which, when executed by the processor, causes the processor to perform the method of any one of claims 1-5.
8. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201910018617.9A 2019-01-08 2019-01-08 User loss characteristic analysis method, prediction method, device, equipment and medium Active CN109740685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910018617.9A CN109740685B (en) 2019-01-08 2019-01-08 User loss characteristic analysis method, prediction method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910018617.9A CN109740685B (en) 2019-01-08 2019-01-08 User loss characteristic analysis method, prediction method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN109740685A CN109740685A (en) 2019-05-10
CN109740685B true CN109740685B (en) 2020-10-27

Family

ID=66364038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910018617.9A Active CN109740685B (en) 2019-01-08 2019-01-08 User loss characteristic analysis method, prediction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN109740685B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970864A (en) * 2014-05-08 2014-08-06 清华大学 Emotion classification and emotion component analyzing method and system based on microblog texts
CN104813353A (en) * 2012-10-30 2015-07-29 阿尔卡特朗讯 System and method for generating subscriber churn predictions
CN107332694A (en) * 2017-06-14 2017-11-07 深圳市易信成科技股份有限公司 A kind of method and system that user's telephone number is excavated based on fixed network big data
CN108876034A (en) * 2018-06-13 2018-11-23 重庆邮电大学 A kind of improved Lasso+RBF neural network ensemble prediction model
CN108960262A (en) * 2017-05-19 2018-12-07 意礴科技有限公司 A kind of methods, devices and systems and computer readable storage medium for predicting shoes code
CN109034894A (en) * 2018-07-20 2018-12-18 武汉斗鱼网络科技有限公司 Advertisement page pageview statistical method, device, electronic equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8065214B2 (en) * 2005-09-06 2011-11-22 Ge Corporate Financial Services, Inc. Methods and system for assessing loss severity for commercial loans
CN106203679A (en) * 2016-06-27 2016-12-07 武汉斗鱼网络科技有限公司 A kind of customer loss Forecasting Methodology and system
CN106250403A (en) * 2016-07-19 2016-12-21 北京奇艺世纪科技有限公司 Customer loss Forecasting Methodology and device
US10394871B2 (en) * 2016-10-18 2019-08-27 Hartford Fire Insurance Company System to predict future performance characteristic for an electronic record
CN109034853B (en) * 2017-06-09 2021-11-26 北京京东尚科信息技术有限公司 Method, device, medium and electronic equipment for searching similar users based on seed users
CN107832581B (en) * 2017-12-15 2022-02-18 百度在线网络技术(北京)有限公司 State prediction method and device
CN108549594B (en) * 2018-03-30 2021-07-23 武汉斗鱼网络科技有限公司 Method and device for determining user loss reason
CN108665321A (en) * 2018-05-18 2018-10-16 广州虎牙信息科技有限公司 High viscosity customer loss prediction technique, device and computer readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104813353A (en) * 2012-10-30 2015-07-29 阿尔卡特朗讯 System and method for generating subscriber churn predictions
CN103970864A (en) * 2014-05-08 2014-08-06 清华大学 Emotion classification and emotion component analyzing method and system based on microblog texts
CN108960262A (en) * 2017-05-19 2018-12-07 意礴科技有限公司 A kind of methods, devices and systems and computer readable storage medium for predicting shoes code
CN107332694A (en) * 2017-06-14 2017-11-07 深圳市易信成科技股份有限公司 A kind of method and system that user's telephone number is excavated based on fixed network big data
CN108876034A (en) * 2018-06-13 2018-11-23 重庆邮电大学 A kind of improved Lasso+RBF neural network ensemble prediction model
CN109034894A (en) * 2018-07-20 2018-12-18 武汉斗鱼网络科技有限公司 Advertisement page pageview statistical method, device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Filter Model Research of Characteristic Value of Typical Construction Engineering Based on T Test and Decision Tree Method; Shasha Xie et al.; International Conference on Construction & Real Estate Management; 2017-11-09; pp. 345-354 *
Research on Churn Prediction of Online Shopping Users Based on Data Mining; Guo Chengxi; China Master's Theses Full-text Database, Economics and Management Science; 2016-12-15 (No. 12); J157-23 *

Also Published As

Publication number Publication date
CN109740685A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
Mason et al. A guide for using functional diversity indices to reveal changes in assembly processes along ecological gradients
US20220277404A1 (en) Pattern Identification in Time-Series Social Media Data, and Output-Dynamics Engineering for a Dynamic System Having One or More Multi-Scale Time-Series Data Sets
Colizza et al. The role of the airline transportation network in the prediction and predictability of global epidemics
Peiró-Velert et al. Screen media usage, sleep time and academic performance in adolescents: clustering a self-organizing maps analysis
Rombach et al. Core-periphery structure in networks
CN102331992A (en) Distributed decision tree training
JP2016524259A (en) Dynamic research panel
CN108648000B (en) Method and device for evaluating user retention life cycle and electronic equipment
CN109118119A (en) Air control model generating method and device
US9639455B2 (en) Autonomous media version testing
CN105869022B (en) Application popularity prediction method and device
Martín et al. A methodology to study noise annoyance and to perform Action Plans follow up using as input an existing survey and noise map: Application to the city of Málaga (Spain)
CN109726826B (en) Training method and device for random forest, storage medium and electronic equipment
CN106611021B (en) Data processing method and equipment
US20130282445A1 (en) Method or system to evaluate strategy decisions
CN109740685B (en) User loss characteristic analysis method, prediction method, device, equipment and medium
CN111611781B (en) Data labeling method, question answering device and electronic equipment
Mafteiu-Scai A new approach for solving equations systems inspired from brainstorming
Sanli et al. Temporal pattern of online communication spike trains in spreading a scientific rumor: how often, who interacts with whom?
Teusner et al. Taking informed action on student activity in MOOCs
Eyvindson et al. Using a compromise programming framework to integrating spatially specific preference information for forest management problems
Chen et al. Developing Taiwan into the tourist transport centre of East Asia
Glonek et al. Semi-supervised graph labelling reveals increasing partisanship in the United States Congress
CN110263029A (en) Method, apparatus, terminal and the medium of database generation test data
Nikolić et al. Building an ensemble from a single Naive Bayes classifier in the analysis of key risk factors for Polish State Fire Service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant