CN109740685B - User loss characteristic analysis method, prediction method, device, equipment and medium - Google Patents

User loss characteristic analysis method, prediction method, device, equipment and medium

Info

Publication number
CN109740685B
CN109740685B · Application CN201910018617.9A
Authority
CN
China
Prior art keywords
user
users
binary selection
selection tree
lost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910018617.9A
Other languages
Chinese (zh)
Other versions
CN109740685A (en)
Inventor
肖源 (Xiao Yuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd filed Critical Wuhan Douyu Network Technology Co Ltd
Priority to CN201910018617.9A priority Critical patent/CN109740685B/en
Publication of CN109740685A publication Critical patent/CN109740685A/en
Application granted granted Critical
Publication of CN109740685B publication Critical patent/CN109740685B/en

Abstract

The invention provides a method for analyzing user churn (loss) characteristics, comprising the following steps: acquiring user information of churned and non-churned users, and extracting m user features; randomly selecting n user features from the m user features to obtain C(m, n) feature combinations; for each feature combination, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; validating the C(m, n) binary selection tree models to obtain the X models ranking highest in churn-prediction accuracy; and counting the user features in the X binary selection tree models to obtain the Y most frequent features as the user churn features. The invention further provides a user churn prediction method based on the binary selection tree models, which can screen out the main factors behind user churn and quickly judge whether a user is at risk of churning, as well as a corresponding device, equipment and medium.

Description

User loss characteristic analysis method, prediction method, device, equipment and medium
Technical Field
The invention relates to the field of internet live streaming, and in particular to a user churn (user loss) feature analysis method, a user churn prediction method, and a corresponding device, equipment and medium.
Background
With the development of the internet, live streaming has become more and more popular. For every live-streaming platform, daily active users (the number of users active each day) is a key index watched by the platform: it reflects the growth or decline of the whole platform, and high daily activity brings more user spending and advertising revenue, which is currently the main way live-streaming platforms monetize their traffic.
As various new entertainment platforms keep appearing, users' entertainment preferences gradually shift, so user churn happens all the time. Because the total time a user spends on entertainment is roughly constant, once users convert to other platforms, the current platform suffers an economic loss. Therefore, besides using ongoing entertainment content to increase user stickiness, the platform should also be able to identify the main causes of user churn.
At present, user churn is mainly analyzed through questionnaire surveys, but because churned users no longer pay attention to the platform, this approach is not very effective. A model is therefore needed to analyze the main causes of user churn and to predict whether a user will churn, so that retention work can be done in advance and the platform's daily active users remain stable.
Disclosure of Invention
Technical problem to be solved
In view of the technical problems above, the invention provides a user churn feature analysis method, a prediction method, a device, equipment and a medium, which at least partially solve these problems.
(II) technical scheme
One aspect of the present invention provides a method for analyzing user churn characteristics, including: acquiring user information of churned and non-churned users in a training sample, and extracting m user features from the user information; randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2; for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy; and counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
Optionally, calculating the determinant value of each user feature in the feature combination to obtain n determinant values includes: for a user feature A in the feature combination, dividing the training sample into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A; counting the number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A; calculating the determinant value of the user feature A by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users in the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample; and calculating the determinant values of the other user features in the same way to obtain the n determinant values.
Optionally, in the binary selection tree model, the user feature with the largest determinant value is located at the topmost node of the model, the user feature with the smallest determinant value is located at the bottommost node, and the remaining user features are arranged at the nodes in between, from top to bottom in descending order of their determinant values.
Optionally, inputting user information samples into the C(m, n) binary selection tree models respectively and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy includes: for each of the C(m, n) binary selection tree models, inputting the user features of every user in the user information sample into the binary selection tree model and outputting a churn result for each user; comparing the output churn result of each user with the user's known churn status, and counting, for each binary selection tree model, the number of users whose output result matches their known status; and ranking these user counts to obtain the X binary selection tree models with the highest accuracy.
Optionally, the m user features include at least one of: age, sex, marital status, car ownership, whether the user has children, working age, annual income, number of days since becoming a member, gift consumption within a certain period, recharge amount within a certain period, number of followed anchors, number of days the followed anchors broadcast within a certain period, number of barrages (bullet comments) sent within a certain period, and whether the user has churned.
Another aspect of the present invention provides a method for predicting user churn, including: acquiring user information of churned and non-churned users in a training sample, and extracting m user features from the user information; randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2; for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy; and inputting the user features of a user to be tested into each of the X binary selection tree models, each of which outputs whether the user churns: if, among the X models, the number of models indicating that the user will not churn is larger than the number indicating that the user will churn, the user to be tested is predicted not to churn; otherwise the user to be tested is predicted to churn.
Optionally, calculating the determinant value of each user feature in the feature combination to obtain n determinant values includes: for a user feature A in the feature combination, dividing the training sample into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A; counting the number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A; calculating the determinant value of the user feature A by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users in the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample; and calculating the determinant values of the other user features in the same way to obtain the n determinant values.
Another aspect of the present invention provides an apparatus for analyzing user churn characteristics, the apparatus including: a feature acquisition module for acquiring user information of churned and non-churned users in a training sample and extracting m user features from the user information; a feature combination generation module for randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2; a model building module for calculating, for each of the C(m, n) feature combinations, the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; a verification module for inputting user information samples into the C(m, n) binary selection tree models respectively and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy; and a statistics module for counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
Another aspect of the present invention provides an electronic device, including: a processor; and a memory storing a computer-executable program which, when executed by the processor, causes the processor to execute the user churn feature analysis method and the user churn prediction method of the present invention.
In another aspect, the present invention further provides a computer-readable medium on which a computer program is stored; when executed by a processor, the program implements the user churn feature analysis method and the user churn prediction method of the present invention.
(III) advantageous effects
The invention provides a user churn feature analysis method and a user churn prediction method. User features are decomposed and combined into feature combinations, and for each feature combination a binary selection tree model is built from the determinant values of its user features. Binary selection tree models built in this way can screen out the main factors behind user churn and quickly judge whether a user is at risk of churning, so the platform can take preventive action in advance and keep its daily active users stable.
Drawings
For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
fig. 1 schematically shows a flow chart of a user churn feature analysis method according to an embodiment of the present invention.
FIG. 2 schematically shows a structural diagram of a binary selection tree model according to an embodiment of the present invention.
Fig. 3 schematically shows a flow chart of a user churn prediction method according to an embodiment of the present invention.
Fig. 4 schematically shows a block diagram of a user churn feature analysis apparatus according to an embodiment of the present invention.
Fig. 5 schematically shows a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It is to be understood that such description is merely illustrative and not intended to limit the scope of the present invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The invention provides a user churn feature analysis method and a user churn prediction method, which screen out the main features (factors) of user churn and quickly judge whether a user is at risk of churning by selectively decomposing and combining the features in user information and training tree decision models. A decision tree is a basic classification and regression method: a model trained on known samples can predict and classify new samples, which matches the way people judge categories. For example, to judge whether a student is a good student, one first looks at the student's conduct; if the conduct is poor, the answer is directly no; if the conduct is good, one still cannot be sure and must next look at the student's academic performance; if that is also good, one then looks at athletics, and so on. Such a judgment process forms a decision tree, and training a decision tree model is the process of generating such a tree from known data. The invention therefore builds tree decision models from the user features contained in platform user information to analyze the main features (factors) of user churn and to predict whether current platform users will churn in the future.
The embodiment of the invention provides an analysis method for user loss characteristics.
Fig. 1 schematically shows a flow chart of a user churn feature analysis method according to an embodiment of the present invention.
As shown in fig. 1, the method comprises:
S101, acquiring user information of churned and non-churned users in the training sample, and extracting m user features from the user information.
To construct the decision tree models, training sample data must first be selected, so 2 million users of the platform are selected as the training sample S, in which the ratio of churned to non-churned users is random. The specific numbers of training samples, churned users and non-churned users can be chosen according to actual requirements and are not limited here.
Whether a user has churned or not, some information is retained during platform registration and participation in entertainment. This information can serve as a measure of a user's characteristics, and users can be analyzed on this basis. Typical user information mainly comprises the user features shown in Table 1:
Table 1 (rendered as an image in the original) lists the user features contained in typical user information: uid (the user's unique identifier), age, sex, marital status, car ownership, whether the user has children, working age, annual income, number of days since becoming a member, gift consumption within a period, recharge amount within a period, number of followed anchors, number of days the followed anchors broadcast within a period, number of barrages sent within a period, and whether the user has churned.
Therefore, the user features listed in Table 1 must be obtained for each of the 2 million users, where uid is the user's unique identifier and does not take part in the feature comparison when the decision tree models are subsequently built. The user information is not limited to the user features listed in Table 1; in another embodiment of the invention, other user features, such as home ownership, may also be acquired.
S102, randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2.
The m acquired user features are decomposed and combined into feature combinations. The user features are screened by permutation and combination: each time, n user features are selected to form one feature combination, where n ≥ m/2. The total number of feature combinations over the m user features is:

C(m, n) = m! / (n!·(m - n)!)
When the acquired user features are those shown in Table 1, each user record supplies 14 user features. If each feature combination contains 7 user features, the total number of feature combinations is:

C(14, 7) = 14! / (7!·7!) = 3432
The number of user features in each feature combination is not limited to 7 and is determined by actual requirements; in another embodiment of the invention it may be, for example, 5, 6, 8 or 9.
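To make the combinatorics of this step concrete, here is a minimal Python sketch that enumerates the C(14, 7) = 3432 combinations with itertools. The feature-name list is illustrative: the description only names age, sex, marry, car, children, work_life, income, duration and reward, so the remaining names are assumed stand-ins for the Table 1 fields.

```python
from itertools import combinations

# Feature names are illustrative; only age, sex, marry, car, children,
# work_life, income, duration and reward appear in the text, the rest
# are assumed placeholders for the remaining Table 1 fields.
M_FEATURES = ["age", "sex", "marry", "car", "children", "work_life",
              "income", "member_days", "reward", "recharge",
              "anchor_count", "broadcast_days", "barrage_count", "duration"]

N = 7  # user features per combination, with n >= m/2
feature_combos = list(combinations(M_FEATURES, N))
assert len(feature_combos) == 3432  # C(14, 7)
```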
S103, for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values.
After the user features in the acquired user information have been decomposed and combined into feature combinations, each feature combination is trained into a corresponding binary selection tree model by way of a probability selection tree: the user features in the combination form a tree structure from top to bottom. The tree structure is determined by the determinant value of each user feature. A determinant value is a number computed, after the sample has been divided into several subsets according to the values of a user feature, from the counts of users corresponding to the different values of that feature in each subset. The computation is described below using a combination of 7 user features (age, sex, marry, car, children, work_life and income) as an example.
First, for a user feature A in the feature combination, the training sample is divided into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A. For example, the ages of users who watch live streams generally run from 20 to 40: if every year of age is its own bracket, the feature age takes 21 different values and the training sample is divided into 21 different subsets; if the split is at age 30, the feature age takes two values (greater than 30, and less than or equal to 30) and the training sample is divided into 2 different subsets. As another example, the user feature marry takes 2 different values, yes and no, so the training sample is divided into 2 different subsets. Different user features need not take the same number of values.
Second, for each subset s_i, the number of users it contains and the numbers of churned and non-churned users corresponding to each value of the user feature A are counted. Specifically, suppose the user feature A is age and the training sample is divided by three age groups (under 20, from 20 to 30 inclusive, and over 30) into 3 subsets. For the 1st subset, count the total number of users and the numbers of non-churned and churned users among the training samples under 20; the users in the 2nd and 3rd subsets (from 20 to 30, and over 30) are counted in the same way.
Third, after the counts for each subset are obtained, the determinant value of the user feature A is calculated by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users in the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample S, and p_1 is the frequency of churned users in the training sample. The determinant values of the other user features in the feature combination are calculated in the same way, giving the n determinant values of the feature combination.
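A minimal Python sketch of this computation follows, under the assumption that the determinant value is the information-gain quantity written above (sample entropy minus the weighted entropy of the k subsets), which is what the variable definitions describe; the function name and the (non-churned, churned) count layout are illustrative, not part of the patent.

```python
import math

def determinant_value(subsets, p0, p1):
    """Determinant value T_A of one user feature.

    subsets -- list of (non_churned, churned) user counts, one pair
               (s_ij, s_iq) per subset s_i induced by the feature's values
    p0, p1  -- frequencies of non-churned / churned users in the sample
    """
    total = sum(a + b for a, b in subsets)          # |s|
    t = -(p0 * math.log2(p0) + p1 * math.log2(p1))  # sample entropy
    for not_churned, churned in subsets:
        size = not_churned + churned                # |s_i|
        for count in (not_churned, churned):
            if count:  # treat 0 * log2(0) as 0
                t += (size / total) * (count / size) * math.log2(count / size)
    return t

# Example: a binary feature such as marry splits 2 million users
# into 2 subsets (counts here are made up for illustration).
print(determinant_value([(900_000, 300_000), (500_000, 300_000)],
                        p0=0.7, p1=0.3))
```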
Finally, according to the n determinant values of the feature combination, a binary selection tree model composed of the n user features in the combination is constructed. Specifically, the n determinant values are sorted from largest to smallest: the user feature with the largest determinant value sits at the topmost node, the one with the smallest at the bottommost node, and the features in between decrease from top to bottom. This generates a binary selection tree model ordered by determinant value, whose structure is shown schematically in fig. 2.
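Under the same assumptions, constructing one binary selection tree model then reduces to ordering the combination's features by descending determinant value. A sketch, reusing determinant_value from above and an assumed precomputed mapping subsets_by_feature:

```python
def build_selection_tree(combo, subsets_by_feature, p0, p1):
    """Return the combination's features ordered top node -> bottom node."""
    return sorted(combo,
                  key=lambda f: determinant_value(subsets_by_feature[f],
                                                  p0, p1),
                  reverse=True)  # largest determinant value at the top
```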
Each of the C(m, n) feature combinations is processed in this way, yielding C(m, n) binary selection tree models.
S104, inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy.
Not all of the C(m, n) generated binary selection tree models are highly accurate, so the accurate ones must be screened out. Accuracy here means the following: the user features of a number of users are input into a binary selection tree model, and for each user the model outputs whether that user churns; since each user's actual churn status is known in advance, the output for each user is compared with that status, and the number of users for whom the model's output matches reality (a non-churned user predicted as non-churned, or a churned user predicted as churned) is counted. The larger this count, the more accurate the binary selection tree model.
Therefore, a user information sample must be selected to validate the C(m, n) binary selection tree models. Specifically, for each of the C(m, n) binary selection tree models, the user features of every user in the user information sample are input into the model, and a churn result is output for each user; the output churn result is compared with each user's known churn status, and the number of users whose output matches their known status is counted for each binary selection tree model; these user counts are then ranked to obtain the X binary selection tree models with the highest accuracy.
In this embodiment of the invention, 1 million user information samples are selected and the 1500 most accurate binary selection tree models are kept; the invention does not limit these specific numbers.
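The validation step can be sketched as follows; the predict(user) method and the sample layout are assumed interfaces rather than anything specified by the patent, and x = 1500 mirrors the embodiment above.

```python
def rank_models(models, users, churned_labels, x=1500):
    """Keep the x trees whose outputs match the known labels most often.

    models         -- binary selection tree models with an assumed
                      predict(user) -> bool (True = churns) method
    users          -- validation users (e.g. 1 million feature records)
    churned_labels -- known churn status, aligned with `users`
    """
    def correct(model):
        return sum(model.predict(u) == y
                   for u, y in zip(users, churned_labels))
    return sorted(models, key=correct, reverse=True)[:x]
```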
S105, counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
Among the X most accurate binary selection tree models, each model contains n user features, and the n features of different models are not exactly the same. All user features across the X models are counted. For example, if one binary selection tree model consists of the 7 user features age, sex, marry, car, children, work_life and income, and another consists of the 7 user features marry, car, children, work_life, income, duration and reward, then the 4 features age, sex, duration and reward each have a count of 1, and the 5 features marry, car, children, work_life and income each have a count of 2.
The counts of all user features are sorted from largest to smallest, and the Y user features with the largest counts are selected as the user churn features, from which the main causes of the platform's user churn are analyzed.
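Counting features across the X retained models is a plain frequency count; a sketch, assuming each model exposes the features it was built from as model.features:

```python
from collections import Counter

def top_churn_features(top_models, y):
    """Return the y user features appearing most often in the top models."""
    counts = Counter(f for model in top_models for f in model.features)
    return [feature for feature, _ in counts.most_common(y)]
```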
Another embodiment of the present invention provides a user churn prediction method based on the user churn feature analysis.
Fig. 3 schematically shows a flow chart of a user churn prediction method according to an embodiment of the present invention.
As shown in fig. 3, the method comprises:
S201, acquiring user information of churned and non-churned users in the training sample, and extracting m user features from the user information.
S202, randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2.
The m acquired user features are decomposed and combined into feature combinations by permutation and combination: each time, n user features are selected to form one feature combination, where n ≥ m/2. The total number of feature combinations over the m user features is:

C(m, n) = m! / (n!·(m - n)!)
S203, for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values.
Specifically, for a user feature A in a feature combination, the training sample is divided into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A.
The number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A are then counted.
The determinant value of the user feature A is calculated by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users in the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample.
The determinant values of the other user features are calculated in the same way to obtain n determinant values. The n determinant values are ranked from largest to smallest: the user feature with the largest determinant value sits at the topmost node, the one with the smallest at the bottommost node, and the features in between decrease from top to bottom, generating a binary selection tree model ordered by determinant value.
S204, inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy.
Specifically, for each of the C(m, n) binary selection tree models, the user features of every user in the user information sample are input into the model, and a churn result is output for each user; the output churn result is compared with each user's known churn status, and the number of users whose output matches their known status is counted for each model; these user counts are then ranked to obtain the X binary selection tree models with the highest accuracy.
S205, inputting the user features of the user to be tested into each of the X binary selection tree models, each of which outputs whether the user churns. If, among the X binary selection tree models, the number of models indicating that the user will not churn is larger than the number indicating that the user will churn, the user to be tested is predicted not to churn; otherwise, the user to be tested is predicted to churn.
The user to be tested has more user features than any single binary selection tree model uses (m > n), so when the features are input into a model, only the user features contained in that model are input. Because the prediction of a single binary selection tree model is inaccurate, the features of the user to be tested are input into all X top-ranked binary selection tree models, and the outputs of all the models are combined into the final prediction, which improves prediction accuracy.
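Step S205's majority vote can be sketched as below, with the same assumed model interface as in the ranking sketch; a tie resolves toward predicting churn, matching the "otherwise" branch of the step.

```python
def predict_churn(user, top_models):
    """Predict churn for one user by majority vote of the X best trees.

    Each tree only reads the features it was built from, so the full
    m-feature `user` record can be passed to every model.
    """
    votes_churn = sum(1 for model in top_models if model.predict(user))
    votes_stay = len(top_models) - votes_churn
    return votes_stay <= votes_churn  # True = predicted to churn
```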
In summary, the embodiments of the invention provide a user churn feature analysis method and a user churn prediction method. User features extracted from platform user information are decomposed and combined into feature combinations; a binary selection tree model is built from the determinant values of the user features in each combination; and the models are used to analyze user churn features and to predict whether current platform users will churn in the future. The binary selection tree models can screen out the main factors behind user churn and quickly judge whether a user is at risk of churning, so the platform can take preventive action in advance and keep its daily active users stable.
Fig. 4 schematically shows a block diagram of an analysis apparatus 400 for user churn characteristics according to an embodiment of the present invention.
As shown in fig. 4, the apparatus 400 includes a feature obtaining module 410, a feature combination generating module 420, a model building module 430, a verifying module 440, and a statistical module 450.
The feature obtaining module 410 is configured to obtain user information of users who have been lost and users who have not been lost in the training sample, and extract m user features in the user information.
A feature combination generation module 420 for randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2.
A model building module 430 for calculating, for each of the C(m, n) feature combinations, the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values.
A verification module 440 for inputting user information samples into the C(m, n) binary selection tree models respectively and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy.
And a statistics module 450 for counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
It should be understood that the feature obtaining module 410, the feature combination generating module 420, the model building module 430, the verifying module 440, and the statistics module 450 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present invention, at least one of the feature obtaining module 410, the feature combination generating module 420, the model building module 430, the verifying module 440, and the statistics module 450 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or any other reasonable manner of integrating or packaging a circuit, as hardware or firmware, or as a suitable combination of software, hardware, and firmware implementations. Alternatively, at least one of the feature obtaining module 410, the feature combination generating module 420, the model building module 430, the verifying module 440, and the counting module 450 may be at least partially implemented as a computer program module that, when executed by a computer, may perform the functions of the respective modules.
The present invention provides an electronic device, as shown in fig. 5, the electronic device 500 includes a processor 510 and a memory 520. The electronic device 500 may perform a method according to the embodiment of the invention shown in fig. 1 and 3.
In particular, processor 510 may include, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 510 may also include on-board memory for caching purposes. Processor 510 may be a single processing unit or a plurality of processing units for performing different actions of a method flow according to embodiments of the disclosure.
The memory 520, for example, can be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, a readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the readable storage medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The memory 520 may include a computer program 521, which computer program 521 may include code/computer-executable instructions that, when executed by the processor 510, cause the processor 510 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 521 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in computer program 521 may include at least one program module, for example modules 521A, 521B, and so on. It should be noted that the division and number of modules are not fixed; those skilled in the art may use suitable program modules or combinations of modules according to the actual situation, and when these program modules are executed by the processor 510, the processor 510 may perform the method according to the embodiments of the present disclosure or any variation thereof.
The present disclosure also provides a computer-readable medium, which may be embodied in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer readable medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, a computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, optical fiber cable, radio frequency signals, etc., or any suitable combination of the foregoing.
While the present disclosure has been shown and described with reference to certain exemplary embodiments thereof, various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents. Accordingly, the scope of the present disclosure should not be limited to the above-described embodiments, but should be defined not only by the appended claims, but also by equivalents thereof.

Claims (8)

1. A method for analyzing user churn characteristics is characterized by comprising the following steps:
acquiring user information of churned and non-churned users in a training sample, and extracting m user features from the user information;
randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2;
for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; wherein calculating the determinant value of each user feature in the feature combination to obtain n determinant values comprises:
for a user feature A in a feature combination, dividing the training sample into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A;

counting the number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A;

calculating the determinant value of the user feature A by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users of the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample;

and calculating the determinant values of the other user features in the same way to obtain the n determinant values;
inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy;
and counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
2. The method according to claim 1, wherein in the binary selection tree model, the user feature with the largest determinant value is located at the topmost node of the binary selection tree model, the user feature with the smallest determinant value is located at the bottommost node of the binary selection tree model, and the other user features are arranged at the nodes between the topmost node and the bottommost node in sequence from top to bottom according to the descending order of their determinant values.
3. The method of claim 1, wherein inputting user information samples into the C(m, n) binary selection tree models respectively and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy comprises:
for each of the C(m, n) binary selection tree models, inputting the user features of every user in the user information sample into the binary selection tree model, and outputting a churn result for the corresponding user;
comparing the output churn result of each user with the user's known churn status, and counting, for each binary selection tree model, the number of users whose output result matches their known status;
and ranking these user counts to obtain the X binary selection tree models with the highest accuracy.
4. The method of claim 1, wherein the m user features comprise at least one of: age, sex, marital status, car ownership, whether the user has children, working age, annual income, number of days since becoming a member, gift consumption within a certain period, recharge amount within a certain period, number of followed anchors, number of days the followed anchors broadcast within a certain period, number of barrages (bullet comments) sent within a certain period, and whether the user has churned.
5. A method for predicting user churn, comprising:
acquiring user information of churned and non-churned users in a training sample, and extracting m user features from the user information;
randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2;
for each of the C(m, n) feature combinations, calculating the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; wherein calculating the determinant value of each user feature in the feature combination to obtain n determinant values comprises:
for a user feature A in a feature combination, dividing the training sample into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A;

counting the number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A;

calculating the determinant value of the user feature A by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users of the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample;

and calculating the determinant values of the other user features in the same way to obtain the n determinant values;
inputting user information samples into the C(m, n) binary selection tree models respectively, and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy;
and inputting the user features of the user to be tested into each of the X binary selection tree models, each of which outputs whether the user churns; if, among the X binary selection tree models, the number of models indicating that the user will not churn is larger than the number indicating that the user will churn, the user to be tested is predicted not to churn; otherwise the user to be tested is predicted to churn.
6. An apparatus for analyzing user churn characteristics, the apparatus comprising:
a feature acquisition module for acquiring user information of churned and non-churned users in a training sample and extracting m user features from the user information;
a feature combination generation module for randomly selecting n user features from the m user features to obtain C(m, n) feature combinations, where n ≥ m/2;
a model building module for calculating, for each of the C(m, n) feature combinations, the determinant value of each user feature in the combination to obtain n determinant values, and constructing the n user features into a binary selection tree model according to the n determinant values; wherein calculating the determinant value of each user feature in the feature combination to obtain n determinant values comprises:
for a user feature A in a feature combination, dividing the training sample into k different subsets {s_1, s_2, s_3, ..., s_i, ..., s_k} according to the k different values of the user feature A;

counting the number of users in each subset and the numbers of churned and non-churned users corresponding to the different values of the user feature A;

calculating the determinant value of the user feature A by

T_A = -(p_0·log2 p_0 + p_1·log2 p_1) + Σ_{i=1}^{k} (|s_i|/|s|)·[(s_ij/|s_i|)·log2(s_ij/|s_i|) + (s_iq/|s_i|)·log2(s_iq/|s_i|)]

where T_A is the determinant value of the user feature A, |s| is the total number of users of the training sample, |s_i| is the number of users in the i-th subset, s_ij is the number of non-churned users in the i-th subset, s_iq is the number of churned users in the i-th subset, p_0 is the frequency of non-churned users in the training sample, and p_1 is the frequency of churned users in the training sample;

and calculating the determinant values of the other user features in the same way to obtain the n determinant values;
a verification module for inputting user information samples into the C(m, n) binary selection tree models respectively and obtaining the X binary selection tree models ranking highest in churn-prediction accuracy;
and a statistics module for counting the user features in the X binary selection tree models to obtain the Y most frequent user features as the user churn features.
7. An electronic device, comprising:
a processor;
a memory storing a computer executable program which, when executed by the processor, causes the processor to perform the method of any one of claims 1-5.
8. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201910018617.9A 2019-01-08 2019-01-08 User loss characteristic analysis method, prediction method, device, equipment and medium Active CN109740685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910018617.9A CN109740685B (en) 2019-01-08 2019-01-08 User loss characteristic analysis method, prediction method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910018617.9A CN109740685B (en) 2019-01-08 2019-01-08 User loss characteristic analysis method, prediction method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN109740685A CN109740685A (en) 2019-05-10
CN109740685B true CN109740685B (en) 2020-10-27

Family

ID=66364038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910018617.9A Active CN109740685B (en) 2019-01-08 2019-01-08 User loss characteristic analysis method, prediction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN109740685B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970864A (en) * 2014-05-08 2014-08-06 清华大学 Emotion classification and emotion component analyzing method and system based on microblog texts
CN104813353A (en) * 2012-10-30 2015-07-29 阿尔卡特朗讯 System and method for generating subscriber churn predictions
CN107332694A (en) * 2017-06-14 2017-11-07 深圳市易信成科技股份有限公司 A kind of method and system that user's telephone number is excavated based on fixed network big data
CN108876034A (en) * 2018-06-13 2018-11-23 重庆邮电大学 A kind of improved Lasso+RBF neural network ensemble prediction model
CN108960262A (en) * 2017-05-19 2018-12-07 意礴科技有限公司 A kind of methods, devices and systems and computer readable storage medium for predicting shoes code
CN109034894A (en) * 2018-07-20 2018-12-18 武汉斗鱼网络科技有限公司 Advertisement page pageview statistical method, device, electronic equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8065214B2 (en) * 2005-09-06 2011-11-22 Ge Corporate Financial Services, Inc. Methods and system for assessing loss severity for commercial loans
CN106203679A (en) * 2016-06-27 2016-12-07 武汉斗鱼网络科技有限公司 A kind of customer loss Forecasting Methodology and system
CN106250403A (en) * 2016-07-19 2016-12-21 北京奇艺世纪科技有限公司 Customer loss Forecasting Methodology and device
US10394871B2 (en) * 2016-10-18 2019-08-27 Hartford Fire Insurance Company System to predict future performance characteristic for an electronic record
CN109034853B (en) * 2017-06-09 2021-11-26 北京京东尚科信息技术有限公司 Method, device, medium and electronic equipment for searching similar users based on seed users
CN107832581B (en) * 2017-12-15 2022-02-18 百度在线网络技术(北京)有限公司 State prediction method and device
CN108549594B (en) * 2018-03-30 2021-07-23 武汉斗鱼网络科技有限公司 Method and device for determining user loss reason
CN108665321A (en) * 2018-05-18 2018-10-16 广州虎牙信息科技有限公司 High viscosity customer loss prediction technique, device and computer readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104813353A (en) * 2012-10-30 2015-07-29 阿尔卡特朗讯 System and method for generating subscriber churn predictions
CN103970864A (en) * 2014-05-08 2014-08-06 清华大学 Emotion classification and emotion component analyzing method and system based on microblog texts
CN108960262A (en) * 2017-05-19 2018-12-07 意礴科技有限公司 A kind of methods, devices and systems and computer readable storage medium for predicting shoes code
CN107332694A (en) * 2017-06-14 2017-11-07 深圳市易信成科技股份有限公司 A kind of method and system that user's telephone number is excavated based on fixed network big data
CN108876034A (en) * 2018-06-13 2018-11-23 重庆邮电大学 A kind of improved Lasso+RBF neural network ensemble prediction model
CN109034894A (en) * 2018-07-20 2018-12-18 武汉斗鱼网络科技有限公司 Advertisement page pageview statistical method, device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Filter Model Research of Characteristic Value of Typical Construction Engineering Based on T Test and Decision Tree Method; Shasha Xie et al.; International Conference on Construction & Real Estate Management; 2017-11-09; pp. 345-354 *
Research on Churn Prediction of Online Shopping Users Based on Data Mining; Guo Chengxi; China Master's Theses Full-text Database, Economics and Management Science; 2016-12-15 (No. 12); J157-23 *

Also Published As

Publication number Publication date
CN109740685A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
Mason et al. A guide for using functional diversity indices to reveal changes in assembly processes along ecological gradients
US20220277404A1 (en) Pattern Identification in Time-Series Social Media Data, and Output-Dynamics Engineering for a Dynamic System Having One or More Multi-Scale Time-Series Data Sets
Colizza et al. The role of the airline transportation network in the prediction and predictability of global epidemics
Peiró-Velert et al. Screen media usage, sleep time and academic performance in adolescents: clustering a self-organizing maps analysis
Rombach et al. Core-periphery structure in networks
CN102331992A (en) Distributed decision tree training
JP2016524259A (en) Dynamic research panel
CN108648000B (en) Method and device for evaluating user retention life cycle and electronic equipment
CN109118119A (en) Air control model generating method and device
US9639455B2 (en) Autonomous media version testing
CN105869022B (en) Application popularity prediction method and device
Martín et al. A methodology to study noise annoyance and to perform Action Plans follow up using as input an existing survey and noise map: Application to the city of Málaga (Spain)
CN109726826B (en) Training method and device for random forest, storage medium and electronic equipment
CN106611021B (en) Data processing method and equipment
US20130282445A1 (en) Method or system to evaluate strategy decisions
CN109740685B (en) User loss characteristic analysis method, prediction method, device, equipment and medium
CN111611781B (en) Data labeling method, question answering device and electronic equipment
Mafteiu-Scai A new approach for solving equations systems inspired from brainstorming
Sanli et al. Temporal pattern of online communication spike trains in spreading a scientific rumor: how often, who interacts with whom?
Teusner et al. Taking informed action on student activity in MOOCs
Eyvindson et al. Using a compromise programming framework to integrating spatially specific preference information for forest management problems
Chen et al. Developing Taiwan into the tourist transport centre of East Asia
Glonek et al. Semi-supervised graph labelling reveals increasing partisanship in the United States Congress
CN110263029A (en) Method, apparatus, terminal and the medium of database generation test data
Nikolić et al. Building an ensemble from a single Naive Bayes classifier in the analysis of key risk factors for Polish State Fire Service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant