CN109767045A - Method, apparatus, computing device, and medium for predicting churned users - Google Patents

Info

Publication number
CN109767045A
Authority
CN
China
Prior art keywords
user
feature
event
probability
lost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910045620.XA
Other languages
Chinese (zh)
Inventor
张小艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tengyun World Technology Co Ltd
Original Assignee
Beijing Tengyun World Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tengyun World Technology Co Ltd
Priority to CN201910045620.XA
Publication of CN109767045A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method, apparatus, computing device, and medium for predicting churned users. The method comprises: obtaining a user data set that includes multiple pieces of user data; inputting each piece of user data into a churn prediction model for processing, to predict the probability that the user corresponding to that user data will churn; and determining churned users according to the churn probability corresponding to each piece of user data.

Description

Method, apparatus, computing device, and medium for predicting churned users
Technical field
The present invention relates to the field of Internet and big data processing technology, and in particular to a method, apparatus, computing device, and medium for predicting churned users.
Background
With the continuous development of Internet and hardware technology, more and more people have begun to use mobile terminal devices such as smartphones and tablet computers. Meanwhile, the widespread adoption of the mobile Internet has driven even faster development of mobile applications, and users carry out activities such as reading, chatting, and shopping through the various mobile applications installed on their mobile terminals.
For an application's developer or provider, given the cost of acquiring new users, retaining existing users is crucial for getting the most out of customer acquisition cost and customer lifetime value. Typically, the likelihood of user churn can be predicted by analyzing in-application events, and the right users can be proactively engaged according to the predicted likelihood, in order to reduce the churn rate.
However, current churn prediction relies on only a dozen or so features, such as application opens, device platform, message delivery, and application version. The feature types are not rich enough, the accuracy of the prediction results needs improvement, and the features most strongly correlated with churn cannot be provided, which makes later churn analysis inconvenient for developers and operations staff. A new method for predicting churned users is therefore needed to improve on the above process.
Summary of the invention
To this end, the present invention provides a scheme for predicting churned users, in an effort to solve or at least alleviate the problems above.
According to one aspect of the present invention, a method for predicting churned users is provided, comprising the following steps: first, obtaining a user data set that includes multiple pieces of user data; inputting each piece of user data into a churn prediction model for processing, to predict the probability that the user corresponding to that user data will churn; and determining churned users according to the churn probability corresponding to each piece of user data.
Optionally, in the method for predicting churned users according to the present invention, the user data includes a user identifier, behavior features, and attribute features.
Optionally, in the method for predicting churned users according to the present invention, obtaining the user data set comprises: extracting user identifiers and application data from an application log, the application data including event information, application information, and device information; determining the behavior features and attribute features corresponding to each user identifier according to the application data; associating the user identifier, behavior features, and attribute features with one another to generate user data; and collecting the pieces of user data to form the user data set.
Optionally, in the method for predicting churned users according to the present invention, the event information includes an event name, an event occurrence time, and event time information, and the application information includes application time information, an application version, and application location information.
Optionally, in the method for predicting churned users according to the present invention, determining the behavior features and attribute features corresponding to each user identifier according to the application data comprises: encoding the event occurrence time, application version, application location information, and device information respectively by one-hot encoding, to generate a corresponding first event time feature, application version feature, location feature, and device feature; and taking the first event time feature, application version feature, location feature, and device feature as the attribute features corresponding to the user identifier.
Optionally, in the method for predicting churned users according to the present invention, determining the behavior features and attribute features corresponding to each user identifier according to the application data comprises: determining, according to the user identifier and event name, an event occurrence count feature for the corresponding event; converting the event time information and application time information respectively, to generate a corresponding second event time feature and application time feature; determining, based on the first event time feature, an active-days feature for the user corresponding to the user identifier; and taking the event occurrence count feature, second event time feature, application time feature, and active-days feature as the behavior features corresponding to the user identifier.
Optionally, in the method for predicting churned users according to the present invention, determining churned users according to the churn probability corresponding to each piece of user data comprises: if the churn probability corresponding to the user data is lower than a first probability threshold, determining the user to be a low-probability churned user; if the churn probability is not lower than the first probability threshold and not higher than a second probability threshold, determining the user to be a medium-probability churned user; and if the churn probability is higher than the second probability threshold, determining the user to be a high-probability churned user.
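The three-tier rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `churn_tier` and the threshold values 0.3 and 0.7 are assumptions, since the text does not give concrete thresholds.

```python
def churn_tier(probability, first_threshold=0.3, second_threshold=0.7):
    """Map a predicted churn probability to a tier.

    The default thresholds are hypothetical; the source only requires that
    a first and a second probability threshold exist.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability < first_threshold:
        return "low"      # lower than the first threshold
    if probability <= second_threshold:
        return "medium"   # not lower than the first, not higher than the second
    return "high"         # higher than the second threshold


if __name__ == "__main__":
    for p in (0.10, 0.30, 0.70, 0.95):
        print(p, churn_tier(p))
```

Both boundary values fall into the medium tier, which matches the "not lower than" and "not higher than" wording of the rule.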
Optionally, in the method for predicting churned users according to the present invention, the churn prediction model includes a logistic regression model.
Optionally, the method for predicting churned users according to the present invention further comprises determining churn-related behavior features, which comprises: determining an inflection-point value corresponding to each behavior feature; calculating a standard score for the behavior feature according to the inflection-point value; ranking the behavior features based on the standard scores; and taking the top first number of behavior features as the churn-related behavior features.
Optionally, the method for predicting churned users according to the present invention further comprises determining churn-related attribute features, which comprises: for each attribute feature, calculating the difference between the churn probability of the users corresponding to the attribute feature and the average user churn probability; calculating the ratio of the number of users corresponding to the attribute feature to the total number of users; ranking the attribute features based on the difference and the ratio; and taking the top second number of attribute features as the churn-related attribute features.
According to another aspect of the present invention, an apparatus for predicting churned users is provided, comprising an obtaining module, a prediction module, and a determining module. The obtaining module is adapted to obtain a user data set that includes multiple pieces of user data; the prediction module is adapted to input each piece of user data into a churn prediction model for processing, to predict the probability that the user corresponding to that user data will churn; and the determining module is adapted to determine churned users according to the churn probability corresponding to each piece of user data.
Optionally, in the apparatus for predicting churned users according to the present invention, the user data includes a user identifier, behavior features, and attribute features.
Optionally, in the apparatus for predicting churned users according to the present invention, the obtaining module is further adapted to: extract user identifiers and application data from an application log, the application data including event information, application information, and device information; determine the behavior features and attribute features corresponding to each user identifier according to the application data; associate the user identifier, behavior features, and attribute features with one another to generate user data; and collect the pieces of user data to form the user data set.
Optionally, in the apparatus for predicting churned users according to the present invention, the event information includes an event name, an event occurrence time, and event time information, and the application information includes application time information, an application version, and application location information.
Optionally, in the apparatus for predicting churned users according to the present invention, the obtaining module is further adapted to: encode the event occurrence time, application version, application location information, and device information respectively by one-hot encoding, to generate a corresponding first event time feature, application version feature, location feature, and device feature; and take the first event time feature, application version feature, location feature, and device feature as the attribute features corresponding to the user identifier.
Optionally, in the apparatus for predicting churned users according to the present invention, the obtaining module is further adapted to: determine, according to the user identifier and event name, an event occurrence count feature for the corresponding event; convert the event time information and application time information respectively, to generate a corresponding second event time feature and application time feature; determine, based on the first event time feature, an active-days feature for the user corresponding to the user identifier; and take the event occurrence count feature, second event time feature, application time feature, and active-days feature as the behavior features corresponding to the user identifier.
Optionally, in the apparatus for predicting churned users according to the present invention, the determining module is further adapted to: determine the user to be a low-probability churned user when the churn probability corresponding to the user data is lower than a first probability threshold; determine the user to be a medium-probability churned user when the churn probability is not lower than the first probability threshold and not higher than a second probability threshold; and determine the user to be a high-probability churned user when the churn probability is higher than the second probability threshold.
Optionally, in the apparatus for predicting churned users according to the present invention, the churn prediction model includes a logistic regression model.
Optionally, in the apparatus for predicting churned users according to the present invention, the determining module is further adapted to determine churn-related behavior features, and is further adapted to: determine an inflection-point value corresponding to each behavior feature; calculate a standard score for the behavior feature according to the inflection-point value; rank the behavior features based on the standard scores; and take the top first number of behavior features as the churn-related behavior features.
Optionally, in the apparatus for predicting churned users according to the present invention, the determining module is further adapted to determine churn-related attribute features, and is further adapted to: for each attribute feature, calculate the difference between the churn probability of the users corresponding to the attribute feature and the average user churn probability; calculate the ratio of the number of users corresponding to the attribute feature to the total number of users; rank the attribute features based on the difference and the ratio; and take the top second number of attribute features as the churn-related attribute features.
According to another aspect of the present invention, a computing device is provided, comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for executing the method for predicting churned users according to the present invention.
According to another aspect of the present invention, a computer-readable storage medium storing one or more programs is also provided. The one or more programs include instructions that, when executed by a computing device, cause the computing device to execute the method for predicting churned users according to the present invention.
The scheme for predicting churned users according to the present invention performs prediction on user data based on a churn prediction model to obtain the probability that the user corresponding to the user data will churn, and grades users according to that probability to determine the users at each churn level, giving a macroscopic understanding of user churn. The user data characterizes each user through behavior features and attribute features: behavior features include the features corresponding to behavior events the user generates within the application, and attribute features include the features corresponding to the attributes of the user's related events within the application. Diverse features help in understanding the prediction model and improve the accuracy of the prediction results. In addition, by combing through all of a user's behavior features and attribute features and determining those significantly correlated with churn, the user distribution and churn-related features can be understood more intuitively, which facilitates subsequent analysis and timely communication with the right users.
Brief description of the drawings
To accomplish the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and drawings. These aspects indicate the various ways in which the principles disclosed herein can be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout the disclosure, the same reference numerals generally refer to the same parts or elements.
Fig. 1 shows a structural block diagram of a computing device 100 according to an embodiment of the present invention;
Fig. 2 shows a flowchart of a method 200 for predicting churned users according to an embodiment of the present invention; and
Fig. 3 shows a schematic diagram of an apparatus 300 for predicting churned users according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Fig. 1 shows a structural block diagram of a computing device 100 according to an embodiment of the present invention. In a basic configuration 102, the computing device 100 typically includes a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processors 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-1 cache 110 and a level-2 cache 112, a processor core 114, and registers 116. An example processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be any type of memory, including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM or flash memory), or any combination thereof. The system memory 106 may include an operating system 120, one or more programs 122, and program data 124. In some embodiments, the programs 122 may be arranged so that their instructions are executed on the operating system by the one or more processors 104 using the program data 124.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Example output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to facilitate communication with various external devices such as a display or speakers via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication, via one or more I/O ports 158, with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device, or touch input device) or other peripherals (for example, a printer or scanner). An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
The network communication link may be one example of a communication medium. A communication medium may typically be embodied as computer-readable instructions, data structures, or program modules in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery medium. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. As a non-limiting example, communication media may include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer-readable medium as used herein may include both storage media and communication media.
The computing device 100 may be implemented as a server, such as a file server, database server, application server, or web server, or as part of a small-form-factor portable (or mobile) electronic device, such as a cellular phone, personal digital assistant (PDA), personal media player, wireless web-browsing device, personal headset device, application-specific device, or a hybrid device that includes any of the above functions. The computing device 100 may also be implemented as a personal computer, including desktop and notebook configurations.
In some embodiments, the computing device 100 is implemented as a server and configured to execute the method 200 for predicting churned users according to the present invention. The one or more programs 122 of the computing device 100 include instructions for executing the method 200 for predicting churned users according to the present invention.
Fig. 2 shows a flowchart of the method 200 for predicting churned users according to an embodiment of the present invention. As shown in Fig. 2, the method 200 starts at step S210. In step S210, a user data set is obtained, the user data set including multiple pieces of user data. The user data includes a user identifier, behavior features, and attribute features.
According to an embodiment of the present invention, the user data set can be obtained as follows. First, user identifiers and application data are extracted from an application log, the application data including event information, application information, and device information.
In this embodiment, the event information includes an event name, an event occurrence time, and event time information; the application information includes application time information, an application version, and application location information; and the device information includes a device brand, device model, language environment, time zone information, region information, and system version. The event time information includes a session duration and an event sending time interval, the latter being the number of days between the date on which the event was sent and a preset date. The application time information includes an application install time and an application update time, the latter being the time the application was most recently updated. The application location information is the geographic location of the user when using the application and is normally stored down to the county (district) level, while the time zone information and region information are accurate to the city level and the country level, respectively.
For example, the application log is the raw log of a certain application from April 1, 2018 to April 14, 2018, containing 14 days of log data. The first 7 days of log data are used to extract the application data, from which the behavior features and attribute features are subsequently determined. The last 7 days of log data are used to judge whether a user has actually churned, in order to assess whether the prediction results are accurate: for each user identifier that appears in the first 7 days, if no event for that user identifier is received in the last 7 days, the corresponding user is labeled as churned; otherwise the user is labeled as not churned. From the first 7 days of log data, 62614 user identifiers were extracted together with the application data associated with each of them, and 35026 users were labeled as churned users.
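The 7-day/7-day labeling rule in this example can be sketched as follows. This is a minimal sketch, not the patent's implementation: the function name `label_churn`, the `(user_id, timestamp)` input shape, and the convention of 1 for churned and 0 for not churned are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def label_churn(events, window_start, observe_days=7, label_days=7):
    """Label each user seen in the observation window.

    events: iterable of (user_id, datetime) pairs (hypothetical input shape).
    Returns {user_id: 1 if churned else 0} for users with at least one event
    in the first `observe_days` days; a user is churned when no event from
    that user falls in the following `label_days` days.
    """
    observe_end = window_start + timedelta(days=observe_days)
    label_end = observe_end + timedelta(days=label_days)
    seen_early, seen_late = set(), set()
    for user_id, ts in events:
        if window_start <= ts < observe_end:
            seen_early.add(user_id)
        elif observe_end <= ts < label_end:
            seen_late.add(user_id)
    # Users who only appear in the labeling window are not part of the sample.
    return {u: 0 if u in seen_late else 1 for u in seen_early}


if __name__ == "__main__":
    log = [
        ("u1", datetime(2018, 4, 2, 10, 0)),
        ("u1", datetime(2018, 4, 9, 9, 0)),
        ("u2", datetime(2018, 4, 7, 23, 59)),
    ]
    print(label_churn(log, datetime(2018, 4, 1)))
```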
The following is an example of the content of a user identifier and its application data:
User identifier:
  id, 3b824f95edb040755767fb10aabb1c60b
Event information:
  event, app_init_first_False
  date, 2018/3/31 23:56:27
  sessionDuration, NaN
  recency, 8
Application information:
  installTime, 2017/12/12 22:25:50
  updateTime, 2018/3/16 8:15:27
  appVersion, 3.8.3
  city, People's Republic of China-Sichuan Province-Meishan City-Renshou County
Device information:
  brand, viviopo
  model, viviopo R11
  language, zh
  timeZone, Asia/Shanghai
  locale, zh_CN
  osVersion, Android+5.1.1
As shown above, the user identifier (id) is "3b824f95edb040755767fb10aabb1c60b". The event name (event) is "app_init_first_False", indicating that the event is an application-open event and that this is not the first time the application was opened. The event occurrence time (date) is "2018/3/31 23:56:27", indicating that the event occurred at 23:56:27 on March 31, 2018. The session duration (sessionDuration) is "NaN". The event sending time interval (recency) is 8, indicating that the date on which the event was sent is 8 days from the preset date, which here is April 10, 2018.
The application install time (installTime) is "2017/12/12 22:25:50", indicating that the application was installed at 22:25:50 on December 12, 2017. The application update time (updateTime) is "2018/3/16 8:15:27", indicating that the application was last updated at 8:15:27 on March 16, 2018. The application version (appVersion) is 3.8.3, and the application location information (city) is Renshou County, Meishan City, Sichuan Province, People's Republic of China.
The device brand (brand) is viviopo, indicating that the device on which the user installed the application is of the viviopo brand. The device model (model) is viviopo R11, indicating that the device is a viviopo R11. The language environment (language) is "zh", indicating that the device is currently set to display in Chinese. The time zone information (timeZone) is "Asia/Shanghai", indicating that the current time zone is Asia/Shanghai. The region information (locale) is "zh_CN", indicating that the current region is China. The system version (osVersion) is "Android+5.1.1", indicating that the device's current operating system is Android version 5.1.1.
Next, the behavior features and attribute features corresponding to each user identifier are determined according to the application data. According to an embodiment of the present invention, they can be determined as follows. First, the event occurrence time, application version, application location information, and device information are each encoded by one-hot encoding, to generate a corresponding first event time feature, application version feature, location feature, and device feature.
In this embodiment, the date corresponding to the event occurrence time is one-hot encoded to generate the corresponding first event time feature. The device brand, device model, language environment, time zone information, region information, and system version in the device information are likewise each one-hot encoded, to form a corresponding brand feature, model feature, language feature, time zone feature, region feature, and system version feature, and these together serve as the device feature.
Then, the first event time feature, application version feature, location feature, and device feature are taken as the attribute features corresponding to the user identifier. Because they are generated by one-hot encoding, attribute features take only the values 1 and 0: a value of "1" means the user has the characteristic represented by the feature, and a value of "0" means the user does not.
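The one-hot encoding of attribute fields can be sketched as below. This is a minimal sketch under stated assumptions: the helper name `one_hot_encode` and the `field=value` column naming are illustrative choices, and the field names are borrowed from the sample record above; the source does not describe how columns are named or ordered.

```python
def one_hot_encode(records, fields):
    """One-hot encode the given categorical fields.

    records: list of dicts, one per user (hypothetical input shape).
    Returns (feature_names, rows), where each row is a list of 0/1 values
    aligned with feature_names.
    """
    # Build a stable vocabulary of "field=value" columns.
    names = sorted({f"{f}={r[f]}" for r in records for f in fields if f in r})
    index = {name: i for i, name in enumerate(names)}
    rows = []
    for r in records:
        row = [0] * len(names)
        for f in fields:
            if f in r:
                row[index[f"{f}={r[f]}"]] = 1  # 1 = the user has this characteristic
        rows.append(row)
    return names, rows


if __name__ == "__main__":
    users = [
        {"appVersion": "3.8.3", "language": "zh"},
        {"appVersion": "3.8.2", "language": "zh"},
    ]
    names, rows = one_hot_encode(users, ["appVersion", "language"])
    print(names)
    print(rows)
```

In practice a library encoder would usually be preferable; the hand-written version is shown only to make the 1/0 semantics explicit.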
The processing above is the extraction of attribute features; behavior features can be determined as follows. First, the event occurrence count feature of each event is determined according to the user identifier and event name. The event time information and application time information are then each converted, to generate a corresponding second event time feature and application time feature. Based on the first event time feature, the active-days feature of the user corresponding to the user identifier is determined. The event occurrence count feature, second event time feature, application time feature, and active-days feature are taken as the behavior features corresponding to the user identifier.
In general, counting the number of times the same event name appears for a user identifier gives the occurrence count of the corresponding event, and the event name associated with its occurrence count serves as the event occurrence count feature. The event time information includes the session duration and the event sending time interval. If a session duration is "NaN", "NaN" is replaced with "0", and the session duration values are then accumulated to give the session time feature. Abnormal values in the event sending time interval are replaced with 20, and the minimum of the event sending time intervals is then taken as the sending time interval feature. The session time feature and the sending time interval feature together serve as the second event time feature.
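The per-user aggregation just described could look roughly like the sketch below. Several points are assumptions rather than facts from the source: the function name `behavior_features`, the output keys, the event name "page_view" used in the example, and in particular the definition of an "abnormal" sending interval, which the text does not give and which is taken here to mean missing, NaN, or negative.

```python
import math
from collections import Counter


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def behavior_features(events, abnormal_recency_fill=20):
    """Aggregate one user's events into behavior features.

    events: list of dicts with keys 'event', 'sessionDuration', 'recency'
    (names follow the sample log record).
    """
    # Event occurrence count feature: occurrences per event name.
    counts = Counter(e["event"] for e in events)

    session_time = 0.0
    recencies = []
    for e in events:
        duration = e.get("sessionDuration")
        if duration is None or _is_nan(duration):
            duration = 0.0  # "NaN" is replaced with 0 before accumulating
        session_time += duration

        recency = e.get("recency")
        # Assumption: missing, NaN, or negative intervals count as abnormal.
        if recency is None or _is_nan(recency) or recency < 0:
            recency = abnormal_recency_fill
        recencies.append(recency)

    return {
        "event_counts": dict(counts),
        "session_time": session_time,
        "min_recency": min(recencies) if recencies else abnormal_recency_fill,
    }


if __name__ == "__main__":
    sample = [
        {"event": "app_init_first_False", "sessionDuration": float("nan"), "recency": 8},
        {"event": "page_view", "sessionDuration": 7.5, "recency": 3},
    ]
    print(behavior_features(sample))
```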
The application time information includes the application install time and the application update time. The dates corresponding to the install time and update time are converted to the number of days from the preset date; the number of days for the install time serves as the install time feature, the number of days for the update time serves as the update time feature, and the two together serve as the application time feature. For a given user identifier, the values of its first event time features are accumulated to give the active-days feature of the corresponding user.
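The date-to-days conversion and the active-days count can be sketched as follows. The function names are hypothetical, the timestamp format is taken from the sample record, and counting distinct event dates is an assumed reading of "accumulating the first event time features" (one 0/1 feature per date, summed per user).

```python
from datetime import date, datetime

TIME_FORMAT = "%Y/%m/%d %H:%M:%S"  # format used in the sample log record


def days_before(timestamp_text, preset_date):
    """Number of days from the timestamp's date to the preset date."""
    moment = datetime.strptime(timestamp_text, TIME_FORMAT)
    return (preset_date - moment.date()).days


def active_days(event_time_texts):
    """Number of distinct calendar dates on which the user produced events."""
    return len({datetime.strptime(t, TIME_FORMAT).date() for t in event_time_texts})


if __name__ == "__main__":
    preset = date(2018, 4, 10)  # preset date from the example above
    print(days_before("2017/12/12 22:25:50", preset))  # install time feature
    print(days_before("2018/3/16 8:15:27", preset))    # update time feature
    print(active_days(["2018/4/1 9:00:00", "2018/4/1 21:30:00", "2018/4/3 8:00:00"]))
```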
After the behavior features and attribute features have been determined, the user identifier, behavior features and attribute features are associated with one another to generate user data. According to one embodiment of the present invention, 62614 pieces of user data are generated in total for the 62614 extracted user identifiers. Finally, the pieces of user data are collected to form the user data set.
Next, in step S220, each piece of user data is input into the churn prediction model for processing, to predict the probability that the user corresponding to the user data will churn. According to one embodiment of the present invention, the churn prediction model includes a logistic regression model.
The logistic regression model, also known as the LR (Logistic Regression) model, can be used for regression analysis, prediction, classification and the like. Before the logistic regression model is applied, it must be trained in advance. The training data are generated in the same way as the user data above, except that a label must also be added to each piece of user data to form complete training data: if the user corresponding to the user data is a churned user, a label with value 1 is added to the user data; otherwise a label with value 0 is added. Since the logistic regression model is a mature existing technique, its construction and training process can be readily conceived by those skilled in the art who understand the solution of the present invention, also fall within the protection scope of the present invention, and are not repeated here. In this embodiment, after a piece of user data is input into the churn prediction model, the model outputs the probability that the user corresponding to the user data will churn.
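Since the text leaves construction and training to the reader, the following is one possible sketch, not the training procedure of the embodiment. It fits a logistic regression by plain batch gradient descent using only the standard library; the learning rate, epoch count and the single normalized toy feature are all assumptions made for illustration. Labels follow the text: 1 for a churned user, 0 otherwise.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Churn probability for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Toy data (invented): one feature, active days scaled to [0, 1];
# users with few active days are labelled 1 (churned).
rows = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic_regression(rows, labels)
```

After training, `predict_proba(weights, bias, [0.0])` is above 0.5 and `predict_proba(weights, bias, [1.0])` is below it. A production system would more likely rely on an established library implementation.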
In addition, the churn prediction model is not limited to the logistic regression model described above; it can also be implemented by algorithms such as a support vector machine (Support Vector Machine, SVM) model. The present invention does not limit which algorithm or model is used to build the churn prediction model. The choice can be made according to the actual application scenario, network training conditions, system configuration, performance requirements and the like, with the model construction process and related parameters adjusted appropriately for the chosen approach. These can be readily conceived by those skilled in the art who understand the solution of the present invention, also fall within the protection scope of the present invention, and are not repeated here.
Finally, in step S230, churned users are determined according to the churn probability corresponding to each piece of user data. According to one embodiment of the present invention, this can be done as follows. If the churn probability corresponding to the user data is lower than the first probability threshold, the user is determined to be a low-probability churned user. If the churn probability is not lower than the first probability threshold and not higher than the second probability threshold, the user is determined to be a medium-probability churned user. If the churn probability is higher than the second probability threshold, the user is determined to be a high-probability churned user.
In this embodiment, the first probability threshold is 0.2 and the second probability threshold is 0.8. The result is 21151 low-probability churned users, 41365 medium-probability churned users and 98 high-probability churned users. Among the 62614 users, 32237 have a churn probability above 0.6, while 35026 users actually churned. The error is within 10%, which indicates that the prediction result is highly accurate.
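The three-way grading can be written as a small function. The thresholds 0.2 and 0.8 and the tier counts come from the text; the function name and the tier labels are hypothetical.

```python
def churn_tier(probability, first_threshold=0.2, second_threshold=0.8):
    """Map a churn probability to the tier described in the text."""
    if probability < first_threshold:
        return "low"
    if probability <= second_threshold:
        return "medium"
    return "high"


# The three tier sizes reported for this embodiment add up to the user total.
tier_counts = {"low": 21151, "medium": 41365, "high": 98}
total_users = sum(tier_counts.values())
```

Note that both boundary values fall into the medium tier, because that tier is defined as not lower than the first threshold and not higher than the second. If the reported error is read as the relative gap between predicted and actual churned users, (35026 - 32237) / 35026 is roughly 8%, which is consistent with the stated bound of 10%; the text does not define the error measure, so this reading is an assumption.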
To further analyze the reasons for user churn, according to one embodiment of the present invention, the behavior features and attribute features can be examined to determine the churn-related behavior features and churn-related attribute features that differ significantly between churned and non-churned users.
When determining the churn-related behavior features, the knee value corresponding to each behavior feature is first determined. The standard score corresponding to the behavior feature is then calculated from the knee value, the behavior features are sorted based on their standard scores, and the top first-quantity behavior features are taken as the churn-related behavior features.
Specifically, for each behavior feature, all values taken by the behavior feature are recorded. For each value, all users are divided into two groups according to that value: one group contains the users whose value of the behavior feature is not greater than the grouping value, and the other group contains the users whose value of the behavior feature is greater than the grouping value. The difference in churn probability between the two groups is calculated, and the value for which the churn probabilities differ most is recorded as the knee value of the behavior feature.
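The knee-value search described above can be sketched as an exhaustive scan over the observed values. Two details are assumptions rather than statements from the text: a candidate that leaves one group empty is skipped, and ties are resolved in favor of the smallest value. The names are hypothetical.

```python
def knee_value(values, churned):
    """Return the split value with the largest churn-rate gap.

    values[i] is the behavior feature value of user i;
    churned[i] is 1 if user i churned and 0 otherwise.
    """
    best_value, best_gap = None, -1.0
    for candidate in sorted(set(values)):
        low = [c for v, c in zip(values, churned) if v <= candidate]
        high = [c for v, c in zip(values, churned) if v > candidate]
        if not low or not high:
            continue  # assumption: a split needs users on both sides
        gap = abs(sum(low) / len(low) - sum(high) / len(high))
        if gap > best_gap:
            best_value, best_gap = candidate, gap
    return best_value
```

For the invented sample `values = [1, 1, 2, 3, 7, 8, 9]` with `churned = [1, 1, 1, 1, 0, 0, 0]`, the split at 3 separates the two groups completely, so the function returns 3.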
According to the knee value, the standard score is calculated according to the following formula:
where the three proportion terms denote, respectively, the proportion of churned users among the users before the knee value, the proportion of churned users among the users after the knee value, and the proportion of churned users among all users; n1 is the number of users before the knee value, and n2 is the number of users after the knee value.
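The formula itself does not survive in this text. The sketch below assumes the pooled two-proportion z-statistic, z = (p1 - p2) / sqrt(p (1 - p) (1/n1 + 1/n2)), because it uses exactly the five quantities defined above. This is a guess at a plausible form, and the formula in the original filing may differ. The symbols p1, p2 and p and the function name are hypothetical.

```python
import math


def standard_score(churned_before, n1, churned_after, n2):
    """Assumed pooled two-proportion z-statistic (not confirmed by the text).

    churned_before / n1: churned users and all users before the knee value.
    churned_after / n2: churned users and all users after the knee value.
    """
    p1 = churned_before / n1
    p2 = churned_after / n2
    p = (churned_before + churned_after) / (n1 + n2)
    # Undefined when p is 0 or 1 (nobody or everybody churned).
    return (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
```

A positive score under this assumption means churn is more common before the knee value than after it, which is why the later ranking uses the absolute value.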
In this embodiment, the first quantity is preset to 2. The behavior features are sorted in descending order of the absolute value of their standard scores, and the top 2 behavior features are, in order, the user active days feature and an event occurrence count feature (for the event named app_init_first_False). These 2 behavior features are taken as the churn-related behavior features.
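The ranking step amounts to sorting by absolute standard score and keeping the first entries. In this sketch the feature names and the scores are invented for illustration; only the default quantity of 2 comes from the text.

```python
def top_behavior_features(scores, first_quantity=2):
    """Rank feature names by |standard score|, largest first."""
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:first_quantity]


# Invented scores, chosen only to show that the sign is ignored.
example_scores = {
    "active_days": -12.5,
    "count:app_init_first_False": 9.1,
    "session_time": 3.0,
}
selected = top_behavior_features(example_scores)
```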
When determining the churn-related attribute features, for each attribute feature, the difference between the churn probability of the users corresponding to the attribute feature and the average user churn probability is calculated, and the ratio of the number of users corresponding to the attribute feature to the total number of users is calculated. The attribute features are then sorted based on the difference and the ratio, and the top second-quantity attribute features are taken as the churn-related attribute features.
In this embodiment, the second quantity is preset to 3. The attribute features are first sorted in descending order of the difference between their churn probability and the average user churn probability. Attribute features with the same difference are then sorted a second time in descending order of the ratio of the number of corresponding users to the total number of users. The top 3 attribute features are, in order, a location feature (the corresponding application location information is Ziyang County, Ankang City, Shaanxi Province, People's Republic of China), a brand feature (the corresponding device brand is SA&CI) and a model feature (the corresponding device model is SC Pro). These 3 attribute features are taken as the churn-related attribute features.
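The two-level ordering described here, difference first and ratio as the tie-breaker, maps directly onto a tuple sort key. The sketch below is illustrative: the input layout, the names and the sample numbers are assumptions.

```python
def top_attribute_features(stats, average_churn, total_users, second_quantity=3):
    """Rank attribute features by churn-rate difference, then by user ratio.

    stats maps a feature name to (user_count, churned_count).
    """
    def sort_key(name):
        users, churned = stats[name]
        difference = churned / users - average_churn
        ratio = users / total_users
        return (difference, ratio)

    # reverse=True sorts both the difference and the ratio in descending order.
    return sorted(stats, key=sort_key, reverse=True)[:second_quantity]


# Invented sample: "a" and "b" share the same difference, so the larger
# group "b" is ranked first.
example_stats = {"a": (10, 9), "b": (20, 18), "c": (50, 30), "d": (5, 1)}
selected_attributes = top_attribute_features(example_stats, 0.5, 100)
```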
Fig. 3 shows a schematic diagram of an apparatus 300 for predicting churned users according to one embodiment of the present invention. As shown in Fig. 3, the apparatus 300 includes an obtaining module 310, a prediction module 320 and a determining module 330.
The obtaining module 310 is adapted to obtain a user data set, the user data set including multiple pieces of user data.
According to one embodiment of the present invention, the user data includes a user identifier, behavior features and attribute features.
The obtaining module 310 is further adapted to extract user identifiers and application data from an application log, the application data including event information, application information and device information; to determine the behavior features and attribute features corresponding to each user identifier according to the application data; to associate the user identifier, behavior features and attribute features with one another to generate user data; and to collect the pieces of user data to form the user data set.
In this embodiment, the event information includes the event name, the event occurrence time and event time information, and the application information includes application time information, the application version and application location information.
The obtaining module 310 is further adapted to apply one-hot encoding to the event occurrence time, application version, application location information and device information respectively, to generate the corresponding first event time feature, application version feature, location feature and device feature, and to take the first event time feature, application version feature, location feature and device feature as the attribute features corresponding to the user identifier.
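One-hot encoding of a categorical field such as the application version can be sketched as follows. How the vocabulary is built is an assumption (here, the sorted distinct values seen in the data), and the version strings are invented.

```python
def build_vocabulary(values):
    """Sorted list of the distinct values observed for one field."""
    return sorted(set(values))


def one_hot(value, vocabulary):
    """Vector with a 1 in the slot of `value` and 0 elsewhere."""
    return [1 if value == item else 0 for item in vocabulary]


# Invented application versions for three log records.
versions = ["1.2.0", "1.0.0", "1.2.0"]
vocabulary = build_vocabulary(versions)
encoded = [one_hot(v, vocabulary) for v in versions]
```

A value that is absent from the vocabulary encodes to an all-zero vector in this sketch.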
The obtaining module 310 is further adapted to determine the event occurrence count feature of each event according to the user identifier and the event name; to convert the event time information and the application time information respectively, to generate the corresponding second event time feature and application time feature; to determine, based on the first event time feature, the active days feature of the user corresponding to the user identifier; and to take the event occurrence count feature, second event time feature, application time feature and user active days feature as the behavior features corresponding to the user identifier.
The prediction module 320 is adapted to input each piece of user data into the churn prediction model for processing, to predict the probability that the user corresponding to the user data will churn.
According to one embodiment of the present invention, the churn prediction model includes a logistic regression model.
The determining module 330 is adapted to determine churned users according to the churn probability corresponding to each piece of user data.
According to one embodiment of the present invention, the determining module 330 is further adapted to determine the user to be a low-probability churned user when the churn probability corresponding to the user data is lower than the first probability threshold; to determine the user to be a medium-probability churned user when the churn probability is not lower than the first probability threshold and not higher than the second probability threshold; and to determine the user to be a high-probability churned user when the churn probability is higher than the second probability threshold.
The determining module 330 is further adapted to determine the churn-related behavior features: to determine the knee value corresponding to each behavior feature, calculate the standard score corresponding to the behavior feature from the knee value, sort the behavior features based on their standard scores, and take the top first-quantity behavior features as the churn-related behavior features.
The determining module 330 is further adapted to determine the churn-related attribute features: for each attribute feature, to calculate the difference between the churn probability of the users corresponding to the attribute feature and the average user churn probability, calculate the ratio of the number of users corresponding to the attribute feature to the total number of users, sort the attribute features based on the difference and the ratio, and take the top second-quantity attribute features as the churn-related attribute features.
The specific steps and embodiments of churned-user prediction have been disclosed in detail in the description based on Fig. 2 and are not repeated here.
Existing methods for predicting churned users use few features of limited variety, produce prediction results of low accuracy, and cannot identify the features that are significantly related to churn. This hampers subsequent analysis and makes it difficult to provide data support for winning back churned users. The churned-user prediction scheme according to embodiments of the present invention uses a churn prediction model to process user data and obtain the churn probability corresponding to each piece of user data, then grades users by churn probability to determine the users at each churn level, giving a macroscopic view of user churn. The user data characterizes each user through behavior features and attribute features: the behavior features include features corresponding to the behavior events the user generates within the application, and the attribute features include features corresponding to the attributes of the user's related events within the application. This diversity of features helps in understanding the prediction model and improves the accuracy of the prediction result. In addition, by examining all of the users' behavior features and attribute features and determining those significantly related to churn, the distribution of users and churn-related features can be understood more intuitively, which facilitates subsequent analysis and timely communication with the right users.
A8. The method according to any one of A1-7, wherein the churn prediction model includes a logistic regression model.
A9. The method according to any one of A2-8, further comprising determining churn-related behavior features, wherein determining the churn-related behavior features comprises:
determining the knee value corresponding to each behavior feature;
calculating the standard score corresponding to the behavior feature according to the knee value;
sorting the behavior features based on the standard scores;
taking the top first-quantity behavior features as the churn-related behavior features.
A10. The method according to any one of A2-9, further comprising determining churn-related attribute features, wherein determining the churn-related attribute features comprises:
for each attribute feature, calculating the difference between the churn probability of the users corresponding to the attribute feature and the average user churn probability;
calculating the ratio of the number of users corresponding to the attribute feature to the total number of users;
sorting the attribute features based on the difference and the ratio;
taking the top second-quantity attribute features as the churn-related attribute features.
Numerous specific details are set forth in the description provided here. It should be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this manner of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art should understand that the modules, units or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiment, or alternatively may be located in one or more devices different from the device in the example. The modules in the foregoing examples may be combined into one module or may be divided into multiple sub-modules.
Those skilled in the art will understand that the modules in the device of an embodiment can be changed adaptively and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as a method, or a combination of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or method element forms a means for carrying out the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.
The various techniques described herein may be implemented in connection with hardware or software, or a combination thereof. Thus, the method and apparatus of the invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy disks, CD-ROMs, hard drives or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention.
Where the program code executes on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device and at least one output device. The memory is configured to store the program code; the processor is configured to execute the churned-user prediction method of the invention according to the instructions in the program code stored in the memory.
By way of example and not limitation, computer-readable media include computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules or other data. Communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
As used herein, unless otherwise specified, the use of the ordinal adjectives "first", "second", "third" and so on to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking or in any other manner.
Although the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention as thus described. In addition, it should be noted that the language used in this specification has been selected principally for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure is illustrative and not restrictive, and the scope of the invention is defined by the appended claims.

Claims (10)

1. A method for predicting churned users, comprising:
obtaining a user data set, the user data set including multiple pieces of user data;
inputting each piece of user data into a churn prediction model for processing, to predict the probability that the user corresponding to the user data will churn;
determining churned users according to the churn probability corresponding to each piece of user data.
2. The method of claim 1, wherein the user data includes a user identifier, behavior features and attribute features.
3. The method of claim 1 or 2, wherein obtaining the user data set comprises:
extracting user identifiers and application data from an application log, the application data including event information, application information and device information;
determining the behavior features and attribute features corresponding to each user identifier according to the application data;
associating the user identifier, behavior features and attribute features with one another to generate user data;
collecting the pieces of user data to form the user data set.
4. The method of claim 3, wherein the event information includes an event name, an event occurrence time and event time information, and the application information includes application time information, an application version and application location information.
5. The method of claim 4, wherein determining the behavior features and attribute features corresponding to each user identifier according to the application data comprises:
applying one-hot encoding to the event occurrence time, application version, application location information and device information respectively, to generate the corresponding first event time feature, application version feature, location feature and device feature;
taking the first event time feature, application version feature, location feature and device feature as the attribute features corresponding to the user identifier.
6. The method of claim 5, wherein determining the behavior features and attribute features corresponding to each user identifier according to the application data comprises:
determining the event occurrence count feature of each event according to the user identifier and the event name;
converting the event time information and the application time information respectively, to generate the corresponding second event time feature and application time feature;
determining, based on the first event time feature, the active days feature of the user corresponding to the user identifier;
taking the event occurrence count feature, second event time feature, application time feature and user active days feature as the behavior features corresponding to the user identifier.
7. The method of any one of claims 1-6, wherein determining churned users according to the churn probability corresponding to each piece of user data comprises:
if the churn probability corresponding to the user data is lower than a first probability threshold, determining the user to be a low-probability churned user;
if the churn probability corresponding to the user data is not lower than the first probability threshold and not higher than a second probability threshold, determining the user to be a medium-probability churned user;
if the churn probability corresponding to the user data is higher than the second probability threshold, determining the user to be a high-probability churned user.
8. An apparatus for predicting churned users, comprising:
an obtaining module, adapted to obtain a user data set, the user data set including multiple pieces of user data;
a prediction module, adapted to input each piece of user data into a churn prediction model for processing, to predict the probability that the user corresponding to the user data will churn;
a determining module, adapted to determine churned users according to the churn probability corresponding to each piece of user data.
9. A computing device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods according to claims 1-7.
10. A computer-readable storage medium storing one or more programs, the one or more programs including instructions which, when executed by a computing device, cause the computing device to perform any one of the methods according to claims 1-7.
CN201910045620.XA 2019-01-17 2019-01-17 A kind of prediction technique, device, calculating equipment and the medium of loss user Pending CN109767045A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910045620.XA CN109767045A (en) 2019-01-17 2019-01-17 A kind of prediction technique, device, calculating equipment and the medium of loss user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910045620.XA CN109767045A (en) 2019-01-17 2019-01-17 A kind of prediction technique, device, calculating equipment and the medium of loss user

Publications (1)

Publication Number Publication Date
CN109767045A true CN109767045A (en) 2019-05-17

Family

ID=66452876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910045620.XA Pending CN109767045A (en) 2019-01-17 2019-01-17 A kind of prediction technique, device, calculating equipment and the medium of loss user

Country Status (1)

Country Link
CN (1) CN109767045A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689381A (en) * 2019-10-10 2020-01-14 中国联合网络通信集团有限公司 Early warning method and device
CN111275245A (en) * 2020-01-13 2020-06-12 宜通世纪物联网研究院(广州)有限公司 Potential network switching user identification method, system, message pushing method, device and medium
CN112162918A (en) * 2020-09-07 2021-01-01 北京达佳互联信息技术有限公司 Application program testing method and device and electronic equipment
CN112837099A (en) * 2021-02-05 2021-05-25 深圳市欢太科技有限公司 Potential loss user identification method and device, storage medium and electronic equipment
CN113055208A (en) * 2019-12-27 2021-06-29 中移信息技术有限公司 Method, device and equipment for identifying information identification model based on transfer learning
CN113256044A (en) * 2020-02-13 2021-08-13 中国移动通信集团广东有限公司 Strategy determination method and device and electronic equipment
CN113496288A (en) * 2020-04-08 2021-10-12 中移动信息技术有限公司 User stability determination method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009071227A1 (en) * 2007-12-04 2009-06-11 Coremedia Ag Method and system for estimating a number of users of a website based on lossy compressed data
CN105099731A (en) * 2014-04-23 2015-11-25 腾讯科技(深圳)有限公司 Method and system for finding churn factor for user churn of network application
CN106327032A (en) * 2015-06-15 2017-01-11 阿里巴巴集团控股有限公司 Data analysis method used for customer loss early warning and data analysis device thereof
CN107609708A (en) * 2017-09-25 2018-01-19 广州赫炎大数据科技有限公司 A kind of customer loss Forecasting Methodology and system based on mobile phone games shop
CN108537587A (en) * 2018-04-03 2018-09-14 广州优视网络科技有限公司 It is lost in user's method for early warning, device, computer readable storage medium and server
CN108665321A (en) * 2018-05-18 2018-10-16 广州虎牙信息科技有限公司 High viscosity customer loss prediction technique, device and computer readable storage medium

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689381A (en) * 2019-10-10 2020-01-14 中国联合网络通信集团有限公司 Early warning method and device
CN113055208A (en) * 2019-12-27 2021-06-29 中移信息技术有限公司 Method, device and equipment for identifying information identification model based on transfer learning
CN113055208B (en) * 2019-12-27 2023-01-13 中移信息技术有限公司 Method, device and equipment for identifying information identification model based on transfer learning
CN111275245A (en) * 2020-01-13 2020-06-12 宜通世纪物联网研究院(广州)有限公司 Potential network switching user identification method, system, message pushing method, device and medium
CN113256044A (en) * 2020-02-13 2021-08-13 中国移动通信集团广东有限公司 Strategy determination method and device and electronic equipment
CN113256044B (en) * 2020-02-13 2023-08-15 中国移动通信集团广东有限公司 Policy determination method and device and electronic equipment
CN113496288A (en) * 2020-04-08 2021-10-12 中移动信息技术有限公司 User stability determination method, device, equipment and storage medium
CN113496288B (en) * 2020-04-08 2024-04-12 中移动信息技术有限公司 User stability determining method, device, equipment and storage medium
CN112162918A (en) * 2020-09-07 2021-01-01 北京达佳互联信息技术有限公司 Application program testing method and device and electronic equipment
CN112837099A (en) * 2021-02-05 2021-05-25 深圳市欢太科技有限公司 Potential loss user identification method and device, storage medium and electronic equipment
CN112837099B (en) * 2021-02-05 2024-03-19 深圳市欢太科技有限公司 Potential loss user identification method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN109767045A (en) Method, apparatus, computing device, and medium for predicting churned users
CN107728874A (en) Method, apparatus, and device for providing user prompt operations
CN108769198B (en) Method and device for pushing information
CN108764994A (en) User behavior guidance method, device, server, and storage medium
CN109960761B (en) Information recommendation method, device, equipment and computer readable storage medium
CN109656918A (en) Method, apparatus, device, and readable storage medium for predicting an epidemic disease index
CN112508118B (en) Target object behavior prediction method aiming at data offset and related equipment thereof
CN111125420B (en) Object recommendation method and device based on artificial intelligence and electronic equipment
CN103796183B (en) Spam message recognition method and device
CN110008339A (en) Deep memory network model for target sentiment classification and classification method thereof
CN104508657B (en) Mediation computing device and associated method for generating semantic tags
CN110097170A (en) Method for obtaining an information push object prediction model, terminal, and storage medium
CN110390047A (en) Resource information recommendation method, device, terminal, and medium based on genetic algorithm
CN113870083A (en) Policy matching method, device and system, electronic equipment and readable storage medium
CN104679493B (en) Improved method for a program's event processing mechanism
CN109767227A (en) System and method for intelligent payment risk decision-making and control implemented via RDS
CN108829656A (en) Data processing method and data processing device for network information
CN109885834A (en) Method and device for predicting user age and gender
CN112102011A (en) User grade prediction method, device, terminal and medium based on artificial intelligence
CN113627160B (en) Text error correction method and device, electronic equipment and storage medium
CN116305289B (en) Medical privacy data processing method, device, computer equipment and storage medium
CN110516151B (en) Effective behavior detection and personalized recommendation method
CN114693011A (en) Policy matching method, device, equipment and medium
CN116313016A (en) Medical material allocation method, device, electronic equipment and computer readable storage medium
CN113806540B (en) Text labeling method, text labeling device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100027 302, 3 / F, aviation service building, Dongzhimen street, Dongcheng District, Beijing

Applicant after: BEIJING TENDCLOUD TIANXIA TECHNOLOGY Co.,Ltd.

Address before: Room 2104, 2 / F, building 4, 75 Suzhou street, Haidian District, Beijing 100027

Applicant before: BEIJING TENDCLOUD TIANXIA TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190517
