WO2016015471A1 - Method and apparatus for predicting a user leaving the network - Google Patents
- Publication number
- WO2016015471A1 (PCT/CN2015/073872)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- feature data
- social network
- data
- time period
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/04—Switchboards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/535—Tracking the activity of the user
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M15/00—Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
- H04M15/58—Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP based on statistics of usage or network monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M15/00—Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
- H04M15/60—Arrangements for metering, time-control or time indication ; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP based on actual use of network resources
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/24—Accounting or billing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2215/00—Metering arrangements; Time controlling arrangements; Time indicating arrangements
- H04M2215/01—Details of billing arrangements
- H04M2215/0188—Network monitoring; statistics on usage on called/calling number
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2215/00—Metering arrangements; Time controlling arrangements; Time indicating arrangements
- H04M2215/32—Involving wireless systems
Definitions
- The embodiments of the present invention relate to the field of communications technologies, and in particular to a method and an apparatus for predicting whether a user will leave the network.
- Existing techniques for predicting whether a user will go off-net rely mainly on the user's earlier service consumption feature data, which can be taken from the user's bills and billing records: for example, the user's daily call duration, daily data usage, number of SMS messages sent, and monthly spending.
- Such data is not comprehensive enough to describe the user's off-net characteristics, so the user's future off-net status often cannot be predicted accurately. For example, a user's daily call duration, daily data usage, number of SMS messages sent, and monthly spending may change little during the half year before the user leaves the network, which makes it difficult to predict the user's status half a year ahead.
- The embodiments of the present invention provide a method and an apparatus for predicting whether a user will go off-net, which can improve the accuracy of off-net prediction.
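- A first aspect of the embodiments of the present invention provides a method for predicting whether a user will go off-net, including: acquiring service consumption feature data, location activity feature data, and social network feature data of the user in a first preset time period, and inputting the acquired data into a pre-trained classifier to calculate and output a calculation result, where the calculation result is the off-net prediction result for the user.
- The location activity feature data refers to related data of the user's communication with each base station in the first preset time period.
- The social network feature data refers to related data of the user's communication with other users in the social network during the first preset time period.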
- Acquiring the location activity feature data of the user in the first preset time period includes:
- extracting the location activity feature data of the user from a location activity feature matrix, where the location activity feature matrix is formed from the related data of each user's communication with each base station in the first preset time period.
- Acquiring the social network feature data of the user in the first preset time period includes:
- extracting the social network feature data of the user from a social network feature matrix, where the social network feature matrix is formed from the related data of the users in the social network communicating with one another during the first preset time period.
- After acquiring the service consumption feature data, the location activity feature data, and the social network feature data of the user in the first preset time period, the method further includes: reducing the dimension of the location activity feature data to a preset dimension, and calculating the influence of the user in the social network according to the social network feature data.
- The service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network are then input into the pre-trained classifier, which performs the calculation and outputs the calculation result.
- In this calculation, the greater the service consumption feature data, the lower the probability that the user goes off-net; the greater the user's influence in the social network, the lower the probability that the user goes off-net.
- When the user communicates with different base stations in the same network, the smaller the data related to the user's communication with base stations of poor communication quality, the lower the probability that the user goes off-net; when the user communicates with base stations in different networks, the larger the data related to the user's communication with a base station, the lower the probability that the user leaves the network in which that base station is located.
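- Before acquiring the service consumption feature data, the location activity feature data, and the social network feature data of the user in the first preset time period, the method further includes training the classifier, as follows:
- The service consumption feature data, the location activity feature data, and the social network feature data of each user in a second preset time period are used as the first input of the classifier, the current network state of each user is used as the second input, and a preset algorithm is used to train on the two inputs to obtain the classifier.
- The preset algorithms include a random forest algorithm, a support vector machine algorithm, a deep neural network algorithm, and a logistic regression algorithm.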
- A second aspect of the embodiments of the present invention provides an apparatus for predicting whether a user will go off-net, including:
- an acquiring unit, configured to acquire service consumption feature data, location activity feature data, and social network feature data of the user in a first preset time period, where the location activity feature data refers to related data of the user's communication with each base station in the first preset time period, and the social network feature data refers to related data of the user's communication with other users in the social network during the first preset time period;
- a processing unit, configured to input the service consumption feature data, the location activity feature data, and the social network feature data acquired by the acquiring unit into a pre-trained classifier to calculate and output a calculation result, where the calculation result is the off-net prediction result for the user.
- The acquiring unit extracts the location activity feature data of the user from a location activity feature matrix, where the location activity feature matrix is formed from the related data of each user's communication with each base station in the first preset time period.
- In a second implementation of the second aspect, the acquiring unit acquires the social network feature data of the user in the first preset time period as follows:
- The acquiring unit extracts the social network feature data of the user from a social network feature matrix, where the social network feature matrix is formed from the related data of the users in the social network communicating with one another during the first preset time period.
- The processing unit includes:
- a first processing sub-unit, configured to reduce the dimension of the location activity feature data acquired by the acquiring unit to a preset dimension;
- a second processing sub-unit, configured to calculate, according to the social network feature data acquired by the acquiring unit, the influence of the user in the social network;
- a third processing sub-unit, configured to input the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into a pre-trained classifier to perform the calculation and output the calculation result.
- When the third processing sub-unit inputs the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier for calculation, the greater the service consumption feature data, the lower the probability that the user goes off-net;
- the greater the influence of the user in the social network, the lower the probability that the user goes off-net;
- when the user communicates with different base stations in the same network, the smaller the data related to the user's communication with base stations of poor communication quality, the lower the probability that the user goes off-net;
- when the user communicates with base stations in different networks, the larger the data related to the user's communication with a base station, the lower the probability that the user leaves the network in which that base station is located.
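- The apparatus further includes a classifier training unit, configured to train the classifier. The classifier training unit is specifically configured to use the service consumption feature data, the location activity feature data, and the social network feature data of each user in a second preset time period as the first input of the classifier, use the current network state of each user as the second input of the classifier, and train on the two inputs with a preset algorithm to obtain the classifier.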
- The embodiments of the present invention have the following advantages:
- The user's service consumption feature data, location activity feature data, and social network feature data are acquired, and the three types of data are input into the classifier to perform off-net prediction for the user.
- Compared with the prior art, the embodiments add the user's location activity feature data and social network feature data and use all three types of data to characterize the user's off-net behavior comprehensively. Off-net prediction based on these three types of data is more reliable and accurate.
- FIG. 1 is a schematic diagram of an embodiment of a method for predicting a user leaving the network according to the present invention
- FIG. 2 is a schematic diagram of another embodiment of a method for predicting a user leaving the network according to the present invention.
- FIG. 4 is a schematic diagram of another embodiment of a device for predicting a user leaving the network according to the present invention.
- FIG. 5 is a schematic diagram of another embodiment of an apparatus for predicting a user leaving the network according to the present invention.
- The service consumption feature data refers to data shown on the user's bills and billing records, such as the user's daily call duration, daily data usage, and monthly spending. The location activity feature data refers to related data of the user's communication with each base station in the first preset time period, for example the identifier of each base station that communicates with the user, and the frequency and duration of the user's connections to that base station. The social network feature data refers to related data of the user's communication with other users in the social network within the first preset time period, such as the identifiers of the other users, and the duration and frequency of the user's communication with them.
- The user's service consumption feature data, location activity feature data, and social network feature data may be obtained from an operator, including but not limited to a telecom operator, a mobile operator, or a Unicom operator.
- The first preset time period may be set in advance, for example to three months or six months. The user's data from the preceding M months may be used to predict the user's off-net status in the following N months, where M and N are both positive integers. M may be greater than or equal to N, or less than N, but the prediction when M is greater than or equal to N is more accurate than the prediction when M is less than N.
- M and N can be preset according to actual needs and are not specifically limited here.
- The acquired service consumption feature data, location activity feature data, and social network feature data are input into a pre-trained classifier to calculate and output a calculation result, where the calculation result is the off-net prediction result for the user.
- The user's service consumption feature data, location activity feature data, and social network feature data are acquired, and the three types of data are input into the classifier to perform off-net prediction for the user.
- The embodiment of the invention adds the user's location activity feature data and social network feature data to describe the user's off-net characteristics comprehensively, and performs off-net prediction based on the above three types of data, so the prediction result is more reliable and accurate.
- the method in this embodiment includes:
- The service consumption feature data, the location activity feature data, and the social network feature data of each user in a second preset time period are used as the first input of the classifier, the current network state of each user is used as the second input of the classifier, and a preset algorithm is used to train on the first input and the second input to obtain the classifier.
- That is, to train the classifier, the service consumption feature data, the location activity feature data, and the social network feature data of each user in the second preset time period are used as the first input, each user's current network state (off-net or in-network) is used as the second input, and a preset algorithm is trained on the two inputs to obtain the classifier. The preset algorithms include a random forest algorithm, a support vector machine algorithm, a deep neural network algorithm, and a logistic regression algorithm.
- In other words, training the classifier in this embodiment means that the input of a classifier f (each user's service consumption feature data, location activity feature data, and social network feature data) and its output (each user's current network state) are both known, and the parameters of the function f are estimated from them.
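The training step can be sketched as follows. This is a minimal illustration only, assuming logistic regression (one of the preset algorithms listed) fitted by plain batch gradient descent. The function names `train_classifier` and `predict_off_net_probability`, the toy feature values, the learning rate, and the epoch count are hypothetical and do not come from the patent text.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_classifier(features, labels, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent.

    features: per-user feature vectors (the "first input")
    labels:   per-user network state, 1 = off-net, 0 = in-network
              (the "second input")
    """
    n, d = len(features), len(features[0])
    weights = [0.0] * d
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_off_net_probability(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical toy rows: [scaled consumption, reduced location feature, influence]
X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.2, 0.1, 0.1], [0.1, 0.3, 0.2]]
y = [0, 0, 1, 1]  # 0 = in-network, 1 = off-net
model = train_classifier(X, y)
```

On this toy data the fitted weights come out negative, so higher consumption and higher influence map to a lower off-net probability, in line with the relationships the text describes.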
- The service consumption feature data refers to data shown on the user's bills and billing records, such as the user's daily call duration, daily data usage, and monthly spending.
- The service consumption feature data can be obtained directly from the user's bills and billing records.
- The location activity feature data refers to related data of the user's communication with each base station in the first preset time period, such as the identifier of each base station that communicates with the user, and the frequency and duration of the user's connections to that base station.
- The related data of each user's communication with each base station in the first preset time period is first assembled into a matrix, referred to as the location activity feature matrix. Each element of the matrix represents the related data of one user's communication with one base station.
- The related data of the user's communication with each base station is then extracted from the location activity feature matrix as the location activity feature data of the user.
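The matrix construction and row extraction can be sketched as follows. This is only an illustration, assuming that each matrix element holds the total connection duration between one user and one base station; the text leaves the exact element content open, and the record layout, identifiers, and function names here are hypothetical.

```python
from collections import defaultdict

def build_location_matrix(records):
    """Aggregate (user, base_station, seconds) records into a sparse
    user x base-station matrix stored as nested dicts."""
    matrix = defaultdict(lambda: defaultdict(float))
    for user, station, seconds in records:
        matrix[user][station] += seconds
    return matrix

def extract_user_row(matrix, user, stations):
    """Return one user's row as a dense vector ordered by `stations`.
    Base stations the user never contacted contribute 0.0."""
    row = matrix.get(user, {})
    return [row.get(s, 0.0) for s in stations]

# Hypothetical connection records for the first preset time period.
records = [
    ("u1", "bs_A", 120.0),
    ("u1", "bs_A", 30.0),
    ("u1", "bs_C", 45.0),
    ("u2", "bs_B", 300.0),
]
stations = ["bs_A", "bs_B", "bs_C"]
location_matrix = build_location_matrix(records)
u1_features = extract_user_row(location_matrix, "u1", stations)
```

A nested-dict layout is used because most users contact only a few base stations, so most elements of the full matrix would be zero.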
- The social network feature data refers to related data of the user's communication with other users in the social network during the first preset time period, such as the identifiers of the other users, and the duration and frequency of the user's communication with them.
- The related data of the users in the social network communicating with one another in the first preset time period is first assembled into a matrix, referred to as the social network feature matrix. Each element of the matrix represents the related data of one user's communication with another user.
- The related data of the user's communication with other users is then extracted from the social network feature matrix as the social network feature data of the user.
- The first preset time period may be set in advance, for example to three months or six months.
- The user's data from the preceding M months may be used to predict the user's off-net status in the following N months.
- M and N are positive integers. M may be greater than or equal to N, or less than N, but the prediction when M is greater than or equal to N is more accurate than the prediction when M is less than N. M and N may be preset according to actual needs and are not specifically limited here.
- The second preset time period needs to be longer than the first preset time period.
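The M-month and N-month windows can be illustrated with a small sketch. It assumes, purely for illustration, that a user's history is available as a chronological list of monthly values; the function name `split_windows` and the sample numbers are hypothetical.

```python
def split_windows(monthly_values, m, n):
    """Split a chronological list of monthly values into the M months
    used as features and the following N months, i.e. the period whose
    off-net status is to be predicted."""
    if m <= 0 or n <= 0:
        raise ValueError("M and N must be positive integers")
    if len(monthly_values) < m + n:
        raise ValueError("need at least M + N months of data")
    cutoff = len(monthly_values) - n
    return monthly_values[cutoff - m:cutoff], monthly_values[cutoff:]

# Hypothetical monthly call minutes for one user, oldest month first.
usage = [10, 12, 11, 9, 8, 7, 3, 0, 0]
feature_window, target_window = split_windows(usage, m=6, n=3)
```

Here M = 6 and N = 3, so M is greater than N, which is the case the text describes as giving the more accurate prediction.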
- The dimension of the user's location activity feature data is reduced to a preset dimension, and the influence of the user in the social network is calculated according to the user's social network feature data.
- The dimension of the user's location activity feature data is relatively high, typically with M on the order of 10^5, so the data cannot be used directly. Therefore, in this embodiment, after the location activity feature data of the user is acquired, it is subjected to dimensionality reduction. Algorithms for the dimensionality reduction include but are not limited to the principal component analysis (PCA) algorithm, the latent Dirichlet allocation (LDA) algorithm, and the probabilistic matrix factorization (PMF) algorithm.
- The LDA algorithm can be used for dimensionality reduction.
- The sparse N × M matrix that represents the users' location activity feature data is decomposed into the product of an N × K matrix and a K × M matrix.
- The N × K matrix is obtained by the LDA dimensionality reduction algorithm and is taken as the location activity feature data reduced to the preset dimension.
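The idea of mapping each user's long base-station vector to a short K-dimensional vector can be sketched as follows. The text names LDA for this step; the sketch swaps in PCA, which the text also lists, because it fits in a few lines of standard-library Python. The function name `pca_reduce`, the power-iteration approach, and the toy matrix are assumptions for illustration, and the dense covariance matrix used here would not scale to M around 10^5.

```python
import math
import random

def pca_reduce(rows, k, iters=200, seed=0):
    """Project an N x M matrix onto its top-k principal components using
    power iteration with deflation. Returns an N x k matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Dense M x M covariance matrix: fine for a toy M only.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(m)]
           for a in range(m)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left to explain
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m))
                         for a in range(m))
        components.append(v)
        # Deflate so the next pass finds the next component.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(m)]
               for a in range(m)]
    return [[sum(c[j] * comp[j] for j in range(m)) for comp in components]
            for c in centered]

# Hypothetical 4 users x 3 base stations; all rows lie along one direction.
rows = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
reduced = pca_reduce(rows, k=1)
```

Each row of `reduced` plays the role of one user's location activity feature data after reduction to the preset dimension (here K = 1).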
- The influence of the user in the social network can be calculated according to the user's social network feature data. In a social network, a user generally communicates with only a small, fixed set of other users, so the matrix representing the user's social network feature data is also sparse: most of its elements are 0.
- The influence of the user in the social network is calculated by a preset influence transfer algorithm.
- Influence transfer algorithms include but are not limited to the PageRank algorithm, the Hypertext-Induced Topic Search (HITS) algorithm based on hyperlink analysis, and the Random Walk algorithm.
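As an illustration of an influence transfer algorithm, the following is a minimal weighted PageRank sketch. It assumes the social network feature matrix is stored as a nested dict in which `adjacency[u][v]` is a communication weight from user u to user v (for example a call count); that layout, the damping factor 0.85, the iteration count, and the user names are hypothetical choices, not values from the patent text.

```python
def pagerank(adjacency, damping=0.85, iters=100):
    """Weighted PageRank by power iteration.

    adjacency[u][v] = weight of communication from user u to user v.
    Returns a dict of influence scores that sum to 1.
    """
    nodes = set(adjacency)
    for targets in adjacency.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    out_weight = {u: sum(adjacency.get(u, {}).values()) for u in nodes}
    for _ in range(iters):
        # Rank held by users with no outgoing communication is spread evenly.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n
                    for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                continue
            for v, w in adjacency[u].items():
                new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank

# Hypothetical call counts between four users.
calls = {
    "alice": {"bob": 5, "carol": 1},
    "bob": {"carol": 3},
    "carol": {"bob": 2},
    "dave": {"bob": 4},
}
influence = pagerank(calls)
```

In this toy network `bob` receives communication from every other user and ends up with a higher score than `alice` or `dave`, who receive none.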
- The larger the user's service consumption feature data, the lower the probability in the calculation result that the user goes off-net; the greater the influence of the user in the social network, the lower the probability in the calculation result that the user goes off-net.
- When the user communicates with different base stations in the same network, the smaller the data related to the user's communication with base stations of poor communication quality in that network, the lower the probability in the calculation result that the user goes off-net.
- When the user communicates with base stations in different networks, the larger the data related to the user's communication with a base station, the lower the probability in the calculation result that the user leaves the network in which that base station is located.
- From the location activity feature data, the base stations that the user communicated with and the related communication data can be determined.
- Consider first a user who communicates with different base stations in the same network, for example three base stations A, B, and C in the same network.
- Suppose the communication quality of base station A is better than that of base station B,
- and the communication quality of base station B is better than that of base station C.
- If the user often communicates with base station C, the service the user experiences is very poor, which eventually leads the user to go off-net.
- Conversely, if the user often communicates with base station A, the service the user experiences is very good, and the probability of going off-net in the future is lower. Consider next a user who communicates with base stations in different networks. For example, within a preset time period the user has communicated with base station A in the X network (the communication network of place X) and with base station B in the Y network (the communication network of place Y). The duration and frequency of the user's communication with base station A have decreased compared with earlier, while the duration and frequency of communication with base station B have increased. In this case the user may have moved from place X to place Y, and the probability that the user leaves the X network in the future becomes larger.
- The user's service consumption feature data, location activity feature data, and social network feature data are acquired, and the three types of data are input into the classifier to perform off-net prediction for the user.
- The embodiment of the present invention uses the user's service consumption feature data, location activity feature data, and social network feature data to describe the user's off-net characteristics comprehensively, and performs off-net prediction for the user according to the three types of data.
- The prediction result is therefore more reliable and accurate. Experiments show that, with the method provided by this embodiment, the AUC value of the prediction is greater than 0.8.
- The AUC value is an index of the prediction accuracy of the classifier. It is generally greater than 0 and less than 1, and the larger the value, the higher the prediction accuracy.
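For reference, the AUC can be computed directly from its pairwise definition. The sketch below is a minimal illustration with hypothetical labels and scores; it is not the evaluation code behind the reported result.

```python
def auc_score(labels, scores):
    """Fraction of (off-net, in-network) user pairs in which the off-net
    user (label 1) gets the higher score; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one user of each class")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical ground truth (1 = off-net) and predicted off-net probabilities.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
auc = auc_score(labels, scores)
```

In this toy example 8 of the 9 pairs are ranked correctly, so the AUC is 8/9. A classifier that ranks users at random scores about 0.5, which is the usual baseline against which a value above 0.8 is read.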
- The apparatus 300 of this embodiment includes:
- an acquiring unit 301, configured to acquire the service consumption feature data, the location activity feature data, and the social network feature data of the user in the first preset time period, where the location activity feature data refers to related data of the user's communication with each base station in the first preset time period, and the social network feature data refers to related data of the user's communication with other users in the social network during the first preset time period;
- a processing unit 302, configured to input the service consumption feature data, the location activity feature data, and the social network feature data acquired by the acquiring unit 301 into a pre-trained classifier to calculate and output a calculation result, where the calculation result is the off-net prediction result for the user.
- The apparatus 400 of this embodiment includes:
- a classifier training unit 401, configured to train the classifier, specifically by using the service consumption feature data, the location activity feature data, and the social network feature data of each user in a second preset time period as the first input of the classifier, using the current network state of each user as the second input of the classifier, and training on the first input and the second input with a preset algorithm to obtain the classifier, where the second preset time period is longer than the first preset time period, and the preset algorithms include a random forest algorithm, a support vector machine algorithm, a deep neural network algorithm, and a logistic regression algorithm;
- an acquiring unit 402, configured to acquire the service consumption feature data, the location activity feature data, and the social network feature data of the user in the first preset time period;
- a processing unit 403, configured to input the service consumption feature data, the location activity feature data, and the social network feature data acquired by the acquiring unit 402 into the pre-trained classifier to calculate and output a calculation result, where the calculation result is the off-net prediction result for the user.
- The processing unit 403 includes:
- a first processing sub-unit 4031, configured to reduce the dimension of the location activity feature data acquired by the acquiring unit 402 to a preset dimension;
- a second processing sub-unit 4032, configured to calculate, according to the social network feature data acquired by the acquiring unit 402, the influence of the user in the social network;
- a third processing sub-unit 4033, configured to input the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier to perform the calculation and output the calculation result.
- The classifier training unit 401 uses the service consumption feature data, the location activity feature data, and the social network feature data of each user in the second preset time period as the first input of the classifier, uses the current network state of each user as the second input of the classifier, and trains on the first input and the second input with a preset algorithm to obtain the classifier.
- The preset algorithms include a random forest algorithm, a support vector machine algorithm, a deep neural network algorithm, and a logistic regression algorithm. That is, training the classifier in this embodiment means that the input of a classifier f (each user's service consumption feature data, location activity feature data, and social network feature data) and its output (each user's current network state) are both known.
- Training is then the process of estimating the parameters of the function f.
- the obtaining unit 402 acquires the service consumption feature data, the location activity feature data, and the social network feature data of the user in the first preset time period.
- the service consumption characteristic data refers to the data displayed on the user's bill and bill, for example, the user's daily call duration, daily data usage, monthly consumption amount, etc., and the business consumption characteristic data can be directly from the user. Obtained directly from bills and bills.
- the location activity feature data refers to related data that the user communicates with each base station in the first preset time period, such as the identity of the base station that communicates with the user, the frequency and duration of the connection between the user and the base station, and the like.
- the related data that each user communicates with each base station in the first preset time period is first formed into a matrix, and the matrix is referred to as a location active feature matrix, and each element in the matrix represents a user and a base station.
- Correlation data of the communication, and then the obtaining unit 402 extracts relevant data of the user's communication with each base station from the location activity feature matrix as the location activity feature data of the user.
- the social network feature data refers to related data that the user communicates with other users in the social network during the first preset time period, such as the identifiers of other users communicating with the user, the duration, frequency, and the like of the user communicating with other users.
- In this embodiment, the data relating to the communication among the users in the social network in the first preset time period may first be arranged into a matrix, referred to as the social network feature matrix, in which each element represents the data relating to one user's communication with another user; the obtaining unit 402 then extracts the data relating to the user's communication with other users from this matrix as the social network feature data of the user.
- the first preset time period may be preset, for example, three months, six months, and the like.
- Typically, the user's data from the preceding M months can be used to predict whether the user leaves the network in the following N months.
- M and N are positive integers. M may be greater than or equal to N, or less than N, but the prediction is more accurate when M is greater than or equal to N than when M is less than N. M and N may be preset according to actual needs and are not specifically limited here.
- the second preset time period needs to be greater than the first preset time period.
- the first processing sub-unit 4031 reduces the dimension of the user's location activity feature data acquired by the obtaining unit 402 to a preset dimension. The dimension of this data is relatively high, typically $M \geq 10^5$, so it cannot be used directly. Therefore, in this embodiment, after the location activity feature data is acquired, the first processing sub-unit 4031 performs dimension reduction on it. Dimension reduction algorithms include but are not limited to the Principal Component Analysis (PCA) algorithm, the Latent Dirichlet Allocation (LDA) algorithm, and the Probabilistic Matrix Factorization (PMF) algorithm.
- Because a user connects to only some of the base stations in any given time period, the matrix used to represent the user's location activity feature data is a sparse matrix, that is, most of its elements are 0.
- Specifically, the LDA algorithm can be used for dimension reduction: the sparse matrix representing the location activity feature data is decomposed into the product $\theta_{N \times K}\,\phi_{K \times M}$, where K is a user-specified value (for example, K = 100) that is much smaller than M.
- The dimension of the matrix $\theta_{N \times K}$ is K, which achieves the dimension reduction.
- The matrix $\theta_{N \times K}$ produced by the LDA dimension reduction is taken as the location activity feature data reduced to the preset dimension.
- the second processing sub-unit 4032 can calculate the user's influence in the social network from the user's social network feature data. In a social network, the other users who communicate with a given user are generally concentrated among a few fixed contacts, so the matrix representing the user's social network feature data is also a sparse matrix, with most elements equal to 0.
- the second processing sub-unit 4032 calculates the user's influence in the social network by using a preset influence propagation algorithm, which includes but is not limited to the PageRank algorithm, the Hypertext-Induced Topic Search (HITS) algorithm, and the Random Walk algorithm.
- the third processing sub-unit 4033 inputs the user's service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the trained classifier, which performs the calculation and outputs the calculation result; this result is the user's off-network prediction result.
- In this calculation, the larger the user's service consumption feature data, the lower the probability in the calculation result that the user leaves the network; the greater the user's influence in the social network, the lower that probability.
- When the user communicates with different base stations in the same network, the less data there is relating to the user's communication with base stations of poor communication quality, the lower the probability that the user leaves the network. When the user communicates with base stations in different networks, the more data there is relating to the user's communication with a given base station, the lower the probability that the user leaves the network to which that base station belongs.
- The reasoning is as follows. Larger service consumption feature data indicates a higher cost of leaving the network, so the user will not leave lightly. Likewise, a user with greater influence in the social network faces a higher cost of leaving and will not leave lightly either.
- From the user's location activity feature data, the base stations that communicate with the user and other related data can be determined.
- Suppose the user communicates with different base stations in the same network, for example three base stations A, B, and C, and prior survey statistics show that the communication quality of base station A is better than that of base station B, which in turn is better than that of base station C.
- If the user often communicates with base station C, whose communication quality is poor, the service the user experiences is poor, which eventually leads the user to leave the network. Conversely, if the user often communicates with base station A, the service the user experiences is good, and the probability of leaving the network in the future is lower.
- When the user communicates with base stations in different networks, for example, within a preset time period the user has communicated with base station A in network X (the communication network of place X) and with base station B in network Y (the communication network of place Y), and the duration and frequency of communication with base station A have decreased compared with before while those with base station B have increased, the user may have moved from place X to place Y, and the probability that the user leaves network X in the future increases.
- the acquiring unit acquires the service consumption feature data, the location activity feature data, and the social network feature data of the user, and the processing unit inputs the three types of data into the classifier to perform off-network prediction on the user.
- Compared with the prior art, the embodiment of the present invention uses the user's service consumption feature data, location activity feature data, and social network feature data together to characterize the user's off-network behavior comprehensively, and performs off-network prediction for the user on the basis of these three types of data, so the prediction results are more reliable and accurate.
- FIG. 5 is a schematic diagram of another embodiment of an apparatus for predicting a user's off-net according to the present invention.
- the device 500 for predicting that a user leaves the network can be used to implement the method for predicting that a user leaves the network provided by the foregoing embodiments. In practical applications, the device 500 can be integrated into an electronic device, which may be a computer or the like. Specifically:
- the device 500 for predicting that a user leaves the network may include an RF (Radio Frequency) circuit 510, a memory 520 including one or more computer readable storage media, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a WiFi module 570, a processor 580 including one or more processing cores, and a power source 590.
- the RF circuit 510 can be used for receiving and transmitting signals while information is being sent or received or during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 580 for processing; it also transmits uplink data to the base station.
- the RF circuit 510 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
- RF circuitry 510 can also communicate with the network and other devices via wireless communication.
- the wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Message Service), and the like.
- the memory 520 can be used to store software programs and modules, and the processor 580 executes various functional applications and data processing by running software programs and modules stored in the memory 520.
- the memory 520 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required for at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created through use of the device (such as audio data or a phone book).
- memory 520 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, memory 520 may also include a memory controller to provide access to memory 520 by processor 580 and input unit 530.
- Input unit 530 can be used to receive input numeric or character information, as well as to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.
- input unit 530 can include touch-sensitive surface 531 as well as other input devices 532.
- the touch-sensitive surface 531, also referred to as a touch display or trackpad, can collect touch operations performed by the user on or near it (for example, operations performed on or near the touch-sensitive surface 531 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program.
- the touch-sensitive surface 531 can include two parts: a touch detection device and a touch controller.
- the touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 580, and can receive and execute commands sent by the processor 580.
- the touch sensitive surface 531 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
- the input unit 530 can also include other input devices 532. Specifically, other input devices 532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.
- Display unit 540 can be used to display information entered by the user or information provided to the user and various graphical user interfaces of the device, which can be composed of graphics, text, icons, video, and any combination thereof.
- the display unit 540 can include a display panel 541.
- the display panel 541 can be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
- the touch-sensitive surface 531 can cover the display panel 541. When the touch-sensitive surface 531 detects a touch operation on or near it, the operation is passed to the processor 580 to determine the type of the touch event, and the processor 580 then provides the corresponding visual output on the display panel 541 according to the type of the touch event.
- Although in FIG. 5 the touch-sensitive surface 531 and the display panel 541 are implemented as two separate components to provide the input and output functions, in some embodiments the touch-sensitive surface 531 can be integrated with the display panel 541 to provide the input and output functions.
- the device 500 for predicting that a user leaves the network may also include at least one type of sensor 550, such as a light sensor, a motion sensor, and other sensors.
- the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor may adjust the brightness of the display panel 541 according to the ambient light, and the proximity sensor may turn off the display panel 541 and/or the backlight when the device 500 is moved to the ear.
- As one type of motion sensor, the gravity acceleration sensor can detect the magnitude of acceleration in all directions (usually along three axes) and, when stationary, the magnitude and direction of gravity. It can be used for applications that identify the posture of the device (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer or tap detection).
- Other sensors that the device 500 may also be equipped with, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described here.
- Audio circuit 560, speaker 561, and microphone 562 can provide an audio interface between the user and the device.
- the audio circuit 560 can transmit the electrical signal converted from received audio data to the speaker 561, which converts it into a sound signal for output.
- Conversely, the microphone 562 converts a collected sound signal into an electrical signal, which the audio circuit 560 receives and converts into audio data; the audio data is then output to the processor 580 for processing and sent, for example, to another device through the RF circuit 510, or output to the memory 520 for further processing.
- the audio circuit 560 may also include an earphone jack to allow a peripheral headset to communicate with the device.
- WiFi is a short-range wireless transmission technology.
- Through the WiFi module 570, the device 500 for predicting that a user leaves the network can help the user send and receive e-mail, browse web pages, and access streaming media; the module provides the user with wireless broadband Internet access.
- Although FIG. 5 shows the WiFi module 570, it can be understood that the module is not an essential part of the device and may be omitted as needed without changing the essence of the invention.
- Processor 580 is the control center of the device for predicting that a user leaves the network. It connects the various parts of the entire device through various interfaces and lines, and performs the various functions of the device and processes data by running or executing the software programs and/or modules stored in the memory 520 and by calling the data stored in the memory 520, thereby monitoring the device as a whole.
- the processor 580 may include one or more processing cores; preferably, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like.
- the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 580.
- the device 500 for predicting that a user leaves the network also includes a power source 590 (such as a battery) for powering the various components.
- Preferably, the power source can be logically connected to the processor 580 through a power management system, so that charging, discharging, power consumption management, and other functions are managed through the power management system.
- Power supply 590 may also include any one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
- the device 500 for predicting the user to leave the network may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
- Specifically, in this embodiment, the device 500 for predicting that a user leaves the network includes a memory 520 and one or more programs, where the one or more programs are stored in the memory 520 and are configured to be executed by one or more processors 580; the one or more programs include instructions for performing the following operations:
- acquiring the service consumption feature data, the location activity feature data, and the social network feature data of a user in a first preset time period, where the location activity feature data refers to data relating to the user's communication with each base station in the first preset time period, and the social network feature data refers to data relating to the user's communication with other users in the social network in the first preset time period;
- inputting the acquired service consumption feature data, location activity feature data, and social network feature data into a pre-trained classifier, which performs a calculation and outputs a calculation result, the calculation result being the off-network prediction result for the user.
- Acquiring the location activity feature data of the user in the first preset time period includes:
- extracting the location activity feature data of the user from the location activity feature matrix, where the location activity feature matrix is a matrix formed from data relating to each user's communication with each base station in the first preset time period.
- Acquiring the social network feature data of the user in the first preset time period includes:
- extracting the social network feature data of the user from the social network feature matrix, where the social network feature matrix is a matrix formed from data relating to the communication among the users in the social network in the first preset time period.
- After the service consumption feature data, the location activity feature data, and the social network feature data are acquired, the operations further include:
- reducing the dimension of the location activity feature data to a preset dimension, and calculating the user's influence in the social network from the social network feature data;
- inputting the acquired service consumption feature data, location activity feature data, and social network feature data into the pre-trained classifier for calculation and outputting the calculation result then includes:
- inputting the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier, which performs the calculation and outputs the calculation result.
- In the process of inputting the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier for calculation, the larger the service consumption feature data, the lower the probability that the user leaves the network; the greater the user's influence in the social network, the lower the probability that the user leaves the network; when the user communicates with different base stations in the same network, the less data there is relating to the user's communication with base stations of poor communication quality, the lower the probability that the user leaves the network; and when the user communicates with base stations in different networks, the more data there is relating to the user's communication with a given base station, the lower the probability that the user leaves the network to which that base station belongs.
- Before the service consumption feature data, the location activity feature data, and the social network feature data of the user in the first preset time period are acquired, the operations further include training the classifier, specifically as follows:
- the service consumption feature data, the location activity feature data, and the social network feature data of each user in a second preset time period are used as the first input of the classifier, the current network state of each user is used as the second input of the classifier, and a preset algorithm is used to train on the first input and the second input to obtain the classifier. The second preset time period is longer than the first preset time period, and the preset algorithms include the random forest algorithm, the support vector machine algorithm, the deep neural network algorithm, and the logistic regression algorithm.
- the device 500 for predicting the user's off-network provided by the embodiment of the present invention may also be used to implement other functions in the foregoing device embodiments, and details are not described herein again.
- the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they can be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be implemented as one or more communication buses or signal lines.
- Part of the solution can be embodied in the form of a software product stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and the software product includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Accounting & Taxation (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Computer Hardware Design (AREA)
- Telephonic Communication Services (AREA)
- Mobile Radio Communication Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
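An embodiment of the present invention discloses a method and a device for predicting that a user leaves the network. The method includes: acquiring service consumption feature data, location activity feature data, and social network feature data of a user in a first preset time period, where the location activity feature data refers to data relating to the user's communication with each base station in the first preset time period, and the social network feature data refers to data relating to the user's communication with other users in a social network in the first preset time period; and inputting the acquired service consumption feature data, location activity feature data, and social network feature data into a pre-trained classifier, which performs a calculation and outputs a calculation result, the calculation result being the off-network prediction result for the user. The embodiments of the present invention can improve the accuracy of off-network prediction for users.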
Description
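This application claims priority to Chinese Patent Application No. 201410371307.2, filed with the Chinese Patent Office on July 30, 2014 and entitled "Method and Device for Predicting That a User Leaves the Network", which is incorporated herein by reference in its entirety.
The embodiments of the present invention relate to the field of communications technologies, and in particular to a method and a device for predicting that a user leaves the network.
For most enterprises whose business is based on network subscription services, it is crucial to predict whether a user will leave the network in the future and the main reasons for leaving. For example, a telecom operator cares a great deal about whether its subscribers may leave the network, and about when and why, and then uses these results to run targeted retention campaigns for users who are likely to leave, thereby protecting the value of its existing user base and sustaining stable profits. Typically, an operator wants to predict a user's tendency to leave some time in advance, so that there is enough time to retain the user.
Existing techniques for predicting that a user leaves the network rely mainly on the user's earlier service consumption feature data, which may come from the user's bills, call detail records, and the like, for example, the user's daily call duration, daily data usage, number of short messages sent, and monthly spending. Such data does not characterize a user's off-network behavior comprehensively enough and often fails to predict the user's future off-network status accurately. For example, in the six months before leaving the network, a user's daily call duration, daily data usage, number of short messages sent, and monthly spending may change very little, which makes it difficult to predict the user's status six months later.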
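Summary of the Invention
In view of this, the embodiments of the present invention provide a method and a device for predicting that a user leaves the network, which can improve the accuracy of off-network prediction for users.
In a first aspect, the method for predicting that a user leaves the network provided by an embodiment of the present invention includes:
acquiring service consumption feature data, location activity feature data, and social network feature data of a user in a first preset time period, where the location activity feature data refers to data relating to the user's communication with each base station in the first preset time period, and the social network feature data refers to data relating to the user's communication with other users in a social network in the first preset time period;
inputting the acquired service consumption feature data, location activity feature data, and social network feature data into a pre-trained classifier, which performs a calculation and outputs a calculation result, the calculation result being the off-network prediction result for the user.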
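With reference to the first aspect, in a first implementation of the first aspect, acquiring the location activity feature data of the user in the first preset time period includes:
extracting the location activity feature data of the user from a location activity feature matrix, where the location activity feature matrix is a matrix formed from data relating to each user's communication with each base station in the first preset time period.
With reference to the first aspect or the first implementation of the first aspect, in a second implementation of the first aspect, acquiring the social network feature data of the user in the first preset time period includes:
extracting the social network feature data of the user from a social network feature matrix, where the social network feature matrix is a matrix formed from data relating to the communication among the users in the social network in the first preset time period.
With reference to the first aspect, or the first or second implementation of the first aspect, in a third implementation of the first aspect, after the service consumption feature data, the location activity feature data, and the social network feature data of the user in the first preset time period are acquired, the method further includes:
reducing the dimension of the location activity feature data to a preset dimension, and calculating the user's influence in the social network from the social network feature data;
inputting the acquired service consumption feature data, location activity feature data, and social network feature data into the pre-trained classifier for calculation and outputting the calculation result then includes:
inputting the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier, which performs the calculation and outputs the calculation result.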
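With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect, in the process of inputting the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier for calculation: the larger the service consumption feature data, the lower the probability that the user leaves the network; the greater the user's influence in the social network, the lower the probability that the user leaves the network; when the user communicates with different base stations in the same network, the less data there is relating to the user's communication with base stations of poor communication quality, the lower the probability that the user leaves the network; and when the user communicates with base stations in different networks, the more data there is relating to the user's communication with a given base station, the lower the probability that the user leaves the network to which that base station belongs.
With reference to the first aspect, or any one of the first to fourth implementations of the first aspect, in a fifth implementation of the first aspect, before the service consumption feature data, the location activity feature data, and the social network feature data of the user in the first preset time period are acquired, the method further includes training the classifier, specifically as follows:
using the service consumption feature data, the location activity feature data, and the social network feature data of each user in a second preset time period as the first input of the classifier, using the current network state of each user as the second input of the classifier, and using a preset algorithm to train on the first input and the second input to obtain the classifier, where the second preset time period is longer than the first preset time period, and the preset algorithms include the random forest algorithm, the support vector machine algorithm, the deep neural network algorithm, and the logistic regression algorithm.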
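A second aspect of the embodiments of the present invention provides a device for predicting that a user leaves the network, including:
an obtaining unit, configured to acquire service consumption feature data, location activity feature data, and social network feature data of a user in a first preset time period, where the location activity feature data refers to data relating to the user's communication with each base station in the first preset time period, and the social network feature data refers to data relating to the user's communication with other users in a social network in the first preset time period;
a processing unit, configured to input the service consumption feature data, the location activity feature data, and the social network feature data acquired by the obtaining unit into a pre-trained classifier, which performs a calculation and outputs a calculation result, the calculation result being the off-network prediction result for the user.
With reference to the second aspect, in a first implementation of the second aspect, the obtaining unit acquiring the location activity feature data of the user in the first preset time period includes:
the obtaining unit extracting the location activity feature data of the user from a location activity feature matrix, where the location activity feature matrix is a matrix formed from data relating to each user's communication with each base station in the first preset time period.
With reference to the second aspect or the first implementation of the second aspect, in a second implementation of the second aspect, the obtaining unit acquiring the social network feature data of the user in the first preset time period includes:
the obtaining unit extracting the social network feature data of the user from a social network feature matrix, where the social network feature matrix is a matrix formed from data relating to the communication among the users in the social network in the first preset time period.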
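With reference to the second aspect, or the first or second implementation of the second aspect, in a third implementation of the second aspect, the processing unit includes:
a first processing sub-unit, configured to reduce the dimension of the location activity feature data acquired by the obtaining unit to a preset dimension;
a second processing sub-unit, configured to calculate the user's influence in the social network from the social network feature data acquired by the obtaining unit;
a third processing sub-unit, configured to input the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier, which performs the calculation and outputs the calculation result.
With reference to the third implementation of the second aspect, in a fourth implementation of the second aspect, in the process in which the third processing sub-unit inputs the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier for calculation: the larger the service consumption feature data, the lower the probability that the user leaves the network; the greater the user's influence in the social network, the lower the probability that the user leaves the network; when the user communicates with different base stations in the same network, the less data there is relating to the user's communication with base stations of poor communication quality, the lower the probability that the user leaves the network; and when the user communicates with base stations in different networks, the more data there is relating to the user's communication with a given base station, the lower the probability that the user leaves the network to which that base station belongs.
With reference to the second aspect, or any one of the first to fourth implementations of the second aspect, in a fifth implementation of the second aspect, the device further includes a classifier training unit, configured to train the classifier, and specifically configured to:
use the service consumption feature data, the location activity feature data, and the social network feature data of each user in a second preset time period as the first input of the classifier, use the current network state of each user as the second input of the classifier, and use a preset algorithm to train on the first input and the second input to obtain the classifier, where the second preset time period is longer than the first preset time period, and the preset algorithms include the random forest algorithm, the support vector machine algorithm, the deep neural network algorithm, and the logistic regression algorithm.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, the service consumption feature data, the location activity feature data, and the social network feature data of a user are acquired, and these three types of data are input into a classifier to perform off-network prediction for the user. Compared with prior-art methods that characterize a user's off-network behavior using only the user's service consumption feature data, the embodiments of the present invention add the user's location activity feature data and social network feature data, use the three types of data together to characterize the user's off-network behavior comprehensively, and perform off-network prediction on that basis, so the prediction results are more reliable and accurate.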
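To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for the embodiments are briefly introduced below. Apparently, the drawings described below show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of one embodiment of the method for predicting that a user leaves the network according to the present invention;
FIG. 2 is a schematic diagram of another embodiment of the method for predicting that a user leaves the network according to the present invention;
FIG. 3 is a schematic diagram of one embodiment of the device for predicting that a user leaves the network according to the present invention;
FIG. 4 is a schematic diagram of another embodiment of the device for predicting that a user leaves the network according to the present invention;
FIG. 5 is a schematic diagram of another embodiment of the device for predicting that a user leaves the network according to the present invention.
The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention provide a method and a device for predicting that a user leaves the network, which can accurately predict whether a user will leave the network in the future.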
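Referring to FIG. 1, one embodiment of the method for predicting that a user leaves the network according to the present invention includes:
101. Acquire service consumption feature data, location activity feature data, and social network feature data of a user in a first preset time period.
The service consumption feature data refers to the data shown on the user's bills and call detail records, for example, the user's daily call duration, daily data usage, and monthly spending. The location activity feature data refers to data relating to the user's communication with each base station in the first preset time period, such as the identifiers of the base stations that communicate with the user and the frequency and duration of the user's connections to those base stations. The social network feature data refers to data relating to the user's communication with other users in the social network in the first preset time period, such as the identifiers of the other users who communicate with the user and the duration and frequency of that communication.
In a specific implementation, the user's service consumption feature data, location activity feature data, and social network feature data can be obtained from an operator, including but not limited to, for example, China Telecom, China Mobile, or China Unicom. The first preset time period may be set in advance, for example, to three months or six months. Typically, the user's data from the preceding M months can be used to predict whether the user leaves the network in the following N months, where M and N are positive integers. M may be greater than or equal to N, or less than N, but the prediction is more accurate when M is greater than or equal to N than when M is less than N. M and N may be preset according to actual needs and are not specifically limited here.
102. Input the acquired service consumption feature data, location activity feature data, and social network feature data into a pre-trained classifier, which performs a calculation and outputs a calculation result, the calculation result being the off-network prediction result for the user.
In this embodiment of the present invention, the service consumption feature data, the location activity feature data, and the social network feature data of a user are acquired, and these three types of data are input into a classifier to perform off-network prediction for the user. By combining the user's location activity feature data and social network feature data, the embodiment characterizes the user's off-network behavior comprehensively and performs off-network prediction on the basis of the three types of data, so the prediction results are more reliable and accurate.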
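For ease of understanding, the method for predicting that a user leaves the network according to the present invention is described below using a specific embodiment. Referring to FIG. 2, the method of this embodiment includes:
201. Use the service consumption feature data, the location activity feature data, and the social network feature data of each user in a second preset time period as the first input of a classifier, use the current network state of each user as the second input of the classifier, and use a preset algorithm to train on the first input and the second input to obtain the classifier.
The classifier f in this embodiment may be a binary classifier. A binary classifier is a function that maps an input sample feature vector $x_n$ to one of two values, $y_n \in \{0, 1\}$. It is made up of a number of parameters whose specific values are usually undetermined in advance and are obtained through training.
Training the classifier f means estimating the parameters of the function f from known pairs $\{x_n, y_n\}$ of positive and negative samples, where a sample $x_n$ with $y_n = 1$ is called a positive sample and a sample $x_n$ with $y_n = 0$ is called a negative sample.
Specifically, in this embodiment, the classifier is trained by using the service consumption feature data, the location activity feature data, and the social network feature data of each user in the second preset time period as the first input of the classifier, using the current network state of each user (off-network or on-network) as the second input of the classifier, and using a preset algorithm to train on the first input and the second input to obtain the classifier. The preset algorithms include the random forest algorithm, the support vector machine algorithm, the deep neural network algorithm, the logistic regression algorithm, and the like. That is, training the classifier in this embodiment means estimating the parameters of the function f, given that the input of the classifier f is the service consumption feature data, the location activity feature data, and the social network feature data of each user, and that the output of the classifier f is the current network state of each user.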
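As a rough illustration of this training step, the sketch below fits a logistic regression classifier (one of the preset algorithms listed) by plain gradient descent. It is a minimal sketch, not the patent's implementation: the function names, the learning rate, the number of epochs, the toy feature values, and the label convention (1 = off-network, 0 = on-network) are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Logistic function, written to stay numerically safe for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_classifier(features, labels, lr=0.1, epochs=2000):
    """Estimate the parameters of f from (x_n, y_n) pairs by gradient descent.

    features: first input, one feature vector per user
    labels:   second input, assumed here to be 1 = off-network, 0 = on-network
    """
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log-loss with respect to z
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_off_network(weights, bias, x):
    """Return the estimated probability that the user leaves the network."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical, already-normalised features per user:
# [service consumption, reduced location feature, social influence]
X = [[0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.2, 0.3, 0.1], [0.1, 0.2, 0.2]]
y = [0, 0, 1, 1]
w, b = train_classifier(X, y)
```

Calling `predict_off_network(w, b, x)` on a new user's feature vector then plays the role of the prediction step. In practice a library implementation of any of the listed algorithms could be substituted for this hand-written loop.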
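202. Acquire the service consumption feature data of the user in the first preset time period, extract the user's location activity feature data from the location activity feature matrix, and extract the user's social network feature data from the social network feature matrix.
The service consumption feature data refers to the data shown on the user's bills and call detail records, for example, the user's daily call duration, daily data usage, and monthly spending; it can be obtained directly from those bills and call detail records.
The location activity feature data refers to data relating to the user's communication with each base station in the first preset time period, such as the identifiers of the base stations that communicate with the user and the frequency and duration of the user's connections to those base stations. In this embodiment, the data relating to each user's communication with each base station in the first preset time period is arranged into a matrix, referred to as the location activity feature matrix, in which each element represents the data relating to one user's communication with one base station; the data relating to the user's communication with each base station is then extracted from this matrix as the location activity feature data of the user.
The social network feature data refers to data relating to the user's communication with other users in the social network in the first preset time period, such as the identifiers of the other users who communicate with the user and the duration and frequency of that communication. In this embodiment, the data relating to the communication among the users in the social network in the first preset time period is arranged into a matrix, referred to as the social network feature matrix, in which each element represents the data relating to one user's communication with another user; the data relating to the user's communication with other users is then extracted from this matrix as the social network feature data of the user.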
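The matrix construction and row extraction described above can be sketched as follows. This is only an illustration under assumptions: the record layout `(user, peer, duration)`, the identifiers, and the choice of storing a connection count and a total duration per cell are hypothetical. The same helper serves both matrices, with `peer` being a base station for the location activity feature matrix and another user for the social network feature matrix.

```python
from collections import defaultdict

def build_feature_matrix(records):
    """Aggregate (user, peer, duration) records into a sparse matrix.

    The matrix is stored as {user: {peer: [count, total_duration]}};
    absent entries are implicit zeros, which suits data this sparse.
    """
    matrix = defaultdict(dict)
    for user, peer, duration in records:
        cell = matrix[user].setdefault(peer, [0, 0.0])
        cell[0] += 1          # connection frequency
        cell[1] += duration   # total connection time
    return matrix

def extract_user_features(matrix, user):
    """Extract one user's row: the data for every peer they contacted."""
    return {peer: tuple(cell) for peer, cell in matrix.get(user, {}).items()}

# Hypothetical records: (user id, base station id, seconds connected)
location_records = [
    ("u1", "bs_A", 120.0),
    ("u1", "bs_A", 60.0),
    ("u1", "bs_C", 30.0),
    ("u2", "bs_B", 300.0),
]
location_matrix = build_feature_matrix(location_records)
u1_location = extract_user_features(location_matrix, "u1")
```

A dictionary-of-dictionaries layout is used here only because it needs no third-party library; a dedicated sparse matrix type would be the more usual choice at operator scale.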
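In a specific implementation, the first preset time period may be set in advance, for example, to three months or six months. Typically, the user's data from the preceding M months can be used to predict whether the user leaves the network in the following N months, where M and N are positive integers. M may be greater than or equal to N, or less than N, but the prediction is more accurate when M is greater than or equal to N than when M is less than N. M and N may be preset according to actual needs and are not specifically limited here.
In addition, it should be noted that the second preset time period must be longer than the first preset time period.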
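203. Reduce the dimension of the user's location activity feature data to a preset dimension, and calculate the user's influence in the social network from the user's social network feature data.
Generally, the dimension of the user's location activity feature data is relatively high, typically $M \geq 10^5$, so the data cannot be used directly. Therefore, in this embodiment, after the user's location activity feature data is acquired, dimension reduction is performed on it. Dimension reduction algorithms include but are not limited to the Principal Component Analysis (PCA) algorithm, the Latent Dirichlet Allocation (LDA) algorithm, and the Probabilistic Matrix Factorization (PMF) algorithm. Because a user connects to only some of the base stations in any given time period, the matrix used to represent the user's location activity feature data is a sparse matrix, that is, most of its elements are 0. Specifically, the LDA algorithm can be used for dimension reduction: the sparse matrix representing the location activity feature data is decomposed into the product $\theta_{N \times K}\,\phi_{K \times M}$, where K is a user-specified value (for example, K = 100) that is much smaller than M. The dimension of the matrix $\theta_{N \times K}$ is K, which achieves the dimension reduction. The matrix $\theta_{N \times K}$ produced by the LDA dimension reduction is taken as the location activity feature data reduced to the preset dimension.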
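To make the shape of this decomposition concrete, the sketch below approximates a sparse N x M matrix by the product of an N x K matrix theta and a K x M matrix phi. It is not LDA: for brevity it uses a plain stochastic gradient descent factorization on the non-zero entries, which is closer in spirit to the PMF option mentioned above. The function names, hyperparameters, and toy data are assumptions for illustration only.

```python
import random

def factorize(matrix, n_users, n_stations, k, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Approximate a sparse N x M matrix by theta (N x K) times phi (K x M).

    matrix: {(user_index, station_index): value}, non-zero entries only.
    Plain SGD on squared error with L2 regularisation over those entries.
    """
    rng = random.Random(seed)
    theta = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    phi = [[rng.uniform(0.0, 0.1) for _ in range(n_stations)] for _ in range(k)]
    entries = list(matrix.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (i, j), value in entries:
            pred = sum(theta[i][f] * phi[f][j] for f in range(k))
            err = value - pred
            for f in range(k):
                t, p = theta[i][f], phi[f][j]
                theta[i][f] += lr * (err * p - reg * t)
                phi[f][j] += lr * (err * t - reg * p)
    return theta, phi

def squared_error(matrix, theta, phi):
    """Sum of squared reconstruction errors over the stored entries."""
    k = len(phi)
    return sum(
        (value - sum(theta[i][f] * phi[f][j] for f in range(k))) ** 2
        for (i, j), value in matrix.items()
    )

# Hypothetical normalised connection intensities: 4 users, 6 base stations.
activity = {(0, 0): 1.0, (0, 1): 0.5, (1, 1): 0.8, (1, 2): 0.3,
            (2, 3): 0.9, (3, 4): 0.6, (3, 5): 0.4}
theta, phi = factorize(activity, n_users=4, n_stations=6, k=2)
```

Each row `theta[i]` is then the K-dimensional location feature for user i, in place of the original M-dimensional row.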
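As for the social network feature data, the user's influence in the social network can be calculated from it. In a social network, the other users who communicate with a given user are generally concentrated among a few fixed contacts, so the matrix representing the user's social network feature data is also a sparse matrix, with most elements equal to 0. A preset influence propagation algorithm is then used to calculate the user's influence in the social network; such algorithms include but are not limited to the PageRank algorithm, the Hypertext-Induced Topic Search (HITS) algorithm, and the Random Walk algorithm.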
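One way to picture the influence calculation is a small power-iteration PageRank over the call graph. The sketch below is a minimal version under assumptions: the graph layout `{caller: {callee: weight}}`, the use of call counts as positive edge weights, the damping factor 0.85, and the user identifiers are all hypothetical choices rather than details given in the source.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted directed graph.

    edges: {caller: {callee: weight}}, with positive weights such as
    the number of calls. Returns {user: score}; the scores sum to 1.
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    out_weight = {u: sum(edges.get(u, {}).values()) for u in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing calls is spread evenly.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in edges.items():
            for v, w in targets.items():
                new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank

# Hypothetical call counts between four users.
calls = {
    "u1": {"u2": 3},
    "u2": {"u1": 1},
    "u3": {"u2": 1},
    "u4": {"u2": 2, "u1": 1},
}
influence = pagerank(calls)
```

Here `influence["u2"]` comes out highest because most calls are directed at that user; the resulting score per user is the scalar that would be fed to the classifier.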
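204. Input the user's service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the trained classifier, which performs the calculation and outputs the calculation result, the calculation result being the off-network prediction result for the user.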
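Before the classifier can be applied, the three feature groups have to be laid out as one input vector in the same order that was used during training. A minimal sketch, in which the function name and the sample values are purely hypothetical:

```python
def assemble_classifier_input(consumption, reduced_location, influence):
    """Concatenate the three feature groups into one flat vector.

    consumption:      service consumption features (e.g. call time, data, spend)
    reduced_location: the user's K-dimensional reduced location features
    influence:        the user's scalar influence score in the social network
    """
    return (
        [float(v) for v in consumption]
        + [float(v) for v in reduced_location]
        + [float(influence)]
    )

# Hypothetical values for one user.
x = assemble_classifier_input(
    consumption=[35.0, 1.2, 88.0],
    reduced_location=[0.12, 0.05, 0.83],
    influence=0.47,
)
```

The resulting vector `x` is what a trained classifier would score to produce the off-network prediction.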
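In this calculation, the larger the user's service consumption feature data, the lower the probability in the calculation result that the user leaves the network; the greater the user's influence in the social network, the lower that probability. When the user communicates with different base stations in the same network, the less data there is relating to the user's communication with base stations of poor communication quality, the lower the probability that the user leaves the network. When the user communicates with base stations in different networks, the more data there is relating to the user's communication with a given base station, the lower the probability that the user leaves the network to which that base station belongs.
The reasoning is as follows. Larger service consumption feature data indicates a higher cost of leaving the network, so the user will not leave lightly. Likewise, a user with greater influence in the social network faces a higher cost of leaving and will not leave lightly either. From the user's location activity feature data, the base stations that communicate with the user and other related data can be determined. Suppose the user communicates with different base stations in the same network, for example three base stations A, B, and C, and prior survey statistics show that the communication quality of base station A is better than that of base station B, which in turn is better than that of base station C. If the user often communicates with base station C, whose communication quality is poor, the service the user experiences is poor, which eventually leads the user to leave the network. Conversely, if the user often communicates with base station A, the service the user experiences is good, and the probability of leaving the network in the future is lower. When the user communicates with base stations in different networks, for example, within a preset time period the user has communicated with base station A in network X (the communication network of place X) and with base station B in network Y (the communication network of place Y), and the duration and frequency of communication with base station A have decreased compared with before while those with base station B have increased, the user may have moved from place X to place Y, and the probability that the user leaves network X in the future increases.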
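In this embodiment, the service consumption feature data, the location activity feature data, and the social network feature data of a user are acquired, and these three types of data are input into a classifier to perform off-network prediction for the user. Compared with the prior art, the embodiment of the present invention uses the user's service consumption feature data, location activity feature data, and social network feature data together to characterize the user's off-network behavior comprehensively, and performs off-network prediction on the basis of these three types of data, so the prediction results are more reliable and accurate. Experiments show that when the method provided in this embodiment is used for off-network prediction, the AUC value of the prediction is greater than 0.8. The AUC value is an indicator of the prediction accuracy of a classifier; it generally lies between 0 and 1, and a larger value indicates higher prediction accuracy.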
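For readers unfamiliar with the metric, AUC can be computed as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score. The sketch below implements that pairwise definition directly; the function name and the label convention (1 = off-network) are assumptions, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = off-network) and predicted probabilities.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this toy case three of the four positive/negative pairs are ordered correctly, so `example_auc` is 0.75.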
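The device for predicting that a user leaves the network provided by the embodiments of the present invention is described below. Referring to FIG. 3, the device 300 of this embodiment includes:
an obtaining unit 301, configured to acquire service consumption feature data, location activity feature data, and social network feature data of a user in a first preset time period, where the location activity feature data refers to data relating to the user's communication with each base station in the first preset time period, and the social network feature data refers to data relating to the user's communication with other users in the social network in the first preset time period;
a processing unit 302, configured to input the service consumption feature data, the location activity feature data, and the social network feature data acquired by the obtaining unit 301 into a pre-trained classifier, which performs a calculation and outputs a calculation result, the calculation result being the off-network prediction result for the user.
For ease of understanding, the device for predicting that a user leaves the network according to the present invention is described below using a specific embodiment. Referring to FIG. 4, the device 400 of this embodiment includes:
a classifier training unit 401, configured to train the classifier, specifically by using the service consumption feature data, the location activity feature data, and the social network feature data of each user in a second preset time period as the first input of the classifier, using the current network state of each user as the second input of the classifier, and using a preset algorithm to train on the first input and the second input to obtain the classifier, where the second preset time period is longer than the first preset time period, and the preset algorithms include the random forest algorithm, the support vector machine algorithm, the deep neural network algorithm, and the logistic regression algorithm;
an obtaining unit 402, configured to acquire the service consumption feature data, the location activity feature data, and the social network feature data of a user in the first preset time period;
a processing unit 403, configured to input the service consumption feature data, the location activity feature data, and the social network feature data acquired by the obtaining unit into the pre-trained classifier, which performs a calculation and outputs a calculation result, the calculation result being the off-network prediction result for the user.
The processing unit 403 includes:
a first processing sub-unit 4031, configured to reduce the dimension of the location activity feature data acquired by the obtaining unit 402 to a preset dimension;
a second processing sub-unit 4032, configured to calculate the user's influence in the social network from the social network feature data acquired by the obtaining unit 402;
a third processing sub-unit 4033, configured to input the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier, which performs the calculation and outputs the calculation result.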
为进一步理解,下面以一个实际应用场景对本实施例中预测用户离网的装置400内的各单元之间的交互方式进行描述,具体如下:
首先,分类器训练单元401将各个用户在第二预置时间段内的业务消费特征数据、位置活动特征数据及社交网络特征数据作为分类器的第一输入,将各个用户当前的网络状态作为分类器的第二输入,利用预设的算法对第一输入及第二输入进行训练得到分类器。其中,预设的算法包括:随机森林算法、支持向量机算法、深层神经网络算法及逻辑回归算法等。即本实施例中分类器的训练指的是已知分类器f的输入为各个用户的业务消费特征数据、位置活动特征数据及社交网络特征数据,且已知分类器f的输出为各个用户当前的网络状态,估计函数f的参数的过程。
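After the classifier training unit 401 has trained the classifier, the acquiring unit 402 acquires the user's service consumption feature data, location activity feature data and social network feature data within the first preset time period.
The service consumption feature data is the data shown on the user's bills and call detail records, for example the user's daily call duration, daily data traffic, and monthly spend; it can be obtained directly from the user's bills and call detail records.
The location activity feature data is data relating to the user's communication with each base station within the first preset time period, for example the identifiers of the base stations that communicate with the user and the frequency and duration of the user's connections to those base stations. In this embodiment, the data relating to the communication between each user and each base station within the first preset time period is first assembled into a matrix, called the location activity feature matrix, in which each element represents the data relating to one user's communication with one base station. The acquiring unit 402 then extracts, from the location activity feature matrix, the data relating to the user's communication with each base station as that user's location activity feature data.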
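As a sketch of how such a matrix might be assembled and queried, the example below builds a user-by-base-station matrix from connection records and extracts one user's row. The record layout `(user_id, station_id, seconds)`, the choice of total connection time as the matrix element, and the function names are assumptions for illustration.

```python
def build_location_matrix(records):
    """Build a user x base-station matrix of total connection seconds."""
    users = sorted({u for u, _, _ in records})
    stations = sorted({s for _, s, _ in records})
    row = {u: i for i, u in enumerate(users)}
    col = {s: j for j, s in enumerate(stations)}
    matrix = [[0.0] * len(stations) for _ in users]
    for u, s, seconds in records:
        matrix[row[u]][col[s]] += seconds
    return users, stations, matrix


def extract_user_row(users, matrix, user_id):
    """Return one user's location activity feature data (one matrix row)."""
    return matrix[users.index(user_id)]


# Hypothetical connection records within the first preset time period.
records = [
    ("u1", "A", 120.0),
    ("u1", "C", 30.0),
    ("u2", "B", 60.0),
    ("u1", "A", 45.0),
]
users, stations, matrix = build_location_matrix(records)
print(stations, extract_user_row(users, matrix, "u1"))
```

The social network feature matrix described next can be built the same way, with other users taking the place of base stations as columns.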
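The social network feature data is data relating to the user's communication with other users in the social network within the first preset time period, for example the identifiers of the other users that communicate with the user and the duration and frequency of those communications. In this embodiment, the data relating to the mutual communication of the users in the social network within the first preset time period may first be assembled into a matrix, called the social network feature matrix, in which each element represents the data relating to one user's communication with another user. The acquiring unit 402 then extracts, from the social network feature matrix, the data relating to the user's communication with other users as that user's social network feature data.
In a specific implementation, the first preset time period may be set in advance, for example to three months or six months. Typically, the user's data from the preceding M months is used to predict whether the user leaves the network in the following N months, where M and N are positive integers. M may be greater than or equal to N, or less than N, but prediction is more accurate when M is greater than or equal to N than when M is less than N. M and N may be preset according to actual needs and are not specifically limited here.
In addition, it should be noted that the second preset time period needs to be longer than the first preset time period.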
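Next, the first processing subunit 4031 reduces the dimension of the user's location activity feature data acquired by the acquiring unit 402 to a preset dimension. The dimension of the location activity feature data is usually high, typically $M \ge 10^5$, so the data cannot be used directly. Therefore, in this embodiment, after the user's location activity feature data is acquired, the first processing subunit 4031 needs to perform dimension reduction on it. Dimension reduction algorithms include but are not limited to the principal component analysis (PCA) algorithm, the latent Dirichlet allocation (LDA) algorithm, and the probabilistic matrix factorization (PMF) algorithm. Because a user connects to only some of the base stations in any given time period, the matrix representing the user's location activity feature data is sparse, that is, most of its elements are 0. Specifically, the LDA algorithm may be used for dimension reduction: the sparse $N \times M$ matrix representing the location activity feature data is decomposed into the product $\theta_{N \times K}\,\varphi_{K \times M}$, where K is a user-specified value, for example K = 100, and K is much smaller than M. The matrix $\theta_{N \times K}$ has dimension K, which achieves the dimension reduction. The matrix $\theta_{N \times K}$ obtained from the LDA dimension reduction algorithm is used as the location activity feature data reduced to the preset dimension.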
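The idea of replacing a wide, sparse matrix with a narrow factor can be sketched as follows. Note that this sketch substitutes a plain gradient-descent matrix factorization (in the spirit of PMF, which the text also lists) for LDA, purely to keep the example short and dependency-free. The toy matrix, K = 2, the learning rate, and the regularization strength are assumptions.

```python
import random


def factorize(X, k, steps=2000, lr=0.01, reg=0.01, seed=0):
    """Approximate X (n x m) by theta (n x k) times phi (k x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    theta = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n)]
    phi = [[rng.uniform(0.0, 0.1) for _ in range(m)] for _ in range(k)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                pred = sum(theta[i][f] * phi[f][j] for f in range(k))
                err = X[i][j] - pred
                for f in range(k):
                    t, p = theta[i][f], phi[f][j]
                    # Gradient step on squared error with L2 regularization.
                    theta[i][f] += lr * (err * p - reg * t)
                    phi[f][j] += lr * (err * t - reg * p)
    return theta, phi


def squared_error(X, theta, phi):
    """Sum of squared differences between X and theta times phi."""
    k = len(phi)
    return sum(
        (X[i][j] - sum(theta[i][f] * phi[f][j] for f in range(k))) ** 2
        for i in range(len(X))
        for j in range(len(X[0]))
    )


# Hypothetical 4 users x 4 base stations contact matrix, half of it zeros.
X = [
    [4.0, 4.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 1.0, 1.0],
]
theta, phi = factorize(X, k=2)
untrained_theta, untrained_phi = factorize(X, k=2, steps=0)
print(squared_error(X, untrained_theta, untrained_phi), squared_error(X, theta, phi))
```

Each row of `theta` is a K-dimensional representation of one user and plays the role of the reduced location activity feature data that is passed to the classifier.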
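For the social network feature data, the second processing subunit 4032 may calculate the user's influence in the social network from the user's social network feature data. In a social network, the other users that communicate with a given user are generally concentrated in a few fixed users, so the matrix representing the user's social network feature data is likewise sparse, with most elements equal to 0. The second processing subunit 4032 then calculates the user's influence in the social network with a preset influence propagation algorithm. Influence propagation algorithms include but are not limited to the PageRank algorithm, the Hypertext-Induced Topic Search (HITS) algorithm, and the Random Walk algorithm.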
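As an illustration of an influence propagation algorithm, the sketch below runs a weighted PageRank over a small call graph. The edge layout `(caller, callee, minutes)`, the damping factor of 0.85, the fixed iteration count, and the uniform redistribution of rank from users with no outgoing calls are assumptions; the text names PageRank but does not fix these details.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank; edges are (source, target, weight) tuples."""
    nodes = sorted({u for u, _, _ in edges} | {v for _, v, _ in edges})
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for u, _, w in edges:
        out_weight[u] += w
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for u, v, w in edges:
            new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank


# Hypothetical call records: (caller, callee, minutes of communication).
edges = [
    ("a", "hub", 10.0),
    ("b", "hub", 5.0),
    ("c", "hub", 8.0),
    ("hub", "a", 2.0),
]
rank = pagerank(edges)
print(rank)
```

The resulting score for each user can serve as the influence value that is fed to the classifier alongside the other features.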
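The third processing subunit 4033 inputs the user's service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the trained classifier for calculation and outputs the calculation result, which is the off-network prediction result for the user.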
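The final step can be sketched as concatenating the three groups of features into one vector and passing it to the trained classifier. Everything here is illustrative: the function names, the 0.5 decision threshold, and the stand-in classifier (which returns a constant in place of a real trained model) are assumptions.

```python
def assemble_features(consumption, reduced_location, influence):
    """Concatenate the three feature groups into a single input vector."""
    return list(consumption) + list(reduced_location) + [float(influence)]


def predict_off_network(classifier, consumption, reduced_location, influence,
                        threshold=0.5):
    """Run the classifier and return the off-network prediction result."""
    features = assemble_features(consumption, reduced_location, influence)
    probability = classifier(features)
    return {"probability": probability, "leaves_network": probability >= threshold}


def stand_in_classifier(features):
    """Placeholder for a trained model; always returns the same probability."""
    return 0.25


result = predict_off_network(
    stand_in_classifier,
    consumption=[35.0, 1.2, 88.0],      # e.g. call minutes, data volume, spend
    reduced_location=[0.7, 0.1, 0.2],   # K-dimensional reduced location data
    influence=0.04,                     # influence score in the social network
)
print(result)
```

In practice the callable would be a model produced by the classifier training unit, and the feature groups would come from the acquiring unit and the first and second processing subunits.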
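During this calculation, the larger the user's service consumption feature data, the lower the probability in the calculation result that the user leaves the network; the greater the user's influence in the social network, the lower that probability. When the user communicates with different base stations in the same network, the smaller the data relating to the user's communication with base stations of poorer communication quality, the lower the probability that the user leaves the network. When the user communicates with base stations in different networks, the larger the data relating to the user's communication with a base station, the lower the probability that the user leaves the network to which that base station belongs.
The reasoning is as follows. Larger service consumption feature data indicates a higher cost of leaving the network, so the user will not leave readily. Likewise, a user with greater influence in the social network faces a higher cost of leaving and will not leave readily either. The user's location activity feature data reveals which base stations the user communicates with, together with other related data. Consider first a user who communicates with different base stations in the same network, for example base stations A, B and C, where prior surveys and statistics show that base station A has better communication quality than base station B, and base station B better than base station C. If the user frequently communicates with the poor-quality base station C, the user experiences poor service, which eventually leads the user to leave the network; conversely, if the user frequently communicates with base station A, the user experiences good service and the probability of leaving the network falls. Now consider a user who communicates with base stations in different networks. For example, within the preset time period the user has communicated with base station A in network X (the communication network at location X) and with base station B in network Y (the communication network at location Y). If the duration and frequency of communication with base station A have both decreased compared with before, while those with base station B have both increased, the user may have moved from location X to location Y, and the probability that the user will leave network X increases.
In this embodiment, the acquiring unit acquires the user's service consumption feature data, location activity feature data and social network feature data, and the processing unit inputs these three types of data into the classifier to predict whether the user will leave the network. Compared with the prior art, this embodiment of the present invention uses the three types of data to characterize the user's off-network behavior comprehensively and predicts on that basis, so the prediction result is more reliable and accurate.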
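Referring now to FIG. 5, which is a schematic diagram of another embodiment of the apparatus for predicting that a user will leave the network. The apparatus 500 of this embodiment may be used to implement the method provided by the foregoing embodiments. In practice, the apparatus 500 may be integrated into an electronic device, which may be a computer or the like. Specifically:
The apparatus 500 may include components such as an RF (Radio Frequency) circuit 510, a memory 520 including one or more computer-readable storage media, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a WiFi (wireless fidelity) module 570, a processor 580 including one or more processing cores, and a power supply 590. A person skilled in the art will understand that the structure shown in FIG. 5 does not limit the apparatus 500, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Specifically: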
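The RF circuit 510 may be used to receive and send signals while messages are being sent and received or during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 580 for processing; it also sends uplink data to the base station. Typically, the RF circuit 510 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. The RF circuit 510 may also communicate with networks and other devices by wireless communication, which may use any communication standard or protocol, including but not limited to GSM (Global System of Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, and SMS (Short Messaging Service).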
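The memory 520 may be used to store software programs and modules; the processor 580 executes various functional applications and data processing by running the software programs and modules stored in the memory 520. The memory 520 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the storage device (such as audio data or a phone book). In addition, the memory 520 may include a high-speed random access memory, and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, the memory 520 may further include a memory controller to provide the processor 580 and the input unit 530 with access to the memory 520.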
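The input unit 530 may be used to receive input digit or character information and to generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control. Specifically, the input unit 530 may include a touch-sensitive surface 531 and other input devices 532. The touch-sensitive surface 531, also called a touch display screen or touchpad, can collect touch operations by the user on or near it (for example, operations performed on or near the touch-sensitive surface 531 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected apparatus according to a preset program. Optionally, the touch-sensitive surface 531 may include two parts, a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch and the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends these to the processor 580, and can receive and execute commands sent by the processor 580. The touch-sensitive surface 531 may be implemented in resistive, capacitive, infrared, surface acoustic wave, and other types. Besides the touch-sensitive surface 531, the input unit 530 may include other input devices 532, which may include but are not limited to one or more of a physical keyboard, function keys (such as volume control keys and an on/off key), a trackball, a mouse, and a joystick.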
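The display unit 540 may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the apparatus, which may be composed of graphics, text, icons, video, and any combination of these. The display unit 540 may include a display panel 541, which may optionally be configured as an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Further, the touch-sensitive surface 531 may cover the display panel 541. When the touch-sensitive surface 531 detects a touch operation on or near it, it passes the operation to the processor 580 to determine the type of touch event, and the processor 580 then provides the corresponding visual output on the display panel 541 according to the type of touch event. Although in FIG. 5 the touch-sensitive surface 531 and the display panel 541 implement the input and output functions as two separate components, in some embodiments the touch-sensitive surface 531 and the display panel 541 may be integrated to implement the input and output functions.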
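The apparatus 500 may further include at least one sensor 550, such as a light sensor, a motion sensor, or another sensor. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 541 according to the ambient light, and the proximity sensor can turn off the display panel 541 and/or the backlight when the apparatus 500 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity; it can be used in applications that recognize the attitude of the apparatus (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that the apparatus 500 may also be equipped with, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described further here.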
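The audio circuit 560, a speaker 561, and a microphone 562 can provide an audio interface between the user and the apparatus. The audio circuit 560 can convert received audio data into an electrical signal and transmit it to the speaker 561, which converts it into a sound signal for output. Conversely, the microphone 562 converts a collected sound signal into an electrical signal, which the audio circuit 560 receives and converts into audio data; the audio data is output to the processor 580 for processing and then sent through the RF circuit 510 to, for example, another apparatus, or output to the memory 520 for further processing. The audio circuit 560 may also include an earphone jack to provide communication between a peripheral headset and the apparatus.
WiFi is a short-range wireless transmission technology. Through the WiFi module 570, the apparatus 500 can help the user send and receive e-mail, browse web pages, and access streaming media; it provides the user with wireless broadband Internet access. Although FIG. 5 shows the WiFi module 570, it is understood that the module is not an essential part of the apparatus and may be omitted as needed without changing the essence of the invention.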
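The processor 580 is the control center of the apparatus. It connects all parts of the apparatus through various interfaces and lines, and performs the various functions of the storage device and processes data by running or executing the software programs and/or modules stored in the memory 520 and calling the data stored in the memory 520, thereby monitoring the storage device as a whole. Optionally, the processor 580 may include one or more processing cores. Preferably, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It is understood that the modem processor need not be integrated into the processor 580.
The apparatus 500 further includes a power supply 590 (such as a battery) that powers the components. Preferably, the power supply may be logically connected to the processor 580 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 590 may also include any of one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and similar components.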
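Although not shown, the apparatus 500 may further include a camera, a Bluetooth module, and the like, which are not described further here. Specifically, in this embodiment the apparatus 500 includes the memory 520 and one or more programs, where the one or more programs are stored in the memory 520 and are configured to be executed by one or more processors 580; the one or more programs contain instructions for performing the following operations:
acquiring the user's service consumption feature data, location activity feature data and social network feature data within a first preset time period, where the location activity feature data is data relating to the user's communication with each base station within the first preset time period, and the social network feature data is data relating to the user's communication with other users in a social network within the first preset time period; and
inputting the acquired service consumption feature data, location activity feature data and social network feature data into a pre-trained classifier for calculation and outputting a calculation result, where the calculation result is the off-network prediction result for the user.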
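Optionally, acquiring the user's location activity feature data within the first preset time period includes:
extracting the user's location activity feature data from a location activity feature matrix, where the location activity feature matrix is a matrix formed from the data relating to the communication between each user and each base station within the first preset time period.
Optionally, acquiring the user's social network feature data within the first preset time period includes:
extracting the user's social network feature data from a social network feature matrix, where the social network feature matrix is a matrix formed from the data relating to the mutual communication of the users in the social network within the first preset time period.
Optionally, after the user's service consumption feature data, location activity feature data and social network feature data within the first preset time period are acquired, the method further includes:
reducing the dimension of the location activity feature data to a preset dimension, and calculating the user's influence in the social network from the social network feature data;
and inputting the acquired service consumption feature data, location activity feature data and social network feature data into the pre-trained classifier for calculation and outputting the calculation result includes:
inputting the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier for calculation and outputting the calculation result.
Optionally, in the process of inputting the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier for calculation, the larger the service consumption feature data, the lower the probability that the user leaves the network; the greater the user's influence in the social network, the lower the probability that the user leaves the network; when the user communicates with different base stations in the same network, the smaller the data relating to the user's communication with base stations of poorer communication quality, the lower the probability that the user leaves the network; and when the user communicates with base stations in different networks, the larger the data relating to the user's communication with a base station, the lower the probability that the user leaves the network to which that base station belongs.
Optionally, before the user's service consumption feature data, location activity feature data and social network feature data within the first preset time period are acquired, the method further includes training the classifier, specifically as follows:
taking each user's service consumption feature data, location activity feature data and social network feature data within a second preset time period as a first input of the classifier, taking each user's current network status as a second input of the classifier, and training on the first input and the second input with a preset algorithm to obtain the classifier, where the second preset time period is longer than the first preset time period, and the preset algorithm includes a random forest algorithm, a support vector machine algorithm, a deep neural network algorithm, and a logistic regression algorithm.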
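It should be noted that the apparatus 500 provided by this embodiment of the present invention may also be used to implement the other functions of the foregoing apparatus embodiments, which are not described again here.
It should also be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines. A person of ordinary skill in the art can understand and implement this without creative effort.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, and of course may also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function may vary, for example analog circuits, digital circuits, or dedicated circuits. For the present invention, however, a software implementation is in most cases the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The method and apparatus for predicting that a user will leave the network, as provided by the embodiments of the present invention, have been described in detail above. A person of ordinary skill in the art may, following the ideas of the embodiments of the present invention, make changes to the specific implementations and the scope of application. Therefore, the content of this specification should not be construed as limiting the present invention.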
Claims (12)
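- A method for predicting that a user will leave a network, characterized by comprising: acquiring service consumption feature data, location activity feature data and social network feature data of a user within a first preset time period, wherein the location activity feature data is data relating to the user's communication with each base station within the first preset time period, and the social network feature data is data relating to the user's communication with other users in a social network within the first preset time period; and inputting the acquired service consumption feature data, location activity feature data and social network feature data into a pre-trained classifier for calculation and outputting a calculation result, wherein the calculation result is an off-network prediction result for the user.
- The method according to claim 1, characterized in that acquiring the location activity feature data of the user within the first preset time period comprises: extracting the location activity feature data of the user from a location activity feature matrix, wherein the location activity feature matrix is a matrix formed from data relating to the communication between each user and each base station within the first preset time period.
- The method according to claim 1 or 2, characterized in that acquiring the social network feature data of the user within the first preset time period comprises: extracting the social network feature data of the user from a social network feature matrix, wherein the social network feature matrix is a matrix formed from data relating to the mutual communication of the users in the social network within the first preset time period.
- The method according to any one of claims 1 to 3, characterized in that, after the service consumption feature data, location activity feature data and social network feature data of the user within the first preset time period are acquired, the method further comprises: reducing the dimension of the location activity feature data to a preset dimension, and calculating the influence of the user in the social network from the social network feature data; and inputting the acquired service consumption feature data, location activity feature data and social network feature data into the pre-trained classifier for calculation and outputting the calculation result comprises: inputting the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier for calculation and outputting the calculation result.
- The method according to claim 4, characterized in that, in the process of inputting the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier for calculation, the larger the service consumption feature data, the lower the probability that the user leaves the network; the greater the influence of the user in the social network, the lower the probability that the user leaves the network; when the user communicates with different base stations in the same network, the smaller the data relating to the user's communication with base stations of poorer communication quality, the lower the probability that the user leaves the network; and when the user communicates with base stations in different networks, the larger the data relating to the user's communication with a base station, the lower the probability that the user leaves the network to which that base station belongs.
- The method according to any one of claims 1 to 5, characterized in that, before the service consumption feature data, location activity feature data and social network feature data of the user within the first preset time period are acquired, the method further comprises training the classifier, specifically as follows: taking the service consumption feature data, location activity feature data and social network feature data of each user within a second preset time period as a first input of the classifier, taking the current network status of each user as a second input of the classifier, and training on the first input and the second input with a preset algorithm to obtain the classifier, wherein the second preset time period is longer than the first preset time period, and the preset algorithm comprises: a random forest algorithm, a support vector machine algorithm, a deep neural network algorithm, and a logistic regression algorithm.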
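- An apparatus for predicting that a user will leave a network, characterized by comprising: an acquiring unit, configured to acquire service consumption feature data, location activity feature data and social network feature data of a user within a first preset time period, wherein the location activity feature data is data relating to the user's communication with each base station within the first preset time period, and the social network feature data is data relating to the user's communication with other users in a social network within the first preset time period; and a processing unit, configured to input the service consumption feature data, location activity feature data and social network feature data acquired by the acquiring unit into a pre-trained classifier for calculation and to output a calculation result, wherein the calculation result is an off-network prediction result for the user.
- The apparatus according to claim 7, characterized in that the acquiring unit acquiring the location activity feature data of the user within the first preset time period comprises: the acquiring unit extracting the location activity feature data of the user from a location activity feature matrix, wherein the location activity feature matrix is a matrix formed from data relating to the communication between each user and each base station within the first preset time period.
- The apparatus according to claim 7 or 8, characterized in that the acquiring unit acquiring the social network feature data of the user within the first preset time period comprises: the acquiring unit extracting the social network feature data of the user from a social network feature matrix, wherein the social network feature matrix is a matrix formed from data relating to the mutual communication of the users in the social network within the first preset time period.
- The apparatus according to any one of claims 7 to 9, characterized in that the processing unit comprises: a first processing subunit, configured to reduce the dimension of the location activity feature data acquired by the acquiring unit to a preset dimension; a second processing subunit, configured to calculate the influence of the user in the social network from the social network feature data acquired by the acquiring unit; and a third processing subunit, configured to input the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier for calculation and to output the calculation result.
- The apparatus according to claim 9, characterized in that, in the process in which the third processing subunit inputs the service consumption feature data, the location activity feature data reduced to the preset dimension, and the calculated influence of the user in the social network into the pre-trained classifier for calculation, the larger the service consumption feature data, the lower the probability that the user leaves the network; the greater the influence of the user in the social network, the lower the probability that the user leaves the network; when the user communicates with different base stations in the same network, the smaller the data relating to the user's communication with base stations of poorer communication quality, the lower the probability that the user leaves the network; and when the user communicates with base stations in different networks, the larger the data relating to the user's communication with a base station, the lower the probability that the user leaves the network to which that base station belongs.
- The apparatus according to any one of claims 7 to 11, characterized in that the apparatus further comprises a classifier training unit, configured to train the classifier, the classifier training unit being specifically configured to: take the service consumption feature data, location activity feature data and social network feature data of each user within a second preset time period as a first input of the classifier, take the current network status of each user as a second input of the classifier, and train on the first input and the second input with a preset algorithm to obtain the classifier, wherein the second preset time period is longer than the first preset time period, and the preset algorithm comprises: a random forest algorithm, a support vector machine algorithm, a deep neural network algorithm, and a logistic regression algorithm.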
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/392,698 US20170109756A1 (en) | 2014-07-30 | 2016-12-28 | User Unsubscription Prediction Method and Apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410371307.2A CN105447583A (zh) | 2014-07-30 | 2014-07-30 | Method and apparatus for predicting that a user will leave the network |
CN201410371307.2 | 2014-07-30 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/392,698 Continuation US20170109756A1 (en) | 2014-07-30 | 2016-12-28 | User Unsubscription Prediction Method and Apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016015471A1 (zh) | 2016-02-04 |
Family
ID=55216726
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/073872 WO2016015471A1 (zh) | 2014-07-30 | 2015-03-09 | Method and apparatus for predicting that a user will leave the network |
Country Status (3)
Country | Link |
---|---|
US (1) | US20170109756A1 (zh) |
CN (1) | CN105447583A (zh) |
WO (1) | WO2016015471A1 (zh) |
- 2014-07-30: CN application CN201410371307.2A (publication CN105447583A), status: Pending
- 2015-03-09: WO application PCT/CN2015/073872 (publication WO2016015471A1), status: Application Filing
- 2016-12-28: US application US15/392,698 (publication US20170109756A1), status: Abandoned
Also Published As
Publication number | Publication date |
---|---|
US20170109756A1 (en) | 2017-04-20 |
CN105447583A (zh) | 2016-03-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15827607 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 15827607 Country of ref document: EP Kind code of ref document: A1 |