CN112153636A - Method for predicting number portability and roll-out of telecommunication industry user based on machine learning - Google Patents

Method for predicting number portability and roll-out of telecommunication industry user based on machine learning

Publication number
CN112153636A
Authority
CN
China
Prior art keywords
machine learning
prediction
predicting
prediction model
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011178646.0A
Other languages
Chinese (zh)
Inventor
吴勇
严伟强
钟宏泽
王凯
李纺
梁建斌
陈一蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Hongcheng Computer Systems Co Ltd
Original Assignee
Zhejiang Hongcheng Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Hongcheng Computer Systems Co Ltd
Priority to CN202011178646.0A
Publication of CN112153636A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W8/00: Network data management
    • H04W8/26: Network addressing or numbering for mobility support
    • H04W8/28: Number portability; Network address portability
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to the technical field of telecommunications, and in particular to a machine-learning-based method for predicting number portability roll-out (port-out) of telecommunication users, comprising the following steps: 1) collecting characteristic variable data, preprocessing it, storing it in a database, sampling from the database, and controlling the ratio of positive to negative samples at 1:10; 2) randomly dividing the samples into a training set and a test set; 3) constructing a prediction model based on the XGBoost algorithm, and training it on the training set to obtain predicted probability values and feature importances; 4) using the trained model to predict on the test set, evaluating the prediction model according to the prediction results, and performing optimization iteration on the prediction model if the evaluation result is lower than a threshold. The beneficial effects of the invention are that prediction efficiency is improved, early warning and timely retention are achieved, and the prediction model can be dynamically optimized and iterated.

Description

Method for predicting number portability and roll-out of telecommunication industry user based on machine learning
Technical Field
The invention relates to the technical field of telecommunications, and in particular to a machine-learning-based method for predicting number portability roll-out (port-out) of telecommunication users.
Background
In accordance with the requirements of the Ministry of Industry and Information Technology, the number portability service was fully implemented in November 2019. Its main content is that users can choose a suitable telecom operator of their own free will while keeping their original number, that is, switching networks without changing numbers.
From an operator's point of view, number portability has two parts: roll-out (port-out) and roll-in (port-in). Roll-out means that a user carries the original number from the local operator's network to another operator's network, and can be regarded as a case of a high-risk user leaving the network. Roll-in is the opposite.
Customer resources are the core competitiveness of telecom operators. How to reduce customer churn, lower the probability that customers port out, and reduce the resulting economic loss has become a major topic for telecom operators. Telecom enterprises actively use leading-edge technology and capital to become more intelligent, integrated and personalized, and to improve their competitiveness so as to maximize market share and profit. The method aims to address the decline in market share and revenue caused by users porting out, and at the same time to improve the retention success rate, lower the roll-out rate and reduce the revenue loss caused by users porting out.
Before number portability was launched, positive samples, that is, users who had actually ported out, were lacking, so most of the models built were based on traditional data mining methods. Most were non-machine-learning models, such as rule-based empirical analysis and expert scoring. Several network-switching scenarios formulated by the business side were statistically processed, analyzed and summarized, and then assigned different network-switching probabilities. These methods are coarse approximations with many uncertain factors, so the accuracy of predicting users who will switch networks is low, and early warning of user roll-out is difficult to achieve.
Disclosure of Invention
The invention aims to overcome the above defects and to provide a machine-learning-based method for predicting number portability roll-out of telecommunication users, so that accurate and effective early warning can be given in advance.
The invention achieves the above object through the following technical solution: a method for predicting number portability roll-out of telecommunication industry users based on machine learning, comprising the following steps:
1) collecting characteristic variable data, preprocessing it, storing it in a database, sampling from the database, and controlling the ratio of positive to negative samples at 1:10;
2) randomly dividing the samples into a training set and a test set;
3) constructing a prediction model based on the XGBoost algorithm, and training it on the training set to obtain predicted probability values and feature importances;
4) using the trained model to predict on the test set, evaluating the prediction model according to the prediction results, and performing optimization iteration on the prediction model if the evaluation result is lower than a threshold.
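Steps 1) and 2) above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `sample_and_split`, the 20% test fraction and the fixed random seed are assumptions, since the text specifies only the 1:10 ratio and a random split.

```python
import random

def sample_and_split(positives, negatives, neg_per_pos=10, test_fraction=0.2, seed=0):
    """Undersample negatives to a 1:neg_per_pos ratio, then split at random.

    test_fraction and seed are illustrative assumptions; the source text
    only states the 1:10 ratio and that the split is random.
    """
    rng = random.Random(seed)
    # Keep at most neg_per_pos negatives for every positive sample.
    n_neg = min(len(negatives), neg_per_pos * len(positives))
    sampled = [(x, 1) for x in positives]
    sampled += [(x, 0) for x in rng.sample(negatives, n_neg)]
    rng.shuffle(sampled)
    n_test = int(len(sampled) * test_fraction)
    return sampled[n_test:], sampled[:n_test]  # (training set, test set)

# Toy data: 20 users who ported out and 1000 who did not.
positives = [f"pos_{i}" for i in range(20)]
negatives = [f"neg_{i}" for i in range(1000)]
train, test = sample_and_split(positives, negatives)
```

Undersampling the majority class before splitting keeps the same class ratio in expectation in both sets; whether the test set should instead keep the natural class ratio is a design choice the text does not address.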
Preferably, the characteristic variables cover features of multiple dimensions of existing users, including basic attributes, behavior data, package information, consumption features, terminal information and derived variables.
Preferably, the preprocessing comprises data cleaning and data conversion, wherein the data cleaning comprises correcting erroneous values, filling missing values and normalizing data types.
Preferably, missing values are filled with the median or with zero.
Preferably, a positive sample represents a user who ports out, and a negative sample represents a user who does not port out.
Preferably, five-fold cross-validation and grid search are used to obtain the optimal solution of the algorithm model.
Preferably, the algorithm model is evaluated using the F1 value and the AUC value.
Preferably, the threshold is set to 0.5.
Preferably, the optimization iteration comprises the following method: the negative samples in the training set are divided into N equal parts; the positive sample set is combined with each part of the negative samples to form N small training sets; each small training set is trained with the XGBoost algorithm to obtain N base models; and the average of the outputs of the N base models is calculated.
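The five-fold cross-validation and grid search mentioned above can be sketched in plain Python. Everything specific here is an assumption for illustration: the helper names, the hyperparameter names `max_depth` and `learning_rate` (the text does not say which parameters are tuned), and the toy scoring function, which stands in for "train a model on the training folds and score it on the validation fold".

```python
import itertools
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]

def grid_search(param_grid, n_samples, fit_and_score, k=5):
    """Return (best_params, best_mean_score) over all parameter combinations."""
    best_params, best_score = None, float("-inf")
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        # Average the validation score over the k folds.
        scores = [fit_and_score(params, tr, va)
                  for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Stand-in scorer (NOT a real model): it ignores the data and simply
# peaks at max_depth=4, learning_rate=0.1 so the search has an optimum.
def toy_fit_and_score(params, train_idx, valid_idx):
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)

grid = {"max_depth": [2, 4, 6], "learning_rate": [0.05, 0.1, 0.3]}
best_params, best_score = grid_search(grid, n_samples=100,
                                      fit_and_score=toy_fit_and_score)
```

In practice `fit_and_score` would train the prediction model on `train_idx` and return, for example, the F1 or AUC value on `valid_idx`.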
The beneficial effects of the invention are as follows: compared with traditional non-machine-learning methods, the number portability roll-out of users can be predicted more accurately, enabling early warning and timely retention; the prediction model can be dynamically optimized and iterated, which improves prediction efficiency.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the composition of characteristic variables of the present invention;
FIG. 3 is a schematic flow chart of optimization iteration in the method of the present invention.
Detailed Description
The invention is further described below with reference to a specific embodiment, but the scope of protection of the invention is not limited thereto:
Embodiment: as shown in FIG. 1, a method for predicting number portability roll-out of telecommunication industry users based on machine learning includes the following steps:
1) Collecting characteristic variable data, preprocessing it, storing it in a database, sampling from the database, and controlling the ratio of positive to negative samples at 1:10. Positive samples represent users who port out, and negative samples represent users who do not port out.
The criterion for the target variable, that is, a positive sample, is defined as follows.
Criterion for actual number portability roll-out:
taking a target user in month n as an example, the user must satisfy one of the following:
the user ports out in month n+1, month n+2 or month n+3.
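The labelling rule above can be written as a small function. This is a sketch under assumptions: the names `month_index` and `is_positive_sample` are hypothetical, and months are encoded as integers so that the three-month window works across a year boundary.

```python
def month_index(year, month):
    """Encode a calendar month as an integer so months can be subtracted."""
    return year * 12 + (month - 1)

def is_positive_sample(observation_month, port_out_month, horizon=3):
    """True if the user ported out 1..horizon months after the observation month.

    port_out_month is None for a user who never ported out.
    """
    if port_out_month is None:
        return False
    return 1 <= port_out_month - observation_month <= horizon

# Observation month n = November 2020 (an arbitrary example date).
n = month_index(2020, 11)
labels = {
    "ported in n+1": is_positive_sample(n, month_index(2020, 12)),
    "ported in n+3": is_positive_sample(n, month_index(2021, 2)),
    "ported in n+4": is_positive_sample(n, month_index(2021, 3)),
    "never ported": is_positive_sample(n, None),
}
```

A user who ports out in month n+4 or later falls outside the window and is treated as a negative sample for observation month n.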
As shown in FIG. 2, the characteristic variables cover features of multiple dimensions of existing users. Starting from six dimensions (basic attributes, behavior data, package information, consumption features, terminal information and derived variables), they are progressively subdivided into 151 characteristic variables. Univariate analysis is performed on some key variables to measure their relationship with user roll-out and to check whether they conform to actual business rules, and the required characteristic variables are thereby determined.
After the required characteristic variables are obtained, the data are preprocessed; preprocessing comprises data cleaning and data conversion. Missing numerical values are filled with zero or with the median: for example, a missing user age can be filled with the median, and a missing charge amount is filled with zero. If a feature is missing in more than ninety percent of records, the feature is removed. Categorical features, such as the handset terminal model, the user's package name and the user's traffic and call trend, are converted with one-hot encoding so that the data suit the algorithm model.
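The cleaning and conversion rules in the preceding paragraph can be sketched on a list of dict records. The function name `preprocess`, the column names and the use of `None` to mark a missing value are assumptions made for this example; only the rules themselves (median or zero filling, dropping features missing in more than 90% of records, one-hot encoding) come from the text.

```python
from statistics import median

def preprocess(rows, median_fill, zero_fill, categorical, max_missing_ratio=0.9):
    """Clean a list of dict records; None marks a missing value."""
    columns = list(rows[0])
    n = len(rows)
    # 1. Drop features whose share of missing values exceeds the limit.
    kept = [c for c in columns
            if sum(r[c] is None for r in rows) / n <= max_missing_ratio]
    # 2. Medians are computed from the observed (non-missing) values only.
    medians = {c: median(r[c] for r in rows if r[c] is not None)
               for c in median_fill if c in kept}
    # 3. Collect the levels of each categorical feature for one-hot encoding.
    levels = {c: sorted({r[c] for r in rows if r[c] is not None})
              for c in categorical if c in kept}
    cleaned = []
    for r in rows:
        rec = {}
        for c in kept:
            value = r[c]
            if c in levels:
                # One 0/1 indicator per level; a missing category gives all zeros.
                for level in levels[c]:
                    rec[f"{c}={level}"] = 1 if value == level else 0
            elif value is None and c in medians:
                rec[c] = medians[c]
            elif value is None and c in zero_fill:
                rec[c] = 0
            else:
                rec[c] = value
        cleaned.append(rec)
    return cleaned

rows = [
    {"age": 30, "charge_amount": 50.0, "terminal": "A", "mostly_missing": None},
    {"age": None, "charge_amount": None, "terminal": "B", "mostly_missing": None},
    {"age": 40, "charge_amount": 20.0, "terminal": "A", "mostly_missing": None},
]
clean = preprocess(rows, median_fill={"age"}, zero_fill={"charge_amount"},
                   categorical={"terminal"})
```

In this toy input the second record receives the median age and a zero charge amount, and the always-missing column is dropped.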
2) Randomly dividing the samples into a training set and a test set;
3) Selecting the XGBoost algorithm as the prediction model, obtaining the loss function and the predicted probability values, and completing the training of the prediction model, where the loss function represents the degree of inconsistency between the users predicted to port out and the users who actually port out;
The objective function is

$$Obj = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k)$$

where $\hat{y}_i$ is the model's predicted value, that is, the predicted probability that the user ports out, $y_i$ is the class label of the $i$-th sample, $K$ is the number of trees, and $f_k$ is the $k$-th tree model. $l(\hat{y}_i, y_i)$ is the loss function, which in the present invention is the degree of inconsistency between the predicted and the actual roll-out users and measures the goodness of fit. The smaller the loss function value, the higher the model robustness and the better the fitting effect. $\sum_{k=1}^{K} \Omega(f_k)$, the sum of the complexities of the $K$ trees, is the regularization term, namely the complexity of the roll-out prediction model.
The complexity $\Omega$ is defined as follows:

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$

$T$ is the number of leaf nodes of each tree, $\omega$ is the set of leaf-node scores of each tree, and $\gamma$ and $\lambda$ are adjustable coefficients.
Finally, the objective function is simplified and approximated with a second-order Taylor expansion to obtain the optimal solution.
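As a worked illustration of the second-order approximation, the sketch below uses the standard XGBoost result: with G and H the sums of the first and second derivatives of the loss over the samples in a leaf, the optimal leaf score is -G / (H + lambda), and the minimized objective is -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T. The choice of logistic loss is an assumption (the text does not name the loss), and the function names are hypothetical.

```python
import math

def logistic_grad_hess(y_true, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y_true, p * (1.0 - p)

def optimal_leaf_weight(grads, hessians, lam):
    """w* = -G / (H + lambda) for the samples that fall into one leaf."""
    return -sum(grads) / (sum(hessians) + lam)

def structure_score(leaves, lam, gamma):
    """Minimized objective: -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    total = 0.0
    for grads, hessians in leaves:
        G, H = sum(grads), sum(hessians)
        total += -0.5 * G * G / (H + lam)
    return total + gamma * len(leaves)

def leaf_stats(labels, raw_score=0.0):
    """Gradients and hessians for one leaf, all samples at the same raw score."""
    pairs = [logistic_grad_hess(y, raw_score) for y in labels]
    return [g for g, _ in pairs], [h for _, h in pairs]

# A toy tree with T = 2 leaves, starting from raw score 0 (probability 0.5).
leaf_a = leaf_stats([1, 1, 0])   # mostly users who ported out
leaf_b = leaf_stats([0, 0])      # users who stayed
w_a = optimal_leaf_weight(*leaf_a, lam=1.0)
w_b = optimal_leaf_weight(*leaf_b, lam=1.0)
score = structure_score([leaf_a, leaf_b], lam=1.0, gamma=0.1)
```

The leaf dominated by positive samples gets a positive score and the other a negative one; larger lambda shrinks both toward zero, and gamma adds a fixed cost per leaf.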
4) Using the trained prediction model to predict on the test set and evaluating the prediction model according to the prediction results. In this embodiment, the evaluation indices are the F1 value, AUC, recall and accuracy. If an evaluation index is lower than the threshold, then, considering the influence of the number of collected samples and of the model features on model performance, the model is optimized as shown in FIG. 3: the negative samples in the training set are divided into N equal parts; the positive sample set is combined with each part of the negative samples to form N small training sets; each small training set is trained with the XGBoost algorithm to obtain N base models; and the average of the outputs of the N base models is calculated.
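The optimization iteration described above (N negative partitions, N base models, averaged outputs) can be sketched generically. The base learner here is a deliberately simple stand-in, not XGBoost: `train_midpoint_model` is a hypothetical one-feature model used only so that the example runs with the standard library. In the method described, each call to `train_fn` would train an XGBoost model instead.

```python
import math
import random

def split_equal_parts(items, n_parts, seed=0):
    """Shuffle and divide items into n_parts parts of (nearly) equal size."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return [items[i::n_parts] for i in range(n_parts)]

def train_partition_ensemble(positives, negatives, n_parts, train_fn):
    """Train one base model per small training set (all positives + one negative part)."""
    models = []
    for part in split_equal_parts(negatives, n_parts):
        X = list(positives) + part
        y = [1] * len(positives) + [0] * len(part)
        models.append(train_fn(X, y))
    return models

def predict_average(models, x):
    """Average the output values of the N base models."""
    return sum(model(x) for model in models) / len(models)

# Stand-in base learner (NOT XGBoost): a sigmoid centred midway between the
# class means of a single numeric feature. Assumes positives have larger values.
def train_midpoint_model(X, y):
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    midpoint = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 / (1.0 + math.exp(-(x - midpoint)))

positives = [4.0, 5.0, 6.0]
negatives = [float(i % 3) - 1.0 for i in range(30)]   # values in {-1, 0, 1}
models = train_partition_ensemble(positives, negatives, n_parts=3,
                                  train_fn=train_midpoint_model)
high = predict_average(models, 6.0)
low = predict_average(models, -1.0)
```

Because every base model sees all positives but only one Nth of the negatives, each small training set is less imbalanced than the full training set, while averaging still uses every negative sample once.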
The various indices of the model on the test set are as follows:
AUC value (Train): 0.966670
Recall (Train): 0.590293
Accuracy (Train): 0.942627
F1 value (Train): 0.724977
A reasonable threshold is set according to the scale specified by the business side. If the business side needs to retain a large number of users, a smaller threshold can be chosen; if the business side prefers accuracy, a larger threshold can be chosen.
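The trade-off described above can be illustrated by computing the evaluation indices at two cut-offs. This sketch reads the threshold as the cut-off applied to the predicted probability, which is an interpretation of the text; the scores and labels are made-up toy values, and the function names are hypothetical. Precision is reported alongside accuracy because the two are easily confused in translation.

```python
def metrics_at(y_true, scores, threshold=0.5):
    """Confusion-matrix metrics when users with score >= threshold are flagged."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"flagged": tp + fp, "precision": precision, "recall": recall,
            "f1": f1, "accuracy": (tp + tn) / len(y_true)}

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy labels (1 = ported out) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

at_050 = metrics_at(y_true, scores, threshold=0.5)
at_025 = metrics_at(y_true, scores, threshold=0.25)
area = auc(y_true, scores)
```

On this toy data, lowering the cut-off from 0.5 to 0.25 flags more users and raises recall while lowering precision, which matches the qualitative guidance in the text.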
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A method for predicting number portability roll-out of telecommunication industry users based on machine learning, characterized by comprising the following steps:
1) collecting characteristic variable data, preprocessing it, storing it in a database, sampling from the database, and controlling the ratio of positive to negative samples at 1:10;
2) randomly dividing the samples into a training set and a test set;
3) constructing a prediction model based on the XGBoost algorithm, and training it on the training set to obtain predicted probability values and feature importances;
4) using the trained model to predict on the test set, evaluating the prediction model according to the prediction results, and performing optimization iteration on the prediction model if the evaluation result is lower than a threshold.
2. The method for predicting number portability roll-out of telecommunication industry users based on machine learning according to claim 1, wherein the characteristic variables cover features of multiple dimensions of existing users, including basic attributes, behavior data, package information, consumption features, terminal information and derived variables.
3. The method for predicting number portability roll-out of telecommunication industry users based on machine learning according to claim 2, wherein the preprocessing comprises data cleaning and data conversion, the data cleaning comprises correcting erroneous values, and missing values are filled with zero.
4. The method for predicting number portability roll-out of telecommunication industry users based on machine learning according to claim 3, wherein missing values are filled with the median.
5. The method for predicting number portability roll-out of telecommunication industry users based on machine learning according to claim 3, wherein a positive sample represents a user who ports out and a negative sample represents a user who does not port out.
6. The method for predicting number portability roll-out of telecommunication industry users based on machine learning according to claim 5, wherein the optimal solution of the prediction model is obtained by five-fold cross-validation and grid search.
7. The method for predicting number portability roll-out of telecommunication industry users based on machine learning according to claim 6, wherein the F1 value and the AUC value are used to evaluate the prediction model.
8. The method for predicting number portability roll-out of telecommunication industry users based on machine learning according to claim 7, wherein the threshold is set to 0.5.
9. The method for predicting number portability roll-out of telecommunication industry users based on machine learning according to claim 8, wherein the optimization iteration comprises the following steps: the negative samples in the training set are divided into N equal parts; the positive sample set is combined with each part of the negative samples to form N small training sets; each small training set is trained with the XGBoost algorithm to obtain N base models; and the average of the outputs of the N base models is calculated.
CN202011178646.0A 2020-10-29 2020-10-29 Method for predicting number portability and roll-out of telecommunication industry user based on machine learning Pending CN112153636A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011178646.0A CN112153636A (en) 2020-10-29 2020-10-29 Method for predicting number portability and roll-out of telecommunication industry user based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011178646.0A CN112153636A (en) 2020-10-29 2020-10-29 Method for predicting number portability and roll-out of telecommunication industry user based on machine learning

Publications (1)

Publication Number Publication Date
CN112153636A (en) 2020-12-29

Family

ID=73953560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011178646.0A Pending CN112153636A (en) 2020-10-29 2020-10-29 Method for predicting number portability and roll-out of telecommunication industry user based on machine learning

Country Status (1)

Country Link
CN (1) CN112153636A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115134805A (en) * 2021-03-29 2022-09-30 中国移动通信集团福建有限公司 Method, device, equipment and storage medium for predicting potential carried-in different network numbers
CN115412421A (en) * 2022-08-30 2022-11-29 南京华苏科技有限公司 Unsatisfactory user early warning method based on CNN-LSTM model
CN115988475A (en) * 2022-12-20 2023-04-18 中国联合网络通信集团有限公司 Prediction method, equipment and storage medium of portable user
CN116033370A (en) * 2021-10-25 2023-04-28 中国移动通信集团广东有限公司 Method and device for processing number-carrying network transfer

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832581A (en) * 2017-12-15 2018-03-23 百度在线网络技术(北京)有限公司 Trend prediction method and device
CN109344201A (en) * 2018-10-17 2019-02-15 国网江苏省电力有限公司信息通信分公司 A kind of database performance load evaluation system and method based on machine learning
CN109451527A (en) * 2018-12-21 2019-03-08 广东宜通世纪科技股份有限公司 A kind of mobile communication subscriber is lost day granularity prediction technique and device
CN109558962A (en) * 2017-09-26 2019-04-02 中国移动通信集团山西有限公司 Predict device, method and storage medium that telecommunication user is lost
CN109636446A (en) * 2018-11-16 2019-04-16 北京奇虎科技有限公司 Customer churn prediction technique, device and electronic equipment
CN109886755A (en) * 2019-03-04 2019-06-14 深圳微品致远信息科技有限公司 A kind of communication user attrition prediction method and system based on evolution algorithm
US20190318202A1 (en) * 2016-10-31 2019-10-17 Tencent Technology (Shenzhen) Company Limited Machine learning model training method and apparatus, server, and storage medium
CN110472817A (en) * 2019-07-03 2019-11-19 西北大学 A kind of XGBoost of combination deep neural network integrates credit evaluation system and its method
CN110866767A (en) * 2018-08-27 2020-03-06 中国移动通信集团江西有限公司 Method, device, equipment and medium for predicting satisfaction degree of telecommunication user
US20200120003A1 (en) * 2018-10-10 2020-04-16 Sandvine Corporation System and method for predicting and reducing subscriber churn
CN111092762A (en) * 2019-12-19 2020-05-01 深圳市博瑞得科技有限公司 Prediction method, device and storage medium for number portability potential user
CN111242358A (en) * 2020-01-07 2020-06-05 杭州策知通科技有限公司 Enterprise information loss prediction method with double-layer structure
CN111275245A (en) * 2020-01-13 2020-06-12 宜通世纪物联网研究院(广州)有限公司 Potential network switching user identification method, system, message pushing method, device and medium
CN111582577A (en) * 2020-05-07 2020-08-25 北京思特奇信息技术股份有限公司 Method, system, medium and equipment for predicting off-network of telecommunication user

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
任新月;: "机器学习在电信客户离网预测中的应用", 信息通信, no. 05 *
李为康;杨小兵;: "一种基于双层融合结构的客户流失预测模型", 小型微型计算机系统, no. 08 *
沈江明;张磊;曾志勇;: "基于深度置信神经网络的电信客户流失分析", 通讯世界, no. 06 *
赵慧;刘颖慧;崔羽飞;张第;: "机器学习在运营商用户流失预警中的运用", 信息通信技术, no. 01 *
黄展正: "DG电信公司宽带用户流失的预警模型构建", 《中国优秀硕士学位论文全文数据库 经济与管理科学辑》, no. 05 *
龙克树;邓娟;刘晓斌;: "基于机器学习算法的运营商用户流失预判及应对策略研究", 信息记录材料, no. 05 *

Similar Documents

Publication Publication Date Title
CN112153636A (en) Method for predicting number portability and roll-out of telecommunication industry user based on machine learning
CN109492026B (en) Telecommunication fraud classification detection method based on improved active learning technology
CN107766929A (en) model analysis method and device
CN112054943B (en) Traffic prediction method for mobile network base station
CN110309967A (en) Prediction technique, system, equipment and the storage medium of customer service session grading system
CN109787821B (en) Intelligent prediction method for large-scale mobile client traffic consumption
CN112200375B (en) Prediction model generation method, prediction model generation device, and computer-readable medium
CN112149352B (en) Prediction method for marketing activity clicking by combining GBDT automatic characteristic engineering
CN107704868A (en) Tenant group clustering method based on Mobile solution usage behavior
CN114528395A (en) Risk prediction method for text word feature double-line attention fusion
CN112883062A (en) Self-defined rule checking method not based on rule
CN113780345A (en) Small sample classification method and system facing small and medium-sized enterprises and based on tensor attention
CN116245399A (en) Model training method and device, nonvolatile storage medium and electronic equipment
CN116579640A (en) Power marketing service channel user experience assessment method and system
CN105873119A (en) Method for classifying flow use behaviors of mobile network user groups
CN113486174A (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN109543571B (en) Intelligent identification and retrieval method for special-shaped processing characteristics of complex products
CN108763289B (en) Massive heterogeneous sensor format data analysis method
CN110955835A (en) Sharing platform information publishing system based on big data technology
CN114519343A (en) 95598-based repeated incoming call preprocessing method, device, equipment and storage medium
CN109919811B (en) Insurance agent culture scheme generation method based on big data and related equipment
CN114066506A (en) AI analysis algorithm for network behavior
CN113761897A (en) Text big data-based call center customer service work order entity identification method
CN112749841A (en) User public praise prediction method and system based on self-training learning
CN114492552A (en) Method, device and equipment for training broadband user authenticity judgment model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination