CN109242257B - 4G internet user complaint model based on key index correlation analysis - Google Patents


Info

Publication number: CN109242257B
Authority: CN (China)
Prior art keywords: model, complaint, user, data, success rate
Legal status: Active
Application number: CN201810902832.0A
Other languages: Chinese (zh)
Other versions: CN109242257A (en)
Inventor: 仇春芳
Current Assignee: Guangzhou Hantele Communication Co ltd
Original Assignee: Guangzhou Hantele Communication Co ltd
Application filed by Guangzhou Hantele Communication Co ltd; priority to CN201810902832.0A; published as CN109242257A; application granted and published as CN109242257B.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/01 Customer relationship services
    • G06Q30/015 Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
    • G06Q30/016 After-sales
    • G06Q50/40

Abstract

The invention discloses a 4G internet user complaint model based on key index correlation analysis. The method comprises the following steps: S1, finding a plurality of factors influencing user complaints by using a logistic regression model and establishing a decision tree model; S2, extracting user data, then integrating and processing the data; S3, performing a t-test on the user data to preliminarily identify the factors influencing complaints; S4, setting a training data set and establishing a logistic regression model in the R language, then optimizing the preliminary model by backward stepwise logistic regression according to the returned results to determine the final model; S5, performing a chi-square test on the model; and S6, predicting the test data set with the model and cross-tabulating the predicted results against the actual results. The model is used to identify the causes of complaints by 4G internet users and to deal preventively, in advance, with users who are likely to complain.

Description

4G internet user complaint model based on key index correlation analysis
Technical Field
The invention relates to the field of mobile communication, in particular to a 4G internet user complaint model based on key index correlation analysis.
Background
With the rapid development of the mobile internet, users' requirements on the mobile network keep rising. Reducing the volume of complaints and preventing potential complaints from 4G users in advance are important tasks for network staff: doing so lowers the number of 4G user complaints and improves 4G users' satisfaction with internet access.
Disclosure of Invention
The invention aims to remedy one or more of these shortcomings and provides a 4G internet user complaint model based on key index correlation analysis.
To achieve this purpose, the technical scheme is as follows:
A method for building a 4G internet user complaint model based on key index correlation analysis comprises the following steps:
S1: exploring the factors and conditions that influence complaining users, finding a plurality of factors influencing user complaints by using a logistic regression model, and establishing a decision tree model;
S2: extracting user data comprising the plurality of factors described in step S1, including original complaint user data and non-complaint user data; integrating and processing the data;
S3: performing a t-test on the original complaint user data and the non-complaint user data acquired in step S2, comparing the indices of complaint and non-complaint users, and preliminarily identifying the factors influencing complaints;
S4: setting a training data set and establishing a logistic regression model in the R language; taking whether the user complained as the dependent variable, with values 0 and 1; optimizing the preliminary model by backward stepwise logistic regression according to the returned results, and determining the final model;
S5: performing a chi-square test on the model to ensure that every variable of the model passes the significance test and that the model as a whole is significant;
S6: predicting the test data set with the model, and cross-tabulating the predicted results against the actual results.
Further, in step S1, the factors affecting user complaints include the attach success rate, attach delay, default bearer success rate, default bearer delay, Tcp23 handshake success rate, and Tcp23 handshake delay.
Further, processing the data in step S2 includes the following steps:
S2.1: removing the percent sign from all success rates, keeping a number between 0 and 100 with 2 decimal places;
S2.2: removing records in which more than 5 indices are missing;
S2.3: for records with 1 to 5 missing values, filling the missing values by the K-nearest-neighbor method;
S2.4: randomly drawing 80% of the complaint records and 80% of the non-complaint records for training the model, with the remaining 20% used for prediction.
The final model is:
Figure BDA0001759874170000021
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a 4G internet user complaint model based on key index correlation analysis, which identifies the causes of complaints by 4G internet users so that users who are likely to complain can be dealt with preventively, in advance.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent.
The invention is further illustrated below with reference to the figures and examples.
Example 1
A method for establishing a 4G internet user complaint model based on key index correlation analysis (see FIG. 1) comprises the following steps:
S1: exploring the factors and conditions that influence complaining users, finding a plurality of factors influencing user complaints by using a logistic regression model, and establishing a decision tree model;
First, the factors and conditions influencing complaining users are explored. A t-test on 15 indices between the complaint group and the non-complaint group finds significant differences in attach success rate, attach delay, default bearer success rate, default bearer delay, Tcp23 handshake success rate, and Tcp23 handshake delay. A logistic regression model then identifies 5 factors affecting user complaints: attach success rate, attach delay, default bearer delay, Tcp23 handshake success rate, and Tcp23 handshake delay. A decision tree model built on these 5 factors yields the conditions under which a user is highly likely to complain:
1) Tcp23 handshake success rate < 100 and Tcp23 handshake delay < 80 and attach delay < 178;
2) Tcp23 handshake success rate < 100 and Tcp23 handshake delay > 80;
3) Tcp23 handshake success rate is 100 and attach delay < 374 and attach delay is 179 and default bearer delay is 191;
4) Tcp23 handshake success rate is 100 and attach delay < 179;
5) Tcp23 handshake success rate is 100 and attach delay is 374.
The decision tree model has a prediction accuracy of 65.7%, but the complaint conditions it yields are not satisfactory. Finally, each index is analyzed on its own to determine whether a threshold exists beyond which the user is likely to complain. The conclusions are:
1) the user is likely to complain when the attach success rate is 60% or lower;
2) the user is likely to complain when the attach delay is 1500 ms or higher;
3) the user is likely to complain when the default bearer success rate is 20% or lower;
4) the user is likely to complain when the default bearer delay is 1000 ms or higher;
5) the user is likely to complain when the Tcp23 handshake success rate is 90% or lower.
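The five single-index thresholds can be expressed as a small rule check. The following Python sketch is illustrative only: the patent gives no code for this step, and the dictionary keys and the function name `triggered_rules` are assumed names, not identifiers from the patent.

```python
# Single-index thresholds as listed in the text. The key names are
# hypothetical; success rates are percentages, delays are milliseconds.
THRESHOLDS = {
    "attach_success_rate": ("<=", 60.0),
    "attach_delay_ms": (">=", 1500.0),
    "default_bearer_success_rate": ("<=", 20.0),
    "default_bearer_delay_ms": (">=", 1000.0),
    "tcp23_handshake_success_rate": ("<=", 90.0),
}


def triggered_rules(record):
    """Return the names of the indices whose threshold is crossed in `record`."""
    hits = []
    for name, (op, limit) in THRESHOLDS.items():
        value = record.get(name)
        if value is None:  # missing index: the rule cannot fire
            continue
        if (op == "<=" and value <= limit) or (op == ">=" and value >= limit):
            hits.append(name)
    return hits


example = {
    "attach_success_rate": 55.0,
    "attach_delay_ms": 300.0,
    "default_bearer_success_rate": 98.0,
    "default_bearer_delay_ms": 1200.0,
    "tcp23_handshake_success_rate": 99.5,
}
print(triggered_rules(example))
```

A record that triggers at least one rule would be flagged as a likely complaint under this reading; how several triggered rules should be combined is not stated in the text.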
S2: extracting user data comprising the plurality of factors described in step S1, including original complaint user data and non-complaint user data; integrating and processing the data;
This example extracts 2336 records of complaint users and 2993 records of non-complaint users, 5329 records in total. The data are integrated into one table containing the following indices:
attach success rate, attach delay, default bearer success rate, default bearer delay, DNS success rate, DNS delay, Tcp12 handshake success rate, Tcp12 handshake delay, Tcp23 handshake success rate, Tcp23 handshake delay, Get response success rate, Get response delay, Post response success rate, Post response delay, and large packet (more than 500KB) download rate.
For modeling, the data are processed as follows:
1) the percent sign is removed from all success rates, keeping a number between 0 and 100 with 2 decimal places; for example, 99.5% becomes 99.50;
2) records with more than 5 missing indices are deleted: the number of missing indices is first counted for each record, and a record is removed outright if more than 5 (5 itself excluded) of its 15 indices are missing, since such records would be very detrimental to later modeling. After removal, 4819 records remain (1999 complaint, 2820 non-complaint);
3) for records with 1 to 5 missing values, the missing values are filled by the K-nearest-neighbor method, so that the data to be modeled contain no missing values;
4) 80% of the complaint records and 80% of the non-complaint records are drawn at random for training the model (3820 records in total), with the remaining 20% used for prediction.
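The four processing steps can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation (the patent worked in R and does not state the value of K or the distance measure): the function names, `k=3`, and the unscaled Euclidean distance over shared columns are all assumptions.

```python
import math
import random


def parse_rate(text):
    """Step 1: '99.5%' -> 99.5; empty or None -> None (missing)."""
    if text is None or text.strip() == "":
        return None
    return round(float(text.strip().rstrip("%")), 2)


def drop_sparse(records, max_missing=5):
    """Step 2: keep records with at most `max_missing` missing indices."""
    return [r for r in records if sum(v is None for v in r) <= max_missing]


def knn_impute(records, k=3):
    """Step 3: fill each missing value with the mean of that column over the
    k nearest records, measured on the columns both records have."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, rec in enumerate(records):
        new = list(rec)
        for col, value in enumerate(rec):
            if value is not None:
                continue
            donors = sorted(
                (dist(rec, other), other[col])
                for j, other in enumerate(records)
                if j != i and other[col] is not None
            )[:k]
            new[col] = sum(v for _, v in donors) / len(donors) if donors else None
        filled.append(new)
    return filled


def stratified_split(labels, train_frac=0.8, seed=0):
    """Step 4: draw `train_frac` of each class at random; return index lists."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test
```

Because delays (hundreds of milliseconds) and success rates (0 to 100) sit on different scales, a practical K-nearest-neighbor fill would usually standardize the columns first; that step is left out here to keep the sketch short.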
S3: performing a t-test on the original complaint user data and the non-complaint user data acquired in step S2, comparing the indices of complaint and non-complaint users, and preliminarily identifying the factors influencing complaints;
The P values in the t-test are interpreted as follows:
P value    Probability of a chance result    Null hypothesis          Statistical significance
P>0.05     greater than 5%                   cannot be rejected       the difference between the two groups is not significant
P<0.05     less than 5%                      may be rejected          the difference between the two groups is significant
P<0.01     less than 1%                      may be rejected          the difference between the two groups is very significant
Comparing the indices of complaint and non-complaint users with the t-test gives a preliminary view of the factors influencing complaints. The data are the 2336 complaint-user records and the 2993 non-complaint-user records, 5329 records in total (including missing values). Each index is calculated independently, and records with a missing value for that index are automatically ignored. The results are shown in the following table:
Figure BDA0001759874170000041
Figure BDA0001759874170000051
As can be seen from the above table, complaining and non-complaining users differ very significantly (at the 99% confidence level) in attach success rate, attach delay, default bearer delay, Tcp23 handshake success rate, and Tcp23 handshake delay. The default bearer success rate differs significantly at the 95% confidence level but not at the 99% level. The other indices show no significant difference.
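A per-index comparison of this kind can be sketched in Python as follows. The patent does not say which t-test variant was used or how P was computed, so two assumptions are made and should be read as such: the statistic is Welch's unequal-variance t, and the two-sided P value uses a normal approximation to the t distribution, which is reasonable only because each group holds thousands of records. The function names are illustrative.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and an approximate two-sided P value.

    Missing values (None) are skipped, mirroring the per-index handling
    described in the text. The P value comes from the standard normal
    distribution, an assumption suited to large samples only.
    """
    a = [x for x in sample_a if x is not None]
    b = [x for x in sample_b if x is not None]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


def significance(p):
    """Map a P value to the wording of the interpretation table."""
    if p < 0.01:
        return "very significant"
    if p < 0.05:
        return "significant"
    return "not significant"
```

Calling `welch_t(complaint_values, non_complaint_values)` once per index, where the two argument names stand for that index's values in each group, and passing the resulting P to `significance` reproduces the three-way reading used above.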
S4: setting a training data set and establishing a logistic regression model in the R language; taking whether the user complained as the dependent variable, with values 0 and 1; optimizing the preliminary model by backward stepwise logistic regression according to the returned results, and determining the final model;
A logistic regression model is built in the R language from the training data set (3820 records). The dependent variable is whether the user complained, taking only the values 0 and 1 (0 for non-complaint, 1 for complaint), and the 15 indices are the independent variables. The results returned by R after establishing the preliminary model are as follows:
Call:
glm(formula = whether complaint ~ attach success rate + attach delay + default bearer success rate +
    default bearer delay + DNS success rate + DNS delay + Tcp12 handshake success rate + Tcp12 handshake delay + Tcp23 handshake success rate + Tcp23 handshake delay + Get response success rate + Get response delay + Post response success rate + Post response delay + large packet (more than 500KB) download rate,
    family = "binomial", data = train.dt)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.2577  -0.9793  -0.9161   1.3126   2.3556
Coefficients:
Figure BDA0001759874170000061
Figure BDA0001759874170000071
large packet (more than 500KB) download rate   -1.491e-06   2.603e-06   -0.573   0.566695
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 5182.4  on 3819  degrees of freedom
Residual deviance: 5006.4  on 3804  degrees of freedom
AIC: 5038.4
Number of Fisher Scoring iterations: 5
In the above results, a coefficient marked with asterisks is significant (‘***’ being the most significant), a coefficient marked with ‘.’ is only marginally significant, and an unmarked coefficient is not significant. The coefficients of several indices are not significant, so the model needs further optimization.
The model is optimized by backward stepwise logistic regression: at each step the factor with the least significant influence is removed, so that each model is better than the previous one and only factors with significant coefficients are finally kept. The final result of the backward stepwise regression is as follows:
Call:
glm(formula = whether complaint ~ attach success rate + attach delay + default bearer delay + Tcp12 handshake delay + Tcp23 handshake success rate + Tcp23 handshake delay, family = "binomial", data = train
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.2489  -0.9789  -0.9198   1.3174   1.7080
Coefficients:
Figure BDA0001759874170000072
Figure BDA0001759874170000081
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 5182.4  on 3819  degrees of freedom
Residual deviance: 5012.0  on 3813  degrees of freedom
AIC: 5026
Number of Fisher Scoring iterations: 5
The stepwise model retains 6 indices, but the coefficient of the Tcp12 handshake delay is still not significant. This index is therefore dropped and the model is refitted with the remaining 5 indices:
Call:
glm(formula = whether complaint ~ attach success rate + attach delay + default bearer delay + Tcp23 handshake success rate + Tcp23 handshake delay, family = "binomial", data = train
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.2410  -0.9790  -0.9201   1.3180   1.5413
Coefficients:
Figure BDA0001759874170000082
Figure BDA0001759874170000091
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 5182.4  on 3819  degrees of freedom
Residual deviance: 5014.2  on 3814  degrees of freedom
AIC: 5026.2
Number of Fisher Scoring iterations: 5
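The AIC values and residual degrees of freedom reported for the three fits can be cross-checked from the residual deviances. The Python sketch below relies on the standard identity that, for a binomial model with a 0/1 response, AIC equals the residual deviance plus twice the number of estimated parameters (predictors plus intercept); the variable and function names are illustrative.

```python
# (label, residual deviance, number of predictors) as reported in the R output.
FITS = [
    ("all 15 indices", 5006.4, 15),
    ("backward stepwise, 6 indices", 5012.0, 6),
    ("refit with 5 indices", 5014.2, 5),
]
N_TRAIN = 3820  # training records


def aic(residual_deviance, n_predictors):
    """AIC = deviance + 2k for a 0/1-response binomial GLM, k = predictors + intercept."""
    return residual_deviance + 2 * (n_predictors + 1)


def residual_df(n_obs, n_predictors):
    """Residual degrees of freedom = observations - estimated parameters."""
    return n_obs - (n_predictors + 1)


for label, deviance, k in FITS:
    print(f"{label}: AIC={aic(deviance, k):.1f}, residual df={residual_df(N_TRAIN, k)}")
```

The recomputed values agree with the printed ones (5038.4, 5026.0, 5026.2 and 3804, 3813, 3814). The 5-index refit has a marginally higher AIC than the 6-index stepwise result, which is consistent with the text's statement that the Tcp12 handshake delay was dropped because its coefficient was not significant rather than on AIC grounds.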
At this point all coefficients are significant and the relatively important variables have been retained. The model thus established is:
Figure BDA0001759874170000092
+ 0.0005773 × default bearer delay - 0.0666100 × Tcp23 handshake success rate + 0.0038896 × Tcp23 handshake delay
When the P calculated by the model is greater than 0.5, the user is predicted to complain; otherwise the user is predicted not to complain.
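Scoring a record with a fitted logistic model and applying the 0.5 cut-off can be sketched as below. Only the three coefficients legible in the text are hard-coded; the intercept and the two attach coefficients appear only in the formula image, so they are left as required arguments and any values passed in are the caller's, not the patent's. The key and function names are illustrative.

```python
import math

# Coefficients that are legible in the text of the model.
KNOWN_COEFFS = {
    "default_bearer_delay": 0.0005773,
    "tcp23_handshake_success_rate": -0.0666100,
    "tcp23_handshake_delay": 0.0038896,
}


def complaint_probability(record, intercept, attach_success_coef, attach_delay_coef):
    """P = 1 / (1 + exp(-z)), with z the linear predictor of the logit model.

    `intercept`, `attach_success_coef` and `attach_delay_coef` are not
    recoverable from the text and must be supplied by the caller.
    """
    z = (
        intercept
        + attach_success_coef * record["attach_success_rate"]
        + attach_delay_coef * record["attach_delay"]
        + sum(coef * record[name] for name, coef in KNOWN_COEFFS.items())
    )
    return 1.0 / (1.0 + math.exp(-z))


def predict_complaint(p, cutoff=0.5):
    """Return 1 (complaint) when P exceeds the cut-off, else 0."""
    return 1 if p > cutoff else 0
```

The signs of the legible coefficients match the direction of the single-index thresholds: longer delays raise P, and a higher Tcp23 handshake success rate lowers it.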
S5: performing a chi-square test on the model to ensure that every variable of the model passes the significance test and that the model as a whole is significant;
Only when every variable passes the significance test and the model as a whole is significant is the model correct and meaningful. A chi-square test was performed on the model, with the following results:
Analysis of Deviance Table
Model: binomial, link: logit
Response: whether complaint
Terms added sequentially (first to last)
Figure BDA0001759874170000093
Figure BDA0001759874170000101
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The model passed the global significance test, indicating that a model consisting of the above variables is meaningful.
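One way to check overall significance from the deviances reported for the 5-index model is a likelihood-ratio chi-square test: the drop from null deviance to residual deviance is compared with a chi-square distribution whose degrees of freedom equal the number of added terms. This is a related overall test, not a reproduction of the sequential table above. The Python sketch uses the closed-form upper-tail probability for integer degrees of freedom; the function name `chi2_sf` is illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        return math.exp(-half) * sum(
            half ** j / math.factorial(j) for j in range(df // 2)
        )
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    tail = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


# Deviances and degrees of freedom reported for the 5-index model.
null_dev, null_df = 5182.4, 3819
res_dev, res_df = 5014.2, 3814

stat = null_dev - res_dev   # reduction in deviance
df = null_df - res_df       # number of added terms
p_value = chi2_sf(stat, df)
print(f"chi-square = {stat:.1f} on {df} df, p = {p_value:.3g}")
```

A deviance reduction of about 168.2 on 5 degrees of freedom gives a P value far below 0.001, in line with the statement that the model passes the global significance test.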
S6: predicting the test data set with the model, and cross-tabulating the predicted results against the actual results.
The model is used to predict the test data set (956 records in total: 396 complaint, 560 non-complaint), that is, to predict whether a customer whose outcome is unknown is likely to complain. When the P calculated by the model is greater than 0.5, the user is predicted to complain; otherwise the user is predicted not to complain. Cross-tabulating the predicted and actual results gives the following table:
                           Complaint users    Non-complaint users
Predicted complaint        162                94
Predicted non-complaint    234                466
Accuracy of complaint predictions: 162/(162+94) × 100% = 63.3%;
Accuracy of non-complaint predictions: 466/(466+234) × 100% = 66.6%;
Overall prediction accuracy: (466+162)/(466+162+234+94) × 100% = 65.7%;
Recall: 162/(162+234) × 100% = 40.9%.
Summary: the model's predictive ability is fairly good.
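The four ratios can be recomputed directly from the cross-tabulated counts. The short Python sketch below does this; the function and key names are illustrative.

```python
# Counts from the cross-tabulation of predicted against actual results.
tp, fp = 162, 94    # predicted complaint: actual complaint, actual non-complaint
fn, tn = 234, 466   # predicted non-complaint: actual complaint, actual non-complaint


def rates(tp, fp, fn, tn):
    """Return the four ratios reported in the text as fractions."""
    return {
        "complaint prediction accuracy": tp / (tp + fp),
        "non-complaint prediction accuracy": tn / (tn + fn),
        "overall accuracy": (tp + tn) / (tp + fp + fn + tn),
        "recall": tp / (tp + fn),
    }


for name, value in rates(tp, fp, fn, tn).items():
    print(f"{name}: {value:.1%}")
```

Note that 162/(162+94) evaluates to about 63.3%. The recall of about 40.9% shows that, at the 0.5 cut-off, the model misses more than half of the users who actually complained, so a lower cut-off could be considered if catching potential complainers matters more than avoiding false alarms.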
It should be understood that the above embodiments are merely examples given to illustrate the present invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (3)

1. A 4G internet user complaint model based on key index correlation analysis, characterized by comprising the following steps:
S1: exploring the factors and conditions that influence complaining users, finding a plurality of factors influencing user complaints by using a logistic regression model, and establishing a decision tree model; the plurality of factors influencing user complaints comprise an attach success rate, an attach delay, a default bearer success rate, a default bearer delay, a Tcp23 handshake success rate, and a Tcp23 handshake delay;
S2: extracting user data comprising the plurality of factors described in step S1, including original complaint user data and non-complaint user data; integrating and processing the data;
S3: performing a t-test on the original complaint user data and the non-complaint user data acquired in step S2, comparing the indices of complaint and non-complaint users, and preliminarily identifying the factors influencing complaints;
S4: setting a training data set and establishing a logistic regression model in the R language; taking whether the user complained as the dependent variable, with values 0 and 1; optimizing the preliminary model by backward stepwise logistic regression according to the returned results, and determining the final model;
S5: performing a chi-square test on the model to ensure that every variable of the model passes the significance test and that the model as a whole is significant;
S6: predicting the test data set with the model, and cross-tabulating the predicted results against the actual results.
2. The 4G internet user complaint model based on key index correlation analysis of claim 1, wherein processing the data in step S2 includes the following steps:
S2.1: removing the percent sign from all success rates, keeping a number between 0 and 100 with 2 decimal places;
S2.2: removing records in which more than 5 indices are missing;
S2.3: for records with 1 to 5 missing values, filling the missing values by the K-nearest-neighbor method;
S2.4: randomly drawing 80% of the complaint records and 80% of the non-complaint records for training the model, with the remaining 20% used for prediction.
3. The 4G internet user complaint model based on key index correlation analysis as claimed in claim 1 or 2, wherein the final model established is
Figure FDA0003160484920000021
Application number: CN201810902832.0A
Priority date: 2018-08-09
Filing date: 2018-08-09
Title: 4G internet user complaint model based on key index correlation analysis
Status: Active

Publications (2)

Publication Number    Publication Date
CN109242257A (en)     2019-01-18
CN109242257B (en)     2021-08-20

Family

ID=65069998

Country Status (1)

CN: CN109242257B (en)


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant