CN110278555B - Method, platform and storage medium for identifying international roaming silent number - Google Patents

Publication number
CN110278555B
Authority
CN
China
Prior art keywords
preset
silent
domestic
indexes
determining
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810215482.0A
Other languages
Chinese (zh)
Other versions
CN110278555A (en)
Inventor
徐海勇
韩林
陶涛
黄岩
舒敏根
于�玲
尚晶
陈春松
梁恩磊
周世峰
余韦
詹灵月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Information Technology Co Ltd
Priority to CN201810215482.0A
Publication of CN110278555A
Application granted
Publication of CN110278555B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/06 Testing, supervising or monitoring using simulated traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W8/00 Network data management
    • H04W8/18 Processing of user or subscriber data, e.g. subscribed services, user preferences or user profiles; Transfer of user or subscriber data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W8/00 Network data management
    • H04W8/26 Network addressing or numbering for mobility support

Abstract

The embodiment of the invention discloses a method for identifying international roaming silent numbers, which comprises the following steps: determining characteristic indexes based on the correlation among all indexes in preset basic indexes; determining a positive sample belonging to a silent number class and a negative sample belonging to a non-silent number class in a sample set according to a preset silent number definition rule; determining training set data and test set data according to domestic and foreign communication data corresponding to the positive sample, domestic and foreign communication data corresponding to the negative sample and characteristic indexes; training the recognition model according to the training set data and a preset mining algorithm, and carrying out accuracy test on the recognition model according to the test set data; and when the accuracy of the identification model is greater than or equal to a preset accuracy threshold, identifying whether the target number is a silent number or not according to domestic and foreign communication data and characteristic indexes corresponding to the target number and the identification model.

Description

Method, platform and storage medium for identifying international roaming silent number
Technical Field
The invention relates to the technical field of mobile communication, in particular to an identification method, a platform and a storage medium of international roaming silent numbers.
Background
Silent users are users who make little use of mobile services and contribute little value. The number that a silent user uses to communicate through a terminal is a silent number. By identifying silent numbers, a corresponding marketing strategy can be formulated for silent users, so as to unlock their latent demand and improve user retention.
In the prior art, the identification scheme for international roaming silent numbers mainly includes the following. First, the numbers are clustered: the numbers are classified into a silent type, a usage-increasing type, and the like, using some of the voice, traffic, and short message indexes. Second, the numbers are subjectively defined: the clustered numbers are labeled as travelers, business users, and the like according to their communication conditions. Finally, corresponding marketing suggestions are given. For example, numbers with low call traffic and relatively high data traffic are classified into the silent class and labeled as business users according to their communication conditions, so that a voice-and-data package can be offered to these silent-class numbers.
However, the above scheme only classifies numbers by clustering and attaches subjectively defined labels in order to give marketing suggestions. It gives no precise business definition of an international roaming silent number and does not analyze the communication behavior of the numbers. Moreover, the clustering algorithm used is unsupervised, so business experts must take part in tuning, the related data is difficult to process, and silent numbers are identified with low accuracy.
Disclosure of Invention
In order to solve the foregoing technical problems, embodiments of the present invention are intended to provide a method, a platform, and a storage medium for identifying an international roaming silent number, which can train a model for identifying international roaming silent numbers according to the communication behavior of the numbers, thereby improving the accuracy of identifying silent numbers.
The technical scheme of the invention is realized as follows:
The embodiment of the invention provides a method for identifying an international roaming silent number, which comprises the following steps:
determining characteristic indexes based on the correlation among all indexes in preset basic indexes;
determining a positive sample belonging to a silent number class and a negative sample belonging to a non-silent number class in a sample set according to a preset silent number definition rule;
determining training set data and test set data according to the domestic and foreign communication data corresponding to the positive sample, the domestic and foreign communication data corresponding to the negative sample and the characteristic indexes;
training a recognition model according to the training set data and a preset mining algorithm, and carrying out accuracy test on the recognition model according to the test set data;
and when the accuracy of the identification model is greater than or equal to a preset accuracy threshold, identifying whether the target number is a silent number according to domestic and foreign communication data corresponding to the target number, the characteristic index and the identification model.
In the foregoing solution, the determining the characteristic index based on the correlation between indexes in the preset basic index includes:
checking the correlation among all indexes in the preset basic indexes;
and determining indexes with the correlation smaller than a preset correlation threshold value in the preset basic indexes as the characteristic indexes.
In the foregoing solution, the determining, according to a preset silent number definition rule, a positive sample belonging to a silent number class and a negative sample belonging to a non-silent number class in a sample set includes:
in the sample set, determining as the positive samples those samples that simultaneously satisfy: the average daily calling duration is less than or equal to a first preset duration, the average daily called duration is less than or equal to a second preset duration, the average daily traffic is less than or equal to a first preset traffic, and the average daily charge is less than or equal to a first preset charge; and determining as the negative samples those samples that simultaneously satisfy: the average daily call duration is greater than a third preset duration, the average daily traffic is greater than a second preset traffic, and the average daily charge is greater than a second preset charge.
In the above scheme, the determining training set data and test set data according to the domestic and foreign communication data corresponding to the positive sample, the domestic and foreign communication data corresponding to the negative sample, and the characteristic index includes:
calculating data which accords with the characteristic indexes in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample within a first preset time period according to a preset weighted sum algorithm to obtain training set data;
and calculating the data which accords with the characteristic indexes in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample in a second preset time period according to the preset weighted sum algorithm to obtain the test set data.
In the above solution, after the accuracy test is performed on the recognition model according to the test set data, the method further includes:
and when the accuracy of the identification model is smaller than the preset accuracy threshold, adjusting the identification model.
The embodiment of the invention also provides an identification platform of the international roaming silent number, which comprises: a processor, a memory, and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is used for executing the identification program of the international roaming silent number stored in the memory so as to realize the following steps:
determining characteristic indexes based on the correlation among all indexes in preset basic indexes; determining a positive sample belonging to a silent number class and a negative sample belonging to a non-silent number class in a sample set according to a preset silent number definition rule; determining training set data and test set data according to the domestic and foreign communication data corresponding to the positive sample, the domestic and foreign communication data corresponding to the negative sample and the characteristic indexes; training a recognition model according to the training set data and a preset mining algorithm, and carrying out accuracy test on the recognition model according to the test set data; and when the accuracy of the identification model is greater than or equal to a preset accuracy threshold, identifying whether the target number is a silent number according to domestic and foreign communication data corresponding to the target number, the characteristic index and the identification model.
In the foregoing platform, the processor is specifically configured to execute the procedure for identifying the international roaming silent number, so as to implement the following steps:
checking the correlation among all indexes in the preset basic indexes; and determining indexes with the correlation smaller than a preset correlation threshold value in the preset basic indexes as the characteristic indexes.
In the foregoing platform, the processor is specifically configured to execute the procedure for identifying the international roaming silent number, so as to implement the following steps:
in the sample set, determining as the positive samples those samples that simultaneously satisfy: the average daily calling duration is less than or equal to a first preset duration, the average daily called duration is less than or equal to a second preset duration, the average daily traffic is less than or equal to a first preset traffic, and the average daily charge is less than or equal to a first preset charge; and determining as the negative samples those samples that simultaneously satisfy: the average daily call duration is greater than a third preset duration, the average daily traffic is greater than a second preset traffic, and the average daily charge is greater than a second preset charge.
In the foregoing platform, the processor is specifically configured to execute the procedure for identifying the international roaming silent number, so as to implement the following steps:
calculating data which accords with the characteristic indexes in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample within a first preset time period according to a preset weighted sum algorithm to obtain training set data;
and calculating the data which accords with the characteristic indexes in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample in a second preset time period according to the preset weighted sum algorithm to obtain the test set data.
In the foregoing platform, after the accuracy test is performed on the identification model according to the test set data, the processor is further configured to execute an identification procedure of the international roaming silent number, so as to implement the following steps:
and when the accuracy of the identification model is smaller than the preset accuracy threshold, adjusting the identification model.
The embodiment of the invention also provides a computer-readable storage medium, which stores one or more programs, and the one or more programs can be executed by one or more processors to implement the above method for identifying the international roaming silent number.
Therefore, in the technical scheme of the invention, the identification platform of the international roaming silent number determines the characteristic indexes based on the correlation among all indexes in the preset basic indexes; determining a positive sample belonging to a silent number class and a negative sample belonging to a non-silent number class in a sample set according to a preset silent number definition rule; determining training set data and test set data according to domestic and foreign communication data corresponding to the positive sample, domestic and foreign communication data corresponding to the negative sample and characteristic indexes; training the recognition model according to the training set data and a preset mining algorithm, and carrying out accuracy test on the recognition model according to the test set data; and when the accuracy of the identification model is greater than or equal to a preset accuracy threshold, identifying whether the target number is a silent number or not according to domestic and foreign communication data and characteristic indexes corresponding to the target number and the identification model. That is to say, in the technical solution of the embodiment of the present invention, a model for identifying an international roaming silent number can be trained according to a communication behavior of the number, thereby improving accuracy of identifying the silent number.
Drawings
Fig. 1 is a first flowchart illustrating a method for identifying an international roaming silent number according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an exemplary determination of characteristic indexes according to an embodiment of the present invention;
Fig. 3 is a second flowchart illustrating a method for identifying an international roaming silent number according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an identification platform of an international roaming silent number according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Example one
The embodiment of the invention provides a method for identifying international roaming silent numbers. Fig. 1 is a first flowchart illustrating an international roaming silent number identification method according to an embodiment of the present invention. As shown in fig. 1, the method mainly comprises the following steps:
S101, determining characteristic indexes based on the correlation among all indexes in the preset basic indexes.
In the embodiment of the invention, the identification platform of the international roaming silent number can determine the characteristic index based on the correlation among all indexes in the preset basic indexes.
It should be noted that, in the embodiment of the present invention, the identification Platform of the international roaming silent number may be a Platform as a Service (PaaS) Platform, that is, accurate identification of the international roaming silent number is finally achieved through the PaaS Platform.
Specifically, in the embodiment of the present invention, the PaaS platform first checks the correlation between each index in the preset basic index, and then determines, as the characteristic index, an index whose correlation is lower than a preset correlation threshold in the preset basic index.
It should be noted that, in the embodiment of the present invention, a basic index and a correlation threshold are preset on the PaaS platform, the preset basic index is related to a communication behavior, and the specific preset basic index and the preset correlation threshold are not limited in the embodiment of the present invention.
Fig. 2 is a schematic diagram of an exemplary determination of characteristic indexes according to an embodiment of the present invention. As shown in Fig. 2, the preset basic indexes comprise 26 indexes, including the number of voice calls, the voice duration, the number of short messages, and the traffic. The preset correlation threshold is A. The PaaS platform first checks the correlation among the 26 indexes and then screens out the 5 indexes whose correlation is greater than or equal to A; that is, the 21 indexes whose correlation is less than A are retained and determined as the characteristic indexes.
It can be understood that, in the embodiment of the present invention, the PaaS platform determines the feature index based on the correlation between each index in the preset basic index, instead of directly determining all the preset basic indexes as the feature index, so that the complexity of subsequent data processing of the PaaS platform can be reduced, and the efficiency of training the recognition model can be improved.
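By way of a non-limiting illustration, the correlation screening of S101 might be sketched as follows. The embodiment does not specify how a pairwise correlation is turned into a keep-or-drop decision for a single index, so the greedy rule used here (keep an index only if its absolute Pearson correlation with every index already kept is below the threshold) is an assumption, as are the function names, the index names, and the sample values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / math.sqrt(var_x * var_y)

def select_feature_indexes(columns, threshold):
    """Assumed greedy rule: keep an index only if |r| with every kept index < threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical basic indexes observed for five numbers.
basic_indexes = {
    "voice_calls": [1, 2, 3, 4, 5],
    "voice_minutes": [2, 4, 6, 8, 10],  # moves in lockstep with voice_calls
    "sms_count": [5, 1, 4, 2, 3],
    "traffic_mb": [3, 5, 1, 4, 2],
}
selected = select_feature_indexes(basic_indexes, threshold=0.8)
print(selected)
```

In this toy data, voice_minutes duplicates the information in voice_calls and is screened out, while the other indexes are retained as characteristic indexes.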
S102, according to a preset silent number definition rule, positive samples belonging to a silent number class and negative samples belonging to a non-silent number class in a sample set are determined.
In the embodiment of the present invention, after determining the characteristic index, the PaaS platform may define, according to a preset silent number definition rule, samples in the sample set, and determine a positive sample belonging to a silent number class and a negative sample belonging to a non-silent number class in the sample set.
It should be noted that, in the embodiment of the present invention, the sample set is actually composed of a number of international roaming service numbers, and the sample set may be obtained by extracting, loading, transforming, and cleaning the stored source data on the PaaS platform, that is, by preprocessing the related data. The specific sample set is not limited in the embodiment of the present invention.
Illustratively, the numbers that actively use the international roaming service are mainly located in Beijing, Shanghai, Shenzhen, Yunnan, and other areas, so that a total of six million numbers in these areas can be extracted to form the sample set.
Specifically, in the embodiment of the present invention, the PaaS platform determines as positive samples those samples in the sample set that simultaneously satisfy: the average daily calling duration is less than or equal to a first preset duration, the average daily called duration is less than or equal to a second preset duration, the average daily traffic is less than or equal to a first preset traffic, and the average daily charge is less than or equal to a first preset charge; and determines as negative samples those samples that simultaneously satisfy: the average daily call duration is greater than a third preset duration, the average daily traffic is greater than a second preset traffic, and the average daily charge is greater than a second preset charge.
It should be noted that, in the embodiment of the present invention, the first preset duration, the second preset duration, the first preset traffic, the first preset charge, the third preset duration, the second preset traffic, and the second preset charge may be preset according to actual needs, and their specific settings are not limited in the embodiment of the present invention.
It should be noted that, in the embodiment of the present invention, the average daily calling duration, the average daily called duration, the average daily traffic, the average daily charge, and the average daily call duration may be averages computed for each sample in the sample set over four months, and the specific period for calculating the averages is not limited in the embodiment of the present invention.
For example, the preset silent number definition rule is shown in table 1:
TABLE 1
[Table 1 is provided as an image in the original publication and is not reproduced here.]
Here, $t_1$ is the first preset duration, $t_2$ is the second preset duration, $l_1$ is the first preset traffic, $f_1$ is the first preset charge, $t_3$ is the third preset duration, $l_2$ is the second preset traffic, and $f_2$ is the second preset charge. Therefore, in the sample set, the PaaS platform determines as positive samples those samples that simultaneously satisfy: the average daily calling duration is less than or equal to $t_1$, the average daily called duration is less than or equal to $t_2$, the average daily traffic is less than or equal to $l_1$, and the average daily charge is less than or equal to $f_1$; and determines as negative samples those samples that simultaneously satisfy: the average daily call duration is greater than $t_3$, the average daily traffic is greater than $l_2$, and the average daily charge is greater than $f_2$.
It will be appreciated that, in embodiments of the invention, the positive samples are actually silent numbers and the negative samples are actually non-silent numbers.
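By way of a non-limiting illustration, the definition rule of S102 might be expressed as the following sketch. All threshold values are hypothetical, since the embodiment leaves them to be preset according to actual needs. The rule for negative samples refers to an average daily call duration; treating it as the sum of the calling and called durations is an assumption made here. Samples that satisfy neither rule are left unlabeled in this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DailyAverages:
    calling_min: float  # average daily calling (outgoing) duration
    called_min: float   # average daily called (incoming) duration
    traffic_mb: float   # average daily traffic
    charge: float       # average daily charge

@dataclass
class Thresholds:
    t1: float  # first preset duration (calling)
    t2: float  # second preset duration (called)
    l1: float  # first preset traffic
    f1: float  # first preset charge
    t3: float  # third preset duration (call)
    l2: float  # second preset traffic
    f2: float  # second preset charge

def label_sample(s: DailyAverages, th: Thresholds) -> Optional[int]:
    """Return 1 for a positive (silent) sample, 0 for a negative sample, None otherwise."""
    if (s.calling_min <= th.t1 and s.called_min <= th.t2
            and s.traffic_mb <= th.l1 and s.charge <= th.f1):
        return 1
    call_min = s.calling_min + s.called_min  # assumption: call = calling + called
    if call_min > th.t3 and s.traffic_mb > th.l2 and s.charge > th.f2:
        return 0
    return None

# Hypothetical thresholds, for illustration only.
th = Thresholds(t1=1.0, t2=1.0, l1=5.0, f1=2.0, t3=10.0, l2=50.0, f2=20.0)
print(label_sample(DailyAverages(0.5, 0.2, 1.0, 0.5), th))
print(label_sample(DailyAverages(8.0, 6.0, 80.0, 30.0), th))
print(label_sample(DailyAverages(3.0, 2.0, 20.0, 5.0), th))
```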
S103, determining training set data and testing set data according to the domestic and foreign communication data corresponding to the positive sample, the domestic and foreign communication data corresponding to the negative sample and the characteristic indexes.
In the embodiment of the invention, after the PaaS platform determines the positive sample and the negative sample, the training set data for training the recognition model and the test set data for testing the recognition model can be further determined according to the domestic and foreign communication data corresponding to the positive sample, the domestic and foreign communication data corresponding to the negative sample and the characteristic indexes.
Specifically, in the embodiment of the present invention, the PaaS platform calculates, in the first preset time period, data that meets the characteristic index in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample according to a preset weighted sum algorithm to obtain training set data, and calculates, in the second preset time period, data that meets the characteristic index in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample according to a preset weighted sum algorithm to obtain test set data.
It is to be understood that, in the embodiment of the present invention, each positive sample and each negative sample has corresponding domestic communication data and foreign communication data, where the foreign communication data is international roaming data. The domestic communication data and the foreign communication data can be screened according to the previously determined characteristic indexes, and the data conforming to the characteristic indexes is selected. For any positive or negative sample, the domestic communication data contains data conforming to a given characteristic index and the foreign communication data contains corresponding data conforming to the same characteristic index, so the two are weighted and summed according to the preset weighted sum algorithm to obtain the final data of the sample for that characteristic index.
It should be noted that, in the embodiment of the present invention, in the preset weighted summation algorithm, a weight of the domestic communication data and a weight of the foreign communication data are set for each feature index. For example, the characteristic indicator is traffic, that is, a weight of domestic traffic and a weight of foreign traffic are set, so as to perform weighted summation.
It should be noted that, in the embodiment of the present invention, a first preset time period and a second preset time period are stored in the PaaS platform, where the first preset time period determines the time range from which the training set data is taken, and the second preset time period determines the time range from which the test set data is taken. In general, the first preset time period is longer and the second preset time period is shorter. The specific first preset time period and second preset time period are not limited in the embodiments of the present invention.
Illustratively, the first preset time period is three months, the second preset time period is one month, the characteristic indexes are the voice duration, the traffic, and the number of short messages, and number 1 is any one of the positive or negative samples. The current month is June. Taking the current date as the reference, the PaaS platform obtains the domestic and foreign data of number 1 that conform to the characteristic indexes within the three months from March to May, for example, for a certain day, the domestic voice duration $A_1$, the domestic traffic $B_1$, and the number of domestic short messages $C_1$ of number 1, as well as the foreign voice duration $A_2$, the foreign traffic $B_2$, and the number of foreign short messages $C_2$, and performs the calculation on these data according to the preset weighted sum algorithm. Specifically, in the preset weighted sum algorithm, the weight for the domestic voice duration is $a_1$, the weight for the foreign voice duration is $a_2$, the weight for the domestic traffic is $b_1$, the weight for the foreign traffic is $b_2$, the weight for the number of domestic short messages is $c_1$, and the weight for the number of foreign short messages is $c_2$. Therefore, the final data of number 1 can be calculated as $X = a_1 A_1 + a_2 A_2$ for the voice duration, $Y = b_1 B_1 + b_2 B_2$ for the traffic, and $Z = c_1 C_1 + c_2 C_2$ for the number of short messages; $X$, $Y$, and $Z$ are the training set data for number 1. Similarly, the data of number 1 for the one-month period of February is obtained and the same calculation is performed, and the resulting final data is the test set data for number 1.
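By way of a non-limiting illustration, the weighted summation of domestic and foreign data for each characteristic index might be sketched as follows. The index names, the weights, and the data values are hypothetical; the embodiment only states that a domestic weight and a foreign weight are set for each characteristic index.

```python
def weighted_sum(domestic, foreign, weights):
    """Combine the domestic and foreign values of each characteristic index.

    weights maps an index name to (domestic_weight, foreign_weight).
    """
    return {
        name: w_dom * domestic[name] + w_for * foreign[name]
        for name, (w_dom, w_for) in weights.items()
    }

# Hypothetical weights (a1, a2), (b1, b2), (c1, c2) and hypothetical data for number 1.
weights = {
    "voice_min": (0.25, 0.75),
    "traffic_mb": (0.5, 0.5),
    "sms_count": (0.5, 0.5),
}
domestic = {"voice_min": 10.0, "traffic_mb": 40.0, "sms_count": 6.0}
foreign = {"voice_min": 20.0, "traffic_mb": 8.0, "sms_count": 2.0}

final = weighted_sum(domestic, foreign, weights)
print(final)
```

The three resulting values play the roles of X, Y, and Z for number 1; applying the same function to data from the second preset time period would yield the corresponding test set values.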
S104, training the recognition model according to the training set data and a preset mining algorithm, and testing the accuracy of the recognition model according to the test set data.
In the embodiment of the invention, after the PaaS platform determines the training set data and the test set data, the recognition model is trained according to the training set data and the preset mining algorithm, and further, the accuracy of the recognition model is tested according to the test set data.
It should be noted that, in the embodiment of the present invention, the PaaS platform stores the preset mining algorithm, the recognition models trained by different preset mining algorithms have different effects, and the specific preset mining algorithm is not limited in the embodiment of the present invention.
It can be understood that many mining algorithms are commonly used in the prior art, including the logistic regression algorithm, the random forest algorithm, the naive Bayes algorithm, and the like. The naive Bayes algorithm performs well on small-scale data, can handle multi-class classification, is suitable for incremental training, is insensitive to missing data, is simple, and is commonly used for text classification. The random forest algorithm has been shown to overfit in some classification or regression problems with heavy noise, and for data whose attributes take different numbers of values, attributes with more distinct values have a greater influence on the random forest algorithm, so that the attribute weights it produces on such data are not reliable. The logistic regression algorithm is generally used to solve binary classification problems: a loss function is established, and the optimal model parameters are then solved iteratively by an optimization method. Logistic regression is actually a classification method, a regression model whose output is a qualitative variable, such as 0 or 1.
It should be noted that, in the embodiment of the present invention, the logistic regression algorithm is set as the preset mining algorithm because it is simple and easy to understand, computes quickly, exposes the weight of each characteristic index in the recognition model directly, allows the model to be updated easily to absorb new data, and is suitable for occasions with a large data volume.
In the embodiment of the invention, the PaaS platform substitutes training set data into a logistic regression algorithm, and weights of all characteristic indexes can be determined through calculation, so that an identification model about the characteristic indexes is trained. That is, the process of training the model is actually to determine the importance of the feature index in the process of measuring whether the number is a silent number, i.e. the coefficient in the recognition model, according to the logistic regression algorithm. Specifically, training the recognition model according to training set data and a logistic regression algorithm is the prior art, and is not described herein again.
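By way of a non-limiting illustration, training a logistic regression recognition model on the weighted characteristic index values might be sketched as follows. The embodiment does not specify the optimization method, so plain batch gradient descent on the log loss is an assumption here, as are the learning rate, the number of epochs, the function names, and the toy data (label 1 denotes a silent number).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

def predict(weights, bias, x):
    """Return 1 (silent) if the estimated probability is at least 0.5, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

# Hypothetical training rows: (voice X, traffic Y, short messages Z), already scaled.
rows = [
    [0.1, 0.2, 0.0], [0.2, 0.1, 0.1], [0.0, 0.3, 0.1],  # silent numbers
    [2.0, 3.0, 1.5], [2.5, 2.0, 2.0], [3.0, 2.5, 1.0],  # non-silent numbers
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(rows, labels)
print(weights, bias)
```

The fitted weights are the coefficients of the recognition model, one per characteristic index, which is what makes the contribution of each index directly inspectable.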
It can be understood that, in the embodiment of the present invention, after the PaaS platform obtains the identification model, it inputs the test set data into the identification model and determines the accuracy of the identification model from the output. For example, the test set includes the voice duration M1, the traffic M2 and the count M3 of number 1 in the second preset time period; M1, M2 and M3 are input into the recognition model to recognize number 1, and the output is either that number 1 is a silent number or that number 1 is a non-silent number. In step S102, number 1 has already been labeled according to the preset silent number definition rule. If number 1 was labeled as a positive sample belonging to the silent number class and the recognition model also outputs that number 1 is a silent number, the recognition is accurate; if number 1 was labeled as a positive sample belonging to the silent number class but the recognition model outputs that number 1 is a non-silent number, the recognition is inaccurate. Therefore, by inputting a large amount of test set data into the identification model and comparing each output with the label previously assigned under the preset silent number definition rule, the accuracy of the identification model can be tested.
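The comparison described above reduces to counting agreements between the model output and the rule-based labels. A minimal sketch follows; the function name and the sample labels are illustrative and do not come from the patent.

```python
def accuracy(predicted, expected):
    """Fraction of numbers whose model output matches the rule-based label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("the test set must not be empty")
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)


# Hypothetical test set of five numbers: 1 = silent, 0 = non-silent.
rule_labels = [1, 1, 0, 0, 1]    # labels assigned in step S102
model_labels = [1, 0, 0, 0, 1]   # outputs of the recognition model
test_accuracy = accuracy(model_labels, rule_labels)
```

Here the second number is a positive sample that the model reports as non-silent, so it counts as an inaccurate recognition.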
S105, when the accuracy of the identification model is greater than or equal to a preset accuracy threshold, identifying whether the target number is a silent number or not according to domestic and foreign communication data and characteristic indexes corresponding to the target number and the identification model.
In the embodiment of the invention, after the PaaS platform trains the identification model and carries out accuracy test, the accuracy of the identification model is compared with a preset accuracy threshold, and when the accuracy of the identification model is more than or equal to the preset accuracy threshold, whether the target number is a silent number is identified according to domestic and foreign communication data and characteristic indexes corresponding to the target number and the identification model.
It should be noted that, in the embodiment of the present invention, a preset accuracy threshold is stored in the PaaS platform, and a designer may preset the accuracy threshold according to a specific requirement on the accuracy of the recognition model, and may set a larger value if the requirement on the recognition effect of the recognition model is higher, and may set a smaller value if the requirement on the recognition effect of the recognition model is general. The specific preset accuracy threshold is not limited in the embodiments of the present invention.
Specifically, in the embodiment of the present invention, when the accuracy of the recognition model is greater than or equal to the preset accuracy threshold, the recognition effect of the recognition model meets the requirement. The model can therefore be used to identify international roaming silent numbers: the data that accords with the characteristic indexes is determined from the domestic and foreign communication data corresponding to the target number and is input into the identification model, which judges whether the number is a silent number.
It should be noted that, in the embodiment of the present invention, after the step S104, a step S106 may be further included. Fig. 3 is a flowchart illustrating a second method for identifying an international roaming silent number according to an embodiment of the present invention. As shown in fig. 3, step S106 is:
S106, when the accuracy of the recognition model is smaller than a preset accuracy threshold, adjusting the recognition model.
It should be noted that, in the embodiment of the present invention, after the PaaS platform performs the accuracy test on the identification model according to the test set data, when the accuracy of the identification model is smaller than the preset accuracy threshold, the identification model needs to be adjusted.
It can be understood that, in the embodiment of the present invention, low accuracy of the recognition model may be caused by too little training set data, which makes the calculated weights of the characteristic indexes inaccurate, or by poorly chosen characteristic indexes. A corresponding correction, namely increasing the training set data or changing the characteristic indexes, may therefore be made to adjust the recognition model.
It is to be appreciated that, in embodiments of the invention, after the recognition model is adjusted, the accuracy of the adjusted recognition model may also be tested according to the test set data. And when the accuracy of the adjusted identification model is greater than a preset accuracy threshold, identifying the silent number by using the adjusted identification model.
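The branch between steps S105 and S106 can be sketched as a simple gate on the measured accuracy. This is a sketch under assumptions: the helper names, the idea of iterating over a list of candidate (adjusted) models, and the `evaluate` callback are hypothetical, and the comparison uses "greater than or equal to" as stated for step S105.

```python
def next_step(model_accuracy, accuracy_threshold):
    """Return which step of the flow applies after the accuracy test."""
    if not 0.0 <= accuracy_threshold <= 1.0:
        raise ValueError("accuracy_threshold must lie in [0, 1]")
    if model_accuracy >= accuracy_threshold:
        return "S105"  # model is accepted and used for identification
    return "S106"      # model must be adjusted and tested again


def train_until_accepted(candidates, evaluate, accuracy_threshold):
    """Return the first candidate model that passes the accuracy test, else None.

    `candidates` stands for successive adjusted models (more training data or
    different characteristic indexes); `evaluate` returns a model's accuracy
    on the test set data.
    """
    for model in candidates:
        if next_step(evaluate(model), accuracy_threshold) == "S105":
            return model
    return None
```

For example, with two hypothetical models scoring 0.5 and 0.95 against a threshold of 0.9, `train_until_accepted` skips the first and returns the second.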
The embodiment of the invention provides an identification method of an international roaming silent number. A platform for identifying the international roaming silent number determines characteristic indexes based on the correlation among all indexes in preset basic indexes; determines a positive sample belonging to a silent number class and a negative sample belonging to a non-silent number class in a sample set according to a preset silent number definition rule; determines training set data and test set data according to domestic and foreign communication data corresponding to the positive sample, domestic and foreign communication data corresponding to the negative sample and the characteristic indexes; trains the recognition model according to the training set data and a preset mining algorithm, and carries out an accuracy test on the recognition model according to the test set data; and, when the accuracy of the identification model is greater than or equal to a preset accuracy threshold, identifies whether the target number is a silent number according to domestic and foreign communication data corresponding to the target number, the characteristic indexes and the identification model. That is to say, in the technical solution of the embodiment of the present invention, a model for identifying international roaming silent numbers can be trained from the communication behavior of numbers, thereby improving the accuracy of identifying silent numbers.
Example two
Fig. 4 is a schematic structural diagram of an identification platform of an international roaming silent number according to an embodiment of the present invention. As shown in fig. 4, the platform includes: a processor 401, a memory 402, and a communication bus 403;
the communication bus 403 is used for realizing connection communication between the processor and the memory;
the processor 401 is configured to execute the program for identifying the international roaming silent number stored in the memory, so as to implement the following steps:
determining characteristic indexes based on the correlation among all indexes in preset basic indexes; determining a positive sample belonging to a silent number class and a negative sample belonging to a non-silent number class in a sample set according to a preset silent number definition rule; determining training set data and test set data according to the domestic and foreign communication data corresponding to the positive sample, the domestic and foreign communication data corresponding to the negative sample and the characteristic indexes; training a recognition model according to the training set data and a preset mining algorithm, and carrying out accuracy test on the recognition model according to the test set data; and when the accuracy of the identification model is greater than or equal to a preset accuracy threshold, identifying whether the target number is a silent number according to domestic and foreign communication data corresponding to the target number, the characteristic index and the identification model.
Optionally, the processor 401 is specifically configured to execute the identification procedure of the international roaming silent number, so as to implement the following steps:
checking the correlation among all indexes in the preset basic indexes; and determining indexes with the correlation smaller than a preset correlation threshold value in the preset basic indexes as the characteristic indexes.
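One possible reading of this step is sketched below. The patent does not say which correlation measure is used or how an index is compared against the others, so the Pearson coefficient, the greedy keep-if-weakly-correlated-with-everything-already-kept rule, the threshold value, and the sample columns are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (spread_x * spread_y)


def select_feature_indexes(columns, threshold):
    """Keep an index only if its |correlation| with every kept index is below threshold."""
    selected = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[kept])) < threshold for kept in selected):
            selected.append(name)
    return selected


# Hypothetical per-number values of the four preset basic indexes.
basic_indexes = {
    "voice_call_count": [1, 2, 3, 4, 5],
    "voice_duration": [10, 20, 30, 40, 50],  # moves in lockstep with call count
    "sms_count": [2, 1, 2, 1, 2],
    "traffic": [2, 5, 3, 1, 4],
}
features = select_feature_indexes(basic_indexes, threshold=0.8)
```

In this toy data the voice duration is perfectly correlated with the voice call count, so it is dropped, while the other three indexes are kept as characteristic indexes.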
Optionally, the processor 401 is specifically configured to execute the identification procedure of the international roaming silent number, so as to implement the following steps:
in the sample set, determining as the positive samples the samples that simultaneously satisfy: the average daily calling time is less than or equal to a first preset time, the average daily called time is less than or equal to a second preset time, the average daily flow is less than or equal to a first preset flow, and the average daily cost is less than or equal to a first preset cost; and determining as the negative samples the samples that simultaneously satisfy: the average daily call time is greater than a third preset time, the average daily flow is greater than a second preset flow, and the average daily charge is greater than a second preset charge.
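The definition rule above can be sketched as a labeling function. The threshold values, the field names, and the units are hypothetical, and treating the "average daily call time" of the negative rule as calling time plus called time is an assumption. Samples that satisfy neither rule are left unlabeled in this sketch.

```python
from dataclasses import dataclass


@dataclass
class DailyAverages:
    calling_minutes: float
    called_minutes: float
    traffic_mb: float
    cost: float


@dataclass
class SilentRule:
    # All values below are illustrative presets, not taken from the patent.
    first_time: float = 1.0      # max calling time for a positive sample
    second_time: float = 1.0     # max called time for a positive sample
    first_traffic: float = 1.0   # max traffic for a positive sample
    first_cost: float = 0.5      # max cost for a positive sample
    third_time: float = 10.0     # min call time for a negative sample
    second_traffic: float = 50.0
    second_cost: float = 5.0


def label_sample(sample, rule):
    """Return 1 (silent), 0 (non-silent) or None (matches neither rule)."""
    if (sample.calling_minutes <= rule.first_time
            and sample.called_minutes <= rule.second_time
            and sample.traffic_mb <= rule.first_traffic
            and sample.cost <= rule.first_cost):
        return 1
    # Assumption: total call time is calling time plus called time.
    call_minutes = sample.calling_minutes + sample.called_minutes
    if (call_minutes > rule.third_time
            and sample.traffic_mb > rule.second_traffic
            and sample.cost > rule.second_cost):
        return 0
    return None
```

Because the two rules do not cover every case, a number with moderate usage receives no label and would simply be left out of the positive and negative samples under this reading.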
Calculating data which accords with the characteristic indexes in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample within a first preset time period according to a preset weighted sum algorithm to obtain training set data;
and calculating the data which accords with the characteristic indexes in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample in a second preset time period according to the preset weighted sum algorithm to obtain the test set data.
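How the two data sets might be assembled is sketched below. The patent does not define the preset weighted sum algorithm in this passage, so combining each characteristic index's domestic and foreign values with a pair of weights is an assumption, as are the record layout, the period names, and the weight values.

```python
def weighted_sum(domestic, foreign, w_domestic, w_foreign):
    """Combine domestic and foreign values of each characteristic index."""
    if len(domestic) != len(foreign):
        raise ValueError("domestic and foreign must cover the same indexes")
    return [w_domestic * d + w_foreign * f for d, f in zip(domestic, foreign)]


def build_dataset(records, period, weights):
    """Collect feature rows and labels for the samples observed in one period."""
    rows, labels = [], []
    for record in records:
        if record["period"] == period:
            rows.append(weighted_sum(record["domestic"], record["foreign"], *weights))
            labels.append(record["label"])
    return rows, labels


# Hypothetical records: per-index values (voice duration, traffic, message count).
records = [
    {"period": "first", "label": 1, "domestic": [4.0, 8.0, 0.0], "foreign": [0.0, 0.0, 4.0]},
    {"period": "first", "label": 0, "domestic": [40.0, 80.0, 8.0], "foreign": [8.0, 16.0, 4.0]},
    {"period": "second", "label": 1, "domestic": [0.0, 4.0, 0.0], "foreign": [4.0, 0.0, 0.0]},
]
weights = (0.25, 0.75)  # illustrative domestic and foreign weights
training_rows, training_labels = build_dataset(records, "first", weights)
test_rows, test_labels = build_dataset(records, "second", weights)
```

Using the same weights for both periods mirrors the text, which applies the same preset weighted sum algorithm to the first and the second preset time period.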
Optionally, the processor 401 is further configured to execute an identification procedure of the international roaming silent number after the accuracy test is performed on the identification model according to the test set data, so as to implement the following steps:
and when the accuracy of the identification model is smaller than the preset accuracy threshold, adjusting the identification model.
The embodiment of the invention provides an identification method of an international roaming silent number. A platform for identifying the international roaming silent number determines characteristic indexes based on the correlation among all indexes in preset basic indexes; determines a positive sample belonging to a silent number class and a negative sample belonging to a non-silent number class in a sample set according to a preset silent number definition rule; determines training set data and test set data according to domestic and foreign communication data corresponding to the positive sample, domestic and foreign communication data corresponding to the negative sample and the characteristic indexes; trains the recognition model according to the training set data and a preset mining algorithm, and carries out an accuracy test on the recognition model according to the test set data; and, when the accuracy of the identification model is greater than or equal to a preset accuracy threshold, identifies whether the target number is a silent number according to domestic and foreign communication data corresponding to the target number, the characteristic indexes and the identification model. That is to say, in the technical solution of the embodiment of the present invention, a model for identifying international roaming silent numbers can be trained from the communication behavior of numbers, thereby improving the accuracy of identifying silent numbers.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (8)

1. A method for identifying international roaming silent numbers, the method comprising:
determining characteristic indexes based on the correlation among all indexes in preset basic indexes;
determining a positive sample belonging to a silent number class and a negative sample belonging to a non-silent number class in a sample set according to a preset silent number definition rule;
determining training set data and test set data according to the domestic and foreign communication data corresponding to the positive sample, the domestic and foreign communication data corresponding to the negative sample and the characteristic indexes;
training a recognition model according to the training set data and a preset mining algorithm, and carrying out accuracy test on the recognition model according to the test set data;
when the accuracy of the identification model is greater than or equal to a preset accuracy threshold, identifying whether the target number is a silent number or not according to domestic and foreign communication data corresponding to the target number, the characteristic index and the identification model;
determining characteristic indexes based on the correlation among all indexes in the preset basic indexes, wherein the characteristic indexes comprise:
checking the correlation of each index in the preset basic indexes, and determining the index of which the correlation is lower than a preset correlation threshold value in the preset basic indexes as a characteristic index; wherein the preset basic index is related to communication behavior, including: voice communication call number, voice duration, short message call number and flow.
2. The method of claim 1, wherein determining positive samples in the sample set belonging to a silent number class and negative samples belonging to a non-silent number class according to a preset silent number definition rule comprises:
in the sample set, determining as the positive samples the samples that simultaneously satisfy: the average daily calling time is less than or equal to a first preset time, the average daily called time is less than or equal to a second preset time, the average daily flow is less than or equal to a first preset flow, and the average daily cost is less than or equal to a first preset cost; and determining as the negative samples the samples that simultaneously satisfy: the average daily call time is greater than a third preset time, the average daily flow is greater than a second preset flow, and the average daily charge is greater than a second preset charge.
3. The method according to claim 1, wherein the determining training set data and test set data according to the domestic and foreign communication data corresponding to the positive sample, the domestic and foreign communication data corresponding to the negative sample, and the characteristic index comprises:
calculating data which accords with the characteristic indexes in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample within a first preset time period according to a preset weighted sum algorithm to obtain training set data;
and calculating the data which accords with the characteristic indexes in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample in a second preset time period according to the preset weighted sum algorithm to obtain the test set data.
4. The method of claim 1, wherein after said accuracy testing of said identification model from said test set data, said method further comprises:
and when the accuracy of the identification model is smaller than the preset accuracy threshold, adjusting the identification model.
5. An identification platform for international roaming silent numbers, the platform comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is used for executing the identification program of the international roaming silent number stored in the memory so as to realize the following steps:
determining characteristic indexes based on the correlation among all indexes in preset basic indexes; determining a positive sample belonging to a silent number class and a negative sample belonging to a non-silent number class in a sample set according to a preset silent number definition rule; determining training set data and test set data according to the domestic and foreign communication data corresponding to the positive sample, the domestic and foreign communication data corresponding to the negative sample and the characteristic indexes; training a recognition model according to the training set data and a preset mining algorithm, and carrying out accuracy test on the recognition model according to the test set data; when the accuracy of the identification model is greater than or equal to a preset accuracy threshold, identifying whether the target number is a silent number or not according to domestic and foreign communication data corresponding to the target number, the characteristic index and the identification model;
the processor is used for executing the identification program of the international roaming silent number stored in the memory and is also used for realizing the following steps:
checking the correlation of each index in the preset basic indexes, and determining the index of which the correlation is lower than a preset correlation threshold value in the preset basic indexes as a characteristic index; wherein the preset basic index is related to communication behavior, including: voice communication call number, voice duration, short message call number and flow.
6. The platform of claim 5, wherein the processor is specifically configured to perform the procedure for identifying the international roaming silent number to implement the following steps:
in the sample set, determining as the positive samples the samples that simultaneously satisfy: the average daily calling time is less than or equal to a first preset time, the average daily called time is less than or equal to a second preset time, the average daily flow is less than or equal to a first preset flow, and the average daily cost is less than or equal to a first preset cost; and determining as the negative samples the samples that simultaneously satisfy: the average daily call time is greater than a third preset time, the average daily flow is greater than a second preset flow, and the average daily charge is greater than a second preset charge;
calculating data which accords with the characteristic indexes in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample within a first preset time period according to a preset weighted sum algorithm to obtain training set data;
and calculating the data which accords with the characteristic indexes in the domestic and foreign communication data corresponding to the positive sample and the domestic and foreign communication data corresponding to the negative sample in a second preset time period according to the preset weighted sum algorithm to obtain the test set data.
7. The platform of claim 5, wherein the processor is further configured to perform an identification procedure of the international roaming silent number after the accuracy test of the identification model according to the test set data, so as to implement the following steps:
and when the accuracy of the identification model is smaller than the preset accuracy threshold, adjusting the identification model.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs which are executable by one or more processors to implement the method of any one of claims 1-4.
CN201810215482.0A 2018-03-15 2018-03-15 Method, platform and storage medium for identifying international roaming silent number Active CN110278555B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810215482.0A CN110278555B (en) 2018-03-15 2018-03-15 Method, platform and storage medium for identifying international roaming silent number

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810215482.0A CN110278555B (en) 2018-03-15 2018-03-15 Method, platform and storage medium for identifying international roaming silent number

Publications (2)

Publication Number Publication Date
CN110278555A CN110278555A (en) 2019-09-24
CN110278555B true CN110278555B (en) 2022-04-01

Family

ID=67958126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810215482.0A Active CN110278555B (en) 2018-03-15 2018-03-15 Method, platform and storage medium for identifying international roaming silent number

Country Status (1)

Country Link
CN (1) CN110278555B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101170740A (en) * 2007-11-21 2008-04-30 中兴通讯股份有限公司 A system and method for recognizing silent user in SMS service management
CN107133265A (en) * 2017-03-31 2017-09-05 咪咕动漫有限公司 A kind of method and device of identification behavior abnormal user

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9195910B2 (en) * 2013-04-23 2015-11-24 Wal-Mart Stores, Inc. System and method for classification with effective use of manual data input and crowdsourcing

Also Published As

Publication number Publication date
CN110278555A (en) 2019-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200319

Address after: Room 1006, building 16, yard 16, Yingcai North Third Street, future science city, Changping District, Beijing 100032

Applicant after: China Mobile Information Technology Co., Ltd

Applicant after: CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd.

Address before: 100032 Beijing Finance Street, No. 29, Xicheng District

Applicant before: CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd.

Applicant before: CHINA MOBILE INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant