CN112001756B - Method and device for determining abnormal telecommunication service scene and computer equipment
- Publication number
- CN112001756B (application number CN202010854354.8A)
- Authority
- CN
- China
- Prior art keywords
- telecommunication service
- index data
- service index
- clustered
- matched
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/60—Business processes related to postal services
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Databases & Information Systems (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- General Engineering & Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- Evolutionary Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Game Theory and Decision Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Primary Health Care (AREA)
- Tourism & Hospitality (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to a method and a device for determining an abnormal telecommunication service scene, a computer device, and a storage medium. The method comprises: acquiring initial telecommunication service index data; selecting, from the initial telecommunication service index data, telecommunication service index data matched with a telecommunication service scene; clustering the matched telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number; and extracting the index features of the clustered telecommunication service index data contained in each group, inputting the index features into a preset abnormal telecommunication service scene determination model, and determining an abnormal telecommunication service scene. With this method, the abnormal telecommunication service scene can be determined automatically from the telecommunication service index data, which improves the efficiency of auditing telecommunication service data, saves unnecessary investment, and increases the average revenue per user.
Description
Technical Field
The present application relates to the field of internet application technologies, and in particular, to a method and an apparatus for determining an abnormal telecommunication service scenario, a computer device, and a storage medium.
Background
With the rapid development of the telecommunication industry, the business systems of telecom operators carry ever greater loads, services are increasingly complex, and the telecommunication service data they generate is increasingly complicated. Telecom operators generally need to monitor the risk of revenue loss and to analyze, diagnose and handle the problems found in each service link according to the telecommunication service data, so as to guarantee the quality of that data and the accurate realization of service revenue to the greatest extent, and to recover or avoid revenue loss.
Traditionally, data for telecommunication service scenes is audited manually, and this manual auditing of telecommunication service data is inefficient.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device, and a storage medium for determining an abnormal telecommunication service scenario, which can improve auditing efficiency.
A method for determining an abnormal telecommunication service scenario, the method comprising:
acquiring initial telecommunication service index data;
selecting telecommunication service index data matched with a telecommunication service scene from the initial telecommunication service index data;
clustering the matched telecommunication service index data, and determining clustered telecommunication service index data contained in each group corresponding to the target grouping number;
extracting the index features of the clustered telecommunication service index data contained in each group, inputting the index features into a preset abnormal telecommunication service scene determination model, and determining an abnormal telecommunication service scene, wherein the abnormal telecommunication service scene determination model is obtained by training according to a telecommunication service index data sample of the abnormal telecommunication service scene.
In one embodiment, the initial telecommunication service index data comprises a plurality of types of telecommunication service index data; the clustering of the matched telecommunication service index data to determine the clustered telecommunication service index data contained in each group corresponding to the target grouping number comprises the following steps:
in the matched telecommunication service index data, traversing the types of telecommunication service index data in pairs and calculating the similarity between any two types of telecommunication service index data;
for each pair of types whose similarity is greater than a similarity threshold, deleting one type of the pair from the matched telecommunication service index data to obtain first telecommunication service index data;
and performing clustering calculation on the first telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
In one embodiment, the initial telecommunication service index data comprises a plurality of types of telecommunication service index data; the clustering of the matched telecommunication service index data to determine the clustered telecommunication service index data contained in each group corresponding to the target grouping number comprises the following steps:
determining a discrete value for each type of telecommunication service index data in the matched telecommunication service index data;
deleting, from the matched telecommunication service index data, each type of telecommunication service index data whose discrete value is smaller than a discrete value threshold, to obtain second telecommunication service index data;
and clustering the second telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
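The filtering step of this embodiment can be sketched as follows. This is only an illustration under stated assumptions: the source does not define how the "discrete value" is computed, so the population standard deviation is assumed here, and the function name, indicator names and threshold are hypothetical.

```python
from statistics import pstdev

def drop_low_dispersion(indicators, threshold):
    """Keep only indicator types whose dispersion reaches the threshold.

    indicators: dict mapping indicator name -> list of numeric values.
    The "discrete value" is taken to be the population standard deviation;
    this is an assumption, the source does not name a specific measure.
    """
    return {
        name: values
        for name, values in indicators.items()
        if pstdev(values) >= threshold
    }

# Hypothetical sample data
matched = {
    "voice_minutes": [120.0, 30.0, 400.0, 75.0],
    "one_time_fee": [10.0, 10.0, 10.0, 10.0],  # constant, cannot separate groups
}
second = drop_low_dispersion(matched, threshold=1.0)
```

An indicator that barely varies contributes almost nothing to the distance between samples, so removing it before clustering reduces the data size without changing the grouping much.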
In one embodiment, the initial telecommunication service index data comprises a plurality of types of telecommunication service index data; the clustering of the matched telecommunication service index data to determine the clustered telecommunication service index data contained in each group corresponding to the target grouping number comprises the following steps:
determining a first quartile and a third quartile in the matched telecommunication service index data through a quartile method;
determining a lower edge and an upper edge in the matched telecommunication service index data according to the first quartile and the third quartile;
in the matched telecommunication service index data, replacing the telecommunication service index data smaller than the lower edge with the minimum number in a main body index data interval, and replacing the telecommunication service index data larger than the upper edge with the maximum number in the main body index data interval to obtain third telecommunication service index data, wherein the main body index data interval is formed by the telecommunication service index data between the lower edge and the upper edge;
and clustering the third telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
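The quartile-based replacement in this embodiment can be sketched as follows. The source only states that the lower and upper edges are derived from the first and third quartiles; the common box-plot rule Q1 - 1.5*IQR and Q3 + 1.5*IQR is assumed here, and the function name and sample values are hypothetical.

```python
from statistics import quantiles

def clip_to_edges(values, k=1.5):
    """Replace out-of-range values with the extremes of the main-body interval.

    The edges use Q1 - k*IQR and Q3 + k*IQR with k = 1.5. The factor is an
    assumption; the source does not give the formula for the edges.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Main-body interval: the values lying between the lower and upper edge.
    body = [v for v in values if lower <= v <= upper]
    lo, hi = min(body), max(body)
    return [lo if v < lower else hi if v > upper else v for v in values]

# Hypothetical bill amounts with one extreme value on each side
bills = [-500.0, 48.0, 50.0, 52.0, 55.0, 60.0, 61.0, 65.0, 900.0]
third = clip_to_edges(bills)
```

Here the quartiles are 50 and 61, so the edges are 33.5 and 77.5; the value -500 is replaced by 48 (the smallest value inside the edges) and 900 by 65 (the largest).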
In one embodiment, the clustering of the third telecommunication service index data to determine the clustered telecommunication service index data contained in each group corresponding to the target grouping number includes:
normalizing the third telecommunication service index data to obtain normalized telecommunication service index data;
and clustering the normalized telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
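As an illustration of the normalization step, the sketch below uses min-max scaling to the range [0, 1]. The source does not say which normalization is used, so this choice, like the function name, is an assumption.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical indicator column after outlier replacement
normalized = min_max_normalize([48.0, 50.0, 65.0, 56.5])
```

Bringing every indicator type onto the same scale keeps an indicator with large raw values (for example, traffic in megabytes) from dominating the distance computation during clustering.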
In one embodiment, the clustering of the matched telecommunication service index data to determine the clustered telecommunication service index data contained in each group corresponding to the target grouping number includes:
selecting a plurality of candidate grouping numbers, and clustering the matched telecommunication service index data with the K-means++ algorithm according to each candidate grouping number, to obtain the clustered telecommunication service index data contained in each group corresponding to each candidate grouping number, together with a clustering evaluation score;
and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number according to the candidate grouping number, the clustered telecommunication service index data contained in each group corresponding to the candidate grouping number and the clustered evaluation score.
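The candidate-selection procedure can be sketched as follows. This is a minimal pure-Python illustration rather than the patented implementation: it assumes the clustering evaluation score is the mean silhouette coefficient (the source does not name the metric), and `kmeans_pp`, `silhouette` and `choose_grouping` are hypothetical helper names.

```python
import math
import random

def kmeans_pp(points, k, rng, iters=50):
    """Lloyd's k-means with k-means++ seeding; returns one label per point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Pick the next center with probability proportional to the squared
        # distance from the nearest center chosen so far.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2)[0])
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a group ends up empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton groups contribute 0."""
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    if len(groups) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = groups[l]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in g) / len(g)
                for m, g in groups.items() if m != l)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)

def choose_grouping(points, candidates, seed=0):
    """Cluster once per candidate grouping number and keep the best-scoring one."""
    rng = random.Random(seed)
    labelings = {k: kmeans_pp(points, k, rng) for k in candidates}
    scores = {k: silhouette(points, lab) for k, lab in labelings.items()}
    best_k = max(scores, key=scores.get)
    return best_k, labelings[best_k], scores

# Hypothetical normalized samples forming two well-separated groups
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
best_k, labels, scores = choose_grouping(points, candidates=[2, 3, 4])
```

On this sample the two-group clustering scores highest, so 2 becomes the target grouping number and `labels` gives the group of each sample.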
In one embodiment, the selecting, from the initial telecommunication service index data, of telecommunication service index data matched with a telecommunication service scene includes:
selecting telecommunication service index data required by the telecommunication service scene from the initial telecommunication service index data, and carrying out digital processing on the required telecommunication service index data according to the weight corresponding to the required telecommunication service index data associated with the telecommunication service scene to obtain the telecommunication service index data matched with the telecommunication service scene.
An apparatus for determining abnormal telecommunication service scenario, the apparatus comprising:
the data acquisition module is used for acquiring initial telecommunication service index data;
the data matching module is used for selecting telecommunication service index data matched with a telecommunication service scene from the initial telecommunication service index data;
the data clustering module is used for clustering the matched telecommunication service index data and determining clustered telecommunication service index data contained in each group corresponding to the target grouping number;
and the scene determining module is used for extracting the index features of the clustered telecommunication service index data contained in each group, inputting the index features into a preset abnormal telecommunication service scene determining model and determining an abnormal telecommunication service scene, wherein the abnormal telecommunication service scene determining model is obtained by training according to the telecommunication service index data samples of the abnormal telecommunication service scene.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring initial telecommunication service index data;
selecting telecommunication service index data matched with a telecommunication service scene from the initial telecommunication service index data;
clustering the matched telecommunication service index data, and determining clustered telecommunication service index data contained in each group corresponding to the target grouping number;
extracting the index features of the clustered telecommunication service index data contained in each group, inputting the index features into a preset abnormal telecommunication service scene determination model, and determining an abnormal telecommunication service scene, wherein the abnormal telecommunication service scene determination model is obtained by training according to a telecommunication service index data sample of the abnormal telecommunication service scene.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring initial telecommunication service index data;
selecting telecommunication service index data matched with a telecommunication service scene from the initial telecommunication service index data;
performing clustering calculation on the matched telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number;
extracting the index features of the clustered telecommunication service index data contained in each group, inputting the index features into a preset abnormal telecommunication service scene determination model, and determining an abnormal telecommunication service scene, wherein the abnormal telecommunication service scene determination model is obtained by training according to a telecommunication service index data sample of the abnormal telecommunication service scene.
In the above method, device and computer equipment for determining an abnormal telecommunication service scene, telecommunication service index data matched with the telecommunication service scene is first selected from the initial telecommunication service index data; the matched telecommunication service index data is then clustered, and the clustered telecommunication service index data contained in each group corresponding to the target grouping number is determined; the index features of the clustered telecommunication service index data contained in each group are then extracted and input into a preset abnormal telecommunication service scene determination model to determine the abnormal telecommunication service scene. In other words, the telecommunication service index data undergoes data matching, clustering and feature extraction, and the extracted features are finally input into the abnormal telecommunication service scene determination model. The abnormal telecommunication service scene is thus determined automatically from the telecommunication service index data, which improves the efficiency of auditing telecommunication service data, saves unnecessary investment, and increases the average revenue per user (ARPU).
Drawings
FIG. 1 is a flow diagram illustrating a method for determining an abnormal telecommunication service scenario in one embodiment;
FIG. 2 is a schematic flow chart illustrating deletion of one of two types of telecommunication service index data with high similarity in one embodiment;
FIG. 3 is a schematic flow chart illustrating deletion of similar index data of the same type of telecommunication services in one embodiment;
FIG. 4 is a schematic flow chart illustrating a supplementary scheme for clustering the matched telecommunication service index data and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number in one embodiment;
FIG. 5 is a schematic flow chart illustrating a supplementary scheme for clustering the matched telecommunication service index data and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number in another embodiment;
FIG. 6 is a block diagram of an apparatus for determining an abnormal teleservice scenario in one embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In an embodiment, as shown in FIG. 1, a method for determining an abnormal telecommunication service scenario is provided. This embodiment is described by applying the method to a server. It is to be understood that the method may also be applied to a terminal, or to a system including a terminal and a server, in which case it is implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step S202, obtaining initial telecommunication service index data.
Specifically, the server obtains initial telecommunication service index data. The initial telecommunication service index data refers to index data that is related to the telecommunication service and is generated when the telecommunication service runs on computer equipment. Optionally, the initial telecommunication service index data comprises a plurality of types of telecommunication service index data. In one embodiment, the initial telecommunication service index data includes three types of index data, namely user occupied resource data, telecommunication actual revenue data, and telecommunication investment cost data.
Step S204, selecting telecommunication service index data matched with the telecommunication service scene from the initial telecommunication service index data.
Specifically, there are usually many types of telecommunication service scenes, and the telecommunication service index data generated by different types of scenes is usually stored together. Therefore, for each telecommunication service scene, the server selects the corresponding telecommunication service index data from the initial telecommunication service index data according to the type of the scene and performs digital processing on it, to obtain the telecommunication service index data matched with the telecommunication service scene.
Step S206, clustering calculation is carried out on the matched telecommunication service index data, and clustered telecommunication service index data contained in each group corresponding to the target grouping number are determined.
Specifically, the server clusters the matched telecommunication service index data with a preset clustering algorithm, and determines the clustered telecommunication service index data contained in each group corresponding to the target grouping number. It can be understood that clustering divides the telecommunication service index data into a plurality of groups, each of which contains clustered telecommunication service index data. The grouping number is the number of groups obtained after clustering.
Step S208, extracting the index features of the clustered telecommunication service index data contained in each group, inputting the index features into a preset abnormal telecommunication service scene determination model, and determining an abnormal telecommunication service scene.
The abnormal telecommunication service scene determination model is obtained by training on telecommunication service index data samples of abnormal telecommunication service scenes. In one embodiment, the model is determined by performing feature extraction and telecommunication service rule analysis and matching on the telecommunication service index data samples contained in each clustered group, and analyzing the sample characteristics of each group.
Specifically, after obtaining the clustered telecommunication service index data contained in each group corresponding to the target grouping number, the server performs feature extraction on the clustered telecommunication service index data contained in each group to obtain a plurality of groups of index features, and then inputs the index features into the preset abnormal telecommunication service scene determination model. Because the model is trained on telecommunication service index data samples of abnormal telecommunication service scenes, it can be used to analyze telecommunication service index data that contains abnormalities and to determine the abnormal telecommunication service scene. Optionally, the server may generate and output warning information for the abnormal telecommunication service scene.
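The data flow of step S208 can be sketched as follows. The source does not specify which index features are extracted or what form the trained model takes, so both are placeholders here: the per-group mean of each indicator stands in for the index features, and `toy_model` is a hypothetical hand-written rule standing in for the trained abnormal telecommunication service scene determination model.

```python
from statistics import mean

def group_features(rows, labels):
    """Per-group mean of every indicator; each row is a dict of indicator -> value."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {
        label: {name: mean(r[name] for r in members) for name in members[0]}
        for label, members in groups.items()
    }

def toy_model(features):
    """Stand-in for the trained model: flags heavy usage paired with low revenue."""
    return features["traffic"] > 0.8 and features["revenue"] < 0.2

# Hypothetical normalized samples and the group label assigned to each by clustering
rows = [
    {"traffic": 0.9, "revenue": 0.1},
    {"traffic": 1.0, "revenue": 0.1},
    {"traffic": 0.2, "revenue": 0.5},
    {"traffic": 0.4, "revenue": 0.5},
]
labels = [0, 0, 1, 1]
abnormal = [g for g, f in group_features(rows, labels).items() if toy_model(f)]
```

In this toy example only group 0 is flagged, and the server could then generate warning information for the samples in that group.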
In the above method for determining an abnormal telecommunication service scene, telecommunication service index data matched with the telecommunication service scene is first selected from the initial telecommunication service index data; the matched telecommunication service index data is then clustered, and the clustered telecommunication service index data contained in each group corresponding to the target grouping number is determined; the index features of the clustered telecommunication service index data contained in each group are then extracted and input into a preset abnormal telecommunication service scene determination model to determine the abnormal telecommunication service scene. In other words, the telecommunication service index data undergoes data matching, clustering and feature extraction, and the result is finally input into the abnormal telecommunication service scene determination model. The abnormal telecommunication service scene is thus determined automatically from the telecommunication service index data, which improves the efficiency of auditing telecommunication service data, saves unnecessary investment, and increases the average revenue per user.
In one embodiment, the user occupied resource data includes at least one of user voice usage volume, user traffic usage, user short message usage, user occupied broadband bandwidth, user ITV profile, and fixed number of users. The telecommunication actual revenue data includes at least one of a user billed amount, a user arrears amount, a user discount and gift amount, and one-time fee income. The telecommunication investment cost data is calculated from the amount of the user's corresponding terminal and the construction reduced amount. For example, the telecommunication investment cost can be obtained by apportioning the amount of the user's corresponding terminal and the construction reduced amount according to a preset formula.
In the embodiment, different types of telecommunication service index data are adopted, so that the judgment accuracy of the abnormal telecommunication service scene is improved, and the more types of the telecommunication service index data are adopted, the higher the judgment accuracy of the abnormal telecommunication service scene is.
In general, the scenarios of telecommunication services are diverse. The generated teleservice indicator data is usually different for different types of teleservice scenarios. In order to effectively analyze different types of telecommunication services, in an embodiment, the step S204 can be specifically implemented by the following steps:
Step S2042, selecting telecommunication service index data required by a telecommunication service scene from the initial telecommunication service index data, and carrying out digital processing on the required telecommunication service index data according to the weight corresponding to the required telecommunication service index data associated with the telecommunication service scene to obtain the telecommunication service index data matched with the telecommunication service scene.
Specifically, the server selects telecommunication service index data required by the telecommunication service scene of the type from the initial telecommunication service index data according to the type of the telecommunication service scene. And then the server carries out digital processing on the required telecommunication service index data according to the weight and the correlation coefficient corresponding to the required telecommunication service index data set in the telecommunication service scene to obtain the telecommunication service index data matched with the telecommunication service scene.
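A minimal sketch of this selection and weighting step is given below. The source does not detail what the digital processing consists of or how the correlation coefficient enters, so the sketch simply multiplies each required indicator by its scene-specific weight; the function name, indicator names and weights are hypothetical.

```python
def match_indicators(initial, scene_weights):
    """Select the indicator types a scene needs and apply per-indicator weights.

    initial: dict of indicator name -> list of raw numeric values.
    scene_weights: dict of required indicator name -> weight for this scene.
    Multiplying by the weight is only one possible reading of "digital processing".
    """
    return {
        name: [weight * v for v in initial[name]]
        for name, weight in scene_weights.items()
        if name in initial
    }

# Hypothetical initial data covering several scenes
initial = {
    "data_traffic_gb": [2.0, 15.0, 40.0],
    "billed_amount": [30.0, 59.0, 99.0],
    "arrears_amount": [0.0, 0.0, 20.0],
}
# Hypothetical weights for one scene that only needs two of the indicators
matched = match_indicators(initial, {"data_traffic_gb": 0.5, "billed_amount": 1.0})
```

Indicators the scene does not list, such as the arrears amount here, are left out of the matched data.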
In the embodiment, the accuracy of the telecommunication service scene analysis is ensured and the analysis efficiency is improved by selecting the telecommunication service index data required by the telecommunication service scene and carrying out digital processing.
In one embodiment, after obtaining the matched telecommunication service index data, the server performs quality detection on it. Specifically, the server identifies null values, zero values, and abnormal values that do not match the telecommunication service scene in the matched telecommunication service index data; the server then counts, for each type of telecommunication service index data, the proportion of null or zero values, and deletes from the matched telecommunication service index data each type whose proportion is greater than a proportion threshold. Optionally, based on the 95-quantile principle, the server may delete the telecommunication service index data whose proportion exceeds 95%. It is understood that the proportion threshold may be set to other values.
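The proportion check described above can be sketched as follows, using the 95% threshold from the text. The function and indicator names are hypothetical, and null values are represented by `None`.

```python
def drop_sparse_indicators(indicators, ratio_threshold=0.95):
    """Drop indicator types whose share of null or zero values exceeds the threshold."""
    kept = {}
    for name, values in indicators.items():
        empty = sum(1 for v in values if v is None or v == 0)
        if values and empty / len(values) <= ratio_threshold:
            kept[name] = values
    return kept

# Hypothetical matched data: 20 samples per indicator type
matched = {
    "billed_amount": [30.0, 0.0, 59.0, 99.0] * 5,  # 25% zeros, kept
    "itv_profile": [None] * 19 + [0.0],            # 100% null or zero, dropped
}
cleaned = drop_sparse_indicators(matched)
```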
In the embodiment, the telecommunication service index data which contains a large number of null values, zero values, abnormal values which are not matched with the telecommunication service scene and the like and have no analytical significance are deleted, so that on one hand, the data size for determining the abnormal telecommunication service scene can be reduced, and the data processing efficiency is improved; on the other hand, adverse effects of wrong telecommunication service index data on abnormal telecommunication service scene analysis are reduced, and the accuracy of determining the abnormal telecommunication service scene is improved.
In an embodiment, as shown in FIG. 2, step S206 may be specifically implemented by the following steps:
Step S2062, in the matched telecommunication service index data, traversing the types of telecommunication service index data in pairs and calculating the similarity between any two types of telecommunication service index data;
Step S2064, for each pair of types whose similarity is greater than the similarity threshold, deleting one type of the pair from the matched telecommunication service index data to obtain first telecommunication service index data;
Step S2066, clustering the first telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
Specifically, in the matched telecommunication service index data, the server traverses the different types of telecommunication service index data and calculates the similarity between every two types, and deletes from the matched telecommunication service index data either one of the two types whose similarity is greater than the similarity threshold, that is, only one of the two types is retained, to obtain the first telecommunication service index data. For example, assuming that the telecommunication service index data includes the user usage volume, the user arrearage amount and the telecommunication investment cost data, traversing and calculating the similarity between any two types of telecommunication service index data means calculating the similarity between the user usage volume and the user arrearage amount, the similarity between the user arrearage amount and the telecommunication investment cost data, and the similarity between the user usage volume and the telecommunication investment cost data. The server then performs clustering calculation on the first telecommunication service index data and determines the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
Optionally, the server may, based on the 95-quantile principle, delete either one of two types of telecommunication service index data whose similarity is greater than 0.95. Continuing the previous example, if the similarity between the user usage volume and the user arrearage amount is 0.98, the similarity between the user arrearage amount and the telecommunication investment cost data is 0.7, and the similarity between the user usage volume and the telecommunication investment cost data is 0.5, the server compares each similarity with the similarity threshold of 0.95 and determines that the user usage volume and the user arrearage amount are the two types whose similarity is greater than the similarity threshold, and therefore deletes either the user usage volume or the user arrearage amount. If the server deletes the user usage volume, the user arrearage amount is retained; if the server deletes the user arrearage amount, the user usage volume is retained. It is understood that the similarity threshold may be set to other values.
Optionally, in one embodiment, the similarity between any two types of telecommunication service index data includes a Pearson correlation coefficient between the two types of telecommunication service index data. Specifically, the server may, based on the 95-quantile principle, delete either one of two types of telecommunication service index data whose similarity is greater than 0.95 or less than -0.95. In another embodiment, the similarity between any two types of telecommunication service index data includes a Jaccard similarity coefficient between the two types of telecommunication service index data.
In this embodiment, two different types of telecommunication service index data with high similarity are considered to contribute almost equally to the analysis of a telecommunication service scene, so that deleting one of the two types reduces the amount of data to be processed while still ensuring the accuracy of analyzing an abnormal telecommunication service scene.
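The similarity-based deletion of steps S2062 and S2064 can be sketched as follows, assuming the Pearson correlation coefficient is used as the similarity and that the later of the two types in iteration order is the one deleted. The embodiment allows either type to be deleted, so that choice, the function names and the sample values are illustrative assumptions only.

```python
import math
from itertools import combinations
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_similar_indicators(
    data: Dict[str, List[float]], threshold: float = 0.95
) -> Dict[str, List[float]]:
    """For every pair whose |correlation| exceeds the threshold, keep the first type only."""
    removed = set()
    for first, second in combinations(data, 2):
        if first in removed or second in removed:
            continue
        if abs(pearson(data[first], data[second])) > threshold:
            removed.add(second)  # arbitrary choice: either type may be deleted
    return {name: values for name, values in data.items() if name not in removed}


# Hypothetical sample: usage and arrears move together, investment does not.
sample = {
    "usage": [1, 2, 3, 4, 5],
    "arrears": [2, 4, 6, 8, 10],
    "investment": [5, 1, 4, 2, 3],
}
print(list(drop_similar_indicators(sample)))  # ['usage', 'investment']
```

Taking the absolute value covers both cases named above: a coefficient greater than 0.95 and a coefficient less than -0.95.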
In an embodiment, as shown in fig. 3, step S206 may be specifically implemented by the following steps:
step S2061, determining the discrete value of each type of telecommunication service index data in the matched telecommunication service index data;
step S2063, deleting the telecommunication service index data with the discrete value smaller than the discrete value threshold value from the matched telecommunication service index data to obtain second telecommunication service index data;
step S2065, performing cluster calculation on the second telecommunication service index data, and determining the clustered telecommunication service index data included in each group corresponding to the number of the target groups.
The discrete value of a type of telecommunication service index data characterizes how similar the values of that type are to one another: the smaller the discrete value, the more similar the values.
Specifically, for each type of telecommunication service index data in the matched telecommunication service index data, the server determines the discrete value of that type, and deletes from the matched telecommunication service index data the telecommunication service index data whose discrete value is smaller than a discrete value threshold, to obtain second telecommunication service index data. The server then performs clustering calculation on the second telecommunication service index data and determines the clustered telecommunication service index data contained in each group corresponding to the target grouping number. Optionally, the server may delete the telecommunication service index data whose discrete value is less than 0.05, based on the 95-quantile principle. It is understood that the discrete value threshold may also be set to other values.
In this embodiment, if the values of one type of telecommunication service index data are highly similar to one another, the probability that this type of data is abnormal is small and its contribution to the analysis of an abnormal telecommunication service scene is also small, so that deleting this type of data reduces the amount of data to be processed while still ensuring the accuracy of the analysis of the abnormal telecommunication service scene.
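The embodiment does not state how the discrete value is computed. As one possible reading, the sketch below assumes it is the coefficient of variation (population standard deviation divided by the absolute mean) and deletes every type whose value falls below the threshold. The measure, the function names and the sample data are all assumptions.

```python
import math
import statistics
from typing import Dict, List


def dispersion(values: List[float]) -> float:
    """Coefficient of variation, used here as an assumed stand-in for the discrete value."""
    spread = statistics.pstdev(values)
    if spread == 0:
        return 0.0  # all values identical
    mean = statistics.fmean(values)
    if mean == 0:
        return math.inf  # spread without a meaningful mean: treat as highly dispersed
    return spread / abs(mean)


def drop_low_dispersion(
    data: Dict[str, List[float]], threshold: float = 0.05
) -> Dict[str, List[float]]:
    """Delete index types whose dispersion is smaller than the threshold."""
    return {name: values for name, values in data.items() if dispersion(values) >= threshold}


# Hypothetical sample: one nearly constant type and one widely varying type.
sample = {"flat": [100, 100, 101, 100], "varied": [10, 50, 5, 80]}
print(list(drop_low_dispersion(sample)))  # ['varied']
```

Any other dispersion measure, such as the variance of normalized values, could be substituted without changing the structure of the step.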
In an embodiment, as shown in fig. 4, step S206 may be specifically implemented by the following steps:
step S206a, determining a first quartile and a third quartile in the matched telecommunication service index data by a quartile method;
step S206c, determining the lower edge and the upper edge in the matched telecommunication service index data according to the first quartile and the third quartile;
step S206e, in the matched telecommunication service index data, replacing the telecommunication service index data smaller than the lower edge with the minimum number in the main body index data interval, and replacing the telecommunication service index data larger than the upper edge with the maximum number in the main body index data interval to obtain third telecommunication service index data, wherein the main body index data interval is formed by the telecommunication service index data between the lower edge and the upper edge;
step S206g, performing clustering calculation on the third telecommunication service index data, and determining clustered telecommunication service index data included in each group corresponding to the number of target clusters.
Specifically, the server determines a first quartile, a second quartile and a third quartile in the matched telecommunication service index data by the quartile method, where the first quartile is smaller than the second quartile and the second quartile is smaller than the third quartile. The server then determines the lower edge in the matched telecommunication service index data by the formula greatest(first quartile - (third quartile - first quartile) × 1.5, minimum value in the matched telecommunication service index data), where the greatest function returns the largest of its arguments. The server then determines the upper edge in the matched telecommunication service index data by the formula least(third quartile + (third quartile - first quartile) × 1.5, maximum value in the matched telecommunication service index data), where the least function returns the smallest of its arguments. The server then takes the telecommunication service index data between the lower edge and the upper edge as the main body index data interval, replaces the telecommunication service index data smaller than the lower edge with the minimum number in the main body index data interval, replaces the telecommunication service index data larger than the upper edge with the maximum number in the main body index data interval, and obtains third telecommunication service index data after the replacement is completed. Finally, the server performs clustering calculation on the third telecommunication service index data and determines the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
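The quartile-based replacement of steps S206a to S206e can be sketched as follows. The sketch assumes the conventional box-plot reading, in which the lower edge is the larger of (first quartile - 1.5 × interquartile range) and the data minimum, and the upper edge is the smaller of (third quartile + 1.5 × interquartile range) and the data maximum. The quartile interpolation method is that of Python's statistics.quantiles with its default settings, which is an assumption, as is the function name.

```python
import statistics
from typing import List


def clip_to_box_edges(values: List[float]) -> List[float]:
    """Replace values outside the box-plot edges with the nearest value inside them."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)  # first, second, third quartile
    iqr = q3 - q1
    lower = max(q1 - 1.5 * iqr, min(values))  # lower edge
    upper = min(q3 + 1.5 * iqr, max(values))  # upper edge
    body = [v for v in values if lower <= v <= upper]  # main body index data interval
    if not body:
        return list(values)  # degenerate input: nothing to clip against
    body_min, body_max = min(body), max(body)
    return [body_min if v < lower else body_max if v > upper else v for v in values]


# Hypothetical sample with one low and one high outlier.
sample = [12, 1, 14, 100, 10, 16, 11, 15, 13]
print(clip_to_box_edges(sample))  # [12, 10, 14, 16, 10, 16, 11, 15, 13]
```

In the sample, the quartiles are 10.5, 13 and 15.5, so the edges are 3 and 23; the value 1 is replaced with 10 and the value 100 with 16, while the order of the records is preserved.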
Further, in an embodiment, the step S206g may be specifically implemented by the following steps: the server normalizes the third telecommunication service index data to obtain normalized telecommunication service index data, performs clustering calculation on the normalized telecommunication service index data, and determines clustered telecommunication service index data contained in each group corresponding to the number of target clusters. In the embodiment, the telecommunication service index data are mapped between 0 and 1, so that subsequent operation and analysis are facilitated, and the operation efficiency is improved.
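A minimal sketch of the normalization step, assuming min-max scaling is the method used to map the data to the range 0 to 1. The embodiment only states the target range, so the scaling method, the function name and the treatment of a constant series are assumptions.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Map values linearly onto [0, 1]; a constant series maps to all zeros (assumption)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


print(min_max_normalize([10, 15, 20]))  # [0.0, 0.5, 1.0]
```

Scaling every type of index data to the same range keeps types with large raw magnitudes from dominating the distance computation in the subsequent clustering calculation.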
In an embodiment, as shown in fig. 5, step S206 may be specifically implemented by the following steps:
step S206b, selecting a plurality of candidate clustering numbers through a K-means++ algorithm, and performing clustering calculation on the matched telecommunication service index data according to the plurality of candidate clustering numbers to obtain clustered telecommunication service index data and clustered evaluation scores contained in each group corresponding to the plurality of candidate clustering numbers;
step S206d, determining the clustered telecommunication service index data included in each group corresponding to the target clustering number according to the candidate clustering numbers and the clustered telecommunication service index data and the clustered evaluation scores included in each group corresponding thereto.
Specifically, the server selects a plurality of candidate clustering numbers K through a K-means++ algorithm. Optionally, K may take some or all of the integer values from 5 to 50. The server performs clustering calculation on the matched telecommunication service index data for each candidate clustering number, to obtain the clustered telecommunication service index data contained in each group and the clustered evaluation score corresponding to each candidate clustering number. Table 1 shows an exemplary distribution of the clustering number K and the clustered evaluation score, where the abscissa indicates the clustering number K and the ordinate indicates the clustered evaluation score. The higher the evaluation score, the better the clustering effect.
TABLE 1
Then, the server determines the number of target groups by adopting the following formula:
max( (Val - min(valList)) / (max(valList) - min(valList)) * (1 - 0.618) + ((k - 5) / (50 - 5)) * 0.618 )
where max denotes the max function, which takes the maximum value; min denotes the min function, which takes the minimum value; Val denotes the clustered evaluation score; valList denotes the list of clustered evaluation scores obtained for the candidate clustering numbers; and k denotes the candidate clustering number. The outer max is taken over all candidate clustering numbers, and the candidate that attains it is the target clustering number. The range 5-50 is chosen because the number of channels of a telecommunication operator is between 9000 and 30000, so that according to the 3-sigma principle (μ-3σ, μ+3σ) the value space is between 10 and 30; to guard against special cases, the value space of the candidate clustering number is widened to 5-50, which does not significantly increase the amount of computation. 0.618 is the golden section ratio; because a result with a larger K value is preferred when the Val values are close, the K term is given an appropriate weight in the calculation.
Continuing with Table 1, through the above calculation the server determines that the target clustering number k is 9. Although the evaluation score when k is 9 is slightly lower than that when k is 6, k being 9 is more favorable for subdividing the clusters, so that the overall clustering effect is better when k is 9.
In this embodiment, the number of candidate clusters, the clustered telecommunication service index data included in each group corresponding to the candidate clusters, and the clustered evaluation score are calculated to select the number of target clusters with a better clustering effect, and the abnormal telecommunication service scene is determined based on the clustered telecommunication service index data included in each group corresponding to the number of target clusters, so that the accuracy of scene determination can be improved.
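The selection of the target clustering number can be sketched as follows. The sketch reads the formula as a weighted sum of the min-max normalized evaluation score (weight 1 - 0.618) and the normalized candidate clustering number (weight 0.618), maximized over all candidates. The function name is hypothetical and the scores in the example are made-up illustrative values, not the contents of Table 1.

```python
from typing import Dict

GOLDEN = 0.618  # golden section ratio used as the weight of the K term


def select_target_k(scores: Dict[int, float], k_min: int = 5, k_max: int = 50) -> int:
    """Return the candidate clustering number that maximizes the weighted score."""
    lowest, highest = min(scores.values()), max(scores.values())
    span = highest - lowest
    best_k, best_value = -1, float("-inf")
    for k, val in sorted(scores.items()):
        norm_val = (val - lowest) / span if span else 0.0  # (Val - min) / (max - min)
        norm_k = (k - k_min) / (k_max - k_min)  # (k - 5) / (50 - 5)
        combined = norm_val * (1 - GOLDEN) + norm_k * GOLDEN
        if combined > best_value:
            best_k, best_value = k, combined
    return best_k


# Made-up evaluation scores for candidate clustering numbers 5 to 9.
example_scores = {5: 0.40, 6: 0.62, 7: 0.50, 8: 0.55, 9: 0.60}
print(select_target_k(example_scores))  # 9
```

In this made-up example, k = 6 has the highest raw evaluation score, but k = 9 scores only slightly lower and gains more from the K term, so it is selected. This mirrors the trade-off between evaluation score and cluster subdivision described above.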
In a specific application scenario, the server may input historical telecommunication service index data into the abnormal telecommunication service scene determination model to identify historical telecommunication service scenes and thereby mine unknown risks.
The unknown risk mining results are shown in table 2:
TABLE 2
The practical application effects of the method for determining an abnormal telecommunication service scene according to the embodiments of the application include the following:
Abnormal telecommunication service scenario 1: the influence of user behavior data on income is abnormal, which manifests as follows: the user usage is large but the corresponding charge amount is small, so the corresponding income is low; or the user usage is small but the corresponding charge amount is large, so the corresponding off-network risk or complaint risk is high.
Abnormal telecommunication service scenario 2: the influence of telecommunication investment on income is abnormal, which manifests as follows: the telecommunication investment is large but the corresponding income is small; the corresponding risk is low income, or a breach of contract during the agreement period that causes the income to fall short of expectations.
It should be understood that although the steps in the flow diagrams of figs. 1-5 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the order in which the steps are performed is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in figs. 1-5 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided an apparatus for determining an abnormal telecommunication service scenario, including: a data acquisition module 302, a data matching module 304, a data clustering module 306, and a scene determination module 308, wherein:
a data acquisition module 302, configured to obtain initial telecommunication service index data;
a data matching module 304, configured to select, from the initial telecommunication service index data, telecommunication service index data matched with a telecommunication service scene;
the data clustering module 306 is used for performing clustering calculation on the matched telecommunication service index data and determining clustered telecommunication service index data contained in each group corresponding to the target clustering number;
the scene determining module 308 is configured to extract an index feature of the clustered telecommunication service index data included in each group, input the index feature into a preset abnormal telecommunication service scene determining model, and determine an abnormal telecommunication service scene, where the abnormal telecommunication service scene determining model is obtained by training a telecommunication service index data sample of the abnormal telecommunication service scene.
In the device for determining the abnormal telecommunication service scene, firstly, telecommunication service index data matched with the telecommunication service scene is selected from the initial telecommunication service index data; then clustering calculation is performed on the matched telecommunication service index data, and the clustered telecommunication service index data contained in each group corresponding to the target grouping number is determined; then index features of the clustered telecommunication service index data contained in each group are extracted and input into a preset abnormal telecommunication service scene determination model to determine the abnormal telecommunication service scene. It can be understood that the device determines the abnormal telecommunication service scene by performing data matching, clustering and feature extraction on the telecommunication service index data and finally inputting the result into the abnormal telecommunication service scene determination model, so that the abnormal telecommunication service scene is determined automatically from the telecommunication service index data, the auditing efficiency of the telecommunication service data is improved, unnecessary investment can be saved, and the average income per user is improved.
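The cooperation of the four modules can be sketched as a simple pipeline. The class name, the use of plain callables for the modules and the toy stand-ins in the example are assumptions made for illustration; in particular, the final callable merely stands in for feature extraction plus the preset abnormal telecommunication service scene determination model, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AbnormalSceneDeterminer:
    """Chains the four modules of the device in the order described above."""

    acquire: Callable[[], Any]       # data acquisition module 302
    match: Callable[[Any], Any]      # data matching module 304
    cluster: Callable[[Any], Any]    # data clustering module 306
    determine: Callable[[Any], Any]  # scene determination module 308

    def run(self) -> Any:
        initial = self.acquire()
        matched = self.match(initial)
        clustered = self.cluster(matched)
        return self.determine(clustered)


# Toy stand-ins for the modules, only to show the data flow.
device = AbnormalSceneDeterminer(
    acquire=lambda: [3, 0, 5],
    match=lambda data: [v for v in data if v],
    cluster=lambda data: {0: data},
    determine=lambda groups: sorted(groups),
)
print(device.run())  # [0]
```

Keeping each module behind a callable makes it straightforward to swap in any of the clustering variants of the other embodiments without changing the rest of the pipeline.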
In one embodiment, the data clustering module 306 is specifically configured to traverse the matched telecommunication service index data and calculate the similarity between any two types of telecommunication service index data; delete, from the matched telecommunication service index data, one type of telecommunication service index data in each pair of types whose similarity is greater than the similarity threshold, to obtain first telecommunication service index data; and perform clustering calculation on the first telecommunication service index data to determine the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
In one embodiment, the data clustering module 306 is specifically configured to determine a discrete value of each type of telecommunication service index data in the matched telecommunication service index data; delete the telecommunication service index data whose discrete value is smaller than the discrete value threshold from the matched telecommunication service index data, to obtain second telecommunication service index data; and perform clustering calculation on the second telecommunication service index data to determine the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
In one embodiment, the data clustering module 306 is specifically configured to determine a first quartile and a third quartile in the matched telecommunication service indicator data by a quartile method; determining a lower edge and an upper edge in the matched telecommunication service index data according to the first quartile and the third quartile; in the matched telecommunication service index data, replacing the telecommunication service index data smaller than the lower edge with the minimum number in a main body index data interval, and replacing the telecommunication service index data larger than the upper edge with the maximum number in the main body index data interval to obtain third telecommunication service index data, wherein the main body index data interval is formed by the telecommunication service index data positioned between the lower edge and the upper edge; and clustering the third telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
In an embodiment, the data clustering module 306 is specifically configured to perform normalization processing on the third telecommunication service index data to obtain normalized telecommunication service index data; and clustering the normalized telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
In an embodiment, the data clustering module 306 is specifically configured to select a plurality of candidate clustering numbers through a K-means++ algorithm, and perform clustering calculation on the matched telecommunication service index data according to the plurality of candidate clustering numbers, to obtain clustered telecommunication service index data and clustered evaluation scores included in each group corresponding to the plurality of candidate clustering numbers; and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number according to the candidate grouping number, the clustered telecommunication service index data contained in each group corresponding to the candidate grouping number and the clustered evaluation score.
In an embodiment, the data matching module 304 is specifically configured to select, from the initial telecommunication service index data, telecommunication service index data required by a telecommunication service scene, and perform digital processing on the required telecommunication service index data according to a weight corresponding to the required telecommunication service index data associated with the telecommunication service scene, so as to obtain telecommunication service index data matched with the telecommunication service scene.
For specific limitations of the device for determining an abnormal telecommunication service scene, reference may be made to the above limitations of the method for determining an abnormal telecommunication service scene, and details are not described herein again. The modules in the device for determining the abnormal telecommunication service scene may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of determining an abnormal telecommunication service scenario.
It will be appreciated by those skilled in the art that the configuration shown in fig. 7 is a block diagram of only a portion of the configuration associated with the present application and does not limit the computer device to which the present application may be applied; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring initial telecommunication service index data;
selecting telecommunication service index data matched with a telecommunication service scene from the initial telecommunication service index data;
clustering the matched telecommunication service index data, and determining clustered telecommunication service index data contained in each group corresponding to the target grouping number;
extracting the index features of the clustered telecommunication service index data contained in each group, inputting the index features into a preset abnormal telecommunication service scene determination model, and determining an abnormal telecommunication service scene, wherein the abnormal telecommunication service scene determination model is obtained by training according to the telecommunication service index data samples of the abnormal telecommunication service scene.
In the computer equipment, firstly, telecommunication service index data matched with a telecommunication service scene is selected from initial telecommunication service index data, then the matched telecommunication service index data is subjected to clustering calculation, clustered telecommunication service index data contained in each group corresponding to the target grouping number is determined, then index features of the clustered telecommunication service index data contained in each group are extracted, the index features are input into a preset abnormal telecommunication service scene determination model, and the abnormal telecommunication service scene is determined. It can be understood that the computer device determines the abnormal telecommunication service scene by performing data matching, clustering and feature extraction on the telecommunication service index data and finally inputting the data into the abnormal telecommunication service scene determination model, thereby realizing the effect of automatically determining the abnormal telecommunication service scene according to the telecommunication service index data, improving the auditing efficiency of the telecommunication service data, and being capable of saving unnecessary investment and improving the average income of each user.
In one embodiment, the processor, when executing the computer program, further performs the steps of: traversing the matched telecommunication service index data and calculating the similarity between any two types of telecommunication service index data; deleting, from the matched telecommunication service index data, one type of telecommunication service index data in each pair of types whose similarity is greater than the similarity threshold, to obtain first telecommunication service index data; and clustering the first telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
In one embodiment, the processor, when executing the computer program, further performs the steps of: determining a discrete value of each type of telecommunication service index data in the matched telecommunication service index data; deleting the telecommunication service index data with the discrete value smaller than the discrete value threshold value from the matched telecommunication service index data to obtain second telecommunication service index data; and clustering the second telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
In one embodiment, the processor, when executing the computer program, further performs the steps of: determining a first quartile and a third quartile in the matched telecommunication service index data by a quartile method; determining a lower edge and an upper edge in the matched telecommunication service index data according to the first quartile and the third quartile; in the matched telecommunication service index data, replacing the telecommunication service index data smaller than the lower edge with the minimum number in a main body index data interval, and replacing the telecommunication service index data larger than the upper edge with the maximum number in the main body index data interval to obtain third telecommunication service index data, wherein the main body index data interval consists of the telecommunication service index data positioned between the lower edge and the upper edge; and clustering the third telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
In one embodiment, the processor, when executing the computer program, further performs the steps of: normalizing the third telecommunication service index data to obtain normalized telecommunication service index data; and clustering the normalized telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
In one embodiment, the processor, when executing the computer program, further performs the steps of: selecting a plurality of candidate clustering numbers through a K-means++ algorithm, and performing clustering calculation on the matched telecommunication service index data according to the plurality of candidate clustering numbers to obtain clustered telecommunication service index data and clustered evaluation scores contained in each group corresponding to the plurality of candidate clustering numbers; and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number according to the candidate grouping number, the clustered telecommunication service index data contained in each group corresponding to the candidate grouping number and the clustered evaluation score.
In one embodiment, the processor, when executing the computer program, further performs the steps of: selecting telecommunication service index data required by a telecommunication service scene from the initial telecommunication service index data, and carrying out digital processing on the required telecommunication service index data according to the weight corresponding to the required telecommunication service index data associated with the telecommunication service scene to obtain the telecommunication service index data matched with the telecommunication service scene.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, performs the steps of:
acquiring initial telecommunication service index data;
selecting telecommunication service index data matched with a telecommunication service scene from the initial telecommunication service index data;
clustering the matched telecommunication service index data, and determining clustered telecommunication service index data contained in each group corresponding to the target grouping number;
extracting the index features of the clustered telecommunication service index data contained in each group, inputting the index features into a preset abnormal telecommunication service scene determination model, and determining an abnormal telecommunication service scene, wherein the abnormal telecommunication service scene determination model is obtained by training according to the telecommunication service index data samples of the abnormal telecommunication service scene.
In the computer-readable storage medium, telecommunication service index data matched with a telecommunication service scene is first selected from the initial telecommunication service index data. Clustering calculation is then performed on the matched telecommunication service index data to determine the clustered telecommunication service index data contained in each group corresponding to the target grouping number. Finally, the index features of the clustered telecommunication service index data contained in each group are extracted and input into a preset abnormal telecommunication service scene determination model to determine an abnormal telecommunication service scene. In other words, the computer-readable storage medium performs data matching, clustering and feature extraction on the telecommunication service index data and feeds the resulting features into the abnormal telecommunication service scene determination model. This allows abnormal telecommunication service scenes to be determined automatically from the telecommunication service index data, improves the efficiency of auditing telecommunication service data, avoids unnecessary investment, and can raise the average revenue per user.
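The last stage described above, extracting features per group and passing them to the determination model, might look like the following Python sketch. The feature set (group size and mean), the function names, and `toy_model` are all assumptions for illustration. In particular, `toy_model` is a fixed rule standing in for the trained model, whose form the text does not specify.

```python
from statistics import mean

def group_features(groups):
    """One feature row per group: its size and the mean of its values."""
    return {gid: {"size": len(vals), "mean": mean(vals)} for gid, vals in groups.items()}

def determine_abnormal_scenes(groups, model):
    """Feed each group's features to `model` and return the flagged group ids."""
    features = group_features(groups)
    return [gid for gid, row in features.items() if model(row)]

def toy_model(row):
    # Stand-in only: a real system would call the trained determination model.
    return row["mean"] > 100

# Hypothetical clustered index values, keyed by group id.
groups = {0: [20, 25, 30], 1: [150, 180], 2: [40, 45]}
abnormal = determine_abnormal_scenes(groups, toy_model)
```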
In one embodiment, the computer program when executed by the processor further performs the steps of: traversing the types of telecommunication service index data in the matched telecommunication service index data pairwise, and calculating the similarity between any two types of telecommunication service index data; for any two types of telecommunication service index data whose similarity is greater than a similarity threshold, deleting one of the two types from the matched telecommunication service index data, to obtain first telecommunication service index data; and clustering the first telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
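One possible reading of this step is sketched below in Python. The text does not say which similarity measure is used; Pearson correlation is assumed here, and the threshold of 0.9, the function names, and the index names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def drop_similar_indices(columns, threshold=0.9):
    """Keep an index type only if its similarity to every kept type is <= threshold.

    columns: dict mapping index-type name -> list of values (one per user).
    """
    kept = {}
    for name, values in columns.items():
        if all(pearson(values, other) <= threshold for other in kept.values()):
            kept[name] = values
    return kept

# Hypothetical matched index data: the second type duplicates the first.
matched = {
    "monthly_fee": [10.0, 20.0, 30.0, 40.0],
    "monthly_fee_with_tax": [11.0, 22.0, 33.0, 44.0],
    "data_usage_gb": [5.0, 1.0, 4.0, 2.0],
}
first_data = drop_similar_indices(matched)
```

Of each highly similar pair, this sketch keeps whichever type appears first; the text leaves open which of the two is deleted.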
In one embodiment, the computer program when executed by the processor further performs the steps of: determining a discrete value of each type of telecommunication service index data in the matched telecommunication service index data; deleting the telecommunication service index data with the discrete value smaller than the discrete value threshold value from the matched telecommunication service index data to obtain second telecommunication service index data; and clustering the second telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
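A short Python sketch of this filter follows. It assumes that the "discrete value" is a dispersion measure and uses the population variance for it; the text does not name the measure, and the threshold and index names are hypothetical.

```python
from statistics import pvariance

def drop_low_dispersion(columns, threshold):
    """Delete index types whose dispersion is smaller than `threshold`.

    columns: dict mapping index-type name -> list of values (one per user).
    """
    return {name: values for name, values in columns.items()
            if pvariance(values) >= threshold}

# Hypothetical matched index data: the first type never varies.
matched = {
    "roaming_flag": [0, 0, 0, 0],
    "call_minutes": [120, 30, 300, 45],
}
second_data = drop_low_dispersion(matched, threshold=0.01)
```

An index type that barely varies across users contributes little to separating groups, which is presumably why it is removed before clustering.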
In one embodiment, the computer program when executed by the processor further performs the steps of: determining a first quartile and a third quartile in the matched telecommunication service index data by a quartile method; determining a lower edge and an upper edge in the matched telecommunication service index data according to the first quartile and the third quartile; in the matched telecommunication service index data, replacing the telecommunication service index data smaller than the lower edge with the minimum number in a main body index data interval, and replacing the telecommunication service index data larger than the upper edge with the maximum number in the main body index data interval to obtain third telecommunication service index data, wherein the main body index data interval is formed by the telecommunication service index data positioned between the lower edge and the upper edge; and clustering the third telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
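The quartile-based replacement can be sketched in Python as below. Two details are assumptions, since the text does not give them: the edges are taken as the usual box-plot fences, Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, and the quartiles are computed with the "inclusive" method of `statistics.quantiles`.

```python
from statistics import quantiles

def clip_to_edges(values, whisker=1.5):
    """Replace values outside the lower/upper edge with the min/max of the main body."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_edge = q1 - whisker * iqr
    upper_edge = q3 + whisker * iqr
    # Main body interval: the values lying between the two edges.
    body = [v for v in values if lower_edge <= v <= upper_edge]
    body_min, body_max = min(body), max(body)
    return [body_min if v < lower_edge else body_max if v > upper_edge else v
            for v in values]

# Hypothetical values of one index type, with one low and one high outlier.
values = [1, 10, 11, 12, 13, 14, 15, 100]
third_data = clip_to_edges(values)
```

Here the outliers 1 and 100 are pulled in to 10 and 15, the smallest and largest values of the main body, rather than being dropped.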
In one embodiment, the computer program when executed by the processor further performs the steps of: normalizing the third telecommunication service index data to obtain normalized telecommunication service index data; and clustering the normalized telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
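The text does not say which normalization is applied. As one common choice, the Python sketch below assumes min-max scaling of each index type to the range 0 to 1, so that index types measured in different units contribute comparably to the distance used in clustering.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical values of one index type after outlier replacement.
normalized = min_max_normalize([10, 15, 20])
```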
In one embodiment, the computer program when executed by the processor further performs the steps of: selecting a plurality of candidate grouping numbers through the K-means++ algorithm, and performing clustering calculation on the matched telecommunication service index data according to each of the candidate grouping numbers, to obtain, for each candidate grouping number, the clustered telecommunication service index data contained in each group and a post-clustering evaluation score; and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number according to the candidate grouping numbers, the clustered telecommunication service index data contained in each group corresponding to each candidate grouping number, and the post-clustering evaluation scores.
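As a rough illustration of this step, the Python sketch below clusters toy data once per candidate grouping number, using K-means++ seeding followed by standard Lloyd iterations, and records an evaluation score for each candidate. The text does not say which evaluation score is used; the mean silhouette coefficient is an assumption here, as are all function names and the toy data.

```python
import random
from math import sqrt

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: draw each new centre with probability proportional
    to its squared distance from the nearest centre chosen so far."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(weights) == 0:  # every point already coincides with a centre
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres

def kmeans(points, k, rng, max_iter=100):
    """Lloyd iterations from a K-means++ start; returns one label per point."""
    centres = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centre
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient, assumed here as the evaluation score."""
    total = 0.0
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(sqrt(sq_dist(p, q)))
        own = dists.pop(labels[i], None)
        if own is None or not dists:
            continue  # singleton cluster or single cluster: contributes 0
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Three tight, well-separated toy groups of two-dimensional index vectors.
points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100),
          (200, 0), (200, 1), (201, 0)]
rng = random.Random(0)
results = {}
for k in (2, 3, 4):  # toy candidates; the claims use the range 5 to 50
    labels = kmeans(points, k, rng)
    results[k] = (labels, silhouette(points, labels))
```

`results[k]` then holds the group labels and the evaluation score for each candidate grouping number, which are the inputs from which the target grouping number is determined.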
In one embodiment, the computer program when executed by the processor further performs the steps of: selecting telecommunication service index data required by a telecommunication service scene from the initial telecommunication service index data, and carrying out digital processing on the required telecommunication service index data according to the weight corresponding to the required telecommunication service index data associated with the telecommunication service scene to obtain the telecommunication service index data matched with the telecommunication service scene.
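A small Python sketch of this matching step is given below. It assumes that "digital processing" means converting each required index to a number and scaling it by its scene-specific weight. The scene weights, category codes, and field names are all hypothetical.

```python
# Hypothetical weights of the index types required by one service scene.
SCENE_WEIGHTS = {"monthly_fee": 1.0, "plan_tier": 0.5}
# Hypothetical numeric codes for a categorical index type.
CATEGORY_CODES = {"plan_tier": {"basic": 1, "standard": 2, "premium": 3}}

def match_to_scene(record, weights=SCENE_WEIGHTS, codes=CATEGORY_CODES):
    """Select the required index types, convert them to numbers, apply weights."""
    matched = {}
    for name, weight in weights.items():
        raw = record[name]
        number = codes[name][raw] if name in codes else float(raw)
        matched[name] = number * weight
    return matched

# One hypothetical initial record; fields not required by the scene are ignored.
record = {"user_id": "u1", "monthly_fee": 58, "plan_tier": "premium", "region": "north"}
matched = match_to_scene(record)
```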
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described, but any combination should be considered to fall within the scope of this specification as long as the combined features do not contradict one another.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.
Claims (9)
1. A method for determining an abnormal telecommunication service scene, the method comprising:
acquiring initial telecommunication service index data;
selecting telecommunication service index data matched with a telecommunication service scene from the initial telecommunication service index data;
performing cluster calculation on normalized telecommunication service index data corresponding to the matched telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the number of target groups, wherein the normalized telecommunication service index data is obtained by performing normalization processing on the matched telecommunication service index data;
extracting the index features of the clustered telecommunication service index data contained in each group, inputting the index features into a preset abnormal telecommunication service scene determining model, and determining an abnormal telecommunication service scene, wherein the abnormal telecommunication service scene determining model is obtained by training according to a telecommunication service index data sample of the abnormal telecommunication service scene;
the clustering calculation of the matched telecommunication service index data and the determination of the clustered telecommunication service index data contained in each group corresponding to the target grouping number specifically comprise:
selecting a plurality of candidate grouping numbers through the K-means++ algorithm, and performing clustering calculation on the matched telecommunication service index data according to each of the candidate grouping numbers, to obtain, for each candidate grouping number, the clustered telecommunication service index data contained in each group and a post-clustering evaluation score; determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number according to the candidate grouping numbers, the clustered telecommunication service index data contained in each group corresponding to each candidate grouping number, and the post-clustering evaluation scores;
the number of the target groups is calculated by the following formula:
max((Val-min(valList))/(max(valList)-min(valList))*(1-0.618)+((k-5)/(50-5))*0.618);
wherein max represents the max function, which takes the maximum value; min represents the min function, which takes the minimum value; Val represents the evaluation score after clustering; valList represents the clustered telecommunication service index data contained in each group corresponding to the candidate grouping numbers; k represents the candidate grouping number; 5 to 50 represents the value range of the candidate grouping number; and 0.618 represents the golden section ratio, used to weight the k value.
2. The method of claim 1, wherein the initial telecommunication service index data comprises a plurality of types of telecommunication service index data; the clustering calculation of the matched telecommunication service index data to determine the clustered telecommunication service index data contained in each group corresponding to the target grouping number comprises the following steps:
in the matched telecommunication service index data, traversing the types of telecommunication service index data pairwise and calculating the similarity between any two types of telecommunication service index data;
deleting one type of telecommunication service index data in the two types of telecommunication service index data with the similarity greater than a similarity threshold value from the matched telecommunication service index data to obtain first telecommunication service index data;
and performing clustering calculation on the first telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
3. The method of claim 1, wherein the initial telecommunication service index data comprises a plurality of types of telecommunication service index data; the clustering calculation of the matched telecommunication service index data to determine the clustered telecommunication service index data contained in each group corresponding to the target grouping number comprises the following steps:
determining a discrete value for each type of telecommunication service index data in the matched telecommunication service index data;
deleting the telecommunication service index data of which the discrete value is smaller than the discrete value threshold value from the matched telecommunication service index data to obtain second telecommunication service index data;
and clustering the second telecommunication service index data, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
4. The method of claim 1, wherein the initial telecommunication service index data comprises a plurality of types of telecommunication service index data.
5. The method of claim 1, wherein the clustering calculation of the matched telecommunication service index data to determine the clustered telecommunication service index data contained in each group corresponding to the target grouping number comprises:
determining a first quartile and a third quartile in the matched telecommunication service index data by a quartile method;
determining a lower edge and an upper edge in the matched telecommunication service index data according to the first quartile and the third quartile;
in the matched telecommunication service index data, replacing the telecommunication service index data smaller than the lower edge with the minimum number in a main body index data interval, and replacing the telecommunication service index data larger than the upper edge with the maximum number in the main body index data interval to obtain third telecommunication service index data, wherein the main body index data interval is formed by the telecommunication service index data between the lower edge and the upper edge;
and performing clustering calculation on the telecommunication service index data obtained after the third telecommunication service index data is normalized, and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number.
6. The method of claim 1, wherein the selecting of the telecommunication service index data matched with the telecommunication service scene from the initial telecommunication service index data comprises:
selecting telecommunication service index data required by the telecommunication service scene from the initial telecommunication service index data, and carrying out digital processing on the required telecommunication service index data according to the weight corresponding to the required telecommunication service index data associated with the telecommunication service scene to obtain the telecommunication service index data matched with the telecommunication service scene.
7. An apparatus for determining an abnormal telecommunication service scene, the apparatus comprising:
the data acquisition module is used for acquiring initial telecommunication service index data;
the data matching module is used for selecting telecommunication service index data matched with a telecommunication service scene from the initial telecommunication service index data;
the data clustering module is used for clustering and calculating normalized telecommunication service index data corresponding to the matched telecommunication service index data and determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number, wherein the normalized telecommunication service index data are obtained by normalizing the matched telecommunication service index data;
the data clustering module is specifically used for selecting a plurality of candidate grouping numbers through the K-means++ algorithm, and performing clustering calculation on the matched telecommunication service index data according to each of the candidate grouping numbers, to obtain, for each candidate grouping number, the clustered telecommunication service index data contained in each group and a post-clustering evaluation score; determining the clustered telecommunication service index data contained in each group corresponding to the target grouping number according to the candidate grouping numbers, the clustered telecommunication service index data contained in each group corresponding to each candidate grouping number, and the post-clustering evaluation scores;
the scene determining module is used for extracting the index characteristics of the clustered telecommunication service index data contained in each group, inputting the index characteristics into a preset abnormal telecommunication service scene determining model and determining an abnormal telecommunication service scene, wherein the abnormal telecommunication service scene determining model is obtained by training according to a telecommunication service index data sample of the abnormal telecommunication service scene;
the target grouping number is calculated by the following formula:
max((Val-min(valList))/(max(valList)-min(valList))*(1-0.618)+((k-5)/(50-5))*0.618);
wherein max represents the max function, which takes the maximum value; min represents the min function, which takes the minimum value; Val represents the evaluation score after clustering; valList represents the clustered telecommunication service index data contained in each group corresponding to the candidate grouping numbers; k represents the candidate grouping number; 5 to 50 represents the value range of the candidate grouping number; and 0.618 represents the golden section ratio, used to weight the k value.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010854354.8A CN112001756B (en) | 2020-08-24 | 2020-08-24 | Method and device for determining abnormal telecommunication service scene and computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112001756A (en) | 2020-11-27 |
CN112001756B (en) | 2022-07-12 |
Family
ID=73473136
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010854354.8A Active CN112001756B (en) | 2020-08-24 | 2020-08-24 | Method and device for determining abnormal telecommunication service scene and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112001756B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112633638B (en) * | 2020-12-03 | 2022-07-08 | 北京道隆华尔软件股份有限公司 | Business risk assessment method and device, computer equipment and storage medium |
CN112783725B (en) * | 2021-01-26 | 2024-04-09 | 中国工商银行股份有限公司 | Index collection method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108809745A (en) * | 2017-05-02 | 2018-11-13 | 中国移动通信集团重庆有限公司 | A kind of user's anomaly detection method, apparatus and system |
CN110837874A (en) * | 2019-11-18 | 2020-02-25 | 上海新炬网络信息技术股份有限公司 | Service data abnormity detection method based on time series classification |
CN111371581A (en) * | 2018-12-26 | 2020-07-03 | 中国移动通信集团重庆有限公司 | Method, device, equipment and medium for detecting business abnormity of Internet of things card |
CN111445259A (en) * | 2018-12-27 | 2020-07-24 | 中国移动通信集团辽宁有限公司 | Method, device, equipment and medium for determining business fraud behaviors |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11277420B2 (en) * | 2017-02-24 | 2022-03-15 | Ciena Corporation | Systems and methods to detect abnormal behavior in networks |
Also Published As
Publication number | Publication date |
---|---|
CN112001756A (en) | 2020-11-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110363387A (en) | Portrait analysis method, device, computer equipment and storage medium based on big data | |
WO2019218699A1 (en) | Fraud transaction determining method and apparatus, computer device, and storage medium | |
CN112001756B (en) | Method and device for determining abnormal telecommunication service scene and computer equipment | |
CN112633962B (en) | Service recommendation method and device, computer equipment and storage medium | |
CN110503566B (en) | Wind control model building method and device, computer equipment and storage medium | |
CN111915156B (en) | Service pushing method based on user value, electronic equipment and storage medium | |
CN114125154B (en) | Outbound policy parameter adjusting method and device, computer equipment and storage medium | |
CN111797320A (en) | Data processing method, device, equipment and storage medium | |
CN111652661B (en) | Mobile phone client user loss early warning processing method | |
CN111177217A (en) | Data preprocessing method and device, computer equipment and storage medium | |
CN112884569A (en) | Credit assessment model training method, device and equipment | |
CN112785420A (en) | Credit scoring model training method and device, electronic equipment and storage medium | |
CN110348215B (en) | Abnormal object identification method, abnormal object identification device, electronic equipment and medium | |
CN115357764A (en) | Abnormal data detection method and device | |
CN113448955B (en) | Data set quality evaluation method and device, computer equipment and storage medium | |
CN112348685A (en) | Credit scoring method, device, equipment and storage medium | |
CN112905987B (en) | Account identification method, device, server and storage medium | |
CN114997879B (en) | Payment routing method, device, equipment and storage medium | |
CN114697127B (en) | Service session risk processing method based on cloud computing and server | |
CN113052422A (en) | Wind control model training method and user credit evaluation method | |
CN113313386B (en) | Intelligent voice investigation system and investigation method for automobile financial risk | |
CN115660730A (en) | Loss user analysis method and system based on classification algorithm | |
CN113239236B (en) | Video processing method and device, electronic equipment and storage medium | |
CN116012123B (en) | Wind control rule engine method and system based on Rete algorithm | |
CN113139447A (en) | Feature analysis method, feature analysis device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||