CN113163057B - Method for constructing dynamic identification interval of fraud telephone - Google Patents

Publication number
CN113163057B
Authority
CN
China
Prior art keywords
sample
data
fraud
model
telephone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110073654.7A
Other languages
Chinese (zh)
Other versions
CN113163057A (en)
Inventor
林绍福
常晴晴
刘希亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110073654.7A
Publication of CN113163057A
Application granted
Publication of CN113163057B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2281 Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Technology Law (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method for constructing a dynamic identification interval for fraud telephone calls. The method combines hyperparameter optimization with a gradient boosting machine to construct the fraud call identification model: the parameters of the gradient boosting machine are tuned by a hyperparameter optimization algorithm, which improves the identification performance of the model. A random forest algorithm is used to select data features, and the dimensions whose feature importance is greater than 0.8 are used to construct the fraud call feature vector. The data are resampled with a hybrid sampling method that combines undersampling and oversampling, which alleviates the imbalance of the data distribution and has been verified as feasible by experiment. The invention also provides a parameterization method based on a probabilistic prediction model: the probability output by the classifier is taken as the confidence of a sample, and the dynamic identification interval for fraud calls is constructed from the sample confidences output by the model.

Description

Method for constructing dynamic identification interval of fraud telephone
Technical Field
The invention relates to the fields of internet communication and artificial intelligence, and in particular to a method for constructing a dynamic identification interval for fraud telephone calls, which can be applied to telecommunication anti-fraud.
Background
Fraud calls seriously disturb the normal order of communication, impair citizens' freedom of communication, and interfere with people's normal work and life; they are a serious problem in today's society. Effectively identifying and intercepting fraud calls plays an important role in telecommunication anti-fraud mechanisms and has attracted extensive attention from academia, industry, and government-funded institutions.
In the related art, crowdsourcing is a common way to identify fraud calls, but it is costly and inefficient. With the rapid development of artificial intelligence, machine learning methods have also been used to construct fraud call identification models. However, most researchers evaluate model quality only by the accuracy output by the model. For a typically imbalanced data set such as fraud call detail records, model identification is strongly biased, and accuracy cannot faithfully reflect the identification performance of the model. The invention therefore provides a dynamic identification interval for fraud calls, based on a machine learning algorithm evaluated with multiple indexes.
Disclosure of Invention
The invention aims to provide a method for constructing a dynamic identification interval for fraud calls, in order to solve the problem of low fraud call identification accuracy in telecommunication anti-fraud scenarios. A telecommunication operator can use the model to identify fraud calls and take corresponding control measures, thereby reducing user losses and improving user experience. The method takes user call detail record (CDR) log data as the model input. Through model analysis and judgment, it outputs for each record the confidence that the record belongs to a fraud call, and it decides whether a sample is a suspicious or fraud call according to this confidence and the lower and upper thresholds of the dynamic interval. This provides an important reference for operators in analyzing and managing users.
A method for constructing a dynamic identification interval for fraud calls, characterized by comprising the following steps:
step 1: extract features from the call detail record (CDR) data of fraud call users based on a random forest;
step 2: rebalance the data processed in step 1 with a hybrid sampling method, so as to reduce the influence of the imbalanced data distribution on the model;
step 3: construct a fraud call identification model according to the characteristics of the CDR data of fraud call users, and measure its identification performance with multiple evaluation indexes;
step 4: use the fraud call identification model from step 3 to judge the probability that a data sample is a fraud call, and construct the dynamic identification interval for fraud calls.
1. The random-forest-based feature extraction method for the call detail record (CDR) data of fraud call users computes the information gain of each feature dimension in the data set, constructs the node splits of each tree according to the information gain, and finally computes a score for each dimension. The original CDR data are used as input, VIM denotes the variable importance measure, and GI denotes the Gini index.
The training data set S with n instances is defined as:
$$S = \{s_i\}, \quad i = 1, 2, \ldots, n \qquad (1)$$
where $s_i$ denotes any sample point in the sample set, which contains n sample points; $s_i$ is defined in formula (2).
$$s_i = (x_i, y_i), \quad i = 1, 2, \ldots, n \qquad (2)$$
where $x_i = \{v_1, v_2, \ldots, v_w\}$ is an instance, $v_j$ denotes a feature of $x_i$, and $y_i$ denotes the label of $x_i$. The data are divided into normal-user call detail records and fraud-user call detail records, i.e. the number of classes is C = 2.
The data dimensions used in the invention are the following fields: desensitized mobile phone number $v_1$, called mobile phone number $v_2$, call frequency $v_3$, ratio of successful connections $v_4$, average call duration $v_5$, average ringing duration $v_6$, call type $v_7$, calling time $v_8$, call duration $v_9$, ratio of hung-up calls $v_{10}$, mobile phone status $v_{11}$, and talk time $v_{12}$. Therefore w = 12 in the invention.
The Gini index GI is defined as:
$$GI_m = \sum_{k=1}^{K} \sum_{k' \neq k} p_{mk}\, p_{mk'} = 1 - \sum_{k=1}^{K} p_{mk}^2 \qquad (3)$$
where K denotes the number of classes, $p_{mk}$ denotes the proportion of class k in node m, and $p_{mk'}$ denotes the proportion in node m of a class other than k.
The variable importance VIM of a feature at node m is defined as:
$$VIM_m = GI_m - GI_{left} - GI_{right} \qquad (4)$$
where $GI_{left}$ and $GI_{right}$ denote the Gini indexes of the new left and right branch nodes of node m.
Finally, the importance measures of all variables are normalized. For any fraud call feature $v_i$ with importance $VIM_i$, the normalized importance is computed as shown in formula (5).
$$VIM_i^{norm} = \frac{VIM_i}{\sum_{j=1}^{12} VIM_j} \qquad (5)$$
where $\sum VIM$ denotes the sum of the feature importances of the 12 features in the invention. The features are sorted by importance score, and the first 9 features, whose scores are greater than 0.8, are selected to construct the feature vector of the data, yielding a new fraud-user CDR data set for subsequent experiments.
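As an illustration of the Gini index and importance definitions in this step, the following minimal Python sketch computes the Gini index of a node, the importance contributed by a single split as GI of the node minus GI of its left and right children, and the normalization over all features. The function names and the toy labels are assumptions made for this example and do not come from the patent; a full random forest would also accumulate the per-split values over every node and tree in which a feature is used, and many libraries additionally weight each child by its share of samples.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of one node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_importance(parent, left, right):
    """Importance of one split: GI(parent) - GI(left) - GI(right).

    This follows the unweighted form described in the text; it is a
    sketch, not the patent's actual implementation.
    """
    return gini_index(parent) - gini_index(left) - gini_index(right)


def normalize(importances):
    """Divide each feature importance by the sum over all features."""
    total = sum(importances.values())
    return {name: value / total for name, value in importances.items()}


# Toy node with two normal (0) and two fraud (1) records, split perfectly.
parent = [0, 0, 1, 1]
print(gini_index(parent))                        # impurity before the split
print(split_importance(parent, [0, 0], [1, 1]))  # impurity removed by the split
print(normalize({"v3": 3.0, "v7": 1.0}))         # hypothetical raw importances
```

Because the normalized values sum to 1, any selection threshold has to be applied on the same scale as the scores being compared.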
2. Because user call detail record (CDR) data are typically imbalanced, the invention resamples the data with a hybrid sampling method, taking the data processed in step 1 as input. A sampling ratio r is set according to the imbalance ratio between normal call samples and fraud call samples. Let the number of normal call samples be p and the number of fraud call samples be q; then
Figure GDA0003088841390000033
A sample point $s_i$ is selected, the Euclidean distance from $s_i$ to the nearby minority-class sample points is computed, and its r nearest neighbors are obtained. For each minority-class fraud call sample $s_c$, several samples
Figure GDA0003088841390000034
are randomly selected from its r nearest neighbors, where r ∈ {1, 2, 3, ..., a} and
Figure GDA0003088841390000035
denotes all sample points around $s_c$ other than $s_c$ itself. Each selected neighbor sample
Figure GDA0003088841390000036
is combined with the original sample according to $s_{new} = s_c + \mathrm{rand}(0,1) \times (s_c' - s_c)$ to synthesize a new sample $s_{new}$, where rand(0,1) is a function that generates a random number between 0 and 1 and $s_c'$ denotes each randomly selected neighbor sample. The newly synthesized samples $s_{new}$ are added to the original data set to form a new sample set. The invention uses 107,935 normal call records and 8,448 fraud call records, 116,383 in total; after processing by this method there are 107,007 normal call records and 104,059 fraud call records, 211,066 in total.
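The oversampling half of this step can be sketched as follows. The code finds the r nearest minority-class neighbors of each fraud sample by Euclidean distance and interpolates new points with s_new = s_c + rand(0,1) × (s_c' − s_c). The function names, the `per_sample` parameter, and the toy points are assumptions for illustration only; the undersampling of normal calls and the exact definition of r are not shown because the source does not spell them out.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(sample, minority, r):
    """Return the r minority samples closest to `sample`, excluding itself."""
    others = [s for s in minority if s is not sample]
    return sorted(others, key=lambda s: euclidean(sample, s))[:r]


def synthesize(sample, neighbor, rng):
    """s_new = s_c + rand(0,1) * (s_c' - s_c), applied per feature."""
    gap = rng.random()
    return [x + gap * (y - x) for x, y in zip(sample, neighbor)]


def oversample(minority, r, per_sample, rng=None):
    """Create `per_sample` synthetic points for every minority sample.

    `per_sample` is a hypothetical knob standing in for the patent's
    sampling ratio.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for s_c in minority:
        neighbors = nearest_neighbors(s_c, minority, r)
        for _ in range(per_sample):
            synthetic.append(synthesize(s_c, rng.choice(neighbors), rng))
    return synthetic


fraud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
new_samples = oversample(fraud, r=2, per_sample=3)
print(len(new_samples))  # number of synthetic fraud samples created
```

Every synthetic point lies on the line segment between a fraud sample and one of its neighbors, so the new samples stay inside the region already occupied by the minority class.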
3. According to the characteristics of the call detail record (CDR) data of fraud call users, the invention builds the fraud call identification model on a boosting tree algorithm that uses gradient-based one-side sampling and exclusive feature bundling. A random-forest-based hyperparameter optimization algorithm tunes the parameters of the gradient boosting machine, and the model performance is judged with several indexes: precision, recall, F1 value, and AUC value.
Figure GDA0003088841390000037
Figure GDA0003088841390000041
Here true positive (TP) is the number of fraud calls predicted as fraud calls, false positive (FP) is the number of normal calls predicted as fraud calls, false negative (FN) is the number of fraud calls predicted as normal calls, and true negative (TN) is the number of normal calls predicted as normal calls.
Precision is the proportion of the samples predicted as fraud calls that are actually fraud calls, as shown in formula (6).
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (6)$$
Recall is the proportion of the samples that are actually fraud calls that are predicted as fraud calls, as shown in formula (7).
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (7)$$
F1 (short for F-measure) is an evaluation index that takes the harmonic mean of precision and recall, as shown in formula (8).
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (8)$$
AUC is the area under the ROC curve. The ROC curve is drawn from the predictions of the algorithm, plotting the proportion of actually normal calls that are predicted as fraud calls against the proportion of actually fraud calls that are predicted as fraud calls. The formula is shown in (9), where $S_{min}$ denotes the number of fraud calls, $S_{maj}$ denotes the number of normal calls, $rank_i$ denotes the rank of the i-th sample, and $\sum_{i \in \mathrm{fraud}} rank_i$ denotes the sum of the ranks of the fraud call samples.
$$AUC = \frac{\sum_{i \in \mathrm{fraud}} rank_i - \frac{S_{min}(S_{min}+1)}{2}}{S_{min} \times S_{maj}} \qquad (9)$$
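The four evaluation indexes can be computed directly from labels and scores. The sketch below assumes label 1 for fraud calls and 0 for normal calls, and it uses the common rank-based form of AUC, in which the ranks of the fraud samples (sorted by ascending score) are summed; tied scores share their average rank. The function names and the toy data are hypothetical and chosen only for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) with 1 = fraud call and 0 = normal call."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and their harmonic mean F1."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def auc_by_rank(y_true, scores):
    """Rank-based AUC: (sum of fraud ranks - S_min(S_min+1)/2) / (S_min*S_maj)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based rank shared by tied scores
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    s_min = sum(y_true)
    s_maj = len(y_true) - s_min
    fraud_rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (fraud_rank_sum - s_min * (s_min + 1) / 2) / (s_min * s_maj)


labels = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
confidences = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
print(precision_recall_f1(labels, predicted))
print(auc_by_rank(labels, confidences))
```

In the toy data, eight of the nine fraud/normal pairs are ordered correctly by the confidences, which is exactly what the rank formula counts.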
4. The fraud call identification model from step 3 outputs a confidence for each sample, from which the probability that the sample is a fraud call can be judged. Fraud call discrimination thresholds are set according to the confidence that a sample is a fraud call and the sample's true label, and the dynamic identification interval for fraud calls is constructed. The workflow of the dynamic identification interval model is as follows:
step 4.1: prepare the data obtained after steps 1 and 2: 107,007 normal call records and 104,059 fraud call records, 211,066 in total;
step 4.2: randomly divide the data from step 4.1 into 10 parts, using 8 parts to train the model and 2 parts to test it;
step 4.3: keep optimizing the model with a random-forest-based hyperparameter optimization algorithm until the evaluation indexes of the model on the training set and the test set (precision, recall, F1 value, and AUC value) are all greater than 0.9;
step 4.4: use the model trained in steps 4.2 and 4.3 to output the confidence y of each training sample;
step 4.5: draw a sample scatter plot, analyze the agreement and disagreement between each sample's confidence and its true label, and obtain the dynamic identification interval 0 ≤ α < β ≤ 1, where α = 0.2 and β = 0.8. When the model output satisfies 0 ≤ y ≤ α, the sample is a normal call; when α < y < β, the sample is a suspicious call; when β < y ≤ 1, the sample is a fraud call;
step 4.6: test and verify the model with the remaining 2 parts held out as the test set in step 4.2;
step 4.7: end.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 shows a system block diagram of the present invention;
FIG. 2 is a graph showing a portion of the test results of the present invention;
FIG. 3 illustrates a partial sample distribution statistical plot of the present invention;
FIG. 4 shows a partial sample distribution density histogram of the invention;
Detailed Description
For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
To implement the fraud call identification system, the technical scheme adopted by the invention is a method for constructing a dynamic identification interval for fraud calls. The overall system structure is shown in FIG. 1, and the method is divided into five steps:
(1) Data preprocessing: the invention takes user call detail record (CDR) log data as input and processes them, including missing-value handling, outlier handling, format unification, and duplicate removal. Then, to reduce the influence of differing feature scales on the subsequent model, the data are standardized, and finally a preprocessed data set is output.
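A minimal sketch of this preprocessing step is given below, using only the Python standard library. It removes exact duplicate rows, fills missing values, and standardizes every column to zero mean and unit variance. Filling missing values with the column mean and the z-score form of standardization are assumptions chosen for illustration, since the text does not name specific techniques; outlier handling and format unification are omitted.

```python
import statistics


def preprocess(rows):
    """Deduplicate rows, fill missing values (None), and z-score each column."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:  # keep only the first copy of a repeated row
            seen.add(key)
            unique.append(list(row))

    for col in range(len(unique[0])):
        column = [r[col] for r in unique]
        present = [v for v in column if v is not None]
        fill = statistics.fmean(present)  # assumed strategy: column mean
        filled = [fill if v is None else v for v in column]
        mean = statistics.fmean(filled)
        std = statistics.pstdev(filled)
        for r, value in zip(unique, filled):
            r[col] = (value - mean) / std if std > 0 else 0.0
    return unique


# Hypothetical records: [call frequency, average call duration in seconds]
records = [[1.0, 10.0], [1.0, 10.0], [3.0, None], [5.0, 30.0]]
for row in preprocess(records):
    print(row)
```

Standardizing after deduplication keeps repeated records from distorting the column mean and variance.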
(2) Feature extraction: the random-forest-based feature extraction method for fraud-user call detail record (CDR) data computes the information gain of each feature dimension in the data set, constructs the node splits of each tree according to the information gain, and finally computes a score for each dimension. The features are sorted by score, and the first 9 features, whose scores are greater than 0.8, are selected to construct the feature vector of the data.
(3) Rebalancing the imbalanced data: the data processed in steps (1) and (2) are used as input. A sampling ratio r is set according to the imbalance ratio between normal call samples and fraud call samples. Let the number of normal call samples be p and the number of fraud call samples be q; then
Figure GDA0003088841390000061
A sample point $s_i$ is selected, the Euclidean distance from $s_i$ to the nearby minority-class sample points is computed, and its r nearest neighbors are obtained. For each minority-class fraud call sample $s_c$, several samples
Figure GDA0003088841390000062
are randomly selected from its r nearest neighbors, where r ∈ {1, 2, 3, ..., a} and
Figure GDA0003088841390000063
denotes all sample points around $s_c$ other than $s_c$ itself. Each selected neighbor sample
Figure GDA0003088841390000064
is combined with the original sample according to $s_{new} = s_c + \mathrm{rand}(0,1) \times (s_c' - s_c)$ to synthesize a new sample $s_{new}$, where rand(0,1) is a function that generates a random number between 0 and 1 and $s_c'$ denotes each randomly selected neighbor sample. The newly synthesized samples $s_{new}$ are added to the original data set to form a new sample set. The invention uses 107,935 normal call records and 8,448 fraud call records, 116,383 in total; after processing by this method there are 107,007 normal call records and 104,059 fraud call records, 211,066 in total.
(4) Building the fraud call identification model: the data from the previous step are randomly divided into 10 parts, and 8 parts are randomly taken as the training set and used as input to train the constructed boosting tree model with gradient-based one-side sampling and exclusive feature bundling. The model outputs the precision, recall, F1 value, and AUC value of sample identification. The model is continuously optimized with a random-forest-based hyperparameter optimization algorithm and tested with the remaining 2 parts of the data, until the precision, recall, F1 value, and AUC value of the model on both the training set and the test set are all greater than 0.9.
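The data split and the stopping criterion of this step can be sketched as follows. The code shuffles the samples, cuts them into 10 parts, keeps 8 for training and 2 for testing, and checks whether every evaluation index exceeds 0.9. The function names, the fixed seed, and the metric values are hypothetical; the training of the boosting tree model and the hyperparameter search themselves are not shown.

```python
import random


def split_ten_parts(samples, train_parts=8, seed=0):
    """Shuffle, cut into 10 parts, and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    parts = [shuffled[i::10] for i in range(10)]
    train = [s for part in parts[:train_parts] for s in part]
    test = [s for part in parts[train_parts:] for s in part]
    return train, test


def meets_target(metrics, threshold=0.9):
    """True only if every evaluation index is strictly above the threshold."""
    return all(value > threshold for value in metrics.values())


train, test = split_ten_parts(range(100))
print(len(train), len(test))

# Hypothetical metric values for one optimization round.
round_metrics = {"precision": 0.95, "recall": 0.91, "f1": 0.93, "auc": 0.97}
print(meets_target(round_metrics))
```

In an optimization loop, `meets_target` would be evaluated on both the training-set and test-set metrics after each hyperparameter trial, and the search would stop once both checks pass.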
(5) Building the dynamic identification interval for fraud calls: using the fraud call identification model constructed in the previous step, the new data set formed in step (3) is divided into 10 parts, of which 8 parts are randomly taken as the training set and 2 parts as the test set. The training set is first used as input; the model outputs the sample confidences, and the dynamic identification interval 0 ≤ α < β ≤ 1 is constructed from them, where α = 0.2 and β = 0.8. When the model output satisfies 0 ≤ y ≤ α, the sample is a normal call; when α < y < β, the sample is a suspicious call; when β < y ≤ 1, the sample is a fraud call. The test set is then used as the model input, and the output judgment of whether each sample is a fraud call is compared with the sample's true label. The experiment tests the method on the fraud call detail record data published by Liuming et al., and partial test results are shown in FIG. 2. A statistical plot of part of the sample distribution is shown in FIG. 3, and a density histogram of part of the sample distribution is shown in FIG. 4. The sample distribution plot shows that the model's confidence for normal call records is mostly below 0.2, the confidence for suspicious calls lies between 0.2 and 0.8, and the confidence for fraud call records is mostly above 0.8. Experiments on this data set thus verify the reasonableness and feasibility of the proposed dynamic identification interval with α = 0.2 and β = 0.8.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (3)

1. A method for constructing a dynamic identification interval for fraud calls, characterized by comprising the following steps:
step 1: performing feature extraction on the call detail record (CDR) data of fraud call users based on a random forest;
step 2: rebalancing the data processed in step 1 with a hybrid sampling method, so as to reduce the influence of the imbalanced data distribution on the model;
step 3: constructing a fraud call identification model according to the characteristics of the CDR data of fraud call users, and measuring the identification performance of the model with multiple evaluation indexes;
step 4: based on step 3, judging the probability that a data sample is a fraud call with the fraud call identification model, and constructing the dynamic identification interval for fraud calls;
wherein step 1 specifically comprises: computing the information gain of each feature dimension in the data set, constructing the node splits of each tree according to the information gain, and finally computing a score for each dimension; the original call detail record (CDR) data of fraud call users are used as input, VIM denotes the variable importance measure, and GI denotes the Gini index;
the training data set S with n instances is defined as:
$$S = \{s_i\}, \quad i = 1, 2, \ldots, n \qquad (1)$$
where $s_i$ denotes any sample point in the sample set, which contains n sample points, and $s_i$ is defined in formula (2);
$$s_i = (x_i, y_i), \quad i = 1, 2, \ldots, n \qquad (2)$$
where $x_i = \{v_1, v_2, \ldots, v_w\}$ is an instance, $v_j$ denotes a feature of $x_i$, and $y_i$ denotes the label of $x_i$; the data are divided into normal-user call detail records and fraud-user call detail records, i.e. the number of classes is C = 2;
the data dimensions used are the following fields: desensitized mobile phone number $v_1$, called mobile phone number $v_2$, call frequency $v_3$, ratio of successful connections $v_4$, average call duration $v_5$, average ringing duration $v_6$, call type $v_7$, calling time $v_8$, call duration $v_9$, ratio of hung-up calls $v_{10}$, mobile phone status $v_{11}$, and talk time $v_{12}$; i.e. w = 12;
the Gini index GI is defined as:
$$GI_m = \sum_{k=1}^{K} \sum_{k' \neq k} p_{mk}\, p_{mk'} = 1 - \sum_{k=1}^{K} p_{mk}^2 \qquad (3)$$
where K denotes the number of classes, $p_{mk}$ denotes the proportion of class k in node m, and $p_{mk'}$ denotes the proportion in node m of a class other than k;
the feature importance VIM of a feature at node m is defined as:
$$VIM_m = GI_m - GI_{left} - GI_{right} \qquad (4)$$
where $GI_{left}$ and $GI_{right}$ denote the Gini indexes of the left and right branch nodes of node m, respectively;
finally, all feature importance measures are normalized; for any fraud call feature $v_i$ with feature importance $VIM_i$, the normalized importance is computed as shown in formula (5);
$$VIM_i^{norm} = \frac{VIM_i}{\sum_{j=1}^{12} VIM_j} \qquad (5)$$
where $\sum VIM$ denotes the sum of the feature importances of the 12 features; the features are sorted by feature importance, and the first 9 features, whose scores are greater than 0.8, are selected to construct the feature vector of the data, yielding a new fraud-user CDR data set for subsequent experiments.
2. The method according to claim 1, characterized in that the data are rebalanced with a hybrid sampling method, specifically: a sampling ratio r is set according to the imbalance ratio between normal call samples and fraud call samples, where, if the number of normal call samples is p and the number of fraud call samples is q, then
Figure FDA0003807435700000023
a sample point $s_i$ is selected, the Euclidean distance from $s_i$ to the nearby minority-class sample points is computed, and its r nearest neighbors are obtained; for each minority-class fraud call sample $s_c$, several samples
Figure FDA0003807435700000024
are randomly selected from its r nearest neighbors, where r ∈ {1, 2, 3, ..., a} and
Figure FDA0003807435700000025
denotes all sample points around $s_c$ other than $s_c$ itself; each selected neighbor sample
Figure FDA0003807435700000026
is combined with the original sample according to $s_{new} = s_c + \mathrm{rand}(0,1) \times (s_c' - s_c)$ to synthesize a new sample $s_{new}$, where rand(0,1) is a function that generates a random number between 0 and 1 and $s_c'$ denotes each randomly selected neighbor sample; the newly synthesized samples $s_{new}$ are added to the original data set to form a new sample set.
3. The method according to claim 1, wherein step 4 specifically comprises:
step 4.1: inputting the new sample set;
step 4.2: randomly dividing the data obtained in step 4.1, one part being used to train the model and the other part being used to test the model;
step 4.3: continuously optimizing the model with a random-forest-based hyperparameter optimization algorithm until the precision, recall, F1 value, and AUC value of the model on the training set and the test set are all greater than 0.9, wherein precision is the proportion of the samples predicted as fraud calls that are actually fraud calls, and recall is the proportion of the samples that are actually fraud calls that are predicted as fraud calls; F1 (short for F-measure) is an evaluation index that takes the harmonic mean of precision and recall; and AUC is the area under the ROC curve, the ROC curve being drawn from the predictions of the algorithm by plotting the proportion of actually normal calls that are predicted as fraud calls against the proportion of actually fraud calls that are predicted as fraud calls;
step 4.4: outputting the confidence y of each training sample with the model trained in steps 4.2 and 4.3;
step 4.5: drawing a sample scatter plot, analyzing the agreement and disagreement between each sample's confidence and its true label, and obtaining the dynamic identification interval 0 ≤ α < β ≤ 1, where α = 0.2 and β = 0.8; when the model output satisfies 0 ≤ y ≤ α, the sample is a normal call; when the model output satisfies α < y < β, the sample is a suspicious call; when the model output satisfies β < y ≤ 1, the sample is a fraud call.
CN202110073654.7A 2021-01-20 2021-01-20 Method for constructing dynamic identification interval of fraud telephone Active CN113163057B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110073654.7A CN113163057B (en) 2021-01-20 2021-01-20 Method for constructing dynamic identification interval of fraud telephone

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110073654.7A CN113163057B (en) 2021-01-20 2021-01-20 Method for constructing dynamic identification interval of fraud telephone

Publications (2)

Publication Number Publication Date
CN113163057A CN113163057A (en) 2021-07-23
CN113163057B true CN113163057B (en) 2022-09-30

Family

ID=76878732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110073654.7A Active CN113163057B (en) 2021-01-20 2021-01-20 Method for constructing dynamic identification interval of fraud telephone

Country Status (1)

Country Link
CN (1) CN113163057B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022464A (en) * 2022-05-06 2022-09-06 中国联合网络通信集团有限公司 Number processing method, system, computing device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106686264A (en) * 2016-11-04 2017-05-17 国家计算机网络与信息安全管理中心 Method and system for fraud call screening and analyzing
CN107506776A (en) * 2017-01-16 2017-12-22 恒安嘉新(北京)科技股份公司 A kind of analysis method of fraudulent call number
CN108093405A (en) * 2017-11-06 2018-05-29 北京邮电大学 A kind of fraudulent call number analysis method and apparatus
US10045218B1 (en) * 2016-07-27 2018-08-07 Argyle Data, Inc. Anomaly detection in streaming telephone network data
CN109447180A (en) * 2018-11-14 2019-03-08 山东省通信管理局 A kind of fooled people's discovery method of the telecommunication fraud based on big data and machine learning
CN110147430A (en) * 2019-04-25 2019-08-20 上海欣方智能系统有限公司 Harassing call recognition methods and system based on random forests algorithm
CN110378364A (en) * 2019-05-29 2019-10-25 上海欣方智能系统有限公司 Ticket swindles method of model identification and system
CN110401780A (en) * 2018-04-25 2019-11-01 中国移动通信集团广东有限公司 A kind of method and device identifying fraudulent call

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
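基于随机森林算法的移动电话骚扰号码识别策略研究 (Research on a strategy for identifying mobile phone harassment numbers based on the random forest algorithm); 李家樑 (Li Jialiang); 《通信设计与应用》; 2019-08-31; 93-94 *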

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant