CN114205462A - Fraud telephone identification method, device, system and computer storage medium - Google Patents

Fraud telephone identification method, device, system and computer storage medium

Info

Publication number
CN114205462A
Authority
CN
China
Prior art keywords
sample data
detection model
data set
model
training sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111526088.7A
Other languages
Chinese (zh)
Inventor
王晨
包森成
余娜
徐强
王健
葛胜利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202111526088.7A priority Critical patent/CN114205462A/en
Publication of CN114205462A publication Critical patent/CN114205462A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2281 Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Security & Cryptography (AREA)
  • Technology Law (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a fraud telephone identification method, device, system and computer storage medium. The method comprises the following steps: acquiring a training sample data set and a test sample data set in a current scene; performing multi-dimensional feature extraction on the training sample data set to obtain a plurality of first features, and training a model on the first features with a random forest algorithm to obtain a detection model; inputting the test sample data set into the detection model and performing parameter optimization on the detection model to obtain an updated detection model and a model prediction result; evaluating the model prediction result according to a plurality of evaluation indexes, and judging whether the detection model is feasible; when the detection model is feasible, performing multi-dimensional feature extraction on the number to be predicted and inputting the resulting plurality of second features into the updated detection model for prediction to obtain the probability P that the number to be predicted is abnormal; and comparing the probability P with a preset threshold value, and judging whether the number to be predicted is abnormal according to the comparison result. The method remains effective over a long period and has high accuracy.

Description

Fraud telephone identification method, device, system and computer storage medium
Technical Field
The invention relates to the technical field of network security, in particular to a fraud telephone identification method, device, system and computer storage medium.
Background
In the prior art, number card management for telecommunication fraud mainly relies on research and judgment based on list libraries and business rules. The first approach filters number cards through a blacklist/whitelist mechanism. Its effectiveness depends mainly on the list library, whose entries usually enter the system only after the event, so its shortcomings in both the timeliness of judgment and the comprehensiveness of capturing fraud-related cards are obvious. The second approach analyzes service data based on a historical blacklist and extracts strong business rules such as region attributes and frequency attributes. This rule-based judgment depends entirely on expert experience, so it suffers from problems such as difficult maintenance and unpredictable interception accuracy.
No effective solution has yet been proposed for the short timeliness and incompleteness of filtering number cards through a blacklist mechanism in the prior art, or for the low accuracy and difficult maintenance of judgment that relies on expert experience.
Disclosure of Invention
The embodiments of the invention provide a fraud telephone identification method, device, system and computer storage medium, which are used to solve the problems of short timeliness and incompleteness in the prior-art method of filtering number cards through a blacklist mechanism, and the problems of low accuracy and difficult maintenance in research and judgment that relies on expert experience.
To achieve the above object, in one aspect, the present invention provides a fraud phone identification method, including: acquiring a training sample data set and a test sample data set in a current scene; performing multi-dimensional feature extraction on the training sample data set to obtain a plurality of first features; performing model training on the plurality of first characteristics through a random forest algorithm to obtain a detection model; inputting the test sample data set into the detection model and performing parameter optimization on the detection model to obtain an updated detection model and a model prediction result; evaluating the model prediction result according to a plurality of evaluation indexes, and judging whether the detection model is feasible or not according to the evaluation result; when the detection model is feasible, extracting the multidimensional characteristics of the number to be predicted, and inputting the extracted second characteristics into the updated detection model for prediction to obtain the probability P that the number to be predicted is abnormal; and comparing the probability P with a preset threshold value, and judging whether the telephone number to be predicted is abnormal or not according to a comparison result.
Optionally, the evaluating the model prediction result according to the plurality of evaluation indexes, and judging whether the detection model is feasible according to the evaluation result includes: when the evaluation value of each evaluation index on the model prediction result is greater than 90 points, the detection model is determined to be feasible.
Optionally, the multidimensional features at least include: a call feature, a short message feature, and a traffic feature.
Optionally, the performing multidimensional feature extraction on the training sample data set to obtain a plurality of first features includes: screening the training sample data set, and screening out a training sample data subset with a higher negative sample proportion in the training sample data set; and carrying out multi-dimensional feature extraction on the training sample data subset to obtain the plurality of first features.
Optionally, the scenario at least includes: a silent card revival scene, an abnormal roaming fraud scene, and a new card opening fraud scene.
In another aspect, the present invention provides a fraud telephone identification apparatus, comprising:
an acquisition unit, configured to acquire a training sample data set and a test sample data set in a current scene; a training unit, configured to perform multi-dimensional feature extraction on the training sample data set to obtain a plurality of first features, and to perform model training on the plurality of first features through a random forest algorithm to obtain a detection model; an updating unit, configured to input the test sample data set into the detection model and perform parameter optimization on the detection model to obtain an updated detection model and a model prediction result; an evaluation unit, configured to evaluate the model prediction result according to a plurality of evaluation indexes and judge whether the detection model is feasible according to the evaluation result; a prediction unit, configured to perform multi-dimensional feature extraction on the number to be predicted when the detection model is feasible, and to input a plurality of extracted second features into the updated detection model for prediction to obtain the probability P that the telephone number to be predicted is abnormal; and a judging unit, configured to compare the probability P with a preset threshold value and judge whether the telephone number to be predicted is abnormal according to the comparison result.
Optionally, the evaluation unit includes: an evaluation subunit, configured to judge that the detection model is feasible when the evaluation value of each evaluation index on the model prediction result is greater than 90 points.
Optionally, the training unit includes: the screening subunit is used for screening the training sample data set and screening out a training sample data subset with a higher negative sample percentage in the training sample data set; and the extraction subunit is used for performing multi-dimensional feature extraction on the training sample data subset to obtain the plurality of first features.
In another aspect, the invention further provides a fraud telephone identification system, which comprises the fraud telephone identification device.
In another aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the above-mentioned fraud telephone identification method.
The invention has the beneficial effects that:
the invention provides a fraud telephone identification method, which comprises the following steps: acquiring a training sample data set and a test sample data set in a current scene; performing multi-dimensional feature extraction on the training sample data set to obtain a plurality of first features; performing model training on the plurality of first characteristics through a random forest algorithm to obtain a detection model; inputting the test sample data set into the detection model and performing parameter optimization on the detection model to obtain an updated detection model and a model prediction result; evaluating the model prediction result according to a plurality of evaluation indexes, and judging whether the detection model is feasible or not according to the evaluation result; when the detection model is feasible, extracting the multidimensional characteristics of the number to be predicted, and inputting a plurality of extracted second characteristics into the updated detection model for prediction to obtain the probability P that the number to be predicted is abnormal; and comparing the probability P with a preset threshold value, and judging whether the telephone number to be predicted is abnormal or not according to a comparison result.
In the method, multi-dimensional feature extraction improves the accuracy of detection. A detection model is obtained by training on the training sample data set with a random forest algorithm, and numbers to be predicted are input into the detection model in real time for prediction, which ensures the comprehensiveness of detection. The method therefore remains effective over a long period and is convenient to maintain afterwards.
Drawings
FIG. 1 is a flow chart of a fraudulent call identification method provided by an embodiment of the present invention;
FIG. 2 is a flow chart of obtaining a plurality of first features according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a fraudulent telephone identification device provided by the embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a training unit according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
In the prior art, number card management for telecommunication fraud mainly relies on research and judgment based on list libraries and business rules. The first approach filters number cards through a blacklist/whitelist mechanism. Its effectiveness depends mainly on the list library, whose entries usually enter the system only after the event, so its shortcomings in both the timeliness of judgment and the comprehensiveness of capturing fraud-related cards are obvious. The second approach analyzes service data based on a historical blacklist and extracts strong business rules such as region attributes and frequency attributes. This rule-based judgment depends entirely on expert experience, so it suffers from problems such as difficult maintenance and unpredictable interception accuracy.
Therefore, the present invention provides a fraud telephone identification method. FIG. 1 is a flow chart of a fraud telephone identification method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:
S101, acquiring a training sample data set and a test sample data set in a current scene;
In an optional embodiment, the scenario includes at least: a silent card revival scene, an abnormal roaming fraud scene, and a new card opening fraud scene.
The following description takes the abnormal roaming fraud scenario as an example:
A training sample data set and a test sample data set are acquired for this scenario.
S102, performing multi-dimensional feature extraction on the training sample data set to obtain a plurality of first features; performing model training on the plurality of first characteristics through a random forest algorithm to obtain a detection model;
Performing multi-dimensional feature extraction on the training sample data set to obtain a plurality of first features comprises:
S1021, screening the training sample data set, and screening out a training sample data subset with a higher negative sample proportion in the training sample data set;
For example: the distributions of the silent periods and active periods of call and traffic activity differ greatly between normal users and fraud-related users, and positive samples far outnumber negative samples. For a call silent period of between one and two months, the proportion of fraud-related users is 1.98 times that of normal users. Similarly, for a traffic silent period of between 14 days and one month, the proportion of fraud-related users is 1.2 times that of normal users.
The active period is defined in terms of consecutive active days, that is, it is the number of days between the fraud telephone number entering the active state and the temporary cessation of its consecutive activity. The active period of a fraud-related user is significantly shorter than that of a normal user. For 94.0% of fraud-related users, the longest run of consecutive active days does not exceed 30 days, whereas only 7.25% of normal users have no more than 7 consecutive active days, and 62.88% of normal users have more than 30 consecutive active days.
Therefore, when the training sample data set is screened, the training sample data subset is selected using the conditions of a call silent period of 30 days, a traffic silent period of 14 days, or no more than 7 consecutive active days. Screening in this way raises the proportion of negative samples in the subset, which markedly reduces the sample imbalance.
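The screening rule in S1021 can be illustrated with a minimal sketch. The function names, the input layout (a list of active dates per number plus silent-period lengths in days), and the reading of "30 days" and "14 days" as lower bounds are assumptions made for illustration; the source does not specify them.

```python
from datetime import date, timedelta


def longest_active_streak(active_days):
    """Return the longest run of consecutive calendar days in active_days."""
    best = run = 0
    prev = None
    for day in sorted(set(active_days)):
        # Extend the run only if this day directly follows the previous one.
        if prev is not None and day - prev == timedelta(days=1):
            run += 1
        else:
            run = 1
        best = max(best, run)
        prev = day
    return best


def passes_screen(call_silent_days, traffic_silent_days, active_days):
    """Keep a sample if it meets any one of the three screening conditions.

    Thresholds follow the text (30 / 14 / 7 days); treating the silent-period
    thresholds as "at least" is an assumption.
    """
    return (
        call_silent_days >= 30
        or traffic_silent_days >= 14
        or longest_active_streak(active_days) <= 7
    )
```

A number that has been active on ten consecutive days with no silent period would be excluded by this sketch, while a number with a 30-day call silent period would be kept.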
And S1022, performing multi-dimensional feature extraction on the training sample data subset to obtain the plurality of first features.
Specifically, the multidimensional characteristics at least include: a call feature, a short message feature, and a traffic feature.
The call features at least include: the calling/called frequency ratio; called-only with no calling, or more called than calling; call duration; base station dispersion of calling/called activity; dispersion of incoming/outgoing numbers; peak call frequency/fluctuation rate; communication activity; local/roaming calling/called frequency; roaming city dispersion; and call period/call duration preference.
The short message features include: the frequency of short message operations other than sending and receiving; the proportion of short message sending operations; the frequency of locally sent short messages; the dispersion of peer numbers for sent short messages; the dispersion of peer numbers for all short message operations; and the dispersion of peer numbers for short message operations other than sending and receiving.
The traffic features include: the number of active hours of cross-provincial roaming traffic; the number of active hours of intra-provincial roaming traffic; the dispersion of traffic across provinces; the dispersion of traffic base stations; the proportion of uplink traffic; the proportion of downlink traffic; the activity of traffic behavior; the dispersion of uplink traffic fluctuation; and the dispersion of downlink traffic fluctuation.
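As a rough illustration of how a few of the call features listed above might be computed from raw call records, consider the sketch below. The record layout, the feature names, and the formulas are all assumptions: in particular, "dispersion" is taken here to mean the number of distinct peer numbers divided by the number of calls, and the calling/called ratio is replaced by the bounded share of outgoing calls. The source does not define these quantities.

```python
def call_features(records):
    """Compute a few illustrative call features for one number.

    records: list of (direction, peer_number, duration_seconds) tuples,
    where direction is "out" (calling) or "in" (called). This layout is
    hypothetical.
    """
    total = len(records)
    out_calls = sum(1 for direction, _, _ in records if direction == "out")
    peers = {peer for _, peer, _ in records}
    return {
        # Share of outgoing calls, a bounded stand-in for the calling/called ratio.
        "calling_share": out_calls / total if total else 0.0,
        # True when the number only receives calls and never places one.
        "called_only": total > 0 and out_calls == 0,
        # Assumed dispersion measure: distinct peers per call.
        "peer_dispersion": len(peers) / total if total else 0.0,
        "mean_duration": sum(d for _, _, d in records) / total if total else 0.0,
    }
```

Short message and traffic features could be derived in the same style from their respective records.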
Model training is performed on the obtained first features through a random forest algorithm to obtain a detection model. In the invention, because the screened training sample data subset is smaller than the training sample data set, fewer first features are obtained when multi-dimensional feature extraction is subsequently performed on the subset, which reduces the data processing difficulty and speeds up the subsequent model training.
A random forest integrates the results of multiple decision trees. Each tree makes its decision using a randomly selected portion of the first features and a randomly selected portion of their attributes, and the final result is produced by voting among the trees.
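The two ideas in this description, random selection per tree and voting across trees, can be sketched as follows. This is not a full random forest: tree fitting is omitted, and the function names are hypothetical. Sampling rows with replacement (bootstrap) and choosing one attribute subset per tree are assumptions; common implementations instead draw the attribute subset at each split.

```python
import random


def draw_tree_inputs(n_samples, n_features, max_features, rng):
    """Pick the rows and feature columns one tree would be trained on."""
    # Bootstrap: sample row indices with replacement.
    rows = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Random subset of feature columns, without replacement.
    cols = sorted(rng.sample(range(n_features), max_features))
    return rows, cols


def forest_probability(tree_votes):
    """Fraction of trees voting 1 (abnormal); tree_votes holds 0/1 values."""
    votes = list(tree_votes)
    if not votes:
        raise ValueError("at least one tree vote is required")
    return sum(votes) / len(votes)


def majority_vote(tree_votes):
    """Final class label: 1 if more than half of the trees vote abnormal."""
    return 1 if forest_probability(tree_votes) > 0.5 else 0
```

The vote fraction returned by `forest_probability` is one simple way to obtain a probability-like score from an ensemble of trees.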
S103, inputting the test sample data set into the detection model and performing parameter optimization on the detection model to obtain an updated detection model and a model prediction result;
In an optional implementation, the test sample data set is input into the detection model for prediction to obtain a model prediction result; at the same time, parameter optimization is performed on the detection model using methods such as grid search or random search to obtain an updated detection model.
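Grid search, one of the parameter optimization methods named above, can be sketched generically. The helper below tries every combination in a parameter grid and keeps the best-scoring one. The function name, the parameter names in the usage note, and the scoring callback are hypothetical; in practice the callback would train and score a detection model for the given parameters.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every parameter combination.

    param_grid: dict mapping a parameter name to a list of candidate values.
    score_fn: callable taking a params dict and returning a score
              (higher is better).
    Returns the best params dict and its score.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For example, a grid such as `{"n_trees": [50, 100, 200], "max_depth": [4, 8, 16]}` yields nine candidate configurations. Random search would instead sample a fixed number of combinations from the same grid.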
S104, evaluating the model prediction result according to a plurality of evaluation indexes, and judging whether the detection model is feasible or not according to the evaluation result;
Specifically, the evaluation indexes include at least: a precision evaluation index, a recall evaluation index, and an F1-score (the harmonic mean of precision and recall) evaluation index. When the evaluation value of each evaluation index on the model prediction result is greater than 90 points, the detection model is judged to be feasible.
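The three evaluation indexes and the feasibility check can be written out as a short sketch. Expressing each index on a 0 to 100 scale so that it can be compared with the 90-point threshold is an assumption, as are the function names and the convention that label 1 marks the class being detected.

```python
def evaluate(y_true, y_pred, target=1):
    """Return precision, recall and F1 for the target class, scaled to 0-100."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": 100 * precision, "recall": 100 * recall, "f1": 100 * f1}


def is_feasible(scores, minimum=90.0):
    """The model is feasible only if every index exceeds the minimum."""
    return all(value > minimum for value in scores.values())
```

Because the check requires every index to exceed the threshold, a model with high precision but low recall would still be judged infeasible.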
S105, when the detection model is feasible, extracting multidimensional characteristics of the telephone number to be predicted, inputting a plurality of extracted second characteristics into the updated detection model for prediction, and obtaining the probability P that the telephone number to be predicted is abnormal;
S106, comparing the probability P with a preset threshold value, and judging whether the telephone number to be predicted is abnormal or not according to a comparison result.
In an alternative embodiment, the invention uses ELI5 to explain the prediction result of the telephone number to be predicted. Because the second features span multiple dimensions and different second features contribute to the result to different degrees (that is, they express abnormality differently), the contributions of the second features are sorted in descending order, and the second features with the highest contributions are the main features influencing the prediction result of the telephone number to be predicted.
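Steps S106 and the contribution ranking can be sketched together. The code assumes the probability P and the per-feature contributions have already been produced by the model and the explanation tool; it does not call ELI5 itself. The default threshold of 0.5, the strict comparison, and the function and feature names are assumptions, since the source only says the threshold is preset.

```python
def is_abnormal(p, threshold=0.5):
    """Compare probability P with a preset threshold (0.5 is an assumed default)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return p > threshold


def rank_contributions(contributions, top_k=3):
    """Sort feature contributions in descending order and keep the top_k.

    contributions: dict mapping a feature name to its contribution value.
    """
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

The top-ranked entries correspond to the main features influencing the prediction for a given number.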
FIG. 3 is a schematic structural diagram of a fraud telephone identification apparatus provided in an embodiment of the present invention. As shown in FIG. 3, the apparatus includes:
an obtaining unit 201, configured to obtain a training sample data set and a test sample data set in a current scene;
In an optional embodiment, the scenario includes at least: a silent card revival scene, an abnormal roaming fraud scene, and a new card opening fraud scene.
The following description takes the abnormal roaming fraud scenario as an example:
A training sample data set and a test sample data set are acquired for this scenario.
A training unit 202, configured to perform multidimensional feature extraction on the training sample data set to obtain multiple first features; performing model training on the plurality of first characteristics through a random forest algorithm to obtain a detection model;
In an alternative implementation, FIG. 4 is a schematic structural diagram of a training unit provided in an embodiment of the present invention. As shown in FIG. 4, the training unit 202 includes:
a screening subunit 2021, configured to screen the training sample data set, and screen out a training sample data subset with a higher negative sample percentage in the training sample data set;
For example: the distributions of the silent periods and active periods of call and traffic activity differ greatly between normal users and fraud-related users, and positive samples far outnumber negative samples. For a call silent period of between one and two months, the proportion of fraud-related users is 1.98 times that of normal users. Similarly, for a traffic silent period of between 14 days and one month, the proportion of fraud-related users is 1.2 times that of normal users.
The active period is defined in terms of consecutive active days, that is, it is the number of days between the fraud telephone number entering the active state and the temporary cessation of its consecutive activity. The active period of a fraud-related user is significantly shorter than that of a normal user. For 94.0% of fraud-related users, the longest run of consecutive active days does not exceed 30 days, whereas only 7.25% of normal users have no more than 7 consecutive active days, and 62.88% of normal users have more than 30 consecutive active days.
Therefore, when the training sample data set is screened, the training sample data subset is selected using the conditions of a call silent period of 30 days, a traffic silent period of 14 days, or no more than 7 consecutive active days. Screening in this way raises the proportion of negative samples in the subset, which markedly reduces the sample imbalance.
The extracting subunit 2022 is configured to perform multidimensional feature extraction on the training sample data subset to obtain the plurality of first features.
Specifically, the multidimensional characteristics at least include: a call feature, a short message feature, and a traffic feature.
The call features at least include: the calling/called frequency ratio; called-only with no calling, or more called than calling; call duration; base station dispersion of calling/called activity; dispersion of incoming/outgoing numbers; peak call frequency/fluctuation rate; communication activity; local/roaming calling/called frequency; roaming city dispersion; and call period/call duration preference.
The short message features include: the frequency of short message operations other than sending and receiving; the proportion of short message sending operations; the frequency of locally sent short messages; the dispersion of peer numbers for sent short messages; the dispersion of peer numbers for all short message operations; and the dispersion of peer numbers for short message operations other than sending and receiving.
The traffic features include: the number of active hours of cross-provincial roaming traffic; the number of active hours of intra-provincial roaming traffic; the dispersion of traffic across provinces; the dispersion of traffic base stations; the proportion of uplink traffic; the proportion of downlink traffic; the activity of traffic behavior; the dispersion of uplink traffic fluctuation; and the dispersion of downlink traffic fluctuation.
Model training is performed on the obtained first features through a random forest algorithm to obtain a detection model. In the invention, because the screened training sample data subset is smaller than the training sample data set, fewer first features are obtained when multi-dimensional feature extraction is subsequently performed on the subset, which reduces the data processing difficulty and speeds up the subsequent model training.
A random forest integrates the results of multiple decision trees. Each tree makes its decision using a randomly selected portion of the first features and a randomly selected portion of their attributes, and the final result is produced by voting among the trees.
An updating unit 203, configured to input the test sample data set into the detection model and perform parameter optimization on the detection model to obtain an updated detection model and a model prediction result;
In an optional implementation, the test sample data set is input into the detection model for prediction to obtain a model prediction result; at the same time, parameter optimization is performed on the detection model using methods such as grid search or random search to obtain an updated detection model.
The evaluation unit 204 is configured to evaluate the model prediction result according to a plurality of evaluation indexes, and determine whether the detection model is feasible according to the evaluation result;
Specifically, the evaluation indexes include at least: a precision evaluation index, a recall evaluation index, and an F1-score (the harmonic mean of precision and recall) evaluation index. When the evaluation value of each evaluation index on the model prediction result is greater than 90 points, the detection model is judged to be feasible.
The prediction unit 205 is configured to, when the detection model is feasible, perform multidimensional feature extraction on the phone number to be predicted, and input a plurality of extracted second features into the updated detection model for prediction, so as to obtain a probability P that the phone number to be predicted is abnormal;
The determining unit 206 is configured to compare the probability P with a preset threshold, and determine whether the phone number to be predicted is abnormal according to a comparison result.
In an alternative embodiment, the invention uses ELI5 to explain the prediction result of the telephone number to be predicted. Because the second features span multiple dimensions and different second features contribute to the result to different degrees (that is, they express abnormality differently), the contributions of the second features are sorted in descending order, and the second features with the highest contributions are the main features influencing the prediction result of the telephone number to be predicted.
The invention also provides a fraud telephone identification system which comprises the fraud telephone identification device.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described fraud phone recognition method.
The storage medium stores the above software and includes, but is not limited to: optical disks, floppy disks, hard disks, erasable memory, etc.
The beneficial effects of the invention are as follows:
The invention provides a fraud telephone identification method, which comprises the following steps: acquiring a training sample data set and a test sample data set in a current scene; performing multi-dimensional feature extraction on the training sample data set to obtain a plurality of first features; performing model training on the plurality of first features through a random forest algorithm to obtain a detection model; inputting the test sample data set into the detection model and performing parameter optimization on the detection model to obtain an updated detection model and a model prediction result; evaluating the model prediction result according to a plurality of evaluation indexes, and judging whether the detection model is feasible according to the evaluation result; when the detection model is feasible, performing multi-dimensional feature extraction on the telephone number to be predicted, and inputting a plurality of extracted second features into the updated detection model for prediction to obtain the probability P that the telephone number to be predicted is abnormal; and comparing the probability P with a preset threshold value, and judging whether the telephone number to be predicted is abnormal according to the comparison result.
In this method, multi-dimensional feature extraction improves detection accuracy. The detection model is obtained by training on the training sample data set with a random forest algorithm, and numbers to be predicted are input into the detection model in real time for prediction, which ensures comprehensive detection. The method also remains effective over a long period and is convenient to maintain afterwards.
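The evaluation and decision steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: the patent does not name the evaluation indexes here, so accuracy, precision, recall, and F1 on a 0-100 scale are assumed; the 90-point bar follows claim 2; the default threshold of 0.5 and the "greater than" comparison are hypothetical choices.

```python
from typing import Dict, List


def evaluation_scores(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Return assumed evaluation indexes on a 0-100 scale (label 1 = abnormal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = 100 * (tp + tn) / len(pairs) if pairs else 0.0
    precision = 100 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100 * tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def model_is_feasible(scores: Dict[str, float], minimum: float = 90.0) -> bool:
    """The model is feasible only if every evaluation index exceeds the minimum."""
    return all(value > minimum for value in scores.values())


def is_abnormal(probability: float, threshold: float = 0.5) -> bool:
    """Compare the probability P with a preset threshold (0.5 is an assumed default)."""
    return probability > threshold


# Hypothetical test labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = evaluation_scores(y_true, y_pred)
feasible = model_is_feasible(scores)
print(scores, feasible)
```

In this sketch a single index at or below 90 points marks the model as not feasible, so prediction on new numbers would only proceed after further parameter optimization.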
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may be modified or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A fraud telephone identification method, comprising:
acquiring a training sample data set and a test sample data set in a current scene;
performing multi-dimensional feature extraction on the training sample data set to obtain a plurality of first features; performing model training on the plurality of first features through a random forest algorithm to obtain a detection model;
inputting the test sample data set into the detection model and performing parameter optimization on the detection model to obtain an updated detection model and a model prediction result;
evaluating the model prediction result according to a plurality of evaluation indexes, and judging whether the detection model is feasible or not according to the evaluation result;
when the detection model is feasible, carrying out multi-dimensional feature extraction on the telephone number to be predicted, and inputting a plurality of extracted second features into the updated detection model for prediction to obtain the probability P that the telephone number to be predicted is abnormal;
and comparing the probability P with a preset threshold value, and judging whether the telephone number to be predicted is abnormal or not according to a comparison result.
2. The method of claim 1, wherein the evaluating the model prediction result according to a plurality of evaluation indexes, and the determining whether the detection model is feasible according to the evaluation result comprises:
when the evaluation value of each evaluation index on the model prediction result is greater than 90 points, judging that the detection model is feasible.
3. The method of claim 1, wherein:
the multi-dimensional features include at least: a call feature, a short message feature, and a traffic feature.
4. The method of claim 1, wherein the performing multi-dimensional feature extraction on the training sample data set to obtain a plurality of first features comprises:
screening the training sample data set, and screening out a training sample data subset with a higher negative sample proportion in the training sample data set;
and performing multi-dimensional feature extraction on the training sample data subset to obtain the plurality of first features.
5. The method of claim 1, wherein:
the scene at least comprises: a silent card revival scene, an abnormal roaming fraud scene, and a new card opening fraud scene.
6. A fraud telephone identification apparatus, comprising:
the acquisition unit is used for acquiring a training sample data set and a test sample data set in a current scene;
the training unit is used for carrying out multi-dimensional feature extraction on the training sample data set to obtain a plurality of first features, and for performing model training on the plurality of first features through a random forest algorithm to obtain a detection model;
the updating unit is used for inputting the test sample data set into the detection model and carrying out parameter optimization on the detection model so as to obtain an updated detection model and a model prediction result;
the evaluation unit is used for evaluating the model prediction result according to a plurality of evaluation indexes and judging whether the detection model is feasible according to the evaluation result;
the prediction unit is used for performing multi-dimensional feature extraction on the telephone number to be predicted when the detection model is feasible, inputting a plurality of extracted second features into the updated detection model for prediction, and obtaining the probability P that the telephone number to be predicted is abnormal;
and the judging unit is used for comparing the probability P with a preset threshold value and judging whether the telephone number to be predicted is abnormal or not according to a comparison result.
7. The apparatus of claim 6, wherein the evaluation unit comprises:
the evaluation subunit is used for judging that the detection model is feasible when the evaluation value of each evaluation index on the model prediction result is greater than 90 points.
8. The apparatus of claim 6, wherein the training unit comprises:
a screening subunit, configured to screen the training sample data set, and screen out a training sample data subset with a higher negative sample percentage in the training sample data set;
and the extraction subunit is used for performing multi-dimensional feature extraction on the training sample data subset to obtain the plurality of first features.
9. A fraud telephone identification system, comprising: the fraud telephone identification apparatus of any of claims 6-8.
10. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the fraud telephone identification method of any one of claims 1 to 5.
CN202111526088.7A 2021-12-14 2021-12-14 Fraud telephone identification method, device, system and computer storage medium Pending CN114205462A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111526088.7A CN114205462A (en) 2021-12-14 2021-12-14 Fraud telephone identification method, device, system and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111526088.7A CN114205462A (en) 2021-12-14 2021-12-14 Fraud telephone identification method, device, system and computer storage medium

Publications (1)

Publication Number Publication Date
CN114205462A true CN114205462A (en) 2022-03-18

Family

ID=80653469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111526088.7A Pending CN114205462A (en) 2021-12-14 2021-12-14 Fraud telephone identification method, device, system and computer storage medium

Country Status (1)

Country Link
CN (1) CN114205462A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549026A (en) * 2022-04-26 2022-05-27 浙江鹏信信息科技股份有限公司 Method and system for identifying unknown fraud based on algorithm component library analysis

Similar Documents

Publication Publication Date Title
CN109600752B (en) Deep clustering fraud detection method and device
WO2016197675A1 (en) Method and apparatus for identifying crank call
CN108462785B (en) Method and device for processing malicious call
CN109168168B (en) Method for detecting international embezzlement
CN101459718A (en) Rubbish voice filtering method based on mobile communication network and system thereof
CN107092651B (en) Key character mining method and system based on communication network data analysis
CN110611929A (en) Abnormal user identification method and device
CN101389085B (en) Rubbish short message recognition system and method based on sending behavior
CN114205462A (en) Fraud telephone identification method, device, system and computer storage medium
CN112351429B (en) Harmful information detection method and system based on deep learning
KR20170006158A (en) System and method for detecting fraud usage of message
CN109963292B (en) Complaint prediction method, complaint prediction device, electronic apparatus, and storage medium
CN109819125A (en) A kind of method and device limiting telecommunication fraud
CN111930808B (en) Method and system for improving blacklist accuracy by using key value matching model
CN114449106B (en) Method, device, equipment and storage medium for identifying abnormal telephone number
CN114168423A (en) Abnormal number calling monitoring method, device, equipment and storage medium
CN112153220B (en) Communication behavior identification method based on social evaluation dynamic update
CN113596260B (en) Abnormal telephone number detection method and electronic equipment
CN113517990B (en) Method and device for predicting net recommendation value NPS (network performance indicator)
CN112307075B (en) User relationship identification method and device
CN113645356A (en) Fraud telephone identification method and system based on in-network card opening behavior analysis
CN113411828A (en) Method, device and equipment for sensing call quality and computer readable storage medium
CN111242147A (en) Method and device for identifying close contact and frequent active area
CN111131626B (en) Group harmful call detection method and device based on stream data atlas and readable medium
CN113780407B (en) Data detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination