CN115130577A

CN115130577A - Method and device for identifying fraudulent number and electronic equipment

Info

Publication number: CN115130577A
Application number: CN202210747342.4A
Authority: CN
Inventors: Zou Qin (邹琴); He Jia (贺嘉); He Meibin (何美斌); Zhang Jianguo (张建国); Liao Xiaoping (廖晓萍)
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2022-06-28
Filing date: 2022-06-28
Publication date: 2022-09-30

Abstract

The application relates to the technical field of artificial intelligence, and in particular to a method, a device and electronic equipment for identifying a fraudulent number. The method comprises: obtaining a communication data set corresponding to a number to be detected; matching each piece of communication data in the communication data set with each normal feature in a normal feature set and with each abnormal feature in an abnormal feature set, to obtain a normal feature parameter set and an abnormal feature parameter set corresponding to the communication data set; calculating a fraud probability value for the number to be detected based on each normal feature parameter in the normal feature parameter set and each abnormal feature parameter in the abnormal feature parameter set; and determining the number to be detected to be a fraudulent number in response to the fraud probability value being greater than a preset fraud probability value. By classifying and detecting the communication data set to determine the fraud probability value of the number to be detected, the method improves the efficiency of detecting the number and ensures the accuracy of identifying it as a fraudulent number.

Description

Method and device for identifying fraudulent number and electronic equipment

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a method, a device and electronic equipment for identifying a fraudulent number.

Background

As artificial intelligence technology develops, the number of telecommunication fraud cases keeps increasing. The method currently adopted to prevent telecommunication fraud is as follows: attribute information of a number to be detected is obtained, including network access information, the subscribed package, call duration within a preset time period, short message statistics, data traffic statistics, active time periods and the like; the attribute information is analyzed with a telephone number prediction model, which generates at least one randomly combined feature set from the attribute information; each combined feature set is detected to determine a corresponding risk factor, which indicates the probability that the number to be detected is a fraudulent number; a risk index of the number to be detected is determined based on the risk factors; when the risk index is greater than a risk threshold, the number to be detected is determined to be a fraudulent number that may harm users, and the fraudulent number is then handled.

In the above method, the attribute information is either offline data or real-time data. Offline data reflects activity from an hour or a day earlier, so by the time fraud data is recognized in it, the fraud has already ended and the fraudulent call cannot be intercepted in time. In addition, the same feature may appear in multiple combined feature sets and is therefore detected repeatedly, so detecting every randomly combined feature set takes a long time; consequently, even when the attribute information is real-time data, the number to be detected cannot be determined to be a fraudulent number before the fraud ends.

Disclosure of Invention

The application provides a method, a device and electronic equipment for identifying a fraudulent number, which improve the efficiency and accuracy of detecting whether a number to be detected is a fraudulent number, so that a fraudulent number can be identified before the fraudulent activity ends.

In a first aspect, the present application provides a method for identifying a fraudulent number, the method including:

obtaining a communication data set corresponding to a number to be detected;

matching each piece of communication data in the communication data set with each normal feature in a normal feature set and each abnormal feature in an abnormal feature set respectively to obtain a normal feature parameter set and an abnormal feature parameter set corresponding to the communication data set, wherein the normal feature parameter set comprises normal feature parameters associated with the normal features matched with the communication data, and the abnormal feature parameter set comprises abnormal feature parameters associated with the abnormal features matched with the communication data;

calculating a fraud probability value corresponding to the number to be detected based on each normal feature parameter in the normal feature parameter set and each abnormal feature parameter in the abnormal feature parameter set;

and determining the number to be detected as a fraud number in response to the fraud probability value being greater than a preset fraud probability value.
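The four steps above can be sketched as follows. The source does not state how the fraud probability value is computed from the two parameter sets, so the scoring rule below (the abnormal share of the summed parameters) is purely a hypothetical placeholder, as are the function names and the 0.5 threshold.

```python
from typing import Sequence


def fraud_probability(normal_params: Sequence[float],
                      abnormal_params: Sequence[float]) -> float:
    """Hypothetical scoring rule: share of abnormal evidence in all matched evidence."""
    normal_score = sum(normal_params)
    abnormal_score = sum(abnormal_params)
    total = normal_score + abnormal_score
    if total == 0:
        return 0.0  # nothing matched: no evidence either way
    return abnormal_score / total


def is_fraudulent(normal_params: Sequence[float],
                  abnormal_params: Sequence[float],
                  preset_threshold: float = 0.5) -> bool:
    # Final step: the number is fraudulent only if the value is strictly greater.
    return fraud_probability(normal_params, abnormal_params) > preset_threshold


print(is_fraudulent([0.1, 0.2], [0.3, 0.4]))  # about 0.7 > 0.5 -> True
```

Any monotone scoring rule could be substituted without changing the surrounding flow; only the final strict comparison against the preset value is taken from the text.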

In one possible design, matching each piece of communication data in the communication data set with each normal feature in the normal feature set and each abnormal feature in the abnormal feature set includes:

obtaining a training feature set, wherein the training feature set comprises a normal communication data set and an abnormal communication data set;

determining a weight value corresponding to each training feature in the training feature set, and associating each training feature with its weight value to obtain an associated training feature set corresponding to the training feature set;

determining the normal feature set and the abnormal feature set based on each associated training feature meeting preset conditions in the associated training feature set;

and matching each communication data in the communication data set with each normal feature in the normal feature set and each abnormal feature in the abnormal feature set respectively.

In one possible design, determining a weight value corresponding to each training feature in the training feature set includes:

determining an initial weight value corresponding to each training feature in the training feature set;

inputting the training feature set and its initial weight values into a preset iterative model to obtain the weight value corresponding to each training feature in the training feature set.

In one possible design, obtaining a weight value corresponding to each training feature in the training feature set includes:

in response to the number of iterations of the initial weight values reaching a preset number of iterations, taking the current weight values corresponding to the training feature set as the weight values of the respective training features in the training feature set; or

determining a loss value of the training feature set, and in response to the loss value being smaller than a preset loss threshold, taking the current weight values corresponding to the training feature set as the weight values of the respective training features in the training feature set, wherein the loss value reflects how accurately the training feature set detects the number to be detected as a fraudulent number (a smaller loss value indicates higher accuracy).

In one possible design, determining the normal feature set and the abnormal feature set based on each associated training feature meeting a preset condition in the associated training feature set includes:

determining a first parameter of each associated training feature based on a first preset formula, and determining a second parameter of each associated training feature based on a second preset formula;

extracting each first associated training feature and each second associated training feature of which the first parameter is lower than a first preset threshold and the second parameter is lower than a second preset threshold, wherein the first associated training feature is a feature in a normal communication data set, and the second associated training feature is a feature in an abnormal communication data set;

and generating a normal feature set corresponding to the associated training feature set based on each first associated training feature, and generating an abnormal feature set corresponding to the associated training feature set based on each second associated training feature.
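A minimal sketch of the screening step, assuming each associated training feature already carries its first and second parameters and a tag recording whether it came from the normal or the abnormal communication data set. The record type, field names, parameter values and thresholds are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class AssociatedFeature(NamedTuple):
    name: str            # hypothetical feature name
    origin: str          # "normal" or "abnormal": the data set it was taken from
    first_param: float   # value from the first preset formula
    second_param: float  # value from the second preset formula


def split_feature_sets(features: Iterable[AssociatedFeature],
                       first_threshold: float,
                       second_threshold: float) -> Tuple[List[str], List[str]]:
    """Keep features whose two parameters are both below their thresholds."""
    normal: List[str] = []
    abnormal: List[str] = []
    for f in features:
        if f.first_param < first_threshold and f.second_param < second_threshold:
            (normal if f.origin == "normal" else abnormal).append(f.name)
    return normal, abnormal


features = [
    AssociatedFeature("call_duration", "normal", 0.20, 0.05),
    AssociatedFeature("sms_sent", "abnormal", 0.30, 0.08),
    AssociatedFeature("roaming_cities", "abnormal", 0.90, 0.01),  # first param too high
]
print(split_feature_sets(features, first_threshold=0.5, second_threshold=0.1))
# → (['call_duration'], ['sms_sent'])
```

Keeping the origin tag on each record lets the same feature name appear in both data sets, which the description explicitly allows.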

In one possible design, the first preset formula is as follows:

$$\mathrm{CV}(w_k) = \frac{\mathrm{std}(w_k)}{\mathrm{mean}(w_k)}$$

wherein $w_k$ represents the associated training feature, $\mathrm{CV}(w_k)$ represents the coefficient of variation of the associated training feature, $\mathrm{std}(w_k)$ represents the standard deviation of the associated training feature over the user numbers, and $\mathrm{mean}(w_k)$ represents the mean of the associated training feature over the user numbers.
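A minimal sketch of the first preset formula, assuming the standard deviation is the population standard deviation over the values of one feature across user numbers (the text does not say whether population or sample standard deviation is meant). Function names and the sample column are hypothetical.

```python
import statistics
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """CV(w_k) = std(w_k) / mean(w_k) over the values of one feature."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean  # population standard deviation (assumed)


def cv_per_feature(columns: Dict[str, List[float]]) -> Dict[str, float]:
    """Apply the first preset formula to every feature column (one value per user number)."""
    return {name: coefficient_of_variation(col) for name, col in columns.items()}


print(cv_per_feature({"calls": [2, 4, 4, 4, 5, 5, 7, 9]}))  # {'calls': 0.4}
```

A low coefficient of variation means the feature behaves consistently across numbers, which is why the screening step keeps features whose first parameter falls below a threshold.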

In one possible design, the second preset formula is as follows:

wherein the terms of the formula respectively represent: the stability of the associated training feature $w_k$; the (n-1)th associated training feature $w_k$; the nth associated training feature $w_k$; and the stability between the (n-1)th and the nth associated training features.

In one possible design, matching each piece of communication data in the communication data set with each normal feature in the normal feature set and each abnormal feature in the abnormal feature set, to obtain the normal feature parameter set and the abnormal feature parameter set corresponding to the communication data set, includes:

in response to each piece of communication data in the communication data set matching a normal feature in the normal feature set, recording the normal feature parameter corresponding to each matched normal feature, and generating the normal feature parameter set based on the normal feature parameters; and

in response to each piece of communication data in the communication data set matching an abnormal feature in the abnormal feature set, recording the abnormal feature parameter corresponding to each matched abnormal feature, and generating the abnormal feature parameter set based on the abnormal feature parameters.
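A sketch of the matching step under stated assumptions: the source does not define what it means for communication data to "match" a feature, so here each feature is given a hypothetical match rule (a predicate on the observed value) together with its associated parameter, and a matched feature contributes that parameter to the corresponding parameter set. Feature names, rules and parameter values are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# (hypothetical match rule, associated feature parameter)
Feature = Tuple[Callable[[float], bool], float]


def match_features(comm_data: Dict[str, float],
                   normal_features: Dict[str, Feature],
                   abnormal_features: Dict[str, Feature]) -> Tuple[List[float], List[float]]:
    """Return (normal feature parameter set, abnormal feature parameter set)."""
    normal_params: List[float] = []
    abnormal_params: List[float] = []
    for name, value in comm_data.items():
        for feature_set, params in ((normal_features, normal_params),
                                    (abnormal_features, abnormal_params)):
            if name in feature_set:
                rule, param = feature_set[name]
                if rule(value):           # the data matches this feature
                    params.append(param)  # record its associated parameter
    return normal_params, abnormal_params


normal = {"calls_per_hour": (lambda v: v < 10, 0.12)}
abnormal = {"calls_per_hour": (lambda v: v >= 30, 0.31),
            "sms_sent": (lambda v: v > 100, 0.22)}
print(match_features({"calls_per_hour": 40, "sms_sent": 150}, normal, abnormal))
# → ([], [0.31, 0.22])
```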

In a second aspect, the present application provides a fraudulent number identification apparatus, comprising:

an obtaining module, configured to obtain a communication data set corresponding to a number to be detected;

a matching module, configured to match each piece of communication data in the communication data set with each normal feature in a normal feature set and each abnormal feature in an abnormal feature set, to obtain a normal feature parameter set and an abnormal feature parameter set corresponding to the communication data set;

a calculation module, configured to calculate a fraud probability value corresponding to the number to be detected based on each normal feature parameter in the normal feature parameter set and each abnormal feature parameter in the abnormal feature parameter set;

and a response module, configured to determine the number to be detected to be a fraudulent number in response to the fraud probability value being greater than a preset fraud probability value.

In one possible design, the matching module is specifically configured to obtain a training feature set, determine a weight value corresponding to each training feature in the training feature set, associate each training feature with its weight value to obtain an associated training feature set corresponding to the training feature set, determine the normal feature set and the abnormal feature set based on each associated training feature in the associated training feature set that meets a preset condition, and match each piece of communication data in the communication data set with each normal feature in the normal feature set and each abnormal feature in the abnormal feature set respectively.

In a possible design, the matching module is further configured to determine an initial weight value corresponding to each training feature in the training feature set, and input the training feature set and each initial weight value corresponding to the training feature set into a preset iterative model to obtain a weight value corresponding to each training feature in the training feature set.

In one possible design, the matching module is further configured to: in response to the number of iterations of the initial weight values reaching a preset number of iterations, use the current weight values corresponding to the training feature set as the weight values of the respective training features in the training feature set; or determine a loss value of the training feature set and, in response to the loss value being smaller than a preset loss threshold, use the current weight values corresponding to the training feature set as the weight values of the respective training features in the training feature set.

In one possible design, the matching module is further configured to determine a first parameter of each associated training feature based on a first preset formula and a second parameter of each associated training feature based on a second preset formula; extract each first associated training feature and each second associated training feature whose first parameter is lower than a first preset threshold and whose second parameter is lower than a second preset threshold, wherein a first associated training feature is a feature in the normal communication data set and a second associated training feature is a feature in the abnormal communication data set; generate a normal feature set corresponding to the associated training feature set based on the first associated training features; and generate an abnormal feature set corresponding to the associated training feature set based on the second associated training features.

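In one possible design, the first preset formula is as follows:

$$\mathrm{CV}(w_k) = \frac{\mathrm{std}(w_k)}{\mathrm{mean}(w_k)}$$

wherein $w_k$ represents the associated training feature, $\mathrm{CV}(w_k)$ represents the coefficient of variation of the associated training feature, $\mathrm{std}(w_k)$ represents the standard deviation of the associated training feature over the user numbers, and $\mathrm{mean}(w_k)$ represents the mean of the associated training feature over the user numbers.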

In one possible design, the second preset formula is as follows:

wherein the terms of the formula respectively represent: the stability of the associated training feature $w_k$; the (n-1)th associated training feature $w_k$; the nth associated training feature $w_k$; and the stability between the (n-1)th and the nth associated training features.

In one possible design, the matching module is further configured to: in response to each piece of communication data in the communication data set matching a normal feature in the normal feature set, record the normal feature parameter corresponding to each normal feature and generate the normal feature parameter set based on the normal feature parameters; and in response to each piece of communication data in the communication data set matching an abnormal feature in the abnormal feature set, record the abnormal feature parameter corresponding to each abnormal feature and generate the abnormal feature parameter set based on the abnormal feature parameters.

In a third aspect, the present application provides an electronic device, comprising:

a memory for storing a computer program;

and a processor, configured to implement the steps of the above fraudulent number identification method when executing the computer program stored in the memory.

In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above fraudulent number identification method.

In a fifth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to perform the above-mentioned fraudulent number identification method steps.

Drawings

FIG. 1 is a flow chart of the steps of a method for identifying a fraudulent number provided by the present application;

FIG. 2 is a schematic structural diagram of a fraudulent number identification device provided in the present application;

FIG. 3 is a schematic structural diagram of an electronic device provided in the present application.

Detailed Description

To make the objectives, technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings. The specific operations in the method embodiments may also be applied to the apparatus or system embodiments. It should be noted that in the description of the present application, "a plurality" means "at least two". "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. "A is connected to B" may mean that A and B are directly connected, or that A and B are connected through C. In addition, the terms "first", "second" and the like are used only for descriptive purposes and are not to be construed as indicating or implying relative importance or order.

In the prior art, a fraudulent number is identified as follows: attribute information of the number to be identified is obtained, items of the attribute information are randomly combined into a plurality of combined feature sets, a risk factor is determined for each combined feature set, and a risk index of the number is determined based on the risk factors. However, when the attribute information is offline data and the number is in fact fraudulent, the fraudulent activity has already ended by the time the number is detected as fraudulent, so the number cannot be intercepted in time. When the attribute information is real-time data, a plurality of combined feature sets must be detected, and because the same feature appears in more than one combined feature set, that feature is detected multiple times; the number therefore cannot be determined to be fraudulent before the fraudulent activity ends, and the opportunity to intercept it is missed.

To solve the above problems, embodiments of the present application provide a method for identifying a fraudulent number, so as to identify fraudulent numbers efficiently and accurately. The method and the device in the embodiments of the present application are based on the same technical concept; since they solve the problem on similar principles, the device embodiments and the method embodiments may refer to each other, and repeated descriptions are omitted. The acquisition, storage, use and processing of data in the technical solution of the present application all comply with the relevant national laws and regulations.

The embodiments of the present application will be described in detail below with reference to the accompanying drawings.

Referring to fig. 1, the present application provides a method for identifying a fraudulent number, which can efficiently and accurately identify the fraudulent number, and the implementation flow of the method is as follows:

Step S1: obtain a communication data set corresponding to the number to be detected.

To identify a fraudulent number efficiently and accurately, the communication data set corresponding to the number to be detected is first obtained. The communication data set may include device information, access port information, geographic location, network access time, package information and the home location of the number, and is not limited herein.
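One way to hold the items listed above is a simple record type. This is only an illustrative sketch: the class name, field names and types are assumptions, and since the list in the text is explicitly non-limiting, a real record would likely carry more fields.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CommunicationData:
    """Hypothetical record for one number's communication data (field names assumed)."""
    device_info: str               # device information
    access_port: str               # access port information
    geo_location: str              # geographic location
    network_access_time: datetime  # network access time
    package: str                   # package information
    home_location: str             # home location of the number


record = CommunicationData("device-1", "port-1", "City X",
                           datetime(2022, 6, 28), "package-1", "City Y")
```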

Step S2: match each piece of communication data in the communication data set with each normal feature in a normal feature set and each abnormal feature in an abnormal feature set, to obtain a normal feature parameter set and an abnormal feature parameter set corresponding to the communication data set.

To detect the communication data set corresponding to the number to be detected, a training feature set is obtained, which comprises a normal communication data set and an abnormal communication data set. To ensure the accuracy of identifying the number to be detected as a fraudulent number, the training feature set is trained as follows:

Because the training feature set includes both a normal and an abnormal communication data set, effective features are screened out of each of them separately. An effective feature is one that can identify the number to be detected as either a fraudulent number or a non-fraudulent number. The screening process is as follows:

To ensure detection accuracy, communication data sets corresponding to a plurality of mobile phone numbers are obtained, each of these communication data sets is labeled as either a normal communication data set or an abnormal communication data set, and the labeled communication data sets are placed into the training feature set.

For example, the training feature set is $T = \{(t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n)\}$, where $t_1$ denotes the communication data set of the first mobile phone number, $t_2$ that of the second mobile phone number, and $t_n$ that of the nth mobile phone number. Each label $y_i$ takes the value 0 or 1: $y_i = 0$ indicates that the corresponding $t_i$ is a normal communication data set, and $y_i = 1$ indicates that $t_i$ is an abnormal communication data set.
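The labeled training set $T$ can be represented as a list of $(t_i, y_i)$ pairs and split by label. A minimal sketch, assuming each $t_i$ is a mapping from feature name to value (a hypothetical layout; the source does not fix one) and using made-up sample values:

```python
from typing import Dict, List, Tuple

CommData = Dict[str, float]    # t_i: feature name -> value (hypothetical layout)
Sample = Tuple[CommData, int]  # (t_i, y_i) with y_i in {0, 1}


def split_training_set(T: List[Sample]) -> Tuple[List[CommData], List[CommData]]:
    """Split T into the normal (y_i = 0) and abnormal (y_i = 1) communication data sets."""
    normal: List[CommData] = []
    abnormal: List[CommData] = []
    for t, y in T:
        if y == 0:
            normal.append(t)
        elif y == 1:
            abnormal.append(t)
        else:
            raise ValueError(f"label must be 0 or 1, got {y!r}")
    return normal, abnormal


T = [({"calls": 12, "sms_sent": 3}, 0),
     ({"calls": 140, "sms_sent": 260}, 1)]
normal_set, abnormal_set = split_training_set(T)
```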

After the training feature set is obtained, every training feature in it is assigned the same initial weight value. The normal communication data set contains cases in which a number is determined to be non-fraudulent based on a single piece of communication data as well as cases based on multiple pieces of communication data; likewise, the abnormal communication data set contains cases in which a number is determined to be fraudulent based on a single piece of communication data and cases based on multiple pieces. For example, for communication data A, B and C in the abnormal communication data set, the probability that the number to be detected is a fraudulent number is 80% based on A, 60% based on B and 40% based on C. Since the probabilities derived from different communication data are not consistent, the weight value of each piece of communication data in the abnormal communication data set needs to be redistributed in order to identify fraudulent numbers more accurately. The same situation arises for the communication data in the normal communication data set, so its weight values also need to be redistributed. The specific process is as follows:

A normal communication data set is obtained from the training feature set. It includes features of a plurality of non-fraudulent numbers over a plurality of time periods, for example the number of calls, call duration, average call duration and terminal usage duration. The normal communication data set is shown in Table 1:

Number	a	b	c	d	......
X1	a1	b1	c1	d1	......
X2	a2	b2	c2	d2	......
X3	a3	b3	c3	d3	......
X4	a4	b4	c4	d4	......
......	......	......	......	......	......

TABLE 1

Table 1 shows the relationship between the features in the normal communication data set and the mobile phone numbers. It lists four features a, b, c and d; X1, X2, X3 and X4 are four different normal mobile phone numbers; a1 is the value of feature a for number X1, b1 is the value of feature b for number X1, and the values for the other collected numbers and features follow the same pattern as a1 and b1, so they are not described one by one here.

The features in the normal communication data set start with identical initial weight values. The normal communication data set and these initial weight values are input into a preset iteration model, which in the embodiments of the present application is an AdaBoost model. Since iterating the initial weight values with the AdaBoost algorithm is well known to those skilled in the art, the iteration is not described in detail here.

After the normal communication data set and its initial weight values are input into the preset iteration model, when the number of iterations of the initial weight values reaches a preset number of iterations, the iteration model outputs the current weight values, which are taken as the weight values of the features in the normal communication data set.

In one possible design, while iterating the initial weight values of the features in the normal communication data set, the preset iteration model calculates a loss value after each iteration; the smaller the loss value, the more accurate the iterated weight values. When the loss value falls below a preset loss threshold, the preset iteration model outputs the current weight values as the training result of the initial weight values, giving the weight value of each feature in the normal communication data set.
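The source names AdaBoost as the preset iteration model but does not describe how the boosting rounds yield per-feature weights. The sketch below is one possible reading, not the patent's implementation: it runs standard discrete AdaBoost with one-feature decision stumps on labeled data (labels mapped to -1/+1), stops either at a preset number of rounds or when the ensemble's training error rate (used here as the loss value, an assumption) drops below a preset threshold, and then sums each stump's vote weight alpha onto the feature it split on and normalizes the totals to 1. Note that in standard AdaBoost the uniform initial weights are per-sample weights; mapping the result back to per-feature weights is the assumed step. All names, defaults and data are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int, float]  # (feature index, threshold, polarity, alpha)


def stump_predict(row: List[float], j: int, thr: float, polarity: int) -> int:
    """One-feature decision stump: +polarity if row[j] >= thr, else -polarity."""
    return polarity if row[j] >= thr else -polarity


def fit_stump(X: List[List[float]], y: List[int], w: List[float]):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (0, 0.0, 1, float("inf"))
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, polarity) != label)
                if err < best[3]:
                    best = (j, thr, polarity, err)
    return best


def adaboost_feature_weights(X: List[List[float]], y: List[int],
                             max_rounds: int = 50,
                             loss_threshold: float = 0.05) -> List[float]:
    n = len(X)
    w = [1.0 / n] * n                       # identical initial (sample) weights
    stumps: List[Stump] = []
    for _ in range(max_rounds):             # stop 1: preset number of iterations
        j, thr, pol, err = fit_stump(X, y, w)
        if err >= 0.5:                      # weak learner no better than chance
            break
        err = max(err, 1e-10)               # keep the log finite for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        stumps.append((j, thr, pol, alpha))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * label * stump_predict(row, j, thr, pol))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        # Assumed loss value: training error rate of the current ensemble.
        wrong = 0
        for row, label in zip(X, y):
            score = sum(a * stump_predict(row, fj, t, p) for fj, t, p, a in stumps)
            wrong += (1 if score >= 0 else -1) != label
        if wrong / n < loss_threshold:      # stop 2: loss below preset threshold
            break
    # Assumed mapping: sum each stump's alpha onto its feature, normalize to 1.
    per_feature = [0.0] * len(X[0])
    for j, _, _, alpha in stumps:
        per_feature[j] += alpha
    total_alpha = sum(per_feature)
    return [a / total_alpha for a in per_feature] if total_alpha > 0 else per_feature


X = [[1, 10], [2, 12], [8, 11], [9, 13]]  # toy rows: [feature 0, feature 1]
y = [-1, -1, 1, 1]                        # -1 = normal (y_i = 0), +1 = abnormal (y_i = 1)
print(adaboost_feature_weights(X, y))     # → [1.0, 0.0]
```

In the toy data feature 0 alone separates the two classes, so the first stump is perfect, the loss drops to zero, and all of the weight lands on feature 0. A library implementation (for example an AdaBoost classifier exposing feature importances) could replace the hand-written loop.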

The weight values of the features in the abnormal communication data set are obtained in the same way as those of the features in the normal communication data set. The specific process is as follows:

An abnormal communication data set is obtained from the training feature set. It includes features of a plurality of fraudulent numbers over a plurality of time periods, for example the number of calls, the provincial proportion of calls, the number of terminal changes, the number of short messages sent, the number of short messages received and the cities in which the user roams. The abnormal communication data set is shown in Table 2:

Number	A	B	C	D	......
Y1	a1	b1	c1	d1	......
Y2	a2	b2	c2	d2	......
Y3	a3	b3	c3	d3	......
Y4	a4	b4	c4	d4	......
......	......	......	......	......	......

TABLE 2

Table 2 shows the relationship between the features in the abnormal communication data set and the mobile phone numbers. It lists four features A, B, C and D; Y1, Y2, Y3 and Y4 are four different fraudulent mobile phone numbers; a1 is the value of feature A for number Y1, b1 is the value of feature B for number Y1, and the values for the other collected numbers and features follow the same pattern as a1 and b1, so they are not set forth one by one here.

The features in the abnormal communication data set start with identical initial weight values. The abnormal communication data set and these initial weight values are input into a preset iteration model, which in the embodiments of the present application is an AdaBoost model. Since iterating the initial weight values with the AdaBoost algorithm is well known to those skilled in the art, the iteration process is not described in detail here.

After the abnormal communication data set and its initial weight values are input into the preset iteration model, when the number of iterations of the initial weight values reaches a preset number of iterations, the iteration model outputs the current weight values, which are taken as the weight values of the features in the abnormal communication data set.

In one possible design, while iterating the initial weight values of the features in the abnormal communication data set, the preset iteration model calculates a loss value after each iteration; the smaller the loss value, the more accurate the iterated weight values. When the loss value falls below a preset loss threshold, the preset iteration model outputs the current weight values as the training result of the initial weight values, giving the weight value of each feature in the abnormal communication data set.

It should be noted that the features in the normal communication data set and in the abnormal communication data set of the training feature set may be the same or different, which is not limited herein, and the values of the same training feature for different mobile phone numbers may also be the same or different. The training feature set is shown in Table 3:

TABLE 3

Table 3 shows the number-of-calls feature in the training feature set, recording the number of calls of the non-fraudulent mobile phone number X1 and of the fraudulent mobile phone number Y1 in different time periods. The number of calls of each mobile phone number in Table 3 is determined from its actual calls, so different numbers may have the same or different values. The communication data of the other mobile phone numbers and training features in the training feature set follow Table 3 and are not set forth here.
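Feature parameters such as the number of calls in a time period can be derived from raw call records. A minimal sketch, assuming hourly time periods and a list of call start times per number (both assumptions; the source does not specify the period length or the record format):

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def calls_per_period(call_times: Iterable[datetime]) -> Dict[datetime, int]:
    """Count calls per hour; each key is the start of an hourly period (assumed granularity)."""
    counts = Counter(t.replace(minute=0, second=0, microsecond=0) for t in call_times)
    return dict(sorted(counts.items()))


calls = [datetime(2022, 6, 28, 9, 5), datetime(2022, 6, 28, 9, 40),
         datetime(2022, 6, 28, 10, 15)]
per_hour = calls_per_period(calls)
```

Each resulting count is the kind of numerical value that the next paragraph treats as a feature parameter.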

Further, taking the number of calls in table 3 as an example, its feature parameter represents the number of calls of the mobile phone number in a certain time period; similarly, when the training feature is the call duration, its feature parameter represents the call duration of the mobile phone number in a certain time period. Each training feature in the training feature set thus corresponds to a numerical value that serves as its feature parameter. After the preset iteration model iterates the initial weight values of the training feature set, each training feature corresponds to one weight value; each training feature is associated with its weight value to generate the associated training feature set, which is shown in table 4:

Mobile phone number	Number of calls	Number of received short messages	Number of sent short messages	Number of call minutes	......
X1	Communication data 1	Communication data 21	Communication data 31	Communication data 41	......
X2	Communication data 2	Communication data 22	Communication data 32	Communication data 42	......
......	......	......	......	......	......
Weight value	0.12	0.10	0.31	0.22	......

TABLE 4

Table 4 describes the communication data of each mobile phone number for each feature in the associated training feature set, with each training feature corresponding to one weight value. Only two mobile phone numbers and four training features are shown as examples; other mobile phone numbers and training features follow the form of table 4 and are not described here. The weight values of all training features in the training feature set sum to 1. Because the training feature set includes the normal communication data set and the abnormal communication data set, the weight values associated with the features of the normal communication data set and of the abnormal communication data set take the same form as those of the training feature set; see table 4 above, which is not described in detail herein.
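The associated training feature set of table 4 is essentially a mapping from each training feature to its weight value. A minimal sketch, assuming the weights are kept in a plain dictionary; the function names are hypothetical. Because table 4 shows only four of the features (their weights sum to 0.75), the rescaling step that makes a subset sum to 1 on its own is an illustrative assumption, not a step stated in the text.

```python
from typing import Dict, Sequence


def associate_weights(features: Sequence[str], weights: Sequence[float]) -> Dict[str, float]:
    """Pair each training feature with its weight value (the associated training feature set)."""
    if len(features) != len(weights):
        raise ValueError("each training feature needs exactly one weight value")
    return dict(zip(features, weights))


def normalize_weights(associated: Dict[str, float]) -> Dict[str, float]:
    """Rescale the weights so that they sum to 1 (illustrative assumption)."""
    total = sum(associated.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {name: w / total for name, w in associated.items()}


# The four weight values shown in table 4 (the remaining features are elided there).
table4 = associate_weights(
    ["number of calls", "received short messages", "sent short messages", "call minutes"],
    [0.12, 0.10, 0.31, 0.22],
)
normalized = normalize_weights(table4)
```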

Further, each associated training feature in the associated training feature set corresponds to a feature parameter obtained from its weight value. In the embodiment of the present application, the feature parameter of each associated training feature is calculated with a gradient descent algorithm, which is well known to those skilled in the art and is not described in detail here. Each associated training feature is associated with its feature parameter, as shown in table 5:

TABLE 5

Table 5 shows the associated training feature set. It lists two mobile phone numbers and four training features (the number of calls, the number of received short messages, the number of sent short messages, and the number of call minutes), together with the communication data of each mobile phone number for each associated training feature. The communication data of each mobile phone number cover different time periods and can be obtained from the telecom operation platform. Each associated training feature is associated with a feature parameter; the feature parameters of other associated training features follow table 5 and are not described herein.

After the associated training feature set is obtained, it may contain a large amount of invalid data, that is, features that cannot distinguish whether a number to be detected is a non-fraudulent number or a fraudulent number. To avoid this, the associated training features in the set are screened so that only those meeting a preset condition are retained. The screening process is as follows:

The process of screening the normal feature set that meets the preset condition from the associated training feature set is as follows:

Based on the non-fraudulent mobile phone numbers, the associated training features corresponding to those numbers are extracted from the associated training feature set; a first parameter of each associated training feature is determined based on a first preset formula, and a second parameter of each associated training feature is determined based on a second preset formula.

The first preset formula is as follows:

wherein w_k represents the associated training feature.

The second preset formula is as follows:

wherein w_k represents the associated training feature,

represents the stability of the associated training feature w_k,

represents the (n-1)-th associated training feature w_k, and

represents the n-th associated training feature w_k.

To explain the process of obtaining the first parameter more clearly, an example is given below:

TABLE 6

Table 6 above lists one associated training feature in the associated training feature set, namely the number of calls, for each of four time periods; other associated training features in the set follow table 6 and are not described herein.

Based on table 6: the training feature set is collected from the telecom operation platform, whose operating time may be 8 h or 24 h per day, which is not limited herein. To obtain the first parameter of the number of calls, the mean value and the standard deviation of the number of calls are calculated. Taking 8 h as an example, the mean value of the number of calls is (4 + 5/8 + 10/(8 × 7) + 30/(8 × 30)) × 1/4 ≈ 1.23, the standard deviation calculated from this mean is about 1.61, and the first parameter of the number of calls is therefore 1.61/1.23 ≈ 1.31.
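The arithmetic above reads as a coefficient of variation: the hourly call rate is computed for each of the four periods of table 6, and the population standard deviation of those rates is divided by their mean. A minimal sketch, assuming this reading of the first preset formula (which the text does not state explicitly) and an 8 h daily operating time; the function name `first_parameter` is hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def first_parameter(counts: Sequence[float], hours: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, population std, std/mean) of the per-hour call rates."""
    rates = [c / h for c, h in zip(counts, hours)]
    mean = statistics.mean(rates)
    std = statistics.pstdev(rates)  # population standard deviation (divides by n, not n - 1)
    return mean, std, std / mean


OPERATING_HOURS_PER_DAY = 8  # the 8 h example; 24 h is also allowed by the text
# Calls in the last hour, last day, last week and last month, as used in the example.
counts = [4, 5, 10, 30]
hours = [1, OPERATING_HOURS_PER_DAY, OPERATING_HOURS_PER_DAY * 7, OPERATING_HOURS_PER_DAY * 30]
mean, std, cv = first_parameter(counts, hours)
print(f"mean={mean:.2f} std={std:.2f} first parameter={cv:.2f}")  # mean=1.23 std=1.61 first parameter=1.31
```

The population form (`pstdev`) is what reproduces the 1.61 of the example; the sample form (`stdev`) would give a noticeably larger value for only four periods.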

Further, to calculate the second parameter of the number of calls based on table 6, it is necessary to determine the time period of the last hour (in which 4 calls were made), and then the number of calls in the same time period on the previous day, the average number of calls in the same time period over the last week, and the average number of calls in the same time period over the last month.

Suppose the last hour, in which 4 calls were made, is 10:00 to 11:00 on June 8, 2022; the value for the last hour is then 4.

From the call records of the last day, the number of calls from 10:00 to 11:00 on June 7, 2022 is 4; the value for the same time period on the previous day is then 4.

The average number of calls in the same time period over the last week is determined from the call records of the last week, as shown in table 7:

Time period	Number of calls
June 1, 2022, 10:00 to 11:00	5
June 2, 2022, 10:00 to 11:00	0
June 3, 2022, 10:00 to 11:00	6
June 4, 2022, 10:00 to 11:00	2
June 5, 2022, 10:00 to 11:00	3
June 6, 2022, 10:00 to 11:00	5
June 7, 2022, 10:00 to 11:00	3

TABLE 7

Table 7 records the number of calls in the same time slot (10:00 to 11:00) on each day of the last week. Based on table 7, the average number of calls is (5 + 0 + 6 + 2 + 3 + 5 + 3)/7 ≈ 3.43, so the value for the last week is 3.43.
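The same-time-slot average can be sketched as a lookup over hourly call counts keyed by the start of each one-hour slot. This is a minimal sketch; the dictionary layout, the treatment of a missing hour as zero calls, and the name `same_slot_average` are assumptions. The data are the seven rows of table 7.

```python
from datetime import datetime, timedelta
from typing import Dict


def same_slot_average(hourly_calls: Dict[datetime, int], slot_start: datetime, days: int) -> float:
    """Average call count in the same one-hour slot over the `days` days before slot_start."""
    # Hours with no record are counted as zero calls (assumption).
    counts = [hourly_calls.get(slot_start - timedelta(days=d), 0) for d in range(1, days + 1)]
    return sum(counts) / days


# Table 7: calls from 10:00 to 11:00 on June 1 to June 7, 2022.
table7 = {datetime(2022, 6, day, 10): n for day, n in zip(range(1, 8), [5, 0, 6, 2, 3, 5, 3])}
week_average = same_slot_average(table7, datetime(2022, 6, 8, 10), days=7)
print(round(week_average, 2))  # 3.43
```

Called with `days=30` over a month of hourly records, the same function gives the last-month average.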

The average number of calls in the same time period on each day of the last month is determined in the same way as the average for the last week described above, so the process is not repeated here; suppose this average is 4.02.

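The stability between the numbers of calls in adjacent time periods is calculated with the second preset formula. To obtain the second parameter of the number of calls in table 6 above, the stability between the value for the last hour (4) and the value for the previous day (4), the stability between the value for the previous day (4) and the average for the last week (3.43), and the stability between the average for the last week (3.43) and the average for the last month (4.02) are calculated in turn, and the results are combined.

Therefore, the second parameter of the number of calls in table 6 above is 0.287.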

The first parameter and the second parameter of the other associated training features in the associated training feature set are determined in the same way as described above for the number of calls, and the process is not set forth herein.

After the first parameter and the second parameter of each training feature in the associated training feature set have been obtained, each training feature whose first parameter is lower than a first preset threshold and whose second parameter is lower than a second preset threshold is extracted from the associated training feature set corresponding to the non-fraudulent mobile phone numbers and taken as a first associated training feature, and the normal feature set corresponding to the associated training feature set is generated from the first associated training features.

To obtain the abnormal feature set from the associated training feature set, the associated training features corresponding to the fraudulent numbers are obtained first; then the first parameter of each associated training feature is determined based on the first preset formula and the second parameter based on the second preset formula. Each associated training feature whose first parameter is lower than the first preset threshold and whose second parameter is lower than the second preset threshold is taken as a second associated training feature, and the abnormal feature set corresponding to the associated training feature set is generated from the second associated training features.

Since the process of obtaining the abnormal feature set is the same as that of obtaining the normal feature set, reference may be made to the above process of obtaining the normal feature set from the associated training feature set; to avoid repetition, it is not described in detail here.

It should be noted that in the embodiment of the present application, the first preset threshold is 1 and the second preset threshold is 0.3; both may be adjusted according to the actual situation. To improve the accuracy of identifying a number to be detected as a fraudulent number, the first preset threshold and the second preset threshold may be optimized with a particle swarm algorithm in the embodiment of the present application; since such optimization is well known to those skilled in the art, it is not described in detail herein.
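The screening rule can be sketched as a filter over (first parameter, second parameter) pairs using the default thresholds of 1 and 0.3. A minimal sketch; the function name is hypothetical, and apart from the number of calls (1.31, 0.287), taken from the worked example, the feature values below are made up for illustration.

```python
from typing import Dict, Tuple

Params = Tuple[float, float]  # (first parameter, second parameter)


def screen_features(
    params: Dict[str, Params],
    first_threshold: float = 1.0,
    second_threshold: float = 0.3,
) -> Dict[str, Params]:
    """Keep features whose first and second parameters are both below their thresholds."""
    return {
        name: (first, second)
        for name, (first, second) in params.items()
        if first < first_threshold and second < second_threshold
    }


candidates = {
    "number of calls": (1.31, 0.287),  # from the worked example: first parameter too high
    "call minutes": (0.62, 0.15),  # hypothetical values
    "sent short messages": (0.80, 0.45),  # hypothetical values: second parameter too high
}
kept = screen_features(candidates)
print(sorted(kept))  # ['call minutes']
```

The same filter is applied separately to the features of the non-fraudulent numbers (giving the normal feature set) and of the fraudulent numbers (giving the abnormal feature set); tuning the thresholds, for example with a particle swarm search, only changes the two keyword arguments.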

The server analyzes each piece of communication data in the communication data set corresponding to the number to be detected, where the communication data contain the features of the number to be detected. Each piece of communication data is matched with each normal feature in the normal feature set and with each abnormal feature in the abnormal feature set, and the matched normal features and abnormal features are recorded. Because each normal feature in the normal feature set corresponds to one normal feature parameter and each abnormal feature in the abnormal feature set corresponds to one abnormal feature parameter, the server can determine the normal or abnormal feature parameter corresponding to each piece of communication data from the feature it matches, and thus determine the normal feature parameter set and the abnormal feature parameter set corresponding to the communication data set.

For example, the normal feature set is shown in table 8:

Normal feature set	Normal feature 1	Normal feature 2	Normal feature 3	Normal feature 4
Normal feature parameter	0.2	0.2	0.34	0.43

TABLE 8

Table 8 records 4 normal features in the normal feature set and the normal feature parameter corresponding to each; these 4 normal features are only examples, and other normal features and their normal feature parameters follow table 8 and are not set forth herein.

The set of abnormal features is shown in table 9:

Abnormal feature set	Abnormal feature 1	Abnormal feature 2	Abnormal feature 3	Abnormal feature 4
Abnormal feature parameter	0.23	0.21	0.33	0.44

TABLE 9

Table 9 records 4 abnormal features in the abnormal feature set and the abnormal feature parameter corresponding to each; these 4 abnormal features are only examples, and other abnormal features and their abnormal feature parameters follow table 9 and are not set forth herein.

If the communication data in the communication data set match normal feature 1 and normal feature 2 in table 8 and abnormal feature 2 and abnormal feature 3 in table 9, the normal feature parameter set corresponding to the communication data set is {0.2, 0.2} and the abnormal feature parameter set corresponding to the communication data set is {0.21, 0.33}.
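The matching step can be sketched as collecting, for each piece of communication data, the parameter of the feature it matches. A minimal sketch, assuming the matching rule has already labelled each piece of communication data with the name of the feature it matches (the text does not specify the rule); the names are hypothetical and the parameters are those of tables 8 and 9.

```python
from typing import Dict, List, Sequence, Tuple

NORMAL_FEATURES = {"normal feature 1": 0.2, "normal feature 2": 0.2,
                   "normal feature 3": 0.34, "normal feature 4": 0.43}  # table 8
ABNORMAL_FEATURES = {"abnormal feature 1": 0.23, "abnormal feature 2": 0.21,
                     "abnormal feature 3": 0.33, "abnormal feature 4": 0.44}  # table 9


def match_parameters(
    matched: Sequence[str],
    normal: Dict[str, float],
    abnormal: Dict[str, float],
) -> Tuple[List[float], List[float]]:
    """Return the normal and abnormal feature parameter sets, in matching order."""
    normal_params = [normal[f] for f in matched if f in normal]
    abnormal_params = [abnormal[f] for f in matched if f in abnormal]
    return normal_params, abnormal_params


# Communication data matching normal features 1 and 2 and abnormal features 2 and 3.
data = ["normal feature 1", "normal feature 2", "abnormal feature 2", "abnormal feature 3"]
normal_set, abnormal_set = match_parameters(data, NORMAL_FEATURES, ABNORMAL_FEATURES)
print(normal_set, abnormal_set)  # [0.2, 0.2] [0.21, 0.33]
```

Lists are used rather than Python sets because the parameter set {0.2, 0.2} contains a repeated value and because the order is needed later to pair normal and abnormal parameters.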

As described above, the communication data in the communication data set are classified by matching them against the normal feature set and the abnormal feature set, yielding the normal feature parameter set and the abnormal feature parameter set corresponding to the communication data set, which improves the efficiency and accuracy of detecting the communication data set.

Step S3: calculating a fraud probability value corresponding to the number to be detected based on each normal feature parameter in the normal feature parameter set and each abnormal feature parameter in the abnormal feature parameter set.

To calculate the fraud probability value of the number to be detected, the normal feature parameter set and the abnormal feature parameter set corresponding to its communication data set are determined, and the number of normal features in that communication data set is obtained. This number, the normal feature parameters in the normal feature parameter set, and the abnormal feature parameters in the abnormal feature parameter set are substituted into a third preset formula to calculate the fraud probability value corresponding to the number to be detected. The third preset formula is as follows:

$$P = \frac{1}{N}\sum_{i=1}^{N}\frac{w_{vi}}{w_{ti}}$$

wherein P represents the fraud probability value of the number to be detected, N is the number of normal features in the communication data set corresponding to the number to be detected, w_vi represents the normal feature parameter of the i-th normal feature corresponding to the communication data set, and w_ti represents the abnormal feature parameter of the i-th abnormal feature corresponding to the communication data set.

For example, based on tables 8 and 9, if the number of normal features in the communication data set is 2, the normal feature parameter set of the communication data set is {0.2, 0.2}, and the abnormal feature parameter set is {0.21, 0.33}, then the third preset formula gives a fraud probability value of (0.2/0.21 + 0.2/0.33) × (1/2) ≈ 0.78.
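The worked example corresponds to averaging the ratios of paired normal and abnormal feature parameters, P = (1/N)·Σ w_vi / w_ti, followed by the threshold comparison of step S4. A minimal sketch, assuming the i-th normal feature parameter is paired with the i-th abnormal feature parameter as in the example; the function names are hypothetical, and the preset fraud probability value of 0.5 is also hypothetical, since the text does not give one.

```python
from typing import Sequence


def fraud_probability(normal_params: Sequence[float], abnormal_params: Sequence[float]) -> float:
    """P = (1/N) * sum(w_vi / w_ti), with N the number of matched normal features."""
    n = len(normal_params)
    if n == 0 or n != len(abnormal_params):
        raise ValueError("need the same, non-zero number of normal and abnormal parameters")
    return sum(v / t for v, t in zip(normal_params, abnormal_params)) / n


def is_fraudulent(p: float, preset_probability: float) -> bool:
    """The number is fraudulent when P exceeds the preset fraud probability value."""
    return p > preset_probability


p = fraud_probability([0.2, 0.2], [0.21, 0.33])
print(round(p, 2))  # 0.78
print(is_fraudulent(p, preset_probability=0.5))  # True (0.5 is a hypothetical preset value)
```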

In this way, the fraud probability value corresponding to the number to be detected is calculated from both the normal feature parameters and the abnormal feature parameters of its communication data set, which ensures the accuracy of the resulting fraud probability value.

Step S4: determining the number to be detected as a fraudulent number in response to the fraud probability value being greater than a preset fraud probability value.

After the fraud probability value corresponding to the communication data set of the number to be detected is determined, it is compared with the preset fraud probability value; when it is greater than the preset fraud probability value, the number to be detected is identified as a fraudulent number.

Based on the method, the communication data in the communication data set of the number to be detected are matched with the normal features in the normal feature set and the abnormal features in the abnormal feature set, the normal feature parameters of the matched normal features and the abnormal feature parameters of the matched abnormal features are obtained, and the fraud probability value is calculated from them. This ensures the accuracy of the fraud probability value, so that a number to be detected that is a fraudulent number can be identified in real time and intercepted quickly.

Based on the same inventive concept, an embodiment of the present application further provides a fraudulent number identification device configured to implement the functions of the fraudulent number identification method. Referring to fig. 2, the device includes:

an obtaining module 201, configured to obtain a communication data set corresponding to a number to be detected;

a matching module 202, configured to match each piece of communication data in the communication data set with each normal feature in a normal feature set and each abnormal feature in an abnormal feature set, respectively, to obtain a normal feature parameter set and an abnormal feature parameter set corresponding to the communication data set;

a calculating module 203, configured to calculate a fraud probability value corresponding to the number to be detected based on each normal feature parameter in the normal feature parameter set and each abnormal feature parameter in the abnormal feature parameter set; and

a response module 204, configured to determine that the number to be detected is a fraudulent number in response to the fraud probability value being greater than a preset fraud probability value.
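The four modules can be sketched as one class with one method per module, wired in the order of steps S1 to S4. A minimal sketch only: the class and method names, the in-memory dictionary standing in for the telecom operation platform, and the assumption that each piece of communication data is already labelled with the feature it matches are all hypothetical.

```python
from typing import Dict, List, Tuple


class FraudNumberIdentifier:
    """Hypothetical wiring of modules 201 to 204."""

    def __init__(self, records: Dict[str, List[str]], normal: Dict[str, float],
                 abnormal: Dict[str, float], preset_probability: float) -> None:
        self.records = records  # stand-in for the communication data source
        self.normal = normal  # normal feature -> normal feature parameter
        self.abnormal = abnormal  # abnormal feature -> abnormal feature parameter
        self.preset_probability = preset_probability

    def obtain(self, number: str) -> List[str]:
        """Obtaining module 201: fetch the communication data set of the number."""
        return self.records.get(number, [])

    def match(self, data: List[str]) -> Tuple[List[float], List[float]]:
        """Matching module 202: build the normal and abnormal feature parameter sets."""
        return ([self.normal[d] for d in data if d in self.normal],
                [self.abnormal[d] for d in data if d in self.abnormal])

    def calculate(self, normal_params: List[float], abnormal_params: List[float]) -> float:
        """Calculating module 203: average of the paired parameter ratios."""
        if not normal_params or len(normal_params) != len(abnormal_params):
            return 0.0  # assumption: no usable pairs means no fraud signal
        return sum(v / t for v, t in zip(normal_params, abnormal_params)) / len(normal_params)

    def respond(self, number: str) -> bool:
        """Response module 204: flag the number when P exceeds the preset value."""
        p = self.calculate(*self.match(self.obtain(number)))
        return p > self.preset_probability


identifier = FraudNumberIdentifier(
    records={"number-A": ["normal 1", "normal 2", "abnormal 2", "abnormal 3"]},  # hypothetical
    normal={"normal 1": 0.2, "normal 2": 0.2},
    abnormal={"abnormal 2": 0.21, "abnormal 3": 0.33},
    preset_probability=0.5,  # hypothetical preset fraud probability value
)
print(identifier.respond("number-A"))  # True
```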

In a possible design, the matching module 202 is specifically configured to obtain a training feature set, determine a weight value corresponding to each training feature in the training feature set, associate each training feature with each corresponding weight value, obtain an associated training feature set corresponding to the training feature set, determine the normal feature set and the abnormal feature set based on each associated training feature meeting a preset condition in the associated training feature set, and match each communication data in the communication data set with each normal feature in the normal feature set and each abnormal feature in the abnormal feature set.

In a possible design, the matching module 202 is further configured to determine an initial weight value corresponding to each training feature in the training feature set, and input the training feature set and each initial weight value corresponding to the training feature set into a preset iterative model to obtain a weight value corresponding to each training feature in the training feature set.

In a possible design, the matching module 202 is further configured to: in response to the number of iterations of the initial weight values reaching a preset number of iterations, use the current weight values of the training feature set as the weight values of its training features; or determine a loss value of the training feature set and, in response to the loss value being smaller than a preset loss threshold, use the current weight values of the training feature set as the weight values of its training features.

In a possible design, the matching module 202 is further configured to determine a first parameter of each associated training feature based on a first preset formula and a second parameter of each associated training feature based on a second preset formula; extract each first associated training feature and each second associated training feature whose first parameter is lower than a first preset threshold and whose second parameter is lower than a second preset threshold, where a first associated training feature is a feature in the normal communication data set and a second associated training feature is a feature in the abnormal communication data set; generate the normal feature set corresponding to the associated training feature set from the first associated training features; and generate the abnormal feature set corresponding to the associated training feature set from the second associated training features.

In one possible design, the first preset formula is as follows:

In one possible design, the second preset formula is as follows:

represents the stability of the associated training feature w_k,

represents the (n-1)-th associated training feature w_k, and

represents the n-th associated training feature w_k.

In a possible design, the matching module 202 is further configured to: in response to each piece of communication data in the communication data set matching a normal feature in the normal feature set, record the normal feature parameter corresponding to each such normal feature and generate the normal feature parameter set from these normal feature parameters; and in response to each piece of communication data in the communication data set matching an abnormal feature in the abnormal feature set, record the abnormal feature parameter corresponding to each such abnormal feature and generate the abnormal feature parameter set from these abnormal feature parameters.

Based on the same inventive concept, an embodiment of the present application further provides an electronic device that can implement the functions of the foregoing fraudulent number identification device. Referring to fig. 3, the electronic device includes:

at least one processor 301 and a memory 302 connected to the at least one processor 301. The specific connection medium between the processor 301 and the memory 302 is not limited in this embodiment of the application; fig. 3 takes as an example the processor 301 and the memory 302 being connected through a bus 300. The bus 300 is shown by a thick line in fig. 3; the connections between other components are merely illustrative and not limiting. The bus 300 may be divided into an address bus, a data bus, a control bus, and so on; for ease of illustration, only one thick line is shown in fig. 3, which does not mean that there is only one bus or one type of bus. The processor 301 may also be referred to as a controller; the name is not limited.

In the embodiment of the present application, the memory 302 stores instructions executable by the at least one processor 301, and the at least one processor 301 can execute the instructions stored in the memory 302 to perform the fraudulent number identification method discussed above. The processor 301 can implement the functions of the modules in the apparatus shown in fig. 2.

The processor 301 is the control center of the apparatus. It connects the various parts of the entire control device through various interfaces and lines, and performs the functions of the apparatus and processes data by running or executing the instructions stored in the memory 302 and calling the data stored in the memory 302, thereby monitoring the apparatus as a whole.

In one possible design, processor 301 may include one or more processing units, and processor 301 may integrate an application processor that primarily handles operating systems, user interfaces, application programs, and the like, and a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 301. In some embodiments, the processor 301 and the memory 302 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.

The processor 301 may be a general-purpose processor, such as a Central Processing Unit (CPU), a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that implements or performs the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method for identifying a fraudulent number disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.

The memory 302, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 302 may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 302 may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 302 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.

By programming the processor 301, the code corresponding to the fraudulent number identification method described in the foregoing embodiment can be solidified into the chip, so that the chip can execute the steps of the fraudulent number identification method of the embodiment shown in fig. 1 when running. How to program the processor 301 is well known to those skilled in the art and is not described herein.

Based on the same inventive concept, the present application also provides a storage medium storing computer instructions which, when run on a computer, cause the computer to execute the fraudulent number identification method discussed above.

In some possible embodiments, the aspects of the fraudulent number identification method provided by the present application may also be implemented in the form of a program product including program code which, when the program product runs on a device, causes the control device to carry out the steps of the fraudulent number identification method according to the various exemplary embodiments of the present application described above in this specification.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The program product for the fraudulent number identification method provided in the embodiments of the present invention may use a portable compact disc read-only memory (CD-ROM), include program code, and run on a computing device. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.

A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device over any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).

It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided among a plurality of units.

Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken down into multiple steps.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method for identifying a fraudulent number, comprising:

obtaining a communication data set corresponding to a number to be detected;

2. The method of claim 1, wherein matching each communication data in the communication data set with each normal feature in a normal feature set and each abnormal feature in an abnormal feature set, respectively, comprises:

determining a weight value corresponding to each training feature in the training feature set, and associating each training feature with each corresponding weight value to obtain an associated training feature set corresponding to the training feature set;

3. The method of claim 2, wherein determining the weight value for each training feature in the set of training features comprises:

4. The method of claim 3, wherein obtaining the weight value for each training feature in the set of training features comprises:

5. The method of claim 2, wherein determining the normal feature set and the abnormal feature set based on each associated training feature meeting a preset condition in the associated training feature sets comprises:

6. The method of claim 5, wherein the first predetermined formula is as follows:

7. The method of claim 5, wherein the second predetermined formula is as follows:

represents the stability of the associated training feature w_k;

represents the (n-1)th associated training feature w_k;

represents the nth associated training feature w_k;

8. The method of claim 1, wherein matching each communication data in the communication data set with each normal feature in a normal feature set and each abnormal feature in an abnormal feature set to obtain a normal feature parameter set and an abnormal feature parameter set corresponding to the communication data set comprises:

9. An apparatus for identifying a fraudulent number, said apparatus comprising:

10. The apparatus according to claim 9, wherein the matching module is specifically configured to obtain a training feature set, determine a weight value corresponding to each training feature in the training feature set, associate each training feature with the corresponding weight value, obtain an associated training feature set corresponding to the training feature set, determine the normal feature set and the abnormal feature set based on each associated training feature meeting a preset condition in the associated training feature set, and match each communication data in the communication data set with each normal feature in the normal feature set and each abnormal feature in the abnormal feature set.

11. The apparatus of claim 9, wherein the matching module is further configured to determine an initial weight value corresponding to each training feature in the training feature set, and input the training feature set and each initial weight value corresponding to the training feature set into a preset iterative model to obtain a weight value corresponding to each training feature in the training feature set.

12. The apparatus of claim 9, wherein the matching module is further configured to, in response to the number of iterations performed on the initial weight values reaching a preset number of iterations, use each current weight value corresponding to the training feature set as the weight value corresponding to each training feature of the training feature set, or to determine a loss value of the training feature set and, in response to the loss value being smaller than a preset loss threshold, use the current weight values corresponding to the training feature set as the weight values corresponding to the training features of the training feature set.

13. An electronic device, comprising:

a memory for storing a computer program;

a processor for implementing the method steps of any one of claims 1-8 when executing the computer program stored on the memory.

14. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when executed by a processor, carries out the method steps of any one of claims 1 to 8.

15. A computer program product, characterized in that, when the computer program product is run on a computer, it causes the computer to perform the method according to any of claims 1-8.
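Claims 10 to 12 describe the weighting step only in terms of a "preset iterative model", a "preset iteration number", a "preset loss threshold" and a "preset condition". None of these is specified in this section. The Python sketch below is a hypothetical illustration of the two stopping criteria in claim 12 and the association/splitting step in claim 10, not the patented implementation. It assumes logistic regression fitted by batch gradient descent as the iterative model, cross-entropy as the loss, and |weight| >= margin as the preset condition, with positive weights read as abnormal (fraud-associated) features. All function names, feature names and numeric values are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_feature_weights(
    samples: Sequence[Sequence[float]],  # one row of training-feature values per labelled number
    labels: Sequence[int],               # assumed labels: 1 = known fraud number, 0 = known normal number
    initial_weights: Sequence[float],    # one initial weight per training feature
    max_iterations: int = 1000,          # stand-in for the "preset iteration number" (assumed value)
    loss_threshold: float = 0.05,        # stand-in for the "preset loss threshold" (assumed value)
    learning_rate: float = 0.5,
) -> Tuple[List[float], str]:
    """Hypothetical stand-in for the claimed iterative model:
    logistic regression fitted by batch gradient descent."""
    weights = list(initial_weights)
    n = len(samples)
    for _ in range(max_iterations):
        loss = 0.0
        grads = [0.0] * len(weights)
        for x, y in zip(samples, labels):
            p = _sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            for k, xi in enumerate(x):
                grads[k] += (p - y) * xi
        loss /= n
        if loss < loss_threshold:
            # Stopping criterion: loss below threshold, keep the current weights.
            return weights, "loss"
        weights = [w - learning_rate * g / n for w, g in zip(weights, grads)]
    # Stopping criterion: preset number of iterations reached, keep the current weights.
    return weights, "max_iterations"


def split_feature_sets(
    names: Sequence[str], weights: Sequence[float], margin: float = 0.5
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Associate each training feature with its weight, then apply a
    hypothetical preset condition (|weight| >= margin) to build the sets."""
    associated = list(zip(names, weights))
    normal = [(f, w) for f, w in associated if w <= -margin]
    abnormal = [(f, w) for f, w in associated if w >= margin]
    return normal, abnormal


if __name__ == "__main__":
    # Toy, invented training data: two binary features per number.
    names = ["burst_of_short_outgoing_calls", "long_term_mutual_contacts"]
    samples = [[1, 0], [1, 0], [0, 1], [0, 1]]
    labels = [1, 1, 0, 0]
    weights, reason = fit_feature_weights(samples, labels, [0.0, 0.0])
    normal, abnormal = split_feature_sets(names, weights)
    print(reason, weights)
    print("normal:", normal)
    print("abnormal:", abnormal)
```

Both exits return the weights as they stand at that moment, mirroring the "use the current weight value" wording of claim 12. Any real deployment would substitute the model, loss and preset condition that the full specification actually defines.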