CN113902025A

CN113902025A - Fraud call identification method and system

Info

Publication number: CN113902025A
Application number: CN202111228723.3A
Authority: CN
Inventors: 任思颖; 刘晶; 张作凤; 吴钢
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2021-10-21
Filing date: 2021-10-21
Publication date: 2022-01-07

Abstract

The invention provides a fraud call identification method and system intended to solve the problems of isolated data and low identification accuracy for fraud calls in the prior art. The method comprises the following steps: each participant node periodically sends the user ID data and label data in the blacklist it maintains to a manager node; the manager node finds the label data having the same ID, determines whether any of that label data is labeled as a fraud call, and, if so, notifies the relevant participant nodes to update the label corresponding to that ID in their local blacklists to fraud call; and when a participant node finds a suspected fraud number, it first obtains the ID corresponding to the number and then uses that ID to query the corresponding label data in its local blacklist, thereby identifying whether the number is a fraud call. Through data fusion, the technical solution identifies fraud calls accurately and efficiently while also protecting data privacy.

Description

Fraud call identification method and system

Technical Field

The invention relates to the technical field of communications, and in particular to a fraud call identification method and a fraud call identification system.

Background

Telecommunication fraud has attracted growing attention because its methods are diverse, its cost to commit is low, it claims many victims, and it is difficult to trace. It has become a serious social problem that also places heavy public-opinion pressure on operators, and in recent years every operator has been seeking effective telecommunication anti-fraud technology.

At present, fraud calls are generally identified by applying machine learning and modeling to the behavior of numbers, for example by building predictive models from per-number behavior such as call duration, internet usage duration, number of times being hung up on, package service subscriptions, and dialing frequency, and then generating a blacklist and graylist that is submitted to the communication administration for recording and handling. In the traditional scenario, operators cannot share data with one another because of data privacy protection requirements, so each can only build and run its own models, which results in high false-positive and false-negative rates for fraud calls.

Disclosure of Invention

The invention is made to at least partially solve the technical problem that, in the prior art, fraud call identification relies on isolated data that cannot be linked across parties, which limits identification accuracy.

According to one aspect of the invention, a fraud call identification method is provided, the method comprising:

each participant node periodically sends the user identity (ID) data and label data in the blacklist it maintains to a manager node, wherein each blacklist comprises multiple pieces of data, and each piece of data comprises an ID, a telephone number, multiple items of attribute information, and a label indicating whether the entry is a fraud call;

the manager node receives the ID data and label data sent by each participant node, finds the label data having the same ID, and determines whether any of that label data is labeled as a fraud call; if so, it notifies the relevant participant nodes to update the label corresponding to that ID in their local blacklists to fraud call;

when a participant node finds a suspected fraud number, it first obtains the ID corresponding to the number and then uses that ID to query the corresponding label data in its local blacklist, thereby identifying whether the number is a fraud call.

Optionally, before each participant node sends the ID data and label data in the blacklist it maintains to the manager node, the method further comprises:

each participant node periodically aggregates the multiple pieces of data corresponding to the same ID in the blacklist it maintains, obtaining a local blacklist aggregated along the ID dimension.

Optionally, aggregating the multiple pieces of data corresponding to the same ID comprises:

summing each item of attribute information across the pieces of data corresponding to the ID, item by item, and performing an OR operation on the label values in those pieces of data.

Optionally, the method further comprises:

each participant node calculates a hash value h₁ of each ID in the blacklist it maintains and sends its hash values h₁ to the manager node;

the manager node receives the hash values h₁ sent by the participant nodes and finds the label data having the same ID.

Alternatively, the method further comprises:

all participant nodes jointly negotiate a random number r; each participant node then calculates, for each ID in the blacklist it maintains, a hash value h₂ of the ID concatenated with r, and sends its hash values h₂ to the manager node;

the manager node receives the hash values h₂ sent by the participant nodes and finds the label data having the same ID.

Optionally, the method further comprises:

each participant node uses the data in its local blacklist to take part in multi-party horizontal federated learning model training, obtaining an anti-fraud model;

when a participant node finds a suspected fraud number and its local blacklist identifies the number as a non-fraud call, the participant node inputs the attribute information corresponding to the number into the anti-fraud model and determines whether the model outputs fraud call; if so, it updates the corresponding label in its local blacklist to fraud call.

Optionally, before each participant node obtains the anti-fraud model, the method further comprises:

if the manager node determines that none of the label data having the same ID is labeled as a fraud call, the relevant participant nodes negotiate the public and private keys of a homomorphic encryption algorithm, and each encrypts every item of attribute information corresponding to that ID in the blacklist it maintains with the public key and sends the result to the manager node;

the manager node receives the encrypted attribute information corresponding to the ID from the relevant participant nodes, calculates, for each attribute item, the sum of the encrypted values, and sends the sums to the relevant participant nodes;

the relevant participant nodes receive the sums of the encrypted attribute information for the ID, decrypt each sum with the private key, and divide it by the number of relevant participant nodes to obtain the average of each item of attribute information for the ID; each then uses these averages to replace the corresponding original attribute information in its local blacklist.

Optionally, the method further comprises:

each participant node periodically synchronizes the label data labeled as fraud calls in the local blacklist it maintains to the manager node.

Optionally, the participants include operators, and the manager includes a communication administration.

According to another aspect of the invention, a fraud call identification system is provided, the system comprising:

a manager node and a plurality of participant nodes;

each participant node is configured to periodically send the user identity (ID) data and label data in the blacklist it maintains to the manager node, wherein each blacklist comprises multiple pieces of data, and each piece of data comprises an ID, a telephone number, multiple items of attribute information, and a label indicating whether the entry is a fraud call;

the manager node is configured to receive the ID data and label data sent by each participant node, find the label data having the same ID, determine whether any of that label data is labeled as a fraud call, and, if so, notify the relevant participant nodes to update the label corresponding to that ID in their local blacklists to fraud call;

each participant node is further configured to, when a suspected fraud number is found, first obtain the ID corresponding to the number and then use that ID to query the corresponding label data in its local blacklist, thereby identifying whether the number is a fraud call.

The technical solution provided by the invention can have the following beneficial effects:

In the fraud call identification method and system provided by the invention, each participant node sends the ID data in the blacklist it maintains, together with the label data indicating whether each entry is a fraud call, to the manager node. The manager node fuses the data and then notifies the relevant participant nodes to update the label data in their local blacklists, so that each participant node can identify fraud calls accurately and efficiently from its most recently updated blacklist. The data fusion removes the interference caused by one identity document being registered to multiple numbers, and the manager never needs to obtain any participant's user behavior data, which protects data privacy both among the participants and between the participants and the manager.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and constitute a part of this specification; together with the embodiments, they serve to explain the invention and do not limit it.

FIG. 1 is a flowchart of a fraud call identification method according to an embodiment of the invention;

FIG. 2 is a flowchart of another fraud call identification method according to an embodiment of the invention;

FIG. 3 is a schematic structural diagram of a fraud call identification system according to an embodiment of the invention.

Detailed Description

To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples are given by way of illustration and explanation only, not limitation.

At present, telecom anti-fraud labeling is performed by individual operators or internet companies, each analyzing its own user data. Because each company holds only the data of its own users, the volume of data available for analysis is insufficient, and the false-positive and false-negative rates of fraud call identification are high. To address the technical problem that existing telecom anti-fraud techniques rely on isolated data that cannot be linked, which limits identification accuracy, the embodiments of the invention provide a scheme that fuses multi-party data to identify fraud calls while meeting national regulatory requirements for data privacy protection, as described in detail through the following embodiments.

FIG. 1 is a flowchart of a fraud call identification method according to an embodiment of the invention. As shown in FIG. 1, the method includes the following steps S101 to S103.

S101: each participant node periodically sends the ID (identity) data and label data in the blacklist it maintains to the manager node. Each blacklist comprises multiple pieces of data, and each piece of data comprises an ID, a telephone number, multiple items of attribute information, and a label indicating whether the entry is a fraud call. The sending period can be set and adjusted by those skilled in the art according to actual requirements.

In this embodiment, each blacklist entry includes ID (user identity), NUM (telephone number), IA, IB, …, IH (attribute information related to user behavior, such as call duration, internet usage duration, number of times being hung up on, package service subscriptions, and dialing frequency), and DF (a label indicating whether the number is a fraud call). The ID may be an identity card number, a passport number, or any other number that identifies the user; the identity card number is used as the example below, but the invention is not limited thereto.

The blacklist format is shown in Table 1 below:

TABLE 1

For ease of description, this embodiment uses three participants a, b and c as an example. Each participant node maintains a blacklist containing label data, where DF = 1 indicates that the corresponding telephone number is a fraud call and DF = 0 indicates that it is not. Thus the blacklist contains non-fraud numbers as well as fraud numbers.

The blacklist maintained by participant node a has the format shown in Table 2 below:

TABLE 2

ID    | NUM    | IA    | IB    | … | IH    | DF
ID_a1 | NUM_a1 | IA_a1 | IB_a1 | … | IH_a1 | 0
ID_a2 | NUM_a2 | IA_a2 | IB_a2 | … | IH_a2 | 1
…     | …      | …     | …     | … | …     | …
ID_al | NUM_al | IA_al | IB_al | … | IH_al | 1

The blacklist maintained by participant node b has the format shown in Table 3 below:

TABLE 3

The blacklist maintained by participant node c has the format shown in Table 4 below:

TABLE 4

ID    | NUM    | IA    | IB    | … | IH    | DF
ID_c1 | NUM_c1 | IA_c1 | IB_c1 | … | IH_c1 | 0
ID_c2 | NUM_c2 | IA_c2 | IB_c2 | … | IH_c2 | 0
…     | …      | …     | …     | … | …     | …
ID_cn | NUM_cn | IA_cn | IB_cn | … | IH_cn | 1

In addition to the three participants a, b and c that provide blacklist data, the scheme involves a manager g, which cooperates with the participants to execute the protocol and aggregates the result data.

S102: the manager node receives the ID data and label data sent by each participant node, finds the label data having the same ID, and determines whether any of that label data is labeled as a fraud call; if so, it notifies the relevant participant nodes to update the label corresponding to that ID in their local blacklists to fraud call.

Here, the relevant participant nodes are those that sent the same ID to the manager node, provided that at least one of the labels for that ID is labeled as a fraud call.

S103: when a participant node finds a suspected fraud number, it first obtains the ID corresponding to the number and then uses that ID to query the corresponding label data in its local blacklist, thereby identifying whether the number is a fraud call.

The participants may be operators, and the manager may be a communication administration.

Because one identity document can register multiple numbers with one or more operators, a telecom fraudster may use only one of those numbers to make fraud calls during a given period. The other numbers lack typical telecom fraud features and may belong to other operators, so they interfere with fraud number identification. For this reason, in this embodiment each participant node sends the ID data in its blacklist, together with the label data indicating whether each entry is a fraud call, to the manager node; the manager node fuses the data and then notifies the relevant participant nodes to update the label data in their local blacklists. Each participant node can therefore identify fraud calls accurately and efficiently from its most recently updated blacklist, the data fusion removes the interference caused by one identity document holding multiple numbers, and the manager never needs to obtain any participant's user behavior data, which protects data privacy among the participants and between the participants and the manager.

In a specific embodiment, the following step S104 is performed before step S101.

S104: each participant node periodically aggregates the multiple pieces of data corresponding to the same ID in the blacklist it maintains, obtaining a local blacklist aggregated along the ID dimension. The period can be set and adjusted by those skilled in the art according to actual requirements.

In this embodiment, the same ID in a blacklist may correspond to multiple NUM values, that is, multiple pieces of data, so each participant node must preprocess its data internally to obtain a blacklist aggregated along the ID dimension, which simplifies the subsequent judgment and processing by the manager node.

In a specific embodiment, step S104 aggregates the multiple pieces of data corresponding to the same ID as follows.

In this embodiment, the three participant nodes a, b and c each aggregate their own blacklist data along the ID dimension. Specifically, if the same ID in a blacklist corresponds to s pieces of data, the attributes IA, IB, …, IH of the s pieces of data are summed item by item, and the label values of the s pieces of data are OR-ed, yielding a new set of attribute values and a new label value for the ID.
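The aggregation rule in S104 (sum each attribute item, OR the DF labels) can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the `Record` type, the field names, and keeping the list of numbers per ID are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One blacklist row: ID, NUM, attribute items IA..IH, and the DF label."""
    uid: str     # user identity, e.g. identity-card number
    num: str     # telephone number (NUM)
    attrs: list  # IA, IB, ..., IH as numbers
    df: int      # 1 = fraud call, 0 = non-fraud


def aggregate_by_id(blacklist):
    """Collapse rows sharing an ID: attributes summed item by item, DF OR-ed.

    Returns {uid: {"attrs": [...], "df": 0 or 1, "nums": [...]}}. Keeping the
    list of numbers is an assumption so later lookups can map NUM -> ID.
    """
    merged = {}
    for rec in blacklist:
        entry = merged.get(rec.uid)
        if entry is None:
            merged[rec.uid] = {"attrs": list(rec.attrs), "df": rec.df, "nums": [rec.num]}
            continue
        entry["attrs"] = [a + b for a, b in zip(entry["attrs"], rec.attrs)]
        entry["df"] |= rec.df  # OR: one fraud-labeled number marks the whole ID
        entry["nums"].append(rec.num)
    return merged


rows = [
    Record("ID_1", "NUM_1", [120, 3], 0),
    Record("ID_1", "NUM_2", [30, 5], 1),
    Record("ID_2", "NUM_3", [60, 0], 0),
]
agg = aggregate_by_id(rows)
# agg["ID_1"] == {"attrs": [150, 8], "df": 1, "nums": ["NUM_1", "NUM_2"]}
```

The OR on DF is what lets one fraud-labeled number propagate to every other number registered under the same identity document.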

To prevent user IDs from leaking while participant nodes transmit ID data to the manager node, the invention uses a hash algorithm to secure the transmission of ID data, in either of the following two ways.

In an optional embodiment, the following step S105 is performed before step S101.

S105: each participant node calculates a hash value h₁ of each ID in the blacklist it maintains.

Correspondingly, step S101 becomes: each participant node sends its hash values h₁ to the manager node; and step S102 becomes: the manager node receives the hash values h₁ sent by the participant nodes and finds the label data having the same ID (that is, the same hash value).

In this embodiment, participant nodes a, b and c each calculate the hash values h₁ of their IDs and send them to manager node g. After collecting the h₁ values from all three parties, manager node g finds the label data having the same ID, for example when the x-th ID of participant node a (ID_ax), the y-th ID of participant node b (ID_by), and the z-th ID of participant node c (ID_cz) are identical.
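A minimal sketch of S105, assuming SHA-256 as the hash function (the text does not name one) and short hypothetical IDs such as `ID_0042`. It shows that equal IDs produce equal h₁ values, which is what lets manager node g match IDs, and also that a bare hash can be inverted by hashing candidate IDs. The candidate space here is a toy; real identity numbers have a fixed, structured format and could be searched the same way over a larger space.

```python
import hashlib


def h1(uid: str) -> str:
    # Assumption: SHA-256 over the UTF-8 bytes of the ID; the text names no hash function.
    return hashlib.sha256(uid.encode("utf-8")).hexdigest()


def upload_h1(aggregated_tags):
    """Replace each ID by h1(ID), keeping its DF label; only these pairs go to g."""
    return {h1(uid): df for uid, df in aggregated_tags.items()}


def recover_by_enumeration(target, candidate_ids):
    """Why a bare h1 is weak: anyone can hash candidate IDs and compare."""
    for cand in candidate_ids:
        if h1(cand) == target:
            return cand
    return None


# Same ID at two parties -> same h1, so the manager can match without seeing the ID.
a = upload_h1({"ID_0042": 0, "ID_0007": 1})
b = upload_h1({"ID_0042": 1})
shared = a.keys() & b.keys()  # exactly one shared hash: h1("ID_0042")

# An enumerable ID space lets the hash be inverted.
leaked = recover_by_enumeration(h1("ID_0042"), (f"ID_{i:04d}" for i in range(10000)))
```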

However, because identity numbers follow a fixed format and come from a limited value space, a plain hash h₁ can be reversed by hashing candidate IDs and comparing the results. To solve this problem, the invention mixes a random factor into the plaintext before hashing; accordingly, in another optional embodiment, the following step S106 is performed before step S101.

S106: the participant nodes jointly negotiate a random number r, and each participant node then calculates, for each ID in the blacklist it maintains, the hash value h₂ of the ID concatenated with r.
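S106 only says that the participants negotiate r together. The sketch below assumes one simple way to do that (each participant contributes a random share and r is their XOR) and SHA-256 for the hash; both are illustrative assumptions, not the protocol fixed by the text. With a 32-byte r known only to the participants, equal IDs still match, but nobody without r can test candidate IDs against h₂.

```python
import hashlib
import secrets
from functools import reduce


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))


def negotiate_r(num_parties: int, nbytes: int = 32) -> bytes:
    """Hypothetical negotiation: every participant contributes a random share and
    r is the XOR of all shares. Shares are exchanged among participants only, so
    the manager never learns r. A production protocol would add commitments so
    the last party to reveal its share cannot bias r."""
    shares = [secrets.token_bytes(nbytes) for _ in range(num_parties)]
    return reduce(xor_bytes, shares)


def h2(uid: str, r: bytes) -> str:
    # h2 = Hash(ID concatenated with r); SHA-256 is an assumption.
    return hashlib.sha256(uid.encode("utf-8") + r).hexdigest()


r = negotiate_r(3)
same = h2("ID_0042", r) == h2("ID_0042", r)   # True: participants still match
plain = hashlib.sha256(b"ID_0042").hexdigest()
differs = h2("ID_0042", r) != plain           # True: unsalted guesses no longer match
```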

Correspondingly, step S101 becomes: each participant node sends its hash values h₂ to the manager node; and step S102 becomes: the manager node receives the hash values h₂ sent by the participant nodes and finds the label data having the same ID (that is, the same hash value).

In this embodiment, participant nodes a, b and c jointly negotiate a random number r, each calculates the hash values h₂ of its IDs concatenated with r, and sends them to manager node g. After collecting the h₂ values from all three parties, manager node g finds the label data having the same ID. For example, if ID_ax of participant node a, ID_by of participant node b, and ID_cz of participant node c are identical, the manager node evaluates DF_ax || DF_by || DF_cz (logical OR). If the result is 1, at least one of DF_ax, DF_by and DF_cz is 1, so the manager node notifies participant node a to set DF to 1 in the x-th entry of its local blacklist, participant node b to set DF to 1 in the y-th entry of its local blacklist, and participant node c to set DF to 1 in the z-th entry of its local blacklist; participant nodes a, b and c then each update their local blacklists accordingly.
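The manager-side fusion described above (group uploads by hash value, OR the DF labels, and notify every relevant participant of the row to set to 1) might look like this. The upload format, a list of `(hash, DF)` pairs whose position is the row index x, y or z, and the placeholder hashes `hA`, `hB`, `hC` are assumptions for illustration.

```python
from collections import defaultdict


def fuse_tags(uploads):
    """Manager-side S102 over hashed IDs.

    uploads: {participant: [(h2_value, DF), ...]} where the list position is the
    row index (x, y, z in the text). Returns {participant: [rows to set DF = 1]}.
    """
    by_hash = defaultdict(list)
    for party, rows in uploads.items():
        for row, (h, df) in enumerate(rows):
            by_hash[h].append((party, row, df))

    notify = defaultdict(list)
    for entries in by_hash.values():
        if len(entries) < 2:
            continue  # held by one participant only: no cross-party fusion
        fused = 0
        for _, _, df in entries:
            fused |= df  # DF_ax || DF_by || DF_cz
        if fused == 1:
            for party, row, _ in entries:
                notify[party].append(row)  # every relevant participant is told
    return dict(notify)


uploads = {
    "a": [("hA", 0), ("hB", 1)],
    "b": [("hC", 0), ("hA", 0)],
    "c": [("hA", 1)],
}
print(fuse_tags(uploads))  # {'a': [0], 'b': [1], 'c': [0]}
```

Note that the manager only ever handles hash values and 0/1 labels; it never sees plaintext IDs or attribute data.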

After each participant node has updated its local blacklist in this way, the invention further improves the accuracy of fraud call identification by using the updated multi-party blacklist data to train an anti-fraud model based on federated learning. When training is complete, each participant node obtains a telecom anti-fraud model and can use it to further determine whether a suspected fraud number is a fraud call. The specific scheme is as follows.

In a specific embodiment, the following step S107 is performed before step S103.

S107: each participant node uses the data in its local blacklist to take part in multi-party horizontal federated learning model training, obtaining an anti-fraud model.

Here, horizontal federated learning (also called homogeneous federated learning) refers to federated learning in which the participants share the same feature space but have different sample spaces. Because horizontal federated learning algorithms are themselves prior art, they are not described in detail here.
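The text treats the horizontal federated learning algorithm as prior art and does not name one. As an illustration only, the sketch below uses federated averaging (FedAvg), one common horizontal scheme, over a logistic-regression model in pure Python; the model, learning rate, round counts, and toy data are all assumptions. Each operator trains on its own rows and only model weights are averaged, which is the property S107 relies on.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(w, data, lr=0.5, epochs=5):
    """Participant side: a few epochs of full-batch gradient descent on local rows.
    Each row is (features, label); features[0] is a constant 1.0 bias term."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def fed_avg(party_data, dim, rounds=50):
    """Coordinator side: broadcast w, collect local models, and average them
    weighted by each participant's sample count. Raw rows never leave a party."""
    w = [0.0] * dim
    total = sum(len(d) for d in party_data)
    for _ in range(rounds):
        local_models = [local_update(w, d) for d in party_data]
        w = [
            sum(len(d) * m[j] for d, m in zip(party_data, local_models)) / total
            for j in range(dim)
        ]
    return w


def predict(w, x) -> int:
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0


# Same feature space (bias, one scaled attribute), different users per operator.
parties = [
    [([1.0, 0.10], 0), ([1.0, 0.90], 1)],  # operator a
    [([1.0, 0.20], 0), ([1.0, 0.80], 1)],  # operator b
    [([1.0, 0.15], 0), ([1.0, 0.95], 1)],  # operator c
]
w = fed_avg(parties, dim=2)
```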

Correspondingly, in step S103, when a participant node finds a suspected fraud number and its local blacklist identifies the number as a non-fraud call, the participant node inputs the attribute information corresponding to the number into the anti-fraud model and determines whether the model outputs fraud call; if so, it updates the corresponding label in its local blacklist to fraud call.

In this embodiment, when a participant node finds a suspected fraud number, it first queries the user ID corresponding to the number, ID_w, and then looks up the DF of ID_w in its local blacklist. If DF = 1, the number is judged to be a fraud call and the local blacklist is updated; if DF = 0, the attribute information corresponding to the number is input into the anti-fraud model, and if the model outputs fraud call, the local blacklist is updated.
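The decision logic of S103 with the model fallback can be sketched as below. The `num_to_id` lookup, the blacklist layout, and the threshold function standing in for the federated anti-fraud model are hypothetical; returning `None` for a number with no blacklist entry is an assumption, since the text does not cover that case.

```python
def identify(number, num_to_id, blacklist, model):
    """S103 with model fallback. Returns True (fraud), False (non-fraud), or
    None when the number has no ID or no blacklist entry (not covered by the text).

    num_to_id: phone number -> user ID (hypothetical subscriber lookup)
    blacklist: ID -> {"attrs": [...], "df": 0 or 1} (ID-aggregated local blacklist)
    model: callable(attrs) -> 0 or 1 (stand-in for the federated anti-fraud model)
    """
    uid = num_to_id.get(number)
    entry = blacklist.get(uid) if uid is not None else None
    if entry is None:
        return None
    if entry["df"] == 1:
        return True  # already labeled, e.g. via cross-party fusion in S102
    if model(entry["attrs"]) == 1:
        entry["df"] = 1  # write the model's verdict back to the local blacklist
        return True
    return False


def threshold_model(attrs):
    """Toy stand-in for the trained model: flags a high first attribute."""
    return 1 if attrs[0] > 0.5 else 0


blacklist = {
    "ID_w": {"attrs": [0.9], "df": 0},
    "ID_v": {"attrs": [0.1], "df": 1},
    "ID_u": {"attrs": [0.2], "df": 0},
}
num_to_id = {"NUM_1": "ID_w", "NUM_2": "ID_v", "NUM_3": "ID_u"}
verdict = identify("NUM_1", num_to_id, blacklist, threshold_model)  # True; ID_w's DF is now 1
```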

To improve the accuracy of the anti-fraud model's output, the blacklist data maintained by each participant node can first be generalized, and the generalized blacklist data of all participant nodes can then be used for multi-party horizontal federated learning model training, yielding a more accurate anti-fraud model.

After the manager node finds the label data having the same ID from the ID data and label data sent by the participant nodes, if none of that label data is labeled as a fraud call, for example DF_ax || DF_by || DF_cz = 0, meaning that DF_ax, DF_by and DF_cz are all 0, the relevant participant nodes do not need to update their local blacklists, and the following multi-party data generalization scheme is executed directly.

In a specific embodiment, the following steps S108 to S110 are performed before step S107.

S108: if the manager node determines that none of the label data having the same ID is labeled as a fraud call, the relevant participant nodes negotiate the public and private keys of a homomorphic encryption algorithm, and each encrypts every item of attribute information corresponding to the ID in the blacklist it maintains with the public key and sends the result to the manager node;

S109: the manager node receives the encrypted attribute information corresponding to the ID from the relevant participant nodes, calculates, for each attribute item, the sum of the encrypted values, and sends the sums to the relevant participant nodes;

S110: the relevant participant nodes receive the sums of the encrypted attribute information for the ID, decrypt each sum with the private key and divide it by the number of relevant participant nodes to obtain the average of each item of attribute information for the ID, and each uses these averages to replace the corresponding original attribute information in its local blacklist.

Specifically, participant nodes a, b and c negotiate the public and private keys of a homomorphic encryption algorithm E. Participant node a encrypts the attribute information IA_ax, IB_ax, ..., IH_ax in its blacklist with the public key of E, obtaining E(IA_ax), E(IB_ax), ..., E(IH_ax), and sends them to manager node g; participant node b encrypts IA_by, IB_by, ..., IH_by in the same way, obtaining E(IA_by), E(IB_by), ..., E(IH_by), and sends them to manager node g; and participant node c encrypts IA_cz, IB_cz, ..., IH_cz, obtaining E(IA_cz), E(IB_cz), ..., E(IH_cz), and sends them to manager node g. The manager node computes E(IA_ax)+E(IA_by)+E(IA_cz), E(IB_ax)+E(IB_by)+E(IB_cz), …, E(IH_ax)+E(IH_by)+E(IH_cz), where + on ciphertexts denotes the homomorphic addition of E, and sends the results to participant nodes a, b and c. Because E is an additively homomorphic encryption algorithm, each of participant nodes a, b and c can decrypt these sums with the private key (decryption written D below) and calculate the average of each item of attribute information:

$$\overline{IA} = \frac{D\big(E(IA_{ax}) + E(IA_{by}) + E(IA_{cz})\big)}{3} = \frac{IA_{ax} + IA_{by} + IA_{cz}}{3}, \quad \ldots, \quad \overline{IH} = \frac{D\big(E(IH_{ax}) + E(IH_{by}) + E(IH_{cz})\big)}{3} = \frac{IH_{ax} + IH_{by} + IH_{cz}}{3}$$

Each participant node then uses these averages to replace the corresponding original attribute information in its local blacklist.
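To make S108 to S110 concrete, here is a toy sketch using the Paillier cryptosystem, one well-known additively homomorphic scheme; the text does not say which algorithm E is, so Paillier is an assumption. The primes are far too small to be secure and only keep the example fast. The manager only multiplies ciphertexts (which adds the underlying plaintexts), while the participants, who share the negotiated private key, decrypt and divide by the number of participants.

```python
import math
import secrets

# Toy key material: these primes are far too small for real use.
P, Q = 104729, 1299709


def keygen(p=P, q=Q):
    """Paillier with g = n + 1. Public key: n. Private key: (lambda, mu, n)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because L(g^lambda mod n^2) = lambda mod n for g = n + 1
    return n, (lam, mu, n)


def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def add(n, c1, c2):
    """The '+' on ciphertexts in the text: Paillier adds plaintexts by multiplying ciphertexts."""
    return c1 * c2 % (n * n)


def decrypt(priv, c):
    lam, mu, n = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


def secure_average(vectors, pub, priv):
    """vectors: the IA..IH values each relevant participant holds for one ID."""
    k = len(vectors)
    # Participants: encrypt item by item with the shared public key.
    cts = [[encrypt(pub, v) for v in vec] for vec in vectors]
    # Manager g: combine ciphertexts item by item; it holds no private key.
    sums = cts[0]
    for vec in cts[1:]:
        sums = [add(pub, s, c) for s, c in zip(sums, vec)]
    # Participants: decrypt each sum and divide by the number of participants.
    return [decrypt(priv, s) / k for s in sums]


pub, priv = keygen()
avg = secure_average([[120, 30], [60, 0], [90, 15]], pub, priv)  # [90.0, 15.0]
```

Attribute values are assumed to be non-negative integers smaller than n; real-valued attributes would first need a fixed-point encoding.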

In this embodiment, data linkage must take data privacy protection into account. First, privacy must be protected among the parties that hold user data: plaintext data must not leave any party's network during collaborative analysis. Second, privacy must be protected between the participants and the manager: the administration must not learn the service data of normal users. Therefore, each participant node encrypts the user-behavior attribute information with a homomorphic encryption algorithm before sending it to the manager node, and the manager node sums the ciphertexts directly and returns the results to the participant nodes without decrypting them. This protects data privacy among the participants and between the participants and the manager, and ensures that the administration does not learn users' normal service data.

In a specific embodiment, the method further includes the following step S111.

S111: each participant node periodically synchronizes the label data labeled as fraud calls in the local blacklist it maintains to the manager node.

In this embodiment, the telecom fraud blacklist and graylist data analyzed by each participant node (that is, the label data corresponding to each ID) should be submitted to the administration so that it can integrate the data and use it subsequently.

It should be noted that the order of the above steps is only one specific example given to illustrate the embodiments of the invention; the invention does not limit the order of the steps, and those skilled in the art can adjust it as required in practice. The step numbers do not limit the execution order.

FIG. 2 is a flowchart of another fraud call identification method according to an embodiment of the invention. As shown in FIG. 2, the method includes the following steps S201 to S213.

S201, each operator node regularly assembles a plurality of pieces of data corresponding to the same ID in the blacklist maintained by each operator node, and generates a local blacklist assembled in ID dimension;

s202, the operator nodes negotiate a random number r together, Hash values h of all ID data cascade random numbers r in a blacklist maintained by each operator node are calculated to be Hash (ID | r), and then the Hash values h corresponding to each operator node are sent to the management department node regularly;

s203, collecting the hash values h transmitted from each operator node by the management department node, and finding out the label data with the same ID;

s204, the management department node judges whether a label marked as a fraud phone exists in the label data with the same ID, if so, the step S205 is executed; if not, go to step S206;

s205, the management department node informs the relevant operator node to update the label corresponding to the ID in the local blacklist of the relevant operator node into a fraud phone, and the relevant operator node updates the local blacklist data of the relevant operator node after receiving the notice of the management department node;

s206, the nodes of the relevant operators negotiate public and private keys of a homomorphic encryption algorithm, encrypt each item of attribute information corresponding to the ID in the blacklist maintained by each node respectively by using the public keys and send the attribute information to the nodes of the management department;

s207, the management department node receives various encrypted attribute information corresponding to the ID sent by the relevant operator node, calculates the sum of the same encrypted attribute information corresponding to the ID respectively, and sends the sum to the relevant operator node respectively;

S208, each relevant operator node receives the encrypted sum of each attribute item corresponding to the ID, decrypts it with the private key, and divides the result by the number of relevant operator nodes to obtain the average value of each attribute item for the ID; it then replaces the original attribute values for that ID in its local blacklist with these averages;

S209, each operator node performs multi-party horizontal federated learning model training on its local blacklist data, namely the data averaged in step S208 and the data updated in step S205, to obtain an anti-fraud model;

S210, when a suspected fraud number is found, the operator node first obtains the ID corresponding to the number and then uses that ID to query the corresponding tag data in its local blacklist to determine whether the number is a fraud call; if the number is identified as a non-fraud call, each item of attribute information corresponding to the number is input into the anti-fraud model obtained in step S209;

S211, the operator node determines whether the output of the anti-fraud model indicates a fraud call; if so, step S212 is executed; if not, the process returns to step S210;

S212, the corresponding tag in the local blacklist is updated to fraud call;

S213, each operator node periodically synchronizes the tag data marked as fraud calls in its local blacklist with the management department node, and the management department node updates its own blacklist data after receiving the synchronized data.
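The identification branch of steps S210 to S212 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the in-memory dictionaries, the `identify_number` function and the `model` callable (a stand-in for the anti-fraud model of step S209) are all hypothetical, and the sketch assumes that step S212 only relabels an ID that already has an entry in the local blacklist.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory layout of a local blacklist: ID -> {"attrs": [...], "fraud": bool}
LocalBlacklist = Dict[str, dict]


def identify_number(
    number: str,
    number_to_id: Dict[str, str],
    blacklist: LocalBlacklist,
    attrs: List[float],
    model: Callable[[List[float]], bool],
) -> bool:
    """Return True if `number` is judged to be a fraud call (sketch of S210 to S212)."""
    user_id: Optional[str] = number_to_id.get(number)  # S210: look up the ID behind the number
    entry = blacklist.get(user_id) if user_id is not None else None
    if entry is not None and entry["fraud"]:
        return True  # S210: the local blacklist already marks this ID as fraud
    if model(attrs):  # S210/S211: not flagged locally, so consult the anti-fraud model
        if entry is not None:
            entry["fraud"] = True  # S212: update the local tag (assumption: existing entries only)
        return True
    return False  # S211 "no": go back to waiting for the next suspected number
```

Keying the lookup on the ID rather than on the phone number is what lets a fraud tag earned by one number also cover the other numbers registered to the same identity.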

The fraud call identification method provided by this embodiment pools multi-party data for anti-fraud analysis while protecting the privacy of user behavior data, eliminates the interference caused by one identity document being used to register multiple phone numbers, and performs multi-party telecommunication fraud identification keyed on identity information, which yields higher accuracy.

Fig. 3 is a schematic structural diagram of a fraud call identification system according to an embodiment of the present invention. As shown in Fig. 3, the system includes a manager node 31 and several participant nodes 32.

Each participant node 32 is configured to periodically send the ID (user identity) data and tag data in the blacklist it maintains to the manager node 31. Each blacklist includes multiple records, and each record includes an ID, a phone number, several items of attribute information, and a tag indicating whether the number is a fraud call;

the manager node 31 is configured to receive the ID data and tag data sent by each participant node 32, find the tag data having the same ID, and determine whether any of that tag data is marked as a fraud call; if so, it notifies the relevant participant nodes 32 to update the tag corresponding to the ID in their local blacklists to fraud call;

each participant node 32 is further configured to, upon finding a suspected fraud number, first obtain the ID corresponding to the number and then use that ID to query the corresponding tag data in its local blacklist to identify whether the number is a fraud call.
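The manager-side matching can be sketched as below, assuming the uploads arrive as plain `{ID: is_fraud}` dictionaries keyed by participant name (a hypothetical format; the text does not specify one). The hypothetical `find_relabel_notices` function returns, for each participant, the IDs whose local tag should be changed to fraud call. The same logic applies unchanged when the dictionary keys are hashed IDs instead of raw IDs.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_relabel_notices(uploads: Dict[str, Dict[str, bool]]) -> Dict[str, Set[str]]:
    """uploads: participant -> {ID: is_fraud}; returns participant -> IDs to relabel as fraud."""
    holders: Dict[str, List[str]] = defaultdict(list)  # ID -> participants holding that ID
    flagged: Set[str] = set()  # IDs that at least one participant marks as fraud
    for party, tags in uploads.items():
        for uid, is_fraud in tags.items():
            holders[uid].append(party)
            if is_fraud:
                flagged.add(uid)
    notices: Dict[str, Set[str]] = defaultdict(set)
    for uid in flagged:
        for party in holders[uid]:
            if not uploads[party][uid]:  # notify only parties whose tag is not yet "fraud"
                notices[party].add(uid)
    return dict(notices)
```

For example, if participant A marks an ID as fraud while B and C hold the same ID untagged, only B and C receive a notice for that ID.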

In a specific embodiment, before sending the ID data and tag data in its blacklist to the manager node 31, each participant node 32 is further configured to periodically aggregate the multiple records corresponding to the same ID in its blacklist, so as to obtain a local blacklist aggregated along the ID dimension.

In an optional implementation, each participant node 32 aggregates the multiple records corresponding to the same ID as follows:

each participant node 32 sums each item of attribute information across the records for the ID and performs a logical OR on the tag values of those records, so that the aggregated tag is marked as fraud if any of the records is.
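The aggregation rule (sum the attributes, OR the tags) is sketched below, assuming each record is a hypothetical `(ID, phone number, attribute list, is_fraud)` tuple whose numeric attribute lists all have the same length.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (ID, phone number, attribute values, is_fraud tag)
Record = Tuple[str, str, List[float], bool]


def aggregate_by_id(records: List[Record]) -> Dict[str, Tuple[List[float], bool]]:
    """Merge all records sharing an ID: attributes are summed, tags are OR-ed."""
    out: Dict[str, Tuple[List[float], bool]] = {}
    for uid, _number, attrs, is_fraud in records:
        if uid not in out:
            out[uid] = (list(attrs), is_fraud)  # copy so the input is not mutated
            continue
        sums, tag = out[uid]
        if len(sums) != len(attrs):
            raise ValueError(f"attribute length mismatch for ID {uid}")
        out[uid] = ([s + a for s, a in zip(sums, attrs)], tag or is_fraud)
    return out
```

After this step, two numbers registered to one identity count as a single row, which is how the aggregation removes the "one identity, many numbers" distortion before any cross-party matching.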

In one embodiment, each participant node 32 is further configured to calculate a hash value h₁ of each item of ID data in its blacklist and send its hash values h₁ to the manager node 31; the manager node 31 is specifically configured to receive the hash values h₁ sent by each participant node 32 and find the tag data having the same ID.

In another optional implementation, the participant nodes 32 are further configured to jointly negotiate a random number r; each participant node then calculates, for each item of ID data in its blacklist, a hash value h₂ of the ID concatenated with r, and sends its hash values h₂ to the manager node 31; the manager node 31 is specifically configured to receive the hash values h₂ sent by each participant node 32 and find the tag data having the same ID.
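The two hashing variants might look like the sketch below. SHA-256 is an assumption, since the text does not name a hash function, and `make_upload` and `shared_keys` are hypothetical helpers. One plausible motivation for mixing in r: identity numbers come from a small, structured value space, so a manager holding unsalted h₁ values could recover IDs by hashing candidate numbers, whereas a manager that does not know r cannot.

```python
import hashlib
import secrets
from typing import Dict, Set


def h1(user_id: str) -> str:
    # First variant: plain hash of the ID (hash function assumed to be SHA-256)
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def h2(user_id: str, r: bytes) -> str:
    # Second variant: hash of the ID concatenated with the shared random number r
    return hashlib.sha256(user_id.encode("utf-8") + r).hexdigest()


def make_upload(tags: Dict[str, bool], r: bytes) -> Dict[str, bool]:
    # Participant side: replace each ID by its h2 value before sending tags to the manager
    return {h2(uid, r): is_fraud for uid, is_fraud in tags.items()}


def shared_keys(a: Dict[str, bool], b: Dict[str, bool]) -> Set[str]:
    # Manager side: equal hash values identify the same ID without revealing the ID itself
    return set(a) & set(b)


if __name__ == "__main__":
    r = secrets.token_bytes(16)  # negotiated once and kept secret from the manager
    a = make_upload({"ID-A": True, "ID-B": False}, r)
    b = make_upload({"ID-A": False}, r)
    print(len(shared_keys(a, b)))
```

Because r has a fixed length, appending it after the ID encoding introduces no ambiguity between different (ID, r) pairs.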

In one embodiment, each participant node 32 is further configured to perform multi-party horizontal federated learning model training on the data in its local blacklist to obtain an anti-fraud model; when a suspected fraud number is found and the local blacklist identifies it as a non-fraud call, the participant node inputs each item of attribute information corresponding to the number into the anti-fraud model, determines whether the model output indicates a fraud call, and if so, updates the corresponding tag in its local blacklist to fraud call.
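The text does not say which model or aggregation rule the horizontal federated learning uses. The sketch below swaps in federated averaging (FedAvg) over a plain logistic-regression model, written in dependency-free Python; the function names, hyperparameters and toy data are hypothetical. What it does illustrate is the property the scheme relies on: each participant trains on its own blacklist rows, and only model weights are exchanged and averaged.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (attribute vector, label: 1 = fraud, 0 = not fraud)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], x: List[float]) -> float:
    # Probability of fraud; the last weight is the bias term
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def local_train(w: List[float], data: Sequence[Sample], lr: float, epochs: int) -> List[float]:
    # Full-batch gradient descent on the logistic loss, starting from the global weights
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
            grad[-1] += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def fed_avg(parties: Sequence[Sequence[Sample]], n_features: int,
            rounds: int = 20, lr: float = 0.5, epochs: int = 20) -> List[float]:
    # Each round: local training at every participant, then a size-weighted weight average
    w = [0.0] * (n_features + 1)
    total = sum(len(d) for d in parties)
    for _ in range(rounds):
        local = [local_train(w, d, lr, epochs) for d in parties]  # raw rows never leave a party
        w = [sum(len(d) * lw[j] for d, lw in zip(parties, local)) / total
             for j in range(len(w))]
    return w
```

Usage, with two toy participants holding one attribute each: `w = fed_avg([[([0.9], 1), ([0.1], 0)], [([0.8], 1), ([0.2], 0)]], n_features=1)`, after which `predict(w, [1.0]) > 0.5` flags a high attribute value as fraud.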

In one embodiment, if the manager node 31 determines that none of the tag data having the same ID is marked as a fraud call, then, before the anti-fraud model is obtained, the relevant participant nodes 32 are configured to negotiate the public and private keys of a homomorphic encryption algorithm, encrypt each item of attribute information corresponding to the ID in their blacklists with the public key, and send the ciphertexts to the manager node 31; the manager node 31 is further configured to receive the encrypted attribute information corresponding to the ID from the relevant participant nodes 32, compute the sum of the encrypted values for each attribute item, and send the sums to the relevant participant nodes 32; the relevant participant nodes 32 are further configured to receive these encrypted sums, decrypt them with the private key, divide each decrypted sum by the number of relevant participant nodes to obtain the average value of each attribute item for the ID, and replace the original attribute values for that ID in their local blacklists with these averages.
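The encrypted averaging can be illustrated with the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts modulo n² yields an encryption of the sum of the plaintexts. Paillier is a stand-in here, since the text only says "a homomorphic encryption algorithm". The sketch uses toy-sized primes that are insecure, assumes integer-valued attributes smaller than n (such as counts; fractional values would need fixed-point scaling), and assumes a single key pair shared by the relevant participants. The manager only ever handles ciphertexts and the public modulus n.

```python
import math
import secrets
from typing import List, Tuple

# Toy primes for illustration only; real keys use moduli of 2048 bits or more.
P, Q = 104729, 1299709

PrivateKey = Tuple[int, int]  # (lambda, mu)


def keygen(p: int = P, q: int = Q) -> Tuple[int, PrivateKey]:
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with generator g = n + 1, mu is simply lambda^-1 mod n
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor coprime to n
        if math.gcd(r, n) == 1:
            break
    # g^m mod n^2 with g = n + 1 equals (1 + m*n) mod n^2
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n: int, priv: PrivateKey, c: int) -> int:
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n  # L(x) * mu mod n, with L(x) = (x - 1) / n


def manager_sum(n: int, encrypted: List[List[int]]) -> List[int]:
    # Manager side: multiply ciphertexts attribute by attribute (homomorphic addition)
    n2 = n * n
    sums = list(encrypted[0])
    for vec in encrypted[1:]:
        sums = [(s * c) % n2 for s, c in zip(sums, vec)]
    return sums


def average_attributes(n: int, priv: PrivateKey, sums: List[int], k: int) -> List[float]:
    # Participant side: decrypt each sum and divide by the number k of relevant participants
    return [decrypt(n, priv, c) / k for c in sums]
```

For three participants holding attribute vectors `[10, 3, 0]`, `[20, 5, 7]` and `[30, 1, 2]` for the same ID, each encrypts its vector, the manager calls `manager_sum`, and every participant recovers the averages `[20.0, 3.0, 3.0]` without any party revealing its own values.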

In one embodiment, each participant node 32 is further configured to periodically synchronize the tag data marked as fraud calls in its local blacklist to the manager node 31.

In one embodiment, the participants include operators, and the manager includes a communications authority.

The fraud call identification system provided by this embodiment pools multi-party data for anti-fraud analysis while protecting the privacy of user behavior data, eliminates the interference caused by one identity document being used to register multiple phone numbers, and performs multi-party telecommunication fraud identification keyed on identity information, which yields higher accuracy.

In summary, the fraud call identification method and system provided by the invention fuse multi-party data to identify fraud calls. First, a self-designed data reduction protocol integrates the multi-party data while protecting the privacy of user behavior data, eliminating the effect on the analysis results of one identity card being used to register multiple mobile phone numbers. Then, a telecommunication anti-fraud model is built and used for prediction by means of federated learning. This effectively improves the efficiency and accuracy of fraud call identification while meeting the data privacy protection requirements of national regulators.

It will be understood by those of ordinary skill in the art that all or some of the steps of the methods disclosed above, and the functional modules/units in the systems and devices, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A fraud telephone identification method, comprising:

2. The method of claim 1, further comprising, before each of the participant nodes sends the ID data and the tag data in its maintained blacklist to the manager node:

3. The method according to claim 2, wherein aggregating a plurality of pieces of data corresponding to the same ID specifically comprises:

4. The method according to any one of claims 1-3, further comprising:

5. The method according to any one of claims 1-3, further comprising:

6. The method according to any one of claims 1-3, further comprising:

7. The method as recited in claim 6, further comprising, prior to obtaining an anti-fraud model:

8. The method of claim 6, further comprising:

9. The method of claim 1, wherein the participant comprises an operator and the administrator comprises a communications authority.

10. A fraud call identification system, comprising: a manager node and a plurality of participant nodes;