CN114693317A - Telecommunication fraud security federation detection method fusing homogeneous graph and bipartite graph



Publication number
CN114693317A
Authority
CN
China
Prior art keywords
graph
user
data
mobile phone
bipartite
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210397973.8A
Other languages
Chinese (zh)
Inventor
许国良
张林泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202210397973.8A
Publication of CN114693317A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12 Detection or prevention of fraud
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12 Detection or prevention of fraud
    • H04W12/128 Anti-malware arrangements, e.g. protection against SMS fraud or mobile malware

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Finance (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to a secure federated telecommunication fraud detection method fusing a homogeneous graph and a bipartite graph, belonging to the field of big data analysis and mining. The method comprises the following steps. S1: based on user service data of a telecom operator, extract and preprocess the users' voice call data, short message communication data and mobile phone application access data. S2: construct a homogeneous graph of the telecom users' social network and a bipartite graph data set of users' mobile phone application access. S3: construct a homogeneous graph embedding network for the social network homogeneous graph and a bipartite graph embedding network for the user-to-application access bipartite graph; sample the user nodes to obtain neighbor node co-occurrence sequences; train iteratively to obtain the embedded representation of every node; and fuse these embedded representations as the embedded representation of the user. S4: different participants extract local telecom user features according to the characteristics of their local data, a secure federated gradient boosting tree classification model is trained jointly on the local data of the different organizations, and the final fraud number prediction result is output.

Description

Telecommunication fraud security federation detection method fusing homogeneous graph and bipartite graph
Technical Field
The invention belongs to the field of big data analysis and mining, and relates to a secure federated telecommunication fraud detection method fusing a homogeneous graph and a bipartite graph.
Background
With the development of mobile communication and the spread of network applications, telecommunication network fraud is intensifying worldwide and is becoming more technically sophisticated. With the rapid development of internet technology, telecommunication fraud has become a persistent social problem in countries around the world. At present, telecommunication network fraud is still carried out mainly through telephone contact, but it increasingly shows new characteristics such as the use of intelligent tools, industrialization and homogenization, and its targeting is shifting from broad, indiscriminate fraud to precisely targeted fraud. Fraud channels are spreading from telephone, short message and email to social networking sites and mobile phone applications; fraud methods are continually renewed; technical countermeasures keep escalating; fraud scripts closely track social hot topics and personal private information; and fraud is shifting from domestic to cross-border.
Currently, fraud number detection in the industry relies mainly on two kinds of schemes: rule-based expert systems and machine learning-based model systems. A rule-based expert system requires anti-fraud experts to manually analyze a large amount of normal and abnormal telecommunication data, accurately identify the behavior patterns of fraudsters, find the important features that effectively distinguish fraud from non-fraud, and write expert rules to detect fraudulent behavior. Rule-based expert systems therefore depend strongly on the expertise and business knowledge of anti-fraud experts, and huge losses can result if the experts fail to recognize increasingly complex fraud patterns in time.
With the continuous growth of data scale and computing power, model systems based on machine learning have appeared. Machine learning-based approaches typically perform feature analysis on historical transaction data, train and evaluate models on the resulting feature data sets using machine learning classification algorithms, and then apply the models to fraud number detection. Both rule-based expert systems and machine learning-based model systems discover, from historical data, the individual behavior patterns that recur when fraud occurs. As telecommunication fraud becomes more specialized, fraudsters can evade detection by changing their techniques, but they find it difficult to change all of their association relationships. When the association network covers a large enough range, fraudsters leave traces however careful they are. Therefore, how to mine effective features from large-scale data to improve model-based fraud detection is a new direction currently explored by researchers.
Today, as data security receives more and more attention, directly using telecommunication big data is often very difficult. Data is hard to integrate across operators and related enterprises, and even across business departments within the same organization, so joint training on telecommunication user feature data extracted by different departments is also a current research focus.
Disclosure of Invention
In view of the above, in order to make full use of the communication service data of each operator and the fraud number label data of the public security department to identify fraud numbers, the invention provides a fraud number feature extraction and classification method based on graph embedding learning over a voice and short message social graph and a mobile phone application access bipartite graph.
To achieve this purpose, the invention provides the following technical scheme:
A secure federated telecommunication fraud detection method fusing a homogeneous graph and a bipartite graph, comprising the following steps:
S1: Based on user service data of a telecom operator, extract the users' voice call data, short message communication data and mobile phone application access data, and preprocess them;
S2: Use the preprocessed data to construct a homogeneous graph of the telecom users' social network and a bipartite graph data set of users' mobile phone application access. The graph data set comprises three types of weighted graphs: a voice social network homogeneous graph, a short message social network homogeneous graph, and a mobile phone application access bipartite graph. Edge weights are set by extracting statistical features and aggregating them with weights according to the characteristics of each service;
S3: Construct a homogeneous graph embedding network for the social network homogeneous graphs and a bipartite graph embedding network for the user-to-application access bipartite graph; sample the user nodes by graph embedding learning to obtain neighbor node co-occurrence sequences; then obtain an embedded representation of each node by reconstructing the co-occurrence information with the embedding function and training iteratively with negative sampling; fuse the trained embedding features as the embedded representation of the user;
S4: Different participants extract local telecom user features according to the characteristics of their local data, and a secure federated gradient boosting tree classification model is used to train jointly on the local data of the different organizations. Encrypted sample alignment and encrypted model parameter exchange between the organizations are carried out through a trusted third-party server, thereby realizing multi-party joint model training. Training uses a two-stage method: the first stage screens the features, the second stage classifies using the screened features, and the final fraud number prediction result is output.
Further, step S1 specifically includes: constructing a fraud number detection data set from the different service data of users collected from a telecom operator. The data is divided into four types according to the characteristics of the service data: user basic information data, voice call data, short message communication data, and mobile phone application access data. Data cleaning is performed on the collected data, including abnormal value processing, missing value processing and standardization. The extracted telecom users are also labeled according to the available telecommunication fraud report information: fraud users are labeled 1 and non-fraud users are labeled 0.
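The cleaning and labeling in step S1 can be sketched as follows. This is a minimal illustration only: the function names, the mean-fill strategy, and the 3-sigma clipping threshold are assumptions, since the source names the cleaning operations but does not specify how they are performed.

```python
from statistics import mean, pstdev

def clean_column(values, z_clip=3.0):
    """Clean one numeric feature column (hypothetical helper).

    Missing values (None) are filled with the column mean, outliers are
    clipped to mean +/- z_clip standard deviations, and the result is
    standardized to zero mean and unit variance.
    """
    present = [v for v in values if v is not None]
    mu = mean(present)
    filled = [mu if v is None else v for v in values]   # missing value processing
    sigma = pstdev(filled)
    if sigma == 0:
        return [0.0 for _ in filled]
    lo, hi = mu - z_clip * sigma, mu + z_clip * sigma
    clipped = [min(max(v, lo), hi) for v in filled]     # abnormal value processing
    mu_c, sigma_c = mean(clipped), pstdev(clipped)
    if sigma_c == 0:
        return [0.0 for _ in clipped]
    return [(v - mu_c) / sigma_c for v in clipped]      # standardization

def label_users(user_ids, reported_fraud_ids):
    """Label fraud users 1 and non-fraud users 0, as in step S1."""
    reported = set(reported_fraud_ids)
    return {uid: 1 if uid in reported else 0 for uid in user_ids}
```

For example, `label_users(["a", "b", "c"], ["b"])` marks only user `b` as fraudulent.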
Further, in step S2, the process of constructing the voice and short message social network homogeneous graphs and the user mobile phone application access bipartite graph includes the following. For voice and short message data, the telecom user voice social graph $G_1$ is extracted from the caller and callee relations of voice calls, and the short message social graph $G_2$ is constructed from the uplink and downlink sending and receiving relations of short message communication. For user internet log data, the records of users accessing mobile phone applications are summarized and merged to obtain the mobile phone application access bipartite graph $G_3$. All three kinds of graph data are weighted graphs: the edge weights of the voice social graph are evaluated by weighting the communication features between caller and callee, the edge weights of the short message social graph are evaluated by weighting the communication features between sender and receiver, and the edge weights of the application access bipartite graph are evaluated by weighting the internet usage features of the user's access to the application.
Further, in step S2, constructing the telecom user social network homogeneous graphs and the user mobile phone application bipartite graph data set from the preprocessed data specifically includes:
The voice social network graph is $G_1=(U_1,E_1)$ and the short message social network graph is $G_2=(U_2,E_2)$, where $U_i$ is a set of user nodes and $E_i$ is a set of user-to-user communication relations. Each edge $(i,j)\in E$ corresponds to a pair of user nodes $(u_i,u_j)$ and carries a weight $w_{ij}\ge 0$ that represents the interaction between the two users;
For the voice social network graph $G_1$, the directed edge weight $w_{ij}^{(1)}$ between the user pair $(u_i,u_j)$ is obtained by extracting the call feature set between the two parties:
$$F_1=\{f_1^{(1)},f_2^{(1)},\dots,f_n^{(1)}\}$$
The feature set $F_1$ includes, but is not limited to, the number-of-calls feature $f_1^{(1)}$, the total call duration feature $f_2^{(1)}$, the average call duration feature $f_3^{(1)}$, the call time period feature $f_4^{(1)}$, the long-distance call feature $f_5^{(1)}$, the caller on-network time feature $f_6^{(1)}$, and the callee on-network time feature $f_7^{(1)}$. All elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(1)}=\sum_{t=1}^{n}\alpha_t f_t^{(1)}$$
where $\alpha_t$ is the weighting coefficient of the $t$-th feature and $n$ is the total number of extracted voice call features;
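The weighted summation that turns a pair's call features into an edge weight can be sketched as below. The feature values and coefficients are made-up illustrations, and the function names are hypothetical; the source does not state how the coefficients are chosen. Keeping both features and coefficients non-negative is one simple way to respect the requirement that edge weights be non-negative.

```python
def edge_weight(features, coefficients):
    """Weighted sum of one user pair's features (the edge weight)."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    return sum(a * f for a, f in zip(coefficients, features))

def build_weighted_graph(pair_features, coefficients):
    """Map {(caller, callee): [f_1, ..., f_n]} to {(caller, callee): weight}."""
    return {pair: edge_weight(f, coefficients) for pair, f in pair_features.items()}

# Illustrative values only: three normalized features per directed pair.
graph = build_weighted_graph(
    {("u1", "u2"): [1.0, 0.5, 0.0], ("u2", "u1"): [0.0, 0.0, 1.0]},
    coefficients=[0.5, 0.3, 0.2],
)
```

The same helper applies unchanged to the short message graph and the application access bipartite graph, with their own feature sets and coefficients.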
For the short message social network graph $G_2$, the directed edge weight $w_{ij}^{(2)}$ is obtained by extracting the short message communication feature set between $(u_i,u_j)$:
$$F_2=\{f_1^{(2)},f_2^{(2)},\dots,f_m^{(2)}\}$$
The feature set $F_2$ includes, but is not limited to, the sending count feature $f_1^{(2)}$, the total short message byte count feature $f_2^{(2)}$, the average short message byte count feature $f_3^{(2)}$, the sending time period feature $f_4^{(2)}$, the feature $f_5^{(2)}$ indicating whether the short message is a verification code, the sender on-network time feature $f_6^{(2)}$, and the receiver on-network time feature $f_7^{(2)}$. All elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(2)}=\sum_{t=1}^{m}\beta_t f_t^{(2)}$$
where $\beta_t$ is the weighting coefficient of the $t$-th feature and $m$ is the total number of extracted short message communication features;
The mobile phone application access bipartite graph is $G_3=(U_3,V_3,E_3)$, where $U_3$ is the set of user nodes, $V_3$ is the set of mobile phone application nodes, and $E_3\subseteq U_3\times V_3$ is the set of relation edges for users accessing mobile phone applications. Each edge has a non-negative weight $w_{ij}\ge 0$ that represents the internet usage of the user accessing the application. For the application access bipartite graph $G_3$, the directed edge weight $w_{ij}^{(3)}$ between a user and application pair $(u_i,v_j)$ is obtained by extracting the internet access feature set between $(u_i,v_j)$:
$$F_3=\{f_1^{(3)},f_2^{(3)},\dots,f_k^{(3)}\}$$
The feature set $F_3$ includes, but is not limited to, the access count feature $f_1^{(3)}$, the total access duration feature $f_2^{(3)}$, the average access duration feature $f_3^{(3)}$, the total consumed traffic feature $f_4^{(3)}$, the average consumed traffic feature $f_5^{(3)}$, the user on-network time feature $f_6^{(3)}$, and the mobile phone application category feature $f_7^{(3)}$. All elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(3)}=\sum_{t=1}^{k}\gamma_t f_t^{(3)}$$
where $\gamma_t$ is the weighting coefficient of the $t$-th feature and $k$ is the total number of features extracted from the user application access data.
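Summarizing and merging raw access records into per-edge features of the bipartite graph can be sketched as follows. The record layout `(user_id, app_id, duration_s, traffic_bytes)` and the function name are assumptions for illustration; only five of the listed features are computed here, since on-network time and application category come from other tables.

```python
from collections import defaultdict

def build_access_bipartite(records):
    """Aggregate access records into (users, apps, edge feature vectors).

    records: iterable of (user_id, app_id, duration_s, traffic_bytes).
    Each edge (user, app) maps to
    [access count, total duration, average duration, total traffic, average traffic].
    """
    stats = defaultdict(lambda: {"count": 0, "duration": 0.0, "traffic": 0.0})
    for user, app, duration, traffic in records:
        s = stats[(user, app)]
        s["count"] += 1
        s["duration"] += duration
        s["traffic"] += traffic
    edges = {}
    for key, s in stats.items():
        edges[key] = [
            s["count"],
            s["duration"],
            s["duration"] / s["count"],
            s["traffic"],
            s["traffic"] / s["count"],
        ]
    users = {u for u, _ in edges}   # node set U_3
    apps = {a for _, a in edges}    # node set V_3
    return users, apps, edges
```

The resulting feature vectors can then be reduced to a single edge weight by the weighted summation described above.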
Further, step S3 specifically includes the following steps:
S31: For the constructed voice social network homogeneous graph, short message social network homogeneous graph, and mobile phone application access bipartite graph, perform graph embedding training on the user nodes using the graph embedding model corresponding to each graph;
S32: Find the neighbor sequence set of each user node according to the first-order and second-order neighbor similarity between nodes of the homogeneous graph, and find the neighbor sequence set of each user node according to the explicit and implicit relations of the bipartite graph;
S33: Concatenate the node embeddings obtained from first-order similarity training with those obtained from second-order similarity training to obtain the embedding vector of each user node in the homogeneous graph, and jointly optimize the explicit and implicit relations to obtain the user node embedding vectors of the bipartite graph.
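The source does not say which sampler produces the neighbor co-occurrence sequences. One common choice, shown here purely as an assumption, is a weighted random walk followed by a sliding window that emits (center, context) pairs. The adjacency layout and function names are hypothetical, and edge weights are assumed to be strictly positive.

```python
import random

def weighted_walk(adj, start, length, rng):
    """Random walk of at most `length` nodes; steps are drawn in proportion to edge weight.

    adj: {node: {neighbor: positive weight}}.
    """
    walk = [start]
    while len(walk) < length:
        nbrs = adj.get(walk[-1])
        if not nbrs:
            break  # dead end: no outgoing edges
        nodes = list(nbrs)
        weights = [nbrs[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

def cooccurrence_pairs(walk, window):
    """Emit (center, context) pairs for nodes within `window` steps of each other."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

adj = {"a": {"b": 1.0}, "b": {"c": 2.0}, "c": {}}
walk = weighted_walk(adj, "a", 5, random.Random(0))
pairs = cooccurrence_pairs(walk, window=1)
```

These pairs are the positive examples; negative sampling would draw additional nodes that do not co-occur with the center node.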
Further, in step S3, for the homogeneous graph, user nodes are mapped from the graph domain to the embedding domain; that is, given a user node index $i$, the embedding $\mathbf{u}_i$ of node $u_i$ is obtained directly. The mapping function is expressed as:
$$f(u_i)=\mathbf{u}_i=\mathbf{e}_i^{\top}W$$
where $\mathbf{e}_i\in\{0,1\}^N$ is the one-hot encoding of user node $u_i$, with $N=|U|$ the number of user nodes; the $i$-th element $\mathbf{e}_i[i]$ of the vector is 1 and all other elements are 0; $W\in\mathbb{R}^{N\times d}$ is the embedding parameter matrix to be learned, where $d$ is the embedding dimension; and the $i$-th row of $W$ is the embedded representation of node $u_i$;
For the bipartite graph, the original bipartite graph $G_3$ contains two types of node sets. Because the fraud number detection task only needs the features of user nodes, a user-node homogeneous graph $G_U$ is split out of $G_3$ and its features are extracted as the implicit relation. The nodes of the bipartite graph are mapped from the graph domain to the embedding domain, with $\mathbf{u}_i$ and $\mathbf{v}_i$ denoting the embedding vectors of user node $u_i\in U_3$ and mobile phone application node $v_i\in V_3$ respectively;
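The mapping from a node index to its embedding is a one-hot vector multiplied by the parameter matrix, which simply selects one row. A small sketch with a made-up 3 x 2 matrix (zero-based indices, plain Python lists) makes this concrete; the function names are illustrative.

```python
def one_hot(i, n):
    """One-hot encoding of node index i among n nodes."""
    return [1.0 if k == i else 0.0 for k in range(n)]

def embed(i, W):
    """Compute the vector-matrix product of one_hot(i) with W explicitly."""
    e = one_hot(i, len(W))
    d = len(W[0])
    return [sum(e[r] * W[r][c] for r in range(len(W))) for c in range(d)]

# Illustrative embedding matrix: N = 3 nodes, d = 2 dimensions.
W = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]
vector = embed(1, W)  # equals row 1 of W
```

In practice the product is never formed; implementations index the row directly, which is why the embedding is said to be obtained "directly" from the node index.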
The key structural information of user nodes is then extracted in the graph domain. The homogeneous graph network extracts each node's neighborhood information based on the first-order and second-order similarity of the nodes, while the bipartite graph network models and extracts the key structural information of user nodes from the explicit and implicit relations of the graph-domain nodes. The embedded representations in the embedding domain are used to reconstruct the co-occurrence information extracted from the graph domain, for both the homogeneous graph and the bipartite graph. By optimizing an objective function defined on the co-occurrence information and the reconstructed information, the mapping function and all parameters of the reconstructor are learned;
For the homogeneous graph, the objective function to be optimized for first-order similarity is:
[Equation image: first-order similarity objective]
The objective function to be optimized for second-order similarity is:
[Equation image: second-order similarity objective]
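For orientation only, and assuming the homogeneous graph embedding follows the widely used LINE formulation of first-order and second-order proximity (an assumption; the patent's own formulas may differ), the two objectives are commonly written as below. The labels $O_1$ and $O_2$, the sigmoid $\sigma$, and the separate context vectors $\mathbf{u}'_j$ are part of that assumed formulation, not taken from the source.

```latex
% First-order proximity: joint probability of an observed edge (assumed LINE form)
p_1(u_i, u_j) = \sigma\!\left(\mathbf{u}_i^{\top}\mathbf{u}_j\right), \qquad
O_1 = -\sum_{(i,j)\in E} w_{ij}\,\log p_1(u_i, u_j)

% Second-order proximity: probability of context node u_j given u_i (assumed LINE form)
p_2(u_j \mid u_i) =
  \frac{\exp\!\left(\mathbf{u}_j'^{\top}\mathbf{u}_i\right)}
       {\sum_{k=1}^{N}\exp\!\left(\mathbf{u}_k'^{\top}\mathbf{u}_i\right)}, \qquad
O_2 = -\sum_{(i,j)\in E} w_{ij}\,\log p_2(u_j \mid u_i)
```

Both objectives are minimized. In this formulation the softmax denominator in $p_2$ is what negative sampling approximates during training.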
For the bipartite graph $G_3$, the objective function optimized when modeling the explicit relation is $O_3$:
[Equation image: explicit-relation objective $O_3$]
The objective function optimized when modeling the implicit relation is $O_4$:
[Equation image: implicit-relation objective $O_4$]
The mapping function and all parameters of the reconstructor are learned by optimizing the objective function $O_5$, defined on the co-occurrence information and the reconstructed information. The final joint optimization objective for the bipartite graph is:
$$\text{maximize}\quad O_5=-\mu O_3+\eta O_4$$
where $\mu$ and $\eta$ are hyperparameters, to be specified, that weight the different components in the joint optimization.
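The implicit relation for the bipartite graph relies on a user-only graph split out of the user and application graph. The source does not give the projection rule; a common choice, used here as an assumption, links two users who accessed the same application and sums the products of their edge weights over all shared applications. The function name and data layout are illustrative.

```python
from collections import defaultdict
from itertools import combinations

def project_to_users(edges):
    """Project a weighted bipartite graph onto its user side.

    edges: {(user, app): weight}.
    Returns {(u, v): strength} with u < v, where strength is the sum over
    shared apps of weight(u, app) * weight(v, app).
    """
    by_app = defaultdict(dict)
    for (user, app), w in edges.items():
        by_app[app][user] = w
    strength = defaultdict(float)
    for users in by_app.values():
        for u, v in combinations(sorted(users), 2):
            strength[(u, v)] += users[u] * users[v]
    return dict(strength)

user_graph = project_to_users({
    ("u1", "appA"): 2.0,
    ("u2", "appA"): 3.0,
    ("u2", "appB"): 1.0,
    ("u3", "appB"): 4.0,
})
```

Users with no application in common get no edge, so the projected graph captures only second-hand similarity between users.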
Further, step S4 specifically includes the following steps:
s41: splicing the homogeneity graph and the bipartite graph embedding vector to obtain a final node embedding characteristic, and combining the basic user characteristic and label information, wherein information is input into a safe federal gradient elevation tree classification model for primary training;
s42: sorting the features obtained by the training of the first stage according to importance, screening out the features n before ranking, and distributing the features to different participants for optimizing the features;
s43: after different participants carry out feature screening, carrying out two-stage federal gradient elevated tree classification training again, and outputting fraud number prediction results;
s44: and processing the final classification result of the user and outputting a suspicious fraud number list.
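The feature screening between the two training stages can be sketched as follows. The importance scores, feature names, and party names are made up for illustration, and the helper names are hypothetical; in the described method the scores would come from the first-stage federated gradient boosting tree model.

```python
def top_n_features(importance, n):
    """Return the n feature names with the highest importance scores."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

def split_by_party(selected, ownership):
    """Group the selected features by the participant that holds them.

    ownership: {feature_name: party_name}.
    """
    out = {}
    for name in selected:
        out.setdefault(ownership[name], []).append(name)
    return out

# Illustrative first-stage importances and feature ownership.
importance = {"emb_0": 0.4, "age": 0.1, "emb_1": 0.3, "plan": 0.2}
ownership = {"emb_0": "operator_A", "emb_1": "operator_B",
             "age": "operator_A", "plan": "operator_B"}
selected = top_n_features(importance, 2)
per_party = split_by_party(selected, ownership)
```

Each participant then keeps only its own selected columns for second-stage training.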
Further, the two-stage training process of the secure federated gradient boosting tree model comprises encrypted sample alignment and encrypted model training. During training, the central server exchanges the intermediate computation results and model parameters in encrypted form to arrive at the optimal combination of model parameters. Encryption is based on the RSA algorithm and a hash function. During training, local data is computed on only locally; results are encrypted before being sent to the central server, and other participants cannot obtain details of the local data, which keeps the local data secure.
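Sample alignment means finding the users that two participants have in common without exchanging raw identifiers. The sketch below shows only the hashing half of that idea, with a shared salt and made-up identifiers; it is a deliberate simplification and not the RSA-plus-hash protocol the source refers to. A plain salted hash of phone numbers can be reversed by enumeration, which is why real deployments add RSA blinding.

```python
import hashlib

def blind(ids, salt):
    """Map each salted SHA-256 digest back to the local identifier."""
    return {hashlib.sha256((salt + uid).encode("utf-8")).hexdigest(): uid
            for uid in ids}

def align(party_a_ids, party_b_ids, salt):
    """Return each party's identifiers for the shared users, in a common order.

    Only digests are compared; each party recovers its own local IDs
    from the matching digests.
    """
    a = blind(party_a_ids, salt)
    b = blind(party_b_ids, salt)
    common = sorted(set(a) & set(b))
    return [a[h] for h in common], [b[h] for h in common]

# Illustrative identifiers, not real phone numbers.
rows_a, rows_b = align(["138", "139", "150"], ["139", "150", "186"], salt="demo")
```

Because both lists follow the same digest order, row k on one side corresponds to row k on the other, which is what joint training needs.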
The beneficial effects of the invention are as follows. For the fraud user detection task, the method solves the problem of extracting features from the interactions in telecom users' historical call records and internet access data, and combines them with the basic user information features obtained by feature engineering for classification and prediction by a machine learning model. It provides a more diverse data feature extraction method for the traditional fraud number detection task. The method can be fused with and complement other traditional fraud number detection models, and generalizes well in fraud number detection tasks. The data the method requires can be processed in anonymized, encrypted form with the same feature extraction effect, which has practical value for protecting user privacy. The invention can take data from different telecom operators and other related organizations as model input for joint training, and the secure federated machine learning model ensures that participants' data is not leaked to one another. Data security is guaranteed while multi-party data is fully used for telecommunication fraud detection. As the use of private data becomes more and more strictly regulated, the scheme addresses the problems of data isolation and data fragmentation. The invention adopts two-stage training in multi-party joint modeling, which allows feature screening across the parties' data features and can improve the generalization ability of the model to some extent. The two-stage training is a model optimization technique and can be applied to different training models.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic representation of the steps of the process of the present invention;
FIG. 2 is a schematic general flow diagram of the process of the present invention;
FIG. 3 is a schematic diagram of the voice and short message social graph embedding module employed in the present invention;
FIG. 4 is a schematic diagram of the mobile phone application access bipartite graph embedding module employed in the present invention;
FIG. 5 is a schematic diagram of the local machine learning classification module employed in the present invention;
FIG. 6 is a schematic diagram of the secure federated multi-party training model used in the present invention;
FIG. 7 is a schematic diagram of secure federated encrypted training in the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for illustration only and are not intended to limit the invention; they are schematic representations rather than depictions of actual form. To better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
The invention provides a secure federated telecommunication fraud detection method fusing a homogeneous graph and a bipartite graph, as shown in FIG. 1, which specifically comprises the following steps:
A fraud number detection data set is constructed from the different service data of users collected at a telecom operator. First, the data is divided into four types according to the characteristics of the service data: user information data, voice call data, short message communication data, and mobile phone application access data. Data cleaning operations such as abnormal value processing, missing value processing and normalization are performed on the collected raw data, and the extracted telecom users are labeled according to the available telecommunication fraud report information: fraud users are labeled 1 and non-fraud users are labeled 0.
The preprocessed data is used to construct the telecom user social network homogeneous graphs and the user mobile phone application bipartite graph data set, which includes the users' label information for binary classification training and testing of fraud numbers. The specific construction process is as follows:
The voice social network graph is $G_1=(U_1,E_1)$ and the short message social network graph is $G_2=(U_2,E_2)$, where $U_i$ is a set of user nodes and $E_i$ is a set of user-user communication relationships. Each edge $(i,j)\in E$ in the edge set corresponds to a pair of user nodes $(u_i,u_j)$ and carries a weight $w_{ij}\ge 0$ that represents the interaction between the two users.
For the voice social network graph $G_1$, the directed edge weight $w_{ij}^{(1)}$ between the user pair $(u_i,u_j)$ is obtained by extracting the call feature set $F_1=\{f_1^{(1)},f_2^{(1)},\dots,f_n^{(1)}\}$ between the two parties. The feature set $F_1$ includes, but is not limited to, the number-of-calls feature $f_1^{(1)}$, the total call duration feature $f_2^{(1)}$, the average call duration feature $f_3^{(1)}$, the call time period feature $f_4^{(1)}$, the toll call feature $f_5^{(1)}$, the caller time-on-network feature $f_6^{(1)}$, and the called-party time-on-network feature $f_7^{(1)}$. All elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(1)}=\sum_{i=1}^{n}\alpha_i f_i^{(1)}$$
where $\alpha_i$ is the weighting coefficient of the $i$-th feature (the summation index runs over features, not over user nodes) and $n$ is the total number of extracted voice call features.
Similarly, for the short message social network graph $G_2$, the directed edge weight $w_{ij}^{(2)}$ is obtained by extracting the communication feature set $F_2=\{f_1^{(2)},f_2^{(2)},\dots,f_m^{(2)}\}$ between the two parties $(u_i,u_j)$. The feature set $F_2$ includes, but is not limited to, the number-of-messages-sent feature $f_1^{(2)}$, the total short message byte count feature $f_2^{(2)}$, the average short message byte count feature $f_3^{(2)}$, the sending time period feature $f_4^{(2)}$, the feature $f_5^{(2)}$ indicating whether the short message is a verification code, the sender time-on-network feature $f_6^{(2)}$, and the receiver time-on-network feature $f_7^{(2)}$. All elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(2)}=\sum_{i=1}^{m}\beta_i f_i^{(2)}$$
where $\beta_i$ is the weighting coefficient of the $i$-th feature and $m$ is the total number of extracted short message communication features.
The mobile phone application access bipartite graph is $G_3=(U_3,V_3,E_3)$, where $U_3$ is the set of user nodes, $V_3$ is the set of mobile phone application nodes, and $E_3\subseteq U_3\times V_3$ is the set of edges representing users' access to mobile phone applications. Each edge has a non-negative weight $w_{ij}\ge 0$ that represents the user's internet usage when accessing the application.
For the bipartite graph $G_3$, the directed edge weight $w_{ij}^{(3)}$ between a user-application pair $(u_i,v_j)$ is obtained by extracting the internet access feature set $F_3=\{f_1^{(3)},f_2^{(3)},\dots,f_k^{(3)}\}$ of the pair. The feature set $F_3$ includes, but is not limited to, the number-of-accesses feature $f_1^{(3)}$, the total access duration feature $f_2^{(3)}$, the average access duration feature $f_3^{(3)}$, the total consumed traffic feature $f_4^{(3)}$, the average consumed traffic feature $f_5^{(3)}$, the user time-on-network feature $f_6^{(3)}$, and the mobile phone application category feature $f_7^{(3)}$. All elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(3)}=\sum_{i=1}^{k}\gamma_i f_i^{(3)}$$
where $\gamma_i$ is the weighting coefficient of the $i$-th feature and $k$ is the total number of features extracted from the user application access data.
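The weighted-sum edge weights described for the three graphs can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `edge_weights`, the feature values, and the coefficients are hypothetical, and the features are assumed to be already normalized.

```python
def edge_weights(pair_features, coefficients):
    """Return {(src, dst): weight}, where weight = sum_k coefficient_k * feature_k."""
    weights = {}
    for pair, features in pair_features.items():
        if len(features) != len(coefficients):
            raise ValueError("feature and coefficient vectors differ in length")
        weights[pair] = sum(c * f for c, f in zip(coefficients, features))
    return weights


# Hypothetical normalized voice features per directed user pair:
# (number of calls, total call duration, average call duration)
voice_features = {
    ("u1", "u2"): [0.5, 0.2, 0.4],
    ("u2", "u3"): [1.0, 0.0, 0.0],
}
alphas = [0.5, 0.25, 0.25]  # assumed weighting coefficients

w1 = edge_weights(voice_features, alphas)
print(w1)
```

The same function applies to the short message graph and the application access bipartite graph by passing their feature sets and coefficients instead.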
A homogeneous graph embedding network is constructed for the voice and short message social graphs, and a bipartite graph embedding network is constructed for the mobile phone application access bipartite graph. User nodes are sampled in an unsupervised manner to obtain neighbor-node co-occurrence sequences, and the embedded representation of each node is then obtained through iterative training in which the embedding function reconstructs the co-occurrence information using negative sampling.
The node embedding features output by each embedding model are concatenated, and the samples that have label data are selected and divided proportionally, according to their label attributes, into a training set and a test set that serve as the input of the classification model. The optimal model for the binary classification of fraud numbers is obtained through iterative training and testing of the model on the training set and the test set. Finally, the model is used to predict the remaining user data, and the prediction results are output to a suspected fraud number database for the operator's reference.
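The concatenation and label-proportional split described above might look like the sketch below. The helper names, the 25% test ratio, and the toy embeddings are assumptions made for illustration; the text does not specify the split ratio.

```python
import random


def concat_features(*blocks):
    """Concatenate per-user feature lists for users present in every block."""
    common = set(blocks[0]).intersection(*blocks[1:])
    return {u: [x for block in blocks for x in block[u]] for u in sorted(common)}


def stratified_split(labels, test_ratio=0.25, seed=0):
    """Split labeled users into train/test lists, keeping each class in both."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels.values())):
        users = sorted(u for u, y in labels.items() if y == cls)
        rng.shuffle(users)
        n_test = max(1, round(len(users) * test_ratio))
        test.extend(users[:n_test])
        train.extend(users[n_test:])
    return train, test


# Hypothetical embeddings from the voice, short message, and application graphs
x1 = {"u1": [0.1, 0.2], "u2": [0.0, 0.4], "u3": [0.5, 0.5], "u4": [0.9, 0.1], "u5": [0.2, 0.2]}
x2 = {"u1": [0.3], "u2": [0.7], "u3": [0.1], "u4": [0.9], "u5": [0.5]}
x3 = {"u1": [0.6], "u2": [0.2], "u3": [0.8], "u4": [0.4], "u5": [0.0]}
labels = {"u1": 1, "u2": 0, "u3": 0, "u4": 1}  # u5 has no label

features = concat_features(x1, x2, x3)
labeled = {u: features[u] for u in labels}  # keep only labeled samples
train_users, test_users = stratified_split(labels)
```

Users without labels, such as `u5` here, stay out of training and are the ones the trained model later predicts.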
The invention also provides a telecommunication fraud security federation detection device fusing the homogeneous graph and the bipartite graph, as shown in FIG. 2, which specifically comprises:
The original data acquisition module first connects to the operator's data warehouse, periodically extracts user communication data and mobile phone application access data through HiveSQL, and merges and summarizes the data records by time period to obtain three user communication tables, which are stored in the storage module: voice call data, short message communication data, and mobile phone application traffic usage data.
The graph data preprocessing module periodically reads the voice call data table, the short message communication data table, and the mobile phone application traffic usage data table from the storage module, extracts the interaction relationships between users, and between users and mobile phone applications, from each table by merging and summarizing, and stores the three kinds of interaction graph data in the form of adjacency lists.
The graph embedding feature extraction module divides the three processed graph structure data sets into two types and extracts features from each. The first type comprises the telecommunication user social network homogeneous graphs $G_1$ and $G_2$ based on voice and short message data. The second type is the mobile phone application access bipartite graph $G_3$ based on mobile phone application traffic usage.
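The adjacency-list storage mentioned for the graph data preprocessing module could be sketched as below. The record layout `(source, target, weight)` and the choice to sum the weights of repeated pairs are assumptions; the text only says that records are merged and summarized.

```python
from collections import defaultdict


def build_adjacency(edge_records):
    """Build {source: {target: weight}}, summing weights of repeated pairs."""
    adjacency = defaultdict(dict)
    for source, target, weight in edge_records:
        adjacency[source][target] = adjacency[source].get(target, 0.0) + weight
    return dict(adjacency)


# Hypothetical summarized call records: (caller, callee, edge weight)
voice_edges = [("u1", "u2", 0.4), ("u1", "u3", 0.1), ("u2", "u1", 0.3), ("u1", "u2", 0.2)]
g1 = build_adjacency(voice_edges)
print(g1["u1"])
```

The bipartite graph fits the same structure, with users as sources and applications as targets.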
FIG. 3 is a schematic diagram of the homogeneous graph feature embedding network. For the homogeneous graphs $G_1$ and $G_2$, embedded feature extraction comprises the following specific steps:
Step one: the node embedding mapping module maps the user nodes from the graph domain to the embedding domain; that is, given the index $i$ of a user node, the embedding $\mathbf{u}_i$ of node $u_i$ can be obtained directly. The mapping function can be expressed as:
$$f(u_i)=\mathbf{u}_i=e_i^{\top}W$$
where $e_i\in\{0,1\}^N$ is the one-hot encoding of user node $u_i$ (with $N=|U|$ the number of user nodes): the $i$-th element $e_i[i]$ of the vector is 1 and all other elements are 0. $W\in\mathbb{R}^{N\times d}$ is the embedding parameter matrix to be learned, where $d$ is the embedding dimension. The $i$-th row of the matrix $W$ is the embedded representation of node $u_i$.
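A small sketch makes the mapping concrete: multiplying a one-hot vector by the parameter matrix simply selects one row. The matrix values are hypothetical, and indices are 0-based here, whereas the text counts from 1.

```python
def one_hot(index, size):
    """Return a list with 1 at position `index` and 0 elsewhere."""
    return [1 if k == index else 0 for k in range(size)]


def embed(index, W):
    """Compute e_index^T W explicitly as a vector-matrix product."""
    e = one_hot(index, len(W))
    dim = len(W[0])
    return [sum(e[row] * W[row][col] for row in range(len(W))) for col in range(dim)]


# Hypothetical embedding matrix for N = 3 user nodes and d = 2 dimensions
W = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]

u1 = embed(1, W)
print(u1)
```

In practice the product is never formed; an implementation would read row `i` of `W` directly, which is why the mapping is a table lookup.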
Step two: the graph domain co-occurrence information extraction module extracts the key structure information of the user nodes in the graph domain
Figure BDA0003587212450000091
that is, it reconstructs the neighborhood information of each node according to the node's first-order and second-order similarities
Figure BDA0003587212450000092
First-order similarity is the local pairwise similarity between user nodes in the network. Formally, if there is a direct edge between nodes $u_i$ and $u_j$, the weight $w_{ij}$ of that edge is the first-order similarity of the two nodes; if there is no direct edge, the first-order similarity is 0. For an undirected edge between nodes $u_i$ and $u_j$, the joint probability distribution of the similarity is defined as:
Figure BDA0003587212450000093
The empirical distribution among nodes in the embedding domain is defined as follows:
Figure BDA0003587212450000094
where $v_i\in\mathbb{R}^d$ denotes the $d$-dimensional vector representation of node $u_i$ in the embedding domain.
Second-order similarity is the similarity between the neighbor nodes of user nodes in the network. Formally, let $p_u=(w_{u,1},w_{u,2},\dots,w_{u,|V|})$ denote the first-order similarity between node $u$ and all other nodes; the second-order similarity between nodes $u$ and $v$ is then expressed by the similarity of $p_u$ and $p_v$. If nodes $u$ and $v$ have no common neighbor node, their second-order similarity is 0. Second-order similarity can express global features of the graph.
For second-order similarity, two embedding vectors are introduced for each node: one characterizes the node itself, namely the center-node embedding $u^{cen}$; the other characterizes the node when it acts as a context node for other nodes, namely the neighborhood-node embedding $u^{con}$. In the graph domain, for an arbitrary edge $(u_i,u_j)\in E$, the joint distribution is defined as:
Figure BDA0003587212450000101
where $w_{ij}$ is the weight of the edge between nodes $u_i$ and $u_j$, $d_i$ is the number of neighbor nodes of vertex $u_i$, and $N(u_i)$ is the set of neighbor nodes of node $u_i$.
In the embedding domain, the conditional probability between nodes, that is, the probability that $u_j$ appears given that $u_i$ appears, is defined as:
Figure BDA0003587212450000102
where
Figure BDA0003587212450000103
denotes the center-node embedding of node $u_i$,
Figure BDA0003587212450000104
denotes the neighborhood-node embedding of node $u_i$, and $|V|$ denotes the number of neighborhood nodes.
Step three: the embedding domain information reconstruction module uses the embedded representations of the embedding domain to reconstruct the extracted graph domain co-occurrence information
Figure BDA0003587212450000105
The reconstructed information is denoted
Figure BDA0003587212450000106
Step four: the objective function optimization module optimizes the objective function based on the co-occurrence information
Figure BDA0003587212450000107
and the reconstructed information
Figure BDA0003587212450000108
and learns all parameters involved in the mapping function and the reconstructor.
For first-order similarity, the KL divergence is used to measure the difference between the two probability distributions. After the constant term is omitted, the optimization objective function of the first-order similarity is:
Figure BDA0003587212450000109
Second-order similarity likewise uses the KL divergence to measure the difference between the distributions. After the constant term is omitted, the optimization objective function of the second-order similarity is:
Figure BDA00035872124500001010
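The equation images for the two objectives are not reproduced in this text. The sketch below therefore assumes they take the widely used LINE-style form, in which minimizing the KL divergence reduces to a weighted negative log-likelihood: $O_1=-\sum_{(i,j)\in E} w_{ij}\log\sigma(v_i^{\top}v_j)$ for first-order similarity and $O_2=-\sum_{(i,j)\in E} w_{ij}\log p(u_j\mid u_i)$ with a softmax over context embeddings for second-order similarity. The patent's exact formulas may differ, and a full softmax is used here for clarity instead of negative sampling.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def first_order_loss(edges, emb):
    """Assumed O1: -sum of w_ij * log(sigmoid(v_i . v_j)) over edges."""
    total = 0.0
    for i, j, w in edges:
        p = 1.0 / (1.0 + math.exp(-dot(emb[i], emb[j])))
        total -= w * math.log(p)
    return total


def second_order_loss(edges, center, context):
    """Assumed O2: -sum of w_ij * log softmax_j(context_k . center_i) over edges."""
    total = 0.0
    for i, j, w in edges:
        scores = [dot(context[k], center[i]) for k in range(len(context))]
        log_z = math.log(sum(math.exp(s) for s in scores))
        total -= w * (scores[j] - log_z)
    return total


# Hypothetical 3-node graph; edges are (i, j, w_ij)
edges = [(0, 1, 1.0), (1, 2, 2.0)]
zeros = [[0.0, 0.0] for _ in range(3)]

o1 = first_order_loss(edges, zeros)          # every sigmoid equals 0.5
o2 = second_order_loss(edges, zeros, zeros)  # every softmax entry equals 1/3
```

With all-zero embeddings the model is maximally uninformed, so the losses equal the total edge weight times $\log 2$ and $\log 3$ respectively; training moves the embeddings to lower both values.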
FIG. 4 is a schematic diagram of the bipartite graph embedding network architecture. For the bipartite graph $G_3$, embedded feature extraction comprises the following specific steps:
Step one: the bipartite graph reconstruction module. The original bipartite graph $G_3$ contains two types of node sets. Because the fraud number detection task only needs to attend to the characteristics of the user nodes, this module only needs to split out the user-node-based homogeneous graph $G_U$, which is used as the implicit relationship for feature extraction.
Step two: the node embedding mapping module maps each node of the bipartite graph from the graph domain to the embedding domain, with $\mathbf{u}_i$ and $\mathbf{v}_i$ denoting the embedding vectors of user node $u_i\in U_3$ and mobile phone application node $v_i\in V_3$, respectively.
Step three: the graph domain co-occurrence information extraction module extracts the key structure information of the user nodes in the graph domain
Figure BDA0003587212450000111
Step four: the embedding domain information reconstruction module uses the embedded representations of the embedding domain to reconstruct the extracted graph domain co-occurrence information
Figure BDA0003587212450000112
The reconstructed information is denoted
Figure BDA0003587212450000113
For the bipartite graph $G_3$, given a node pair $(u_i,v_j)\in E_3$ with $u_i\in U_3$ and $v_j\in V_3$, the joint probability between the two nodes in the graph domain is:
Figure BDA0003587212450000114
whereas the empirical distribution of the nodes in the embedding domain is:
Figure BDA0003587212450000115
For the explicit relationship, the difference between the graph domain and embedding domain distributions is measured by the KL divergence, so the objective function is:
Figure BDA0003587212450000116
After the constant term is ignored, the final objective function is:
Figure BDA0003587212450000117
For the implicit-relationship homogeneous graph $G_U$ of the bipartite graph, training is optimized using homogeneous graph embedding based on first-order similarity. The joint probability distribution of the user nodes, the empirical distribution of the nodes in the embedding domain, and the objective function to be optimized are, respectively:
Figure BDA0003587212450000118
Figure BDA0003587212450000119
Figure BDA00035872124500001110
Step five: the objective function optimization module optimizes the objective function $O_5$ based on the co-occurrence information
Figure BDA0003587212450000121
and the reconstructed information
Figure BDA0003587212450000122
and learns all parameters involved in the mapping function and the reconstructor. The final joint optimization overall objective function is:
$$\text{maximize}\quad O_5=-\mu O_3+\eta O_4$$
where $O_3$ is the explicit-relationship objective function of the bipartite graph nodes, $O_4$ is the implicit-relationship objective function of the bipartite graph nodes, and $\mu$ and $\eta$ are hyperparameters to be specified that combine the different components in the joint optimization.
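The text does not say how the user-node homogeneous graph $G_U$ is split out of the bipartite graph. One common choice, assumed here purely for illustration, is a co-occurrence projection in which two users are linked when they access the same application, with a weight equal to the sum over shared applications of the product of their edge weights.

```python
from collections import defaultdict
from itertools import combinations


def project_users(bipartite_edges):
    """Project (user, app, weight) edges onto an undirected user-user graph."""
    users_by_app = defaultdict(dict)
    for user, app, weight in bipartite_edges:
        users_by_app[app][user] = weight
    user_graph = defaultdict(float)
    for users in users_by_app.values():
        for a, b in combinations(sorted(users), 2):
            user_graph[(a, b)] += users[a] * users[b]
    return dict(user_graph)


# Hypothetical access edges: (user, application, edge weight)
g3_edges = [
    ("u1", "app_a", 2.0),
    ("u2", "app_a", 3.0),
    ("u2", "app_b", 1.0),
    ("u3", "app_b", 4.0),
    ("u1", "app_c", 5.0),
]

g_u = project_users(g3_edges)
print(g_u)
```

The resulting weighted user graph can then be trained with the first-order homogeneous graph embedding described for the implicit relationship.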
Through iterative optimization of the graph embedding modules, three kinds of embedding vector feature representations $X_1$, $X_2$, $X_3$ of the users are obtained.
FIG. 5 shows the local classification model architecture for fraud user detection employed by the present invention, and FIG. 6 shows the joint training model architecture that performs secure federated learning across the local models of multiple parties. For the local model of each participant, the fraud user detection module first concatenates the user basic information features $X_0$ prepared by the data processing module with the user embedding features $X_1$, $X_2$, $X_3$ to obtain a combined telecommunication user feature table, and then combines it with the telecommunication user label data obtained from actual fraud alarm information to form a fraud user detection data set. Each participant constructs its sample data set in the same way. With a central server acting as coordinator, the participants then perform encrypted sample entity alignment, and the computation results of each participant's local model are encrypted and exchanged. The optimal model parameters are finally obtained through continuous iterative optimization and used to predict fraud numbers, and the users predicted to be fraud numbers are exported to a suspicious user list for further investigation and use. In this module, the local classification models used by each organization include, but are not limited to, logistic regression, decision trees, deep learning networks, and ensemble learning.
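The encrypted sample entity alignment step is described only at a high level. The sketch below assumes a blind-RSA private set intersection, a scheme commonly used for this purpose in federated learning frameworks, simplified to two parties without the coordinating server. The key is a toy built from two known Mersenne primes, and all identifiers and function names are hypothetical; a real deployment would rely on a vetted cryptographic library and properly generated keys. It requires Python 3.8 or later for modular inverses via `pow`.

```python
import hashlib
import math
import secrets

# Toy RSA key from two known Mersenne primes (illustration only, not secure).
P, Q = 2**127 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held only by party B


def h1(identifier):
    """Hash an identifier to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big") % N


def h2(value):
    """Hash a signed value so that raw signatures are never compared directly."""
    return hashlib.sha256(str(value).encode()).hexdigest()


def blind(identifier):
    """Party A: mask H(id) with a random factor r before sending it to B."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return (h1(identifier) * pow(r, E, N)) % N, r


def sign(blinded):
    """Party B: sign the masked value without learning the identifier."""
    return pow(blinded, D, N)


def unblind(signed, r):
    """Party A: remove the mask, leaving H(id)^D mod N."""
    return (signed * pow(r, -1, N)) % N


def align(ids_a, ids_b):
    """Return the identifiers of A that B also holds, in A's order."""
    tokens_b = {h2(pow(h1(x), D, N)) for x in ids_b}  # computed by B, sent to A
    common = []
    for x in ids_a:
        blinded, r = blind(x)                   # A -> B
        token = h2(unblind(sign(blinded), r))   # B -> A, then A unblinds
        if token in tokens_b:
            common.append(x)
    return common


shared = align(["u-001", "u-002", "u-003"], ["u-003", "u-001", "u-009"])
print(shared)
```

Party A learns only which of its own identifiers are shared, and party B sees only masked values, which matches the stated goal that participants cannot obtain each other's local data.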
FIG. 7 is a schematic diagram of secure federated multi-party joint encrypted training. Training proceeds in two stages. The first stage performs feature screening: after the first round of training on each participant's features, feature importance weights are obtained, and the features are sorted by these values to select the top 50. The participants that own the selected features then carry out the second round of joint modeling, and the result of the second round is provided as output to the operator, which owns the labels. The operator extracts a list of fraud numbers from the predictions for reference.
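The feature screening between the two training stages can be illustrated as follows. The importance scores, feature names, and owner mapping are hypothetical; in the described system the scores would come from the first-stage federated model.

```python
def top_k_features(importances, k=50):
    """Return the k feature names with the highest importance (ties by name)."""
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def features_by_party(selected, owner):
    """Group the selected feature names by the participant that owns them."""
    grouped = {}
    for name in selected:
        grouped.setdefault(owner[name], []).append(name)
    return grouped


# Hypothetical first-stage importance weights and feature owners
importances = {"x0_age": 0.05, "x1_dim3": 0.30, "x3_dim1": 0.25, "bank_txn": 0.40}
owner = {"x0_age": "operator", "x1_dim3": "operator", "x3_dim1": "operator", "bank_txn": "bank"}

selected = top_k_features(importances, k=3)
second_stage = features_by_party(selected, owner)
print(second_stage)
```

Only the participants that appear in `second_stage` take part in the second round of joint modeling.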
In a preferred embodiment, when a new type of telecommunication fraud pattern appears, the new fraud samples are classified and labeled, sample data of normal users and of the new type of fraud users are selected and input into the trained model, and the model parameters are iteratively optimized so that the model adapts to detecting the new fraud type.
According to the embodiment of the invention, different types and quantities of data sets are selected in the different processes, so that a telecommunication user fraud detection method based on the voice and short message social graphs and the mobile phone application access bipartite graph can be realized, and fraud users among the telecommunication users can be detected and identified.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (8)

1. A telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph, characterized in that the method comprises the following steps:
S1: based on user service data of a telecom operator, extracting the voice call data, short message communication data, and mobile phone application access data of users, and preprocessing these data;
S2: constructing a telecommunication user social network homogeneous graph data set and a user mobile phone application bipartite graph data set from the preprocessed data, wherein the graph data set comprises three types of weighted graphs, namely a voice social network homogeneous graph, a short message social network homogeneous graph, and a mobile phone application access bipartite graph, and the edge weights are set by statistical feature extraction and weighted aggregation according to the characteristics of the different services;
S3: constructing a homogeneous graph embedding network for the social network homogeneous graphs and a bipartite graph embedding network for the user mobile phone application access bipartite graph; sampling user nodes by graph embedding learning to obtain neighbor-node co-occurrence sequences, and then obtaining the embedded representation of each node through iterative training in which the embedding function reconstructs the co-occurrence information using negative sampling; and fusing the embedding features obtained by training as the embedded representation of the user;
S4: different participants extracting local telecommunication user features according to the characteristics of their local data, and jointly training on the local data of the different organizations using a secure federated gradient boosting tree classification model; performing encrypted data sample alignment and encrypted model parameter exchange on the sample data among the different organizations through a trusted third-party server, thereby realizing multi-party joint model training, wherein a two-stage training method is adopted, the first-stage training being used to screen the features and the second-stage training being used to classify on the screened features, and outputting the final fraud number prediction result.
2. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 1, characterized in that step S1 specifically comprises: constructing a fraud number detection data set from the different service data of users collected at a telecom operator; dividing the data into four types according to the characteristics of the different services: user basic information data, voice call data, short message communication data, and mobile phone application access data; performing data cleaning operations on the collected data, including outlier processing, missing value processing, and standardization; and labeling the extracted telecommunication users according to the available telecommunication fraud report information, wherein fraud users are labeled 1 and non-fraud users are labeled 0.
3. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 1, characterized in that: in step S2, the process of constructing the voice and short message social network homogeneous graphs and the user mobile phone application access bipartite graph comprises: for the voice and short message data, extracting the voice social graph $G_1$ of telecommunication users according to the calling and called relationships of voice communication, and constructing the short message social graph $G_2$ according to the uplink and downlink sending and receiving relationships of short message communication; for the user internet log data, summarizing and merging the data according to the records of users accessing mobile phone applications to obtain the mobile phone application access bipartite graph $G_3$; all three types of graph data are weighted graphs, wherein the edge weight of the voice social graph is obtained by weighted evaluation of the communication relationship features between the calling and called parties, the edge weight of the short message social graph is obtained by weighted evaluation of the communication relationship features of the sending and receiving users, and the edge weight of the mobile phone application access bipartite graph is obtained by weighted evaluation of the internet usage features of the user accessing the mobile phone application.
4. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 3, characterized in that: in step S2, constructing the telecommunication user social network homogeneous graph data set and the user mobile phone application bipartite graph data set from the preprocessed data specifically comprises:
the voice social network graph is $G_1=(U_1,E_1)$ and the short message social network graph is $G_2=(U_2,E_2)$, where $U_i$ is a set of user nodes and $E_i$ is a set of user-user communication relationships; each edge $(i,j)\in E$ in the edge set corresponds to a pair of user nodes $(u_i,u_j)$ and has a weight $w_{ij}\ge 0$ that represents the interaction between the two users;
for the voice social network graph $G_1$, the directed edge weight $w_{ij}^{(1)}$ between the user pair $(u_i,u_j)$ is obtained by extracting the call feature set $F_1=\{f_1^{(1)},f_2^{(1)},\dots,f_n^{(1)}\}$ between the two parties; the feature set $F_1$ includes, but is not limited to, the number-of-calls feature $f_1^{(1)}$, the total call duration feature $f_2^{(1)}$, the average call duration feature $f_3^{(1)}$, the call time period feature $f_4^{(1)}$, the toll call feature $f_5^{(1)}$, the caller time-on-network feature $f_6^{(1)}$, and the called-party time-on-network feature $f_7^{(1)}$; all elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(1)}=\sum_{i=1}^{n}\alpha_i f_i^{(1)}$$
where $\alpha_i$ is the weighting coefficient of the $i$-th feature (the summation index runs over features, not over user nodes) and $n$ is the total number of extracted voice call features;
for the short message social network graph $G_2$, the directed edge weight $w_{ij}^{(2)}$ is obtained by extracting the communication feature set $F_2=\{f_1^{(2)},f_2^{(2)},\dots,f_m^{(2)}\}$ between the two parties $(u_i,u_j)$; the feature set $F_2$ includes, but is not limited to, the number-of-messages-sent feature $f_1^{(2)}$, the total short message byte count feature $f_2^{(2)}$, the average short message byte count feature $f_3^{(2)}$, the sending time period feature $f_4^{(2)}$, the feature $f_5^{(2)}$ indicating whether the short message is a verification code, the sender time-on-network feature $f_6^{(2)}$, and the receiver time-on-network feature $f_7^{(2)}$; all elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(2)}=\sum_{i=1}^{m}\beta_i f_i^{(2)}$$
where $\beta_i$ is the weighting coefficient of the $i$-th feature and $m$ is the total number of extracted short message communication features;
the mobile phone application access bipartite graph is $G_3=(U_3,V_3,E_3)$, where $U_3$ is the set of user nodes, $V_3$ is the set of mobile phone application nodes, and $E_3\subseteq U_3\times V_3$ is the set of edges representing users' access to mobile phone applications, each edge having a non-negative weight $w_{ij}\ge 0$ that represents the user's internet usage when accessing the application; for the bipartite graph $G_3$, the directed edge weight $w_{ij}^{(3)}$ between a user-application pair $(u_i,v_j)$ is obtained by extracting the internet access feature set $F_3=\{f_1^{(3)},f_2^{(3)},\dots,f_k^{(3)}\}$ of the pair; the feature set $F_3$ includes, but is not limited to, the number-of-accesses feature $f_1^{(3)}$, the total access duration feature $f_2^{(3)}$, the average access duration feature $f_3^{(3)}$, the total consumed traffic feature $f_4^{(3)}$, the average consumed traffic feature $f_5^{(3)}$, the user time-on-network feature $f_6^{(3)}$, and the mobile phone application category feature $f_7^{(3)}$; all elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(3)}=\sum_{i=1}^{k}\gamma_i f_i^{(3)}$$
where $\gamma_i$ is the weighting coefficient of the $i$-th feature and $k$ is the total number of features extracted from the user application access data.
5. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 1, characterized in that step S3 specifically comprises the following steps:
S31: for the constructed voice social network homogeneous graph, short message social network homogeneous graph, and mobile phone application access bipartite graph, performing graph embedding training on the user nodes using the corresponding graph embedding model for each graph;
S32: finding the neighbor sequence set of each user node according to the first-order and second-order neighbor similarities between the nodes of the homogeneous graph, and finding the neighbor sequence set of each user node according to the explicit and implicit relationships of the bipartite graph;
S33: concatenating the node embeddings obtained by first-order similarity training with the node embeddings obtained by second-order similarity training to obtain the embedding vectors of the user nodes of the homogeneous graph, and jointly optimizing the explicit and implicit relationships to obtain the user node embedding vectors of the bipartite graph.
6. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 5, characterized in that: in step S3, for the homogeneous graph, the user nodes are mapped from the graph domain to the embedding domain; that is, given the user node index $i$, the embedding $\mathbf{u}_i$ of node $u_i$ is obtained directly, the mapping function being expressed as:
$$f(u_i)=\mathbf{u}_i=e_i^{\top}W$$
where $e_i\in\{0,1\}^N$ is the one-hot encoding of user node $u_i$ (with $N=|U|$ the number of user nodes); the $i$-th element $e_i[i]$ of the vector is 1 and the other elements are 0; $W\in\mathbb{R}^{N\times d}$ is the embedding parameter matrix to be learned, where $d$ is the embedding dimension; the $i$-th row of the matrix $W$ is the embedded representation of node $u_i$;
for the bipartite graph, the user-node-based homogeneous graph $G_U$ is split out as the implicit relationship for feature extraction, and each node of the bipartite graph is mapped from the graph domain to the embedding domain, with $\mathbf{u}_i$ and $\mathbf{v}_i$ denoting the embedding vectors of user node $u_i\in U_3$ and mobile phone application node $v_i\in V_3$;
the key structure information of the user nodes in the graph domain is extracted
Figure FDA0003587212440000034
wherein the homogeneous graph network reconstructs the neighborhood information of the nodes based on their first-order and second-order similarities
Figure FDA0003587212440000035
and the bipartite graph network models and extracts the key structure information of the user nodes in the graph domain according to the explicit and implicit relationships of the graph domain nodes
Figure FDA0003587212440000036
the extracted graph domain co-occurrence information is reconstructed using the embedded representations of the embedding domain
Figure FDA0003587212440000037
and
Figure FDA0003587212440000038
the reconstructed information being denoted
Figure FDA0003587212440000039
and
Figure FDA00035872124400000310
the objective function based on the co-occurrence information
Figure FDA00035872124400000311
and the reconstructed information
Figure FDA00035872124400000312
is optimized, and the mapping function and all parameters involved in the reconstructor are learned;
for the homogeneous graph, the objective function to be optimized for first-order similarity is:
Figure FDA0003587212440000041
the objective function to be optimized for second-order similarity is:
Figure FDA0003587212440000042
for the bipartite graph $G_3$, the optimization objective function for modeling the explicit relationship is:
Figure FDA0003587212440000043
the optimization objective function for modeling the implicit relationship is:
Figure FDA0003587212440000044
the objective function $O_5$ based on the co-occurrence information
Figure FDA0003587212440000045
and the reconstructed information
Figure FDA0003587212440000046
is optimized, and all parameters involved in the mapping function and the reconstructor are learned; the final joint optimization overall objective function of the bipartite graph is:
$$\text{maximize}\quad O_5=-\mu O_3+\eta O_4$$
where $\mu$ and $\eta$ are hyperparameters to be specified that combine the different components in the joint optimization.
7. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 1, wherein step S4 specifically comprises the following steps:
S41: concatenating the homogeneous-graph and bipartite-graph embedding vectors to obtain the final node embedding features, combining them with the basic user features and label information, and inputting the result into a secure federated gradient boosting tree classification model for first-stage training;
S42: ranking the features obtained from the first-stage training by importance, selecting the top-n ranked features, and distributing them to the different participants for feature optimization;
S43: after the different participants have performed feature screening, carrying out the second-stage federated gradient boosting tree classification training and outputting the fraud-number prediction results;
S44: processing the final user classification results and outputting a list of suspicious fraud numbers.
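The control flow of steps S41 to S44 can be illustrated with a minimal single-machine sketch. It is not the federated gradient boosting tree of the claim: `toy_trainer` is a hypothetical stand-in that scores each feature by the gap between class means and classifies with a simple linear rule, so that the two-stage "train, rank, keep top n, retrain" pattern can run without any external library. All function names and the data layout are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]
Trainer = Callable[[List[Row], List[int]], Tuple[Callable[[Row], int], List[float]]]


def splice_features(homo: Dict[str, Row], bip: Dict[str, Row],
                    basic: Dict[str, Row]) -> Dict[str, Row]:
    """S41: concatenate both graph embeddings with the basic user features."""
    return {u: list(homo[u]) + list(bip[u]) + list(basic[u]) for u in homo}


def top_n_features(importance: Sequence[float], n: int) -> List[int]:
    """S42: indices of the n most important features, in column order."""
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(order[:n])


def toy_trainer(rows: List[Row], labels: List[int]):
    """Stand-in for one training round; returns (predict, importances)."""
    cols = range(len(rows[0]))
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]

    def mean(group: List[Row], j: int) -> float:
        return sum(r[j] for r in group) / len(group)

    gaps = [mean(pos, j) - mean(neg, j) for j in cols]
    mids = [(mean(pos, j) + mean(neg, j)) / 2 for j in cols]

    def predict(row: Row) -> int:
        score = sum((row[j] - mids[j]) * gaps[j] for j in cols)
        return 1 if score > 0 else 0

    return predict, [abs(g) for g in gaps]


def two_stage(features: Dict[str, Row], labels: Dict[str, int],
              trainer: Trainer, n: int) -> Tuple[List[int], List[str]]:
    users = sorted(features)
    rows = [features[u] for u in users]
    ys = [labels[u] for u in users]
    _, importance = trainer(rows, ys)                 # S41: first-stage training
    keep = top_n_features(importance, n)              # S42: keep top-n features
    reduced = [[r[i] for i in keep] for r in rows]
    predict, _ = trainer(reduced, ys)                 # S43: second-stage training
    suspects = [u for u, r in zip(users, reduced) if predict(r) == 1]  # S44
    return keep, suspects
```

In a real deployment the `trainer` argument would be replaced by the secure federated gradient boosting tree training round, and the selected feature indices would be sent back to each participant rather than applied in one process.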
8. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 1, wherein: the two-stage training process of the secure federated gradient boosting tree model comprises encrypted sample alignment and encrypted model training; during training, the central server exchanges the intermediate calculation results and the model parameters in encrypted form to finally obtain the optimal combination of model parameters; the encryption is based on the RSA algorithm and a hash function; during training, local data are computed only locally, the calculation results are encrypted and then transmitted to the central server, and other participants cannot obtain the details of the local data.
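The claim states only that encrypted sample alignment is "based on an RSA algorithm and a hash function". One widely used scheme that fits this description is RSA blind-signature private set intersection, sketched below as an assumption rather than as the patent's exact protocol. The role names (guest, host), the function names, and the toy key built from two Mersenne primes are all hypothetical; the key is far too weak for real use.

```python
import hashlib
import math
import secrets
from typing import List, Set, Tuple

# Toy RSA key from two known Mersenne primes; for illustration only, not secure.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
PHI = (P - 1) * (Q - 1)
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)  # private exponent, held only by the host


def h1(user_id: str) -> int:
    """First hash: map an ID (e.g. a phone number) to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(user_id.encode()).digest(), "big") % N


def h2(value: int) -> str:
    """Second hash: applied to the RSA signature before tokens are compared."""
    return hashlib.sha256(str(value).encode()).hexdigest()


def guest_blind(ids: List[str]) -> Tuple[List[int], List[int]]:
    """Guest hides each hashed ID with a random factor r: y = H1(id) * r^E mod N."""
    blinded, factors = [], []
    for uid in ids:
        r = secrets.randbelow(N - 2) + 2
        while math.gcd(r, N) != 1:
            r = secrets.randbelow(N - 2) + 2
        blinded.append(h1(uid) * pow(r, E, N) % N)
        factors.append(r)
    return blinded, factors


def host_sign(blinded: List[int]) -> List[int]:
    """Host signs blinded values without learning the underlying IDs."""
    return [pow(y, D, N) for y in blinded]


def host_tokens(ids: List[str]) -> Set[str]:
    """Host publishes H2(H1(id)^D mod N) for its own IDs."""
    return {h2(pow(h1(uid), D, N)) for uid in ids}


def guest_intersect(ids: List[str], signed: List[int], factors: List[int],
                    tokens: Set[str]) -> List[str]:
    """Guest removes r, hashes the signature, and keeps IDs found in the host tokens."""
    shared = []
    for uid, s, r in zip(ids, signed, factors):
        unblinded = s * pow(r, -1, N) % N  # equals H1(id)^D mod N
        if h2(unblinded) in tokens:
            shared.append(uid)
    return shared
```

Under this scheme the guest learns only which of its own IDs are shared, and the host sees only blinded values, which matches the claim's requirement that other participants cannot obtain the details of local data.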
CN202210397973.8A 2022-04-08 2022-04-08 Telecommunication fraud security federation detection method fusing homogeneous graph and bipartite graph Pending CN114693317A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210397973.8A CN114693317A (en) 2022-04-08 2022-04-08 Telecommunication fraud security federation detection method fusing homogeneous graph and bipartite graph

Publications (1)

Publication Number Publication Date
CN114693317A (en) 2022-07-01

Family

ID=82142402

Country Status (1)

Country Link
CN (1) CN114693317A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8561184B1 (en) * 2010-02-04 2013-10-15 Adometry, Inc. System, method and computer program product for comprehensive collusion detection and network traffic quality prediction
CN110555455A (en) * 2019-06-18 2019-12-10 东华大学 Online transaction fraud detection method based on entity relationship
CN112866486A (en) * 2021-02-01 2021-05-28 西安交通大学 Multi-source feature-based fraud telephone identification method, system and equipment
CN113362160A (en) * 2021-06-08 2021-09-07 南京信息工程大学 Federal learning method and device for credit card anti-fraud
CN113420232A (en) * 2021-06-02 2021-09-21 杭州电子科技大学 Privacy protection-oriented graph neural network federal recommendation method
CN113569906A (en) * 2021-06-10 2021-10-29 重庆大学 Heterogeneous graph information extraction method and device based on meta-path subgraph
CN113887577A (en) * 2021-09-14 2022-01-04 同济大学 Fine-grained telecommunication network anti-fraud detection method based on microscopic event map

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
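JUNSHUAI SONG et al.: "A subgraph-based knowledge reasoning method for collective fraud detection in E-commerce", NEUROCOMPUTING, vol. 461, 21 October 2021 (2021-10-21), pages 587 - 597, XP086797121, DOI: 10.1016/j.neucom.2021.03.134 *
张林泉: "Research on fraud number detection methods fusing graph embedding and vertical federated learning", China Master's Theses Full-text Database, Basic Sciences, no. 6, 15 June 2023 (2023-06-15), pages 002 - 21 *
高雅丽: "Research on key technologies for trusted perception of network threat intelligence for big data", China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 2, 15 February 2021 (2021-02-15), pages 139 - 8 *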

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115664847A (en) * 2022-12-08 2023-01-31 南京金科院大学科技园管理有限公司 User information safe storage method of internet education platform
CN117009999A (en) * 2023-09-22 2023-11-07 中关村科学城城市大脑股份有限公司 Smart park data storage method, device, equipment and computer readable medium
CN117009999B (en) * 2023-09-22 2024-01-16 中关村科学城城市大脑股份有限公司 Smart park data storage method, device, equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination