CN114693317A - Telecommunication fraud security federation detection method fusing homogeneous graph and bipartite graph - Google Patents
- Publication number
- CN114693317A CN202210397973.8A
- Authority
- CN
- China
- Prior art keywords
- graph
- user
- data
- mobile phone
- bipartite
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
- H04W12/128—Anti-malware arrangements, e.g. protection against SMS fraud or mobile malware
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Marketing (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Development Economics (AREA)
- Computing Systems (AREA)
- Accounting & Taxation (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Finance (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Entrepreneurship & Innovation (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Primary Health Care (AREA)
- Tourism & Hospitality (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention relates to a secure federated telecommunication fraud detection method fusing a homogeneous graph and a bipartite graph, belonging to the field of big data analysis and mining, and comprising the following steps. S1: based on user service data of a telecom operator, extracting and preprocessing the voice call data, short message communication data and mobile phone application access data of users. S2: constructing a telecommunication user social network homogeneous graph and a user mobile phone application bipartite graph data set. S3: constructing a homogeneous graph embedding network for the social network homogeneous graph and a bipartite graph embedding network for the user mobile phone application access bipartite graph, sampling user nodes to obtain neighbor node co-occurrence sequences, training iteratively to obtain the embedded representation of each node, and fusing these representations as the embedded representation of the user. S4: having each participant extract local telecommunication user features according to the characteristics of its local data, jointly training on the local data of the different organizations with a secure federated gradient boosting tree classification model, and outputting the final fraud number prediction result.
Description
Technical Field
The invention belongs to the field of big data analysis and mining, and relates to a secure federated telecommunication fraud detection method fusing a homogeneous graph and a bipartite graph.
Background
With the development of mobile communication and the spread of network applications, telecommunication and online fraud is intensifying worldwide and is becoming increasingly high-tech. With the rapid development of Internet technology, telecommunication fraud has become a persistent social problem in countries around the world. At present, telecommunication network fraud is still carried out mainly through telephone contact, but it increasingly shows new characteristics such as intelligent automation, industrialization and homogenization, and its targeting is shifting from broad, indiscriminate fraud to precise fraud. Fraud channels are spreading from telephone, short message and email to social networking sites and mobile phone applications; fraud techniques are continually updated, technical countermeasures keep escalating, fraud scripts closely track social hot topics and personal private information, and fraud is shifting from domestic to cross-border.
Currently, fraud number detection schemes in the industry fall into two main types: rule-based expert systems and machine-learning-based model systems. A rule-based expert system requires anti-fraud experts to manually analyze a large amount of normal and abnormal telecommunication data, accurately identify the behavior patterns of fraudsters, find the important features that effectively distinguish fraud from normal use, and write expert rules to detect fraudulent behavior. Rule-based expert systems therefore depend heavily on the expertise and business knowledge of anti-fraud experts, and huge losses result if the experts fail to detect increasingly complex fraud patterns promptly.
With the continuous growth of data scale and computing power, model systems based on machine learning have appeared. Such systems typically derive features from historical transaction data, train and evaluate a model on the feature data set using machine learning classification algorithms, and then apply the model to fraud number detection. Both rule-based expert systems and machine-learning-based model systems work by discovering, from historical data, individual behavior patterns that recur when fraud occurs. As telecommunication fraud becomes more specialized, fraudsters can evade detection by changing their own fraud techniques, but they can hardly change all of their association relations. When the association network covers a large range, fraudsters leave telltale traces however careful they are. Therefore, in the context of large-scale data, how to mine effective features to improve fraud detection performance is a new direction currently explored by researchers.
Now that data security receives more and more attention, using telecommunication big data directly is often very difficult. Data is hard to integrate across operators and related enterprises, and even across different business departments of the same organization, so the joint training of telecommunication user feature data extracted by different departments is also a current research focus.
Disclosure of Invention
In view of the above, in order to make full use of the communication service data of each operator and the fraud number label data of the public security department to identify fraud numbers, the invention provides a fraud number feature extraction and classification method based on graph embedding learning over a voice and short message social graph and a mobile phone application access bipartite graph.
In order to achieve this purpose, the invention provides the following technical solution:
A secure federated telecommunication fraud detection method fusing a homogeneous graph and a bipartite graph comprises the following steps:
S1: based on user service data of a telecom operator, extracting the voice call data, short message communication data and mobile phone application access data of users, and preprocessing these data;
S2: constructing a telecommunication user social network homogeneous graph and a user mobile phone application bipartite graph data set from the preprocessed data, wherein the graph data set comprises three types of weighted graphs, namely a voice social network homogeneous graph, a short message social network homogeneous graph and a mobile phone application access bipartite graph, and the edge weights are set by extracting statistical features and aggregating them into weights according to the characteristics of each service;
S3: constructing a homogeneous graph embedding network for the social network homogeneous graphs and a bipartite graph embedding network for the user mobile phone application access bipartite graph; sampling user nodes by graph embedding learning to obtain neighbor node co-occurrence sequences, then obtaining the embedded representation of each node by reconstructing the co-occurrence information through the embedding function and training iteratively with negative sampling; and fusing the embedding features obtained by training as the embedded representation of the user;
S4: having each participant extract local telecommunication user features according to the characteristics of its local data, and jointly training on the local data of the different organizations with a secure federated gradient boosting tree classification model; encrypted sample alignment and encrypted model parameter exchange are performed on the sample data of the different organizations through a trusted third-party server, thereby realizing multi-party joint model training, wherein training proceeds in two stages: the first stage screens the features, and the second stage classifies using the screened features and outputs the final fraud number prediction result.
Further, step S1 specifically includes: constructing a fraud number detection data set from the different service data of users collected from a telecom operator; dividing the data into the following four types according to the characteristics of the service data: user basic information data, voice call data, short message communication data and mobile phone application access data; performing data cleaning on the collected data, including abnormal value processing, missing value processing and standardization; and labeling the extracted telecommunication users according to the available telecommunication fraud report information, wherein fraud users are labeled 1 and non-fraud users are labeled 0.
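The cleaning and labeling operations of step S1 can be illustrated with a minimal sketch. The clipping bounds, the mean imputation, the z-score standardization and all names below (`clean_column`, `label_users`, the toy data) are assumptions chosen for illustration; the text does not specify which abnormal-value or missing-value treatments are used.

```python
from statistics import mean, pstdev

def clean_column(values, lower, upper):
    """Clip outliers to [lower, upper], fill missing values with the mean, then z-score standardize."""
    clipped = [None if v is None else min(max(v, lower), upper) for v in values]
    fill = mean(v for v in clipped if v is not None)
    filled = [fill if v is None else v for v in clipped]
    mu, sigma = mean(filled), pstdev(filled)
    if sigma == 0:
        return [0.0 for _ in filled]
    return [(v - mu) / sigma for v in filled]

def label_users(user_ids, reported_fraud_ids):
    """Label users named in fraud reports as 1 and all other users as 0."""
    reported = set(reported_fraud_ids)
    return {uid: 1 if uid in reported else 0 for uid in user_ids}

# Hypothetical toy data: monthly call counts with one missing value and one outlier.
call_counts = [10, None, 12, 5000, 8]
standardized = clean_column(call_counts, lower=0, upper=100)
labels = label_users(["u1", "u2", "u3"], reported_fraud_ids=["u2"])
```

In practice each of the four data types would have its own bounds and imputation rule; the sketch only shows the order of operations.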
Further, in step S2, the process of constructing the voice and short message social network homogeneous graphs and the user mobile phone application access bipartite graph includes: for the voice and short message data, extracting the telecommunication user voice social graph $G_1$ according to the calling and called relations of voice calls, and constructing the short message social graph $G_2$ according to the uplink and downlink sending and receiving relations of short message communication; for the user internet log data, summarizing and merging the records of users accessing mobile phone applications to obtain the mobile phone application access bipartite graph $G_3$. All three types of graph data are weighted graphs: the edge weights of the voice social graph are evaluated by weighting the communication relation features between the calling and called parties, the edge weights of the short message social graph are evaluated by weighting the communication relation features between the sending and receiving users, and the edge weights of the user access bipartite graph are evaluated by weighting the internet usage features of the applications accessed by the user.
Further, in step S2, constructing the telecommunication user social network homogeneous graphs and the user mobile phone application bipartite graph data set from the preprocessed data specifically includes:
The voice social network graph is $G_1=(U_1,E_1)$ and the short message social network graph is $G_2=(U_2,E_2)$, where $U_i$ is the set of user nodes and $E_i$ is the set of communication relations between users; each edge $(i,j)\in E$ corresponds to a pair of user nodes $(u_i,u_j)$ and carries a weight $w_{ij}\ge 0$ that represents the interaction between the two users;
For the voice social network graph $G_1$, the directed edge weight $w_{ij}^{(1)}$ between the user pair $(u_i,u_j)$ is obtained by extracting the call feature set $F_1=\{f_1^{(1)},f_2^{(1)},\dots,f_n^{(1)}\}$ between $(u_i,u_j)$. The feature set $F_1$ includes but is not limited to the number-of-calls feature $f_1^{(1)}$, the total call duration feature $f_2^{(1)}$, the average call duration feature $f_3^{(1)}$, the call time period feature $f_4^{(1)}$, the toll call feature $f_5^{(1)}$, the caller time-on-network feature $f_6^{(1)}$ and the callee time-on-network feature $f_7^{(1)}$. All elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(1)}=\sum_{k=1}^{n}\alpha_k f_k^{(1)}$$
where $\alpha_k$ is the weighting coefficient of the $k$-th feature and $n$ is the total number of extracted voice call features;
For the short message social network graph $G_2$, the directed edge weight $w_{ij}^{(2)}$ is obtained by extracting the short message feature set $F_2=\{f_1^{(2)},f_2^{(2)},\dots,f_m^{(2)}\}$ between $(u_i,u_j)$. The feature set $F_2$ includes but is not limited to the number-of-transmissions feature $f_1^{(2)}$, the total short message byte count feature $f_2^{(2)}$, the average short message byte count feature $f_3^{(2)}$, the sending time period feature $f_4^{(2)}$, the feature indicating whether the short message is a verification code $f_5^{(2)}$, the sender time-on-network feature $f_6^{(2)}$ and the receiver time-on-network feature $f_7^{(2)}$. All elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(2)}=\sum_{k=1}^{m}\beta_k f_k^{(2)}$$
where $\beta_k$ is the weighting coefficient of the $k$-th feature and $m$ is the total number of extracted short message communication features;
The mobile phone application access bipartite graph is $G_3=(U_3,V_3,E_3)$, where $U_3$ is the set of user nodes, $V_3$ is the set of mobile phone application nodes, and $E_3\subseteq U_3\times V_3$ is the set of relation edges for users accessing mobile phone applications; each edge carries a non-negative weight $w_{ij}\ge 0$ that represents the internet usage of the user accessing the mobile phone application. For the mobile phone application access bipartite graph $G_3$, the directed edge weight $w_{ij}^{(3)}$ between the user and application pair $(u_i,v_j)$ is obtained by extracting the internet access feature set $F_3=\{f_1^{(3)},f_2^{(3)},\dots,f_K^{(3)}\}$ between $(u_i,v_j)$. The feature set $F_3$ includes but is not limited to the number-of-accesses feature $f_1^{(3)}$, the total access duration feature $f_2^{(3)}$, the average access duration feature $f_3^{(3)}$, the total traffic consumed feature $f_4^{(3)}$, the average traffic consumed feature $f_5^{(3)}$, the user time-on-network feature $f_6^{(3)}$ and the mobile phone application category feature $f_7^{(3)}$. All elements of the set are then combined by weighted summation to obtain the edge weight:
$$w_{ij}^{(3)}=\sum_{k=1}^{K}\gamma_k f_k^{(3)}$$
where $\gamma_k$ is the weighting coefficient of the $k$-th feature and $K$ is the total number of features extracted from the user APP access data.
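The weighted-sum edge weight can be sketched for the voice graph as follows. This is an illustration under stated assumptions: the record layout, the four features used, and the coefficient values in `alphas` are hypothetical and are not taken from the patent, which leaves the coefficients unspecified.

```python
from collections import defaultdict

# Hypothetical call detail records: (caller, callee, duration_seconds, is_toll_call)
records = [
    ("u1", "u2", 60, False),
    ("u1", "u2", 120, True),
    ("u2", "u3", 30, False),
]

def edge_features(records):
    """Aggregate per-edge statistics: call count, total duration, average duration, toll-call count."""
    agg = defaultdict(lambda: [0, 0.0, 0])
    for caller, callee, duration, is_toll in records:
        stats = agg[(caller, callee)]
        stats[0] += 1
        stats[1] += duration
        stats[2] += int(is_toll)
    return {
        edge: [count, total, total / count, toll]
        for edge, (count, total, toll) in agg.items()
    }

def edge_weights(features, alphas):
    """Weighted sum of each directed edge's feature vector with the coefficients alphas."""
    return {
        edge: sum(a * f for a, f in zip(alphas, feats))
        for edge, feats in features.items()
    }

alphas = [1.0, 0.01, 0.01, 0.5]  # illustrative coefficients, an assumption
weights = edge_weights(edge_features(records), alphas)
```

With non-negative features and coefficients, every resulting weight is non-negative, which matches the requirement $w_{ij}\ge 0$. The same pattern would apply to the short message graph and the application access bipartite graph with their own feature sets.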
Further, step S3 specifically includes the following steps:
S31: for the constructed voice social network homogeneous graph, short message social network homogeneous graph and mobile phone application access bipartite graph, performing graph embedding training on the user nodes with the corresponding graph embedding model for each graph;
S32: finding the neighbor sequence set of each user node according to the first-order and second-order neighbor similarity between nodes of the homogeneous graph, and finding the neighbor sequence set of each user node according to the explicit and implicit relations of the bipartite graph;
S33: concatenating the node embedding obtained by first-order similarity training with the node embedding obtained by second-order similarity training to obtain the embedding vector of each user node of the homogeneous graph, and jointly optimizing the explicit and implicit relations to obtain the user node embedding vectors of the bipartite graph.
Further, in step S3, for the homogeneous graph, each user node is mapped from the graph domain to the embedding domain; that is, given the index $i$ of a user node, the embedding $\mathbf{u}_i$ of node $u_i$ is obtained directly. The mapping function is expressed as:
$$f(u_i)=\mathbf{u}_i=e_i^{\top}W$$
where $e_i\in\{0,1\}^N$ is the one-hot encoding of user node $u_i$, with $N=|U|$ the number of user nodes; the $i$-th element $e_i[i]$ of the vector is 1 and all other elements are 0; $W\in\mathbb{R}^{N\times d}$ is the embedding parameter matrix to be learned, where $d$ is the embedding dimension; the $i$-th row of the matrix $W$ is the embedded representation of node $u_i$;
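The statement that the one-hot product selects a row of $W$ can be checked with a small sketch. The matrix values and the helper names `one_hot` and `embed` are illustrative assumptions; a practical implementation would index the row directly instead of multiplying.

```python
def one_hot(i, n):
    """Return the length-n one-hot vector whose i-th entry is 1."""
    return [1 if k == i else 0 for k in range(n)]

def embed(i, W):
    """Compute the product of the one-hot row vector e_i with W explicitly."""
    e = one_hot(i, len(W))
    d = len(W[0])
    return [sum(e[r] * W[r][c] for r in range(len(W))) for c in range(d)]

# Toy embedding matrix with N = 3 nodes and d = 2 dimensions.
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
u_1 = embed(1, W)  # equals W[1], the row belonging to node index 1
```

Because the product reduces to a row lookup, training only updates the rows of $W$ that belong to the sampled nodes.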
For the bipartite graph, the original bipartite graph $G_3$ contains two types of node sets, whereas the fraud number detection task only needs the features of the user nodes; a user-node homogeneous graph $G_U$ is therefore split out of $G_3$, and features are extracted from it as the implicit relation. The nodes of the bipartite graph are mapped from the graph domain to the embedding domain, with $\mathbf{u}_i$ and $\mathbf{v}_i$ denoting the embedding vectors of user node $u_i\in U_3$ and mobile phone application node $v_i\in V_3$;
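One way to split a user-node homogeneous graph out of the bipartite graph is a co-occurrence projection: two users are linked when they access the same application. The sketch below assumes the projected weight is the sum, over shared applications, of the product of the two access weights; this weighting rule, the function name `project_to_users` and the toy edges are assumptions, since the text does not state how $G_U$ is weighted.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weighted user-app edges: (user, app) -> weight
bipartite_edges = {
    ("u1", "appA"): 2.0,
    ("u2", "appA"): 1.0,
    ("u2", "appB"): 3.0,
    ("u3", "appB"): 1.0,
}

def project_to_users(edges):
    """Build a user-user graph whose weights sum the products of access weights over shared apps."""
    users_by_app = defaultdict(dict)
    for (user, app), w in edges.items():
        users_by_app[app][user] = w
    projected = defaultdict(float)
    for app_users in users_by_app.values():
        for a, b in combinations(sorted(app_users), 2):
            projected[(a, b)] += app_users[a] * app_users[b]
    return dict(projected)

user_graph = project_to_users(bipartite_edges)
```

Users u1 and u3 share no application here, so no edge is created between them, while u2 is connected to both.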
The key structure information of the user nodes is extracted in the graph domain: the homogeneous graph network extracts the neighborhood information of the nodes based on their first-order and second-order similarities, and the bipartite graph network models and extracts the key structure information of the user nodes according to the explicit and implicit relations of the graph-domain nodes;
The embedded representations in the embedding domain are used to reconstruct the co-occurrence information extracted from the graph domain, for the homogeneous graph and for the bipartite graph respectively;
By optimizing an objective function defined on the co-occurrence information and the reconstructed information, the mapping function and all parameters involved in the reconstructor are learned;
For the homogeneous graph, the objective function $O_1$ to be optimized for first-order similarity is:
The objective function $O_2$ to be optimized for second-order similarity is:
For the bipartite graph $G_3$, the optimization objective function $O_3$ for modeling the explicit relation is:
The optimization objective function $O_4$ for modeling the implicit relation is:
By optimizing the objective function $O_5$, defined on the co-occurrence information and the reconstructed information, all parameters involved in the mapping function and the reconstructor are learned. The final joint optimization objective function of the bipartite graph is:
$$\text{maximize}\quad O_5=-\mu O_3+\eta O_4$$
where $\mu$ and $\eta$ are hyper-parameters to be specified, used to combine the different components in the joint optimization.
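To make the training signal concrete, the sketch below evaluates a generic negative-sampling loss on a toy embedding and combines two component objectives in the form $O_5=-\mu O_3+\eta O_4$. The sigmoid-based loss is a common choice in graph embedding and is an assumption here, not the patent's own objective; the function names and toy values are likewise hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def negative_sampling_loss(emb, edges, negatives):
    """Weighted loss that rewards similar embeddings on observed edges and dissimilar ones on sampled non-edges."""
    loss = 0.0
    for (i, j), w in edges.items():
        loss -= w * math.log(sigmoid(dot(emb[i], emb[j])))
        for k in negatives.get((i, j), []):
            loss -= w * math.log(sigmoid(-dot(emb[i], emb[k])))
    return loss

def joint_objective(o3, o4, mu, eta):
    """Combine the explicit-relation term o3 and implicit-relation term o4 as -mu * o3 + eta * o4."""
    return -mu * o3 + eta * o4

edges = {("a", "b"): 1.0}          # observed co-occurrence with its edge weight
negatives = {("a", "b"): ["c"]}    # hypothetical sampled non-neighbor of "a"
good = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [-1.0, 0.0]}
bad = {"a": [1.0, 0.0], "b": [-1.0, 0.0], "c": [1.0, 0.0]}
loss_good = negative_sampling_loss(good, edges, negatives)
loss_bad = negative_sampling_loss(bad, edges, negatives)
```

The embedding that places the linked nodes together and the sampled non-neighbor apart yields the lower loss, which is the behavior iterative training with negative sampling is meant to reinforce.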
Further, step S4 specifically includes the following steps:
S41: concatenating the homogeneous graph and bipartite graph embedding vectors to obtain the final node embedding features, combining them with the user basic features and label information, and inputting this information into the secure federated gradient boosting tree classification model for first-stage training;
S42: ranking the features by the importance obtained from the first-stage training, selecting the top-n ranked features, and distributing the selection to the different participants so that each can refine its features;
S43: after the participants have screened their features, performing the second-stage federated gradient boosting tree classification training and outputting the fraud number prediction results;
S44: processing the final classification results of the users and outputting a list of suspected fraud numbers.
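The screening between the two training stages can be sketched as a ranking followed by a per-participant split. The importance scores, feature names and ownership map below are hypothetical; in a real deployment the importances would come from the first-stage federated model.

```python
def top_n_features(importances, n):
    """Return the names of the n most important features, highest first."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

def assign_to_participants(selected, owner_of):
    """Group the selected features by the participant that holds them."""
    kept = {}
    for name in selected:
        kept.setdefault(owner_of[name], []).append(name)
    return kept

# Hypothetical first-stage importances and feature ownership.
importances = {"emb_0": 0.30, "emb_1": 0.05, "time_on_net": 0.25, "sms_count": 0.40}
owner_of = {"emb_0": "operator_A", "emb_1": "operator_A",
            "time_on_net": "operator_B", "sms_count": "operator_B"}
selected = top_n_features(importances, n=3)
kept = assign_to_participants(selected, owner_of)
```

Each participant then retrains in the second stage using only the features listed under its own name.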
Further, the two-stage training process of the secure federated gradient boosting tree model comprises encrypted sample alignment and encrypted model training. During training, the central server exchanges the intermediate calculation results and the model parameters in encrypted form to finally obtain the optimal combination of model parameters. The encryption is based on the RSA algorithm and a hash function. During training, local data is only ever computed on locally; the calculation results are encrypted before being transmitted to the central server, and other participants cannot obtain any details of the local data. The security of the local data is thereby ensured.
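Encrypted sample alignment based on RSA and a hash function is often realized as a blind-signature private set intersection, and the sketch below follows that pattern. It is an assumption that the method uses this exact protocol; the roles `guest` and `host`, the ID strings and the toy key are hypothetical, and the key is far too small for real use.

```python
import hashlib
import math
import secrets

def h(value, modulus):
    """Hash an ID string to an integer modulo `modulus`."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % modulus

def h2(number):
    """Second hash applied to RSA signatures before they are compared."""
    return hashlib.sha256(str(number).encode("utf-8")).hexdigest()

# Toy RSA key held by the host, built from two Mersenne primes (illustration only).
P, Q, E = 2**31 - 1, 2**61 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent; requires Python 3.8+

def guest_blind(ids):
    """Guest hides each hashed ID behind a random factor r raised to the public exponent."""
    blinded, factors = [], []
    for uid in ids:
        r = secrets.randbelow(N - 2) + 2
        while math.gcd(r, N) != 1:
            r = secrets.randbelow(N - 2) + 2
        blinded.append(h(uid, N) * pow(r, E, N) % N)
        factors.append(r)
    return blinded, factors

def host_sign(blinded):
    """Host signs the blinded values without learning the underlying IDs."""
    return [pow(b, D, N) for b in blinded]

def host_tokens(ids):
    """Host publishes hashed signatures of its own IDs."""
    return {h2(pow(h(uid, N), D, N)) for uid in ids}

def guest_intersect(ids, signed, factors, tokens):
    """Guest removes r, hashes the signature, and keeps IDs whose token the host also published."""
    common = []
    for uid, s, r in zip(ids, signed, factors):
        unblinded = s * pow(r, -1, N) % N
        if h2(unblinded) in tokens:
            common.append(uid)
    return common

guest_ids = ["id_a", "id_b", "id_c"]
host_ids = ["id_b", "id_c", "id_d"]
blinded, factors = guest_blind(guest_ids)
common = guest_intersect(guest_ids, host_sign(blinded), factors, host_tokens(host_ids))
```

The guest learns only which of its own IDs are shared, and the host sees only blinded values, so neither side reveals the samples outside the intersection.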
The invention has the following beneficial effects. In the fraud user detection task, the method solves the problem of extracting interaction features from the historical call records and internet access data of telecommunication users, and combines them with the user basic information features obtained by feature engineering for classification prediction with a machine learning model. It provides a more diverse data feature extraction method for the traditional fraud number detection task. The method can be fused with and complement other traditional fraud number detection models, and generalizes well in fraud number detection tasks. The data the method requires can be processed in anonymized, encrypted form while achieving the same feature extraction effect, which is of practical significance for protecting user privacy. The invention can combine the data of different telecom operators and other related organizations as model input for joint training, and the secure federated machine learning model used ensures that the data of the participants is not leaked to one another. Data security is guaranteed while multi-party data is fully used for telecommunication fraud detection. As the use of private data becomes more and more strictly regulated, the scheme addresses the problems of data isolation and data fragmentation. The invention adopts two-stage training in multi-party joint modeling, which allows feature screening over the multi-party data features and can improve the generalization ability of the model to a certain extent. This two-stage approach is a form of model optimization and can be applied to different training models.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic representation of the steps of the process of the present invention;
FIG. 2 is a schematic general flow diagram of the process of the present invention;
FIG. 3 is a schematic diagram of the voice and short message social graph embedding module employed in the present invention;
FIG. 4 is a schematic diagram of a cell phone application access bipartite graph embedded module employed in the present invention;
FIG. 5 is a schematic diagram of a local machine learning classification module employed by the present invention;
FIG. 6 is a schematic diagram of a secure federal multi-party training model used in the present invention;
FIG. 7 is a schematic diagram of secure federated encrypted training in the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for illustrative purposes only; they are schematic representations rather than depictions of an actual product, and are not to be construed as limiting the invention. To better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product. It will be understood by those skilled in the art that certain well-known structures in the drawings and their descriptions may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
The invention provides a telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph. As shown in FIG. 1, the method comprises the following steps:
A fraud number detection data set is constructed from the service data of users collected at a telecom operator. First, the data are divided into four types according to their service characteristics: user information data, voice call data, short message communication data and mobile phone application access data. Data cleaning operations, such as outlier handling, missing value handling and normalization, are then performed on the collected raw data. Finally, the extracted telecommunication users are labeled according to the available telecommunication fraud report information: fraud users are labeled 1 and non-fraud users are labeled 0.
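The cleaning and labeling step can be sketched as follows. This is a minimal illustration, not the patented implementation: the choice of median filling, clipping to fixed bounds and min-max scaling is an assumption, and the function names `clean_column` and `label_users` are hypothetical.

```python
from statistics import median


def clean_column(values, lower=None, upper=None):
    """Fill missing values, clip outliers, then min-max normalize one column.

    Assumed policy (not stated in the source): missing entries (None) are
    replaced by the median, and outliers are clipped to [lower, upper].
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no observed values")
    fill = median(present)
    filled = [fill if v is None else v for v in values]
    lo = min(filled) if lower is None else lower
    hi = max(filled) if upper is None else upper
    clipped = [min(max(v, lo), hi) for v in filled]
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in clipped]


def label_users(user_ids, reported_fraud_ids):
    """Label reported users as 1 (fraud) and all other users as 0 (non-fraud)."""
    reported = set(reported_fraud_ids)
    return {uid: 1 if uid in reported else 0 for uid in user_ids}
```

For example, `clean_column([10, None, 30, 1000], lower=0, upper=100)` fills the gap with the median 30, clips 1000 to 100 and scales every value into the range 0 to 1.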
Using the preprocessed data, a data set consisting of the telecommunication user social network homogeneous graphs and the user mobile phone application bipartite graph is constructed. The data set includes the label information of the users, which is used for training and testing the binary classification of fraud numbers. The construction process is as follows:
The voice social network graph is $G_1=(U_1,E_1)$ and the short message social network graph is $G_2=(U_2,E_2)$, where $U_i$ is the set of user nodes and $E_i$ is the set of user-to-user communication relationships. Each edge $(i,j)\in E$ in the edge set corresponds to a pair of user nodes $(u_i,u_j)$ and carries a weight $w_{ij}\ge 0$ that represents the interaction between the two users. For the voice social network graph $G_1$, the directed edge weight $w_{ij}^{(1)}$ between the user pair $(u_i,u_j)$ is obtained by extracting the call feature set $F_1=\{f_1^{(1)},f_2^{(1)},\dots,f_n^{(1)}\}$ between the two parties. The feature set $F_1$ includes, but is not limited to, the number-of-calls feature $f_1^{(1)}$, the total call duration feature $f_2^{(1)}$, the average call duration feature $f_3^{(1)}$, the call time period feature $f_4^{(1)}$, the toll call feature $f_5^{(1)}$, the caller time-on-network feature $f_6^{(1)}$ and the called-party time-on-network feature $f_7^{(1)}$. All elements of the set are then summed with weights to obtain the edge weight:

$$w_{ij}^{(1)}=\sum_{t=1}^{n}\alpha_t f_t^{(1)}$$

where $\alpha_t$ is the weighting coefficient of the $t$-th feature and $n$ is the total number of extracted voice call features. Similarly, for the short message social network graph $G_2$, the directed edge weight $w_{ij}^{(2)}$ is obtained by extracting the short message communication feature set $F_2=\{f_1^{(2)},f_2^{(2)},\dots,f_m^{(2)}\}$ between $(u_i,u_j)$. The feature set $F_2$ includes, but is not limited to, the number-of-messages-sent feature $f_1^{(2)}$, the total short message byte count feature $f_2^{(2)}$, the average short message byte count feature $f_3^{(2)}$, the sending time period feature $f_4^{(2)}$, the feature $f_5^{(2)}$ indicating whether the short message is a verification code, the sender time-on-network feature $f_6^{(2)}$ and the receiver time-on-network feature $f_7^{(2)}$. All elements of the set are then summed with weights to obtain the edge weight:

$$w_{ij}^{(2)}=\sum_{t=1}^{m}\beta_t f_t^{(2)}$$

where $\beta_t$ is the weighting coefficient of the $t$-th feature and $m$ is the total number of extracted short message communication features.
The mobile phone application access bipartite graph is $G_3=(U_3,V_3,E_3)$, where $U_3$ is the set of user nodes and $V_3$ is the set of mobile phone application nodes. $E_3\subseteq U_3\times V_3$ is the set of relationship edges for users accessing mobile phone applications; each edge carries a non-negative weight $w_{ij}\ge 0$ that represents the internet usage of the user accessing the application. For the bipartite graph $G_3$, the directed edge weight $w_{ij}^{(3)}$ between the user-application pair $(u_i,v_j)$ is obtained by extracting the internet access feature set $F_3=\{f_1^{(3)},f_2^{(3)},\dots,f_k^{(3)}\}$ between $(u_i,v_j)$. The feature set $F_3$ includes, but is not limited to, the number-of-accesses feature $f_1^{(3)}$, the total access duration feature $f_2^{(3)}$, the average access duration feature $f_3^{(3)}$, the total traffic consumed feature $f_4^{(3)}$, the average traffic consumed feature $f_5^{(3)}$, the user time-on-network feature $f_6^{(3)}$ and the mobile phone application category feature $f_7^{(3)}$. All elements of the set are then summed with weights to obtain the edge weight:

$$w_{ij}^{(3)}=\sum_{t=1}^{k}\gamma_t f_t^{(3)}$$

where $\gamma_t$ is the weighting coefficient of the $t$-th feature and $k$ is the total number of features extracted from the user application access data.
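The weighted aggregation used for all three graphs can be sketched in a few lines. The function names and the example coefficients below are hypothetical; the source does not say how the weighting coefficients are chosen, so they are simply passed in.

```python
def edge_weight(features, coefficients):
    """Return the weighted sum of one node pair's features as its edge weight."""
    if len(features) != len(coefficients):
        raise ValueError("one coefficient is required per feature")
    weight = sum(c * f for c, f in zip(coefficients, features))
    if weight < 0:
        # The description requires non-negative edge weights.
        raise ValueError("edge weight must be non-negative")
    return weight


def build_weighted_edges(pair_features, coefficients):
    """Map each (source, target) pair to its aggregated edge weight."""
    return {
        pair: edge_weight(features, coefficients)
        for pair, features in pair_features.items()
    }
```

With two normalized features per pair and coefficients `[0.5, 0.25]`, the pair features `[2.0, 4.0]` aggregate to an edge weight of 2.0. The same helper applies to user-application pairs of the bipartite graph.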
A homogeneous graph embedding network is constructed for the voice and short message social graphs, and a bipartite graph embedding network is constructed for the mobile phone application access bipartite graph. User nodes are sampled in an unsupervised manner to obtain neighbor-node co-occurrence sequences, and the embedded representation of each node is then obtained through iterative training in which the embedding function reconstructs the co-occurrence information with negative sampling.
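One way to picture the sampling step is to draw a positive (center, context) pair in proportion to its edge weight and then draw negative nodes that are not neighbors of the center. The sketch below assumes this edge-sampling scheme, which is common in graph-embedding training but is not spelled out in the source; the function name and the uniform choice of negatives are assumptions.

```python
import random


def sample_training_example(weighted_edges, nodes, num_negative, rng):
    """Draw one positive pair by edge weight plus negative context nodes.

    weighted_edges: dict mapping (center, context) to a positive weight.
    nodes: list of all node ids.
    rng: a random.Random instance, passed in for reproducibility.
    """
    pairs = list(weighted_edges)
    weights = [weighted_edges[pair] for pair in pairs]
    center, context = rng.choices(pairs, weights=weights, k=1)[0]

    # Negatives are drawn uniformly from nodes that are not neighbors of the center.
    neighbors = {v for (u, v) in pairs if u == center}
    candidates = [n for n in nodes if n != center and n not in neighbors]
    negatives = [rng.choice(candidates) for _ in range(num_negative)] if candidates else []
    return center, context, negatives
```

Each returned triple provides one observed co-occurrence and several unobserved ones, which is the input a negative-sampling update needs.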
The node embedding features output by each embedding model are spliced together, and the samples that have label data are selected. These samples are divided into a training set and a test set in proportion according to their label attributes and used as the input of the classification model. The model is trained iteratively on the training set and evaluated on the test set to obtain the optimal model for the binary classification of fraud numbers. Finally, the model is used to predict the remaining user data, and the prediction results are written to a suspected fraud number database for the operator's reference.
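The splicing, label filtering and proportional split can be sketched as below. The helper names are hypothetical, and the per-label (stratified) split is one reasonable reading of "in proportion according to label attributes", not a confirmed detail of the method.

```python
import random


def build_dataset(embedding_tables, labels):
    """Concatenate each labeled user's embeddings from every table.

    Users without a label, or missing from any embedding table, are skipped.
    Returns a list of (user_id, feature_vector, label) rows.
    """
    rows = []
    for uid, label in labels.items():
        if all(uid in table for table in embedding_tables):
            features = [x for table in embedding_tables for x in table[uid]]
            rows.append((uid, features, label))
    return rows


def stratified_split(rows, test_ratio, seed=0):
    """Split rows into train and test sets, keeping the label ratio in each."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [row for row in rows if row[2] == label]
        rng.shuffle(group)
        cut = int(len(group) * test_ratio)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test
```

Because fraud users are typically a small minority, splitting within each label keeps a very small fraud class from landing entirely in one of the two sets.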
The invention also provides a telecommunication fraud security federation detection device fusing a homogeneous graph and a bipartite graph. As shown in FIG. 2, the device comprises:
The raw data acquisition module first connects to the operator's data warehouse, periodically extracts user communication data and mobile phone application access data through HiveSQL, and merges and summarizes the data records by time period to obtain three user communication tables, which are stored in the storage module. The three tables hold the voice call data, the short message communication data and the mobile phone application traffic usage data respectively.
The graph data preprocessing module periodically reads the voice call data table, the short message communication data table and the mobile phone application traffic usage data table from the storage module, extracts the interaction relationships between users, and between users and mobile phone applications, from each table by merging and summarizing, and stores the three kinds of interaction graph data in the form of adjacency lists.
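The merge-and-summarize step that turns table rows into an adjacency list can be sketched as follows. The record layout `(source, target, value)` and the function name are assumptions made for illustration; the actual table schemas are not given in the source.

```python
from collections import defaultdict


def build_adjacency(records):
    """Aggregate (source, target, value) rows into a nested adjacency list.

    Rows for the same (source, target) pair are merged by summing their
    values, e.g. the call durations between one caller and one callee.
    """
    adjacency = defaultdict(dict)
    for source, target, value in records:
        adjacency[source][target] = adjacency[source].get(target, 0) + value
    return {source: dict(targets) for source, targets in adjacency.items()}
```

The same function covers the bipartite case: the source is a user and the target is a mobile phone application, so the result lists each user's applications with the summarized usage.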
The graph embedding feature extraction module divides the three kinds of processed graph-structured data into two types and extracts features from each. The first type is the telecommunication user social network homogeneous graphs $G_1$ and $G_2$, based on voice and short message data. The second type is the mobile phone application access bipartite graph $G_3$, based on the traffic usage of mobile phone applications.
FIG. 3 is a schematic diagram of the homogeneous graph embedding network. For the homogeneous graphs $G_1$ and $G_2$, embedded feature extraction proceeds as follows:
Step one: a node embedding mapping module, which maps the user nodes from the graph domain to the embedding domain; that is, given the index $i$ of a user node, the embedding $\mathbf{u}_i$ of node $u_i$ is obtained directly. The mapping function can be expressed as:

$$f(u_i)=\mathbf{u}_i=e_i^{\top}W$$

where $e_i\in\{0,1\}^N$ is the one-hot encoding of user node $u_i$ (with $N=|U|$ the number of user nodes); that is, the $i$-th element $e_i[i]$ of the vector is 1 and all other elements are 0. $W\in\mathbb{R}^{N\times d}$ is the embedding parameter matrix to be learned, where $d$ is the embedding dimension. The $i$-th row of the matrix $W$ is the embedded representation of node $u_i$.
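A short sketch shows why multiplying the one-hot vector by the embedding matrix amounts to a row lookup. The function names are hypothetical, and indices are zero-based here, as is usual in Python.

```python
def one_hot(index, size):
    """Return a one-hot list with a 1 at the given zero-based index."""
    return [1 if k == index else 0 for k in range(size)]


def embed(index, matrix):
    """Compute e_i^T W explicitly; the result equals row `index` of W."""
    e = one_hot(index, len(matrix))
    dim = len(matrix[0])
    return [sum(e[r] * matrix[r][c] for r in range(len(matrix))) for c in range(dim)]
```

In practice the product is never formed: an implementation indexes the row directly, which is what embedding layers in common deep learning libraries do.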
Step two: a graph-domain co-occurrence information extraction module, which extracts the key structural information of the user nodes in the graph domain; that is, the neighborhood information of each node is reconstructed according to the first-order and second-order similarities of the node.
The first-order similarity is the local pairwise similarity between user nodes in the network. Formally, if there is a direct edge between nodes $u_i$ and $u_j$, the weight $w_{ij}$ of that edge is the similarity of the two vertices; if there is no direct edge, the first-order similarity is 0. For nodes $u_i$ and $u_j$, the joint probability distribution of similarity over the undirected edges is defined as:
The empirical distribution between nodes in the embedding domain is defined as:
where $v_i\in\mathbb{R}^d$ denotes the $d$-dimensional vector representation of node $u_i$ in the embedding domain.
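The two distributions named above can be sketched as follows, assuming the first-order proximity formulation of the LINE graph-embedding method, which this description appears to follow. The normalizer $\sum_{(s,t)\in E} w_{st}$, the sigmoid form and the symbols $p_1$ and $\hat{p}_1$ are assumptions taken from that formulation, not from the text; the naming of which distribution lives in which domain follows this description.

```latex
% Graph domain: joint probability of an edge, proportional to its weight
p_1(u_i, u_j) = \frac{w_{ij}}{\sum_{(s,t) \in E} w_{st}}

% Embedding domain: distribution reconstructed from the node vectors v_i and v_j
\hat{p}_1(u_i, u_j) = \frac{1}{1 + \exp\left(-v_i^{\top} v_j\right)}
```

Under this reading, two users joined by a heavy edge are pushed toward a large inner product $v_i^{\top} v_j$, so their vectors end up close in the embedding domain.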
The second-order similarity is the similarity between the neighbor nodes of user nodes in the network. Formally, let $p_u=(w_{u,1},w_{u,2},\dots,w_{u,|V|})$ denote the first-order similarity between node $u$ and all other nodes; the second-order similarity between nodes $u$ and $v$ is then given by the similarity of $p_u$ and $p_v$. If nodes $u$ and $v$ have no common neighbor node, their second-order similarity is 0. The second-order similarity can express global features of the graph.
For the second-order similarity, two embedded vectors must be introduced for each node: one characterizes the node itself, namely the central-node embedding $u^{cen}$; the other characterizes the node as a context of other nodes, namely the neighborhood-node embedding $u^{con}$. Thus, in the graph domain, for any edge $(u_i,u_j)\in E$, the joint distribution of the two is defined as:
where $w_{ij}$ is the weight of the edge between nodes $u_i$ and $u_j$, $d_i$ is the number of neighbor nodes of vertex $u_i$, and $N(u_i)$ is the set of neighborhood nodes of node $u_i$.
In the embedding domain, the conditional probability that node $u_j$ appears given node $u_i$ is defined as:
where $u_i^{cen}$ denotes the central-node embedding of node $u_i$, $u_i^{con}$ denotes the neighborhood-node embedding of node $u_i$, and $|V|$ denotes the number of neighborhood nodes.
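Assuming the second-order proximity formulation of the LINE method, the two distributions described above could be written as below. The symbols $p_2$ and $\hat{p}_2$ are introduced here for illustration. Treating $d_i$ as the weighted out-degree is also an assumption; it coincides with the number of neighbor nodes when every edge weight is 1.

```latex
% Graph domain: distribution of neighbor u_j given center u_i
p_2(u_j \mid u_i) = \frac{w_{ij}}{d_i},
\qquad d_i = \sum_{k \in N(u_i)} w_{ik}

% Embedding domain: softmax over the context embeddings of all candidate nodes
\hat{p}_2(u_j \mid u_i) =
  \frac{\exp\left({u_j^{con}}^{\top} u_i^{cen}\right)}
       {\sum_{k=1}^{|V|} \exp\left({u_k^{con}}^{\top} u_i^{cen}\right)}
```

The denominator sums over every candidate node, which is expensive for a large user base; this cost is the usual motivation for training with negative sampling instead of the full softmax.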
Step three: an embedding-domain information reconstruction module, which uses the embedded representations in the embedding domain to reconstruct the extracted graph-domain co-occurrence information, yielding the reconstructed information.
Step four: an objective function optimization module, which optimizes the objective function based on the co-occurrence information and the reconstructed information, and thereby learns all parameters involved in the mapping function and the reconstructor.
For the first-order similarity, the KL divergence is used to measure the difference between the two probability distributions. After the constant term is omitted, the optimization objective function of the first-order similarity is:
The second-order similarity likewise uses the KL divergence to measure the difference between the distributions. After the constant term is omitted, the optimization objective function of the second-order similarity is:
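If the method follows the LINE formulation, minimizing the KL divergence and dropping the terms that do not depend on the embeddings leaves weighted negative log-likelihoods. In the sketch below, $\hat{p}_1$ and $\hat{p}_2$ stand for the embedding-domain first-order and second-order distributions; these symbols and the labels $O_1$ and $O_2$ are assumed names, chosen to fit the later numbering $O_3$ to $O_5$.

```latex
% First-order objective, to be minimized
O_1 = -\sum_{(i,j) \in E} w_{ij} \log \hat{p}_1(u_i, u_j)

% Second-order objective, to be minimized
O_2 = -\sum_{(i,j) \in E} w_{ij} \log \hat{p}_2(u_j \mid u_i)
```

Both sums run over observed edges only, so the cost of one pass grows with the number of communication relationships rather than with the number of user pairs.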
FIG. 4 is a schematic diagram of the bipartite graph embedding network architecture. For the bipartite graph $G_3$, embedded feature extraction proceeds as follows:
Step one: a bipartite graph reconstruction module. The original bipartite graph $G_3$ contains two types of node sets. Because the fraud number detection task only needs to attend to the features of the user nodes, this module only splits out the user-node-based homogeneous graph $G_U$, on which feature extraction is performed as the implicit relationship.
Step two: a node embedding mapping module, which maps each node of the bipartite graph from the graph domain to the embedding domain, with $\mathbf{u}_i$ and $\mathbf{v}_i$ denoting the embedded vectors of user node $u_i\in U_3$ and mobile phone application node $v_i\in V_3$.
Step three: a graph-domain co-occurrence information extraction module, which extracts the key structural information of the user nodes in the graph domain.
Step four: an embedding-domain information reconstruction module, which uses the embedded representations in the embedding domain to reconstruct the extracted graph-domain co-occurrence information, yielding the reconstructed information.
For the bipartite graph $G_3$, given a node pair $(u_i,v_j)\in E_3$ with $u_i\in U_3$ and $v_j\in V_3$, the joint probability between the two nodes in the graph domain is:
The empirical distribution of the nodes in the embedding domain is:
For the explicit relationship, the difference between the graph-domain and embedding-domain distributions is measured by the KL divergence, so the objective function is:
After the constant term is ignored, the final objective function is:
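The explicit-relationship chain described above matches the formulation commonly used for bipartite network embedding. Under that assumption it could be written as below, where $P$, $\hat{P}$ and the bold vectors $\mathbf{u}_i$, $\mathbf{v}_j$ are notation introduced for illustration.

```latex
% Graph domain: joint probability of a user-application edge
P(u_i, v_j) = \frac{w_{ij}}{\sum_{(s,t) \in E_3} w_{st}}

% Embedding domain: distribution reconstructed from the two embedded vectors
\hat{P}(u_i, v_j) = \frac{1}{1 + \exp\left(-\mathbf{u}_i^{\top} \mathbf{v}_j\right)}

% Explicit-relationship objective as a KL divergence
O_3 = \mathrm{KL}\left(P \,\|\, \hat{P}\right)
    = \sum_{(i,j) \in E_3} P(u_i, v_j) \log \frac{P(u_i, v_j)}{\hat{P}(u_i, v_j)}

% Dropping the constant term and the positive normalizing factor
O_3 \;\rightarrow\; -\sum_{(i,j) \in E_3} w_{ij} \log \hat{P}(u_i, v_j)
```

The last line follows because $\sum P \log P$ does not depend on the embeddings and the normalizer of $P$ is a positive constant, so neither affects the minimizer.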
For the implicit-relationship homogeneous graph $G_U$ of the bipartite graph, training and optimization use homogeneous graph embedding based on first-order similarity. The joint probability distribution of the user nodes, the empirical distribution of the nodes in the embedding domain and the objective function to be optimized are as follows:
Step five: an objective function optimization module, which optimizes the objective function $O_5$ based on the co-occurrence information and the reconstructed information, and learns all parameters involved in the mapping function and the reconstructor. The final overall objective function for joint optimization is:

$$\text{maximize}\quad O_5=-\mu O_3+\eta O_4$$

where $O_3$ is the explicit-relationship objective function of the bipartite graph nodes, $O_4$ is the implicit-relationship objective function of the bipartite graph nodes, and $\mu$ and $\eta$ are hyperparameters, to be specified, that combine the different components in the joint optimization.
Through the iterative optimization of the graph embedding module, three kinds of embedded vector feature representations of the user, $X_1$, $X_2$ and $X_3$, are obtained.
FIG. 5 shows the local classification model architecture for fraud user detection used by the invention, and FIG. 6 shows the joint training model architecture that performs secure federated learning across the local models of multiple parties. For the local model of each participant, the fraud user detection module first splices the basic user information features $X_0$ compiled by the data processing module with the user embedding features $X_1$, $X_2$ and $X_3$ to obtain a combined telecommunication user feature table, and then combines this table with the telecommunication user label data obtained from actual fraud report information to form a fraud user detection data set. Every participant constructs its sample data set in the same way. With a central server acting as coordinator, the participants then perform encrypted sample entity alignment, and the results computed by each participant's local data model are encrypted and exchanged. Through continued iterative optimization, the optimal model parameters are finally obtained and used to predict fraud numbers, and the users predicted to be fraud numbers are exported to a suspicious user list for further investigation. In this module, the local classification models used by each institution include, but are not limited to, logistic regression, decision trees, deep learning networks and ensemble learning.
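The idea behind sample entity alignment can be illustrated with a deliberately simplified sketch: each participant sends only salted digests of its user identifiers, and the coordinator intersects the digests. This is a stand-in chosen for brevity and is an assumption, not the protocol of the invention. Salted hashes of phone numbers can be brute-forced, so a real deployment would use a cryptographic private set intersection protocol instead. The function names and the shared salt are hypothetical.

```python
import hashlib


def blind_ids(user_ids, shared_salt):
    """Map SHA-256(salt + id) to id, so that only digests leave the participant."""
    return {
        hashlib.sha256((shared_salt + uid).encode("utf-8")).hexdigest(): uid
        for uid in user_ids
    }


def align_samples(digests_a, digests_b):
    """Coordinator-side step: intersect two digest collections."""
    return set(digests_a) & set(digests_b)


# Usage: each participant blinds locally, the coordinator sees digests only,
# and each participant maps the common digests back to its own rows.
party_a = blind_ids(["138001", "138002", "138003"], "demo-salt")
party_b = blind_ids(["138002", "138003", "139000"], "demo-salt")
common = align_samples(party_a.keys(), party_b.keys())
aligned_a = sorted(party_a[d] for d in common)
```

After alignment, every participant holds the same ordered set of common users, which is the precondition for training one model over features that are split across institutions.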
FIG. 7 is a schematic diagram of secure federated multi-party joint encrypted training. Training is performed in two stages. The first stage is used for feature screening: after the first-stage training, an importance weight is obtained for each participant's features, and the features are sorted by this value to select the top 50. The participants that own the selected features then carry out the second-stage joint modeling, and the result of the second-stage training is provided as output to the operator that owns the labels. The operator extracts a list of fraud numbers from the prediction results for reference.
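The screening between the two training stages can be sketched as below. The importance scores would come from the first-stage model; here they are plain numbers, and the function names and the tie-breaking by feature name are assumptions.

```python
def top_k_features(importance, k=50):
    """Return the k feature names with the highest importance weights.

    Ties are broken by feature name so that the result is deterministic.
    """
    ranked = sorted(importance.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def features_by_party(selected, ownership):
    """Group the selected feature names by the participant that owns them."""
    grouped = {}
    for name in selected:
        grouped.setdefault(ownership[name], []).append(name)
    return grouped
```

Only participants that appear in the grouped result hold a selected feature, so only they need to take part in the second-stage joint modeling.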
In a preferred embodiment, when a new type of telecommunication fraud appears, the new fraud samples are classified and labeled, and sample data of normal users and of the new fraud users are selected and input into the trained model; through iterative optimization of the model parameters, the model can adapt to detecting the new fraud type.
According to the embodiments of the invention, by selecting data sets of different types and sizes in the different processes, a telecommunication user fraud detection method based on the voice and short message social graphs and the mobile phone application access bipartite graph can be realized, and fraud users among telecommunication users can be detected and identified.
Finally, the above embodiments are intended only to illustrate the technical solutions of the invention and not to limit it. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solutions may be modified or equivalently substituted without departing from their spirit and scope, and all such modifications and substitutions shall be covered by the claims of the invention.
Claims (8)
1. A telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph, characterized in that the method comprises the following steps:
S1: based on the user service data of a telecom operator, extracting the voice call data, short message communication data and mobile phone application access data of users, and preprocessing the extracted data;
S2: constructing, from the preprocessed data, a graph data set comprising telecommunication user social network homogeneous graphs and a user mobile phone application bipartite graph, wherein the graph data set contains three types of weighted graphs, namely a voice social network homogeneous graph, a short message social network homogeneous graph and a mobile phone application access bipartite graph, and the edge weights are set by extracting statistical features and aggregating weights according to the characteristics of each service;
S3: constructing a homogeneous graph embedding network for the social network homogeneous graphs and a bipartite graph embedding network for the bipartite graph of user access to mobile phone applications; sampling the user nodes by graph embedding learning to obtain neighbor-node co-occurrence sequences, and then obtaining the embedded representation of each node through iterative training in which the embedding function reconstructs the co-occurrence information with negative sampling; and fusing the embedding features obtained by training as the embedded representation of the user;
S4: extracting, by different participants, local telecommunication user features according to the characteristics of their local data, and jointly training on the local data of the different institutions with a secure federated gradient boosting tree classification model; performing encrypted data sample alignment and encrypted model parameter exchange on the sample data of the different institutions through a trusted third-party server, thereby realizing multi-party joint model training, wherein a two-stage training method is used in the training process: the first-stage training screens the features, the second-stage training performs classification on the screened features, and the final fraud number prediction result is output.
2. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 1, characterized in that step S1 specifically comprises: constructing a fraud number detection data set from the service data of users collected at a telecom operator; dividing the data into four types according to their service characteristics: user basic information data, voice call data, short message communication data and mobile phone application access data; performing data cleaning operations on the collected data, including outlier handling, missing value handling and standardization; and labeling the extracted telecommunication users according to the available telecommunication fraud report information, wherein fraud users are labeled 1 and non-fraud users are labeled 0.
3. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 1, characterized in that, in step S2, constructing the voice and short message social network homogeneous graphs and the bipartite graph of user access to mobile phone applications comprises: for the voice and short message data, extracting the voice social graph $G_1$ of telecommunication users according to the calling and called relationships of voice communication, and constructing the short message social graph $G_2$ according to the uplink and downlink sending and receiving relationships of short message communication; for the user internet log data, summarizing and merging the data according to the records of users accessing mobile phone applications to obtain the mobile phone application access bipartite graph $G_3$; wherein all three types of graph data are weighted graphs, the edge weights of the voice social graph are obtained by weighted evaluation of the communication relationship features between the calling and called parties, the edge weights of the short message social graph are obtained by weighted evaluation of the communication relationship features between the sending and receiving users, and the edge weights of the bipartite graph of user access to mobile phone applications are obtained by weighted evaluation of the internet usage features of users accessing mobile phone applications.
4. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 3, characterized in that, in step S2, constructing the telecommunication user social network homogeneous graphs and the user mobile phone application bipartite graph data set from the preprocessed data specifically comprises:
the voice social network graph is $G_1=(U_1,E_1)$ and the short message social network graph is $G_2=(U_2,E_2)$, where $U_i$ is the set of user nodes and $E_i$ is the set of user-to-user communication relationships; each edge $(i,j)\in E$ in the edge set corresponds to a pair of user nodes $(u_i,u_j)$ and carries a weight $w_{ij}\ge 0$ that represents the interaction between the two users;
for the voice social network graph $G_1$, the directed edge weight $w_{ij}^{(1)}$ between the user pair $(u_i,u_j)$ is obtained by extracting the call feature set $F_1=\{f_1^{(1)},f_2^{(1)},\dots,f_n^{(1)}\}$ between the two parties; the feature set $F_1$ includes, but is not limited to, the number-of-calls feature $f_1^{(1)}$, the total call duration feature $f_2^{(1)}$, the average call duration feature $f_3^{(1)}$, the call time period feature $f_4^{(1)}$, the toll call feature $f_5^{(1)}$, the caller time-on-network feature $f_6^{(1)}$ and the called-party time-on-network feature $f_7^{(1)}$; all elements of the set are then summed with weights to obtain the edge weight:

$$w_{ij}^{(1)}=\sum_{t=1}^{n}\alpha_t f_t^{(1)}$$

where $\alpha_t$ is the weighting coefficient of the $t$-th feature and $n$ is the total number of extracted voice call features;
for the short message social network graph $G_2$, the directed edge weight $w_{ij}^{(2)}$ is obtained by extracting the short message communication feature set $F_2=\{f_1^{(2)},f_2^{(2)},\dots,f_m^{(2)}\}$ between $(u_i,u_j)$; the feature set $F_2$ includes, but is not limited to, the number-of-messages-sent feature $f_1^{(2)}$, the total short message byte count feature $f_2^{(2)}$, the average short message byte count feature $f_3^{(2)}$, the sending time period feature $f_4^{(2)}$, the feature $f_5^{(2)}$ indicating whether the short message is a verification code, the sender time-on-network feature $f_6^{(2)}$ and the receiver time-on-network feature $f_7^{(2)}$; all elements of the set are then summed with weights to obtain the edge weight:

$$w_{ij}^{(2)}=\sum_{t=1}^{m}\beta_t f_t^{(2)}$$

where $\beta_t$ is the weighting coefficient of the $t$-th feature and $m$ is the total number of extracted short message communication features;
the mobile phone application access bipartite graph is $G_3=(U_3,V_3,E_3)$, where $U_3$ is the set of user nodes and $V_3$ is the set of mobile phone application nodes; $E_3\subseteq U_3\times V_3$ is the set of relationship edges for users accessing mobile phone applications, each edge carrying a non-negative weight $w_{ij}\ge 0$ that represents the internet usage of the user accessing the application; for the bipartite graph $G_3$, the directed edge weight $w_{ij}^{(3)}$ between the user-application pair $(u_i,v_j)$ is obtained by extracting the internet access feature set $F_3=\{f_1^{(3)},f_2^{(3)},\dots,f_k^{(3)}\}$ between $(u_i,v_j)$; the feature set $F_3$ includes, but is not limited to, the number-of-accesses feature $f_1^{(3)}$, the total access duration feature $f_2^{(3)}$, the average access duration feature $f_3^{(3)}$, the total traffic consumed feature $f_4^{(3)}$, the average traffic consumed feature $f_5^{(3)}$, the user time-on-network feature $f_6^{(3)}$ and the mobile phone application category feature $f_7^{(3)}$; all elements of the set are then summed with weights to obtain the edge weight:

$$w_{ij}^{(3)}=\sum_{t=1}^{k}\gamma_t f_t^{(3)}$$

where $\gamma_t$ is the weighting coefficient of the $t$-th feature and $k$ is the total number of features extracted from the user application access data.
5. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 1, characterized in that step S3 specifically comprises the following steps:
S31: according to the constructed voice social network homogeneous graph, short message social network homogeneous graph and mobile phone application access bipartite graph, performing graph embedding training on the user nodes with the corresponding graph embedding model for each graph;
S32: finding the neighbor sequence set of each user node according to the first-order and second-order neighbor similarities between the nodes of the homogeneous graph, and finding the neighbor sequence set of each user node according to the explicit and implicit relationships of the bipartite graph;
S33: splicing the node embeddings obtained by first-order similarity training with the node embeddings obtained by second-order similarity training to obtain the embedded vectors of the user nodes of the homogeneous graph, and jointly optimizing the explicit and implicit relationships to obtain the embedded vectors of the user nodes of the bipartite graph.
6. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 5, characterized in that, in step S3, for the homogeneous graph, the user nodes are mapped from the graph domain to the embedding domain; that is, given the index $i$ of a user node, the embedding $\mathbf{u}_i$ of node $u_i$ is obtained directly, and the mapping function is expressed as:

$$f(u_i)=\mathbf{u}_i=e_i^{\top}W$$

where $e_i\in\{0,1\}^N$ is the one-hot encoding of user node $u_i$ (with $N=|U|$ the number of user nodes); that is, the $i$-th element $e_i[i]$ of the vector is 1 and all other elements are 0; $W\in\mathbb{R}^{N\times d}$ is the embedding parameter matrix to be learned, where $d$ is the embedding dimension; the $i$-th row of the matrix $W$ is the embedded representation of node $u_i$;
for the bipartite graph, the user-node-based homogeneous graph $G_U$ is split out and features are extracted from it as the implicit relationship; each node of the bipartite graph is mapped from the graph domain to the embedding domain, with $\mathbf{u}_i$ and $\mathbf{v}_i$ denoting the embedded vectors of user node $u_i\in U_3$ and mobile phone application node $v_i\in V_3$;
the key structural information of the user nodes in the graph domain is extracted, wherein the homogeneous graph network reconstructs the neighborhood information of the nodes based on their first-order and second-order similarities, and the bipartite graph network models and extracts the key structural information of the user nodes in the graph domain according to the explicit and implicit relationships of the graph-domain nodes;
the extracted graph-domain co-occurrence information is reconstructed using the embedded representations in the embedding domain to obtain the reconstructed information;
the objective function based on the co-occurrence information and the reconstructed information is optimized, and the mapping function and all parameters involved in the reconstructor are learned;
for the homogeneous graph, the objective function to be optimized for the first-order similarity is:
the objective function to be optimized for the second-order similarity is:
for the bipartite graph $G_3$, the optimization objective function for modeling the explicit relationship is:
the optimization objective function for modeling the implicit relationship is:
the objective function $O_5$ based on the co-occurrence information and the reconstructed information is optimized, and all parameters involved in the mapping function and the reconstructor are learned; the final overall objective function for joint optimization of the bipartite graph is:

$$\text{maximize}\quad O_5=-\mu O_3+\eta O_4$$

where $\mu$ and $\eta$ are hyperparameters, to be specified, that combine the different components in the joint optimization.
7. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 1, characterized in that step S4 specifically comprises the following steps:
S41: splicing the homogeneous graph and bipartite graph embedded vectors to obtain the final node embedding features, and combining them with the basic user features and the label information as the input of a secure federated gradient boosting tree classification model for first-stage training;
S42: sorting the features obtained from the first-stage training by importance, selecting the top-n ranked features, and distributing them to the different participants for feature optimization;
S43: after the different participants have screened their features, performing the second-stage federated gradient boosting tree classification training and outputting the fraud number prediction results;
S44: processing the final classification results of the users and outputting a list of suspicious fraud numbers.
8. The telecommunication fraud security federation detection method fusing a homogeneous graph and a bipartite graph according to claim 1, characterized in that the two-stage training process of the secure federated gradient boosting tree model comprises encrypted sample alignment and encrypted model training; during training, the central server performs encrypted exchange of the intermediate calculation results and the model parameters to finally obtain the optimal combination of model parameters; the encryption is based on the RSA algorithm and a hash function; during training, local data are computed only locally, the calculation results are encrypted before being transmitted to the central server, and other participants cannot obtain the details of the local data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210397973.8A CN114693317A (en) | 2022-04-08 | 2022-04-08 | Telecommunication fraud security federation detection method fusing homogeneous graph and bipartite graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114693317A (en) | 2022-07-01 |
Family
ID=82142402
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210397973.8A Pending CN114693317A (en) | 2022-04-08 | 2022-04-08 | Telecommunication fraud security federation detection method fusing homogeneous graph and bipartite graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114693317A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8561184B1 (en) * | 2010-02-04 | 2013-10-15 | Adometry, Inc. | System, method and computer program product for comprehensive collusion detection and network traffic quality prediction |
CN110555455A (en) * | 2019-06-18 | 2019-12-10 | 东华大学 | Online transaction fraud detection method based on entity relationship |
CN112866486A (en) * | 2021-02-01 | 2021-05-28 | 西安交通大学 | Multi-source feature-based fraud telephone identification method, system and equipment |
CN113362160A (en) * | 2021-06-08 | 2021-09-07 | 南京信息工程大学 | Federal learning method and device for credit card anti-fraud |
CN113420232A (en) * | 2021-06-02 | 2021-09-21 | 杭州电子科技大学 | Privacy protection-oriented graph neural network federal recommendation method |
CN113569906A (en) * | 2021-06-10 | 2021-10-29 | 重庆大学 | Heterogeneous graph information extraction method and device based on meta-path subgraph |
CN113887577A (en) * | 2021-09-14 | 2022-01-04 | 同济大学 | Fine-grained telecommunication network anti-fraud detection method based on microscopic event map |
Non-Patent Citations (3)
Title |
---|
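JUNSHUAI SONG et al.: "A subgraph-based knowledge reasoning method for collective fraud detection in E-commerce", 《NEUROCOMPUTING》, vol. 461, 21 October 2021 (2021-10-21), pages 587 - 597, XP086797121, DOI: 10.1016/j.neucom.2021.03.134 * |
张林泉: "Research on a fraud number detection method fusing graph embedding and vertical federated learning", 《China Master's Theses Full-text Database, Basic Sciences》, no. 6, 15 June 2023 (2023-06-15), pages 002 - 21 * |
高雅丽: "Research on key technologies for trusted awareness of cyber threat intelligence oriented to big data", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》, no. 2, 15 February 2021 (2021-02-15), pages 139 - 8 * |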
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115664847A (en) * | 2022-12-08 | 2023-01-31 | 南京金科院大学科技园管理有限公司 | User information safe storage method of internet education platform |
CN117009999A (en) * | 2023-09-22 | 2023-11-07 | 中关村科学城城市大脑股份有限公司 | Smart park data storage method, device, equipment and computer readable medium |
CN117009999B (en) * | 2023-09-22 | 2024-01-16 | 中关村科学城城市大脑股份有限公司 | Smart park data storage method, device, equipment and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||