CN117892151A - Network platform fraud detection system - Google Patents

Network platform fraud detection system Download PDF

Info

Publication number
CN117892151A
CN117892151A (Application CN202410078715.2A)
Authority
CN
China
Prior art keywords
node
collusion
module
camouflage
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410078715.2A
Other languages
Chinese (zh)
Inventor
袁璐
姜航顺
程南昌
沈浩
赵晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China filed Critical Communication University of China
Priority to CN202410078715.2A priority Critical patent/CN117892151A/en
Publication of CN117892151A publication Critical patent/CN117892151A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a network platform fraud detection system implemented on the basis of a preset generative adversarial network model. The generative adversarial network model comprises a discriminator and a generator. The discriminator is used for dividing users to be detected into groups so as to acquire group-level information about the users; identifying and removing the camouflage of the users by amplifying collusion patterns based on the group-level information; and detecting whether a user whose camouflage has been removed is a fraudster by capturing temporal and spatial collusion patterns. The generator is configured to generate fraudster samples by generating camouflage and deliberate false displays, competing with the discriminator's detection of camouflaged fraudsters. According to the invention, users are divided into groups, camouflage is identified and removed by amplifying collusion patterns based on the group-level information of each user, and the fraudsters whose camouflage has been removed are detected by capturing temporal and spatial collusion patterns, so that camouflaged fraudsters can be identified more effectively.

Description

Network platform fraud detection system
Technical Field
The invention relates to the technical field of information detection within artificial intelligence, and in particular to a network platform fraud detection system.
Background
Fraud is one of the main threats to the healthy development of network platforms, and its incidence keeps increasing because fraud yields illegal gains easily while carrying a low risk of exposure. To detect fraudsters, spatio-temporal graph neural network models have been widely used to capture temporal and spatial collusion patterns, and research applying machine learning to fraud detection has produced a rich body of settings, methods, theories and results.
A typical fraud implementation flow is shown in figure 1. A fraudster first finds and studies a target, then misleads the target through manipulation, distortion or fabricated evidence, inducing responses that damage the target's own interests. Fraud is therefore different from denial, which refers to actions, such as hiding or interference, taken to discourage or prevent a target from learning something. The basis of denial is hiding and concealing the truth; the basis of fraud is camouflage, presenting false content.
A fraud channel refers to the path along which information flows between the fraudster and the fraud target. In the planning stage of fraud, the fraudster defines a strategic goal; predicts how the target will react; predicts how the target will perceive the fraudulent information; decides which information must be concealed and which must be presented; decides how to conceal and how to present false content; and analyzes the effects of all of the above on the target. In the implementation stage of fraud, false features are delivered through the various fraud channels and must be perceived by the target.
Anti-fraud includes the intelligence activities that identify an adversary's denial and fraud, and the actions taken to eliminate or reduce the impact of that denial and fraud. The most important part of anti-fraud is knowing the fraudsters and identifying their camouflage.
The current detection of network platform fraud mainly comprises: fraud detection in psychology, fraud detection in the healthcare industry, and graph-based anomaly detection.
From a psychological perspective, the focus is on the non-verbal and verbal behavior that individual perceivers can use to detect deception. Researchers believe that deception is accompanied by distinct psychological activity, some of which may leak out when an individual lies. Thus, people often attempt to detect deception from non-verbal indicators such as body movement and eye contact while ignoring, or paying less attention to, verbal signals. For example, DePaulo and Morris performed a meta-analysis of possible predictors of deception. They argue that detecting deception is an imprecise science, and that there is a link between lying and pupil dilation, which is a hallmark of tension and concentration. In addition, they found that listeners judge liars to be more tense than truth-tellers because of their higher vocal pitch, and that liars are more likely to press their lips together than truth-tellers. However, they also note that liars do not appear more fidgety, do not blink more, and do not adopt noticeably less relaxed postures. According to DePaulo and Morris, liars appear unusually still and show markedly reduced eye contact with the audience only when their motivation is stronger and the stakes are higher.
The healthcare industry has developed medical-insurance anti-fraud methods ranging from statistical rules to classical machine learning. Later, deep neural networks were introduced to learn latent fraud patterns, revealing the strength of deep architectures in fraud detection. At the same time, fraud techniques have also been upgraded to become more collusive and covert, and classical deep models cannot detect them because they treat each fraud case as isolated. Recently, Graph Neural Networks (GNNs) have been applied to fraud detection with notable success, because GNNs can efficiently learn latent features from historical interaction behavior. In other words, a GNN can infer fraud probabilities more accurately by learning fraud behavior from the relationship graph.
Graph-based anomaly detection (GAD) refers to identifying anomalies that deviate significantly from the majority of objects in relational and structured data. As graph data becomes ubiquitous and keeps growing, graph-based anomaly detection has received increasing attention for its wide range of applications; for example, Ye and Akoglu proposed spammer identification on graphs, and Weber et al. used graphs for financial fraudster identification. Because of the complex interactions between nodes in real-world systems, detecting anomalies in graph data is more challenging than anomaly detection in non-interactive feature spaces. Given the tendency of fraudsters to connect with large numbers of normal users, conventional GNNs, which rely on the homophily assumption, are limited, and various techniques have been proposed to mitigate this negative effect. In the spectral domain, AMNet and BWGNN both design spectral filters covering multiple frequency bands to find high-frequency anomalies. To select important neighbors, CARE-GNN and AOGNN both use reinforcement learning modules, rewarding neighbor-to-self-node similarity measurements and AUC performance, respectively. PC-GNN, in contrast, directly measures anomaly probability gaps by training an additional MLP that takes only node attributes as input.
However, the above-mentioned existing fraud detection techniques rarely give enough attention to the camouflage behavior of the fraudsters, and many existing graph neural network models face the challenge of extreme sample imbalance caused by rare fraudsters hidden in massive users, so that the fraud detection schemes are imperfect and the detection accuracy is low.
Disclosure of Invention
In view of problems such as poor detection accuracy in the current false information detection field, the invention aims to provide a network platform fraud detection system that divides users into groups and detects fraudsters based on group-level information.
In one aspect, the invention provides a network platform fraud detection system implemented on the basis of a preset generative adversarial network model, wherein the generative adversarial network model includes a discriminator and a generator, and wherein the discriminator is used for detecting fraudsters under camouflage and includes:
The group dividing module is used for dividing users to be detected into groups so as to acquire information of the users at the group level;
A disguise identification module for identifying and removing disguises of the users by amplifying collusion patterns based on the information of the group level;
A fraud detection module for detecting whether a user whose camouflage has been removed is a fraudster by capturing temporal and spatial collusion patterns;
The generator is used for generating fraudster samples by generating camouflage and deliberate false displays, thereby competing with the discriminator's detection of camouflaged fraudsters;
wherein the discriminator and the generator share a space-time graph, and each node and each edge of the space-time graph are converted into node features and edge features of the space-time graph by two two-layer perceptrons.
The group dividing module divides the users to be detected into m groups through cluster contrastive learning using a Gaussian mixture model, denoted C = {c_1, c_2, ..., c_m}, where each group c_i = {v_1, ..., v_{n_i}}, C is the group set, c_i represents one group, v represents a node in it, and n_i = |c_i| is the number of users in group c_i; m is a hyper-parameter.
Optionally, the group dividing module further
uses contrastive learning to pull node v and its positive-sample neighbors toward each other in the embedding space while keeping node v away from other nodes, so that the output h(v) of the group dividing module contains more graph-structure information; and
temporal correlation is used to weight the positive samples so as to reduce the weight of camouflage edges, yielding the group division loss L_gd,
where L_gd denotes the group division loss, M is the set of negative samples, randomly selected from V excluding the neighbors of node v, h(v_i) is the output of node v_i through the RNN, and h(v_j) is the output of a negative-sample node v_j through the RNN; E and h are updated using the latest generic graph and the graph trajectory, respectively.
Wherein, optionally, the disguise identification module includes:
a mask generation unit, configured to output a mask for each edge related to the spoofed node in each snapshot map; wherein masquerading of the user node during t is reflected on node features and edge features at the snapshot map Gt;
An edge feature updating unit configured to update an edge feature by multiplying the edge feature by an element-by-element of the mask;
and the disguise removing unit is used for adjusting the alignment of the corresponding node characteristic and the edge characteristic so as to remove disguise in the snapshot graph.
Optionally, the mask generating unit includes:
an embedding generating unit for generating a mask embedding H'_t(m_ij) for each edge related to a fraudster node, given the output of the time embedding layer and the edge embedding H_t(E), where m_ij denotes the edge related to the fraudster node;
an embedding conversion unit for converting each mask embedding into a mask value, the mask value being a continuous value between 0 and 1;
wherein a smooth approximation of the tanh function, with learnable parameters W and b and a hyper-parameter β controlling the strength of the smooth approximation, is used to generate the mask ŷ_t(m_ij) of each edge related to a fraudster node.
Optionally, the generator comprises a camouflage generation module and a deliberate false-display generation module; wherein
the camouflage generation module is used for adding camouflage to existing fraudsters and includes:
a camouflage edge feature generating unit for outputting a camouflage edge feature for each edge related to a fraudster node in each snapshot graph, where, for the connection between each fraudster and other users in each snapshot graph, the camouflage edge feature contains all camouflage-related information, whether or not the edge exists in the original graph;
a sample edge feature generating unit for generating the edge features of a new fraudster sample by adding the generated camouflage edge feature element-wise to the initial edge feature x_t(e_ij);
a feature alignment unit for aligning the edge features of the newly generated fraudster sample with the node features so as to generate the camouflage of the nodes;
the deliberate false-display generation module is used for adding false-display content vectors to existing fraudsters and, together with the content generated by the camouflage generation module, forming fraudster samples.
Optionally, the system further comprises an adversarial optimization module for identifying, by the camouflage identification module, the camouflage generated by the camouflage generation module, and for detecting, by the camouflage identification module and the fraud detection module, the fraudsters in the fraudster samples.
Optionally, identifying the camouflage generated by the camouflage generation module by the camouflage identification module comprises:
given a group division C;
fixing the parameters θ2 of the camouflage identification module, optimizing the parameters φ of the camouflage generation module by maximizing the collusion losses L_tc and L_sc and maximizing the camouflage recognition loss L_cr, and optimizing the parameters ψ of the false-display generation module by maximizing the collusion losses L_tc and L_sc;
fixing φ and ψ, optimizing θ2 by minimizing the collusion losses L_tc and L_sc and minimizing the camouflage recognition loss L_cr.
Optionally, detecting the fraudsters in the fraudster samples by the camouflage identification module and the fraud detection module comprises: fixing θ2 and θ3, the generator optimizes φ and ψ by maximizing the fraud detection error L_fd; fixing φ and ψ, the discriminator optimizes θ2 and θ3 by minimizing the fraud detection error L_fd.
According to the network platform fraud detection system provided by the invention, users are divided into groups, camouflage is identified and removed by amplifying collusion patterns based on the group-level information of each user, and the fraudsters whose camouflage has been removed are detected by capturing temporal and spatial collusion patterns, so that camouflaged fraudsters can be identified more effectively.
Drawings
Other objects and attainments together with a more complete understanding of the invention will become apparent and appreciated by referring to the following description taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 is a schematic diagram of a general spoofing implementation flow;
FIG. 2 is a schematic diagram of a logic structure of a network platform fraud detection system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a detection flow of the applied network platform fraud detection system according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an optimization flow of the discriminator according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an optimization flow of a generator according to an embodiment of the invention.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing one or more embodiments.
To address the problems of existing false information detection methods, the invention provides a solution that integrates psychology and human cognition, comprehensively evaluating the authenticity and credibility of information by means of psychological principles so as to improve the accuracy of false information detection.
In order to better explain the technical scheme of the invention, the following will briefly explain some technical terms related to the invention.
Generative adversarial network (GAN): a conventional GAN mainly consists of a generative network and an adversarial (discriminative) network, also called the generator and the discriminator. In a GAN the two networks are trained alternately: the discriminator is first trained to distinguish real samples from fake ones, and the generator is then trained so that the samples it generates are judged as real by the discriminator.
The generator is the network in a GAN that synthesizes new samples; starting from an input such as noise or a conditioning signal, it learns to produce outputs that the discriminator cannot distinguish from real data.
The discriminator is the network in an adversarial neural network that identifies the data produced by the generator; by being trained against the generator so that generated samples and real samples receive opposite outputs, it improves classification precision.
Space-time diagrams are a geometric language used to analyze physical problems; they describe events and their locations in space-time using world lines and reference frames.
The RNN (Recurrent Neural Network ) model is a model that can be used specifically to process time series data, and it generally takes the series data as input, effectively captures the characteristics of the relationships between the sequences through the structure design inside the network, and generally outputs the sequences. The RNN is most different from the traditional neural network in that the previous output result is brought into the next hidden layer to be trained together each time.
Gaussian mixture model (Gaussian Mixture Models, GMM) is a clustering method in unsupervised learning, which refers to the linear combination of a plurality of Gaussian distribution functions, and theoretically, the GMM can fit any type of distribution. Gaussian mixture models are often used to solve the situation where data in the same collection contains multiple different distributions, with particular applications being clustering, density estimation, generating new data, etc.
The bidirectional long-short term memory network (BiLSTM) is an improved method based on a Recurrent Neural Network (RNN), is formed by combining a forward LSTM and a backward LSTM, and can better capture the bidirectional semantic dependency and the expression of emotion words.
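As an illustrative, non-limiting sketch of one of the building blocks named above, a bidirectional LSTM over a sequence of per-snapshot embeddings could be written as follows in PyTorch; the class name, dimensions and data shapes are assumptions for illustration, not values given in this specification:

```python
import torch
import torch.nn as nn

class SnapshotBiLSTM(nn.Module):
    """Bidirectional LSTM over per-snapshot node embeddings (illustrative sketch)."""
    def __init__(self, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, x):
        # x: (num_nodes, num_timesteps, embed_dim) - one embedding per node per snapshot
        out, _ = self.bilstm(x)   # (num_nodes, num_timesteps, 2 * hidden_dim)
        return out

# usage: 5 nodes, 4 snapshots, 16-dimensional embeddings
h = SnapshotBiLSTM(16, 32)(torch.randn(5, 4, 16))
print(h.shape)  # torch.Size([5, 4, 64])
```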
A close study of real fraud records reveals that fraudsters generally camouflage themselves at the group level, specifically through temporal camouflage and spatial camouflage. Such camouflage lets a fraud group pass as a benign group by hiding its collusion pattern, thereby deceiving many existing graph neural network models. Specifically, to mask the temporal collusion pattern, a fraudster introduces temporal camouflage by engaging in illegal activities that are not confined to short time windows. In addition, a fraudster uses spatial camouflage by deliberately associating himself with benign individuals, which effectively hides the spatial collusion pattern. The invention therefore starts from an analysis of the camouflage behavior of fraudsters and improves camouflage identification by recognizing the group information of fraudsters.
To address the problem of low accuracy in existing detection, the invention provides a network platform fraud detection system, proposing Adversarial Generation Fraud Detection (AGFD) based on fraud and anti-fraud theory; in particular, it can also be used to detect false information carrying fraudulent intent, thereby improving the accuracy of both fraud detection and false information detection.
Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
It should be noted that the following description of the exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. Techniques and equipment known to those of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
To illustrate the network platform fraud detection system provided by the present invention, fig. 2 and fig. 3 respectively show, by way of example, the framework of a network platform fraud detection system according to an embodiment of the present invention and the flow of its application. It should be noted that the drawings and the embodiments described below are only some implementations of the present invention and are not limiting. Other figures and implementations may be derived from them by those of ordinary skill in the art without inventive effort.
Referring to fig. 2, the network platform fraud detection system 200 provided in this embodiment is implemented on the basis of a preset generative adversarial network model. The model consists of a discriminator 220 and a generator 240, which together contain five modules responsible, respectively, for group division, camouflage identification, fraud detection, camouflage generation and false-display generation. The first three modules (group dividing module 222, camouflage identification module 224 and fraud detection module 226) constitute the discriminator 220, which detects fraudsters under camouflage; the latter two modules (camouflage generation module 242 and false-display generation module 244) constitute the generator 240, which generates fraudster samples by generating camouflage and deliberate false displays, competing with the detection of camouflaged fraudsters performed by the discriminator 220.
The discriminator 220 uses a spatio-temporal graph neural network as its base model. The group dividing module 222 is intended to divide users into groups and provide group-level information; its purpose is to identify the groups among all users and extract group-level information. The camouflage identification module 224 detects camouflage behavior by amplifying the collusion patterns of fraud groups on the basis of the group division, which is a precondition for fraud detection by the fraud detection module. After the detected camouflage has been removed, the fraud detection module 226 employs the spatio-temporal graph neural network to capture temporal and spatial collusion patterns and detect fraudsters. The generator 240 competes with the discriminator 220 by generating camouflage and deliberate false displays to produce fraudster samples, which mitigates the challenge of sample imbalance and improves the robustness of the model.
Accordingly, the detection flow of the network platform fraud detection system shown in fig. 3 mainly includes a fraud discrimination step S100 and a fraud generation step S200; the fraud discrimination step S100 is used for detecting fraudsters under camouflage and includes:
S110: dividing users to be detected into groups to acquire group-level information about the users;
S120: identifying and removing the camouflage of the users by amplifying collusion patterns based on the group-level information;
S130: detecting whether a user whose camouflage has been removed is a fraudster by capturing temporal and spatial collusion patterns;
the fraud generation step S200 is used for generating fraudster samples by generating camouflage and deliberate false displays, competing with the detection of camouflaged fraudsters in the fraud discrimination step S100.
Through the above fraud detection flow, and in view of the technical problems in the prior art, the fraud detection system mainly improves camouflage identification by recognizing the group information of fraudsters, so as to realize fraud detection based on fraud and anti-fraud theory.
The above-described network platform fraud detection system 200 of the present invention and the fraud detection flow of the network platform fraud detection system will be described in connection with more detailed embodiments.
The inference phase uses only the spoofing identification module 224 and the spoofing detection module 226. In order to support new registered users in the actual business scenario, the inference phase does not use group partitioning. This enables AGFD to handle newly added nodes in generic graphs and graph trajectories, while also being useful for false information detection with deception.
The trainable parameters of the group dividing module 222, the camouflage identification module 224, the fraud detection module 226, the camouflage generation module 242 and the false-display generation module 244 are written as θ1, θ2, θ3, φ and ψ, respectively; the values of these five parameter sets are continuously optimized and updated during training of the deep model.
The five modules share one space-time graph; that is, the discriminator and the generator share a space-time graph, and each node and each edge of the space-time graph are converted into node features and edge features of the space-time graph by two two-layer perceptrons. Specifically, as an example, each node of the graph trajectory first passes through a two-layer perceptron, which transforms the feature X_t(V) of the nodes V in time period t into the node features H_t(V) of the space-time graph; each edge likewise passes through a two-layer perceptron, which transforms the feature X_t(E_t) of the edges in time period t into the edge features H_t(E_t) of the space-time graph, where d_v and d_e are the dimensions of the node embedding and the edge embedding for the nodes and edges present in time period t. The space-time layer consists of a spatial embedding layer and a temporal embedding layer.
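The following sketch illustrates the idea of encoding raw node and edge features with two two-layer perceptrons; all layer sizes, counts and names are illustrative assumptions rather than values given in this specification:

```python
import torch
import torch.nn as nn

def two_layer_mlp(in_dim: int, hidden_dim: int, out_dim: int) -> nn.Sequential:
    """A two-layer perceptron projecting raw features into the space-time graph feature space."""
    return nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, out_dim))

node_encoder = two_layer_mlp(in_dim=10, hidden_dim=32, out_dim=16)  # X_t(V)  -> H_t(V)
edge_encoder = two_layer_mlp(in_dim=6, hidden_dim=32, out_dim=16)   # X_t(E_t) -> H_t(E_t)

X_t_V = torch.randn(100, 10)   # 100 nodes with 10 raw features in period t
X_t_E = torch.randn(250, 6)    # 250 edges with 6 raw features in period t
H_t_V, H_t_E = node_encoder(X_t_V), edge_encoder(X_t_E)
print(H_t_V.shape, H_t_E.shape)  # torch.Size([100, 16]) torch.Size([250, 16])
```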
The spatial embedding layer aims at mining the spatial patterns of interactions between users in each snapshot G_t (an undirected static graph containing the information of one time period t). The inputs corresponding to snapshot G_t are H_t(V) and H_t(E_t). The spatial embedding layer consists of one graph attention layer, in which node embeddings are updated with neighbor information. The spatial embedding layers of the group dividing module 222, the camouflage identification module 224, the fraud detection module 226 and the camouflage generation module 242 share the same parameters to avoid overfitting.
The input to the temporal embedding layer is the output of the spatial embedding layer at each time step. In the temporal embedding layer, an RNN model is used to capture the temporal patterns of this sequence. The output of the RNN model for node v is written as h(v).
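A minimal sketch of such a space-time layer (one graph-attention spatial layer per snapshot followed by an RNN over time), assuming PyTorch Geometric is available; the module names, dimensions and the fixed node set are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

class SpaceTimeLayer(nn.Module):
    """Spatial embedding (graph attention per snapshot) + temporal embedding (RNN over snapshots)."""
    def __init__(self, dim: int):
        super().__init__()
        self.spatial = GATConv(dim, dim, edge_dim=dim)   # one attention layer, shared across modules
        self.temporal = nn.GRU(dim, dim, batch_first=True)

    def forward(self, snapshots):
        # snapshots: list of (H_t_V, edge_index, H_t_E) for consecutive time periods
        # sketch simplification: assumes a fixed node set across snapshots
        per_step = [self.spatial(h_v, edge_index, edge_attr=h_e)
                    for h_v, edge_index, h_e in snapshots]
        seq = torch.stack(per_step, dim=1)   # (num_nodes, num_timesteps, dim)
        out, _ = self.temporal(seq)          # temporal patterns over the sequence
        return out[:, -1, :]                 # h(v): last-step summary for each node
```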
Fig. 4 shows an optimization flow of the arbiter according to an embodiment of the present invention.
As shown in fig. 4, the optimization of the discriminator 220 can be split into two stages. In the first stage, the group dividing module 222 is optimized to divide the users into groups; initially, the group division is based on the initial generic graph G without considering camouflage. In the second stage, given the group division C, the camouflage identification module 224 identifies camouflage by amplifying the collusion patterns; after the camouflage is removed, the fraud detection module 226 detects fraudsters accurately. The modules of the two stages are optimized alternately: after camouflage is removed in the second stage, the generic graph is updated and used in the next round of optimization of the group dividing module. After several rounds of optimization, when the discriminator has converged, optimization of the generator begins.
The specific implementations of the group partitioning module 222, the spoofing identification module 224, and the spoofing detection module 226 are described in the following as examples.
Specifically, as an example, the goal of the group dividing module 222 is to assign users to m groups, denoted as the group set C = {c_1, c_2, ..., c_m}, where each group c_i = {v_1, ..., v_{n_i}}, C is the group set, c_i represents one group, v represents a node in it, and n_i = |c_i| is the number of users in group c_i. Note that each group c_i is a subgraph of the initial generic graph G. The groups are non-overlapping, i.e. c_i ∩ c_j = ∅ for i ≠ j.
Since the generic graph G is constantly changing, the group dividing module 222 always divides users into groups according to the latest generic graph (the current state of graph G). Cluster contrastive learning is commonly used for group detection, and the invention follows this approach to divide the groups. Based on the output of the last time step of the RNN, which contains the RNN's summary of the entire sequence, all nodes are divided into m groups using a Gaussian mixture model, where m is a hyper-parameter. Throughout training of the deep model, the obtained groups are labeled as fraud groups or benign groups according to the user-level labels. Notably, the fraudster samples created by the generator 240 do not take part in group detection; they keep their original groups.
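A minimal sketch of this group division step, fitting a Gaussian mixture model on the last-step RNN outputs; the function name, covariance type and all numbers are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def divide_into_groups(h_last: np.ndarray, m: int, seed: int = 0) -> np.ndarray:
    """Assign each user to one of m groups by fitting a GMM on the last-step RNN outputs h(v)."""
    gmm = GaussianMixture(n_components=m, covariance_type="diag", random_state=seed)
    return gmm.fit_predict(h_last)  # one group index per node

# usage: 1000 users, 32-dimensional summaries, m = 20 groups (all values illustrative)
groups = divide_into_groups(np.random.randn(1000, 32), m=20)
```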
The optimization objective of the group dividing module 222 considers both the graph structure and the node attributes. Using contrastive learning, node v and its neighbors (positive samples) are pulled toward each other in the embedding space while node v is kept away from other nodes (negative samples), so that the output h(v) of the group dividing module contains more graph-structure information. Initially, camouflage may be detrimental to group division, because the camouflage edges associated with spatial camouflage always connect benign and fraudulent nodes. To facilitate group division early in training, temporal correlation is used to weight the positive samples; by measuring behavioral similarity, the temporal correlation reduces the weight of camouflage edges. This yields the group division loss L_gd,
where L_gd denotes the group division loss, M is the set of negative samples, randomly selected from node set V excluding the neighbors of node v; h(v_i) is the output of node v_i through the RNN, h(v_j) is the output of a negative-sample node v_j through the RNN, and E and h are updated using the latest generic graph and the graph trajectory, respectively.
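The exact form of L_gd is not reproduced above; the following is one plausible contrastive formulation consistent with the description (temporal-correlation weights on positive neighbors, random negatives excluding neighbors) and should be read as an assumption, not the specification's equation:

```python
import torch
import torch.nn.functional as F

def group_division_loss(h, pos_pairs, pos_weights, neg_index, tau: float = 0.5):
    """Plausible InfoNCE-style L_gd: pull weighted positive neighbors together, push random negatives apart.

    h          : (N, d) RNN outputs h(v)
    pos_pairs  : (P, 2) long tensor of (v_i, v_j) neighbor pairs from the latest generic graph
    pos_weights: (P,) temporal-correlation weights that down-weight camouflage edges
    neg_index  : (P, K) long tensor of K random negatives per pair, excluding neighbors of v_i
    """
    h = F.normalize(h, dim=-1)
    vi, vj = pos_pairs[:, 0], pos_pairs[:, 1]
    pos_sim = (h[vi] * h[vj]).sum(-1) / tau                            # (P,)
    neg_sim = torch.einsum("pd,pkd->pk", h[vi], h[neg_index]) / tau    # (P, K)
    logits = torch.cat([pos_sim.unsqueeze(1), neg_sim], dim=1)
    loss = F.cross_entropy(logits, torch.zeros(len(vi), dtype=torch.long), reduction="none")
    return (pos_weights * loss).mean()
```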
Before the camouflage identification module 224 is introduced, it is first explained how camouflage manifests itself in the snapshot graphs and how it is removed from them. Camouflage during period t is reflected in the node features and edge features of snapshot graph G_t. In one embodiment of the invention, the de-camouflage operation takes the edges as its entry point, and the node features are then adjusted according to the edge features.
In one embodiment of the present invention, the masquerading recognition module includes a mask generating unit, an edge feature updating unit, and a masquerading removing unit (not shown in the figure).
Specifically, to remove camouflage, the camouflage identification module 224 outputs, via a mask generating unit, a mask for each edge related to a fraudster node in each snapshot graph. In the camouflage identification module 224, the time embedding layer is a BiLSTM, so that node information from different time steps is mutually enhanced. As an example, the mask generating unit further includes an embedding generating unit and an embedding conversion unit.
The embedding generating unit generates a mask embedding H'_t(m_ij) for each edge related to a fraudster node, given the output of the time embedding layer and the edge embedding H_t(E), where m_ij denotes the edge related to the fraudster node. The embedding conversion unit converts each mask embedding into a mask value, a continuous value between 0 and 1. To this end, a smooth approximation of the tanh function is used to generate the mask of each edge related to a fraudster node,
where W and b are learnable parameters and β is a hyper-parameter controlling the strength of the smooth approximation; the output is the mask ŷ_t(m_ij) of the edge related to the fraudster node. For convenience, the nodes and edges to which camouflage corresponds are called camouflage nodes and camouflage edges, respectively. For normal edges the mask is close to 1, whereas for camouflage edges the mask should be less than 1. The edge feature updating unit then updates each edge feature by multiplying it element-wise (denoted ⊙) with its mask. Finally, the camouflage removing unit adjusts the corresponding node features to align with the updated edge features. In this way, the camouflage in the snapshot graph can be removed, and the edge weight a_ij of a camouflage edge in the generic graph G is correspondingly reduced.
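A sketch of this mask generation and edge update; the specification's exact mask formula is not reproduced above, so the smooth mapping below (a shifted, β-scaled tanh squashed into (0, 1)) is only one plausible choice, and the class name and default β are illustrative:

```python
import torch
import torch.nn as nn

class CamouflageMask(nn.Module):
    """Plausible mask generation + edge update sketch (not the specification's exact formula)."""
    def __init__(self, edge_dim: int, beta: float = 5.0):
        super().__init__()
        self.W = nn.Linear(edge_dim, 1)   # learnable W and b
        self.beta = beta                  # strength of the smooth approximation

    def forward(self, mask_embed, edge_feat):
        # mask value in (0, 1): close to 1 for normal edges, < 1 for camouflage edges
        y = 0.5 * (torch.tanh(self.beta * self.W(mask_embed)) + 1.0)
        return y, edge_feat * y           # element-wise update H_t(E) ⊙ y
```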
It should be noted that, in the notation of the invention, a symbol without a hat and the same symbol with a hat are distinguished as follows: the symbol without a hat refers to real data, and the symbol with a hat refers to generated or predicted (fake) data. Illustratively, y_t(m) represents a mask label, and ŷ_t(m) represents the predicted mask of an edge related to a fraudster node.
The output of the fraud detection module 226 is the detection result. The spatial and temporal embedding layers mine the spatial and temporal collusion patterns to assist fraud detection. Based on the output of the camouflage identification module 224, which is also the output of the last RNN step, a linear layer and a softmax are used to generate ŷ_v, the probability that node v is a fraudster, as the output of the overall fraud detection module 226.
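A minimal sketch of this detection head (linear layer followed by softmax over the last-step embedding); the class name and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class FraudDetectionHead(nn.Module):
    """Linear layer + softmax over the last-step embedding, giving the probability that node v is a fraudster."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, 2)   # benign vs fraudster

    def forward(self, h_last):
        # h_last: (num_nodes, dim) output of the camouflage identification module / last RNN step
        return torch.softmax(self.linear(h_last), dim=-1)[:, 1]  # ŷ_v
```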
Optimization objectives of camouflage identification and fraud detection: the optimization objectives of these two modules are taken from the supervisor's perspective, i.e. accurately identifying camouflage and detecting fraudsters. Specifically, as an example, in another embodiment of the invention the fraud detection module 226 further includes a camouflage identification optimizing unit (not shown) for optimizing camouflage identification using the collusion losses and the camouflage recognition loss after the users' camouflage has been removed, and for guiding both the camouflage identification module 224 and the fraud detection module 226 to detect fraudsters using the fraud detection loss.
The collusion losses include a temporal collusion loss L_tc and a spatial collusion loss L_sc, which guide the camouflage identification module to remove camouflage edges by maximizing the increase of the temporal and spatial collusion scores after the masking operation relative to the initial collusion scores. The temporal collusion loss L_tc is computed, for each group c_i, from its temporal collusion score after the masking operation, based on the output ŷ_t(m) of the camouflage identification module. Similarly, the spatial collusion loss L_sc is computed, for each group c_i, from its spatial collusion score after the masking operation, based on the output of the camouflage identification module.
The camouflage recognition loss guides the discriminative model to remove camouflage through a supervision signal. Initially, the supervision signal can only indicate the spatial camouflage between benign users and fraudsters; after the first round of optimization, the generator provides additional supervision signals. On this basis, the masks can be learned in a supervised manner as a binary classification task: the masks of camouflage edges generated by the generator are optimized toward 0, and the masks of the other edges are optimized toward 1. The camouflage recognition loss L_cr is computed with cross entropy over the edges E of the space-time graph, where y_t(m_ij) represents the mask label, ŷ_t(m_ij) represents the mask of an edge related to a fraudster node computed as described above, and the camouflage edges produced by the generator provide the camouflage-class supervision.
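A plausible cross-entropy sketch of L_cr consistent with the description (generator-produced camouflage edges pushed toward mask 0, all other edges toward mask 1); the function name and clamping are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def camouflage_recognition_loss(pred_mask, is_camouflage):
    """Plausible L_cr sketch: cross entropy pushing camouflage-edge masks toward 0 and normal-edge masks toward 1."""
    target = 1.0 - is_camouflage.float()   # label 0 for generator-produced camouflage edges, 1 otherwise
    return F.binary_cross_entropy(pred_mask.clamp(1e-6, 1 - 1e-6), target)
```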
In the second stage of the discriminator, the objective for optimizing camouflage identification combines the temporal collusion loss, the spatial collusion loss and the camouflage recognition loss, weighted by the hyper-parameters γ1, γ2 and γ3, together with the fraud detection loss measured on fraudsters.
The fraud detection loss L_fd is used for optimizing the fraud detection module and detecting whether the users in a group engage in fraudulent activity. It is computed with cross entropy,
where y_v represents the label of node v and ŷ_v the predicted probability that node v is a fraudster; in the fraud context, y_v = 1 denotes a fraudster and y_v = 0 denotes a benign user.
In one specific embodiment of the invention, the generator includes a camouflage generation module 242 and a false-display generation module 244, which together create new fraudster samples. FIG. 5 is a schematic diagram of an optimization flow of the generator according to an embodiment of the invention; as shown in fig. 5, a fraudster sample is created by adding camouflage and false-display content to existing fraudsters. In this way, the generator learns to create fake fraud samples by incorporating feedback from the discriminator, in particular so as to make the discriminator classify its output as benign users. After several rounds of optimization, when the generator has converged, the generated fraud samples can be used to augment the fraud samples in the training set.
The camouflage generation module 242 is used to add camouflage to existing fraudsters. It specifically comprises:
a camouflage edge feature generating unit for outputting a camouflage edge feature for each edge related to a fraudster node in each snapshot graph; for the connection between each fraudster and other users in each snapshot graph, whether or not the edge exists in the original graph, this feature contains all camouflage-related information. To ensure that the camouflage generation module can only change the initial features by adding camouflage, in one embodiment of the invention a ReLU is used so that every element of the generated camouflage edge feature is no less than 0;
a sample edge feature generating unit for adding the generated camouflage edge feature element-wise to the initial edge feature x_t(e_ij) to generate the edge features of the new fraudster sample; if there is no edge between v_i and v_j in the original snapshot, every entry of x_t(e_ij) is filled with zeros;
and a feature alignment unit which, similarly to the camouflage-removal operation, aligns the edge features of the newly generated fraudster sample with the node features so as to generate the camouflage of the nodes. This process is shown in fig. 5; unlike the removal operation, the generic graph G is not updated.
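A minimal sketch of the camouflage generation step described above (ReLU-constrained camouflage edge features added element-wise to the initial edge features); the class name, layer shape and dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class CamouflageGenerator(nn.Module):
    """Sketch: produce non-negative camouflage edge features and add them to the initial edge features."""
    def __init__(self, edge_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(edge_dim, edge_dim), nn.ReLU())  # ReLU keeps every element >= 0

    def forward(self, x_t_e):
        # x_t_e: initial edge features x_t(e_ij); zeros where no edge exists in the original snapshot
        camouflage = self.net(x_t_e)
        return x_t_e + camouflage      # element-wise addition gives the new fraudster sample's edge features
```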
Optimization objective of camouflage generation. Camouflage behavior is viewed from the fraudster's perspective, i.e. avoiding discovery at minimal cost. Intuitively, the more camouflage is added, the higher the cost. The generator therefore follows a basic principle: add as little camouflage as possible while hiding the collusion patterns as much as possible. The objective function of the camouflage generation module accordingly consists of two parts. The first part competes with the discriminator and includes maximizing the fraud detection loss, the camouflage recognition loss and the collusion losses. To account for the camouflage cost, L1 regularization is used to ensure that only a small amount of camouflage is generated.
The loss of the camouflage generation module thus combines the fraud detection loss, the spatial collusion loss, the temporal collusion loss and the camouflage recognition loss of the camouflage generation module with an L1 regularization term, which also helps prevent overfitting; γ4, γ5, γ6 and γ7 are hyper-parameters.
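The exact weighting and sign conventions are not reproduced above; the following is one plausible combination under the stated principle (oppose the discriminator's losses, keep the generated camouflage sparse via L1), with all weights illustrative:

```python
def camouflage_generation_objective(l_fd, l_sc, l_tc, l_cr, camouflage,
                                    g4=0.01, g5=1.0, g6=1.0, g7=1.0):
    """Plausible generator-side objective: maximize the discriminator's losses, penalize camouflage amount (L1)."""
    adversarial_part = -(l_fd + g5 * l_sc + g6 * l_tc + g7 * l_cr)  # minimizing this maximizes the losses
    sparsity_part = g4 * camouflage.abs().sum()                     # L1 cost of the generated camouflage
    return adversarial_part + sparsity_part                          # minimized by the generator
```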
The generated camouflage is further used as a supervision signal to optimize the camouflage identification module. The amount of camouflage generated is small, which makes the supervision signal imbalanced; this ensures that only a few edges are judged to be camouflage edges by the camouflage identification module, keeping camouflage identification consistent with camouflage generation.
The false-display generation module 244 is configured to add a false-display content vector to an existing fraudster by element-wise addition and, together with the content output by the camouflage generation module 242, to compose a fraudster sample.
In one embodiment of the invention, unlike the camouflage generation module, the false-display generation module has no corresponding identification module as its adversary, so it is optimized only by maximizing the fraud detection loss and the collusion losses, with the same L1 regularization.
The overall loss of the false-display generation module thus combines its fraud detection loss, its spatial collusion loss and its temporal collusion loss, where γ8 and γ9 are hyper-parameters. The overall loss of the generator is then the combination of the losses of the camouflage generation module and the false-display generation module.
The adversarial optimization process of the discriminator and the generator is exemplified below.
The generator competes with the discriminator in two ways. In one embodiment of the invention, the network platform fraud detection system 200 further includes an adversarial optimization module (not shown) for identifying, by the camouflage identification module 224, the camouflage generated by the camouflage generation module, and for detecting, by the camouflage identification module 224 and the fraud detection module 226, the fraudsters in the fraudster samples.
First, the camouflage generation module directly outputs camouflage. Its adversary, the camouflage identification module, attempts to identify the camouflage generated by the camouflage generation module. From an optimization perspective, given the group division C and with θ2 fixed, the camouflage generation module optimizes φ by maximizing the collusion losses L_tc and L_sc and maximizing the camouflage recognition loss L_cr, to the detriment of the camouflage identification module; the false-display generation module optimizes ψ by maximizing the collusion losses L_tc and L_sc. Conversely, with φ and ψ fixed, the camouflage identification module optimizes θ2 by minimizing the collusion losses L_tc and L_sc and minimizing the camouflage recognition loss L_cr.
On the other hand, the generator generates fraudster samples, and the camouflage identification module and the fraud detection module attempt to detect the fraudsters. Similarly, with θ2 and θ3 fixed, the generator optimizes φ and ψ by maximizing the fraud detection error L_fd, to the detriment of both modules; conversely, with φ and ψ fixed, the discriminator optimizes θ2 and θ3 by minimizing the fraud detection error L_fd. In this way, the discriminator and the generator are optimized adversarially.
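A compact sketch of this alternating scheme; the update order, optimizer choice and loss plumbing are illustrative assumptions rather than steps prescribed by the specification:

```python
import torch

def adversarial_round(compute_l_fd, opt_discriminator, opt_generator):
    """One adversarial round: the discriminator minimizes L_fd, then the generator maximizes it."""
    # discriminator step: only θ2 and θ3 are updated (φ, ψ remain fixed by this optimizer)
    opt_discriminator.zero_grad()
    compute_l_fd().backward()
    opt_discriminator.step()

    # generator step: only φ and ψ are updated; maximizing L_fd = minimizing its negative
    opt_generator.zero_grad()
    (-compute_l_fd()).backward()
    opt_generator.step()
```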
The network platform fraud detection system according to the present invention is described above by way of example with reference to the accompanying drawings. Those skilled in the art will appreciate that various modifications may be made to the network platform fraud detection system of the present invention described above without departing from the spirit of the present invention. Accordingly, the scope of the invention should be determined from the following claims.

Claims (10)

1. A network platform fraud detection system, characterized in that it is realized on the basis of a preset generative adversarial network model; wherein the generative adversarial network model includes a discriminator and a generator;
wherein the discriminator is used for detecting fraudsters under camouflage and includes:
a group dividing module for dividing users to be detected into groups so as to acquire group-level information about the users;
a camouflage identification module for identifying and removing the camouflage of the users by amplifying collusion patterns based on the group-level information;
a fraud detection module for detecting whether a user whose camouflage has been removed is a fraudster by capturing temporal and spatial collusion patterns;
the generator is used for generating fraudster samples by generating camouflage and deliberate false displays, thereby competing with the discriminator's detection of camouflaged fraudsters;
wherein the discriminator and the generator share a space-time graph, and each node and each edge of the space-time graph are converted into node features and edge features of the space-time graph by two two-layer perceptrons.
2. The network platform fraud detection system of claim 1, wherein the group dividing module divides the users to be detected into m groups through cluster contrastive learning using a Gaussian mixture model, denoted C = {c_1, c_2, ..., c_m}, where each group c_i = {v_1, ..., v_{n_i}}, C is the group set, c_i represents one group, v represents a node in it, and n_i = |c_i| is the number of users in group c_i; m is a hyper-parameter.
3. The network platform fraud detection system of claim 2, wherein the group dividing module further uses contrastive learning to pull node v and its positive-sample neighbors toward each other in the embedding space while keeping node v away from other nodes, so that the output h(v) of the group dividing module contains more graph-structure information; and
temporal correlation is used to weight the positive samples so as to reduce the weight of camouflage edges, yielding the group division loss L_gd,
where L_gd denotes the group division loss, M is the set of negative samples, randomly selected from V excluding the neighbors of node v, h(v_i) is the output of node v_i through the RNN, and h(v_j) is the output of a negative-sample node v_j through the RNN; E and h are updated using the latest generic graph and the graph trajectory, respectively.
4. The network platform fraud detection system of claim 3, wherein the masquerading identification module comprises:
a mask generation unit, configured to output a mask for each edge related to the spoofed node in each snapshot map; wherein masquerading of the user node during t is reflected on node features and edge features at the snapshot map Gt;
An edge feature updating unit configured to update an edge feature by multiplying the edge feature by an element-by-element of the mask;
and the disguise removing unit is used for adjusting the alignment of the corresponding node characteristic and the edge characteristic so as to remove disguise in the snapshot graph.
5. The network platform fraud detection system of claim 4, wherein the mask generation unit includes:
an embedding generating unit for generating a mask embedding H'_t(m_ij) for each edge related to a fraudster node, given the output of the time embedding layer and the edge embedding H_t(E), where m_ij denotes the edge related to the fraudster node;
an embedding conversion unit for converting each mask embedding into a mask value, the mask value being a continuous value between 0 and 1;
wherein a smooth approximation of the tanh function, with learnable parameters W and b and a hyper-parameter β controlling the strength of the smooth approximation, is used to generate the mask ŷ_t(m_ij) of each edge related to a fraudster node.
6. The network platform fraud detection system of claim 4, wherein the fraud detection module includes a masquerading identification optimization unit to optimize masquerading identification using collusion loss and masquerading identification loss after removing masquerading of the user; wherein,
the collusion losses include a temporal collusion loss L_tc and a spatial collusion loss L_sc for guiding the camouflage identification module to remove camouflage edges by maximizing the increase of the temporal and spatial collusion scores after the masking operation relative to the initial collusion scores; wherein the temporal collusion loss L_tc is computed, for each group c_i, from its temporal collusion score after the masking operation, based on the output ŷ_t(m) of the camouflage identification module;
the spatial collusion loss L_sc is computed, for each group c_i, from its spatial collusion score after the masking operation, based on the output of the camouflage identification module;
the camouflage recognition loss guides the model to remove camouflage through a supervision signal; the masks are learned in a supervised manner as a binary classification task, the masks of the camouflage edges generated by the generator being optimized toward 0 and the masks of the other edges toward 1; the camouflage recognition loss L_cr is computed with cross entropy over the edges E of the space-time graph, where y_t(m_ij) represents the mask label, ŷ_t(m_ij) represents the mask of an edge related to a fraudster node, and the camouflage edges generated by the generator provide the camouflage-class supervision;
the objective function for optimizing camouflage identification combines the temporal collusion loss, the spatial collusion loss and the camouflage recognition loss, weighted by the hyper-parameters γ1, γ2 and γ3, together with the fraud detection loss L_fd measured on fraudsters;
the fraud detection loss is computed with cross entropy, where y_v represents the label of node v and ŷ_v the predicted probability that node v is a fraudster; in the fraud context, y_v = 1 denotes a fraudster and y_v = 0 denotes a benign user.
7. The network platform fraud detection system of claim 6, wherein the generator includes a camouflage generation module and a false-display generation module; wherein
the camouflage generation module is used for adding camouflage to existing fraudsters and comprises:
a camouflage edge feature generating unit for outputting a camouflage edge feature for each edge related to a fraudster node in each snapshot graph, where, for the connection between each fraudster and other users in each snapshot graph, the camouflage edge feature contains all camouflage-related information, whether or not the edge exists in the original graph;
a sample edge feature generating unit for generating the edge features of a new fraudster sample by adding the generated camouflage edge feature element-wise to the initial edge feature x_t(e_ij);
a feature alignment unit for aligning the edge features of the newly generated fraudster sample with the node features so as to generate the camouflage of the nodes;
the false-display generation module is used for adding false-display content vectors to existing fraudsters and, together with the content generated by the camouflage generation module, forming fraudster samples.
8. The network platform fraud detection system of claim 7, further comprising an adversarial optimization module for identifying, by the camouflage identification module, the camouflage generated by the camouflage generation module; and
detecting, by the camouflage identification module and the fraud detection module, the fraudsters in the fraudster samples.
9. The network platform fraud detection system of claim 8, wherein identifying, by the camouflage identification module, the camouflage generated by the camouflage generation module comprises:
given a group division C;
fixing the parameters θ2 of the camouflage identification module, optimizing the parameters φ of the camouflage generation module by maximizing the collusion losses L_tc and L_sc and maximizing the camouflage recognition loss L_cr, and optimizing the parameters ψ of the false-display generation module by maximizing the collusion losses L_tc and L_sc;
fixing φ and ψ, optimizing θ2 by minimizing the collusion losses L_tc and L_sc and minimizing the camouflage recognition loss L_cr.
10. The network platform fraud detection system of claim 9, wherein detecting, by the camouflage identification module and the fraud detection module, the fraudsters in the fraudster samples comprises:
fixing θ2 and θ3, the generator optimizes φ and ψ by maximizing the fraud detection error L_fd;
fixing φ and ψ, the discriminator optimizes θ2 and θ3 by minimizing the fraud detection error L_fd.
CN202410078715.2A 2024-01-19 2024-01-19 Network platform fraud detection system Pending CN117892151A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410078715.2A CN117892151A (en) 2024-01-19 2024-01-19 Network platform fraud detection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410078715.2A CN117892151A (en) 2024-01-19 2024-01-19 Network platform fraud detection system

Publications (1)

Publication Number Publication Date
CN117892151A true CN117892151A (en) 2024-04-16

Family

ID=90639525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410078715.2A Pending CN117892151A (en) 2024-01-19 2024-01-19 Network platform fraud detection system

Country Status (1)

Country Link
CN (1) CN117892151A (en)

Similar Documents

Publication Publication Date Title
CN111914256B (en) Defense method for machine learning training data under toxic attack
Bao et al. Threat of adversarial attacks on DL-based IoT device identification
Manimurugan et al. Intrusion detection in networks using crow search optimization algorithm with adaptive neuro-fuzzy inference system
CN108549940B (en) Intelligent defense algorithm recommendation method and system based on multiple counterexample attacks
Xiao et al. Secure mobile crowdsensing based on deep learning
Dhanalakshmi et al. Intrusion detection using data mining along fuzzy logic and genetic algorithms
CN108898093A (en) A kind of face identification method and the electronic health record login system using this method
Mezina et al. Network anomaly detection with temporal convolutional network and U-Net model
CN112565261B (en) Multi-generator AugGAN-based dynamic malicious API sequence generation method
CN113988293B (en) Method for generating network by antagonism of different level function combinations
Chen et al. LinkBreaker: Breaking the backdoor-trigger link in DNNs via neurons consistency check
Jmila et al. Siamese network based feature learning for improved intrusion detection
CN115034305A (en) Method, system and storage medium for identifying fraudulent users in a speech network using a human-in-loop neural network
Guo et al. A temporal chrominance trigger for clean-label backdoor attack against anti-spoof rebroadcast detection
Kavitha et al. Emerging intuitionistic fuzzy classifiers for intrusion detection system
Santos et al. Manifold learning for user profiling and identity verification using motion sensors
Ostaszewski et al. Immune anomaly detection enhanced with evolutionary paradigms
CN117591921B (en) Countermeasure generation deception detection method based on deception and anti-deception theory
Michailidis et al. Intrusion detection using evolutionary neural networks
Li et al. TCM-KNN scheme for network anomaly detection using feature-based optimizations
CN117892151A (en) Network platform fraud detection system
Deb et al. Use of auxiliary classifier generative adversarial network in touchstroke authentication
Sheikholeslami et al. Efficient randomized defense against adversarial attacks in deep convolutional neural networks
Shahane et al. A Survey on Classification Techniques to Determine Fake vs. Real Identities on Social Media Platforms
Zhang et al. Backdoor Attack through Machine Unlearning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination