CN114169458A

CN114169458A - Method and device for identifying cheater, storage medium and computer equipment

Info

Publication number: CN114169458A
Application number: CN202111515610.1A
Authority: CN
Inventors: 陈雪娇
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-12-13
Filing date: 2021-12-13
Publication date: 2022-03-11

Abstract

The application discloses a method and a device for identifying a cheater, a storage medium and computer equipment. The method comprises the following steps: determining at least one initial node from a plurality of personnel nodes, and generating at least one path sequence according to a preset topological network and the at least one initial node; inputting the at least one path sequence into a preset unsupervised feature recognition model to obtain node feature vectors corresponding to the personnel nodes contained in the path sequence; and inputting the node feature vectors corresponding to the personnel nodes to be predicted into a preset classification model to obtain a prediction classification value corresponding to each personnel node to be predicted, and identifying cheaters according to the prediction classification values. The method and the device can make effective use of the association relationships among different people, reduce the cost of manual identification, and accurately and efficiently identify cheaters within a cheating group.

Description

Method and device for identifying cheater, storage medium and computer equipment

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for identifying a fraudulent person, a storage medium, and a computer device.

Background

With the development of various services such as insurance services, fraud carried out by people acting in cooperation to gain profit has emerged. Many fraudulent activities are committed by groups, and it is therefore important to identify the fraudulent persons in a fraud group based on a large amount of data.

At present, fraudsters are identified either manually or by machine learning methods. With the first approach, the relevant information must be collected manually before manual identification, which wastes considerable time and labor; manual identification is also highly subjective, and the result is strongly affected when the reviewer's experience is insufficient. With the second approach, behavior that may constitute fraud is detected by a machine learning method such as a BP neural network, which computes decision factors and fits weights for the different fraud possibilities. This approach usually ignores the possible relationships among different persons, which is a serious disadvantage when identifying fraudsters within fraud groups in services such as insurance claims.

Disclosure of Invention

In view of this, the present application provides a method and an apparatus for identifying a fraudster, a storage medium, and a computer device, which can effectively utilize the association relationship between different people, save the human identification cost, and accurately and efficiently identify the fraudster from a fraudster group.

According to one aspect of the application, a method for identifying a cheater is provided, which comprises the following steps:

determining at least one initial node from a plurality of personnel nodes, and generating at least one path sequence according to a preset topology network and the at least one initial node, wherein the preset topology network comprises the personnel nodes, and the path sequence comprises the personnel nodes to be predicted;

inputting the at least one path sequence into a preset unsupervised feature recognition model to obtain node feature vectors corresponding to the personnel nodes contained in the path sequence, wherein the node feature vectors are feature vectors enabling the occurrence probability of front nodes and rear nodes under preset window values of the corresponding personnel nodes to be maximum, the front nodes are nodes located in front of the personnel nodes in the path sequence, and the rear nodes are nodes located behind the personnel nodes in the path sequence;

and inputting the node feature vectors corresponding to the nodes of the people to be predicted into a preset classification model to obtain a prediction classification value corresponding to each node of the people to be predicted, and identifying cheaters according to the prediction classification values.

Optionally, the generating at least one path sequence according to the preset topology network and the at least one initial node specifically includes:

determining a node similarity matrix corresponding to a preset topological network according to the preset topological network;

determining a target neighbor node corresponding to the initial node based on the node similarity matrix, taking the target neighbor node as a next node, and changing the current node from the initial node to the next node;

and repeatedly determining the next node of the current node according to the node similarity matrix until the number of the personnel nodes contained in the path sequence is consistent with the preset node number, and generating the path sequence.

Optionally, the determining, according to a preset topological network, a node similarity matrix corresponding to the preset topological network specifically includes:

determining a target adjacency matrix corresponding to a preset topological network based on the preset topological network;

calculating a Jaccard similarity matrix and a cosine similarity matrix corresponding to the preset topological network according to the target adjacency matrix;

and determining the node similarity matrix based on the Jaccard similarity matrix, the cosine similarity matrix and a preset weight parameter.
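The three steps above can be illustrated with a minimal Python sketch. It treats each row of a 0/1 adjacency matrix as a node's neighbor set, computes Jaccard and cosine similarity between rows, and blends the two with a weight. The linear blend `alpha * jaccard + (1 - alpha) * cosine`, the parameter name `alpha`, the function names, and the toy matrix are all assumptions made for illustration; the application states only that the node similarity matrix is determined from the two matrices and a preset weight parameter.

```python
import math

def jaccard_matrix(adj):
    """Jaccard similarity between the neighbor sets encoded by adjacency rows."""
    n = len(adj)
    neighbors = [{j for j in range(n) if adj[i][j]} for i in range(n)]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = neighbors[i] | neighbors[j]
            if union:
                sim[i][j] = len(neighbors[i] & neighbors[j]) / len(union)
    return sim

def cosine_matrix(adj):
    """Cosine similarity between adjacency rows (0.0 for all-zero rows)."""
    n = len(adj)
    norms = [math.sqrt(sum(x * x for x in row)) for row in adj]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(adj[i], adj[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim

def node_similarity(adj, alpha=0.5):
    """Assumed blend: alpha * Jaccard + (1 - alpha) * cosine."""
    jac, cos = jaccard_matrix(adj), cosine_matrix(adj)
    n = len(adj)
    return [[alpha * jac[i][j] + (1 - alpha) * cos[i][j] for j in range(n)]
            for i in range(n)]

# Toy undirected network with four personnel nodes (illustrative only).
ADJ = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
SIM = node_similarity(ADJ, alpha=0.5)
```

In this toy network, nodes 0 and 1 share one of three distinct neighbors, so their Jaccard similarity is 1/3 and their cosine similarity is 1/2; the blended entry depends on the chosen weight.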

Optionally, before determining the target adjacency matrix corresponding to the preset topological network based on the preset topological network, the method further includes:

acquiring a historical service handling list corresponding to a target service, extracting personnel information in the historical service handling list, and determining an association relation between different personnel based on the personnel information;

and taking the personnel as personnel nodes, and constructing association edges among different personnel nodes according to the association relationship so as to construct the preset topology network.

Optionally, the path sequence further includes training personnel nodes; before the node feature vectors corresponding to the personnel nodes to be predicted are input into a preset classification model, the method further comprises the following steps:

inputting the node feature vectors corresponding to the training personnel nodes into the preset classification model, and calculating a model loss value of the preset classification model according to an output predicted classification value of the training personnel and a real classification value corresponding to the training personnel nodes;

and when the model loss value is larger than a preset loss threshold value, adjusting preset model parameters of the preset classification model, inputting the node feature vectors corresponding to the nodes of the training personnel into the adjusted preset classification model again, recalculating the model loss value according to the output predicted classification value of the training personnel, and repeating the steps until the model loss value is smaller than the preset loss threshold value.
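The adjust-and-recompute loop described above can be sketched as follows. The classifier here is a plain logistic regression trained by gradient descent, chosen only as a stand-in so that the loop fits in a few lines of standard-library Python; the loss function (mean cross entropy), the learning rate `lr`, the `max_iters` safety cap, and the toy feature vectors are likewise assumptions and are not taken from the application.

```python
import math

def predict_proba(weights, bias, x):
    """Logistic probability that feature vector x belongs to class 1 (cheater)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def mean_cross_entropy(weights, bias, features, labels):
    """Model loss value: mean binary cross entropy over the training nodes."""
    total = 0.0
    for x, y in zip(features, labels):
        p = min(max(predict_proba(weights, bias, x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

def train_until_threshold(features, labels, loss_threshold, lr=0.5, max_iters=10_000):
    """Adjust the parameters and recompute the loss until it drops below the threshold."""
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    loss = mean_cross_entropy(weights, bias, features, labels)
    iters = 0
    while loss >= loss_threshold and iters < max_iters:
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict_proba(weights, bias, x) - y
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        n = len(labels)
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        loss = mean_cross_entropy(weights, bias, features, labels)
        iters += 1
    return weights, bias, loss

# Hypothetical node feature vectors and real classification values (1 = cheater).
TRAIN_X = [[1.0, 1.0], [0.9, 1.2], [-1.0, -0.8], [-1.1, -1.0]]
TRAIN_Y = [1, 1, 0, 0]
W, B, FINAL_LOSS = train_until_threshold(TRAIN_X, TRAIN_Y, loss_threshold=0.1)
PREDICTED = [1 if predict_proba(W, B, x) >= 0.5 else 0 for x in TRAIN_X]
```

The iteration cap is a practical safeguard: without it, the loop would not terminate if the threshold were unreachable for the given data.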

Optionally, after repeating the above steps until the model loss value is smaller than the preset loss threshold, the method further includes:

recording the model loss value as a first loss value;

adjusting the number of the preset nodes and/or the preset weight parameters corresponding to the path sequence, obtaining the node feature vectors corresponding to the personnel nodes contained in the path sequence again, inputting the node feature vectors corresponding to the training personnel nodes into the preset classification model, and determining the model loss value again until the model loss value is smaller than or equal to the preset loss threshold value, and recording the model loss value as a second loss value;

determining a minimum loss value from the first loss value and at least one second loss value, inputting the node feature vector corresponding to the node of the person to be predicted into the preset classification model corresponding to the minimum loss value, and identifying the cheater according to the output predicted classification value.
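The selection step can be sketched as a small search over candidate settings. In the sketch below, a setting is a pair (preset node number, preset weight parameter), and `train_and_score` is a hypothetical callable that would regenerate the path sequences, relearn the node feature vectors, retrain the classifier, and return the resulting loss value. The loss values in `PLACEHOLDER_LOSSES` are made up purely to exercise the function.

```python
def select_best_setting(settings, train_and_score):
    """Return the (setting, loss) pair with the smallest loss value."""
    results = [(setting, train_and_score(setting)) for setting in settings]
    return min(results, key=lambda item: item[1])

# Placeholder loss values standing in for a real training run (illustrative only).
PLACEHOLDER_LOSSES = {
    (7, 0.5): 0.09,   # would play the role of the first loss value
    (10, 0.5): 0.06,  # second loss values from adjusted settings
    (10, 0.3): 0.08,
}

BEST_SETTING, BEST_LOSS = select_best_setting(
    list(PLACEHOLDER_LOSSES), PLACEHOLDER_LOSSES.__getitem__
)
```

The classification model trained under `BEST_SETTING` would then be the one used to classify the personnel nodes to be predicted.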

Optionally, after the fraudster is identified according to the output predicted classification value, the method further comprises:

and acquiring the historical service handling list corresponding to any one of the cheaters, counting the number of the historical service handling lists, and marking the cheater as a heavy risk cheater when the number of the historical service handling lists is greater than a preset number threshold.
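A minimal sketch of this marking step, assuming each historical service handling list is represented only by the identifier of the person it belongs to (the names `mark_heavy_risk`, `OWNERS`, and the sample identifiers are hypothetical):

```python
from collections import Counter

def mark_heavy_risk(cheaters, handling_list_owners, count_threshold):
    """Flag each identified cheater whose handling-list count exceeds the threshold."""
    counts = Counter(handling_list_owners)
    return {person: counts[person] > count_threshold for person in cheaters}

# One entry per historical service handling list, naming the person it belongs to.
OWNERS = ["p1", "p2", "p1", "p3", "p1", "p2"]
FLAGS = mark_heavy_risk(["p1", "p2"], OWNERS, count_threshold=2)
```

The comparison is strict, matching the "greater than a preset number threshold" wording: a cheater with exactly two lists is not flagged when the threshold is 2.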

According to another aspect of the present application, there is provided a fraudster identifying apparatus comprising:

a sequence generation module, configured to determine at least one initial node from a plurality of personnel nodes and generate at least one path sequence according to a preset topology network and the at least one initial node, wherein the preset topology network comprises the personnel nodes, and the path sequence comprises the personnel nodes to be predicted;

a feature vector generation module, configured to input the at least one path sequence into a preset unsupervised feature recognition model and obtain the node feature vectors corresponding to the personnel nodes included in the path sequence, wherein a node feature vector is a feature vector that maximizes the occurrence probability of the front nodes and rear nodes of the corresponding personnel node under a preset window value, a front node is a node located before the personnel node in the path sequence, and a rear node is a node located after the personnel node in the path sequence;

and the personnel identification module is used for inputting the node feature vectors corresponding to the personnel nodes to be predicted into a preset classification model, obtaining a prediction classification value corresponding to each personnel node to be predicted, and identifying the cheater according to the prediction classification value.

Optionally, the sequence generating module is specifically configured to:

determining a node similarity matrix corresponding to a preset topological network according to the preset topological network; determining a target neighbor node corresponding to the initial node based on the node similarity matrix, taking the target neighbor node as a next node, and changing the current node from the initial node to the next node; and repeatedly determining the next node of the current node according to the node similarity matrix until the number of the personnel nodes contained in the path sequence is consistent with the preset node number, and generating the path sequence.

Optionally, the sequence generating module is specifically further configured to:

determining a target adjacency matrix corresponding to a preset topological network based on the preset topological network; calculating a Jaccard similarity matrix and a cosine similarity matrix corresponding to the preset topological network according to the target adjacency matrix; and determining the node similarity matrix based on the Jaccard similarity matrix, the cosine similarity matrix and a preset weight parameter.

Optionally, the apparatus further comprises:

the topological network construction module is used for acquiring a historical service handling list corresponding to a target service before determining a target adjacency matrix corresponding to the preset topological network based on the preset topological network, extracting personnel information in the historical service handling list, and determining the incidence relation among different personnel based on the personnel information; and taking the personnel as personnel nodes, and constructing association edges among different personnel nodes according to the association relationship so as to construct the preset topology network.

Optionally, the path sequence further includes training personnel nodes; the device further comprises:

a loss value calculation module, configured to, before the node feature vector corresponding to the to-be-predicted person node is input to a preset classification model, input the node feature vector corresponding to the training person node to the preset classification model, and calculate a model loss value of the preset classification model according to an output training person predicted classification value and a real classification value corresponding to the training person node;

and the judging module is used for adjusting the preset model parameters of the preset classification model when the model loss value is greater than a preset loss threshold value, inputting the node feature vectors corresponding to the nodes of the trainee into the adjusted preset classification model again, recalculating the model loss value according to the output predicted classification value of the trainee, and repeating the steps until the model loss value is less than the preset loss threshold value.

Optionally, the apparatus further comprises:

the marking module is used for marking the model loss value as a first loss value after the steps are repeated until the model loss value is smaller than the preset loss threshold value;

a parameter adjusting module, configured to adjust the number of preset nodes and/or the preset weight parameter corresponding to the path sequence, obtain the node feature vectors corresponding to the staff nodes included in the path sequence again, input the node feature vectors corresponding to the training staff nodes into the preset classification model, and determine the model loss value again, until the model loss value is less than or equal to the preset loss threshold value, record the model loss value as a second loss value;

the person identification module is further configured to determine a minimum loss value from the first loss value and the at least one second loss value, input the node feature vector corresponding to the node of the person to be predicted into the preset classification model corresponding to the minimum loss value, and identify the fraudster according to the output predicted classification value.

Optionally, the marking module is further configured to, after a fraudster is identified according to the output predicted classification value, obtain the historical service handling lists corresponding to any fraudster, count the number of the historical service handling lists, and mark the fraudster as a heavy risk fraudster when the number is greater than a preset number threshold.

According to yet another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described fraudster identification method.

According to yet another aspect of the present application, there is provided a computer device comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, the processor implementing the above-mentioned fraudster identification method when executing the program.

By applying the technical scheme of this embodiment, one or more initial nodes are first determined from a plurality of personnel nodes, and a corresponding path sequence is then generated according to the preset topology network and each initial node. All the generated path sequences are then input into a preset unsupervised feature recognition model, which yields the node feature vectors corresponding to the personnel nodes contained in the path sequences. The node feature vectors corresponding to the personnel nodes to be predicted are then input into a preset classification model, which yields a prediction classification value for each personnel node to be predicted, and the cheaters are identified among the persons to be predicted on the basis of the prediction classification values. Because the path sequence is generated from the preset topological network and the cheaters are identified from the path sequence, the embodiment makes effective use of the association relationships among different persons, reduces the cost of manual identification, and identifies cheaters within a cheating group accurately and efficiently.

The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features, and advantages of the present application are more readily apparent, specific embodiments of the present application are set out below.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a schematic flow chart illustrating a method for identifying a fraudster according to an embodiment of the present application;

FIG. 2 is a flow chart illustrating another method for identifying a fraudster according to an embodiment of the present application;

FIG. 3 shows a schematic structural diagram of a fraudster identification device provided by an embodiment of the present application.

Detailed Description

The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

In this embodiment, a method for identifying a fraudster is provided. As shown in FIG. 1, the method includes:

Step 101, determining at least one initial node from a plurality of personnel nodes, and generating at least one path sequence according to a preset topology network and the at least one initial node, wherein the preset topology network comprises the plurality of personnel nodes, and the path sequence comprises the personnel nodes to be predicted;

The embodiment of the application is mainly applicable to scenarios of identifying a cheater. The execution body of the embodiment is a device or equipment capable of identifying the cheater, which may be arranged on a client side or a server side. In the method provided by the embodiment, a plurality of personnel nodes may be preset; each personnel node represents a person, and the plurality of personnel nodes may include personnel nodes to be predicted. For example, in a scenario of identifying insurance fraudsters, each personnel node may be an insurance claimant, and a personnel node to be predicted is a node for which it must be determined whether the person is a fraudster. First, one or more initial nodes may be determined from the plurality of personnel nodes, and a path sequence is then generated according to the preset topology network and the initial nodes. Each initial node may generate one path sequence. Here, the preset topology network may include the plurality of personnel nodes described above and the association relationships between different personnel nodes, such as a kinship relationship or a doctor-patient relationship; an association edge may exist between two personnel nodes that have such a relationship.
Specifically, after the initial node is determined, several neighbor nodes with higher similarity to the initial node may be found among all neighbor nodes of the initial node according to the preset topology network, and one of them is randomly selected as the first destination node. Here, node similarity refers to the similarity of the relationships of any two personnel nodes, for example the similarity between the daily contacts of node A and those of node B: the more persons the two groups of daily contacts have in common, the higher the node similarity between node A and node B. Then, according to the preset topology network, several neighbor nodes with higher similarity to the first destination node are selected from all neighbor nodes of the first destination node, one of them is randomly selected as the second destination node, and so on, until a path sequence is generated. The path sequence is thus generated according to the similarity relationships between personnel nodes in the preset topology network: each next node is chosen from the neighbor nodes with higher similarity to the current node, rather than chosen at random from all neighbor nodes. On this basis, the complex relationships among closely related people in a cheating group can be put to good use, providing important clues for finding the cheaters in the group.

Step 102, inputting the at least one path sequence into a preset unsupervised feature recognition model to obtain a node feature vector corresponding to each personnel node in the path sequence, wherein the node feature vector is a feature vector which maximizes the occurrence probability of the front nodes and rear nodes of the corresponding personnel node under a preset window value, a front node is a node located in front of the personnel node in the path sequence, and a rear node is a node located behind the personnel node in the path sequence;

In this embodiment, one or more path sequences may be obtained by the above method. When the number of path sequences is sufficiently large, these path sequences may include every personnel node. All the generated path sequences are then input into a preset unsupervised feature recognition model, through which the node feature vectors corresponding to each personnel node contained in the path sequences can be obtained. Here, the preset unsupervised feature recognition model may be a word2vec model; after the path sequences are input into the word2vec model, low-dimensional node feature vectors corresponding to the respective personnel nodes are obtained. The word2vec model is an unsupervised model and does not require division into training and test sets. If the path sequence is W_v1 = {v₁, v₂, ..., v_t}, the node feature vectors learned by the preset unsupervised feature recognition model can be expressed as {h₁, h₂, ..., h_t}, where h_i (1 ≤ i ≤ t) is a row vector. The classical SkipGram model in word2vec can therefore be used for the calculation; that is, the following conditional probability function needs to be minimized:

wherein w is the size of the settable co-occurrence window, i.e. the preset window value. Minimizing the function means that, given the representation h_i of the center node v_i, the probability of occurrence of the front and rear nodes under the preset window value is maximized, and the probability of occurrence of the non-front-and-rear nodes is minimized. For example, with v₂ as the initial node and the preset number of nodes t set to 7, the path sequence W_v2 = {v₂, v₅, v₄, v₇, v₁, v₆, v₉} is obtained. If the central node is v₇ and the preset window value w is set to 2, the front and rear nodes of v₇ under the preset window value are v₅, v₄, v₁, v₆, while v₂ and v₉ are non-front-and-rear nodes. The probability of occurrence of the front and rear nodes v₅, v₄, v₁, v₆ is to be maximized; that is, given that the central node v_i is represented as h_i, the probability that v₅, v₄, v₁, v₆ occur needs to be maximized. Therefore, a low-dimensional representation of all nodes can be learned by training the function.
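The window example above can be reproduced with a short Python sketch. The helper names `window_context` and `skipgram_pairs` are hypothetical; the second helper lists the (center, context) pairs that a SkipGram-style model would be trained on, which is an illustration of the idea rather than the application's own procedure.

```python
def window_context(path, center_index, w):
    """Split the nodes within w positions of the center into front and rear nodes."""
    front = path[max(0, center_index - w):center_index]
    rear = path[center_index + 1:center_index + 1 + w]
    return front, rear

def skipgram_pairs(path, w):
    """List every (center node, context node) pair under window value w."""
    pairs = []
    for i, center in enumerate(path):
        front, rear = window_context(path, i, w)
        pairs.extend((center, ctx) for ctx in front + rear)
    return pairs

# Path sequence W_v2 from the example, with central node v7 and window value 2.
PATH = ["v2", "v5", "v4", "v7", "v1", "v6", "v9"]
FRONT, REAR = window_context(PATH, PATH.index("v7"), w=2)
NON_CONTEXT = [v for v in PATH if v != "v7" and v not in FRONT + REAR]
```

Here `FRONT` holds v5 and v4, `REAR` holds v1 and v6, and `NON_CONTEXT` holds v2 and v9, matching the example in the text.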

Further, equation (1) may be implemented by equation (2):

wherein v is_j∈W_v2. i + w represents the rear node portion of the central node and i-w represents the front node portion of the central node. While equation (2) can be optimized using random gradient descent (SGD). And the low-dimensional representation obtained after optimization is the output node low-dimensional representation.

Step 103, inputting the node feature vectors corresponding to the nodes of the person to be predicted into a preset classification model, obtaining a prediction classification value corresponding to each node of the person to be predicted, and identifying a cheater according to the prediction classification value.

In this embodiment, after the node feature vectors are obtained, the node feature vectors corresponding to the nodes of the people to be predicted may be determined, the node feature vectors are input into the preset classification model, the prediction classification value corresponding to each node of the people to be predicted may be obtained through the preset classification model, and then the cheater may be identified from the people to be predicted based on the prediction classification value. For example, if the prediction classification value corresponding to the person to be predicted is 1, it may be determined that the person to be predicted is a fraudulent person; and if the prediction classification value corresponding to the person to be predicted is 0, judging that the person to be predicted is a non-fraudulent person. Here, the preset classification model may be an SVM classifier.

Further, as a refinement and an extension of the specific implementation of the above embodiment, in order to fully illustrate the specific implementation process of the embodiment, another fraudster identification method is provided, as shown in fig. 2, where the method includes:

Step 201, acquiring a historical service handling list corresponding to a target service, extracting personnel information in the historical service handling list, and determining an association relationship between different personnel based on the personnel information; taking the personnel as personnel nodes, and constructing association edges among different personnel nodes according to the association relationship so as to construct the preset topology network;

In this embodiment, the target service may be determined first, the historical service handling lists corresponding to the target service are obtained from a preset database, and the personnel information corresponding to each historical service handling list is extracted. For example, when the target service is a health insurance claim service, the historical service handling lists may be the claim policies corresponding to the health insurance claim service, and the personnel information may be extracted from each claim policy. Whether an association relationship exists between any two persons is then determined according to the personnel information. The association relationship may be an identity relationship between the two persons, such as a kinship relationship or a doctor-patient relationship; a communication relationship, such as communication by telephone, mail, messages, or WeChat; or another association relationship. Each person is taken as a personnel node in the preset topology network, and association edges are constructed between different personnel nodes according to the association relationships: when an association relationship exists between two persons, an edge is constructed between the corresponding personnel nodes to indicate that the two persons are related. When the association edges among all personnel nodes having association relationships have been constructed, the construction of the preset topological network is complete.
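The construction of the preset topological network can be sketched as follows. The record layout (a `claim_id` plus a list of related person pairs), the person identifiers, and the function name `build_network` are hypothetical; the sketch assumes the association relationships have already been derived from the personnel information.

```python
from collections import defaultdict

# Hypothetical handling lists: each carries the related person pairs
# (e.g. kinship, doctor-patient) found in its personnel information.
CLAIMS = [
    {"claim_id": "C1", "relations": [("alice", "bob"), ("alice", "dr_chen")]},
    {"claim_id": "C2", "relations": [("bob", "dr_chen")]},
    {"claim_id": "C3", "relations": [("dave", "dr_chen"), ("alice", "bob")]},
]

def build_network(claims):
    """Return (nodes, adjacency matrix) with persons as nodes and undirected edges."""
    graph = defaultdict(set)
    for claim in claims:
        for a, b in claim["relations"]:
            if a != b:  # ignore self-relations
                graph[a].add(b)
                graph[b].add(a)
    nodes = sorted(graph)
    index = {name: i for i, name in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for name, related in graph.items():
        for other in related:
            adj[index[name]][index[other]] = 1
    return nodes, adj

NODES, ADJ_MATRIX = build_network(CLAIMS)
```

A relationship that appears in several handling lists (alice and bob here) still produces a single edge, because the adjacency matrix records only whether an association exists.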

Step 202, determining at least one initial node from a plurality of personnel nodes; determining a node similarity matrix corresponding to a preset topological network according to the preset topological network; determining a target neighbor node corresponding to the initial node based on the node similarity matrix, taking the target neighbor node as a next node, and changing the current node from the initial node to the next node; repeatedly determining the next node of the current node according to the node similarity matrix until the number of the personnel nodes contained in the path sequence is consistent with the number of preset nodes, and generating the path sequence;

In this embodiment, one or more initial nodes may be determined from all the personnel nodes included in the preset topology network, and the determination may be random. The node similarity matrix corresponding to the preset topological network is then determined. Next, a target neighbor node is determined from the neighbor nodes of the initial node according to the node similarity matrix and taken as the next node after the initial node. Specifically, several neighbor nodes with higher similarity may be determined from the neighbor nodes of the initial node according to the node similarity matrix, and one of them randomly selected as the target neighbor node. Alternatively, the target neighbor node may be determined from the node similarity matrix according to a preset rule: for example, in the first selection the node with the highest similarity to the initial node is taken as the target neighbor node, in the second selection the node with the second-highest similarity to that target neighbor node is taken as the next node, and so on; once the similarity has fallen to a preset similarity threshold, the node with the highest similarity to the current node is again selected as the next node. These steps are repeated to determine the next node of the current node; each time the next node is determined and becomes the current node, the number of nodes in the path sequence increases by 1. When the number of personnel nodes in the path sequence reaches the preset node number, the path sequence is complete.
Here, the path sequence is generated according to the node similarity matrix: each next node is chosen from the neighbor nodes with higher similarity to the current node, rather than chosen at random from all neighbor nodes. On this basis, the complex relationships among closely related people in a cheating group can be put to good use, providing important clues for finding the cheaters in the group.
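The first selection strategy (pick at random among the most similar neighbors) can be sketched as a similarity-guided walk. The parameter `top_k`, which fixes how many of the most similar neighbors are candidates, is an assumption, as are the toy matrices; the sketch also allows a node to be revisited, which the text neither requires nor forbids.

```python
import random

def generate_path(sim, adj, start, num_nodes, top_k=2, rng=None):
    """Walk from start, each time choosing at random among the top_k most similar neighbors."""
    rng = rng or random.Random()
    path = [start]
    current = start
    while len(path) < num_nodes:
        neighbors = [j for j, linked in enumerate(adj[current]) if linked]
        if not neighbors:
            break  # isolated node: the walk cannot continue
        neighbors.sort(key=lambda j: sim[current][j], reverse=True)
        current = rng.choice(neighbors[:top_k])
        path.append(current)
    return path

# Toy adjacency and similarity matrices for four personnel nodes (illustrative only).
ADJ = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
SIM = [
    [1.0, 0.4, 0.3, 0.6],
    [0.4, 1.0, 0.5, 0.1],
    [0.3, 0.5, 1.0, 0.2],
    [0.6, 0.1, 0.2, 1.0],
]
GREEDY_PATH = generate_path(SIM, ADJ, start=0, num_nodes=5, top_k=1)
```

With `top_k=1` the walk is deterministic and always follows the single most similar neighbor; larger values restore the random choice among several similar neighbors. Only adjacent nodes are candidates, so node 3 is never reached directly from node 0 even though their similarity entry is high.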

Step 203, inputting the at least one path sequence into a preset unsupervised feature recognition model to obtain node feature vectors corresponding to the personnel nodes contained in the path sequence;

Step 204, inputting the node feature vectors corresponding to the nodes of the training personnel into the preset classification model, and calculating a model loss value of the preset classification model according to the output predicted classification value of the training personnel and the real classification value corresponding to the nodes of the training personnel;

In this embodiment, the path sequence further includes training personnel nodes. All generated path sequences are input into the preset unsupervised feature recognition model, which outputs the node feature vectors corresponding to all personnel nodes contained in the path sequences. Once the node feature vector of each personnel node has been obtained, the node feature vectors corresponding to the training personnel nodes are identified among them and input into the preset classification model, which outputs the predicted classification values of the training personnel. The model loss value of the preset classification model can then be calculated from the predicted classification values and the real classification values of the training personnel. Specifically, the predicted classification value of a training person may be 0 or 1, and the real classification value may also be 0 or 1; an error is calculated for each training personnel node, and the error is 0 if the predicted classification value is the same as the real classification value. Finally, the error values of all training personnel are added to obtain the model loss value of the preset classification model. Alternatively, the model loss value of the preset classification model may be calculated according to a preset cross-entropy function.
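The two ways of computing the model loss value mentioned above can be sketched as follows. The text only states that the error is 0 when the predicted and real classification values match; the sketch assumes an error of 1 for a mismatch, and it assumes the standard binary cross-entropy averaged over the training personnel as the "preset cross-entropy function". The function names are hypothetical.

```python
import math

def mismatch_loss(predicted, actual):
    """Sum of per-node errors: 0 for a match, 1 (assumed) for a mismatch."""
    return sum(1 for p, y in zip(predicted, actual) if p != y)

def cross_entropy_loss(probabilities, actual, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probabilities, actual):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(actual)

# Four training personnel nodes; one prediction disagrees with its real value.
print(mismatch_loss([1, 0, 1, 1], [1, 0, 0, 1]))  # 1
```

The cross-entropy variant needs the classification model to output a probability rather than a hard 0 or 1, which is why it takes probabilities as input here.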

Step 205, when the model loss value is greater than a preset loss threshold, adjusting preset model parameters of the preset classification model, inputting node feature vectors corresponding to the nodes of the trainee into the adjusted preset classification model again, recalculating the model loss value according to the output predicted classification value of the trainee, and repeating the steps until the model loss value is less than the preset loss threshold;

In this embodiment, when the calculated model loss value of the preset classification model is greater than the preset loss threshold, the classification effect of the preset classification model is poor. In this case, the preset model parameters of the preset classification model may be adjusted, specifically according to a preset adjustment policy. After the adjustment, the node feature vector corresponding to each training personnel node is input again into the parameter-adjusted preset classification model, which outputs new predicted classification values for the training personnel, and the model loss value is recalculated. When the calculated model loss value is smaller than or equal to the preset loss threshold, the classification accuracy of the preset classification model is high and its results can be accepted. The node feature vector corresponding to each node of the person to be predicted can then be input into the preset classification model, and whether that person is a fraudster is determined from the model output. For example, when the predicted classification value corresponding to the node of the person to be predicted is 0, the person is determined not to be a fraudster; when it is 1, the person is determined to be a fraudster.
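The adjust-and-recompute loop of step 205 can be illustrated with a small sketch. The application does not specify the classification model or the adjustment policy; the sketch assumes a logistic regression classifier, gradient descent as the adjustment policy, and a max_rounds guard against endless looping. The feature vectors and labels are made-up toy values.

```python
import math

def predict_proba(weights, bias, vector):
    """Probability that the node belongs to class 1 (fraudster)."""
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def mean_cross_entropy(weights, bias, vectors, labels, eps=1e-12):
    total = 0.0
    for v, y in zip(vectors, labels):
        p = min(max(predict_proba(weights, bias, v), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def train_until_threshold(vectors, labels, loss_threshold,
                          learning_rate=0.5, max_rounds=10000):
    """Adjust the parameters until the loss is at or below the threshold."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    loss = mean_cross_entropy(weights, bias, vectors, labels)
    rounds = 0
    while loss > loss_threshold and rounds < max_rounds:
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for v, y in zip(vectors, labels):
            err = predict_proba(weights, bias, v) - y
            for i, x in enumerate(v):
                grad_w[i] += err * x
            grad_b += err
        n = len(vectors)
        # Assumed adjustment policy: one gradient-descent step per round.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
        loss = mean_cross_entropy(weights, bias, vectors, labels)
        rounds += 1
    return weights, bias, loss

# Hypothetical node feature vectors and real classification values.
vectors = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
labels = [0, 0, 1, 1]
weights, bias, final_loss = train_until_threshold(vectors, labels, loss_threshold=0.3)

def classify(vector):
    """Predicted classification value: 1 for fraudster, 0 otherwise."""
    return 1 if predict_proba(weights, bias, vector) >= 0.5 else 0
```

Once the loop exits with the loss at or below the threshold, classify can be applied to the node feature vectors of the persons to be predicted.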

Step 206, acquiring the historical service handling list corresponding to any fraudster, counting the number of the historical service handling lists, and marking the fraudster as a heavy-risk fraudster when the number of the historical service handling lists is greater than a preset number threshold.

In this embodiment, after it has been identified whether each person to be predicted is a fraudster, the historical service handling lists corresponding to each person identified as a fraudster may also be obtained, and the number of historical service handling lists of each fraudster is counted. When the count for a given fraudster is greater than the preset number threshold, that fraudster can be marked, specifically as a heavy-risk fraudster.

In addition, all heavy-risk fraudsters can be output in the form of a list that records the number of historical service handlings of each, so that the user is prompted to pay attention to them.
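The marking in step 206 and the list output can be sketched as follows. The function name, the record layout, and the ordering of the output list by descending count are assumptions made for this example; the identifiers in the toy data are placeholders.

```python
def mark_heavy_risk(history_by_fraudster, count_threshold):
    """Return the heavy-risk fraudsters with their historical handling counts."""
    report = []
    for person_id, records in history_by_fraudster.items():
        count = len(records)
        # Only counts strictly greater than the threshold are marked.
        if count > count_threshold:
            report.append({"person_id": person_id, "history_count": count})
    # Assumption: list the most active fraudsters first.
    report.sort(key=lambda row: row["history_count"], reverse=True)
    return report

# Hypothetical historical service handling lists of identified fraudsters.
history = {
    "p1": ["c1", "c2", "c3", "c4"],
    "p2": ["c5"],
    "p3": ["c6", "c7", "c8"],
}
print(mark_heavy_risk(history, count_threshold=3))
# [{'person_id': 'p1', 'history_count': 4}]
```

Person "p3" has exactly three records and is not marked, because the condition is "greater than" the preset number threshold.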

In this embodiment of the present application, optionally, the step 202 of "determining a node similarity matrix corresponding to a preset topological network according to the preset topological network" specifically includes: determining a target adjacency matrix corresponding to a preset topological network based on the preset topological network; calculating a Jaccard similarity matrix and a cosine similarity matrix corresponding to the preset topological network according to the target adjacency matrix; and determining the node similarity matrix based on the Jaccard similarity matrix, the cosine similarity matrix and a preset weight parameter.

In this embodiment, based on the preset topology network, the adjacency matrix corresponding to the preset topology network may be determined as the target adjacency matrix. Then, according to the target adjacency matrix, a Jaccard similarity matrix and a cosine similarity matrix can be obtained through calculation, and the two matrices can reflect the similarity relation between the personnel nodes in the preset topological network. For sets A and B, the Jaccard similarity is calculated as follows: Jaccard(A, B) = |A ∩ B| / |A ∪ B|, i.e. the ratio between the size of the intersection of set A and set B and the size of the union of set A and set B. The node similarity matrix is then determined according to the Jaccard similarity matrix, the cosine similarity matrix and the preset weight parameters. For example, if the preset weight parameter is (0.8, 0.2), the Jaccard similarity matrix is denoted by A, the cosine similarity matrix is denoted by B, and the node similarity matrix is denoted by C, then C = 0.8 × A + 0.2 × B.
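The weighted combination above can be sketched in a few lines. The application does not state how the two similarity matrices are derived from the adjacency matrix; this sketch assumes that the Jaccard similarity compares the neighbor sets of two nodes and that the cosine similarity compares their adjacency rows. All function names and the toy network are hypothetical.

```python
import math

def jaccard_matrix(adj):
    """Jaccard similarity of the neighbor sets of every pair of nodes."""
    n = len(adj)
    neighbor_sets = [{j for j in range(n) if adj[i][j]} for i in range(n)]
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = neighbor_sets[i] | neighbor_sets[j]
            if union:
                result[i][j] = len(neighbor_sets[i] & neighbor_sets[j]) / len(union)
    return result

def cosine_matrix(adj):
    """Cosine similarity of the adjacency rows of every pair of nodes."""
    n = len(adj)
    norms = [math.sqrt(sum(x * x for x in row)) for row in adj]
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(adj[i], adj[j]))
                result[i][j] = dot / (norms[i] * norms[j])
    return result

def node_similarity(adj, weight_params=(0.8, 0.2)):
    """C = w1 * A + w2 * B, with A the Jaccard and B the cosine matrix."""
    a = jaccard_matrix(adj)
    b = cosine_matrix(adj)
    w1, w2 = weight_params
    n = len(adj)
    return [[w1 * a[i][j] + w2 * b[i][j] for j in range(n)] for i in range(n)]

# Toy network with edges 0-1, 0-2, 1-2 and 2-3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
c = node_similarity(adj)
```

For nodes 0 and 1 in the toy network, the neighbor sets {1, 2} and {0, 2} share one element out of three, so the Jaccard term is 1/3, the cosine term is 1/2, and the combined similarity is 0.8 × 1/3 + 0.2 × 1/2.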

In this embodiment of the present application, optionally, after "repeating the above steps until the model loss value is smaller than the preset loss threshold" in step 205, the method further includes: recording the model loss value as a first loss value; adjusting the preset number of nodes and/or the preset weight parameters corresponding to the path sequence, obtaining again the node feature vectors corresponding to the personnel nodes contained in the path sequence, inputting the node feature vectors corresponding to the training personnel nodes into the preset classification model, and determining the model loss value again until it is smaller than or equal to the preset loss threshold, and recording that model loss value as a second loss value; determining a minimum loss value from the first loss value and at least one second loss value, inputting the node feature vector corresponding to the node of the person to be predicted into the preset classification model corresponding to the minimum loss value, and identifying the cheater according to the output predicted classification value.

In this embodiment, when the model loss value is less than or equal to the preset loss threshold, the classification result of the preset classification model is acceptable and training of the preset classification model is stopped. The model loss value at this point is recorded as the first loss value. Because the preset weight parameters and the preset number of nodes used for the path sequence in the preceding steps both influence the final identification result, the preset weight parameters and/or the preset number of nodes can be adjusted, and the node feature vector corresponding to each personnel node is obtained again. The model loss value of the initial preset classification model is then determined again from the node feature vectors corresponding to the training personnel nodes: when the model loss value is greater than the preset loss threshold, the preset model parameters of the preset classification model are adjusted according to the preset adjustment strategy; when the model loss value is less than or equal to the preset loss threshold, it is recorded as a second loss value. The preset weight parameters and the preset number of nodes may be adjusted multiple times, each adjustment yielding one second loss value. The minimum loss value is then determined from the first loss value and the one or more second loss values; it indicates the preset weight parameters and preset number of nodes under which the preset classification model classifies best. The node feature vector corresponding to the person to be predicted is input into the preset classification model obtained under that condition, and the cheater is identified according to the output predicted classification value.
According to the embodiment of the application, the identification accuracy of the cheater can be improved by adjusting the preset weight parameters and the preset node number which have large influence on the classification result.
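The selection of the minimum loss value across several settings can be sketched as follows. The callable train_and_evaluate stands for the whole pipeline (path generation, feature extraction, training until the threshold is met) and is hypothetical; here it is replaced by a stub that returns illustrative placeholder loss values, which are not measured results.

```python
def select_best_configuration(configurations, train_and_evaluate):
    """Keep the (node count, weight parameters) pair with the smallest loss."""
    best = None
    for node_count, weight_params in configurations:
        model, loss = train_and_evaluate(node_count, weight_params)
        if best is None or loss < best["loss"]:
            best = {
                "node_count": node_count,
                "weight_params": weight_params,
                "model": model,
                "loss": loss,
            }
    return best

# Placeholder loss values per configuration (illustrative only, not results).
ILLUSTRATIVE_LOSSES = {
    (10, (0.8, 0.2)): 0.28,  # plays the role of the first loss value
    (20, (0.8, 0.2)): 0.21,  # second loss value after changing the node count
    (20, (0.5, 0.5)): 0.25,  # second loss value after changing the weights
}

def stub_train_and_evaluate(node_count, weight_params):
    """Stand-in for the real pipeline: returns a model label and its loss."""
    loss = ILLUSTRATIVE_LOSSES[(node_count, weight_params)]
    return f"model(n={node_count}, w={weight_params})", loss

best = select_best_configuration(list(ILLUSTRATIVE_LOSSES), stub_train_and_evaluate)
```

The classification model stored under best["model"] is the one that would then receive the node feature vectors of the persons to be predicted.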

Further, as a specific implementation of the method in fig. 1, an embodiment of the present application provides a device for identifying a fraudster, as shown in fig. 3, the device includes:

Optionally, the sequence generating module is specifically configured to:

Optionally, the apparatus further comprises:

It should be noted that other corresponding descriptions of the functional units related to the fraud identifier apparatus provided in the embodiment of the present application may refer to corresponding descriptions in the methods in fig. 1 to fig. 2, and are not described herein again.

Based on the above methods shown in fig. 1 to 2, correspondingly, the present application further provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for identifying a fraudster as shown in fig. 1 to 2 is implemented.

Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the implementation scenarios of the present application.

Based on the above methods shown in fig. 1 to fig. 2 and the virtual device embodiment shown in fig. 3, in order to achieve the above object, an embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, and the like, where the computer device includes a storage medium and a processor; a storage medium for storing a computer program; a processor for executing a computer program to implement the above-described fraudster identification method as shown in fig. 1-2.

Optionally, the computer device may further include a human interface, a network interface, a camera, Radio Frequency (RF) circuitry, sensors, audio circuitry, a WI-FI module, and so forth. The human interface may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), etc., and the optional human interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a bluetooth interface, WI-FI interface), etc.

It will be appreciated by those skilled in the art that the present embodiment provides a computer device architecture that is not limiting of the computer device, and that may include more or fewer components, or some components in combination, or a different arrangement of components.

The storage medium may further include an operating system and a network communication module. The operating system is a program that manages and maintains the hardware and software resources of the computer device, supporting the operation of information handling programs as well as other software and/or programs. The network communication module is used for communication among components in the storage medium and with other hardware and software in the physical device.

Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, and can also be implemented by hardware. First, one or more initial nodes may be determined from a plurality of personnel nodes, and then, based on a preset topology network, a corresponding path sequence is generated according to the preset topology network and each initial node. And then, all the generated path sequences are input into a preset unsupervised feature recognition model, and node feature vectors corresponding to all the personnel nodes contained in the path sequences can be obtained through the preset unsupervised feature recognition model. After the node feature vectors are obtained, the node feature vectors can be input into a preset classification model, a prediction classification value corresponding to each node of the person to be predicted can be obtained through the preset classification model, and then the cheater can be identified from the person to be predicted on the basis of the prediction classification value. According to the embodiment of the application, the path sequence is generated through the preset topological network, the cheater is identified according to the path sequence, the incidence relation among different persons can be effectively utilized, the labor identification cost is saved, and meanwhile the cheater can be accurately and efficiently identified from the cheating group.

Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.

The above application serial numbers are for description purposes only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure is only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present application.

Claims

1. A fraudster identification method, comprising:

2. The method according to claim 1, wherein the generating at least one path sequence according to the preset topology network and the at least one initial node specifically includes:

3. The method according to claim 2, wherein the determining a node similarity matrix corresponding to a preset topology network according to the preset topology network specifically includes:

4. The method according to claim 3, wherein before determining the target adjacency matrix corresponding to the preset topological network based on the preset topological network, the method further comprises:

5. The method of claim 1, wherein the sequence of paths further comprises training personnel nodes; before the node feature vectors corresponding to the nodes of the person to be predicted are input into a preset classification model, the method further comprises the following steps:

6. The method of claim 5, wherein after repeating the above steps until the model loss value is less than the preset loss threshold, the method further comprises:

recording the model loss value as a first loss value;

7. The method of claim 6, wherein after identifying the fraudster based on the output predicted classification value, the method further comprises:

8. A fraudster identification device, comprising:

9. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any of claims 1 to 7.

10. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the computer program.