CN114418780A - Method, apparatus, computer device and storage medium for identifying fraudulent groups

Info

Publication number: CN114418780A
Application number: CN202210234874.8A
Authority: CN
Inventors: 李锦珊; 李恩燮; 叶秀春; 化成君; 邱少斌; 谢坤桉
Original assignee: Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Current assignee: Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Priority date: 2022-03-11
Filing date: 2022-03-11
Publication date: 2022-04-29
Anticipated expiration: 2042-03-11
Also published as: CN114418780B

Abstract

The application relates to a method, an apparatus, a computer device and a storage medium for identifying fraud groups. The method comprises: acquiring a plurality of associated topological networks from a knowledge graph, wherein each associated topological network comprises a plurality of associated case nodes and the non-case nodes corresponding to the associated case nodes; for each associated case node, calculating the target information entropy of the non-case nodes and determining the risk level of the associated case node according to the target information entropy of the non-case nodes; and determining the risk level of a group in the associated topological network based on the risk levels of the associated case nodes, wherein the group comprises the non-case nodes in the associated topological network. The method can improve the accuracy of risk prediction for fraud groups.

Description

Method, apparatus, computer device and storage medium for identifying fraudulent groups

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for identifying a fraud group, a computer device, and a storage medium.

Background

With the rapid development of the insurance industry, demand for various insurance products keeps growing. However, some criminals commit insurance fraud for illegal gain, and fraud committed by organized groups is increasingly common. To keep the insurance industry developing soundly, risk prediction needs to be performed on related groups, and manual investigation initiated for high-risk groups.

Conventionally, when risk prediction is performed on a related group, each node in the group is given a risk score according to its number of claim incidents, a comprehensive risk score of the group is then determined from the node scores, and the risk level of the group is judged from the comprehensive risk score.

However, the accuracy of this approach to risk prediction for related groups is low.

Disclosure of Invention

In view of the above, there is a need to provide a fraud group identification method, apparatus, computer device and computer readable storage medium capable of improving accuracy of risk prediction for fraud groups.

In a first aspect, the present application provides a fraudulent group identification method. The method comprises the following steps:

acquiring a plurality of associated topological networks from a knowledge graph, wherein each associated topological network comprises a plurality of associated case nodes and the non-case nodes corresponding to the associated case nodes; for each associated case node, calculating the target information entropy of the non-case nodes, and determining the risk level of the associated case node according to the target information entropy of the non-case nodes; and determining the risk level of a group in the associated topological network based on the risk levels of the associated case nodes, wherein the group comprises the non-case nodes in the associated topological network.

In one embodiment, for each associated case node, calculating target information entropy of non-case nodes comprises:

for each associated case node, determining a target non-case node from the non-case nodes; taking the non-case nodes other than the target non-case node as other non-case nodes; and for each of the other non-case nodes, determining its distribution matrix from the associated topological network, and calculating the target information entropy of the target non-case node according to the distribution matrices of the other non-case nodes.

In one embodiment, calculating the target information entropy of the target non-case node according to the distribution matrix of other non-case nodes comprises:

calculating initial information entropies of the target non-case node according to the distribution matrices of the other non-case nodes; acquiring the maximum of the initial information entropies and taking it as the intermediate information entropy of the target non-case node; and adjusting the intermediate information entropy of each target non-case node according to the magnitude of the intermediate information entropy to generate the target information entropy of the target non-case node.

In one embodiment, adjusting the intermediate information entropy of each target non-case node according to the size of the intermediate information entropy to generate the target information entropy of the target non-case node includes:

determining a plurality of target non-case node sets from the associated topological network, wherein each target non-case node set comprises a plurality of target non-case nodes of the same type; for each target non-case node set, determining a first target non-case node from the set, the first target non-case node being the one whose target information entropy corresponds to the highest risk value; calculating the case node betweenness between the first target non-case node and the other target non-case nodes according to a shortest path algorithm, and calculating the risk transfer distance of the first target non-case node; and adjusting the intermediate information entropy of each target non-case node according to the case node betweenness and the risk transfer distance to generate the target information entropy of the target non-case node.

In one embodiment, determining the risk level of the associated case node according to the target information entropy of the non-case node comprises:

classifying the non-case nodes a first time to generate first categories, the first categories comprising first-type non-case nodes and second-type non-case nodes, wherein the first-type non-case nodes are non-case nodes directly related to cases and the second-type non-case nodes are non-case nodes indirectly related to cases; determining the risk value of each non-case node according to its first category, its target information entropy, and the preset rule corresponding to that category; classifying the non-case nodes a second time to generate second categories, the second categories comprising third-type, fourth-type and fifth-type non-case nodes; and determining the risk level of the associated case node according to the risk values of the non-case nodes and their second categories.

In one embodiment, determining the risk value of each non-case node according to its first category, its target information entropy, and the preset rule corresponding to that category comprises:

for first-type non-case nodes, determining the risk value according to a first preset rule, under which the larger the information entropy of a non-case node, the higher its risk value; and for second-type non-case nodes, determining the risk value according to a second preset rule, under which the larger the information entropy of a non-case node, the lower its risk value.

In one embodiment, determining a risk level for a group in the associated topological network based on the risk level of the associated case node comprises:

determining the number of associated case nodes corresponding to each risk level in the associated topological network; determining that a group in the associated topological network is a fraudulent group based on the number of associated case nodes corresponding to each risk level.

In one embodiment, the method further includes:

acquiring topological networks corresponding to a plurality of cases from the knowledge graph, wherein each topological network comprises a case node and the person nodes corresponding to the case node; and if a preset relationship is determined to exist among the person nodes in the topological networks corresponding to the cases, determining that those person nodes form a fraud group, wherein the preset relationship comprises a conflict relationship and an association relationship.

In a second aspect, the present application further provides a fraudulent group identification apparatus. The apparatus comprises:

a first acquisition module, configured to acquire a plurality of associated topological networks from a knowledge graph, wherein each associated topological network comprises a plurality of associated case nodes and the non-case nodes corresponding to the associated case nodes;

a first determining module, configured to calculate, for each associated case node, the target information entropy of the non-case nodes, and to determine the risk level of the associated case node according to the target information entropy of the non-case nodes; and

a second determining module, configured to determine the risk level of a group in the associated topological network based on the risk levels of the associated case nodes, wherein the group comprises the non-case nodes in the associated topological network.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the method steps in any of the embodiments of the first aspect described above when executing the computer program.

In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, carries out the method steps of any of the embodiments of the first aspect described above.

With the above fraud group identification method, apparatus, computer device and storage medium, a plurality of associated topological networks are acquired from the knowledge graph; for each associated case node in an associated topological network, the target information entropy of the non-case nodes is calculated, and the risk level of the associated case node is determined according to that target information entropy; and the risk level of a group in the associated topological network is determined based on the risk levels of the associated case nodes. In the technical solution provided by the embodiments of the application, the information entropy of the non-case nodes measures how dispersed or concentrated they are, which better matches the logic of business processing. The risk values of different non-case nodes are measured according to this degree of dispersion or concentration, and the risk level of the group is determined from the risk values of a plurality of non-case nodes, which improves the accuracy of risk prediction for fraud groups.

Drawings

FIG. 1 is a diagram illustrating an internal structure of a computer device according to an embodiment;

FIG. 2 is a flow diagram of a fraudulent group identification method in one embodiment;

FIG. 3 is a diagram of a smallest network element in one embodiment;

FIG. 4 is a flow diagram of calculating target information entropy in one embodiment;

FIG. 5 is a flow diagram of determining target information entropy in one embodiment;

FIG. 6 is a flow diagram of adjusting intermediate information entropy in one embodiment;

FIG. 7 is a schematic illustration of risk delivery in one embodiment;

FIG. 8 is a graphical illustration of the results of risk delivery in one embodiment;

FIG. 9 is a schematic representation of the results of risk delivery in another embodiment;

FIG. 10 is a schematic flow chart illustrating the determination of risk levels for associated case nodes in one embodiment;

FIG. 11 is a schematic illustration of a target vehicle scatter profile in one embodiment;

FIG. 12 is a schematic diagram of a centralized distribution of sales personnel in one embodiment;

FIG. 13 is a flow diagram illustrating the determination of a fraudulent group in one embodiment;

FIG. 14 is a flow diagram that illustrates the determination of a fraudulent group based on a persona node relationship, under an embodiment;

FIG. 15 is a schematic illustration of a case and person relationship in one embodiment;

FIG. 16 is a schematic illustration of person-to-person relationships in one embodiment;

FIG. 17 is a diagram illustrating conversion of case-to-person relationships to person-to-person relationships in one embodiment;

FIG. 18 is a block diagram of the structure of a fraudulent group identification apparatus in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The fraud group identification method provided by the application can be applied to a computer device. The computer device may be a server or a terminal, and the server may be a single server or a server cluster consisting of a plurality of servers.

Taking the case where the computer device is a terminal as an example, FIG. 1 shows a block diagram of the terminal. As shown in FIG. 1, the terminal includes a processor, a memory, a communication interface, a display screen, and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication can be realized through Wi-Fi, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements a fraud group identification method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball or touchpad arranged on the housing of the computer device, or an external keyboard, touchpad or mouse.

It will be appreciated by those skilled in the art that the configuration shown in FIG. 1 is a block diagram of only the part of the configuration relevant to the present solution and does not limit the terminal to which the present solution is applied; the terminal may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

It should be noted that the execution subject of the embodiments of the present application may be a computer device or a fraud group identification apparatus; the following method embodiments are described with a computer device as the execution subject.

In one embodiment, FIG. 2 shows a flowchart of the fraud group identification method provided by an embodiment of the present application. As shown in FIG. 2, the method may include the following steps:

Step 220, obtaining a plurality of associated topological networks from the knowledge graph, wherein each associated topological network comprises a plurality of associated case nodes and the non-case nodes corresponding to the associated case nodes.

The knowledge graph is a complex network graph formed by different nodes and the edge relationships among them. The nodes may include case nodes and non-case nodes, and one case node together with the non-case nodes related to it forms a minimum network unit in the knowledge graph. An associated topological network is a network formed by different case nodes and non-case nodes in the knowledge graph that have association relationships. When an associated topological network is obtained from the knowledge graph, a target minimum network unit can be determined first; then, through the association relationships between the non-case nodes in the target minimum network unit and the non-case nodes in other minimum network units, all other minimum network units that can be associated are combined with the target minimum network unit to form the associated topological network.
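The grouping of minimum network units into associated topological networks can be sketched as a connected-components search. This is a minimal illustration, not the method of the application: it assumes that two minimum network units are associated whenever they share a non-case node, and the `(case_id, non_case_id)` edge format and the function name `associated_networks` are hypothetical.

```python
from collections import defaultdict


def associated_networks(edges):
    """Group cases into associated topological networks (connected components).

    edges: iterable of (case_id, non_case_id) pairs (assumed input format).
    Returns a list of (case_ids, non_case_ids) set pairs, one per network.
    """
    case_to_nodes = defaultdict(set)
    node_to_cases = defaultdict(set)
    for case_id, node_id in edges:
        case_to_nodes[case_id].add(node_id)
        node_to_cases[node_id].add(case_id)

    seen_cases = set()
    networks = []
    for start in case_to_nodes:
        if start in seen_cases:
            continue
        # Depth-first expansion from the target minimum network unit.
        cases, nodes = set(), set()
        stack = [start]
        seen_cases.add(start)
        while stack:
            case_id = stack.pop()
            cases.add(case_id)
            for node_id in case_to_nodes[case_id]:
                nodes.add(node_id)
                # Any other case sharing this non-case node joins the network.
                for other in node_to_cases[node_id]:
                    if other not in seen_cases:
                        seen_cases.add(other)
                        stack.append(other)
        networks.append((cases, nodes))
    return networks


# Illustrative data: case1 and case2 share a reporting phone number.
edges = [
    ("case1", "phone:A"), ("case1", "vehicle:X"),
    ("case2", "phone:A"), ("case2", "vehicle:Y"),
    ("case3", "phone:B"), ("case3", "vehicle:Z"),
]
networks = associated_networks(edges)
```

In this example `case1` and `case2` fall into one network because they share `phone:A`, while `case3` forms a network of its own.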

The knowledge graph may be constructed as follows: related business data is obtained from business systems, a plurality of key business fields are extracted from the business data and loaded into a graph database, and the final knowledge graph is obtained after data cleaning, graph format conversion, node integration and other operations are performed on the key business fields in the graph database. Data cleaning handles missing values, errors and similar problems in the key business fields. Graph format conversion associates the cleaned key business fields according to preset node types and edge relationship types and converts them into a network graph structure. Node integration removes redundant nodes from the network graph structure. Taking a knowledge graph of vehicle insurance claims as an example, the business systems that provide the related business data may include a filing system, a core claims system, a survey system, and the like, wherein the filing system implements filing and scheduling services, the core claims system implements filing, claims settling, claims checking, surveying, settling, and the like, and the survey system implements surveying, damage checking, and the like. The key business fields may include case information, personnel information, vehicle information, policy information, injury information, black and grey lists, personnel relationships, and the like, or may be other fields, which is not specifically limited in this embodiment. A minimum network unit constructed from the key business fields is shown in FIG. 3, and the final knowledge graph can be constructed from a plurality of such minimum network units.
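The three construction operations can be illustrated with a small in-memory sketch. It is an assumption-laden stand-in for the graph database described above: the record layout, the field names `case_id`, `phone`, `vehicle` and `driver`, and the function `build_graph` are all hypothetical, and a real deployment would perform these steps inside the graph database.

```python
def build_graph(records, node_fields=("phone", "vehicle", "driver")):
    """Turn business records into a node set and an edge set.

    records: list of dicts, one per case (assumed layout).
    Returns (nodes, edges), where a node is a (type, value) tuple and an
    edge is a (case_node, non_case_node) tuple.
    """
    nodes, edges = set(), set()
    for record in records:
        # Data cleaning: normalise whitespace and drop records with no case id.
        case_id = (record.get("case_id") or "").strip()
        if not case_id:
            continue
        case_node = ("case", case_id)
        nodes.add(case_node)
        for field in node_fields:
            value = (record.get(field) or "").strip()
            if not value:
                continue  # missing field: no node, no edge
            # Graph format conversion: each field becomes a typed node and edge.
            non_case_node = (field, value)
            # Node integration: set membership removes redundant nodes.
            nodes.add(non_case_node)
            edges.add((case_node, non_case_node))
    return nodes, edges


records = [
    {"case_id": "C1", "phone": " A ", "vehicle": "X"},
    {"case_id": "C2", "phone": "A", "vehicle": ""},
    {"case_id": "", "phone": "B"},
]
nodes, edges = build_graph(records)
```

Here the two spellings of phone `A` collapse into one node, the empty vehicle field of `C2` is skipped, and the record without a case identifier is discarded.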

Step 240, for each associated case node, calculating the target information entropy of the non-case nodes, and determining the risk level of the associated case node according to the target information entropy of the non-case nodes.

An associated topological network may comprise a plurality of associated case nodes and the non-case nodes corresponding to them, and the target information entropy of the non-case nodes can be calculated for each associated case node in the network. The information entropy represents the degree of disorder of the non-case nodes: the higher the disorder, the more dispersed the non-case nodes, and the lower the disorder, the more concentrated they are. The risk value of a non-case node is quantified through the information entropy, so that the risk level of the associated case node can be determined according to the target information entropy of the non-case nodes.

Step 260, determining a risk level of a group in the associated topology network based on the risk level of the associated case node; a group includes non-case nodes in an associated topology network.

The associated case nodes may have multiple risk levels. The risk level of the group in the associated topological network may be determined by counting the associated case nodes at each risk level in the network, by the number of associated case nodes at a particular risk level, or in other ways, which is not specifically limited in this embodiment.
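One of the options above, judging the group by the number of associated case nodes at a particular risk level, might look like the following sketch. The level names `"high"` and `"low"`, the threshold `min_high_cases` and the function `group_risk_level` are illustrative assumptions; the application does not specify them.

```python
from collections import Counter


def group_risk_level(case_risk_levels, min_high_cases=2):
    """Derive a group risk level from per-case risk levels.

    case_risk_levels: dict mapping case id -> risk level string (assumed).
    min_high_cases: hypothetical threshold on the count of high-risk cases.
    Returns (group_level, counts_per_level).
    """
    counts = Counter(case_risk_levels.values())
    level = "high" if counts["high"] >= min_high_cases else "low"
    return level, counts


level, counts = group_risk_level({"c1": "high", "c2": "high", "c3": "low"})
```

Returning the per-level counts alongside the decision keeps the other option open, in which the full distribution of risk levels is used rather than a single threshold.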

In this embodiment, a plurality of associated topological networks are obtained from the knowledge graph; for each associated case node in an associated topological network, the target information entropy of the non-case nodes is calculated, and the risk level of the associated case node is determined according to that target information entropy; and the risk level of a group in the associated topological network is determined based on the risk levels of the associated case nodes. The information entropy of the non-case nodes measures how dispersed or concentrated they are, which better matches the logic of business processing. The risk values of different non-case nodes are measured according to this degree of dispersion or concentration, and the risk level of the group is determined from the risk values of a plurality of non-case nodes, which improves the accuracy of risk prediction for fraud groups.

In one embodiment, as shown in fig. 4, which illustrates a flowchart of fraudulent group identification provided by the embodiment of the present application, specifically, a possible process for calculating target information entropy is provided, and the method may include the following steps:

Step 420, for each associated case node, determining a target non-case node from the non-case nodes.

Step 440, taking the non-case nodes other than the target non-case node as other non-case nodes.

Step 460, for each of the other non-case nodes, determining its distribution matrix from the associated topological network, and calculating the target information entropy of the target non-case node according to the distribution matrices of the other non-case nodes.

An associated topological network contains multiple non-case nodes, and the target information entropy of one non-case node can be calculated from any of the other non-case nodes, that is, the non-case nodes other than the target non-case node. Therefore, a target non-case node can first be determined from the non-case nodes, the distribution matrices of the other non-case nodes can then be determined from the associated topological network, and the target information entropy of the target non-case node can be calculated from those distribution matrices. A distribution matrix is the combination of the numbers of case nodes over which non-case nodes of the same kind are distributed. For example, with two case nodes, if each case node corresponds to a different non-case node, the distribution matrix of that kind of non-case node is (1, 1); if both case nodes correspond to the same non-case node, the distribution matrix is (2). If there are multiple other non-case nodes, multiple information entropies may be calculated for the target non-case node. The maximum of these may be used directly as the target information entropy of the non-case node, or their average may be used; the target information entropy may of course also be determined in other ways, which is not specifically limited in this embodiment.
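The (1, 1) and (2) examples can be reproduced with a short sketch. It assumes each case is stored as a mapping from node type to node value, which is a hypothetical layout, and the function name `distribution_matrix` is likewise illustrative.

```python
from collections import Counter


def distribution_matrix(cases, target, other_type):
    """Count how the cases of a target non-case node spread over another type.

    cases: dict case_id -> dict of node_type -> node value (assumed layout).
    target: (node_type, value) pair identifying the target non-case node.
    other_type: the type of the other non-case node.
    Returns the per-value case counts as a tuple, largest first.
    """
    target_type, target_value = target
    counts = Counter(
        attrs[other_type]
        for attrs in cases.values()
        if attrs.get(target_type) == target_value and other_type in attrs
    )
    return tuple(sorted(counts.values(), reverse=True))


# Two cases reported from the same phone: different vehicles, same salesperson.
cases = {
    "case1": {"phone": "A", "vehicle": "X", "salesperson": "S"},
    "case2": {"phone": "A", "vehicle": "Y", "salesperson": "S"},
}
vehicle_matrix = distribution_matrix(cases, ("phone", "A"), "vehicle")
sales_matrix = distribution_matrix(cases, ("phone", "A"), "salesperson")
```

For the reporting phone `A`, the vehicles are dispersed, giving (1, 1), while the salesperson is concentrated, giving (2).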

Taking an associated topological network for vehicle insurance claims as an example, the non-case nodes may include, but are not limited to, target vehicles, reporting mobile phone numbers, target drivers, applicants and insured persons, repair shops, survey and loss assessment personnel, sales personnel, and the like. Different non-case nodes can each serve as a risk subject. For example: when multiple target vehicles have claims reported with the same mobile phone number and the reports are not made by a repair shop or sales personnel, the reporting mobile phone number is the risk subject; when multiple target vehicles of the same owner (that is, family-use vehicles) have claims, the vehicle owner is the risk subject; when multiple claim cases are associated with the same reporting mobile phone, and the target vehicles, owners and insured persons are dispersed but the cases are concentrated on the same sales personnel or the same survey and loss assessment personnel, those sales personnel or survey and loss assessment personnel are the risk subject; when the same driver has many claim cases while the target vehicles and insured persons are dispersed, the target driver is the risk subject; and when the associated group of cases is concentrated on a particular repair shop, the repair shop is the risk subject. When the target information entropy of a reporting mobile phone number is calculated, it can be calculated from the distribution matrices of the target vehicles, target drivers, applicants, insured persons, repair shops, survey and loss assessment personnel, and sales personnel.

In this embodiment, for each associated case node, a target non-case node is determined from the non-case nodes; the non-case nodes other than the target are taken as other non-case nodes; the distribution matrices of the other non-case nodes are determined from the associated topological network; and the target information entropy of the target non-case node is calculated from those distribution matrices. Because the target information entropy of a non-case node is calculated comprehensively from the other non-case nodes, the accuracy of the calculated target information entropy can be improved.

In one embodiment, as shown in fig. 5, which illustrates a flowchart of fraudulent group identification provided by the embodiment of the present application, specifically, a possible process for determining target information entropy is provided, and the method may include the following steps:

Step 520, calculating the initial information entropies of the target non-case node according to the distribution matrices of the other non-case nodes.

Step 540, acquiring the maximum of the initial information entropies, and taking the maximum as the intermediate information entropy of the target non-case node.

Step 560, adjusting the intermediate information entropy of each target non-case node according to the magnitude of the intermediate information entropy to generate the target information entropy of the target non-case node.

After the distribution matrix of each other non-case node is obtained, the initial information entropies of the target non-case node can be calculated by formula (1), or obtained by querying a preset information entropy lookup table as shown in Table 1, which lists only part of the results. The initial information entropies are compared, and the maximum is used as the intermediate information entropy of the target non-case node. Because risk can be transferred among non-case nodes of the same type, the intermediate information entropy of each target non-case node can be adjusted according to its magnitude to generate the target information entropy of the target non-case node.

$$H = -\sum_{i=1}^{X} p_i \log p_i \tag{1}$$

where $X$ is the number of elements in the distribution matrix, and $p_i$ is the distribution probability of the $i$-th element in the distribution matrix; for example, for the distribution matrix (1, 1), the distribution probability of each element is 1/2.
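Steps 520 and 540 can be sketched as follows, under the assumption that formula (1) is the standard Shannon entropy over the distribution probabilities. The base-2 logarithm and the function names `information_entropy` and `intermediate_entropy` are assumptions made for illustration; the application does not state the logarithm base in the text.

```python
import math


def information_entropy(distribution):
    """Shannon entropy (base 2, an assumption) of a distribution matrix.

    distribution: tuple of case counts, e.g. (1, 1) or (2,).
    """
    total = sum(distribution)
    entropy = 0.0
    for count in distribution:
        if count > 0:
            p = count / total  # distribution probability of this element
            entropy += p * math.log2(1.0 / p)
    return entropy


def intermediate_entropy(distributions):
    """Step 540: take the maximum of the initial information entropies.

    distributions: dict mapping other non-case node type -> distribution matrix.
    """
    return max(information_entropy(d) for d in distributions.values())


dispersed = information_entropy((1, 1))   # each element has probability 1/2
concentrated = information_entropy((2,))  # a single element with probability 1
intermediate = intermediate_entropy({"vehicle": (1, 1), "salesperson": (2,)})
```

The dispersed matrix (1, 1) yields a higher entropy than the concentrated matrix (2), which matches the reading of entropy as a measure of dispersion, and the intermediate entropy is the larger of the two.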

When the intermediate information entropy is adjusted, optionally, as shown in FIG. 6, which illustrates a flowchart of fraud group identification provided in an embodiment of the present application and specifically relates to a possible process of adjusting the intermediate information entropy, the method may include the following steps:

Step 620, determining a plurality of target non-case node sets from the associated topological network, wherein each target non-case node set comprises a plurality of target non-case nodes of the same type.

Step 640, for each target non-case node set, determining a first target non-case node from the set, the first target non-case node being the one whose target information entropy corresponds to the highest risk value.

Step 660, calculating the case node betweenness between the first target non-case node and the other target non-case nodes according to a shortest path algorithm, and calculating the risk transfer distance of the first target non-case node.

Step 680, adjusting the intermediate information entropy of each target non-case node according to the case node betweenness and the risk transfer distance to generate the target information entropy of the target non-case node.

When risk transfer is carried out, the non-case nodes of the same type need to be obtained first, specifically, a plurality of target non-case node sets can be determined from the associated topology network first, and each target non-case node set comprises a plurality of target non-case nodes of the same type. Since the target non-case nodes with higher risks transmit risks to the target non-case nodes with lower risks during risk transmission, the intermediate information entropies of the target non-case nodes in the target non-case node sets can be compared for each target non-case node set, and the first target non-case node is determined from the target non-case node sets, wherein the risk value corresponding to the target information entropy of the first target non-case node is the highest. The first target non-case node can transmit information entropy to other target non-case nodes, and meanwhile nodes with large risk values corresponding to target information entropy in other target non-case nodes can transmit risks to other nodes with small risk values.

Specifically, when risk transfer is performed, the case node betweenness between the first target non-case node and the other target non-case nodes can be calculated according to a shortest path algorithm, and the risk transfer distance of the first target non-case node can be calculated. The case node betweenness and the risk transfer distance are then substituted into formula (2) to calculate the transferred information entropy, with which the intermediate information entropy of each target non-case node is adjusted to generate its target information entropy. The case node betweenness is the number of case nodes, directly related to the first target non-case node, that lie between the first target non-case node and the other target non-case nodes; the risk transfer distance is the number of other target non-case nodes to which risk can be transferred from the first target non-case node. When the intermediate information entropy of each target non-case node is adjusted, the information entropy calculated according to formula (2) may be superposed on the intermediate information entropy of that target non-case node.

（2）

Wherein N is the case node betweenness between the first target non-case node and other target non-case nodes; k is a self-defined coefficient; t is the risk transfer distance;

the intermediate information entropy of the first target non-case node.

Taking fig. 7 as an example, risk transfer can be performed between two reporting mobile phone numbers, wherein the risk value of the right reporting mobile phone number is higher, and the right reporting mobile phone number can be used as a first target non-case node. Calculating the case node betweenness between the first target non-case node and other target non-case nodes to be 3 according to the shortest path algorithm, and customizing

The risk transfer distance t =3 can be calculated according to the formula (3), and the entropy of information that can be transferred at different risk transfer distances can be calculated according to the formula (2), and the result is shown in fig. 8.

（3）

Similarly, calculating the case node betweenness between the first target non-case node and other target non-case nodes to be 4 according to the shortest path algorithm, and customizing

The risk transfer distance t =4 can be calculated according to the formula (3), and the entropy of information that can be transferred at different risk transfer distances can be calculated according to the formula (2), and the result is shown in fig. 9.

In addition, it can be seen that the larger the case node betweenness between the first target non-case node and other target non-case nodes is, the larger the information entropy transmitted from the first target non-case node to other target non-case nodes is.

TABLE 1

In this embodiment, the initial information entropy of the target non-case node is calculated according to the distribution matrix of the other non-case nodes; the maximum value among the initial information entropies is acquired and taken as the intermediate information entropy of the target non-case node; and the intermediate information entropy of each target non-case node is adjusted according to the magnitude of the intermediate information entropy to generate the target information entropy of the target non-case node. This method of calculating the target information entropy is simple and easy to implement, and improves the efficiency of calculating the target information entropy of the non-case nodes. Adjusting the intermediate information entropy of each target non-case node through risk transfer further improves the accuracy and reliability of the resulting information entropy.

In one embodiment, as shown in fig. 10, which illustrates a flow chart of fraudulent group identification provided by the present application embodiment, and particularly relates to a possible process for determining a risk level of an associated case node, the method may include the following steps:

Step 1020, classifying the non-case nodes for the first time to generate a first category; the first category comprises a first type of non-case node and a second type of non-case node; the first type of non-case nodes comprise non-case nodes directly related to cases, and the second type of non-case nodes comprise non-case nodes indirectly related to cases.

Step 1040, determining the risk value of the non-case node according to the preset rule corresponding to the category, based on the first category of the non-case node and the target information entropy of the non-case node.

Step 1060, classifying the non-case nodes for the second time to generate a second category; the second category comprises a third type of non-case node, a fourth type of non-case node and a fifth type of non-case node.

Step 1080, determining the risk level of the associated case node according to the risk value of the non-case node and the second category of the non-case node.

The non-case nodes can be classified for the first time according to their attribute information: the non-case nodes directly related to the case form the first type of non-case nodes, and the non-case nodes indirectly related to the case form the second type of non-case nodes. The risk value of each non-case node can then be calculated according to the preset rule corresponding to its category. After the risk values of the non-case nodes are obtained, the non-case nodes can be classified for the second time to generate a second category, which can be divided into three types according to actual requirements, namely a third type, a fourth type and a fifth type of non-case node. Finally, a corresponding case risk matrix can be obtained according to the risk values of the non-case nodes and the second category of the non-case nodes, and the risk level of the associated case node can be determined according to the case risk matrix. The risk matrix may be composed of three elements, which are the levels corresponding to the maximum risk value among the third type, the fourth type and the fifth type of non-case nodes respectively. According to actual business requirements, the level corresponding to a maximum risk value of 100 may be defined as high, the level corresponding to a maximum risk value of 60 or 40 as medium, and the level corresponding to a maximum risk value of 0 as low.

If the risk matrix comprises high, high and high, the risk level of the associated case node is level 8; if two of the elements of the risk matrix are high, the risk level of the associated case node is level 7; if the risk matrix comprises high, medium and medium, the risk level of the associated case node is level 6; if the risk matrix comprises high, medium and low, the risk level of the associated case node is level 5; if the risk matrix comprises high, low and low, the risk level of the associated case node is level 4; if the risk matrix comprises medium, medium and medium, the risk level of the associated case node is level 3; if the risk matrix comprises medium, medium and low, the risk level of the associated case node is level 2; if the risk matrix comprises medium, low and low, the risk level of the associated case node is level 1; if the risk matrix comprises low, low and low, the risk level of the associated case node is level 0. In this embodiment, both the number of elements in the risk matrix and the correspondence between the risk matrix and the risk level may be changed according to actual requirements, which is not specifically limited in this embodiment.
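The correspondence between the three-element risk matrix and the risk level described above can be sketched as follows. This is only an illustrative sketch: the function name `case_risk_level` and the string labels are assumptions introduced here, and the order of the three elements is assumed not to matter.

```python
from collections import Counter


def case_risk_level(matrix):
    """Map a three-element risk matrix to a risk level from 0 to 8.

    `matrix` holds one label per second-category type (third, fourth and
    fifth type of non-case node); each label is 'high', 'medium' or 'low'.
    The function name and label strings are hypothetical.
    """
    if len(matrix) != 3:
        raise ValueError("the risk matrix is expected to have three elements")
    counts = Counter(matrix)
    high, medium, low = counts["high"], counts["medium"], counts["low"]
    if high + medium + low != 3:
        raise ValueError("labels must be 'high', 'medium' or 'low'")
    if high == 3:
        return 8
    if high == 2:
        return 7          # two high elements, third element medium or low
    if high == 1:
        return 4 + medium  # HLL -> 4, HML -> 5, HMM -> 6
    return medium          # LLL -> 0, MLL -> 1, MML -> 2, MMM -> 3


print(case_risk_level(["high", "medium", "low"]))
```

Because the correspondence may be changed according to actual requirements, a lookup table keyed by the sorted labels would work equally well.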

Taking the car insurance claim associated topological network as an example, after the non-case nodes are classified for the first time according to their attribute information, non-case nodes such as the reporting mobile phone number, the insured person, the target car and the target driver can be classified as the first type of non-case nodes, and non-case nodes such as salespersons, repair shops, and survey and damage assessment personnel can be classified as the second type of non-case nodes. The risk value of a first-type non-case node can be calculated according to the following preset rule: if the target information entropy is greater than or equal to threshold 1, the risk value of the non-case node is 100; if the target information entropy is less than threshold 1 and greater than or equal to threshold 2, the risk value is 60; if the target information entropy is less than threshold 2 and greater than or equal to threshold 3, the risk value is 40; and if the target information entropy is less than threshold 3, the risk value is 0.

The risk value of a second-type non-case node can be calculated according to the following preset rule: if the target information entropy is greater than or equal to threshold 4, the risk value of the non-case node is the sum of 0 and the maximum risk value of the corresponding non-case nodes among the first type of non-case nodes; if the target information entropy is less than threshold 4 and greater than or equal to threshold 5, the risk value is the sum of 40 and that maximum risk value; if the target information entropy is less than threshold 5 and greater than or equal to threshold 6, the risk value is the sum of 60 and that maximum risk value; and if the target information entropy is less than threshold 6, the risk value is the sum of 100 and that maximum risk value. The different thresholds can be customized according to business requirements.
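The two preset rules can be illustrated with a short sketch. The function names, parameter names and threshold values below are hypothetical; the sketch assumes the thresholds are ordered so that threshold 1 > threshold 2 > threshold 3 and threshold 4 > threshold 5 > threshold 6, and that the bands of the second rule are read from the highest entropy downward.

```python
def first_type_risk_value(entropy, t1, t2, t3):
    """First preset rule: larger entropy gives a higher risk value.

    Assumes t1 > t2 > t3 (thresholds 1 to 3 in the text).
    """
    if entropy >= t1:
        return 100
    if entropy >= t2:
        return 60
    if entropy >= t3:
        return 40
    return 0


def second_type_risk_value(entropy, t4, t5, t6, max_first_type_value):
    """Second preset rule: larger entropy gives a lower risk value.

    Assumes t4 > t5 > t6 (thresholds 4 to 6 in the text). The base value is
    added to the maximum risk value of the corresponding first-type nodes.
    """
    if entropy >= t4:
        base = 0
    elif entropy >= t5:
        base = 40
    elif entropy >= t6:
        base = 60
    else:
        base = 100
    return base + max_first_type_value


# Illustrative thresholds only; real values are set per business requirements.
print(first_type_risk_value(1.2, t1=1.5, t2=1.0, t3=0.5))
print(second_type_risk_value(0.2, t4=1.5, t5=1.0, t6=0.5,
                             max_first_type_value=60))
```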

When the non-case nodes are classified for the second time to generate the second category, the third type of non-case nodes can comprise reporting mobile phone numbers, salespersons, and survey and damage assessment personnel, the fourth type of non-case nodes can comprise target vehicles, and the fifth type of non-case nodes can comprise insured persons and target drivers. The corresponding risk matrix can then be obtained by combining the risk values of the non-case nodes with the second category of the non-case nodes, and the risk level of the associated case node can be determined according to the risk matrix.

In this embodiment, the non-case nodes are classified for the first time to generate a first category; the risk value of the non-case node is determined according to the preset rule corresponding to the category, based on the first category of the non-case node and the target information entropy of the non-case node; the non-case nodes are classified for the second time to generate a second category; and the risk level of the associated case node is determined according to the risk value of the non-case node and the second category of the non-case node. The non-case nodes can thus be divided flexibly according to business requirements and the corresponding risk values calculated, so that the risk level of the associated case node can be calculated more accurately.

In one embodiment, determining the risk value of the non-case node according to the preset rule corresponding to the category, based on the first category of the non-case node and the target information entropy of the non-case node, comprises: for the first type of non-case nodes, determining the risk values of the non-case nodes according to a first preset rule, wherein under the first preset rule, the larger the information entropy of a non-case node, the higher its risk value; and for the second type of non-case nodes, determining the risk values of the non-case nodes according to a second preset rule, wherein under the second preset rule, the larger the information entropy of a non-case node, the lower its risk value.

According to actual business logic, the larger the information entropy of a non-case node among the first type of non-case nodes, the higher its risk value; the larger the information entropy of a non-case node among the second type of non-case nodes, the lower its risk value. As shown in fig. 11, among the plurality of case nodes corresponding to one reporting mobile phone number, the target vehicles corresponding to the case nodes are dispersed, that is, the information entropy of the reporting mobile phone number is large; since the reporting mobile phone number is a first-type non-case node, its risk value is high. As shown in fig. 12, among the plurality of case nodes corresponding to one reporting mobile phone number, the salespersons corresponding to the case nodes are mostly concentrated, that is, the information entropy of the salesperson is small; since the salesperson is a second-type non-case node, the salesperson's risk value is high.
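The contrast between a dispersed and a concentrated distribution can be illustrated numerically. The sketch below assumes the standard Shannon entropy with a base-2 logarithm; the entropy formula actually used by the application is defined elsewhere in the description and may differ in base or normalization. The function name and the sample identifiers are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Four cases reported from one phone number, each with a different vehicle:
# the distribution is dispersed, so the entropy is large.
dispersed_vehicles = ["car_A", "car_B", "car_C", "car_D"]

# Four cases that mostly share one salesperson:
# the distribution is concentrated, so the entropy is small.
concentrated_sales = ["sales_1", "sales_1", "sales_1", "sales_2"]

print(shannon_entropy(dispersed_vehicles))
print(shannon_entropy(concentrated_sales))
```

Under the first preset rule the dispersed case would yield the higher risk value, while under the second preset rule the concentrated case would.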

In one embodiment, as shown in fig. 13, which illustrates a flowchart of a fraudulent group identification provided by the embodiment of the present application, and particularly relates to a possible process for determining a fraudulent group, the method may include the following steps:

Step 1320, determining the number of associated case nodes corresponding to each risk level in the associated topological network.

Step 1340, determining that the group in the associated topological network is a fraudulent group based on the number of associated case nodes corresponding to each risk level.

Because a plurality of associated topological networks can be obtained from the knowledge graph, when a fraud group is determined from the plurality of associated topological networks, the associated topological network with the highest risk value can be obtained from them, and the group in that associated topological network is taken as the fraud group. Specifically, for each associated topological network, the case nodes included in it have corresponding risk levels, and the risk value of the associated topological network can be determined by comparing the number of associated case nodes corresponding to each risk level. For example, suppose associated topological network 1 includes 5 case nodes with level-8 risk and 3 case nodes with level-6 risk, and associated topological network 2 includes 4 case nodes with level-8 risk and 4 case nodes with level-6 risk. The numbers of associated case nodes at the highest risk level are compared: the greater the number of associated case nodes at the same risk level, the higher the risk value of the associated topological network. Here associated topological network 1 has more level-8 case nodes, so it has the higher risk value, and the final fraud group can be obtained accordingly.
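The comparison in the example above can be sketched as a lexicographic comparison of per-level counts, starting from the highest risk level. Falling back to the next lower level when the counts at the highest level are equal is an assumption made here, not something the description specifies; the function and variable names are likewise hypothetical.

```python
def network_rank_key(level_counts, max_level=8):
    """Counts of case nodes per risk level, ordered from highest level down.

    `level_counts` maps a risk level (0..max_level) to a number of case nodes.
    """
    return tuple(level_counts.get(level, 0)
                 for level in range(max_level, -1, -1))


def highest_risk_network(networks):
    """Return the name of the network whose count tuple compares largest."""
    return max(networks, key=lambda name: network_rank_key(networks[name]))


# The two networks from the example in the text.
networks = {
    "network_1": {8: 5, 6: 3},
    "network_2": {8: 4, 6: 4},
}
print(highest_risk_network(networks))  # network_1 has more level-8 case nodes
```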

In the embodiment, the number of the associated case nodes corresponding to each risk level in the associated topological network is determined; and determining that the group in the associated topological network is a fraud group based on the number of the associated case nodes corresponding to each risk level, so that the fraud group can be quickly determined from the plurality of associated topological networks, and the efficiency of forecasting the risk of the fraud group is improved.

In one embodiment, as shown in fig. 14, which illustrates a flowchart of a fraudulent group identification provided by the embodiment of the present application, and particularly relates to a possible process for determining a fraudulent group according to a person node relationship, the method may include the following steps:

Step 1420, acquiring topological networks corresponding to a plurality of cases from the knowledge graph, wherein each topological network comprises case nodes and person nodes corresponding to the case nodes.

Step 1440, if it is determined that a preset relationship exists among the person nodes in the topological networks corresponding to the plurality of cases, determining that the person nodes in the topological networks corresponding to the plurality of cases are a fraud group; the preset relationship comprises a conflict relationship and an association relationship.

The topological network corresponding to each case acquired from the knowledge graph is a minimum unit network. It comprises case nodes and person nodes corresponding to the case nodes, and the cases can be used as links to establish person-to-person relationships. The established case-to-person relationship is shown as the part framed with a hexagon in fig. 15, and is converted into a person-to-person relationship by combining the preset relationships between person nodes on the basis of the case-to-person relationship. As shown in fig. 16, the person-to-person relationship may include a conflict relationship and an association relationship, wherein the conflict relationship indicates that the person nodes are the two opposing parties of the same case, and the association relationship indicates direct or indirect acquaintance between the person nodes. For example, in the field of vehicle insurance, a conflict relationship may be understood as a collision relationship, and an association relationship may be understood as an acquaintance relationship. Among the preset relationships between person nodes, person nodes with a collision relationship can comprise a third-party driver and a target vehicle owner, a third-party driver and a target driver, a third-party driver and a policyholder, and the like; person nodes with an acquaintance relationship may include a target driver and a target driver, a target driver and a target vehicle owner, a target driver and an applicant, an insured person, and the like.

If it is determined that the preset relationship exists among the person nodes in the topological networks corresponding to the plurality of cases, a ring-forming relationship is formed among the person nodes, that is, a ring-shaped topological network is formed, and the person nodes in the topological networks corresponding to the plurality of cases are determined to be a fraud group. If the person nodes in the existing topological network do not form a ring, the judgment can be deferred until a new case arrives; if the person nodes form a ring, the people corresponding to those person nodes can be added to a blacklist. As shown in fig. 17, the topological network formed by the case-to-person relationship is very complex in structure, and is much easier to understand after conversion into the person-to-person relationship.
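Checking whether person nodes form a ring can be sketched as cycle detection on an undirected person-to-person graph. The sketch below uses a union-find structure, which is one common way to do this and is not stated in the description. It assumes each conflict or association relationship has already been reduced to an unordered pair of person identifiers, and that repeated relationships between the same two people do not by themselves count as a ring; the function name and the sample identifiers are hypothetical.

```python
def has_ring(relations):
    """Return True if the undirected person-to-person graph contains a cycle.

    `relations` is an iterable of (person_a, person_b) pairs, each standing
    for a conflict or association relationship between two person nodes.
    """
    parent = {}

    def find(x):
        # Locate the representative of x's component, halving paths as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    seen_edges = set()
    for a, b in relations:
        edge = frozenset((a, b))
        if a == b or edge in seen_edges:
            continue  # ignore self-loops and duplicate pairs (assumption)
        seen_edges.add(edge)
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected, so this edge closes a ring
        parent[root_a] = root_b
    return False


# Three cases whose person nodes chain back to the first person.
relations = [
    ("driver_1", "owner_2"),   # conflict relationship from case 1
    ("owner_2", "driver_3"),   # association relationship
    ("driver_3", "driver_1"),  # conflict relationship from case 2
]
print(has_ring(relations))
```

If no ring is found, new relationships can be appended as new cases arrive and the check repeated.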

In the embodiment, a topological network corresponding to a plurality of cases is obtained from a knowledge graph, and the topological network comprises case nodes and character nodes corresponding to the case nodes; and if the preset relationship is determined among the character nodes in the topological networks corresponding to the cases, determining the character nodes in the topological networks corresponding to the cases as the cheating group. The cheating group is determined by judging whether the figure nodes form a ring-forming relationship, so that the business logic is better met, and the determined cheating group is more accurate.

It should be understood that, although the steps in the flowcharts related to the embodiments described above are displayed sequentially as indicated by the arrows, the steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts related to the embodiments described above may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the present application further provides a fraud group identification apparatus for implementing the above mentioned fraud group identification method. The solution to the problem provided by the apparatus is similar to the solution described in the above method, so the specific limitations in one or more embodiments of the fraud group identification apparatus provided below may refer to the limitations in the above fraud group identification method, and are not described herein again.

In one embodiment, as shown in fig. 18, there is provided a fraudulent group identification apparatus 1800 comprising: a first obtaining module 1802, a first determining module 1804, and a second determining module 1806, wherein:

A first obtaining module 1802, configured to obtain a plurality of associated topological networks from the knowledge graph, where the associated topological networks include a plurality of associated case nodes and non-case nodes corresponding to the associated case nodes.

A first determining module 1804, configured to calculate, for each associated case node, a target information entropy of a non-case node, and determine a risk level of the associated case node according to the target information entropy of the non-case node.

A second determining module 1806, configured to determine a risk level of a group in the associated topological network based on the risk level of the associated case node; the group includes the non-case nodes in the associated topological network.

In an embodiment, the first determining module 1804 is specifically configured to determine, for each associated case node, a target non-case node from the non-case nodes; take the non-case nodes other than the target non-case node as other non-case nodes; and, for each of the other non-case nodes, determine the distribution matrix of the other non-case nodes from the associated topological network, and calculate the target information entropy of the target non-case node according to the distribution matrix of the other non-case nodes.

In one embodiment, the first determining module 1804 is further configured to calculate an initial information entropy of the target non-case node according to the distribution matrix of the other non-case nodes; acquiring the maximum value of each initial information entropy, and taking the maximum value as the intermediate information entropy of the target non-case node; and adjusting the intermediate information entropy of each target non-case node according to the size of the intermediate information entropy to generate the target information entropy of the target non-case node.

In one embodiment, the first determining module 1804 is further configured to determine a plurality of target non-case node sets from the associated topological network, where the target non-case node sets include a plurality of target non-case nodes of the same type; aiming at each target non-case node set, determining a first target non-case node from the target non-case node set, wherein the risk value corresponding to the target information entropy of the first target non-case node is the highest; calculating case node betweenness between the first target non-case node and other target non-case nodes according to a shortest path algorithm, and calculating a risk transfer distance of the first target non-case node; and adjusting the intermediate information entropy of each target non-case node according to the case node betweenness and the risk transfer distance to generate the target information entropy of the target non-case node.

In an embodiment, the second determining module 1806 is specifically configured to perform a first classification on the non-case nodes, and generate a first category; the first type comprises a first type of non-case node and a second type of non-case node; the first type of non-case nodes comprise non-case nodes directly related to cases, and the second type of non-case nodes comprise non-case nodes indirectly related to cases; determining a risk value of the non-case node according to a first type of the non-case node and a target information entropy of the non-case node and a preset rule corresponding to the type; classifying the non-case nodes for the second time to generate a second category; the second type comprises a third type non-case node, a fourth type non-case node and a fifth type non-case node; and determining the risk level of the associated case node according to the risk value of the non-case node and the second type of the non-case node.

In an embodiment, the second determining module 1806 is further configured to determine, for the first type of non-case node, a risk value of the non-case node according to a first preset rule; the larger the information entropy of the non-case node in the first preset rule is, the higher the risk value of the non-case node is; aiming at the second type of non-case nodes, determining the risk values of the non-case nodes according to a second preset rule; the larger the information entropy of the non-case node in the second preset rule is, the lower the risk value of the non-case node is.

In an embodiment, the second determining module 1806 is further configured to determine the number of associated case nodes corresponding to each risk level in the associated topological network; determining that a group in the associated topological network is a fraudulent group based on the number of associated case nodes corresponding to each risk level.

In one embodiment, the fraud group identification apparatus 1800 further includes a second obtaining module and a third determining module, where the second obtaining module is configured to obtain, from the knowledge graph, a topological network corresponding to a plurality of cases, where the topological network includes case nodes and person nodes corresponding to the case nodes; the third determining module is used for determining that the character nodes in the topological networks corresponding to the cases are fraud groups if the preset relations are determined among the character nodes in the topological networks corresponding to the cases; the preset relationship comprises a conflict relationship and an association relationship.

The various modules in the above described fraudulent group identification means may be implemented in whole or in part by software, hardware and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

In one embodiment, the processor, when executing the computer program, further performs the steps of:

The implementation principle and technical effect of the computer device provided by the embodiment of the present application are similar to those of the method embodiment described above, and are not described herein again.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of:

The implementation principle and technical effect of the computer-readable storage medium provided by this embodiment are similar to those of the above-described method embodiment, and are not described herein again.

It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum computing based data processing logic devices, etc., without limitation.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims

1. A fraudulent group identification method, the method comprising:

acquiring a plurality of associated topological networks from a knowledge graph, wherein each associated topological network comprises a plurality of associated case nodes and non-case nodes corresponding to the associated case nodes;

calculating the target information entropy of the non-case nodes aiming at each associated case node, and determining the risk level of the associated case node according to the target information entropy of the non-case nodes;

determining a risk level of a group in the associated topological network based on the risk level of the associated case node; the group includes the non-case node in the associated topology network.

2. The method according to claim 1, wherein said calculating target entropy of information for said non-case nodes for each of said associated case nodes comprises:

aiming at each associated case node, determining a target non-case node from the non-case nodes;

taking the non-case nodes other than the target non-case node among the non-case nodes as other non-case nodes;

and aiming at each other non-case node, determining a distribution matrix of the other non-case nodes from the associated topological network, and calculating the target information entropy of the target non-case node according to the distribution matrix of the other non-case nodes.

3. The method according to claim 2, wherein said calculating target information entropy of said target non-case node according to said distribution matrix of other non-case nodes comprises:

calculating the initial information entropy of the target non-case node according to the distribution matrix of the other non-case nodes;

acquiring the maximum value in the initial information entropies, and taking the maximum value as the intermediate information entropy of the target non-case node;

and adjusting the intermediate information entropy of each target non-case node according to the magnitude of the intermediate information entropy to generate the target information entropy of the target non-case node.
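
By way of non-limiting illustration, the first two steps of claim 3 can be sketched as follows. The claims do not fix the entropy formula or the layout of the distribution matrix, so this sketch assumes Shannon entropy in bits and assumes that each row of the matrix holds the occurrence counts contributed by one other non-case node. The names `shannon_entropy` and `intermediate_entropy` are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy, in bits, of a list of non-negative counts (assumed formula)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing to the entropy, so they are skipped.
    probabilities = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probabilities)


def intermediate_entropy(distribution_rows):
    """Compute one initial entropy per row and keep the maximum.

    Each row is assumed to be the distribution derived from one
    other non-case node; the maximum initial entropy is taken as the
    intermediate information entropy of the target non-case node.
    """
    initial_entropies = [shannon_entropy(row) for row in distribution_rows]
    return max(initial_entropies, default=0.0)


# Hypothetical distribution matrix with three rows.
rows = [[2, 2], [4, 0], [1, 1, 1, 1]]
print(intermediate_entropy(rows))
```

The subsequent adjustment step is not covered here, because it depends on quantities defined in claim 4.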

4. The method according to claim 3, wherein the adjusting the intermediate information entropy of each target non-case node according to the magnitude of the intermediate information entropy to generate the target information entropy of the target non-case node comprises:

determining a plurality of target non-case node sets from the associated topological network, wherein the target non-case node sets comprise a plurality of target non-case nodes of the same type;

for each target non-case node set, determining a first target non-case node from the target non-case node set, wherein the first target non-case node is the node whose target information entropy corresponds to the highest risk value;

calculating case node betweenness between the first target non-case node and other target non-case nodes according to a shortest path algorithm, and calculating a risk transfer distance of the first target non-case node;

and adjusting the intermediate information entropy of each target non-case node according to the case node betweenness and the risk transfer distance to generate the target information entropy of the target non-case node.
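
By way of non-limiting illustration, the shortest-path step of claim 4 can be sketched as follows. The claims do not define the two quantities precisely, so this sketch makes two assumptions: the case node betweenness is approximated as the number of case nodes lying strictly between two non-case nodes on one shortest path, and the risk transfer distance is approximated as the number of edges on that path. The graph, the node names and the helper functions are hypothetical.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search on an unweighted graph.

    Returns one shortest path as a list of nodes, or None if the
    target cannot be reached from the source.
    """
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def path_statistics(graph, case_nodes, source, target):
    """Return (case nodes strictly between the endpoints, edge count), or None."""
    path = shortest_path(graph, source, target)
    if path is None:
        return None
    cases_between = sum(1 for node in path[1:-1] if node in case_nodes)
    return cases_between, len(path) - 1


# Hypothetical network: three phone nodes linked through two case nodes.
graph = {
    "P1": ["C1"],
    "C1": ["P1", "P2"],
    "P2": ["C1", "C2"],
    "C2": ["P2", "P3"],
    "P3": ["C2"],
}
print(path_statistics(graph, {"C1", "C2"}, "P1", "P3"))
```

How these two values adjust the intermediate information entropy is left open here, since the claims do not state the adjustment formula.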

5. The method according to claim 3, wherein said determining a risk level of said associated case node according to the target information entropy of said non-case nodes comprises:

performing a first classification of the non-case nodes to generate a first category; the first category comprises first-type non-case nodes and second-type non-case nodes; the first-type non-case nodes comprise non-case nodes directly related to cases, and the second-type non-case nodes comprise non-case nodes indirectly related to cases;

determining a risk value of the non-case node according to the first category of the non-case node, the target information entropy of the non-case node, and a preset rule corresponding to the category;

performing a second classification of the non-case nodes to generate a second category; the second category comprises third-type non-case nodes, fourth-type non-case nodes and fifth-type non-case nodes;

and determining the risk level of the associated case node according to the risk value of the non-case node and the second category of the non-case node.

6. The method according to claim 5, wherein the determining the risk value of the non-case node according to the first category of the non-case node, the target information entropy of the non-case node, and the preset rule corresponding to the category comprises:

for the first-type non-case nodes, determining the risk values of the non-case nodes according to a first preset rule; under the first preset rule, the larger the information entropy of a non-case node, the higher the risk value of the non-case node;

for the second-type non-case nodes, determining the risk values of the non-case nodes according to a second preset rule; under the second preset rule, the larger the information entropy of a non-case node, the lower the risk value of the non-case node.
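
By way of non-limiting illustration, the two preset rules of claim 6 can be sketched as a pair of monotonic mappings. The claims require only that the risk value increases with entropy for first-type (directly related) nodes and decreases with entropy for second-type (indirectly related) nodes. The linear form, the `max_entropy` normalization constant and the function name `risk_value` are assumptions made for this sketch.

```python
def risk_value(entropy, directly_related, max_entropy=4.0):
    """Map a target information entropy to a risk value in [0, 1].

    directly_related=True  applies the assumed first preset rule
                           (higher entropy gives higher risk).
    directly_related=False applies the assumed second preset rule
                           (higher entropy gives lower risk).
    """
    # Clip the entropy into [0, max_entropy] before normalizing.
    clipped = min(max(entropy, 0.0), max_entropy)
    score = clipped / max_entropy
    return score if directly_related else 1.0 - score


print(risk_value(1.0, directly_related=True))
print(risk_value(1.0, directly_related=False))
```

Any other pair of monotonically increasing and decreasing functions would satisfy the same constraint.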

7. The method of claim 1, wherein determining a risk level of a group in the associated topological network based on the risk level of the associated case node comprises:

determining the number of the associated case nodes corresponding to each risk level in the associated topological network;

determining that a group in the associated topological network is a fraudulent group based on the number of associated case nodes corresponding to each of the risk levels.
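
By way of non-limiting illustration, the counting step of claim 7 can be sketched as follows. The claims do not name the risk levels or state the decision thresholds, so the labels "high", "medium" and "low", the thresholds `min_high` and `min_medium`, and the function name `is_fraudulent_group` are all hypothetical.

```python
from collections import Counter


def is_fraudulent_group(case_risk_levels, min_high=2, min_medium=3):
    """Decide from per-level counts whether a network contains a fraudulent group.

    case_risk_levels maps each associated case node to its risk level.
    The decision rule (enough high-risk or enough medium-risk cases)
    is an assumed example, not a rule stated in the claims.
    """
    counts = Counter(case_risk_levels.values())
    return counts["high"] >= min_high or counts["medium"] >= min_medium


# Hypothetical risk levels for the case nodes of one associated topological network.
levels = {"C1": "high", "C2": "high", "C3": "low"}
print(is_fraudulent_group(levels))
```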

8. The method of claim 1, further comprising:

acquiring topological networks corresponding to a plurality of cases from the knowledge graph, wherein each topological network comprises case nodes and figure nodes corresponding to the case nodes;

if it is determined from the topological networks corresponding to the plurality of cases that the figure nodes have a preset relationship, determining that the figure nodes in the topological networks corresponding to the plurality of cases form a fraudulent group; the preset relationship comprises a conflict relationship and an association relationship.

9. A fraudulent group identification apparatus, characterised in that the apparatus comprises:

a first determining module, configured to calculate a target information entropy of the non-case node for each associated case node, and determine a risk level of the associated case node according to the target information entropy of the non-case node;

a second determining module, configured to determine a risk level of a group in the associated topological network based on the risk level of the associated case node; the group includes the non-case nodes in the associated topological network.

10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 8.

11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.