CN116188174A

CN116188174A - Insurance fraud detection method and system based on modularity and mutual information

Info

Publication number: CN116188174A
Application number: CN202211579473.2A
Authority: CN
Inventors: 陈雪娇
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2022-12-09
Filing date: 2022-12-09
Publication date: 2023-05-30

Abstract

The invention relates to the technical field of artificial intelligence and provides an insurance fraud detection method and system based on modularity and mutual information. The method comprises the following steps: acquiring a user adjacency matrix and a user attribute feature matrix according to the relationship topology network corresponding to a database; training a vector characterization learning model according to the user adjacency matrix and the user attribute feature matrix to obtain a trained vector characterization learning model; acquiring an optimal low-dimensional characterization matrix corresponding to a target user according to the user adjacency matrix and the user attribute feature matrix corresponding to the target user in the database, and acquiring a target category corresponding to the target user according to a target clustering algorithm; and determining whether the target user has committed insurance fraud according to the historical insurance fraud information of the users in the target category. The invention encodes the cooperative relationships among the users in the relationship topology network into the optimal low-dimensional characterization matrix, thereby detecting more effectively and accurately whether the target user has committed insurance fraud.

Description

Insurance fraud detection method and system based on modularity and mutual information

Technical Field

The invention relates to the technical field of computers, in particular to a method and a system for detecting insurance fraud based on modularity and mutual information.

Background

With the development of science and technology, acquiring data and building models on it to solve practical problems has become a very common technical means. For example, an insurance purchasing platform may collect data on users' insurance purchases, acceptance, records and so on, and construct an insurance fraud detection model from the collected data so as to reduce the cases in which an insurance company is defrauded.

With the diversification of markets, cooperation among agricultural production operators during production activities is increasingly evident. This has accelerated the coverage of agricultural insurance among agricultural producers and, at the same time, the emergence of agricultural insurance fraud gangs. These gangs cooperate with one another to stage false claim incidents in order to defraud the company of claim payments, which not only damages the company's market image but also seriously harms its economic interests in the agricultural insurance market.

Current agricultural insurance fraud detection methods mainly mine the supply and marketing cooperation relationships between users from the users' historical application information in order to uncover false claim incidents. However, the false claim incidents detected in this way have low accuracy.

Therefore, there is a need for an agricultural insurance fraud detection method that improves detection accuracy.

Disclosure of Invention

The invention provides a method, a system, equipment and a storage medium for detecting insurance fraud based on modularity and mutual information, which mainly aims to better represent a low-dimensional representation matrix in a relational topology network and effectively improve the detection efficiency of insurance fraudulent users.

In a first aspect, an embodiment of the present invention provides a method for detecting insurance fraud based on modularity and mutual information, including:

acquiring a user adjacency matrix and a user attribute feature matrix according to a corresponding relationship topology network in a database, wherein the relationship topology network comprises users in the database, cooperation relations between any two users in the database and personal attribute information of the users in the database;

training the vector representation learning model according to the user adjacency matrix and the user attribute feature matrix until a loss function corresponding to the trained vector representation learning model meets a target condition, and acquiring the trained vector representation learning model, wherein the loss function is determined according to the modularity maximization and mutual information maximization of the vector representation learning model;

Obtaining an optimal low-dimensional characterization matrix corresponding to the target user according to a user adjacency matrix corresponding to the target user in the database and a user attribute feature matrix corresponding to the target user, and obtaining a target category corresponding to the target user according to the optimal low-dimensional characterization matrix corresponding to the target user and a target clustering algorithm;

and determining whether the target user has insurance fraud or not according to the historical insurance fraud information of the user in the target category.

Preferably, the vector representation learning model includes a graph convolutional neural network, a mutual information module and a modularity module, and training the vector representation learning model according to the user adjacency matrix and the user attribute feature matrix until the loss function corresponding to the trained vector representation learning model meets the target condition, so as to obtain the trained vector representation learning model, includes:

acquiring an initial low-dimensional characterization matrix based on the graph convolution neural network according to the user adjacency matrix and the user attribute feature matrix, and acquiring a global representation vector according to the initial low-dimensional characterization matrix;

randomly arranging and combining elements in the user attribute feature matrix to obtain a reference feature representation matrix, and acquiring a reference low-dimensional characterization vector based on the graph convolutional neural network according to the user adjacency matrix and the reference feature representation matrix;

acquiring a first loss function based on the mutual information module according to the initial low-dimensional characterization matrix, the global representation vector and the reference low-dimensional characterization vector;

acquiring a second loss function based on the modularity module according to a preset category indication matrix, a modularity constant matrix and the initial low-dimensional characterization matrix;

judging whether the loss function meets the target condition, if not, adjusting the network parameters of the vector characterization learning model, repeating the process until the loss function corresponding to the adjusted vector characterization learning model meets the target condition, and acquiring the trained vector characterization learning model, wherein the loss function comprises the first loss function and the second loss function.
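The forward pass described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the graph convolution uses the common symmetric normalisation with self-loops, the "random arrangement" of the attribute matrix is taken to be a row permutation, and the names `matmul`, `gcn_layer` and `make_reference_features` are hypothetical.

```python
import math
import random


def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def gcn_layer(A, X, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W) (assumed form)."""
    n = len(A)
    # Add self-loops so every node also keeps its own features.
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]  # always >= 1 because of the self-loop
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    Z = matmul(norm, matmul(X, W))
    return [[max(0.0, z) for z in row] for row in Z]


def make_reference_features(X, seed=None):
    """Return a row-shuffled copy of X (assumption: shuffling is row-wise)."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    return [list(X[i]) for i in order]  # copy rows; X itself is not mutated


A = [[0, 1], [1, 0]]            # two users with one cooperative relationship
X = [[1.0, 0.0], [0.0, 1.0]]    # toy attribute feature matrix
W = [[1.0], [1.0]]              # toy weight matrix (would be learned in training)

H = gcn_layer(A, X, W)                      # initial low-dimensional characterization
X_ref = make_reference_features(X, seed=0)  # reference feature representation matrix
H_ref = gcn_layer(A, X_ref, W)              # reference low-dimensional characterization
```

The same adjacency matrix is used for both passes; only the attribute rows are shuffled, so the reference output pairs real graph structure with mismatched attributes.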

Preferably, the first loss function is obtained, based on the mutual information module, from the initial low-dimensional characterization matrix, the global representation vector and the reference low-dimensional characterization vector by a formula in which: i and j are positive integers and are not equal; n represents the total number of users in the database; the scoring function is a bilinear scoring function; s represents the global representation vector; $h_i$ represents the i-th initial low-dimensional characterization vector in the initial low-dimensional characterization matrix; and the reference term represents the j-th reference low-dimensional characterization vector in the reference low-dimensional characterization matrix.
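One form consistent with the symbols listed above is the binary cross-entropy objective used in Deep Graph Infomax-style training. It is given here as an assumption, not as the patent's exact expression; the notation $\mathcal{L}_{MI}$, $\mathcal{D}$, $\tilde{h}_j$ and the weight matrix $W$ is chosen for this sketch only.

```latex
\mathcal{L}_{MI} = -\frac{1}{2n}\left(\sum_{i=1}^{n}\log \mathcal{D}(h_i, s)
  + \sum_{j=1}^{n}\log\bigl(1 - \mathcal{D}(\tilde{h}_j, s)\bigr)\right),
\qquad
\mathcal{D}(h, s) = \operatorname{sigmoid}\bigl(h^{T} W s\bigr)
```

Under this reading, the bilinear scoring function $\mathcal{D}$ is trained to score real pairs $(h_i, s)$ high and reference pairs $(\tilde{h}_j, s)$ low, and minimising $\mathcal{L}_{MI}$ corresponds to maximising an estimate of the mutual information between node vectors and the global representation vector.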

Preferably, the second loss function is obtained, based on the modularity module, from the preset category indication matrix, the modularity constant matrix and the initial low-dimensional characterization matrix, by the following formula:

$$Q = \operatorname{tr}(U^{T} B U), \quad \text{s.t.}\ \operatorname{tr}(U^{T} U) = N,$$

wherein the second loss function is expressed in terms of the following quantities: $U$ represents the preset category indication matrix, $B$ represents the modularity constant matrix, $H$ represents the initial low-dimensional characterization matrix, $C$ represents the preset category indication matrix, $B_{ij}$ represents the element in the i-th row and j-th column of the modularity constant matrix, $A_{ij}$ represents the element in the i-th row and j-th column of the user adjacency matrix, $d_i$ represents the degree of the i-th node in the relationship topology network, $d_j$ represents the degree of the j-th node in the relationship topology network, $|E|$ represents the total number of edges in the relationship topology network, $Q$ represents the modularity, $N$ represents the preset sum of the diagonal elements (the trace), and i and j are positive integers and are not equal.
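The quantities above can be made concrete with a small numeric sketch. It assumes that the modularity constant matrix is the standard Newman modularity matrix, $B_{ij} = A_{ij} - d_i d_j / (2|E|)$, which uses exactly the symbols listed, and that $U$ is a hard one-hot community assignment. Both are assumptions for illustration, and the function names are hypothetical.

```python
def modularity_matrix(A):
    """B_ij = A_ij - d_i * d_j / (2|E|) for an undirected, unweighted graph (assumed form)."""
    degrees = [sum(row) for row in A]
    two_e = sum(degrees)  # the degree sum equals 2|E| in an undirected graph
    n = len(A)
    return [[A[i][j] - degrees[i] * degrees[j] / two_e for j in range(n)] for i in range(n)]


def trace_utbu(U, B):
    """tr(U^T B U) = sum over communities c and node pairs (i, j) of U_ic * B_ij * U_jc."""
    n, k = len(U), len(U[0])
    return sum(U[i][c] * B[i][j] * U[j][c]
               for c in range(k) for i in range(n) for j in range(n))


# Toy network: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

B = modularity_matrix(A)
U = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]  # one-hot: each triangle is a community
Q = trace_utbu(U, B)                                    # evaluates to 5.0 for this graph
N = sum(U[i][c] ** 2 for i in range(6) for c in range(2))  # tr(U^T U) = 6, the node count
```

Dividing `Q` by `2|E|` (here 14) gives the conventional normalised modularity, about 0.357 for this split. A loss to be minimised would typically be the negative of this quantity evaluated on the learned representation, though the source does not preserve that expression.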

Preferably, the global representation vector is obtained from the initial low-dimensional characterization matrix by a formula in which s represents the global representation vector, H represents the initial low-dimensional characterization matrix, σ represents a linear rectification function (ReLU), n represents the total number of users in the database, and $h_i$ represents the i-th initial low-dimensional characterization vector in the initial low-dimensional characterization matrix, i being a positive integer.
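A readout consistent with the symbols listed above is $s = \sigma\bigl(\tfrac{1}{n}\sum_{i=1}^{n} h_i\bigr)$, that is, the linear rectification function applied to the mean of the node vectors. The sketch below assumes this form; the function name `global_readout` is hypothetical.

```python
def global_readout(H):
    """Return s = ReLU(mean of the rows of H), an assumed form of the readout."""
    n = len(H)
    dim = len(H[0])
    mean = [sum(row[k] for row in H) / n for k in range(dim)]
    return [max(0.0, m) for m in mean]


H = [[1.0, -2.0], [3.0, -4.0]]  # two users, two-dimensional characterization vectors
s = global_readout(H)           # mean is [2.0, -3.0]; ReLU gives [2.0, 0.0]
```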

Preferably, the determining whether the target user has insurance fraud according to the historical insurance fraud information of the user in the target category includes:

acquiring historical insurance fraud information in the target category;

identifying the fraudulent users corresponding to the historical insurance fraud information;

and if a cooperative relationship exists between the target user and a fraudulent user, determining that the target user has engaged in insurance fraud.
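The three steps above amount to a lookup over the target user's category. A minimal sketch, assuming users are identified by zero-based integer indices into the user adjacency matrix and that the fraud history is available as a set of such indices (the function name and data layout are hypothetical):

```python
def has_fraud_link(target, category_members, fraud_users, A):
    """Return True if the target cooperates with a known fraudulent user in its category."""
    for user in category_members:
        if user == target:
            continue
        # A[target][user] == 1 means a cooperative relationship exists.
        if user in fraud_users and A[target][user] == 1:
            return True
    return False


A = [
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
category_members = [0, 1, 2]   # users clustered into the target category
fraud_users = {2}              # users with historical insurance fraud records

flag_user_0 = has_fraud_link(0, category_members, fraud_users, A)  # cooperates with user 2
flag_user_1 = has_fraud_link(1, category_members, fraud_users, A)  # no link to user 2
```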

Preferably, the target condition includes: the training times reach the preset times, and the difference between the loss functions corresponding to the two adjacent training times is smaller than the preset loss threshold value.
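The target condition can be expressed as a small predicate over the recorded loss values. The sketch below reads the text literally as a conjunction of the two criteria; an implementation might instead stop when either one holds. The names `target_condition_met`, `max_epochs` and `loss_threshold` are hypothetical.

```python
def target_condition_met(loss_history, max_epochs, loss_threshold):
    """Check both criteria: training count reached and loss change below the threshold."""
    if len(loss_history) < 2:
        return False  # no pair of adjacent training rounds to compare yet
    reached_count = len(loss_history) >= max_epochs
    converged = abs(loss_history[-1] - loss_history[-2]) < loss_threshold
    return reached_count and converged


done_a = target_condition_met([1.0, 0.5, 0.49], max_epochs=3, loss_threshold=0.05)
done_b = target_condition_met([1.0, 0.9, 0.5], max_epochs=3, loss_threshold=0.05)
done_c = target_condition_met([0.50, 0.49], max_epochs=3, loss_threshold=0.05)
```

Here `done_a` holds because both criteria are met, `done_b` fails on the loss difference, and `done_c` fails on the training count.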

In a second aspect, an embodiment of the present invention provides an insurance fraud detection system based on modularity and mutual information, including:

the network module is used for acquiring a user adjacency matrix and a user attribute feature matrix according to a corresponding relationship topology network in a database, wherein the relationship topology network comprises users in the database, cooperation relations between any two users in the database and personal attribute information of the users in the database;

The characterization module is used for training the vector characterization learning model according to the user adjacency matrix and the user attribute feature matrix until a loss function corresponding to the trained vector characterization learning model meets a target condition, and acquiring the trained vector characterization learning model, wherein the loss function is determined according to the modularity maximization and mutual information maximization of the vector characterization learning model;

the clustering module is used for acquiring an optimal low-dimensional characterization matrix corresponding to the target user according to a user adjacency matrix corresponding to the target user in the database and a user attribute feature matrix corresponding to the target user, and acquiring a target category corresponding to the target user according to the optimal low-dimensional characterization matrix corresponding to the target user and a target clustering algorithm;

and the detection module is used for determining whether the target user has insurance fraud or not according to the historical insurance fraud information of the user in the target category.

In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above-mentioned insurance fraud detection method based on modularity and mutual information when the processor executes the computer program.

In a fourth aspect, embodiments of the present invention provide a computer storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described method for detecting insurance fraud based on modularity and mutual information.

According to the insurance fraud detection method and system based on modularity and mutual information provided by the invention, a relationship topology network is first constructed from the users recorded in the database; this network can effectively model both the social relationships among users and the users' attribute information. The mutual information maximization method reduces the time complexity of the detection method and thereby improves its operating efficiency. In addition, modularity maximization enables the vector characterization learning model to encode the cooperative relationships between users in the relationship topology network into the optimal low-dimensional characterization matrix, so that the cooperative relationships between the target user and other users are mined more effectively and accurately, and whether the target user has committed insurance fraud is detected more effectively and accurately.

Drawings

FIG. 1 is a schematic diagram of an application scenario of a method for detecting insurance fraud based on modularity and mutual information according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method for detecting insurance fraud based on modularity and mutual information according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating the execution of an insurance fraud detection method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a system for detecting insurance fraud based on modularity and mutual information according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

In embodiments of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

Fig. 1 is a schematic diagram of an application scenario of an insurance fraud detection method based on modularity and mutual information according to an embodiment of the present invention. As shown in Fig. 1, all users stored in the database and the target user are input at a client. After receiving them, the client sends them to a server, and the server, after receiving them, executes the insurance fraud detection method based on modularity and mutual information to determine whether the target user has committed insurance fraud.

It should be noted that the server may be implemented by an independent server or a server cluster formed by a plurality of servers. The client may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, etc. The client and the server may be connected by Bluetooth, USB (Universal Serial Bus) or another communication connection, which is not limited in this embodiment of the present invention.

The embodiment of the invention can acquire and process the related data based on the artificial intelligence technology. Among them, artificial intelligence (Artificial Intelligence, abbreviated as AI) is a theory, method, technique and application system that simulates, extends and expands human intelligence, senses environment, acquires knowledge and uses knowledge to obtain an optimal result using a digital computer or a machine controlled by a digital computer.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning, deep learning and other directions.

In the prior art, artificial intelligence is used for insurance fraud detection mainly by analyzing each user's behavior information and data information in isolation to judge whether that user has committed insurance fraud. Such methods consider only the user's personal attribute information and ignore the supply and marketing relationships, cooperative relationships, social relationships and the like among users. As a result, conventional detection methods cannot reasonably divide clients into different community groups, which weakens their ability to detect potential agricultural insurance fraud gangs and reduces the accuracy of insurance fraud detection. The embodiment of the invention aims to establish an insurance fraud detection method based on modularity and mutual information maximization: a relationship topology network is built from the social relationships among the users recorded in a database; the social relationships among users are incorporated into the detection method through mutual information maximization; and the structural information of the relationship topology network is incorporated through modularity maximization. The accuracy of insurance fraud detection is thereby improved, and potential insurance fraud gangs can be detected accurately and rapidly. Based on the detection results, the identified gangs can be investigated with priority, so that fraud by such gangs is effectively curbed and combated and the reputation and economic benefits of the enterprise in the market are improved.

Fig. 2 is a flowchart of a method for detecting insurance fraud based on modularity and mutual information according to an embodiment of the present invention, as shown in fig. 2, where the method includes:

S210, acquiring a user adjacency matrix and a user attribute feature matrix according to a corresponding relationship topology network in a database, wherein the relationship topology network comprises users in the database, cooperation relations between any two users in the database and personal attribute information of the users in the database;

First, the relationship topology network corresponding to the users in the database is acquired. A user recorded in the database may be a user who has an insurance purchase record with the enterprise, a user who intends to purchase insurance but has not yet done so, or a user who has merely registered with the enterprise; this can be determined according to actual conditions. The relationship topology network in the embodiment of the invention is a network constructed according to the relationships among the users in the database, in which each user corresponds to exactly one node. The relationship topology network also includes the cooperative relationship between any two users in the database: if a cooperative relationship exists between two users, a connecting edge exists between their two corresponding nodes. In the embodiment of the invention, a cooperative relationship between two users means that the two users have cooperated at some time, for example through a transaction or a transfer of funds; this can be determined according to actual conditions and is not specifically limited here. The relationship topology network further includes personal attribute information of each user in the database, which may include the user's name, gender, age, occupation, usual location, insurance purchase record, historical insurance purchase information and so on; the specific information included can be determined according to actual conditions and is not specifically limited in the embodiment of the invention.

The user adjacency matrix and the user attribute feature matrix are obtained from the relationship topology network. In the embodiment of the invention, the user adjacency matrix is obtained by converting the cooperative relationships between users in the relationship topology network, and it quantifies those relationships: if a connecting edge exists between two nodes in the relationship topology network, the element of the user adjacency matrix corresponding to the two nodes is set to 1; otherwise it is set to 0. The user attribute feature matrix quantifies the personal attribute information of all users in the database; different rows of the matrix represent different users, and different columns represent different items of personal attribute information.

In a specific implementation, the relationship topology network is represented by $G = (V, E, X)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the node set of the relationship topology network $G$; $v_1$ denotes the 1st node, which corresponds to the 1st user in the database; $v_2$ denotes the 2nd node, which corresponds to the 2nd user; and $v_n$ denotes the n-th node, which corresponds to the n-th user, where n is the number of all users in the database. In the embodiment of the invention, each node represents a user. $E = \{e_{i,j}\}$ represents the cooperative relationships between users, where $e_{i,j}$ denotes the connecting edge between node $v_i$ and node $v_j$. $X \in \mathbb{R}^{n \times f}$ denotes the user attribute feature matrix; each row vector of this matrix represents the personal attribute information of a different user, and f denotes the attribute dimension. For example, if the personal attribute information comprises name, gender, age, occupation, residence and insurance purchase history, then f is 6. In the embodiment of the application, X may be generated by collecting the text information of the users recorded by the enterprise and then converting each user's text information into attribute features using a bag-of-words model.
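The bag-of-words conversion mentioned above can be sketched as follows. This is an illustration only: it assumes whitespace tokenisation and raw term counts, and the user texts are invented toy data. Note that with this encoding the number of columns equals the vocabulary size, whereas the f = 6 example above counts attribute fields.

```python
def bag_of_words(docs):
    """Turn one text per user into a count matrix with one column per vocabulary term."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    column = {tok: k for k, tok in enumerate(vocab)}
    X = [[0] * len(vocab) for _ in docs]
    for row, toks in enumerate(tokenised):
        for tok in toks:
            X[row][column[tok]] += 1
    return vocab, X


user_texts = ["farmer hebei corn", "trader hebei"]  # hypothetical user descriptions
vocab, X = bag_of_words(user_texts)
# vocab is ['corn', 'farmer', 'hebei', 'trader']; each row of X describes one user
```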

Specifically, in the embodiment of the present invention, the user adjacency matrix is acquired from the relationship topology network as follows. Let $A_{ij}$ be the element in the i-th row and j-th column of the user adjacency matrix $A$. If there is an edge between node $v_i$ and node $v_j$, then $A_{ij} = 1$, which indicates that a cooperative relationship exists between the two users represented by $v_i$ and $v_j$; if there is no edge between node $v_i$ and node $v_j$, then $A_{ij} = 0$, which indicates that no cooperative relationship exists between the two users.
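The rule for filling the user adjacency matrix can be sketched as follows, assuming the cooperative relationships are available as a list of user-index pairs. Indices are zero-based here (the text numbers users from 1), cooperation is treated as symmetric, and the function name `build_adjacency` is hypothetical.

```python
def build_adjacency(n, edges):
    """Build an n x n 0/1 adjacency matrix from (i, j) cooperation pairs."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # assumption: a cooperative relationship is undirected
    return A


# Three users; only users 0 and 1 have cooperated.
A = build_adjacency(3, [(0, 1)])
```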

It should be further noted that the insurance in the embodiment of the present invention may be agricultural insurance, health insurance, accident insurance, etc.; this can be determined according to the actual situation and is not specifically limited in the embodiment of the present invention.

S220, training the vector representation learning model according to the user adjacency matrix and the user attribute feature matrix until a loss function corresponding to the trained vector representation learning model meets a target condition, and acquiring the trained vector representation learning model, wherein the loss function is determined according to the modularity maximization and mutual information maximization of the vector representation learning model;

In the embodiment of the invention, the user adjacency matrix and the user attribute feature matrix are used to train the vector characterization learning model. The loss function is calculated at each training iteration and checked against the target condition. If the condition is met, the trained vector characterization learning model is obtained and training ends; if not, the vector characterization learning model is adjusted and trained again until the loss function of the adjusted model meets the target condition. The vector characterization learning model in the embodiment of the invention is a kind of neural network and must be trained, or retrained, before use. Its training process can be divided into three steps: defining the structure of the vector characterization learning model and the output of forward propagation; defining the loss function and the back-propagation optimization algorithm; and finally creating a session and repeatedly running the back-propagation optimization algorithm on the training data.

In the embodiment of the invention, the loss function is determined according to the modularity maximization and the mutual information maximization of the vector characterization learning model. Modularity, also called the modularity value, is a commonly used measure of the strength of a network's community structure. Its value depends mainly on how the nodes of the relationship topology network are assigned to communities, so it can be used to measure the quality of a community division of the network: the closer the modularity value is to 1, the stronger the community structure produced by the division, that is, the better the division quality. An optimal community division of the relationship topology network can therefore be obtained by maximizing the modularity. In the embodiment of the application, modularity maximization ensures that the target users are divided into the most suitable communities. A community can be understood as a category: users with similar characteristics or with cooperative relationships are placed in the same community, so that the structural relationships among users are represented to the greatest extent. Modularity maximization thus mines the cooperative relationships among users as fully as possible, which improves the accuracy of the insurance fraud detection model.

Mutual information in the embodiment of the invention is understood as follows: knowledge of one variable changes the uncertainty of another variable, and the amount of that change is the mutual information between them. The mutual information maximization method can effectively model both the social relationships between users and the users' individual attribute information, and it reduces the time complexity of the detection method, thereby improving its operating efficiency.
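For intuition, the textbook definition of mutual information between two discrete variables, $I(X;Y) = \sum_{x,y} p(x,y)\log_2 \frac{p(x,y)}{p(x)p(y)}$, can be computed directly from a joint probability table. The sketch below illustrates the quantity itself; the model described here would maximise a learned estimate of it rather than evaluate a table, and the function name is hypothetical.

```python
import math


def mutual_information(joint):
    """Return I(X;Y) in bits for a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    mi = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (px[x] * py[y]))
    return mi


mi_dependent = mutual_information([[0.5, 0.0], [0.0, 0.5]])        # Y determines X
mi_independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])  # X and Y unrelated
```

When one variable fully determines the other, the uncertainty removed is the full one bit of entropy; when the variables are independent, nothing is removed and the mutual information is zero.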

S230, acquiring an optimal low-dimensional characterization matrix corresponding to the target user according to a user adjacency matrix corresponding to the target user in the database and a user attribute feature matrix corresponding to the target user, and acquiring a target category corresponding to the target user according to the optimal low-dimensional characterization matrix corresponding to the target user and a target clustering algorithm;

In the embodiment of the invention, the target user is one of the users recorded in the database. The user adjacency matrix corresponding to the target user is obtained; it represents the cooperative relationships between the target user and other users: if a cooperative relationship exists between the target user and another user, the corresponding element is 1, and otherwise it is 0. The user attribute feature matrix corresponding to the target user represents the target user's personal attribute information. The user adjacency matrix and the user attribute feature matrix corresponding to the target user are input into the trained vector characterization learning model to obtain the optimal low-dimensional characterization matrix.

Then the optimal low-dimensional characterization matrix corresponding to the target user is input into the target clustering algorithm to obtain the target category corresponding to each target user. Clustering is the partitioning of a data set into different classes or clusters according to a certain criterion (e.g., distance), such that the similarity of data objects within the same cluster is as large as possible while the difference between data objects in different clusters is also as large as possible; that is, data of the same class are gathered together as much as possible and data of different classes are separated as much as possible. Cluster analysis algorithms may be classified into partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, and the like; the cluster analysis method used in the embodiment of the present invention may be determined according to the actual situation.

In the embodiment of the invention, a density-based clustering algorithm (Density-Based Spatial Clustering of Applications with Noise, DBSCAN) is adopted to cluster the optimal low-dimensional characterization matrix corresponding to the target user, thereby obtaining the target category of the target user. Before the DBSCAN clustering algorithm is used, two of its parameters need to be tuned: the maximum radius r of a community and the minimum number of points m. These two parameters directly determine the quality of the output class structure, so tuning them yields a high-quality community division while maintaining efficiency, allowing potential fraudulent parties to be mined more accurately.
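The clustering step can be sketched as follows. This is a minimal NumPy implementation of the general DBSCAN procedure, not the implementation used in the embodiment. The parameters `r` and `m` correspond to the maximum radius and the minimum number of points; the toy matrix `H_opt`, the Euclidean distance and the convention that a point counts as its own neighbour are assumptions.

```python
import numpy as np

def dbscan(points, r, m):
    """Minimal DBSCAN: returns one label per point, -1 marks noise."""
    n = len(points)
    # Pairwise Euclidean distances; every point is its own neighbour.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(dist[i] <= r) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbours[i]) < m:
            continue  # not a core point; stays noise unless reached later
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbours[j]) >= m:
                queue.extend(neighbours[j])  # expand only through core points
        cluster += 1
    return labels

# Hypothetical low-dimensional characterization vectors of 7 users.
H_opt = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
                  [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],
                  [10.0, 0.0]])
labels = dbscan(H_opt, r=0.5, m=3)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

In this toy run the last user is far from both groups and is labelled as noise, which shows how the choice of `r` and `m` shapes the resulting categories.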

It should be further noted that, in the embodiment of the present invention, the target category refers to the category in which the target user is located, and one category generally includes a plurality of users. The embodiment finds the category corresponding to each target user, that is, users sharing certain features are grouped into the same category. Therefore, when a user in a category is found to have committed insurance fraud, the other users in that category are examined closely for insurance fraud, which narrows the screening range and improves the efficiency of insurance fraud detection.

S240, determining whether the target user has insurance fraud according to the historical insurance fraud information of the user in the target category.

For the target category corresponding to a target user, if historical insurance fraud information exists in the target category, that is, if some users in the target category have committed insurance fraud, the other users in the target category are also likely to have committed insurance fraud, and the target user is determined to have insurance fraud.

The invention provides an insurance fraud detection method based on modularity and mutual information. First, the relationship topology network constructed from the users recorded in the database effectively models both the social relationships among users and the attribute information of the users, and the mutual information maximization method reduces the time complexity of the detection method, thereby improving its operating efficiency. In addition, through modularity maximization, the vector characterization learning model effectively encodes the cooperative relationships among users in the relationship topology network into the optimal low-dimensional characterization matrix, so that the cooperative relationships between the target user and other users are mined more effectively and accurately, and whether the target user has insurance fraud is detected more effectively and accurately.

On the basis of the foregoing embodiment, preferably, the vector characterization learning model includes a graph convolutional neural network, a mutual information module and a modularity module, and training the vector characterization learning model according to the user adjacency matrix and the user attribute feature matrix until a loss function corresponding to the trained vector characterization learning model meets a target condition, to obtain a trained vector characterization learning model, including:

The vector characterization learning model in the embodiment of the invention comprises a graph convolutional neural network, a mutual information module and a modularity module. Training the vector characterization learning model according to the user adjacency matrix and the user attribute feature matrix to obtain the trained vector characterization learning model specifically comprises the following steps:

FIG. 3 is a diagram showing the execution process of the insurance fraud detection method according to the embodiment of the present invention. As shown in FIG. 3, the user adjacency matrix A and the user attribute feature matrix X corresponding to each user in the database are first input into the graph convolutional neural network $\varepsilon_{GCN}$ to obtain the initial low-dimensional characterization matrix output by the graph convolutional neural network. The output formula is shown as (1):

$$H = \varepsilon_{GCN}(X, A) = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X W\right), \quad \tilde{A} = A + I_n \tag{1}$$

where H represents the initial low-dimensional characterization matrix; W represents a weight matrix; $I_n$ represents the identity matrix corresponding to the user adjacency matrix; n represents the total number of users in the database; $\tilde{D}$ represents the degree matrix of the matrix $\tilde{A}$, whose diagonal elements are $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$; i and j are positive integers not equal to each other; $\sigma$ represents the linear rectification function; and $H = \{h_1, h_2, \ldots, h_n\}$ is the matrix composed of the initial low-dimensional characterization vectors of all nodes in the relationship topology network, $h_i$ being the initial low-dimensional characterization vector of node $v_i$.
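A minimal NumPy sketch of this propagation step is given below. It assumes that formula (1) is the standard single-layer graph-convolution rule implied by the symbol definitions, namely ReLU applied to the symmetrically normalised matrix (A + I_n) multiplied by X and W; the function name `gcn_layer` and the random toy inputs are hypothetical.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_tilde = A + np.eye(A.shape[0])        # adjacency with self-loops
    d = A_tilde.sum(axis=1)                 # diagonal of the degree matrix
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])             # 3 users in a chain
X = rng.normal(size=(3, 4))                 # hypothetical attribute features
W = rng.normal(size=(4, 2))                 # hypothetical weight matrix
H = gcn_layer(A, X, W)
print(H.shape)  # (3, 2)
```

Each row of `H` is the initial low-dimensional characterization vector of one user; because of the self-loops, a user's own attributes contribute to its vector alongside those of its cooperating users.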

A global representation vector is then obtained from the initial low-dimensional characterization matrix. The calculation formula is shown as (2):

$$s = \frac{1}{n} \sum_{i=1}^{n} h_i \tag{2}$$

where s represents the global representation vector, computed from the initial low-dimensional characterization matrix H; n represents the total number of users in the database; $h_i$ represents the i-th initial low-dimensional characterization vector in the initial low-dimensional characterization matrix; and i is a positive integer.
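The readout can be sketched in a few lines. The sketch assumes that the global representation vector is the plain average of the n initial low-dimensional characterization vectors, which is what the symbols s, H, n and the i-th vector suggest; the function name `readout` and the toy matrix are hypothetical.

```python
import numpy as np

def readout(H):
    """Average the rows of H into a single global representation vector."""
    return H.mean(axis=0)

# Hypothetical initial low-dimensional characterization matrix (3 users, 2 dims).
H = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
s = readout(H)
print(s)  # [3. 4.]
```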

Then, the elements in the user attribute feature matrix are randomly rearranged to obtain a reference feature representation matrix $\tilde{X}$. The reference feature representation matrix and the user adjacency matrix are input into the graph convolutional neural network to obtain the reference low-dimensional characterization vectors. The specific calculation formula is shown as (3):

$$\{\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_n\} = \varepsilon_{our}(\tilde{X}, A, W) \tag{3}$$

where $\tilde{h}_i$ represents the reference low-dimensional characterization vector of node $v_i$; $\varepsilon_{our}$ likewise represents the graph convolutional neural network; $\tilde{X}$ represents the reference feature representation matrix; A represents the user adjacency matrix; and W represents the weight matrix.
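One common way to realise the random rearrangement is to permute the rows of the attribute matrix, so that each node receives the attributes of some other node while the adjacency matrix is left unchanged. The sketch below assumes this row-wise shuffling; the text only says that the elements are randomly rearranged, and the function name `corrupt_features` is hypothetical.

```python
import numpy as np

def corrupt_features(X, rng):
    """Return a copy of X with its rows in random order."""
    return X[rng.permutation(X.shape[0])]

rng = np.random.default_rng(42)
X = np.arange(12, dtype=float).reshape(4, 3)   # hypothetical attributes of 4 users
X_ref = corrupt_features(X, rng)

# The reference matrix holds the same rows as X, only assigned to different users.
print(sorted(X_ref.tolist()) == sorted(X.tolist()))  # True
```

Feeding `X_ref` together with the unchanged adjacency matrix through the same graph convolutional network yields the reference low-dimensional characterization vectors used as negative samples.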

In order to construct an unsupervised learning model, the embodiment of the present invention adopts a mutual information maximization model, so that the vector characterization learning model updates the low-dimensional characterization vector of each node in an unsupervised manner. A first loss function is obtained based on the mutual information module according to the initial low-dimensional characterization matrix, the global representation vector and the reference low-dimensional characterization vectors. The specific calculation formula is shown in (4):

$$\mathcal{L}_1 = -\frac{1}{2n}\left(\sum_{i=1}^{n} \log \mathcal{D}(h_i, s) + \sum_{j=1}^{n} \log\left(1 - \mathcal{D}(\tilde{h}_j, s)\right)\right) \tag{4}$$

where $\mathcal{L}_1$ represents the first loss function; i and j are positive integers not equal to each other; n represents the total number of users in the database; $\mathcal{D}$ represents a bilinear scoring function for scoring positive and negative samples; s represents the global representation vector; $h_i$ represents the i-th initial low-dimensional characterization vector in the initial low-dimensional characterization matrix; and $\tilde{h}_j$ represents the j-th reference low-dimensional characterization vector in the reference low-dimensional characterization matrix.
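The following sketch shows one way such a mutual-information loss can be computed. It assumes a binary cross-entropy form in which real nodes are positive samples and reference nodes are negative samples, and a bilinear scoring function of the form sigmoid(h^T W_d s); the matrix `W_d`, the function names and the toy inputs are assumptions, not values given in the embodiment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mutual_information_loss(H, H_ref, s, W_d):
    """Cross-entropy between scores of real nodes and reference nodes."""
    pos = sigmoid(H @ W_d @ s)        # scores of real (positive) nodes
    neg = sigmoid(H_ref @ W_d @ s)    # scores of reference (negative) nodes
    n = H.shape[0]
    eps = 1e-12                       # guards against log(0)
    return -(np.log(pos + eps).sum() + np.log(1.0 - neg + eps).sum()) / (2 * n)

W_d = np.eye(2)                       # hypothetical bilinear weights
s = np.array([1.0, 0.0])

# Uninformative vectors: every score is 0.5, so the loss equals log(2).
zeros = np.zeros((3, 2))
loss_uninformative = mutual_information_loss(zeros, zeros, s, W_d)

# Well-separated positive and negative samples give a much smaller loss.
loss_separated = mutual_information_loss(np.array([[5.0, 0.0]]),
                                         np.array([[-5.0, 0.0]]), s, W_d)
print(loss_separated < loss_uninformative)  # True
```

Minimising this quantity pushes the scores of real nodes towards 1 and those of reference nodes towards 0, which is the sense in which the node vectors become informative about the global representation vector.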

In order to enable the initial low-dimensional characterization matrix to reflect the community structure of the relationship topology network, a modularity maximization strategy is introduced in the embodiment of the invention. Specifically, the overall modularity is defined as shown in formula (5):

$$Q = \operatorname{tr}(U^{T} B U), \quad \text{s.t.}\ \operatorname{tr}(U^{T} U) = N \tag{5}$$

where s.t. means "subject to", and $U \in \mathbb{R}^{n \times k}$ represents a preset category indication matrix with n rows and k columns. Each row of U has exactly one element equal to 1, indicating that the node corresponding to that row belongs to the category given by the column of that element, and all other elements of the row are 0; for example, $U_{ij} = 1$ indicates that node $v_i$ belongs to the j-th community. In the embodiment of the invention, Q represents the modularity, N represents a preset value of the trace, and tr denotes the trace of a matrix, i.e. the sum of its diagonal elements. Formula (5) thus states that, subject to the trace of $U^{T} U$ being N, the modularity Q equals the trace of $U^{T} B U$.

U is initialized randomly, i.e. its elements are randomly assigned. $B \in \mathbb{R}^{n \times n}$ is the modularity constant matrix, whose element in row i and column j is calculated as shown in formula (6):

$$B_{ij} = A_{ij} - \frac{d_i d_j}{2|E|} \tag{6}$$

where $A_{ij}$ represents the element in row i and column j of the user adjacency matrix; $d_i$ represents the degree of the i-th node; $d_j$ represents the degree of the j-th node; $|E|$ represents the total number of edges in the relationship topology network; and i and j are positive integers not equal to each other.
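A small worked example may help to see how the modularity rewards a good community assignment. The sketch assumes the usual modularity matrix, in which each entry is the adjacency entry minus the product of the two node degrees divided by twice the number of edges, and evaluates the trace expression of formula (5) for two hypothetical one-hot indication matrices; the function names and the toy graph are assumptions.

```python
import numpy as np

def modularity_matrix(A):
    """B[i, j] = A[i, j] - d_i * d_j / (2 * |E|)."""
    d = A.sum(axis=1)
    num_edges = A.sum() / 2.0
    return A - np.outer(d, d) / (2.0 * num_edges)

def modularity(U, B):
    """Q = tr(U^T B U) for an indication matrix U."""
    return np.trace(U.T @ B @ U)

# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
B = modularity_matrix(A)

U_good = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)  # one triangle per class
U_bad = np.array([[1, 0], [0, 1]] * 3, dtype=float)          # alternating classes

q_good = float(modularity(U_good, B))
q_bad = float(modularity(U_bad, B))
print(round(q_good, 6), round(q_bad, 6))  # 5.0 -3.0
```

Both indication matrices satisfy the trace constraint with N equal to the number of nodes, yet the assignment that follows the two triangles scores far higher, which is the signal the modularity module exploits.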

In order to carry out the modularity maximization simultaneously with the updating of the initial low-dimensional characterization matrix H, so that the structural features of the relationship topology network are encoded into the low-dimensional characterization vector matrix, the embodiment of the invention expresses the modularity maximization using a term containing H; that is, the loss function in formula (7), which serves as the second loss function, needs to be minimized.

where C represents a preset category representation matrix; $C_i$, the i-th row of C, is the low-dimensional vector of the i-th category. C is initialized randomly, i.e. its elements are randomly assigned.

Then the loss function corresponding to the characterization learning model, i.e. the final loss function, is obtained from the first loss function $\mathcal{L}_1$ and the second loss function $\mathcal{L}_2$ obtained in the preceding steps, see formula (8):

$$\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2 \tag{8}$$

where $\mathcal{L}$ represents the final loss function.

After the loss function is calculated in each iteration, it is compared with the target condition. If the target condition is met, training stops and the trained vector characterization learning model is obtained. If the target condition is not met, the network parameters of the vector characterization learning model are adjusted and training continues until the number of training iterations, or the loss value calculated by the loss function of the adjusted model, meets the target condition.

In the embodiment of the present invention, the target condition has two parts: the number of training iterations reaches a preset number, or the difference between the loss values of two adjacent training iterations is smaller than a preset loss threshold. Training can end as soon as either part is met. The preset number of iterations avoids an infinite loop in the case where the loss value never satisfies the threshold condition. When the difference between the loss values of two adjacent iterations is smaller than the preset loss threshold, the model parameters are already good and further adjustment of the parameters of the vector characterization learning model will not reduce the loss appreciably, so training can be ended.
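The two-part stopping rule can be sketched as follows. The function name `should_stop`, the threshold values and the loss sequence are hypothetical and stand in for a real training loop.

```python
def should_stop(epoch, prev_loss, loss, max_epochs, loss_threshold):
    """Stop when either part of the target condition holds."""
    reached_limit = epoch >= max_epochs
    converged = prev_loss is not None and abs(prev_loss - loss) < loss_threshold
    return reached_limit or converged

# Hypothetical loss values, one per training iteration.
losses = [1.0, 0.6, 0.4, 0.39995, 0.39994]

prev = None
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if should_stop(epoch, prev, loss, max_epochs=100, loss_threshold=1e-3):
        stopped_at = epoch
        break
    prev = loss
print(stopped_at)  # 4: the loss changed by less than the threshold
```

The iteration cap guarantees termination even if the loss keeps oscillating, while the threshold check ends training early once further updates no longer change the loss noticeably.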

On the basis of the foregoing embodiment, preferably, the determining whether the target user has insurance fraud according to the historical insurance fraud information of the user in the target category includes:

acquiring historical insurance fraud information in the target category;

Specifically, in the embodiment of the invention, whether the target user has insurance fraud is determined according to the historical insurance fraud information of the users in the target category. First, the fraudulent users are obtained from the historical insurance fraud information in the target category; if a cooperative relationship exists between the target user and a fraudulent user, the target user is determined to have insurance fraud as well.
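The decision rule described above can be sketched as a simple set lookup. The function name `has_insurance_fraud`, the user identifiers and the representation of cooperative relationships as unordered pairs are assumptions made for illustration.

```python
def has_insurance_fraud(target, class_members, known_fraudsters, cooperation):
    """Flag the target if it cooperates with a known fraudster in its own class."""
    fraud_in_class = (set(class_members) & set(known_fraudsters)) - {target}
    return any(frozenset((target, f)) in cooperation for f in fraud_in_class)

# Hypothetical target category and historical fraud records.
class_members = {"u1", "u2", "u3"}
known_fraudsters = {"u3", "u9"}          # u9 belongs to another category
cooperation = {frozenset(("u1", "u3")), frozenset(("u2", "u9"))}

print(has_insurance_fraud("u1", class_members, known_fraudsters, cooperation))  # True
print(has_insurance_fraud("u2", class_members, known_fraudsters, cooperation))  # False
```

User `u2` is not flagged even though it cooperates with a fraudster, because that fraudster lies outside the target category; restricting the check to the category is what narrows the screening range.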

Fig. 4 is a schematic structural diagram of an insurance fraud detection system based on modularity and mutual information according to an embodiment of the present invention, as shown in fig. 4, the system includes a network module 410, a characterization module 420, a clustering module 430, and a detection module 440, where:

the network module 410 is configured to obtain a user adjacency matrix and a user attribute feature matrix according to a corresponding relationship topology network in a database, where the relationship topology network includes users in the database, cooperation relationships between any two users in the database, and personal attribute information of the users in the database;

The characterization module 420 is configured to train the vector characterization learning model according to the user adjacency matrix and the user attribute feature matrix until a loss function corresponding to the trained vector characterization learning model meets a target condition, and acquire a trained vector characterization learning model, where the loss function is determined according to a modularity maximization and a mutual information maximization of the vector characterization learning model;

the clustering module 430 is configured to obtain an optimal low-dimensional characterization matrix corresponding to the target user according to a user adjacency matrix corresponding to the target user in the database and a user attribute feature matrix corresponding to the target user, and obtain a target category corresponding to the target user according to the optimal low-dimensional characterization matrix corresponding to the target user and a target clustering algorithm;

the detection module 440 is configured to determine whether the target user has insurance fraud according to the historical insurance fraud information of the users in the target category.

The embodiment is a system embodiment corresponding to the above method, and the specific implementation process is the same as that of the above method embodiment, and the details refer to the above method embodiment, and the system embodiment is not limited in particular.

On the basis of the foregoing embodiment, preferably, the vector characterization learning model includes a graph convolutional neural network, a mutual information module, and a modularity module, and the characterization module includes an initial unit, a global unit, a first loss unit, a second loss unit, and a loss unit, where:

the initial unit is used for acquiring an initial low-dimensional representation matrix based on the graph convolution neural network according to the user adjacent matrix and the user attribute feature matrix, and acquiring a global representation vector according to the initial low-dimensional representation matrix;

the global unit is used for randomly arranging and combining elements in the user attribute feature matrix to obtain a reference feature representation matrix, and acquiring a reference low-dimensional characterization vector based on the graph convolution neural network according to the user adjacency matrix and the reference feature representation matrix;

the first loss unit is used for acquiring a first loss function based on the mutual information module according to the initial low-dimensional representation matrix, the global representation vector and the reference low-dimensional representation vector;

the second loss unit is used for acquiring a second loss function based on the module degree module according to a preset category indication matrix, a module degree constant matrix and the initial low-dimensional characterization matrix;

The loss unit is used for judging whether a loss function meets a target condition, if not, the network parameters of the vector representation learning model are adjusted, the process is repeated until the loss function corresponding to the adjusted vector representation learning model meets the target condition, the trained vector representation learning model is obtained, and the loss function comprises the first loss function and the second loss function.

On the basis of the above embodiment, preferably, the first loss unit is obtained by the following formula:

On the basis of the above embodiment, preferably, the second loss unit is obtained by the following formula:

$$Q = \operatorname{tr}(U^{T} B U), \quad \text{s.t.}\ \operatorname{tr}(U^{T} U) = N$$

On the basis of the above embodiment, preferably, the global unit is obtained by the following formula:

On the basis of the above embodiment, preferably, the detection module includes a history unit, an association unit, and an investigation unit, where:

the history unit is used for acquiring history insurance fraud information in the target category;

the association unit is used for obtaining the corresponding fraudulent users according to the historical insurance fraud information;

And the checking unit is used for judging that the target user has insurance fraud behavior if the target user and the fraud user have a cooperative relationship.

On the basis of the above embodiment, preferably, the target condition includes: the training times reach the preset times, and the difference between the loss functions corresponding to the two adjacent training times is smaller than the preset loss threshold value.

The modules in the above insurance fraud detection system based on modularity and mutual information may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.

Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention, where the computer device may be a server, and an internal structure diagram of the computer device may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a computer storage medium, an internal memory. The computer storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the computer storage media. The database of the computer device is used for storing data generated or acquired in the process of executing the insurance fraud detection method based on modularity and mutual information, such as a relationship topology network, a user adjacency matrix, a user attribute feature matrix and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method of insurance fraud detection based on modularity and mutual information.

In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the insurance fraud detection method based on modularity and mutual information in the above embodiments when the computer program is executed by the processor. Alternatively, the processor, when executing the computer program, implements the functions of the modules/units in this embodiment of the insurance fraud detection system based on modularity and mutual information.

In one embodiment, a computer storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the insurance fraud detection method based on modularity and mutual information in the above embodiments. Alternatively, the computer program, when executed by a processor, performs the functions of the modules/units in the embodiment of the insurance fraud detection system based on modularity and mutual information described above.

Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. A method for detecting insurance fraud based on modularity and mutual information, comprising:

acquiring a user adjacency matrix and a user attribute feature matrix according to a relationship topology network corresponding to a database, wherein the relationship topology network comprises users in the database, cooperation relations between any two users in the database and personal attribute information of the users in the database;

2. The method for detecting insurance fraud based on modularity and mutual information according to claim 1, wherein the vector characterization learning model includes a graph convolutional neural network, a mutual information module and a modularity module, the training is performed on the vector characterization learning model according to the user adjacency matrix and the user attribute feature matrix until a loss function corresponding to the trained vector characterization learning model meets a target condition, and the obtaining the trained vector characterization learning model includes:

3. The method for detecting insurance fraud based on modularity and mutual information of claim 2, wherein said obtaining a first loss function based on said mutual information module according to said initial low-dimensional characterization matrix, said global representation vector and said reference low-dimensional representation vector is obtained by the following formula:

4. The method for detecting insurance fraud based on modularity and mutual information according to claim 2, wherein the obtaining the second loss function based on the modularity module according to a preset category indication matrix, a modularity constant matrix and the initial low-dimensional characterization matrix is obtained by the following formula:

$$Q = \operatorname{tr}(U^{T} B U), \quad \text{s.t.}\ \operatorname{tr}(U^{T} U) = N$$

5. The method for detecting insurance fraud based on modularity and mutual information of claim 2, wherein said obtaining a global representation vector from said initial low-dimensional characterization matrix is obtained by the following formula:

6. The method for detecting insurance fraud based on modularity and mutual information of any of claims 1 to 5, wherein said determining whether said target user has insurance fraud based on historical insurance fraud information of users in said target category comprises:

acquiring historical insurance fraud information in the target category;

7. The method for detecting insurance fraud based on modularity and mutual information of any of claims 1 to 5, wherein said target condition comprises: the training times reach the preset times, and the difference between the loss functions corresponding to the two adjacent training times is smaller than the preset loss threshold value.

8. An insurance fraud detection system based on modularity and mutual information, comprising:

the network module is used for acquiring a user adjacency matrix and a user attribute feature matrix according to a relationship topology network corresponding to a database, wherein the relationship topology network comprises users in the database, cooperation relations between any two users in the database and personal attribute information of the users in the database;

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the modularity and mutual information based insurance fraud detection method according to any of claims 1 to 7 when the computer program is executed.

10. A computer storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the modularity and mutual information based insurance fraud detection method according to any of claims 1 to 7.