CN118378201B

CN118378201B - Medical insurance group abnormal behavior detection method and device

Info

Publication number: CN118378201B
Application number: CN202410822742.6A
Authority: CN
Inventors: 吴健; 杜邦; 张铠; 邵谦; 刘伟泽; 应豪超; 李国聪
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2024-06-25
Filing date: 2024-06-25
Publication date: 2024-09-06
Anticipated expiration: 2044-06-25
Also published as: CN118378201A

Abstract

The invention discloses a method and a device for detecting abnormal behaviors of medical insurance groups. The method comprises the following steps: acquiring medical insurance settlement data, preprocessing the data, and extracting key features of medical insurance settlement; constructing dynamic graphs in three dimensions based on the key features, the dynamic graphs respectively characterizing the number of drug purchases, the diversity of drug types, and the drug purchase amount; fusing the dynamic graphs of the three dimensions to obtain a fused graph, performing a community discovery search for medical insurance groups on the fused graph, and constructing a community graph based on the set of communities found; and encoding and decoding the community graph with an autoencoder, calculating the reconstruction error of the community graph as an anomaly index, and screening out abnormal communities in the community graph, thereby detecting abnormal behaviors of medical insurance groups. The method can effectively improve the speed and accuracy with which abnormal behaviors of medical insurance groups are identified.

Description

Medical insurance group abnormal behavior detection method and device

Technical Field

The invention belongs to the technical field of medical insurance data anomaly detection, and particularly relates to a method and a device for detecting abnormal behaviors of medical insurance groups.

Background

Medical insurance departments have typically relied on traditional manual review for anomaly checking, in contrast to approaches based on artificial intelligence and data mining. Manual review of abnormal group hospitalization behavior depends on the experience and intuition of professionals, who identify abnormal behavior by examining medical records and financial data. The advantage of this approach is that human intuition and experience allow in-depth analysis, especially of complex and novel abnormal patterns. Its disadvantages are also significant: it is time-consuming and costly, it is susceptible to subjective judgment, and it scales poorly to large volumes of data.

In contrast, anomaly checking with artificial intelligence, in particular machine learning, data mining, and deep learning techniques, can automatically analyze large amounts of data to identify complex abnormal behavior patterns and association rules. For example, patent application CN111275086A discloses a method for detecting fraud anomalies in medical insurance groups. That method classifies patient information according to hospital and the category of disease treated; analyzes the information of at least two patients in a target disease category of a target hospital to obtain a target patient group, wherein the target patient group comprises at least two target patients whose admission times differ by no more than a first target time threshold and whose discharge times differ by no more than a second target time threshold; and collects statistics on the target patient groups of a plurality of disease categories in a plurality of hospitals to determine at least two abnormal patients as an abnormal patient population, wherein the at least two abnormal patients appear together in N or more target patient groups. Although this method can detect fraudulent group behavior automatically, it does not consider the multidimensional information of fraudulent behavior, and its detection accuracy leaves room for improvement.

Disclosure of Invention

In view of the above, the present invention aims to provide a method and a device for detecting abnormal behaviors of a medical insurance group, which effectively improve the recognition speed and accuracy of abnormal behaviors of medical insurance by fusing a multidimensional dynamic graph and a deep learning technology.

In order to achieve the above object, an embodiment of the present invention provides a method for detecting abnormal behaviors of a medical insurance group, including the following steps:

Acquiring medical insurance settlement data, preprocessing, and extracting key characteristics of medical insurance settlement, wherein the key characteristics comprise: drug purchase information, amount, time of drug purchase, and medical institution name and code;

Constructing dynamic graphs in three dimensions based on the key features, wherein the dynamic graphs respectively characterize the number of drug purchases, the diversity of drug types, and the drug purchase amount;

Fusing the dynamic graphs with three dimensions to obtain a fused graph, performing community discovery search of medical insurance groups on the fused graph, and constructing a community graph based on a searched community set;

Encoding and decoding the community graph with an autoencoder to reconstruct it, calculating the reconstruction error of the community graph as an anomaly index, and screening out abnormal communities in the community graph, thereby detecting abnormal behaviors of medical insurance groups.

Preferably, the dynamic graph $G_t^F$ characterizing the number of drug purchases $F$ is represented as $G_t^F = (V_t^F, E_t^F, W_t^F)$, wherein $V_t^F$ denotes the node set of the dynamic graph $G_t^F$ corresponding to time window $t$, and each node represents an insured participant who has drug purchase behavior and a purchase record;

$E_t^F$ denotes the edge set between the nodes of the dynamic graph $G_t^F$ corresponding to time window $t$; each edge represents a drug purchase relationship, and if any two participants purchase drugs at the same medical institution within the same time window, an edge exists between the nodes corresponding to the two participants;

$W_t^F$ denotes the matrix storing the edge weights of the dynamic graph $G_t^F$ corresponding to time window $t$; the weight of each edge is calculated from the number of times any two participants jointly appear in purchase records within the given time window, according to:

$$w_{ij}^{F,t} = \sum_{k=1}^{K} I_{ij}^{k,t}$$

wherein $w_{ij}^{F,t}$ denotes the edge weight between the nodes corresponding to participants $i$ and $j$ in the dynamic graph $G_t^F$ corresponding to time window $t$; $I_{ij}^{k,t}$ is an indicator function, with $I_{ij}^{k,t} = 1$ if participants $i$ and $j$ both have a purchase record at the same medical institution $k$ within time window $t$, and $I_{ij}^{k,t} = 0$ otherwise; and $K$ denotes the number of medical institutions in which participants $i$ and $j$ jointly appear in the data within time window $t$;

In the dynamic change of the time window from $t$ to $t+1$, $G_{t+1}^F = (V_t^F \cup \Delta V_t^F,\ E_t^F \cup \Delta E_t^F,\ W_t^F + \Delta W_t^F)$, wherein $\Delta V_t^F$, $\Delta E_t^F$, and $\Delta W_t^F$ denote the node increment, the edge increment, and the weight increment, respectively.

Preferably, the dynamic graph $G_t^D$ characterizing drug type diversity $D$ is represented as $G_t^D = (V_t^D, E_t^D, W_t^D)$, wherein $V_t^D$ denotes the node set of the dynamic graph $G_t^D$ corresponding to time window $t$, and each node represents an insured participant who has drug purchase behavior and a purchase record;

$E_t^D$ denotes the edge set between the nodes of the dynamic graph $G_t^D$ corresponding to time window $t$; each edge represents a drug purchase relationship, and if any two participants purchase the same drug at the same institution within the same time window, an edge exists between the nodes corresponding to the two participants;

$W_t^D$ denotes the matrix storing the edge weights of the dynamic graph $G_t^D$ corresponding to time window $t$; each edge weight is expressed by Shannon entropy and represents the drug type diversity of participants $i$ and $j$ in their drug purchasing behavior, according to:

$$w_{ij}^{D,t} = H_{ij}^{t} = -\sum_{l=1}^{L} p_{ij}^{l,t} \log p_{ij}^{l,t}$$

wherein $w_{ij}^{D,t}$ denotes the edge weight between the nodes corresponding to participants $i$ and $j$ in the dynamic graph $G_t^D$ corresponding to time window $t$; $H_{ij}^{t}$ denotes the Shannon entropy; $p_{ij}^{l,t}$ denotes the probability that participants $i$ and $j$ purchase the same drug $l$ within time window $t$, obtained by dividing the number of purchases of drug $l$ by the two participants by the total number of drug purchases of the two participants; and $L$ denotes the number of drug types purchased in common by participants $i$ and $j$;

Preferably, the dynamic graph $G_t^M$ characterizing the drug purchase amount $M$ is represented as $G_t^M = (V_t^M, E_t^M, W_t^M)$, wherein $V_t^M$ denotes the node set of the dynamic graph $G_t^M$ corresponding to time window $t$, and each node represents an insured participant who has drug purchase behavior and a purchase record;

$E_t^M$ denotes the edge set between the nodes of the dynamic graph $G_t^M$ corresponding to time window $t$; each edge represents a drug purchase relationship, and if any two participants purchase drugs at the same institution within the same time window, an edge exists between the nodes corresponding to the two participants;

$W_t^M$ denotes the matrix storing the edge weights of the dynamic graph $G_t^M$ corresponding to time window $t$; the weight of each edge is calculated from the drug purchase amounts of any two participants at the same institution within the given time window, according to:

$$w_{ij}^{M,t} = \sum_{s=1}^{S_{ij}} a_{ij}^{s,t}$$

wherein $w_{ij}^{M,t}$ denotes the edge weight between the nodes corresponding to participants $i$ and $j$ in the dynamic graph $G_t^M$ corresponding to time window $t$; $a_{ij}^{s,t}$ denotes the sum of the purchase amounts of participants $i$ and $j$ in the $s$-th transaction within time window $t$; and $S_{ij}$ denotes the total number of transactions of participants $i$ and $j$ at an institution;

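Preferably, fusing the dynamic graphs of the three dimensions to obtain a fused graph comprises:

merging the nodes and edges of the dynamic graphs of the three dimensions to obtain a node set $V_t = V_t^F \cup V_t^D \cup V_t^M$ and an edge set $E_t = E_t^F \cup E_t^D \cup E_t^M$; and normalizing the edge weights $w_{ij}^{F,t}$, $w_{ij}^{D,t}$, and $w_{ij}^{M,t}$ of the dynamic graph $G_t^F$ characterizing the number of drug purchases $F$, the dynamic graph $G_t^D$ characterizing drug type diversity $D$, and the dynamic graph $G_t^M$ characterizing the drug purchase amount $M$, and then weighting the normalized edge weights $\hat{w}_{ij}^{F,t}$, $\hat{w}_{ij}^{D,t}$, and $\hat{w}_{ij}^{M,t}$ to obtain a weight matrix $W_t$ and thus the fused graph $G_t = (V_t, E_t, W_t)$;

wherein $w_{ij}^{t} = \alpha\,\hat{w}_{ij}^{F,t} + \beta\,\hat{w}_{ij}^{D,t} + \gamma\,\hat{w}_{ij}^{M,t}$; $\alpha$, $\beta$, and $\gamma$ are the weight coefficients of the respective edge weights; and $\alpha \in [0.3, 1]$, $\beta \in [0.01, 0.5]$, $\gamma \in [0.3, 1]$.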

Preferably, performing the community discovery search for medical insurance groups on the fused graph and constructing the community graph based on the set of communities found comprises:

performing the community discovery search for medical insurance groups on the fused graph using Louvain, a community discovery algorithm based on modularity optimization, wherein a modularity $Q$ is introduced during the search as an index of the quality of the community partition; for the fused graph $G_t$, the modularity $Q$ is defined as:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ w_{ij} - \rho\,\frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

wherein $w_{ij}$ denotes the edge weight between the nodes corresponding to participants $i$ and $j$ in the fused graph $G_t$; $k_i$ and $k_j$ are the weights of the nodes corresponding to participants $i$ and $j$, each being the sum of the weights of all edges connected to the node; $2m$ denotes the sum of all edge weights of the fused graph $G_t$, $2m = \sum_{i,j} w_{ij}$; $\delta(c_i, c_j)$ is an indicator function that equals 1 if participants $i$ and $j$ are in the same community and 0 otherwise; and $\rho$ denotes the resolution parameter, which controls the scale of the desired communities;

constructing the community graph $G_t^C$ from the set of communities found, represented as $G_t^C = (C_t, E_t^C, W_t^C)$, wherein $C_t$ denotes the set of communities of interest, $E_t^C$ denotes the union of all intra-community and inter-community edge sets within time window $t$, and $W_t^C$ denotes the set of weights of the edges of $G_t^C$ within time window $t$.

Preferably, encoding and decoding the community graph with the autoencoder, calculating the reconstruction error of the community graph as an anomaly index, and screening out abnormal communities in the community graph comprises:

constructing a node feature matrix and an adjacency matrix of the community graph, wherein the node feature matrix $X$ is an $n \times d$ matrix, $n$ is the number of communities, and $d$ is the number of features per community; the features include community size, the sum of internal edge weights, average node degree, and edge density; the feature values of each community are collected into one row of $X$, so that each row is the feature vector of one community; the adjacency matrix $A$ is an $n \times n$ matrix representing the connections among communities, in which element $A_{pq}$ is the weight of the edges between communities $C_p$ and $C_q$, and $A_{pq} = 0$ if communities $C_p$ and $C_q$ are not directly connected;

the autoencoder comprises an encoder built from graph convolution layers and a decoder built from inverse graph convolution layers; the input node feature matrix and adjacency matrix are encoded and then decoded by the encoder and the decoder in turn to obtain a reconstructed node feature matrix, and the reconstruction error is obtained from the difference between the input node feature matrix and the reconstructed node feature matrix;

and screening out of the community graph, according to the reconstruction errors, the communities whose reconstruction error is higher than a threshold as abnormal communities.
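As a non-limiting illustration, the construction of the community feature matrix, the inter-community adjacency matrix, and the final threshold screening can be sketched in Python as follows. The function names, the dictionary-based graph layout, and the choice to compute average degree and density from intra-community edges only are assumptions made for this sketch; the graph autoencoder itself is not implemented here, and the reconstruction errors passed to `flag_anomalies` are placeholder values.

```python
def community_matrices(weights, community):
    """Build the feature matrix X and adjacency matrix A of a community graph.

    weights:   {(i, j): w} undirected fused-graph edges, each listed once
    community: {node: community label}
    Features per community (assumed order): size, sum of internal edge
    weights, average intra-community degree, intra-community edge density.
    """
    labels = sorted(set(community.values()))
    index = {c: r for r, c in enumerate(labels)}
    n = len(labels)
    size = [0] * n
    for c in community.values():
        size[index[c]] += 1
    internal_w = [0.0] * n
    internal_edges = [0] * n
    A = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        p, q = index[community[i]], index[community[j]]
        if p == q:
            internal_w[p] += w
            internal_edges[p] += 1
        else:                      # inter-community edge: accumulate its weight
            A[p][q] += w
            A[q][p] += w
    X = []
    for r in range(n):
        s = size[r]
        possible = s * (s - 1) / 2
        avg_degree = 2 * internal_edges[r] / s
        density = internal_edges[r] / possible if possible else 0.0
        X.append([float(s), internal_w[r], avg_degree, density])
    return labels, X, A


def flag_anomalies(labels, errors, threshold):
    """Return the communities whose reconstruction error exceeds the threshold."""
    return [c for c, e in zip(labels, errors) if e > threshold]


# Two triangles joined by a single edge, all weights 1 (illustrative data).
edges = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0,
         (4, 5): 1.0, (4, 6): 1.0, (5, 6): 1.0, (3, 4): 1.0}
partition = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
labels, X, A = community_matrices(edges, partition)
suspicious = flag_anomalies(labels, [0.1, 0.9], threshold=0.5)  # placeholder errors
```

In practice the errors would come from comparing the input feature matrix with the autoencoder's reconstruction, and the threshold would be chosen for the data at hand.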

To achieve the above object, an embodiment further provides a device for detecting abnormal behaviors of a medical insurance group, including:

The data processing module is used for acquiring medical insurance settlement data and extracting key characteristics of medical insurance settlement after preprocessing, wherein the key characteristics comprise: drug purchase information, amount, time of drug purchase, and medical institution name and code;

The dynamic graph construction module is configured to construct dynamic graphs in three dimensions based on the key features, wherein the dynamic graphs respectively characterize the number of drug purchases, the diversity of drug types, and the drug purchase amount;

the community discovery module is used for fusing the dynamic graphs with three dimensions to obtain a fused graph, performing community discovery search of medical insurance groups on the fused graph, and constructing a community graph based on a searched community set;

The anomaly detection module is configured to encode and decode the community graph with an autoencoder to reconstruct it, calculate the reconstruction error of the community graph as an anomaly index, and screen out abnormal communities in the community graph, thereby detecting abnormal behaviors of medical insurance groups.

To achieve the above object, an embodiment of the present invention further provides a computing device, including a memory and one or more processors, where the memory stores executable codes, and the one or more processors are configured to implement the method for detecting abnormal behaviors of a medical insurance group described above when executing the executable codes.

To achieve the above object, an embodiment further provides a computer readable storage medium having a program stored thereon, which when executed by a processor, implements the above method for detecting abnormal behaviors of a medical insurance group.

Compared with the prior art, the beneficial effects of the invention include at least the following:

Three dynamic graphs are constructed from the three dimensions of number of drug purchases, drug type diversity, and drug purchase amount. On this basis, graph fusion, community discovery, and community graph reconstruction are performed, and abnormal communities are screened by reconstruction error, which can effectively improve the working efficiency and accuracy of medical insurance staff. Moreover, the time window used for detection is adjustable, so abnormal drug purchasing by medical insurance groups can be detected over different periods and abnormal behaviors can be found more comprehensively.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method for detecting abnormal behaviors of a medical insurance group provided by an embodiment;

Fig. 2 is a schematic structural diagram of a device for detecting abnormal behaviors of medical insurance groups according to an embodiment;

FIG. 3 is a schematic diagram of a computing device provided by an embodiment.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.

The inventive concept is as follows: the invention provides a scheme for detecting abnormal behaviors of medical insurance groups, in which dynamic graphs reflecting the drug purchasing behavior of insured participants are constructed, abnormal groups are located precisely through community discovery and group structure analysis, and multidimensional anomaly detection and graph-based anomaly detection are applied to identify abnormal groups quickly and accurately while keeping the results interpretable to a certain degree.

As shown in fig. 1, the method for detecting abnormal behaviors of a medical insurance group provided by the embodiment includes the following steps:

S1, acquiring medical insurance settlement data, preprocessing the data, and extracting key characteristics of medical insurance settlement.

In an embodiment, the medical insurance settlement data is extracted from a medical insurance database, including a medical insurance settlement table, a prescription information table, and a patient information table. Preprocessing after obtaining medical insurance settlement data, including:

First, the sensitive raw data in the medical insurance settlement data are desensitized so that sensitive information is removed completely, and the desensitized data are transferred to a local database;

then, the data are cleaned by removing records with missing values or errors;

finally, the data are merged: the data tables are joined on their primary keys and foreign keys to form a master table used to construct the subsequent dynamic graphs.

After preprocessing, feature screening is performed on the preprocessed data. Key features are selected with reference to the opinions of medical insurance experts and include drug purchase information, amount, drug purchase time, medical institution name and code, and the like.

Data cleaning and data merging bring the data into a state from which a graph structure can be built. Because the data are high-dimensional, feature screening is applied to extract the key features and reduce the dimensionality.
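As a non-limiting illustration, the desensitizing, cleaning, and merging steps can be sketched in plain Python as follows. All field names (`person_id`, `settlement_id`, `institution_code`, `drug_code`, and so on), the salted-hash pseudonymization, and the validity rules are assumptions made for this sketch and are not taken from the embodiment.

```python
import hashlib


def desensitize(record, sensitive_fields=("name", "id_number")):
    """Drop directly identifying fields and pseudonymize the participant ID."""
    clean = {k: v for k, v in record.items() if k not in sensitive_fields}
    raw = ("demo-salt:" + str(record["person_id"])).encode()  # hypothetical salt
    clean["person_id"] = hashlib.sha256(raw).hexdigest()[:12]
    return clean


def is_valid(record, required=("person_id", "settlement_id",
                               "institution_code", "amount")):
    """Reject records with missing required fields or a negative amount."""
    if any(record.get(f) in (None, "") for f in required):
        return False
    return record["amount"] >= 0


def merge_tables(settlements, prescriptions):
    """Join prescriptions onto settlements through the settlement_id key."""
    by_settlement = {}
    for p in prescriptions:
        by_settlement.setdefault(p["settlement_id"], []).append(p)
    master = []
    for s in settlements:
        for p in by_settlement.get(s["settlement_id"], []):
            master.append({**s, "drug_code": p["drug_code"]})
    return master


# Illustrative rows only.
settlements = [
    {"settlement_id": "S1", "person_id": "P1", "name": "A",
     "institution_code": "H1", "amount": 120.0, "date": "2024-01-03"},
    {"settlement_id": "S2", "person_id": "P2", "name": "B",
     "institution_code": "H1", "amount": None, "date": "2024-01-03"},
]
prescriptions = [
    {"settlement_id": "S1", "drug_code": "D1"},
    {"settlement_id": "S1", "drug_code": "D2"},
    {"settlement_id": "S2", "drug_code": "D1"},
]
cleaned = [desensitize(s) for s in settlements if is_valid(s)]
master = merge_tables(cleaned, prescriptions)
```

The resulting master table holds one row per settlement and drug, which is the granularity the later graph-building steps need.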

S2, constructing dynamic graphs in three dimensions based on the key features, namely a dynamic graph characterizing the number of drug purchases, a dynamic graph characterizing drug type diversity, and a dynamic graph characterizing the drug purchase amount.

In an embodiment, the data are first partitioned by the key feature of medical institution name, a sub-table is built for each medical institution, and all dynamic graphs are built from the per-institution sub-tables. A dynamic graph is a graph model that captures how nodes and edges change over time. In the dynamic graph, a node represents an insured participant, an edge represents the drug purchase relationship between insured participants, and the edge weight is derived from the drug purchases of the two participants at the same medical institution. The dynamic graph reflects the relationships at a given moment and also records how those relationships change over time. In the embodiment, different dynamic graphs are constructed from the three dimensions of number of drug purchases, drug type diversity, and drug purchase amount, according to the data characteristics of abnormal medical insurance behavior.

For the number-of-purchases dimension, a dynamic graph $G_t^F$ characterizing the number of drug purchases $F$ is built for each time window $t$, represented as $G_t^F = (V_t^F, E_t^F, W_t^F)$, wherein $V_t^F$ denotes the node set of the dynamic graph $G_t^F$ corresponding to time window $t$; each node represents an insured participant with drug purchase behavior and a purchase record; if participant $i$ has drug purchase behavior and a record within time window $t$, then $i \in V_t^F$, and the node is identified by the participant's unique identifier (ID).

$E_t^F$ denotes the edge set between the nodes of the dynamic graph $G_t^F$ corresponding to time window $t$; each edge represents a drug purchase relationship within time window $t$, and if any two participants $i$ and $j$ purchase drugs at the same medical institution within the same time window, that is, each has purchased drugs at that institution at least once, not necessarily the same drug, an edge exists between the nodes corresponding to participants $i$ and $j$, $(i, j) \in E_t^F$;

$W_t^F$ denotes the matrix storing the edge weights of the dynamic graph $G_t^F$ corresponding to time window $t$; the weight of each edge is calculated from the number of times any two participants jointly appear in purchase records within the given time window, according to:

$$w_{ij}^{F,t} = \sum_{k=1}^{K} I_{ij}^{k,t}$$

wherein $w_{ij}^{F,t}$ denotes the edge weight between the nodes corresponding to participants $i$ and $j$ in the dynamic graph $G_t^F$ corresponding to time window $t$; $I_{ij}^{k,t}$ is an indicator function, with $I_{ij}^{k,t} = 1$ if participants $i$ and $j$ both have a purchase record at the same medical institution $k$ within time window $t$, and $I_{ij}^{k,t} = 0$ otherwise; and $K$ denotes the number of medical institutions in which participants $i$ and $j$ jointly appear in the data within time window $t$.

This way of computing the edge weight $w_{ij}^{F,t}$ reflects the strength of the relationship between insured participants in their drug purchasing behavior: if two participants have co-occurring purchase records at several medical institutions, their relationship is considered stronger and the corresponding edge weight is larger.
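As a non-limiting illustration, the co-occurrence edge weight for one time window can be computed as in the following Python sketch. The function name and the `(participant, institution)` record layout are assumptions made for this sketch; each institution at which both participants have a purchase record contributes 1 to their edge weight.

```python
from collections import defaultdict
from itertools import combinations


def frequency_weights(records):
    """Count, for every pair of participants, the institutions they share.

    records: iterable of (participant_id, institution_id) purchase records
             that all fall inside one time window.
    Returns {(i, j): weight} with i < j.
    """
    visitors = defaultdict(set)               # institution -> participants seen there
    for person, institution in records:
        visitors[institution].add(person)
    weights = defaultdict(int)
    for people in visitors.values():
        for i, j in combinations(sorted(people), 2):
            weights[(i, j)] += 1              # indicator equals 1 for this institution
    return dict(weights)


# Illustrative records: A and B meet at H1 and H2, C appears only at H2.
window_records = [("A", "H1"), ("B", "H1"), ("A", "H2"),
                  ("B", "H2"), ("C", "H2"), ("A", "H1")]
w_f = frequency_weights(window_records)
```

Repeated visits by one participant to the same institution do not increase the weight in this sketch, because the indicator only records whether both participants appear there.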

For the drug type diversity dimension, a dynamic graph $G_t^D$ characterizing drug type diversity $D$ is constructed for each time window $t$, reflecting the diversity of the relationships between participants in their drug purchases. The dynamic graph $G_t^D$ is represented as $G_t^D = (V_t^D, E_t^D, W_t^D)$, wherein $V_t^D$ denotes the node set of the dynamic graph $G_t^D$ corresponding to time window $t$, and each node represents an insured participant who has drug purchase behavior and a purchase record; each edge weight is expressed by Shannon entropy:

$$w_{ij}^{D,t} = H_{ij}^{t} = -\sum_{l=1}^{L} p_{ij}^{l,t} \log p_{ij}^{l,t}$$

wherein $w_{ij}^{D,t}$ denotes the edge weight between the nodes corresponding to participants $i$ and $j$ in the dynamic graph $G_t^D$ corresponding to time window $t$; $H_{ij}^{t}$ denotes the Shannon entropy, which measures the uncertainty or complexity of a system; $p_{ij}^{l,t}$ denotes the probability that participants $i$ and $j$ purchase the same drug $l$ within time window $t$, computed as the number of purchases of drug $l$ by the two participants divided by the total number of drug purchases of the two participants; and $L$ denotes the number of drug types purchased in common by participants $i$ and $j$. Participants $i$ and $j$ are connected by an edge only when $L > 0$, that is, only when they have purchased at least one drug in common. A high Shannon entropy on an edge indicates high diversity in the drug choices of participants $i$ and $j$; conversely, a low entropy indicates low diversity.
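As a non-limiting illustration, the Shannon-entropy edge weight for one pair of participants can be computed as in the following Python sketch. Two points are assumptions of this sketch rather than statements of the embodiment: the natural logarithm is used, and the "total number of purchases" in the probability is taken over the jointly purchased drugs so that the probabilities sum to 1.

```python
import math
from collections import Counter


def diversity_weight(purchases_i, purchases_j):
    """Shannon entropy over the drugs that both participants purchased.

    purchases_i, purchases_j: lists of drug codes bought by each participant
    at the same institution inside one time window.
    Returns None when no drug is shared, meaning no edge is created.
    """
    count_i, count_j = Counter(purchases_i), Counter(purchases_j)
    shared = set(count_i) & set(count_j)
    if not shared:
        return None
    joint = {drug: count_i[drug] + count_j[drug] for drug in shared}
    total = sum(joint.values())
    # Entropy of the joint purchase distribution over the shared drugs.
    return -sum((c / total) * math.log(c / total) for c in joint.values())


# Illustrative data: D1 and D2 are shared (3 and 2 joint purchases), D3 is not.
h = diversity_weight(["D1", "D1", "D2"], ["D1", "D2", "D3"])
```

With a single shared drug the entropy is zero, so the edge exists but carries the lowest possible diversity weight.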

For the purchase amount dimension, a dynamic graph $G_t^M$ characterizing the drug purchase amount $M$ is constructed for each time window $t$ and is used to analyze the economic patterns among insured participants. The dynamic graph $G_t^M$ is represented as $G_t^M = (V_t^M, E_t^M, W_t^M)$, wherein $V_t^M$ denotes the node set of the dynamic graph $G_t^M$ corresponding to time window $t$, and each node represents an insured participant who has drug purchase behavior and a purchase record;

$W_t^M$ denotes the matrix storing the edge weights of the dynamic graph $G_t^M$ corresponding to time window $t$; the edge weight is the sum of all drug purchase amounts of the participants corresponding to the two nodes at the same institution; specifically, each edge weight is calculated from the drug purchase amounts of any two participants at the same institution within the given time window, according to:

$$w_{ij}^{M,t} = \sum_{s=1}^{S_{ij}} a_{ij}^{s,t}$$

wherein $a_{ij}^{s,t}$ denotes the sum of the purchase amounts of participants $i$ and $j$ in the $s$-th transaction within time window $t$, and $S_{ij}$ denotes the total number of transactions of participants $i$ and $j$ at an institution.
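As a non-limiting illustration, the purchase-amount edge weight can be computed as in the following Python sketch, which sums, for every pair of participants, all amounts either of them spent at each institution they share within one time window. The function name and the `(participant, institution, amount)` record layout are assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import combinations


def amount_weights(transactions):
    """Sum the purchase amounts of each participant pair at shared institutions.

    transactions: iterable of (participant_id, institution_id, amount) records
                  that all fall inside one time window.
    Returns {(i, j): total amount} with i < j.
    """
    spend = defaultdict(lambda: defaultdict(float))   # institution -> participant -> total
    for person, institution, amount in transactions:
        spend[institution][person] += amount
    weights = defaultdict(float)
    for per_person in spend.values():
        for i, j in combinations(sorted(per_person), 2):
            weights[(i, j)] += per_person[i] + per_person[j]
    return dict(weights)


# Illustrative data: A and B share institution H1, C is alone at H2.
w_m = amount_weights([("A", "H1", 100.0), ("B", "H1", 50.0),
                      ("A", "H1", 25.0), ("C", "H2", 10.0)])
```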

Once the dynamic graphs of the three dimensions have been modeled, their nodes, edges, and edge weights are updated as the time window advances. Each time window corresponds to three dynamic graphs, namely $G_t^F$, $G_t^D$, and $G_t^M$. For the dynamic change of nodes, new nodes are added as $V_{t+1} = V_t \cup \Delta V_t$, where $\Delta V_t$ is the set of nodes newly joined in the time window from $t$ to $t+1$. Likewise, new edges are added as $E_{t+1} = E_t \cup \Delta E_t$, where $\Delta E_t$ is the set of added edges. For the dynamic update of edge weights, $W_{t+1} = W_t + \Delta W_t$, where $\Delta W_t$ is the weight increment. Thus the graphs of the three dimensions, $G_t^F$, $G_t^D$, and $G_t^M$, are available in each time window, and the graphs at the next moment, $G_{t+1}^F$, $G_{t+1}^D$, and $G_{t+1}^M$, follow from the increments.

Specifically, in the dynamic change of the time window from $t$ to $t+1$, each graph $G_t^X$ with $X \in \{F, D, M\}$ is updated as $G_{t+1}^X = (V_t^X \cup \Delta V_t^X,\ E_t^X \cup \Delta E_t^X,\ W_t^X + \Delta W_t^X)$, wherein $\Delta V_t^X$, $\Delta E_t^X$, and $\Delta W_t^X$ denote the node increment, the edge increment, and the weight increment, respectively.
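As a non-limiting illustration, the incremental update from one time window to the next can be sketched in Python as follows. Representing a graph as a dictionary with keys `"V"`, `"E"`, and `"W"` is an assumption of this sketch; the update takes the union of node and edge sets and adds the weight increments.

```python
def advance_window(graph, delta_nodes, delta_edges, delta_weights):
    """Return the graph for window t+1 without modifying the graph for window t.

    graph:         {"V": set of nodes, "E": set of edges, "W": {edge: weight}}
    delta_nodes:   nodes that first appear in the new window
    delta_edges:   edges that first appear in the new window
    delta_weights: {edge: increment} to add to the stored weights
    """
    weights = dict(graph["W"])
    for edge, increment in delta_weights.items():
        weights[edge] = weights.get(edge, 0.0) + increment
    return {
        "V": graph["V"] | delta_nodes,   # node set union
        "E": graph["E"] | delta_edges,   # edge set union
        "W": weights,                    # weights plus increments
    }


g_t = {"V": {"A", "B"}, "E": {("A", "B")}, "W": {("A", "B"): 2.0}}
g_next = advance_window(g_t, {"C"}, {("B", "C")},
                        {("A", "B"): 1.0, ("B", "C"): 1.0})
```

Returning a new dictionary keeps the graph of every window available for later comparison across time.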

S3, fusing the dynamic graphs of the three dimensions to obtain a fused graph, performing a community discovery search for medical insurance groups on the fused graph, and constructing a community graph based on the set of communities found.

In the embodiment, in order to fuse three dynamic graphs with different dimensions constructed based on the number of times of medicine purchase, the amount of medicine purchase and the variety of medicines to form a fusion graph containing multi-dimensional information, the invention constructs a weighted synthesis method to integrate the attributes of the edges of the dimensions, and the relative importance of the different dimensions is reflected by adjusting the weights.

First, the nodes and structures of the three dynamic graphs are confirmed. $G_t^F$ and $G_t^M$ both take the drug purchasing behavior of any pair of participants at the same institution as an edge, so the two graphs have the same structure when edge weights are ignored and can be merged directly. In $G_t^D$, an edge is formed when any pair of participants purchases the same drug at the same institution, so $G_t^D$ can be regarded as a subgraph of $G_t^F$ and $G_t^M$. Therefore $G_t^F$, $G_t^D$, and $G_t^M$ can be merged: the nodes and edges of the dynamic graphs of the three dimensions are combined directly to obtain the node set $V_t = V_t^F \cup V_t^D \cup V_t^M$ and the edge set $E_t = E_t^F \cup E_t^D \cup E_t^M$.

The edge weights $w_{ij}^{F,t}$, $w_{ij}^{D,t}$, and $w_{ij}^{M,t}$ of the three dynamic graphs $G_t^F$, $G_t^D$, and $G_t^M$ are measured on different scales, so the respective edge weights are first normalized. The normalization formula $\hat{w} = \dfrac{w - \min(W)}{\max(W) - \min(W)}$ is applied separately to the edge weights for number of drug purchases, drug type diversity, and drug purchase amount, where $\hat{w}$ is the normalized edge weight, $w$ is the original edge weight, and $W$ is the set of all edge weights in the respective graph.

After the edge weights of the three dynamic graphs are normalized, the normalized edge weights $\hat{w}_{ij}^{F,t}$, $\hat{w}_{ij}^{D,t}$, and $\hat{w}_{ij}^{M,t}$ of each pair of participants are combined into a single weight by weighted summation to obtain the weight matrix $W_t$, and thus the fused graph $G_t = (V_t, E_t, W_t)$.

wherein $w_{ij}^{t} = \alpha\,\hat{w}_{ij}^{F,t} + \beta\,\hat{w}_{ij}^{D,t} + \gamma\,\hat{w}_{ij}^{M,t}$; $\alpha$, $\beta$, and $\gamma$ are the weight coefficients of the respective edge weights, reflecting the importance of each dimension in the summation of the edge weights; and $\alpha \in [0.3, 1]$, $\beta \in [0.01, 0.5]$, $\gamma \in [0.3, 1]$.

The invention is used to detect abnormal group drug purchasing behavior. Weight $\alpha$, corresponding to the number of drug purchases: when detecting group drug purchasing anomalies, frequent drug purchasing may indicate overdosing or unintended drug acquisition, and is a key monitoring point. High-frequency drug purchasing is often inconsistent with sustained or repeated treatment of a disease and, especially in non-chronic treatment, may indicate excessive purchasing or abuse, so it can be given a relatively high weight, set to 0.3 to 1, preferably 0.4.

Weight $\beta$, corresponding to drug type diversity: a high diversity value may indicate that a patient or a group of participants is trying many different drugs within a short period, which may be associated with trial medication or drug abuse. However, drug diversity is difficult to interpret directly as fraud or abuse, so this dimension deserves some attention but not a high weight; the weight is set to 0.01 to 0.5, preferably 0.2.

Weight $\gamma$, corresponding to the drug purchase amount: abnormal purchase amounts often correlate with excessive or potentially abnormal purchases. Large purchase costs are unusual in normal purchase patterns, and, especially without a specific disease or treatment regimen, high amounts may be associated with illegal or non-compliant purchasing, so this indicator is also important. The amount should therefore likewise be given a relatively high weight, set to 0.3 to 1, preferably 0.4.

This weighting configuration is based on the most common risk points in the overall purchasing behavior, and by such a setting, possible abnormal purchasing behavior can be monitored and identified more comprehensively, thereby preventing and intervening in possible irregular behavior at an early stage. Through the weight distribution based on the actual business logic and expert knowledge, the analysis result of the dynamic graph has higher operability and pertinence, thereby improving the accuracy and efficiency of anomaly detection.
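As a non-limiting illustration, the normalization and weighted fusion of the three sets of edge weights can be sketched in Python as follows. Min-max scaling is assumed as the normalization, the parameter names `alpha`, `beta`, and `gamma` are hypothetical labels for the three weight coefficients with the preferred values 0.4, 0.2, and 0.4 as defaults, and treating an edge missing from one graph as having normalized weight 0 in that dimension is a further assumption of this sketch.

```python
def min_max(weights):
    """Scale a {edge: weight} mapping to the range [0, 1]."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:                       # assumption: constant weights map to 0
        return {edge: 0.0 for edge in weights}
    return {edge: (w - lo) / (hi - lo) for edge, w in weights.items()}


def fuse(w_f, w_d, w_m, alpha=0.4, beta=0.2, gamma=0.4):
    """Weighted sum of the normalized purchase-count, diversity, and amount weights."""
    n_f, n_d, n_m = min_max(w_f), min_max(w_d), min_max(w_m)
    edges = set(n_f) | set(n_d) | set(n_m)          # union of the three edge sets
    return {
        edge: alpha * n_f.get(edge, 0.0)
              + beta * n_d.get(edge, 0.0)
              + gamma * n_m.get(edge, 0.0)
        for edge in edges
    }


# Illustrative edge weights for three participants A, B, and C.
w_f = {("A", "B"): 2, ("A", "C"): 1, ("B", "C"): 1}
w_d = {("A", "B"): 0.673, ("B", "C"): 0.0}
w_m = {("A", "B"): 175.0, ("A", "C"): 75.0, ("B", "C"): 25.0}
fused = fuse(w_f, w_d, w_m)
```

Because each dimension is scaled to [0, 1] before summation, the coefficients alone determine how strongly each dimension influences the fused weight.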

The constructed fusion graph is used for community discovery of medical insurance groups. Community analysis is a very valuable step in graph models, especially when dealing with large-scale medical insurance data. For drug-purchasing anomaly detection, community analysis may provide several key advantages:

Localization of abnormal behavior: community analysis may help identify densely connected subgraphs in a fusion graph that may represent a group of participants who behave similarly. This dense pattern of connections may be caused by common drug purchasing activities, helping to localize and identify potentially abnormal populations.

Improved algorithm efficiency and accuracy: by decomposing the large graph into several smaller communities, the anomaly detection algorithm can be run independently on each community. This increases processing speed and can also increase detection accuracy, because behavior within a community is more consistent.

Deep understanding of the data structure: community analysis helps understand the inherent structure of data, such as which participants tend to purchase similar medications at the same medical facility.

Better interpretation of the results: the results obtained from communities tend to be easier to interpret because they represent actual populations or patterns of behavior in the real world. This is particularly important for reporting results to non-technical stakeholders.

In the embodiment, community discovery search of medical insurance groups is carried out on the fusion graph using Louvain, a community discovery algorithm based on modularity optimization. During the search, the modularity Q is introduced as an index of community-division quality to obtain a community partition of the fusion graph. The modularity Q is defined as:

$$Q = \frac{1}{2m}\sum_{i,j}\left[w_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j);$$

where $w_{ij}$ represents the edge weight in the fusion graph between the nodes corresponding to two participants i and j; $k_i$ and $k_j$ are the weights of the nodes corresponding to participants i and j respectively, each being the sum of the weights of all edges connected to that node; $2m$ represents the sum of all edge weights of the fusion graph, $2m = \sum_{i,j} w_{ij}$; and $\delta(c_i, c_j)$ is the indicator function, which is 1 if participants i and j are in the same community and 0 otherwise.

Because the standard modularity formula may not be fully suited to a network of medical insurance group drug-purchasing anomalies, and in order to avoid excessive merging of small communities, a resolution parameter $\gamma$ controlling the size of the desired communities is introduced on the basis of the original modularity Q. The modularity Q is redefined as:

$$Q = \frac{1}{2m}\sum_{i,j}\left[w_{ij} - \gamma\,\frac{k_i k_j}{2m}\right]\delta(c_i, c_j);$$

Currently, through automated parameter search and comparison of results, $\gamma$ is set to 0.6 to 1.4, preferably 0.8, at which community sizes are moderate and neither excessively large nor excessively small communities appear.
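The modularity with a resolution parameter can be evaluated directly from an edge list and a community assignment. The sketch below is a plain-Python illustration under stated assumptions: the graph is undirected without self-loops, each edge is stored once, and the function name `modularity` and the toy graph (two triangles joined by one edge) are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community, gamma=1.0):
    """Q = (1/2m) * sum_ij [w_ij - gamma * k_i * k_j / (2m)] * delta(c_i, c_j).

    edges: {(u, v): weight}, each undirected edge listed once (assumed).
    community: {node: community label}.
    """
    degree = defaultdict(float)    # k_i: total weight of edges incident to node i
    internal = defaultdict(float)  # per community: sum of w_ij over ordered pairs inside it
    m = 0.0                        # total edge weight, each edge counted once
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
        m += w
        if community[u] == community[v]:
            internal[community[u]] += 2 * w  # ordered pairs (u, v) and (v, u)
    two_m = 2 * m
    degree_sum = defaultdict(float)  # per community: sum of k_i over its members
    for node, k in degree.items():
        degree_sum[community[node]] += k
    return sum(internal[c] / two_m - gamma * (degree_sum[c] / two_m) ** 2
               for c in degree_sum)


# Toy graph: triangle {0, 1, 2}, triangle {3, 4, 5}, bridge edge (2, 3).
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
         (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 1.0}
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
q_split = modularity(edges, split)            # gamma = 1 gives the standard modularity
q_merged = modularity(edges, merged, gamma=0.8)
```

In this formulation a smaller value of the resolution parameter reduces the penalty term, so on the toy graph the merged partition only scores higher than the split one for very small values (below 2/7).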

The search process for community discovery by using a community discovery algorithm Louvain based on modularity optimization comprises the following steps:

(1) Firstly, initializing each node of the participant as an independent community;

(2) Then for each participant's node i, consider moving it from the current community to the community in which its neighbor node j is located, and calculate the impact of this movement, if any, on the modularity Q;

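If node i is moved from community A to the community B of node j, the change in modularity ΔQ can be approximated as:

$$\Delta Q = \left[\frac{\Sigma_{in} + k_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + k_i}{2m}\right)^2\right] - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2 - \left(\frac{k_i}{2m}\right)^2\right];$$

where $\Sigma_{in}$ is the sum of the edge weights inside community B, $k_{i,in}$ is the sum of the weights of the edges from node i to nodes within community B, $\Sigma_{tot}$ is the sum of all edge weights of community B, including edges connected to the outside, and $k_i$ is the sum of the edge weights of node i;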

(3) Calculating the change in modularity ΔQ for each candidate community and selecting the community that yields the largest increase in modularity as the new community of node i;

(4) Iteratively repeating steps (2) and (3) until modularity is no longer increasing, and fixing the community.
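Steps (1) to (4) can be sketched as a single local-moving pass in plain Python. This is an illustrative sketch rather than the patented implementation: it uses the commonly used simplified gain for inserting a node that has first been removed from its own community, $k_{i,in}/m - \gamma\,\Sigma_{tot} k_i / (2m^2)$, which ranks candidate communities in the same way as a modularity-change comparison. The function name `local_moving` and the toy graph are assumptions.

```python
from collections import defaultdict


def local_moving(adj, gamma=1.0):
    """Louvain local-moving phase on an undirected graph without self-loops.

    adj: {node: {neighbor: weight}} with symmetric entries.
    Returns {node: community label}.
    """
    m = sum(w for nbrs in adj.values() for w in nbrs.values()) / 2.0
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    community = {u: u for u in adj}   # step (1): every node starts as its own community
    sigma_tot = dict(degree)          # total degree of each community
    improved = True
    while improved:                   # step (4): repeat until no move increases modularity
        improved = False
        for i in adj:                 # step (2): consider the communities of i's neighbors
            home = community[i]
            k_i_in = defaultdict(float)   # weight from i to each neighboring community
            for j, w in adj[i].items():
                k_i_in[community[j]] += w
            sigma_tot[home] -= degree[i]  # temporarily remove i from its community
            best = home
            best_gain = k_i_in[home] / m - gamma * sigma_tot[home] * degree[i] / (2 * m * m)
            for c, w_ic in k_i_in.items():
                gain = w_ic / m - gamma * sigma_tot[c] * degree[i] / (2 * m * m)
                if gain > best_gain:      # step (3): keep the largest gain
                    best, best_gain = c, gain
            sigma_tot[best] += degree[i]
            if best != home:
                community[i] = best
                improved = True
    return community


# Toy graph: two triangles joined by the edge (2, 3), unit weights.
adj = {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0, 3: 1.0},
       3: {2: 1.0, 4: 1.0, 5: 1.0}, 4: {3: 1.0, 5: 1.0}, 5: {3: 1.0, 4: 1.0}}
result = local_moving(adj)
```

A full Louvain run would then collapse each community into a single node and repeat the pass on the aggregated graph; that second phase is omitted here.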

In an embodiment, after the preliminary community establishment is completed, the fusion graph is divided into a community set $C = \{c_1, c_2, \dots, c_n\}$ made up of multiple communities, where each $c_k$ represents a community, and the merge optimization of communities continues. For the community graph $G_t^{C} = (C, E_t^{C}, W_t^{C})$ constructed after the preliminary community establishment is completed, each node of the community graph is regarded as a single community, $E_t^{C}$ represents the union of all intra-community and inter-community edge sets within the time window t, and $W_t^{C}$ represents the set of edge weights within the time window t, where the weight of an edge of a community is the sum of the weights of the corresponding edges in the original graph.

S4, coding and decoding reconstruction is carried out on the community graph by using the self-encoder, and the reconstruction error of the community graph is calculated to serve as an abnormal index, so that abnormal communities in the community graph are screened, and abnormal behaviors of medical insurance groups are detected.

In an embodiment, abnormal behavior detection of a medical insurance group is performed based on deep learning. An important preparation step before deep learning is to construct the node feature matrix X and the adjacency matrix A of the community graph. These two matrices provide the necessary input data, representing the structure of the community graph and the attribute information of the nodes.

The node feature matrix X is an n × d matrix, where n represents the number of communities and d represents the number of features of each community. The features carry information that characterizes each community, specifically the size of the community, the sum of the weights of its internal edges, the average node degree and the edge density. The feature values of each community are collected into one row of the node feature matrix X, so that each row is the feature vector of one community.

The size of each community, i.e. the number of nodes in the community, is used as a feature to reflect the size of the community. The total of the weights of the internal edges, i.e. the total of the weights of all the edges in the community, represents, as a feature, the tightness and activity level inside the community. The average degree of nodes, i.e., the average degree of connection of all nodes in the community, is used as a feature to help assess the complexity of connections within the community. The edge density, i.e. the ratio of the number of edges in the community to the maximum number of edges possible in the community, reflects the degree of tightness of the connection between nodes as a feature and is an intuitive way of measuring the cohesiveness of the community.
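The four community features can be computed from the fusion graph and a set of member nodes. The sketch below is illustrative: the function name `community_features` is hypothetical, and it assumes that the average node degree counts all neighbors of a member in the fusion graph (inside or outside the community) and that edge density counts unweighted internal edges.

```python
def community_features(adj, members):
    """Return [size, internal weight sum, average node degree, edge density].

    adj: {node: {neighbor: weight}} for an undirected graph with symmetric entries.
    members: set of nodes belonging to one community.
    """
    size = len(members)
    internal_weight = 0.0
    internal_edges = 0
    for u in members:
        for v, w in adj[u].items():
            if v in members:
                internal_weight += w
                internal_edges += 1
    internal_weight /= 2      # each internal edge was visited from both endpoints
    internal_edges //= 2
    avg_degree = sum(len(adj[u]) for u in members) / size
    max_edges = size * (size - 1) / 2
    density = internal_edges / max_edges if max_edges else 0.0
    return [float(size), internal_weight, avg_degree, density]


# Toy fusion graph: two triangles joined by the edge (2, 3), unit weights.
adj = {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0, 3: 1.0},
       3: {2: 1.0, 4: 1.0, 5: 1.0}, 4: {3: 1.0, 5: 1.0}, 5: {3: 1.0, 4: 1.0}}
communities = [{0, 1, 2}, {3, 4, 5}]
X = [community_features(adj, c) for c in communities]  # n x d with n = 2, d = 4
```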

The adjacency matrix A is an n × n matrix representing the connection relations among communities. The element $A_{pq}$ of A represents the weight of the edge between two communities $c_p$ and $c_q$; if the two communities $c_p$ and $c_q$ are not directly connected, then $A_{pq} = 0$.
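A community-level adjacency matrix of this kind can be obtained by summing the weights of the original edges that cross between two communities. The following sketch assumes an edge dictionary with one entry per undirected edge and a node-to-community mapping; the function name `community_adjacency` is hypothetical, and leaving the diagonal at zero (intra-community weight is carried by the feature matrix instead) is an assumption.

```python
def community_adjacency(edges, community):
    """Build the n x n matrix A of summed edge weights between communities.

    edges: {(u, v): weight}, each undirected edge listed once (assumed).
    community: {node: community label}.
    Returns (labels, A), where labels[p] is the community of row and column p.
    """
    labels = sorted(set(community.values()))
    index = {c: p for p, c in enumerate(labels)}
    n = len(labels)
    A = [[0.0] * n for _ in range(n)]
    for (u, v), w in edges.items():
        p, q = index[community[u]], index[community[v]]
        if p != q:            # only inter-community edges contribute
            A[p][q] += w
            A[q][p] += w      # keep A symmetric for an undirected graph
    return labels, A


# Three communities; "a" and "b" are linked by two edges, "c" is isolated.
edges = {(0, 1): 1.0, (1, 2): 0.5, (2, 3): 2.0, (1, 4): 1.5, (5, 6): 1.0}
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "c", 6: "c"}
labels, A = community_adjacency(edges, community)
```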

After the community feature matrix X and the adjacency matrix A are built, the architecture of the graph convolutional network (GCN) is defined, which involves determining the number of layers in the network and the parameters of each layer. The core of a GCN is the propagation and updating of node feature information through the graph structure. Testing showed that three graph convolution layers attend to the information of each node and its direct neighbors while also taking the information of indirect neighbors into account. Three graph convolution layers are therefore selected as the encoder, which converts the community features in the community graph into a compressed, information-rich hidden space.

The decoder is then connected to the output of the encoder so that the low-dimensional community features generated by the encoder are used to reconstruct the original community features. Three inverse graph convolution layers are used as the decoder. Each inverse graph convolution layer is implemented by transposing the weight matrix of the corresponding encoder layer and attempts to reconstruct the adjacency matrix or node feature matrix of the community graph, thereby recovering the original input features.

The encoder and decoder form a self-encoder (autoencoder). Under the constraint of the adjacency matrix A, the encoder compresses the input node feature matrix X into a low-dimensional representation Z, and Z is then fed into the decoder and decoded by the inverse graph convolution layers to obtain the reconstructed node feature matrix $\hat{X}$.
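The forward pass of such an encoder and tied-weight decoder can be sketched in plain Python. Several details here are assumptions not stated in the text: the symmetric normalization $D^{-1/2}(A + I)D^{-1/2}$ of the adjacency matrix, ReLU activations, the layer widths, and random untrained weights. The helper names are hypothetical, and training (minimizing the reconstruction loss by gradient descent) is not shown.

```python
import math
import random


def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(M):
    return [list(col) for col in zip(*M)]


def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2} (assumed normalization with self-loops)."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]


def gcn_layer(A_norm, H, W, activate=True):
    """One graph convolution: A_norm @ H @ W, optionally followed by ReLU."""
    Z = matmul(matmul(A_norm, H), W)
    return [[max(0.0, z) for z in row] for row in Z] if activate else Z


def init_weights(dims, seed=0):
    """Random weight matrices for consecutive layer widths (untrained)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.5, 0.5) for _ in range(d_out)] for _ in range(d_in)]
            for d_in, d_out in zip(dims[:-1], dims[1:])]


def autoencode(A, X, weights):
    """Encode X with three GCN layers, then decode with the transposed weights."""
    A_norm = normalize_adjacency(A)
    H = X
    for W in weights:                          # encoder layers
        H = gcn_layer(A_norm, H, W)
    Z = H
    for k, W in enumerate(reversed(weights)):  # decoder layers, tied weights
        last = k == len(weights) - 1
        H = gcn_layer(A_norm, H, transpose(W), activate=not last)
    return Z, H


A = [[0.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 0.0]]  # 3 communities
X = [[3.0, 3.0, 2.3, 1.0], [5.0, 6.0, 2.8, 0.6], [2.0, 1.0, 1.5, 1.0]]
weights = init_weights([4, 8, 4, 2])  # three layers: 4 -> 8 -> 4 -> 2
Z, X_hat = autoencode(A, X, weights)
```

Because the decoder reuses the transposed encoder weights in reverse order, the reconstructed matrix has the same shape as the input without introducing additional parameters.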

The self-encoder is trained before being applied. Training minimizes the reconstruction error between the node feature matrix X and the reconstructed node feature matrix $\hat{X}$. The reconstruction error is typically implemented as a mean square error loss function, i.e. $\mathcal{L} = \frac{1}{nd}\lVert X - \hat{X}\rVert_F^2$. This mean square error loss function encourages the decoder to recover the input node features as accurately as possible.

In an embodiment, the reconstruction error of each community is used as an abnormality index, specifically, communities with higher reconstruction errors may indicate that the group purchasing behavior of the communities is significantly different from the general mode, so that abnormality may be indicated.

In the inference stage, reconstruction errors are calculated for all communities in each time window t. Communities whose reconstruction error is higher than a threshold are screened out of the community graph as abnormal communities, representing suspected group drug-purchasing anomalies. The node information of all participants in the abnormal communities is then extracted and provided to the medical insurance authorities for official review.
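The screening step can be sketched as a per-community mean squared error followed by a threshold test. The text does not say how the threshold is chosen; the rule used below (mean plus k population standard deviations, with k = 1.5) is purely an assumption for illustration, as are the function names.

```python
from statistics import mean, pstdev


def reconstruction_errors(X, X_hat):
    """Mean squared error between each community's features and their reconstruction."""
    return [sum((a - b) ** 2 for a, b in zip(row, rec)) / len(row)
            for row, rec in zip(X, X_hat)]


def flag_anomalies(errors, k=1.5):
    """Return (indices of communities above the threshold, threshold).

    The threshold mean + k * std is an assumed rule, not taken from the source.
    """
    threshold = mean(errors) + k * pstdev(errors)
    flagged = [idx for idx, e in enumerate(errors) if e > threshold]
    return flagged, threshold


# Hypothetical per-community errors for one time window; community 4 stands out.
errors = [0.01, 0.02, 0.01, 0.015, 0.9]
flagged, threshold = flag_anomalies(errors)
```

In practice the threshold might instead be a fixed percentile or a value calibrated on historical windows; whichever rule is used, the flagged indices map back to the participants of the corresponding communities.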

In the structure of the self-encoder, a graph convolution layer is used to efficiently extract and compress features of the graph data, and an inverse graph convolution layer is used to reconstruct the data, attempting to recover lost information. The self-encoder structure is particularly suitable for unsupervised learning and can be used for anomaly detection. By learning and reconstructing the structure layer by layer, the self-encoder can reveal potentially complex and non-explicit patterns in the data, providing support for in-depth analysis.

Based on the same inventive concept, as shown in Fig. 2, the embodiment further provides a device 20 for detecting abnormal behaviors of a medical insurance group, which includes a data processing module 21, a dynamic graph construction module 22, a community discovery module 23 and an anomaly detection module 24. The data processing module 21 is configured to acquire medical insurance settlement data and, after preprocessing, extract key features of medical insurance settlement, the key features including drug purchase information, amount, time of drug purchase, and medical institution name and code. The dynamic graph construction module 22 is configured to construct dynamic graphs in three dimensions based on the key features, namely a dynamic graph representing the number of drug purchases, a dynamic graph representing drug variety diversity and a dynamic graph representing the drug purchase amount. The community discovery module 23 is configured to fuse the dynamic graphs of the three dimensions to obtain a fusion graph, perform community discovery search of medical insurance groups on the fusion graph, and construct a community graph based on the searched community set. The anomaly detection module 24 is configured to encode, decode and reconstruct the community graph using the self-encoder, and to calculate the reconstruction error of the community graph as an anomaly index for screening abnormal communities in the community graph, thereby detecting abnormal behaviors of the medical insurance group.

It should be noted that, when the device provided in the above embodiment detects abnormal behaviors of a medical insurance group, the division into the above functional modules is used only as an illustration. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the terminal or server may be divided into different functional modules to complete all or part of the functions described above. In addition, the device for detecting abnormal behaviors of a medical insurance group provided by the embodiment and the method embodiment for detecting abnormal behaviors of a medical insurance group belong to the same concept; the detailed implementation of the device is described in the method embodiment and is not repeated here.

The medical insurance group abnormal behavior detection scheme provided by the embodiment constructs multi-dimensional dynamic graphs, comprehensively considers the various elements in medical insurance data and their interrelationships, and analyzes the dynamic interactions in the medical insurance settlement system as a whole. The self-encoder built with a deep learning algorithm learns deep features and patterns in the graph structure, which makes the detection method more effective at recognizing complex and hidden fraudulent behaviors. As new medical data is continually fed to the self-encoder, it can be continually updated and optimized through continuous learning to adapt to new fraud techniques and patterns. This flexibility and adaptability is particularly important for coping with changing abnormal behavior in medical insurance. Systematic, automated analysis by artificial intelligence greatly reduces the need for manual review and improves detection efficiency. Compared with traditional detection methods that rely on manual experience and intuition, the method can process larger volumes of data, discover more complex fraudulent and abnormal behaviors, and do so more quickly and accurately.

Based on the same inventive concept, the embodiment also provides a computing device, which comprises a memory and one or more processors, wherein executable codes are stored in the memory, and the one or more processors are used for realizing the method for detecting abnormal behaviors of the medical insurance group when executing the executable codes, and specifically comprises the following steps:

S1, acquiring medical insurance settlement data, preprocessing the data, and extracting key features of medical insurance settlement, wherein the key features include drug purchase information, amount, time of drug purchase, and medical institution name and code;

S2, constructing dynamic graphs in three dimensions based on the key features, namely a dynamic graph representing the number of drug purchases, a dynamic graph representing drug variety diversity and a dynamic graph representing the drug purchase amount;

S3, fusing the dynamic graphs of the three dimensions to obtain a fusion graph, performing community discovery search of medical insurance groups on the fusion graph, and constructing a community graph based on the searched community set;

S4, encoding, decoding and reconstructing the community graph using the self-encoder, and calculating the reconstruction error of the community graph as an anomaly index, so as to screen abnormal communities in the community graph and detect abnormal behaviors of the medical insurance group.

As shown in Fig. 3, at the hardware level the computing device provided by the embodiment includes, in addition to the processor and the memory, an internal bus, a network interface and other hardware required by the services. The memory includes a nonvolatile memory; the processor reads the corresponding computer program from the nonvolatile memory into main memory and then runs it to implement the medical insurance group abnormal behavior detection method described in S1 to S4 above. Of course, the invention does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or logic devices.

Based on the same inventive concept, the embodiment also provides a computer readable storage medium, on which a program is stored, which when executed by a processor, implements the method for detecting abnormal behaviors of a medical insurance group, specifically including the following steps:

In embodiments, computer-readable media, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology.

The foregoing embodiments describe the technical solutions and advantages of the invention in detail. It should be understood that the foregoing is merely illustrative of the presently preferred embodiments of the invention, and that any changes, additions, substitutions and equivalents made within the scope of the principles of the invention are intended to be included within the scope of protection of the invention.

Claims

1. The medical insurance group abnormal behavior detection method is characterized by comprising the following steps:

wherein the dynamic graph representing the number of drug purchases is defined for each time window t, its node set being the set of nodes in the dynamic graph corresponding to the time window t, and each node representing a participant having a drug-purchase record;

；

in the dynamic change of the time window from t to t+1, the dynamic graph is updated by a node increment, an edge increment and a weight increment;

the dynamic graph representing the drug variety diversity D is defined for each time window t, its node set being the set of nodes in the dynamic graph corresponding to the time window t, and each node representing a participant having a drug-purchase record;

；

the dynamic graph representing the drug purchase amount M is defined for each time window t, its node set being the set of nodes in the dynamic graph corresponding to the time window t, and each node representing a participant having a drug-purchase record;

；

wherein, in the dynamic graph corresponding to the time window t, the edge weight between the nodes corresponding to two participants i and j is determined from the sum of the purchase amounts of the two participants i and j in the ts-th transaction within the time window t and from the total number of transactions of the two participants i and j at one institution;

fusing the dynamic graphs of the three dimensions to obtain a fusion graph, comprising: merging the nodes and connecting edges of the three dynamic graphs to obtain the node set and the edge set of the fusion graph; normalizing the edge weights of the dynamic graph representing the number of drug purchases, the dynamic graph representing the drug variety diversity D and the dynamic graph representing the drug purchase amount M, respectively; and weighting the three normalized edge weights by their respective weight coefficients to obtain the weight matrix of the fusion graph, thereby obtaining the fusion graph;

performing community discovery search of medical insurance groups on the fusion graph, and constructing a community graph based on the searched community set, comprising: performing the community discovery search on the fusion graph using Louvain, a community discovery algorithm based on modularity optimization, and introducing the modularity Q during the search as an index of community-division quality to obtain a community partition of the fusion graph, the modularity Q being defined as:

；

constructing the community graph based on the searched community set, the community graph being expressed as $G_t^{C} = (C, E_t^{C}, W_t^{C})$, where $C$ represents the community set, $E_t^{C}$ represents the union of all intra-community and inter-community edge sets within the time window t, and $W_t^{C}$ represents the set of weights of the edges within the time window t;

encoding, decoding and reconstructing the community graph using the self-encoder, and calculating the reconstruction error of the community graph as an anomaly index to screen abnormal communities in the community graph and detect abnormal behaviors of the medical insurance group, comprising: constructing a node feature matrix and an adjacency matrix of the community graph, wherein the node feature matrix X is an n × d matrix, n represents the number of communities, d represents the number of features of each community, the features comprise the size of the community, the sum of the weights of the internal edges, the average node degree and the edge density, the feature values of each community are collected into one row of the node feature matrix X, and each row represents the feature vector of one community; the adjacency matrix A is an n × n matrix representing the connection relations among communities, the element $A_{pq}$ of A represents the weight of the edge between two communities $c_p$ and $c_q$, and if the two communities $c_p$ and $c_q$ are not directly connected, then $A_{pq} = 0$;

2. A medical insurance group abnormal behavior detection device, characterized by comprising:

the dynamic graph construction module, configured to construct dynamic graphs in three dimensions based on the key features, namely a dynamic graph representing the number of drug purchases, a dynamic graph representing drug variety diversity and a dynamic graph representing the drug purchase amount, wherein the dynamic graph representing the number of drug purchases is defined for each time window t, its node set being the set of nodes in the dynamic graph corresponding to the time window t, and each node representing a participant having a drug-purchase record;

；

the community discovery module, configured to fuse the dynamic graphs of the three dimensions to obtain a fusion graph, comprising: merging the nodes and connecting edges of the three dynamic graphs to obtain the node set and the edge set of the fusion graph; normalizing the edge weights of the dynamic graph representing the number of drug purchases, the dynamic graph representing the drug variety diversity D and the dynamic graph representing the drug purchase amount M, respectively; and weighting the three normalized edge weights by their respective weight coefficients to obtain the weight matrix of the fusion graph, thereby obtaining the fusion graph;

；

the anomaly detection module, configured to encode, decode and reconstruct the community graph using the self-encoder, and to calculate the reconstruction error of the community graph as an anomaly index to screen abnormal communities in the community graph and detect abnormal behaviors of the medical insurance group, comprising: constructing a node feature matrix and an adjacency matrix of the community graph, wherein the node feature matrix X is an n × d matrix, n represents the number of communities, d represents the number of features of each community, the features comprise the size of the community, the sum of the weights of the internal edges, the average node degree and the edge density, the feature values of each community are collected into one row of the node feature matrix X, and each row represents the feature vector of one community; the adjacency matrix A is an n × n matrix representing the connection relations among communities, the element $A_{pq}$ of A represents the weight of the edge between two communities $c_p$ and $c_q$, and if the two communities $c_p$ and $c_q$ are not directly connected, then $A_{pq} = 0$;

3. A computing device comprising a memory and one or more processors, the memory having executable code stored therein, wherein the one or more processors, when executing the executable code, are configured to implement the medical insurance group abnormal behavior detection method of claim 1.

4. A computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the method for detecting abnormal behavior of a medical insurance population of claim 1.