CN111986034B

CN111986034B - Medical insurance group fraud monitoring method, system and storage medium

Info

Publication number: CN111986034B
Application number: CN202010818035.1A
Authority: CN
Inventors: 王琼; 邬正国; 李志峰; 谢提提; 胡磊
Original assignee: Jiangsu Yunnao Data Technology Co ltd
Current assignee: Jiangsu Yunnao Data Technology Co ltd
Priority date: 2020-08-14
Filing date: 2020-08-14
Publication date: 2022-05-10
Anticipated expiration: 2040-08-14
Also published as: CN111986034A

Abstract

The invention provides a medical insurance group fraud monitoring method comprising the following steps: step S1, generating an analysis dataset of the patients; step S2, calculating the similarity between patients; step S3, mining maximal groups whose members are highly similar to one another; and step S4, manually reviewing and judging the suspicious groups according to the visit details of the group members. The invention also provides a medical insurance group fraud monitoring system comprising: a memory storing a computer program; and a processor for executing the computer program, the computer program when executed performing the steps of the method described above except for step S4. The method makes it possible to accurately and efficiently identify abnormal groups engaged in fraud and violations against the medical insurance fund.

Description

Medical insurance group fraud monitoring method, system and storage medium

Technical Field

The invention relates to the field of medical insurance fund anti-fraud, in particular to a medical insurance group fraud monitoring method and system.

Background

At present, application systems in the field of medical insurance anti-fraud in China mainly build a rule base by summarizing fraud cases that have occurred in actual business. As time goes on, the patterns of fraud become more and more complex and varied, and a fixed rule base has difficulty identifying new fraud. Moreover, suspected fraud is determined by detection rules defined by experts; selecting the thresholds and weights in such rules is very difficult, diagnosis and treatment are highly specialized, fraud is well concealed within treatment, and some rule-based detection modes are unreasonable, so the accuracy is extremely low.

In reality, because fraud is concealed, the actors are complex, fraud cases are frequent and diverse, and the anti-fraud capability of medical insurance departments is limited, it is very difficult to judge fraud by direct inspection, and fraud cases are hard to screen out directly. From a big-data perspective, however, the fraudulent conduct of any actor is necessarily recorded in the medical insurance data, and the data of each medical institution is recorded in the data management systems of the medical insurance field. Potential patterns of medical insurance fraud can therefore be discovered from visit behavior by means of professional data analysis, forming models that pre-screen medical service behavior, reveal the existence of fraud, and avoid losses to the medical insurance fund.

In general, monitoring medical insurance fraud is of great importance. Big data mining algorithms are used to uncover the patterns hidden behind the data, and an intelligent medical fraud monitoring model is constructed to accurately identify groups engaged in fraud and violations against the medical insurance fund, so as to achieve the following:

(1) Improper use of the medical insurance fund is detected, and needless waste of the fund is reduced.

(2) Suspicious fraud is narrowed down to a limited scope, improving working efficiency.

(3) Potential covert fraud outside the business rules is discovered.

Driven by profit, fraud cases occur with high frequency, and violations that previously involved only individual insured persons have evolved into organized group fraud. In current medical insurance fraud, the funds involved in group fraud are huge; for example, illegal organizations buy up the medical insurance cards of many insured persons and have people take the cards to hospitals to seek treatment, frequently purchasing medicines covered by the medical insurance pool.

Disclosure of Invention

In view of this, the invention aims to provide a medical insurance group fraud monitoring method and system that move medical insurance fund supervision from manual, sampling-based bill auditing to big-data, all-round, full-process intelligent monitoring, and make it possible to accurately and efficiently identify abnormal groups engaged in fraud and violations against the medical insurance fund.

In a first aspect, an embodiment of the present invention provides a medical insurance group fraud monitoring method, including the following steps:

step S1, generating an analysis dataset of the patient;

step S2, calculating the similarity between patients;

step S3, mining maximal groups that are highly similar to each other.

Further, in the method,

P = {p_1, p_2, ..., p_m} denotes the set of visiting patients, and G = {g_1, g_2, ..., g_n} denotes a population with similar visit behavior;

for any two patients g_i, g_j in G, the visit behaviors are highly similar;

the visit behavior refers to the activity of a patient in one visit; the behavior b of a patient p seeking medical treatment at a certain time t and a certain place s is recorded as b = (p, t, s); the place s is a doctor, a department or a hospital;

similar behavior means that different patients p have undergone the same type of visit within a certain period of time; SB(p_i, p_j) denotes the set of similar behaviors of any two patients;

step S1 specifically includes:

the following fields are extracted from the patient visit data imported from the hospital:

1) the date of the visit;

2) hospital ID and/or department ID and/or doctor ID;

3) a patient ID;

step S2 specifically includes:

firstly, calculating the similarity of similar behaviors; the similarity of similar behaviors measures how similar two similar behaviors are; if b_i = (p_i, t_i, s_i) and b_j = (p_j, t_j, s_j) are similar behaviors, then s_i = s_j and |t_i - t_j| ≤ T, where T is a time interval; the calculation formula of the similarity of similar behaviors is as follows:

then, the similarity between the patients is calculated according to the formula:

wherein N(p_i) denotes the number of visits of patient p_i within a certain period of time, and N(p_j) denotes the number of visits of patient p_j within a certain period of time;

step S3 specifically includes:

firstly, calculating the similarity Sim between each patient and the other patients according to formula (2), then screening out the patient pairs whose Sim is greater than the inter-patient similarity threshold, and outputting a sparse matrix of the highly similar patients;

then outputting the associated network graph among patients according to the sparse matrix; in the associated network graph, N represents the set of nodes; E represents the set of edges connecting the nodes; W represents the degree of similarity between nodes, so that W_ij = Sim(p_i, p_j), p_i, p_j ∈ N;

After the associated network graph among the patients is obtained, the maximal groups that are highly similar to each other in the associated network graph are then mined.

Further, in the method,

in step S3, the mining of the maximal groups that are highly similar to each other in the associated network graph specifically includes:

a subset is a completely connected subgraph of the associated network graph, i.e. any two nodes in the subset are connected by an edge; a subset is used to represent a population, i.e. any two patients in the subset are similar;

a subset is called a maximal subset if it cannot be expanded into a larger subset by adding any one or more nodes; a population is represented by a maximal subset;

according to the definition of the maximal subset, groups can be located in the inter-patient associated network graph; mining all the maximal subsets in the associated network graph therefore finds all the groups;

if a population is required to contain at least h members, each member has at least h-1 edges; the set of nodes satisfying this condition is the h-node set;

H denotes the h-node set, i.e. H = {n : n ∈ N, d(n) ≥ h-1}, where d(n) is the degree of node n, i.e. the number of edges of node n; that is, H is the set of nodes with at least h-1 edges; the MH graph denotes the subgraph of the inter-patient associated network graph formed by the nodes in H;

an h-node set H satisfying the group member number h is searched for in the inter-patient associated network graph, the MH graph of H is derived, and all maximal subsets are then enumerated exhaustively on the MH graph so as to mine all groups.

Preferably, in the method, after the MH graph is derived in step S3, the top X% of nodes with the highest node similarity are selected as seed nodes, and partition-based maximal subset enumeration is performed in the MH graph starting from the seed nodes, so as to obtain all groups;

the node similarity C_n of a node n is its average similarity to its adjacent nodes, and its calculation formula is as follows:

C_n = \frac{1}{d(n)} \sum_{m \in nei(n)} W_{nm}

wherein,

(1) d(n) represents the degree of the node n, i.e. the number of edges of the node n;

(2) nei(n) represents the set of neighbor nodes of node n;

(3) W_nm represents the similarity between node n and its neighbor node m.

Further, the inter-patient similarity threshold is set to 0.8.

Further, h is set to any number of 3 to 6.

Further, X% is set to 30%.

Further, after step S3, the method further includes:

step S4, manually reviewing and judging the suspicious groups according to the visit details of the group members.

In a second aspect, an embodiment of the present invention provides a medical insurance group fraud monitoring system, including:

a memory storing a computer program;

a processor for executing the computer program, the computer program when executed performing the steps of the method as hereinbefore described except for step S4.

In a third aspect, an embodiment of the present invention further provides a storage medium, in which a computer program is stored, the computer program being configured to, when executed, perform the steps of the method as above except for step S4.

The invention has the advantages that:

1) the manual auditing cost is reduced, and the manual auditing efficiency is improved;

in fact, since fraudulent patients account for only a small portion of the entire patient population, only a very small fraction of a hospital's massive visit detail data consists of fraudulent records. Whether sampling randomly or according to some rule, the patients sampled are very likely to be patients with normal behavior. The method provided by the invention automatically separates groups from massive data through the model and outputs the visit behavior indicators of each group, which both narrows the range of suspected patients and improves the efficiency of manual review.

2) The manual auditing accuracy is improved, and the medical insurance fund loss is reduced;

at present, in the field of medical insurance anti-fraud, experts define suspected-fraud rules based on past experience and build a rule base from them in order to single out suspected patients. However, over time the behavior of fraudulent groups becomes more and more concealed and varied, and the rule base gradually loses effectiveness. The method provided by the invention models real-time visit behavior data, learns the patterns in the data, and accurately identifies suspect groups, which increases the accuracy of manual review and reduces the loss of the medical insurance fund.

Drawings

FIG. 1 is a flow chart of a method in an embodiment of the invention.

FIG. 2 is an exemplary diagram of a sparse matrix in an embodiment of the present invention.

FIG. 3 is an exemplary diagram of an inter-patient associated network graph in an embodiment of the present invention.

FIG. 4 is an exemplary diagram of a subset in an embodiment of the invention.

FIG. 5 is an exemplary diagram of a maximum subset in an embodiment of the invention.

FIG. 6 is an exemplary diagram of an MH graph in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The embodiment of the invention provides a medical insurance group fraud monitoring method, which comprises the following steps:

Definition 1, population:

a population is a group of people, within the set P of visiting patients, whose visit behaviors are highly similar;

there may be multiple populations within P whose members behave similarly.

Definition 2, visit behavior:

the visit behavior refers to the activity of a patient in one visit;

the behavior b of a patient p seeking medical treatment at a certain time t and a certain place s (a doctor, a department or a hospital) may be recorded as b = (p, t, s); e.g. b_1 may be ("p = ID01", "t = 2020/7/15", "s = Doctor/Department/Hospital");

in this embodiment, s defaults to the doctor and may be switched to the department or the hospital. The reason is that, in a real visit, a doctor prescribes appropriate medicine only after the patient has been diagnosed with a certain disease. If, in a special case, a fraudster who is not ill can get a doctor to write prescriptions at will, the fraudster can exploit this "convenience" to maximize the use of the medical insurance cards held, that is, visit that doctor frequently and have prescriptions written frequently. In reality, a fraudster may readily obtain such "convenience" from certain doctors, but it is difficult to obtain it across an entire department or hospital;

definition 3, similar behavior:

similar behavior means that different patients p have undergone the same type of visit within a certain period of time;

if different patients p visit the same doctor, the same department or the same hospital within the time interval T, they are regarded as having had the same type of visit; the time interval T is set to 3 days by default and may also be set independently according to the specific scenario;

if b_1 = ("p_1 = ID01", "t = 2020/7/15", "Doctor = ID123") and b_2 = ("p_2 = ID02", "t = 2020/7/16", "Doctor = ID123"), then b_1 and b_2 are similar behaviors;

SB(p_i, p_j) denotes the set of similar behaviors of any two patients; that is, SB(p_i, p_j) consists of all similar behaviors of the two patients p_i and p_j over a certain period of time;
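Definitions 2 and 3 can be illustrated with a minimal Python sketch. It assumes that a behavior is stored as a tuple (p, t, s) with t as a `datetime.date`; the function names `is_similar` and `similar_behaviors` are hypothetical and chosen only for this example.

```python
from datetime import date, timedelta

T = timedelta(days=3)  # default time interval T (3 days)

def is_similar(b_i, b_j, interval=T):
    """Two behaviors b = (p, t, s) are similar if they belong to different
    patients, share the same place s, and satisfy |t_i - t_j| <= T."""
    p_i, t_i, s_i = b_i
    p_j, t_j, s_j = b_j
    return p_i != p_j and s_i == s_j and abs(t_i - t_j) <= interval

def similar_behaviors(visits_i, visits_j, interval=T):
    """SB(p_i, p_j): all pairs of similar behaviors of two patients."""
    return [(b_i, b_j)
            for b_i in visits_i
            for b_j in visits_j
            if is_similar(b_i, b_j, interval)]

# The example from the text, plus one visit that falls outside the interval.
b1 = ("ID01", date(2020, 7, 15), "ID123")
b2 = ("ID02", date(2020, 7, 16), "ID123")
b3 = ("ID02", date(2020, 7, 25), "ID123")
sb = similar_behaviors([b1], [b2, b3])
```

Here `sb` contains only the pair (b1, b2), since b3 lies ten days after b1.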

the method comprises the following steps:

step S1, generating an analysis dataset of the patient;

the following fields are extracted from the patient visit data imported from the hospital:

1) the date of the visit, in days;

2) hospital ID and/or department ID and/or doctor ID, used as the classification field;

3) the patient ID; if there are multiple visit records for the same classification field on the same day, only one record is retained, i.e. the patient ID is unique per day and per classification field;
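The deduplication rule of step S1 can be sketched in Python as follows. The input layout (a list of dicts) and the key names `patient_id`, `visit_date` and `doctor_id` are assumptions made for this example, not field names given in the text.

```python
from datetime import date

def build_analysis_dataset(raw_visits, site_field="doctor_id"):
    """Reduce raw visit rows to unique (patient, day, site) records.

    raw_visits: iterable of dicts with the hypothetical keys 'patient_id',
    'visit_date' (a datetime.date) and the chosen classification field.
    """
    seen = set()
    dataset = []
    for row in raw_visits:
        key = (row["patient_id"], row["visit_date"], row[site_field])
        if key in seen:
            # Same patient, same day, same classification field: keep one record.
            continue
        seen.add(key)
        dataset.append(key)
    return dataset

raw = [
    {"patient_id": "ID01", "visit_date": date(2020, 7, 15), "doctor_id": "ID123"},
    {"patient_id": "ID01", "visit_date": date(2020, 7, 15), "doctor_id": "ID123"},
    {"patient_id": "ID02", "visit_date": date(2020, 7, 16), "doctor_id": "ID123"},
]
dataset = build_analysis_dataset(raw)
```

Switching `site_field` to a department or hospital key would correspond to changing the classification field.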

step S2, calculating the similarity between patients;

because the visit behaviors of a group are highly similar, similar visit behaviors are found first and the similarity of the similar behaviors is then calculated, where a higher value means the visit behaviors are more similar; finally, the similarity between patients is calculated on the basis of the similarity of similar behaviors; patients whose similarity is greater than the inter-patient similarity threshold are regarded as highly similar patients, and are sorted in descending order of inter-patient similarity;

definition 4, similarity of similar behaviors:

the similarity of similar behaviors measures how similar two similar behaviors are; if b_i = (p_i, t_i, s_i) and b_j = (p_j, t_j, s_j) are similar behaviors, then s_i = s_j and |t_i - t_j| ≤ T; therefore, the similarity of similar behaviors depends only on the time interval, and the shorter the time interval between similar behaviors, the greater the similarity between the visit behaviors; therefore, the calculation formula of the similarity of similar behaviors is:

definition 5, similarity between patients:

the inter-patient similarity refers to the similarity of the visit behavior between two patients within a certain period of time; i.e. the relationship between the sum of the similarities of all similar behaviors of two patients within the time interval T and their visit behavior; therefore, the similarity between patients is calculated by the formula:

wherein N(p_i) denotes the number of visits of patient p_i within a certain period of time, and N(p_j) denotes the number of visits of patient p_j within a certain period of time; obviously, the larger Sim(p_i, p_j) is, the greater the similarity between patients p_i and p_j;

the inter-patient similarity threshold is set to 0.8 by default and may be adjusted as needed according to the specific situation; a threshold closer to 1 retains only patients with higher similarity, while a threshold closer to 0 also admits patients with lower similarity, i.e. patients with almost no correlation;
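The overall structure of step S2 can be sketched in Python as follows. The two formulas are passed in as functions, and the defaults used here (a linear time decay and a geometric-mean normalizer) are hypothetical placeholders chosen only so that the example runs; they should not be read as the formulas of the method. The sketch also assumes each patient has at least one visit.

```python
from datetime import date
from math import sqrt

T_DAYS = 3  # default time interval T, in days

def behavior_sim_placeholder(t_i, t_j, T=T_DAYS):
    # HYPOTHETICAL placeholder: linear decay, 1.0 for same-day visits.
    return 1.0 - abs((t_i - t_j).days) / (T + 1)

def normalizer_placeholder(n_i, n_j):
    # HYPOTHETICAL placeholder: geometric mean of the visit counts N(p_i), N(p_j).
    return sqrt(n_i * n_j)

def patient_similarity(visits_i, visits_j, T=T_DAYS,
                       behavior_sim=behavior_sim_placeholder,
                       normalizer=normalizer_placeholder):
    """visits_*: list of (t, s) records of one patient, unique per day and place.

    Sums the similarity of every pair of similar behaviors and relates the
    sum to the visit counts of the two patients."""
    total = 0.0
    for t_i, s_i in visits_i:
        for t_j, s_j in visits_j:
            if s_i == s_j and abs((t_i - t_j).days) <= T:  # similar behavior
                total += behavior_sim(t_i, t_j, T)
    return total / normalizer(len(visits_i), len(visits_j))

a = [(date(2020, 7, 15), "ID123")]
b = [(date(2020, 7, 15), "ID123")]
c = [(date(2020, 7, 17), "ID123")]
d = [(date(2020, 7, 15), "ID999")]
```

Because the formulas are parameters, replacing the placeholders with the intended definitions does not change the surrounding logic.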

step S3, mining maximal groups that are highly similar to each other;

first, the similarity Sim between each patient and the other patients is calculated according to formula (2); the patient pairs whose Sim is greater than the inter-patient similarity threshold are then screened out; finally, a sparse matrix of highly similar patients is output; an example of the sparse matrix is shown in FIG. 2;

then the associated network graph among patients, i.e. the relationships between patients, is output according to the sparse matrix; an example of the associated network graph is shown in FIG. 3;

definition 6, associated network graph: the associated network graph expresses the association between patients in the form of a graph, using the inter-patient similarity as the index; the graph consists of nodes and edges, where the nodes represent patients, an edge represents the association between two patients, and the length of the edge represents their similarity;

(1) the associated network graph is represented by Map(nodes, edges, edge lengths);

(2) representing a set of nodes by N;

(3) representing a set of edges between the connection nodes by E;

(4) W represents the degree of similarity between nodes, so that W_ij = Sim(p_i, p_j), p_i, p_j ∈ N;
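The thresholding step and the (N, E, W) representation can be sketched in Python as follows. The dictionary-of-pairs layout stands in for the sparse matrix, and the similarity values are made up for illustration; both are assumptions of this example.

```python
def build_association_graph(sim, threshold=0.8):
    """sim: dict mapping frozenset({p_i, p_j}) -> Sim(p_i, p_j).

    Returns (N, E, W): the node set, the edge set, and the weights
    W[edge] = Sim, keeping only pairs whose similarity exceeds the threshold.
    """
    W = {pair: s for pair, s in sim.items() if s > threshold}
    E = set(W)
    N = set().union(*E) if E else set()
    return N, E, W

def adjacency(N, E):
    """Neighbor sets per node, convenient for degree and subset computations."""
    adj = {n: set() for n in N}
    for edge in E:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    return adj

# Hypothetical similarity values for three patients.
sim = {
    frozenset({"A", "B"}): 0.95,
    frozenset({"A", "C"}): 0.85,
    frozenset({"B", "C"}): 0.60,  # not above the threshold, so no edge
}
N, E, W = build_association_graph(sim)
adj = adjacency(N, E)
```

Only pairs above the threshold are stored, which is what keeps the matrix sparse when most patients are unrelated.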

After the associated network graph among the patients is obtained, the maximal groups that are highly similar to each other in it are then mined, based on the property that the individuals in a group are all similar to one another; the specific steps are as follows:

definition 7, subset:

a subset is a completely connected subgraph of the associated network graph (a clique, in graph-theory terms), i.e. any two nodes in the subset are connected by an edge; a subset is used to represent a population, i.e. any two patients in the subset are similar; an example is shown in FIG. 4;

definition 8, maximal subset:

a subset is called a maximal subset (a maximal clique) if it cannot be expanded into a larger subset by adding any one or more nodes; a population is represented by a maximal subset; an example is shown in FIG. 5;

more than 2 persons can form a group; groups differ in their number of members, and correspondingly the nodes in the associated network graph differ in their number of edges; a group may be required to consist of at least h members; groups of different sizes affect the medical insurance fund differently, and the more people in a group, the higher the amount defrauded; h is set to 3 by default and may be modified according to the actual situation;

definition 9, h-node set:

groups differ in their number of members, and the nodes differ in their number of connecting edges; if a group is required to contain at least h members, each member has at least h-1 edges; the set of nodes satisfying this condition is the h-node set;

assuming that h = 4 and taking FIG. 3 as an example, H = {A, B, C, D, E, F, G, I}, and the MH graph is shown in FIG. 6;
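The h-node set and the MH graph can be sketched in Python as follows. The sketch assumes the graph is held as a dict of neighbor sets; the toy graph is hypothetical and is not the graph of FIG. 3.

```python
def h_node_set(adj, h):
    """H = {n in N : d(n) >= h - 1}, where d(n) is the degree of node n."""
    return {n for n, nbrs in adj.items() if len(nbrs) >= h - 1}

def mh_graph(adj, h):
    """Subgraph of the associated network graph formed by the nodes in H."""
    H = h_node_set(adj, h)
    return {n: adj[n] & H for n in H}

# Hypothetical toy graph: a triangle A-B-C plus a node D attached only to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
H = h_node_set(adj, h=3)
mh = mh_graph(adj, h=3)
```

With h = 3, node D has a single edge and is dropped, so the later enumeration only has to consider A, B and C. This is a single filtering pass on the original degrees, as in the definition of H.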

an h-node set H satisfying the group member number h is searched for in the inter-patient associated network graph, the MH graph of H is derived, and all groups are mined by exhaustively enumerating the maximal subsets on the MH graph; this process greatly reduces the amount of computation;
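One standard way to enumerate maximal subsets exhaustively is the Bron-Kerbosch algorithm with pivoting. The Python sketch below uses it as an illustrative stand-in; the text does not name a specific enumeration algorithm, and the toy graph is hypothetical.

```python
def maximal_subsets(adj, h=3):
    """Enumerate maximal subsets (maximal cliques) with at least h members,
    using Bron-Kerbosch with pivoting. adj maps each node to its neighbor set."""
    found = []

    def expand(R, P, X):
        # R: current subset, P: candidates that can extend R, X: already handled.
        if not P and not X:
            if len(R) >= h:
                found.append(frozenset(R))
            return
        pivot = max(P | X, key=lambda n: len(adj[n] & P))
        for n in list(P - adj[pivot]):
            expand(R | {n}, P & adj[n], X & adj[n])
            P = P - {n}
            X = X | {n}

    expand(set(), set(adj), set())
    return found

# Hypothetical graph: triangles A-B-C and C-D-E, plus the single edge E-F.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D", "E"},
    "D": {"C", "E"},
    "E": {"C", "D", "F"},
    "F": {"E"},
}
groups = maximal_subsets(adj, h=3)
```

The pair E-F is a maximal subset as well, but it is discarded because it has fewer than h members.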

to further reduce the computation, partition-based maximal subset enumeration is used on the MH graph;

definition 10, node similarity:

the node similarity measures how similar a node n is to its adjacent nodes, namely its average similarity to them; with C_n denoting the node similarity of node n,

C_n = \frac{1}{d(n)} \sum_{m \in nei(n)} W_{nm}

wherein,

(1) d(n) represents the degree of the node n, i.e. the number of edges of the node n;

(2) nei(n) represents the set of neighbor nodes of node n;

(3) W_nm represents the similarity between the node n and its adjacent node m;

the higher the node similarity, the more similar the node is to its adjacent nodes; therefore, before subgraphs are extracted from the MH graph, nodes with high node similarity are found and used as seed nodes; the top 30% of nodes with the highest node similarity may be selected as seed nodes, and partition-based maximal subset enumeration is performed in the MH graph starting from the seed nodes so as to obtain all groups;
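Seed selection can be sketched in Python as follows, taking the node similarity as the average edge weight to a node's neighbors. The rounding rule for the top 30% (round up, at least one seed) and the weights are assumptions of this example, and the sketch stops at seed selection rather than implementing the partition-based enumeration.

```python
import math

def node_similarity(adj, W):
    """C_n: average of W_nm over the neighbors m of node n.

    W maps frozenset({n, m}) to the similarity of the edge between n and m."""
    return {n: sum(W[frozenset({n, m})] for m in nbrs) / len(nbrs)
            for n, nbrs in adj.items() if nbrs}

def select_seeds(adj, W, top_fraction=0.30):
    """Return the top fraction of nodes ranked by node similarity."""
    C = node_similarity(adj, W)
    k = max(1, math.ceil(top_fraction * len(C)))  # rounding rule is an assumption
    return sorted(C, key=C.get, reverse=True)[:k]

# Hypothetical triangle with made-up edge weights.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
W = {
    frozenset({"A", "B"}): 0.90,
    frozenset({"A", "C"}): 0.95,
    frozenset({"B", "C"}): 0.85,
}
C = node_similarity(adj, W)
seeds = select_seeds(adj, W)
```

In this example node A has the highest average similarity to its neighbors and is the only seed selected.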

A group only indicates that the visit behaviors of any two of its members are highly similar; it does not mean that everyone in the group is a fraudster. For example, a normal patient may often go for follow-up visits together with several relatives or acquaintances (this is common among patients living in nursing homes), and such normal patients would be mined as a group because of their highly similar behavior. Groups are therefore divided into normal groups and suspicious groups: a group of normal patients brought together by coincidence or for special reasons is called a normal group, and an abnormal group is called a suspicious group.

Therefore, the output groups still need to be submitted for manual review; the manual review and judgment can be assisted by indicators drawn from the visit details of the group members, such as the visit frequency, the visit period, the visit cost, the departments and doctors visited, and the commonly used medicines and their quantities.

The embodiment of the invention also provides a medical insurance group fraud monitoring system, which comprises:

a memory storing a computer program;

a processor for executing the computer program, the computer program being operable to perform the steps of the method as hereinbefore described except for step S4.

Embodiments of the present invention also propose a storage medium having stored therein a computer program configured to perform the steps of the method as described above except for step S4 when executed.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to examples, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims

1. A medical insurance group fraud monitoring method, characterized in that the method comprises the steps of:

step S1, generating an analysis dataset of the patient;

step S2, calculating the similarity between patients;

step S3, mining maximal groups that are highly similar to each other;

P = {p_1, p_2, …, p_m} denotes the set of visiting patients, and G = {g_1, g_2, …, g_n} denotes a population with similar visit behavior;

for any two patients g_i, g_j in G, the visit behaviors are highly similar;

step S1 specifically includes:

1) the date of the visit;

2) hospital ID and/or department ID and/or doctor ID;

3) a patient ID;

step S2 specifically includes:

step S3 specifically includes:

then outputting the associated network graph among patients according to the sparse matrix; in the associated network graph, N represents the set of nodes; E represents the set of edges connecting the nodes; W represents the degree of similarity between nodes, so that W_ij = Sim(p_i, p_j), p_i, p_j ∈ N;

2. The method of fraud monitoring of medical insurance groups according to claim 1, wherein in the method,

H denotes the h-node set, i.e. H = {n : n ∈ N, d(n) ≥ h-1}, where d(n) is the degree of node n, representing the number of edges of node n; that is, H is the set of nodes with at least h-1 edges; the MH graph denotes the subgraph of the inter-patient associated network graph formed by the nodes in H;

3. The medical insurance group fraud monitoring method according to claim 2, wherein in the method, after the MH graph is derived in step S3, the top X% of nodes with the highest node similarity are selected as seed nodes, and partition-based maximal subset enumeration is performed on the MH graph starting from the seed nodes, thereby obtaining all groups;

the calculation formula of the node similarity is as follows:

wherein,

(2) nei(n) represents the set of neighbor nodes of node n;

(3) W_nm represents the similarity between node n and its neighbor node m.

4. The method of fraud monitoring of medical insurance groups according to claim 1, wherein in the method,

the inter-patient similarity threshold is set to 0.8.

5. The medical insurance group fraud monitoring method of claim 2, wherein in the method,

h is set to any one of 3 to 6.

6. The medical insurance group fraud monitoring method of claim 3, wherein in the method,

x% is set to 30%.

7. The medical insurance group fraud monitoring method of any one of claims 1 to 6, further comprising, after step S3:

8. A medical insurance group fraud monitoring system, comprising:

a memory storing a computer program;

a processor for running the computer program, the computer program when running performing the steps of the method of any one of claims 1 to 6.

9. A storage medium, characterized in that it comprises,

the storage medium has stored therein a computer program configured to perform the steps of the method of any one of claims 1 to 6 when executed.