CN111105317A

CN111105317A - Medical insurance fraud detection method based on medicine purchase record

Info

Publication number: CN111105317A
Application number: CN201911383476.7A
Authority: CN
Inventors: 孙佰清; 鲍鑫; 王天辰; 高稳; 王思霖
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2019-12-28
Filing date: 2019-12-28
Publication date: 2020-05-05
Anticipated expiration: 2039-12-28
Also published as: CN111105317B

Abstract

The invention discloses a medical insurance fraud detection method based on medicine purchase records and belongs to the field of medicine fraud detection methods. The method accurately extracts medical insurance fraud information, is convenient and fast to operate, and has strong applicability. A fraudster classification model is constructed through a machine learning algorithm; patient information and medicine purchasing information are input into the model, and a patient-medicine bipartite graph is established; a single-mode projection graph of the medicines is established from the patient-medicine bipartite graph to form medicine chains; the medicine chains are divided into normal chains and abnormal chains by using an associated chain algorithm; the similarity between each normal chain and each abnormal chain is calculated through a cosine similarity formula; the comparison combinations of an abnormal chain and a normal chain whose similarity is not 0 are kept; in each kept combination, the medicines shared by the abnormal chain and the normal chain are removed and the other medicines are retained; and the remaining medicines are synthesized into a fraud chain, which is output. The invention is mainly used for detecting fraud by fraudulent patients.

Description

Medical insurance fraud detection method based on medicine purchase record

Technical Field

The invention belongs to the field of medicine fraud detection methods, and particularly relates to a medical insurance fraud detection method.

Background

In recent years, China's fight against medical insurance fraud has attracted much attention, in particular the first special anti-fraud campaign launched after the establishment of the national medical insurance supervision and administration bureau (a newly established government agency that specifically manages medical insurance funds). However, fraud detection currently still relies heavily on reports through "hotlines" and social media platforms, as well as on case studies by domain experts. Therefore, for the regulatory agencies, developing an effective and automatic data-driven model will help to improve the efficiency of medical reform and better serve the public. Although fraud is not common, medical insurance fraud events often correspond to abnormal drug purchase records, and medical insurance fraud cases often have the following characteristics:

(1) Rarity: fraudulent events are rare but costly, so the distribution of normal patients and fraudsters is highly imbalanced.

(2) Knowledge sharing: fraudsters are often influenced by their allies and contacts, and in turn influence others. Fraud knowledge is thus transferred within medicine purchasing behavior patterns.

(3) Behavioral imitation: fraudulent patients may mimic the purchasing behavior of normal participants to mask their fraudulent goals, in an effort to make their own purchasing behavior look "normal".

Therefore, a medical insurance fraud detection method based on a medicine purchasing record, which can accurately extract medical insurance fraud information, is convenient to operate and has strong applicability, is needed.

Disclosure of Invention

Aiming at the problems that existing medical insurance fraud modes are varied, that fraud information cannot be accurately determined, and that manual extraction of fraud information is tedious, the invention provides a medical insurance fraud detection method based on medicine purchase records which accurately extracts medical insurance fraud information, is convenient to operate and has strong applicability.

The technical scheme of the medical insurance fraud detection method based on a medicine purchase record is as follows. The method comprises the following steps:

step S1, establishing a fraud classification model through a machine learning algorithm;

step S2, inputting patient information and medicine purchasing information into the model, and establishing a patient-medicine bipartite graph, wherein the patient information comprises normal patients and fraudulent patients;

step S3, establishing a drug single mode projection relation according to the patient-drug bipartite graph to form a drug chain;

step S4, dividing the medicine chain in the step S3 into a normal chain and an abnormal chain by using an associated chain algorithm;

step S5, calculating the similarity of the normal chain and the abnormal chain respectively through a cosine similarity formula;

step S6, removing the comparison combinations of an abnormal chain and a normal chain whose similarity is 0, and keeping the comparison combinations whose similarity is not 0;

step S7, in each kept combination, removing from the abnormal chain the medicines that also appear in the normal chain, and retaining the other medicines;

and step S8, synthesizing the residual medicines into a fraud chain, and outputting the fraud chain.

Further: in step S1, patient information is integrated, a feature vector of the patient information is extracted by using a machine learning algorithm, the information quantity IV of each feature is calculated by using a supervised screening algorithm, and the features whose information quantity IV is greater than a threshold (0.05 in the embodiments) are extracted and put into the machine learning algorithm, so as to obtain the fraud classification model.

Further: in step S2, based on the medicine purchasing information of the normal patients and of the fraudulent patients, the patients and the medicines are set as patient nodes and medicine nodes, and a medicine-patient undirected bipartite graph of the fraudulent patients and a medicine-patient undirected bipartite graph of the normal patients are constructed respectively; a first round of derivative feature extraction is performed on the patient-medicine bipartite graph, wherein the first-round features comprise the total number of medicine types used and the total amount of medicine used, and a medicine single-mode projection relation is established according to the derivative features.

Further: in step S3, a second round of derivative feature extraction is performed on the abnormal chain, where the second round of extracted features includes a category abnormal rate, a quantity abnormal rate, and an abnormal drug usage rate in the abnormal chain.

Further: in step S4, the associated chain algorithm is specifically as follows: the matrix corresponding to the bipartite graph is sorted by edge weight; the medicine combination with the highest edge weight is taken as the initial medicine combination of the abnormal chain; the medicines connected to the medicines of this combination by the next-highest edge weights are then searched in turn and linked in series into a medicine chain. The input is the edge-weight adjacency matrix and the output is one chain.

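Further: in step S5, the cosine similarity formula is

$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$

where a and b are the vectorized forms of the two chains being compared, one a normal chain and the other an abnormal chain.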

Further: in step S8, a third round of derivative feature extraction is performed on the synthesized fraud chain, where the extracted features include a category anomaly rate, a quantity anomaly rate, and an abnormal drug usage rate in the anomaly chain.

The beneficial effects of the medical insurance fraud detection method based on a medicine purchase record are as follows:

The method uses the bipartite graph and the single-mode projection relation derived from it, and applies the associated chain algorithm to extract the transfer of fraud patterns and hidden medicine purchasing targets. It is fast and accurate with respect to business logic and convenient to apply. In addition, the extracted fraud chains help monitoring organizations to establish monitoring rules against fraudulent activities and to prevent malicious fraud by fraudulent patients. The method analyzes the medicine purchase records in medical insurance data, uses graph theory algorithms to construct effective derived features, achieves high fraud judgment accuracy, and can effectively detect changing medical insurance fraud modes.

Drawings

FIG. 1 is a flow chart of a medical insurance fraud detection method based on a record of purchasing a medicine according to the present invention;

FIG. 2 is a flow chart of a medical insurance fraud detection method based on a drug purchase record according to example 2;

FIG. 3 is a bipartite graph of the drug product of a normal patient in example 2;

FIG. 4 is a bipartite graph of a drug for a fraudulent patient of example 2.

Detailed Description

The technical solutions of the present invention are further described below with reference to the following examples, but the present invention is not limited thereto, and any modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Example 1

This embodiment is described with reference to FIG. 1. The medical insurance fraud detection method based on a medicine purchase record according to this embodiment comprises the following steps:

Step S1, establishing a fraud classification model through a machine learning algorithm. Patient information is integrated and a feature vector of the patient information is extracted by using a machine learning algorithm. The feature vector of each patient and the fraud label Y of the patient are integrated into $\{X_1, X_2, X_3, X_4, \ldots, X_{3q+2}, X_{3q+3}, X_{3q+4}, \ldots, X_{3q+3r+2}, Y\}$. The information quantity IV of each feature is calculated by using a supervised feature screening algorithm, and the features with IV greater than 0.05 are extracted and put into the machine learning algorithm to obtain an effective classification model for fraudsters. This completes the construction of the fraud detection model.
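The text does not define how the information quantity IV is computed. A minimal sketch, assuming IV is the weight-of-evidence-based information value commonly used in scoring models, with equal-width binning and additive smoothing (both assumptions; `information_value`, `select_features`, `n_bins` and `eps` are hypothetical names, not taken from the patent):

```python
import math


def information_value(values, labels, n_bins=5, eps=0.5):
    """IV of one numeric feature against a binary label (1 = fraud, 0 = normal)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant feature: fall back to one bin
    fraud = [0] * n_bins
    normal = [0] * n_bins
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), n_bins - 1)
        if y == 1:
            fraud[b] += 1
        else:
            normal[b] += 1
    # Only non-empty bins take part; eps smoothing keeps the logarithm finite.
    bins = [(f, n) for f, n in zip(fraud, normal) if f + n > 0]
    k = len(bins)
    total_fraud, total_normal = sum(fraud), sum(normal)
    iv = 0.0
    for f, n in bins:
        pf = (f + eps) / (total_fraud + eps * k)
        pn = (n + eps) / (total_normal + eps * k)
        iv += (pf - pn) * math.log(pf / pn)
    return iv


def select_features(columns, labels, threshold=0.05):
    """Keep the features whose IV exceeds the threshold (0.05 in the embodiment)."""
    return [name for name, vals in columns.items()
            if information_value(vals, labels) > threshold]
```

A feature that separates the two classes gets a large IV and is kept, while a feature distributed identically in both classes gets an IV of 0 and is dropped.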

Step S2, inputting patient information and medicine purchasing information into the model, and establishing a patient-medicine bipartite graph, wherein the patient information comprises normal patients and fraudulent patients. Based on the medicine purchasing information of the normal patients and of the fraudulent patients, the patients and the medicines are set as patient nodes and medicine nodes, and a medicine-patient undirected bipartite graph of the fraudulent patients and a medicine-patient undirected bipartite graph of the normal patients are constructed respectively. A first round of derivative feature extraction is performed on the patient-medicine bipartite graph, wherein the first-round features comprise the total number of medicine types used and the total amount of medicine used, and a medicine single-mode projection relation is established according to the derivative features.

The patients with fraudulent behavior and the patients with normal behavior are separated in the training data, each patient corresponding to a plurality of medication records. The medication records of the same patient are collated, the patients and the medicines are set as two different types of nodes based on graph theory, and the patient node (IDj) and the medicine node (Dk) of each medication record are linked, so that a medicine-patient undirected bipartite graph of the fraudulent patients and a medicine-patient undirected bipartite graph of the normal patients are constructed respectively. For both bipartite graphs, the first-round features are extracted from the medicine purchasing behavior of each patient: (1) X₁: the total number of medicine types used; (2) X₂: the total amount of medicine used.

The medicine single-mode projection relations corresponding to the medicine-patient bipartite graph of the fraudulent patients and to the medicine-patient bipartite graph of the normal patients are derived respectively. By using the associated chain algorithm, the medicine purchasing behavior of the fraudulent patients is represented as abnormal chains, and the medicine purchasing behavior of the normal patients is represented as normal chains.

The single-mode projection algorithm of a bipartite graph is mainly used for studying the relationships among nodes of the same type. Because nodes of the same type in a bipartite graph are connected only through nodes of the other type, the algorithm links and clusters the nodes of one type directly, generating a continuously growing network graph that contains nodes of a single type. The single-mode projection relation depends on the bipartite graph. First, the medicine nodes (Dk) under study are added to a new network one by one. Taking each added medicine node as a starting point, the other medicine nodes connected to it are searched in the bipartite graph in order of edge weight from low to high, and the medicine nodes found in the search are connected to the starting node. The process is repeated until all the medicine nodes are connected, forming a new single-mode projection relation. The two projection relations are then processed by using the associated chain algorithm to obtain the abnormal chains and the normal chains corresponding to the patient behavior.

Step S3, establishing a medicine single-mode projection relation according to the patient-medicine bipartite graph to form medicine chains, and performing a second round of derivative feature extraction on the abnormal chains, wherein the second-round features comprise the type abnormal rate, the quantity abnormal rate and the abnormal medicine usage rate with respect to the abnormal chain. The second-round patient behavior features are derived with the abnormal chains corresponding to the behavior of the fraudulent patients as reference. For each abnormal chain, the following three features are extracted for each patient: (1) the ratio of the number of medicine types used by the patient that appear in the abnormal chain to the total number of medicine types used by the patient; (2) the ratio of the total amount of abnormal-chain medicines used by the patient to the total amount of medicine used by the patient; (3) the ratio of the total amount of abnormal-chain medicines used by the patient to the number of abnormal-chain medicine types used by the patient.

Step S4, dividing the medicine chains of step S3 into normal chains and abnormal chains by using the associated chain algorithm. The associated chain algorithm is specifically as follows: the matrix corresponding to the bipartite graph is sorted by edge weight; the medicine combination with the highest edge weight is taken as the initial medicine combination of the abnormal chain; the medicines connected to the medicines of this combination by the next-highest edge weights are then searched in turn and linked in series into a medicine chain. The input is the edge-weight adjacency matrix and the output is one chain.

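Step S5, calculating the similarity of the normal chain and the abnormal chain respectively through a cosine similarity formula; the cosine similarity formula is

$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$

where a and b are the vectorized forms of the two chains being compared, one a normal chain and the other an abnormal chain.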

Step S6, removing the comparison combinations whose similarity is 0, and keeping the comparison combinations of an abnormal chain and a normal chain whose similarity is not 0. The cosine similarity between the abnormal chain corresponding to each fraudulent patient behavior and the normal chain corresponding to each normal patient behavior is calculated, and the comparison combinations of an abnormal chain and a normal chain whose similarity is 0 are removed.

Step S7, the other comparison combinations are kept; in each of them, the medicines on the abnormal chain that also appear on the normal chain are removed, and the remaining medicines on the abnormal chain are combined into a fraud chain.

Step S8, synthesizing the remaining medicines into a fraud chain, and performing a third round of derivative feature extraction on the synthesized fraud chain, wherein the third-round features comprise the type abnormal rate, the quantity abnormal rate and the abnormal medicine usage rate with respect to the fraud chain; and outputting the fraud chain.

The third-round patient behavior features are derived with the fraud chains as reference. For each fraud chain, the following three features are extracted for each patient: (1) the ratio of the number of medicine types used by the patient that appear in the fraud chain to the total number of medicine types used by the patient; (2) the ratio of the total amount of fraud-chain medicines used by the patient to the total amount of medicine used by the patient; (3) the ratio of the total amount of fraud-chain medicines used by the patient to the number of fraud-chain medicine types used by the patient.

Example 2

This embodiment is described with reference to FIG. 2, FIG. 3, FIG. 4 and Example 1. Taking the medical insurance medicine purchase data published by a certain city as a case, and following the detection method for medical insurance fraud, a patient-medicine bipartite graph is established for the fraudulent patients and for the normal patients respectively.

The patients with fraudulent behavior and the patients with normal behavior are separated in the training data, each patient corresponding to a plurality of medication records; in total there are 1,368,148 medication records for 15,000 people. The medication records of the same patient are collated, the patients and the medicines are set as two different types of nodes based on graph theory, and the patient node (IDj) and the medicine node (Dk) of each medication record are linked, so that a medicine-patient undirected bipartite graph of the fraudulent patients and a medicine-patient undirected bipartite graph of the normal patients are constructed respectively. In the undirected bipartite graph constructed from the medication records, patient nodes can be connected to each other only through medicine nodes and never directly; likewise, medicine nodes can be connected to each other only through patient nodes and never directly. Since a medicine node and a patient node have only a simple purchasing relationship, the undirected bipartite graph suffices to represent the medicine purchasing behavior. The edge weight in the undirected bipartite graph is the number of purchases of the corresponding medicine by the patient.
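The bipartite graph described above can be sketched as a nested mapping. This is a minimal illustration, assuming each medication record is reduced to a (patient ID, medicine ID) pair and one record counts as one purchase; `build_bipartite` and the record layout are hypothetical, not taken from the patent:

```python
from collections import defaultdict


def build_bipartite(records):
    """Build {patient_id: {medicine_id: purchase_count}} from purchase records.

    Edges exist only between a patient and a medicine, so patients (and
    medicines) are never linked to each other directly.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for patient, medicine in records:
        graph[patient][medicine] += 1  # edge weight = number of purchases
    return {p: dict(m) for p, m in graph.items()}


records = [("ID1", "D1"), ("ID1", "D1"), ("ID1", "D2"), ("ID2", "D2")]
graph = build_bipartite(records)
# graph == {"ID1": {"D1": 2, "D2": 1}, "ID2": {"D2": 1}}
```

One such graph would be built for the fraudulent patients and another for the normal patients.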

For the constructed medicine-patient undirected bipartite graph of the fraudulent patients and medicine-patient undirected bipartite graph of the normal patients, the first-round features are extracted from the medicine purchasing behavior of each patient: (1) X₁: the total number of medicine types used, where $x_{1,j}$ is the number of medicine nodes connected to patient node j; (2) X₂: the total amount of medicine used, where $x_{2,j}$ is the sum of the edge weights between patient node j and the medicine nodes connected to it.
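The two first-round features follow directly from the bipartite graph. A minimal sketch, assuming the graph is stored as a mapping from patient to per-medicine purchase counts (the function name and data layout are hypothetical):

```python
def first_round_features(graph):
    """Return {patient: (x1, x2)}.

    x1 = number of medicine nodes linked to the patient (medicine types used)
    x2 = sum of the edge weights of those links (total amount used)
    """
    return {patient: (len(medicines), sum(medicines.values()))
            for patient, medicines in graph.items()}


graph = {"ID1": {"D1": 2, "D2": 1}, "ID2": {"D2": 1}}
features = first_round_features(graph)
# features == {"ID1": (2, 3), "ID2": (1, 1)}
```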

The single-mode projection algorithm of a bipartite graph is mainly used for studying the relationships among nodes of the same type. Because nodes of the same type in a bipartite graph are connected only through nodes of the other type, the algorithm links and clusters the nodes of one type directly, generating a continuously growing network graph that contains nodes of a single type. The single-mode projection relation depends on the bipartite graph. First, the medicine nodes (Dk) under study are added to a new network one by one. Taking each added medicine node as a starting point, the other medicine nodes connected to it are searched in the bipartite graph in order of edge weight from low to high, and the medicine nodes found in the search are connected to the starting node. The process is repeated until all the medicine nodes are connected, forming a new single-mode projection relation. Finally, the newly formed single-mode projection relation is represented as a matrix in which the edge weight of the edge connecting two nodes is the number of patients who use both medicines. The corresponding matrix form is as follows:

$$\begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{pmatrix}$$

where m is the number of medicines and p₁₂ is the number of patients who use medicine D₁ and medicine D₂ simultaneously.
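The projection matrix can also be obtained by counting, for every pair of medicines, the patients who bought both. The sketch below uses this direct co-occurrence count rather than the incremental node-by-node construction described in the text, and it sets the diagonal to 0, which the source does not specify; both are assumptions, and `project_medicines` and `to_matrix` are hypothetical names:

```python
from collections import Counter
from itertools import combinations


def project_medicines(graph):
    """Return {(d1, d2): number of patients who bought both d1 and d2}."""
    weights = Counter()
    for medicines in graph.values():
        for pair in combinations(sorted(medicines), 2):
            weights[pair] += 1
    return weights


def to_matrix(weights, medicines):
    """Symmetric m x m edge-weight adjacency matrix, rows ordered as `medicines`."""
    index = {d: i for i, d in enumerate(medicines)}
    matrix = [[0] * len(medicines) for _ in medicines]
    for (d1, d2), w in weights.items():
        i, j = index[d1], index[d2]
        matrix[i][j] = matrix[j][i] = w
    return matrix


graph = {
    "ID1": {"D1": 2, "D2": 1},
    "ID2": {"D1": 1, "D2": 1, "D3": 4},
    "ID3": {"D3": 1},
}
matrix = to_matrix(project_medicines(graph), ["D1", "D2", "D3"])
# matrix == [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```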

Accordingly, the medicine single-mode projection relation corresponding to the medicine-patient bipartite graph of the cheating patient and the medicine-patient bipartite graph of the normal patient is obtained, and the two projection relations are processed by using an associated chain algorithm to obtain an abnormal chain and a normal chain corresponding to the patient behavior.

Take the processing of the medicine-patient bipartite graph of the fraudulent patients into abnormal chains as an example. First, the matrix corresponding to the bipartite graph of the fraudulent patients is sorted by edge weight, and the medicine combination with the highest edge weight is taken as the initial medicine combination of the abnormal chain. Then, for each of the two medicines in the combination, the medicine connected to it by the next-highest edge weight is searched. If such medicines exist, they are attached to the corresponding sides of the initial medicine combination, and the search continues from the medicines at the two ends of the chain until no further connected medicine with the next-highest edge weight can be found without repetition. These steps are repeated until all the medicines in the single-mode projection relation have been traversed, finally giving the set of abnormal chains that carry the information of the fraudulent patients and contain no repeated medicines.

The processing method of the drug-patient single-mode projection relation of the normal patient is consistent with the method, and finally the normal chain combination which bears the normal patient information and does not contain repeated drugs is obtained.

The second-round patient behavior features are derived with the abnormal chains corresponding to the behavior of the fraudulent patients as reference. For each abnormal chain, the following three features are extracted for each patient: (1) the ratio of the number of medicine types used by the patient that appear in the abnormal chain to the total number of medicine types used by the patient; (2) the ratio of the total amount of abnormal-chain medicines used by the patient to the total amount of medicine used by the patient; (3) the ratio of the total amount of abnormal-chain medicines used by the patient to the number of abnormal-chain medicine types used by the patient. If the number of abnormal chains obtained is q, then q × 3 derived features are obtained for each patient, labeled $X_3, X_4, \ldots, X_{3q+2}$.
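The three chain-based ratios can be sketched for one patient and one chain as follows. This is an illustration under the reading of the three features given above; `chain_features` and the zero-division fallbacks are assumptions, not part of the patent text:

```python
def chain_features(patient_medicines, chain):
    """Return (type_rate, quantity_rate, usage_rate) for one patient and one chain.

    patient_medicines: {medicine: amount used by the patient}
    chain: iterable of medicines on the abnormal (or fraud) chain
    """
    on_chain = set(chain)
    used = {d: c for d, c in patient_medicines.items() if d in on_chain}
    n_types = len(patient_medicines)
    n_total = sum(patient_medicines.values())
    # (1) share of the patient's medicine types that lie on the chain
    type_rate = len(used) / n_types if n_types else 0.0
    # (2) share of the patient's total amount that lies on the chain
    quantity_rate = sum(used.values()) / n_total if n_total else 0.0
    # (3) average amount per chain medicine type the patient uses
    usage_rate = sum(used.values()) / len(used) if used else 0.0
    return type_rate, quantity_rate, usage_rate


patient = {"a": 4, "b": 2, "e": 3, "f": 1}
print(chain_features(patient, ["b", "c", "e", "f"]))  # (0.75, 0.6, 2.0)
```

Applying the function to each of the q chains and concatenating the results would give the q × 3 features per patient.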

The cosine similarity between the abnormal chain corresponding to each fraudulent patient behavior and the normal chain corresponding to each normal patient behavior is calculated, and the comparison combinations of an abnormal chain and a normal chain whose similarity is 0 are removed. The other comparison combinations are kept; in each of them, the medicines on the abnormal chain that also appear on the normal chain are removed, and the remaining medicines on the abnormal chain are combined into a fraud chain. This operation is performed on every pair of abnormal chain and normal chain whose similarity is not 0 to obtain the corresponding fraud chains.

The third-round patient behavior features are derived with the fraud chains as reference. For each fraud chain, the following three features are extracted for each patient: (1) the ratio of the number of medicine types used by the patient that appear in the fraud chain to the total number of medicine types used by the patient; (2) the ratio of the total amount of fraud-chain medicines used by the patient to the total amount of medicine used by the patient; (3) the ratio of the total amount of fraud-chain medicines used by the patient to the number of fraud-chain medicine types used by the patient. If the number of fraud chains obtained is r, then r × 3 derived features are obtained for each patient, labeled $X_{3q+3}, X_{3q+4}, \ldots, X_{3q+3r+2}$.

And completing the construction of a fraud detection model by using a machine learning algorithm.

In summary, the fraud label Y of each patient and the feature vector of the patient are integrated into $\{X_1, X_2, X_3, X_4, \ldots, X_{3q+2}, X_{3q+3}, X_{3q+4}, \ldots, X_{3q+3r+2}, Y\}$. The information quantity IV of each feature is calculated by using a supervised feature screening algorithm, and the features with IV greater than 0.05 are extracted and put into a machine learning model; this embodiment takes the logistic regression algorithm as an example. Algorithm combination: the classification processing is carried out in the R language. The information quantity of the data set is screened first, so that feature vectors whose information quantity with respect to the fraud label is too low are removed. Ten-fold cross-validation is then performed on the data set: for the training set data, the classification coefficients are established by using the logistic regression algorithm, the convex optimization objective is iteratively updated by using the gradient descent method, and ROC and AUC are used as the evaluation measures of model performance. The comparison results are shown in Table 1 below, and the classification coefficient vector t with the best relative performance is obtained, where

$$t = [t_0, t_1, t_2, t_3, t_4, \ldots, t_{3q+2}, t_{3q+3}, t_{3q+4}, \ldots, t_{3q+3r+2}]^{T}$$
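The embodiment performs this step in R; the sketch below is a Python illustration of the same idea, fitting a coefficient vector with an intercept by batch gradient descent on the logistic loss. The learning rate, the number of epochs and the names `sigmoid` and `fit_logistic` are assumptions chosen for the toy data, not values from the patent:

```python
import math


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Return t = [t0, t1, ..., tn], where t0 is the intercept."""
    n = len(X[0])
    t = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for row, label in zip(X, y):
            p = sigmoid(t[0] + sum(ti * xi for ti, xi in zip(t[1:], row)))
            err = p - label  # derivative of the log-loss w.r.t. the score
            grad[0] += err
            for i, xi in enumerate(row, start=1):
                grad[i] += err * xi
        t = [ti - lr * g / len(X) for ti, g in zip(t, grad)]
    return t


# Toy data: one feature, small values are normal (0), large values are fraud (1).
t = fit_logistic([[0.0], [1.0], [3.0], [4.0]], [0, 0, 1, 1])
```

In a ten-fold setting the same fit would be repeated on each training split and scored on the held-out split.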

Fold	1	2	3	4	5	6	7	8	9	10
Training AUC	0.86	0.86	0.82	0.85	0.85	0.86	0.86	0.86	0.86	0.85
Test AUC	0.84	0.79	0.80	0.78	0.82	0.80	0.78	0.78	0.81	0.86

TABLE 1
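AUC values such as those in Table 1 can be computed from predicted scores and true labels. A minimal sketch, assuming the pairwise-ranking definition of AUC (the probability that a fraudulent patient receives a higher score than a normal one, with ties counted as one half); the function name `auc` is hypothetical:

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The quadratic pairwise loop is adequate for a small illustration; a sort-based computation would be preferable at the scale of the data set in this example.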

According to the classification coefficient vector t, the score of patient j is

$$S_j = t_0 + \sum_{i=1}^{3q+3r+2} t_i \, x_{i,j}$$

After $S_j$ is obtained, the fraud probability corresponding to the patient is calculated by using the logistic function,

$$P_j = \frac{1}{1 + e^{-S_j}}$$

which completes the judgment of whether the patient is a fraudulent patient. There are many ways to establish a new credit model. To obtain a feasible credit scoring model, the attribute set of the new credit model in this embodiment is a subset of the feasible domain of the attribute set. The algorithm used by the new credit scoring model is further determined by the properties of the attribute set. Many types of algorithms can currently be applied to generate credit scoring models, for example logistic regression, random forests and GBDT. In this embodiment, the algorithm screening includes using a new algorithm obtained by algorithm fusion, and the algorithm is selected according to the following strategy. Logistic regression: the probability of an event is obtained by studying the relationship between the probability of the event and a number of factors. When the probability is greater than 0.5, the event is considered to occur; when it is less than 0.5, it is considered not to occur.
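The scoring and decision step can be sketched as follows, assuming the first entry of the coefficient vector is the intercept and the remaining entries multiply the patient's features in order; `fraud_probability` and `is_fraud` are hypothetical names:

```python
import math


def fraud_probability(t, x):
    """Logistic probability for coefficients t = [t0, t1, ..., tn] and features x."""
    if len(t) != len(x) + 1:
        raise ValueError("t must hold one intercept plus one coefficient per feature")
    s = t[0] + sum(ti * xi for ti, xi in zip(t[1:], x))
    if s >= 0:  # numerically stable form of 1 / (1 + exp(-s))
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def is_fraud(t, x, threshold=0.5):
    """Flag the patient when the probability exceeds the 0.5 cut-off from the text."""
    return fraud_probability(t, x) > threshold


print(is_fraud([-1.0, 2.0], [1.0]))       # True, score is 1.0
print(is_fraud([0.0, 1.0, -1.0], [2.0, 2.0]))  # False, probability is exactly 0.5
```

The text leaves the case of a probability exactly equal to 0.5 open; the sketch treats it as "not fraud", which is an arbitrary choice.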

Associated chain algorithm: the matrix corresponding to the bipartite graph is sorted by edge weight (the occurrence frequency of the medicine combinations); the medicine combination with the highest edge weight is taken as the initial medicine combination of the abnormal chain, and the medicines connected to the medicines of this combination by the next-highest edge weights are then searched. The search continues in turn and the medicines are linked in series into a medicine chain. The input is the edge-weight adjacency matrix and the output is one chain. FIG. 3 shows the bipartite graph of the normal patients and FIG. 4 that of the abnormal patients; the medicines are denoted a, b, c, d, e and f.

Single mode projection relationship

Edge-weighted adjacency matrix schematic relationship:

Head (first medicine of the link)	Tail (second medicine of the link)	Weight (link frequency)
a medicine	b medicine	2
a medicine	c medicine	1
a medicine	d medicine	0
b medicine	c medicine	4
b medicine	d medicine	1
c medicine	d medicine	1

The edge weight matrix of the bipartite graph is sorted by edge weight. The occurrence frequencies of the medicine combinations are: a-b 2 times, a-c 1 time, a-d 0 times, b-c 4 times, b-d 1 time and c-d 1 time. The medicine combination with the highest edge weight, b-c, is taken as the initial medicine combination of the chain. The medicines connected to the medicines of this combination by the next-highest edge weights are then searched: a-b occurs twice and b-d once, so a-b is taken. Searching continues in turn and the medicines are linked in series, and a-b-c-d, the normal chain, is output. That is, the associated chain algorithm outputs a normal chain arranged according to edge weight.

Edge-weighted adjacency matrix schematic relationship:

The edge weight matrix of the bipartite graph is sorted by edge weight. The occurrence frequencies of the medicine combinations are: b-c 4 times, b-e 1 time, b-f 1 time, c-e 3 times, c-f 2 times and e-f 2 times. The medicine combination with the highest edge weight, b-c, is taken as the initial medicine combination of the abnormal chain. The medicines connected to the medicines of this combination by the next-highest edge weights are then searched: c-f occurs 2 times and c-e 3 times, so c-e is taken. Searching continues in turn and the medicines are linked in series, and b-c-e-f, the abnormal chain, is output. That is, the associated chain algorithm outputs an abnormal chain arranged according to edge weight.
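One possible reading of the associated chain algorithm is a greedy search: start from the heaviest medicine pair and repeatedly attach, at either end of the chain, the unused medicine with the highest edge weight to that end. The sketch below implements this reading for a single chain; the alphabetical tie-breaking, the skipping of zero-weight edges and the name `associated_chain` are assumptions not stated in the text:

```python
def associated_chain(weights):
    """Build one medicine chain from {(medicine_u, medicine_v): edge_weight}."""
    # Normalize to undirected pairs and drop zero-weight edges.
    w = {tuple(sorted(pair)): x for pair, x in weights.items() if x > 0}
    if not w:
        return []
    # Initial combination: highest edge weight, ties resolved alphabetically.
    first, second = min(w, key=lambda e: (-w[e], e))
    chain = [first, second]
    while True:
        best = None  # (-weight, medicine, position), smallest tuple wins
        for (u, v), x in w.items():
            for end, pos in ((chain[0], 0), (chain[-1], -1)):
                other = v if u == end else u if v == end else None
                if other is None or other in chain:
                    continue
                cand = (-x, other, pos)
                if best is None or cand < best:
                    best = cand
        if best is None:
            return chain  # neither end can be extended any further
        _, medicine, pos = best
        if pos == 0:
            chain.insert(0, medicine)
        else:
            chain.append(medicine)


normal = {("a", "b"): 2, ("a", "c"): 1, ("a", "d"): 0,
          ("b", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
abnormal = {("b", "c"): 4, ("b", "e"): 1, ("b", "f"): 1,
            ("c", "e"): 3, ("c", "f"): 2, ("e", "f"): 2}
print(associated_chain(normal))    # ['a', 'b', 'c', 'd']
print(associated_chain(abnormal))  # ['b', 'c', 'e', 'f']
```

With the normal-patient weights from the table and the abnormal-patient weights listed in the text, this reading reproduces the chains a-b-c-d and b-c-e-f of the example.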

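Since the cosine similarity of the normal chain and the abnormal chain after vectorization is not 0, the combination is kept. After the medicines b and c, which the abnormal chain shares with the normal chain, are removed from the abnormal chain, the remaining medicines form the fraud chain e-f.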
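The comparison of the two example chains can be sketched as follows. The source does not state how a chain is vectorized, so this sketch assumes a binary presence vector over all medicines (1 if the medicine is in the chain, 0 otherwise). Under that assumption the dot product equals the number of shared medicines and each norm is the square root of the chain length. The function names `chain_cosine` and `fraud_chain` are hypothetical.

```python
import math


def chain_cosine(normal_chain, abnormal_chain):
    """Cosine similarity of two chains under a binary presence encoding.

    Assumption: each chain is vectorized as 0/1 presence flags, so the dot
    product is the count of shared medicines.
    """
    normal, abnormal = set(normal_chain), set(abnormal_chain)
    if not normal or not abnormal:
        return 0.0
    shared = normal & abnormal
    return len(shared) / math.sqrt(len(normal) * len(abnormal))


def fraud_chain(normal_chain, abnormal_chain):
    """Return the medicines left in the abnormal chain, or None if dropped."""
    # Combinations whose similarity is 0 are not kept.
    if chain_cosine(normal_chain, abnormal_chain) == 0:
        return None
    # Remove the medicines shared with the normal chain, keep the rest in order.
    shared = set(normal_chain)
    return [m for m in abnormal_chain if m not in shared]


normal = ["a", "b", "c", "d"]
abnormal = ["b", "c", "e", "f"]
print(chain_cosine(normal, abnormal))          # 0.5
print("-".join(fraud_chain(normal, abnormal)))  # e-f
```

With this encoding the two chains share b and c, so the similarity is 2 / sqrt(4 * 4) = 0.5, the combination is kept, and the remainder of the abnormal chain is e-f.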

Cosine similarity definition: cosine similarity evaluates the similarity of two vectors by calculating the cosine of the angle between them. The vectors are placed in a vector space according to their coordinate values, and the cosine of the included angle between the two vectors is used as the measure of the difference between the two individuals. The closer the cosine value is to 1, the closer the included angle is to 0 degrees and the more similar the two vectors are; the closer the cosine value is to 0, the lower the similarity of the two vectors.

The formula:

wherein a, b and c are normal chains and abnormal chains;

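wherein the a vector is [x₁, y₁] and the b vector is [x₂, y₂]; the a vector is the vectorization of the normal chain, and the b vector is the vectorization of the abnormal chain. For these two vectors the cosine of the included angle is

$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$$

Combinations of a normal chain and an abnormal chain whose similarity is 0 are thereby removed.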
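As an illustration of the definition, the cosine of the included angle can be computed for vectors of any length as the dot product divided by the product of the vector lengths. The sketch below is an assumption-laden example: the function name `cosine_similarity` is hypothetical, and treating an all-zero vector as similarity 0 is a convention chosen here, not something the source states.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a zero vector has no direction, so report 0.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([3, 4], [3, 4]))  # 1.0 (same direction)
print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (perpendicular, removed)
```

A pair with value 0.0 corresponds to a combination that would be removed, while any non-zero value corresponds to a combination that is kept for the later steps.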

Claims

1. A medical insurance fraud detection method based on a medicine purchase record is characterized by comprising the following steps:

2. The medical insurance fraud detection method based on medicine purchase record of claim 1, wherein in step S1, patient information is integrated, a feature vector of the patient information is extracted by using a machine learning algorithm, an information quantity IV of each feature is calculated by using a supervised screening algorithm for the feature vector, and a feature with the information quantity IV greater than that is extracted and put into the machine learning algorithm to obtain a fraud classification model.

3. The medical insurance fraud detection method based on medicine purchasing records of claim 1 or 2, wherein in step S2, the medicine purchasing information of normal patients and the medicine purchasing information of fraudulent patients with fraudulent behaviors are set as patient nodes and medicine nodes, and a medicine-patient undirected bipartite graph of fraudulent patients and a medicine-patient undirected bipartite graph of normal patients are respectively constructed; and performing a first round of derivative feature extraction on the patient-medicine bipartite graph, wherein the first round of extracted features comprise the total amount of the types of used medicines and the total amount of the used medicines, and establishing a medicine single mode projection relation according to the derivative features.

4. The medical insurance fraud detection method based on medicine purchase record according to claim 3, characterized in that in step S3, a second round of derivative feature extraction is performed on the abnormal chain, and the features extracted in the second round comprise type abnormal rate, quantity abnormal rate and abnormal drug usage rate in the abnormal chain.

5. The medical insurance fraud detection method based on medicine purchase record according to claim 1 or 2, characterized in that in step S4, the associated chain algorithm is specifically: sorting the edge-weight matrix corresponding to the bipartite graph by edge weight; starting from the medicine combination corresponding to the highest edge weight as the initial medicine combination in the abnormal chain; further searching the medicines connected to the combined medicines by the next highest edge weight; searching sequentially and connecting the medicine chain in series; wherein the input is the edge-weight adjacency matrix and the output is one chain.

6. The method for detecting medical insurance fraud based on medicine purchasing record as claimed in claim 5, wherein in step S5, the cosine similarity formula is

Wherein a, b and c are normal chains or abnormal chains respectively.

7. The medical insurance fraud detection method based on medicine purchase record of claim 4, wherein in step S8, a third round of derivative feature extraction is performed on the synthesized fraud chain, and the extracted features of the third round include type anomaly rate, quantity anomaly rate and abnormal drug usage rate in the anomaly chain.