CN111105317B

CN111105317B - Medical insurance fraud detection method based on medicine purchasing record

Info

Publication number: CN111105317B
Application number: CN201911383476.7A
Authority: CN
Inventors: 孙佰清; 鲍鑫; 王天辰; 高稳; 王思霖
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2019-12-28
Filing date: 2019-12-28
Publication date: 2023-05-12
Anticipated expiration: 2039-12-28
Also published as: CN111105317A

Abstract

The invention provides a medical insurance fraud detection method based on medicine purchase records. It belongs to the field of medicine fraud detection methods, accurately extracts medical insurance fraud information, is convenient to operate and has high applicability. In the invention, a fraudster classification model is constructed through a machine learning algorithm; patient information and medicine purchasing information are input into the model, and a patient-medicine bipartite graph is established; according to the patient-medicine bipartite graph, a medicine single-mode projection graph is established to form medicine chains; the medicine chains are divided into normal chains and abnormal chains by an association chain algorithm; the similarity between the normal chains and the abnormal chains is calculated with the cosine similarity formula; the comparison combinations of an abnormal chain and a normal chain whose similarity is not 0 are retained; within each retained combination, the medicines that appear in both the abnormal chain and the normal chain are removed and the other medicines are retained; the remaining medicines are synthesized into a fraud chain, and the fraud chain is output. The invention is mainly used for detecting the fraudulent behavior of fraudulent patients.

Description

Medical insurance fraud detection method based on medicine purchasing record

Technical Field

The invention belongs to the field of medicine fraud detection methods, and particularly relates to a medical insurance fraud detection method.

Background

Fraud is not common, but medical insurance fraud events often correspond to abnormal medicine purchase records. Medical insurance fraud cases often have the following characteristics:

(1) Unusual: fraud events are rare but costly, so the number distribution between normal patients and fraudsters is extremely unbalanced.

(2) Knowledge sharing: fraudsters are often influenced by their allies and contacts, and in turn influence others. Fraudulent knowledge is thus transferred through, and shows up in, medicine purchasing behavior patterns.

(3) Behavior simulation: fraudulent patients will also imitate the normal participants' drug purchasing behavior to mask their fraudulent goals in an effort to make their drug purchasing behavior appear "normal".

Therefore, a medical insurance fraud detection method based on the medicine purchasing record, which can accurately extract medical insurance fraud information, is convenient to operate and high in applicability, is needed.

Disclosure of Invention

Aiming at the problems that the existing medical insurance fraud modes are various, the fraud information cannot be accurately determined and the manual extraction of the fraud information is complicated, the invention provides the medical insurance fraud detection method based on the medicine purchasing record, which can accurately extract the medical insurance fraud information, is convenient to operate and has high applicability.

The technical scheme of the medical insurance fraud detection method based on the medicine purchasing record is as follows:

the invention relates to a medical insurance fraud detection method based on a medicine purchase record, which comprises the following steps:

S1, constructing a fraudster classification model through a machine learning algorithm;

S2, inputting patient information and medicine purchasing information into the model, and establishing a patient-medicine bipartite graph, wherein the patient information comprises normal patients and fraudulent patients;

S3, establishing a single-mode projection relation of the medicines according to the patient-medicine bipartite graph to form medicine chains;

S4, dividing the medicine chains of step S3 into normal chains and abnormal chains by using an association chain algorithm;

S5, calculating the similarity of the normal chains and the abnormal chains through the cosine similarity formula;

S6, removing the comparison combinations with a similarity of 0, and retaining the comparison combinations of an abnormal chain and a normal chain whose similarity is not 0;

S7, removing the medicines that appear in both the abnormal chain and the normal chain of each combination, and retaining the other medicines;

S8, synthesizing the remaining medicines into a fraud chain, and outputting the fraud chain.

Further: in step S1, patient information is integrated, feature vectors of the patient information are extracted by a machine learning algorithm, the information value (IV) of each feature is calculated with a supervised screening algorithm, the features whose IV exceeds a threshold (0.05 in the embodiments) are extracted, and the extracted features are input into the machine learning algorithm to obtain a fraudster classification model.

Further: in step S2, setting the medicine purchasing information of a normal patient and the medicine purchasing information of a fraudulent patient with fraudulent activity as a patient node and a medicine node, and respectively constructing a medicine-patient undirected bipartite graph of the fraudulent patient and a medicine-patient undirected bipartite graph of the normal patient; a first round of derivative feature extraction is performed on the patient-medicine bipartite graph, wherein the features extracted in the first round comprise the total amount of the kinds of medicines and the total amount of the medicines used, and a medicine single-mode projection relation is established according to the derivative features.

Further: in step S3, a second round of derived feature extraction is performed on the abnormal chain, where the features extracted in the second round include a category abnormality rate, a quantity abnormality rate, and an abnormal drug usage rate in the abnormal chain.

Further: in step S4, the association chain algorithm is specifically as follows: the entries of the matrix corresponding to the bipartite graph are sorted by edge weight; the medicine combination with the highest edge weight is taken as the initial medicine combination in the abnormal chain; the medicines connected to the medicines of this combination by the next-highest edge weight are then searched for; the search continues in this order and the medicines found are connected in series into a medicine chain. The input is the edge-weight adjacency matrix and the output is one chain.

Further: in step S5, the cosine similarity formula is

$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$

where a and b are the vectors of the normal chain and the abnormal chain, respectively.

Further: in step S8, a third round of derived feature extraction is performed on the synthesized fraudulent chain, the features of the third round of extraction including a species abnormality rate, a quantity abnormality rate, and an abnormal drug usage rate in the abnormal chain.

The medical insurance fraud detection method based on the medicine purchase record has the beneficial effects that:

the medical insurance fraud detection method based on the medicine purchasing record utilizes the bipartite graph and the single-mode projection relation derived by the bipartite graph, and adopts the association chain algorithm to extract fraud mode transfer and hidden medicine purchasing targets, thereby having the advantages of rapidness and accuracy in terms of business logic and being convenient to apply; meanwhile, the extraction of the fraud chain can help the supervision organization establish supervision rules for avoiding fraudulent activities and prevent malicious fraudulent activities of fraudulent patients. The medical insurance fraud detection method analyzes the medicine purchasing records in the medical insurance data, utilizes a graph theory algorithm to construct effective derivative characteristics, has higher accuracy in fraud judgment, and can effectively detect changeable medical insurance fraud modes.

Drawings

FIG. 1 is a flow chart of a medical insurance fraud detection method based on a drug purchase record of the present invention;

FIG. 2 is a flow chart of a medical insurance fraud detection method based on a drug purchase record according to embodiment 2;

FIG. 3 is the medicine-patient bipartite graph of normal patients in embodiment 2;

FIG. 4 is the medicine-patient bipartite graph of fraudulent patients in embodiment 2.

Description of the embodiments

The following embodiments are used for further illustrating the technical scheme of the present invention, but not limited thereto, and all modifications and equivalents of the technical scheme of the present invention are included in the scope of the present invention without departing from the spirit and scope of the technical scheme of the present invention.

Example 1

This embodiment is described with reference to FIG. 1. The medical insurance fraud detection method based on medicine purchase records according to this embodiment comprises the following steps:

S1, constructing a fraudster classification model through a machine learning algorithm. Patient information is integrated and feature vectors of the patient information are extracted. The feature vector of each patient is integrated with that patient's fraud label Y. The information value (IV) of each feature is then calculated with the supervised feature screening algorithm smbinning, the features whose IV is greater than 0.05 are extracted, and these features are put into the machine learning algorithm to obtain an effective fraudster classification model.
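The IV screening in step S1 can be illustrated with a minimal Python sketch. It is not the smbinning implementation (an R package that also chooses the bins); it assumes each feature has already been binned, uses the common definition of information value as a sum over bins of (share of normal minus share of fraud) times the log of their ratio, and the feature names, data and the smoothing constant `eps` are hypothetical.

```python
import math
from collections import Counter

def information_value(bins, labels, eps=0.5):
    """Information value of one pre-binned feature against a 0/1 fraud label."""
    normal = Counter(b for b, y in zip(bins, labels) if y == 0)
    fraud = Counter(b for b, y in zip(bins, labels) if y == 1)
    levels = sorted(set(bins))
    n_normal = sum(normal.values()) + eps * len(levels)
    n_fraud = sum(fraud.values()) + eps * len(levels)
    iv = 0.0
    for level in levels:
        # eps is an assumed smoothing constant that keeps the logarithm finite
        # when a bin contains no normal or no fraudulent patients.
        p_normal = (normal[level] + eps) / n_normal
        p_fraud = (fraud[level] + eps) / n_fraud
        iv += (p_normal - p_fraud) * math.log(p_normal / p_fraud)
    return iv

# Hypothetical data: fraud label Y and two already-binned features per patient.
Y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "drug_types_bin": ["low", "low", "low", "low", "high", "low",
                       "high", "high", "high", "low"],
    "region_bin": ["x", "y", "x", "y", "x", "y", "x", "y", "x", "y"],
}

IV_THRESHOLD = 0.05  # threshold stated in the embodiment
iv_scores = {name: information_value(b, Y) for name, b in features.items()}
selected = [name for name, iv in iv_scores.items() if iv > IV_THRESHOLD]
print(selected)
```

In this toy data the second feature is distributed identically among normal and fraudulent patients, so its IV is zero and only the first feature passes the 0.05 threshold.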

S2, inputting patient information and medicine purchasing information into the model, and establishing a patient-medicine bipartite graph, wherein the patient information comprises a normal patient and a fraudulent patient; setting the medicine purchasing information of a normal patient and the medicine purchasing information of a fraudulent patient with fraudulent activity as patient nodes and medicine nodes, and respectively constructing a medicine-patient undirected bipartite graph of the fraudulent patient and a medicine-patient undirected bipartite graph of the normal patient; a first round of derivative feature extraction is performed on the patient-medicine bipartite graph, wherein the features extracted in the first round comprise the total amount of the kinds of medicines and the total amount of the medicines used, and a medicine single-mode projection relation is established according to the derivative features.

Patients with fraudulent activity and patients with normal behavior are separated in the training data, and each patient corresponds to a plurality of medication records. The medication records of the same patient are grouped together. Based on graph theory, patients and medicines are set as two different kinds of nodes, and by linking the medication record node (IDj) of each patient with the medicine nodes (Dk), a medicine-patient undirected bipartite graph of fraudulent patients and a medicine-patient undirected bipartite graph of normal patients are constructed. For these two bipartite graphs, the first round of features is extracted from the medicine purchasing behavior of each patient:

(1) the total number of medicine types used;

(2) the total quantity of medicines used.

A medicine single-mode projection relation is derived from the medicine-patient bipartite graph of fraudulent patients and from that of normal patients, respectively. Using the association chain algorithm, the medicine purchasing behavior of fraudulent patients is expressed as abnormal chains, and the purchasing behavior of normal patients is expressed as normal chains.

The single-mode projection algorithm of a bipartite graph is mainly used to study the relationships among nodes of the same type. It exploits the fact that nodes of one type in a bipartite graph are connected through nodes of the other type: nodes of the same type are associated and clustered directly, which generates a continuously growing network graph containing a single type of node. The single-mode projection relation depends on the bipartite graph. First, the medicine nodes (Dk) under study are added to a new network one by one. Taking each added medicine node as a starting point, the other medicine nodes connected to it through a patient are searched for in the bipartite graph in order of edge weight from low to high, and the medicine nodes found are connected to the starting node. This process is repeated until all medicine nodes are connected, forming a new single-mode projection relation. The two projection relations are then processed with the association chain algorithm to obtain the abnormal chains and the normal chains corresponding to patient behavior.

A second round of derived feature extraction is performed on the abnormal chains; the features extracted in the second round comprise the type abnormality rate, the quantity abnormality rate and the abnormal medicine usage rate in the abnormal chain. The second round of patient behavior features is derived from the abnormal chains corresponding to fraudulent patient behavior, and for each abnormal chain the following three features are extracted for each patient: (1) the ratio of the number of medicine types used by the patient that also appear in the abnormal chain to the total number of medicine types used by the patient; (2) the ratio of the patient's total usage of medicines in the abnormal chain to the patient's total medicine usage; (3) the ratio of the patient's total usage of medicines in the abnormal chain to the number of abnormal-chain medicine types used by the patient.

S4, dividing the medicine chains of step S3 into normal chains and abnormal chains by using an association chain algorithm. The association chain algorithm is specifically as follows: the entries of the matrix corresponding to the bipartite graph are sorted by edge weight; the medicine combination with the highest edge weight is taken as the initial medicine combination in the abnormal chain; the medicines connected to the medicines of this combination by the next-highest edge weight are then searched for; the search continues in this order and the medicines found are connected in series into a medicine chain. The input is the edge-weight adjacency matrix and the output is one chain.

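S5, calculating the similarity of the normal chains and the abnormal chains through the cosine similarity formula; the cosine similarity formula is

$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$

where a and b are the vectors of the normal chain and the abnormal chain, respectively.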

S6, removing the comparison combinations with a similarity of 0, and retaining the comparison combinations of an abnormal chain and a normal chain whose similarity is not 0. The cosine similarity between the abnormal chain corresponding to each fraudulent patient behavior and the normal chain corresponding to each normal patient behavior is calculated, and the comparison combinations whose similarity is 0 are removed. The other comparison combinations are retained; in each of them, the medicines on the abnormal chain that also appear on the normal chain are removed, and the remaining medicines on the abnormal chain are combined into a fraud chain.

S8, synthesizing the remaining medicines into a fraud chain, and performing a third round of derived feature extraction on the synthesized fraud chain, wherein the features extracted in the third round comprise the type abnormality rate, the quantity abnormality rate and the abnormal medicine usage rate in the abnormal chain; outputting the fraud chain.

The third round of patient behavior features is derived from the fraud chains, and for each fraud chain the following three features are extracted for each patient: (1) the ratio of the number of medicine types used by the patient that also appear in the fraud chain to the total number of medicine types used by the patient; (2) the ratio of the patient's total usage of medicines in the fraud chain to the patient's total medicine usage; (3) the ratio of the patient's total usage of medicines in the fraud chain to the number of fraud-chain medicine types used by the patient.

Example 2

This embodiment is described with reference to FIG. 2, FIG. 3, FIG. 4 and embodiment 1. Medical insurance data published by a certain city is taken as a case, and the medical insurance fraud detection method is used to establish the corresponding medicine-patient bipartite graphs for fraudulent patients and for normal patients, respectively.

Patients with fraudulent activity and patients with normal behavior are separated in the training data, and each patient corresponds to a plurality of medication records; in total there are 1368148 medication records of 15000 persons. The medication records of the same patient are grouped together. Based on graph theory, patients and medicines are set as two different kinds of nodes, and by linking the medication record node (IDj) of each patient with the medicine nodes (Dk), a medicine-patient undirected bipartite graph of fraudulent patients and a medicine-patient undirected bipartite graph of normal patients are constructed. In the undirected bipartite graph constructed from medication records, patient nodes can only be connected to each other through medicine nodes and cannot be connected directly; likewise, medicine nodes can only be connected to each other through patient nodes and cannot be connected directly. Only a simple purchasing relationship exists between a medicine node and a patient node, so the undirected bipartite graph is sufficient to represent the medicine purchasing behavior. The edge weight in the undirected bipartite graph is the number of times the patient purchased the corresponding medicine.

For the constructed medicine-patient undirected bipartite graph of fraudulent patients and that of normal patients, the first round of features is extracted from the medicine purchasing behavior of each patient:

(1) the total number of medicine types used, i.e. the number of medicine nodes to which patient node j is connected;

(2) the total quantity of medicines used, i.e. the sum of the edge weights of the medicine nodes to which patient node j is connected.
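The bipartite graph and the two first-round features can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (patient id, medicine id, number of purchases), the sample records and the function names are hypothetical; the edge weight follows the text above (the number of purchases of a medicine by a patient).

```python
from collections import defaultdict

# Hypothetical medication records: (patient id, medicine id, number of purchases).
records = [
    ("P1", "a", 2), ("P1", "b", 1), ("P1", "a", 1),
    ("P2", "b", 3), ("P2", "c", 1), ("P2", "d", 1),
]

def build_bipartite(records):
    """Return {patient: {medicine: edge weight}} for an undirected bipartite graph."""
    graph = defaultdict(lambda: defaultdict(int))
    for patient, medicine, purchases in records:
        # Repeated records of the same patient and medicine add to one edge.
        graph[patient][medicine] += purchases
    return {patient: dict(edges) for patient, edges in graph.items()}

def first_round_features(graph):
    """Per patient: number of connected medicine nodes and sum of their edge weights."""
    return {
        patient: {"n_types": len(edges), "total_usage": sum(edges.values())}
        for patient, edges in graph.items()
    }

graph = build_bipartite(records)
features = first_round_features(graph)
print(features)
```

Building one such graph from the records of fraudulent patients and another from the records of normal patients gives the two bipartite graphs described in this embodiment.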

The single-mode projection algorithm of a bipartite graph is mainly used to study the relationships among nodes of the same type. It exploits the fact that nodes of one type in a bipartite graph are connected through nodes of the other type: nodes of the same type are associated and clustered directly, which generates a continuously growing network graph containing a single type of node. The single-mode projection relation depends on the bipartite graph. First, the medicine nodes (Dk) under study are added to a new network one by one. Taking each added medicine node as a starting point, the other medicine nodes connected to it through a patient are searched for in the bipartite graph in order of edge weight from low to high, and the medicine nodes found are connected to the starting node. This process is repeated until all medicine nodes are connected, forming a new single-mode projection relation. Finally, the newly formed single-mode projection relation is represented as a matrix, in which the edge weight of the edge between two connected nodes is the number of patients who use both medicines. The corresponding matrix has the form

$$W = \begin{pmatrix} w_{11} & \cdots & w_{1m} \\ \vdots & \ddots & \vdots \\ w_{m1} & \cdots & w_{mm} \end{pmatrix}$$

where m is the number of medicines and $w_{ik}$ is the number of patients who use medicine $D_i$ and medicine $D_k$ simultaneously.
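The matrix representation of the single-mode projection can be sketched as follows. The sketch assumes the bipartite graph is stored as a dictionary from patient to purchased medicines (a hypothetical layout) and counts, for every pair of medicines, the patients who purchased both, which is the edge weight described above. The sample graph is invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bipartite graph: {patient: {medicine: number of purchases}}.
graph = {
    "P1": {"a": 3, "b": 1},
    "P2": {"b": 3, "c": 1},
    "P3": {"a": 1, "b": 2, "c": 1},
}

def project_medicines(graph):
    """weights[(d1, d2)] = number of patients who purchased both d1 and d2."""
    weights = Counter()
    for edges in graph.values():
        for pair in combinations(sorted(edges), 2):
            weights[pair] += 1
    return weights

def to_matrix(weights, medicines):
    """Symmetric m x m edge-weight adjacency matrix with a zero diagonal."""
    index = {d: i for i, d in enumerate(medicines)}
    matrix = [[0] * len(medicines) for _ in medicines]
    for (d1, d2), w in weights.items():
        matrix[index[d1]][index[d2]] = w
        matrix[index[d2]][index[d1]] = w
    return matrix

weights = project_medicines(graph)
matrix = to_matrix(weights, ["a", "b", "c"])
print(matrix)
```

The same pair table, computed once for fraudulent patients and once for normal patients, is the kind of input the association chain algorithm works on.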

In this way, the medicine single-mode projection relations corresponding to the medicine-patient bipartite graph of fraudulent patients and to that of normal patients are obtained, and the two projection relations are processed with the association chain algorithm to obtain the abnormal chains and the normal chains corresponding to patient behavior.

Take the processing of the medicine-patient bipartite graph of fraudulent patients into abnormal chains as an example. First, the entries of the matrix corresponding to the bipartite graph are sorted by edge weight. The medicine combination with the highest edge weight is taken as the initial medicine combination of an abnormal chain, and the medicines connected to the medicines of this combination by the next-highest edge weight are searched for. If such medicines exist, they are attached to the two ends of the initial medicine combination according to the medicine they are connected to, and the search is repeated until no further medicine connected by a next-highest edge weight can be found for the medicines at the two ends of the chain. These steps are repeated until all medicines in the single-mode projection relation have been traversed, which finally yields the set of abnormal chains that carry the information of fraudulent patients and contain no repeated medicines.

The medicine single-mode projection relation of normal patients is processed in the same way, which finally yields the set of normal chains that carry the information of normal patients and contain no repeated medicines.

The second round of patient behavior features is derived from the abnormal chains corresponding to fraudulent patient behavior, and for each abnormal chain the following three features are extracted for each patient: (1) the ratio of the number of medicine types used by the patient that also appear in the abnormal chain to the total number of medicine types used by the patient; (2) the ratio of the patient's total usage of medicines in the abnormal chain to the patient's total medicine usage; (3) the ratio of the patient's total usage of medicines in the abnormal chain to the number of abnormal-chain medicine types used by the patient. With three features per abnormal chain, the number of derived features obtained for each patient is three times the number of abnormal chains.
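The three per-chain features can be sketched as below. This follows one reading of the three ratios (type rate, quantity rate, usage rate) described in this embodiment; the function name, the data layout and the sample purchases are hypothetical, and returning 0.0 when a patient shares no medicine with the chain is an assumption.

```python
def chain_features(purchases, chain):
    """Three derived features of one patient with respect to one chain.

    purchases: {medicine: number of purchases} for the patient.
    chain: list of medicines on the abnormal (or fraud) chain.
    """
    chain_set = set(chain)
    shared = {m: q for m, q in purchases.items() if m in chain_set}
    shared_usage = sum(shared.values())
    total_usage = sum(purchases.values())
    # (1) share of the patient's medicine types that appear on the chain
    type_rate = len(shared) / len(purchases) if purchases else 0.0
    # (2) share of the patient's total usage spent on chain medicines
    quantity_rate = shared_usage / total_usage if total_usage else 0.0
    # (3) usage of chain medicines per chain medicine type used by the patient
    usage_rate = shared_usage / len(shared) if shared else 0.0
    return type_rate, quantity_rate, usage_rate

# Hypothetical patient and chain.
patient = {"a": 2, "e": 3, "f": 1}
chain = ["b", "c", "e", "f"]
print(chain_features(patient, chain))
```

Calling the function once per chain and concatenating the results gives three features per chain for each patient, as stated above.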

The cosine similarity between the abnormal chain corresponding to each fraudulent patient behavior and the normal chain corresponding to each normal patient behavior is calculated, and the comparison combinations of an abnormal chain and a normal chain whose similarity is 0 are removed. The other comparison combinations are retained; in each of them, the medicines on the abnormal chain that also appear on the normal chain are removed, and the remaining medicines on the abnormal chain are combined into a fraud chain. This operation is carried out for every pair of an abnormal chain and a normal chain whose similarity is not 0, giving the corresponding fraud chains.

The third round of patient behavior features is derived from the fraud chains, and for each fraud chain the following three features are extracted for each patient: (1) the ratio of the number of medicine types used by the patient that also appear in the fraud chain to the total number of medicine types used by the patient; (2) the ratio of the patient's total usage of medicines in the fraud chain to the patient's total medicine usage; (3) the ratio of the patient's total usage of medicines in the fraud chain to the number of fraud-chain medicine types used by the patient. If the number of fraud chains is r, then 3r derived features are obtained for each patient.

The construction of the fraud detection model is accomplished using a machine learning algorithm.

In summary, the feature vector of each patient is integrated with that patient's fraud label Y. The information value (IV) of each feature is calculated with the supervised feature screening algorithm smbinning, and the features whose IV is greater than 0.05 are extracted and put into the machine learning model. The smbinning algorithm is a classification (binning) method in the R language that computes the information value for a data set in which the feature vectors are matched with the fraud labels. Ten-fold cross-validation is first performed on this data set. For the training data, the logistic regression algorithm creates a convex optimization objective in the classification coefficients, which is iteratively updated with the gradient descent method. ROC and AUC are used as the measures of model performance; comparing the results shown in Table 1 below gives the classification coefficient vector t with the relatively best performance.

Fold	1	2	3	4	5	6	7	8	9	10
Training AUC	0.86	0.86	0.82	0.85	0.85	0.86	0.86	0.86	0.86	0.85
Test AUC	0.84	0.79	0.8	0.78	0.82	0.8	0.78	0.78	0.81	0.86

TABLE 1

Based on the classification coefficient vector t, the fraud probability of a patient with feature vector x is

$$P(Y = 1 \mid x) = \frac{1}{1 + e^{-t^{\top} x}}$$

The fraud probability of each patient is calculated with this logistic function, which completes the judgment of whether the patient is a fraudulent patient. There are various ways to build a new credit model. To obtain a viable credit scoring model, this embodiment uses the attribute set of the new credit model as a subset of the feasible domain of attribute sets. The algorithm used by the new credit scoring model is then determined from the properties of the attribute set. Many kinds of algorithms can currently be applied to generate credit scoring models, for example logistic regression, random forests and GBDT. In this embodiment, the algorithm screening includes using a new algorithm obtained by algorithm fusion, and the algorithm is optimized according to the following strategy. Logistic regression: the probability that an event occurs is obtained by studying the relationship between that probability and a number of factors. When the probability is greater than 0.5, the event is considered to occur; when it is below 0.5, the event is considered not to occur.
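A minimal sketch of the logistic-regression step is given below: batch gradient descent on the logistic loss, followed by the 0.5 decision rule. It is only an illustration, not the embodiment's actual training setup; the learning rate, the number of steps, the toy data (an intercept column plus one centred feature) and the function names are assumptions, and the ten-fold cross-validation is omitted.

```python
import math

def fraud_probability(t, x):
    """Logistic function applied to the linear score t . x."""
    z = sum(ti * xi for ti, xi in zip(t, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=500):
    """Batch gradient descent on the (convex) mean logistic loss."""
    t = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(t)
        for xi, yi in zip(X, y):
            err = fraud_probability(t, xi) - yi
            for k, xk in enumerate(xi):
                grad[k] += err * xk / n
        t = [tk - lr * gk for tk, gk in zip(t, grad)]
    return t

def is_fraudulent(t, x, threshold=0.5):
    """Decision rule from the text: fraud if the probability exceeds 0.5."""
    return fraud_probability(t, x) > threshold

# Hypothetical training data: [intercept, centred derived feature], label Y.
X = [[1.0, -0.4], [1.0, -0.3], [1.0, 0.3], [1.0, 0.4]]
Y = [0, 0, 1, 1]
t = fit_logistic(X, Y)
print(is_fraudulent(t, [1.0, 0.4]), is_fraudulent(t, [1.0, -0.4]))
```

In practice the feature vector would hold the IV-screened first-, second- and third-round derived features rather than a single toy column.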

Association chain algorithm: the entries of the matrix corresponding to the bipartite graph are sorted by edge weight (the occurrence frequency of the medicine combination). Starting from the medicine combination with the highest edge weight as the initial medicine combination in the abnormal chain, the medicines connected to the medicines of this combination by the next-highest edge weight are searched for. The search continues in this order, and the medicines found are connected in series into a medicine chain. The input is the edge-weight adjacency matrix and the output is one chain. In the bipartite graphs of FIG. 3 and FIG. 4, patients Jia, Yi, Bing and Ding are normal patients, patients Wu, Ji, Geng, Xin, Ren and Gui are abnormal patients, and a, b, c, d, e and f are medicines.

Single mode projection relationship

Schematic edge-weight adjacency matrix:

Head (start of medicine link)	Tail (end of medicine link)	Weight (frequency of the medicine combination)
Medicine a	Medicine b	2
Medicine a	Medicine c	1
Medicine a	Medicine d	0
Medicine b	Medicine c	4
Medicine b	Medicine d	1
Medicine c	Medicine d	1

The entries of the edge-weight matrix of the bipartite graph are sorted by edge weight. The occurrence frequencies of the medicine combinations are: a-b 2 times, a-c 2 times, a-d 0 times, b-c 4 times, b-d 1 time and c-d 1 time. Starting from the medicine combination with the highest edge weight as the initial medicine combination, the medicines connected to the medicines of this combination by the next-highest edge weight are searched for: a-b occurs 2 times and b-d occurs 1 time, so a-b is taken. The search continues in this order, and a-b-c-d is output, which is a normal chain. That is, the association chain algorithm outputs a normal chain arranged by edge weight.

Schematic edge-weight adjacency matrix:

Head (start of medicine link)	Tail (end of medicine link)	Weight (frequency of the medicine combination)
Medicine b	Medicine c	4
Medicine b	Medicine e	1
Medicine b	Medicine f	1
Medicine c	Medicine e	3
Medicine c	Medicine f	2
Medicine e	Medicine f	2

The entries of the edge-weight matrix of the bipartite graph are sorted by edge weight. The occurrence frequencies of the medicine combinations are: b-c 4 times, b-e 1 time, b-f 1 time, c-e 3 times, c-f 2 times and e-f 2 times. Starting from the medicine combination with the highest edge weight, b-c, as the initial medicine combination in the abnormal chain, the medicines connected to the medicines of this combination by the next-highest edge weight are searched for: c-f occurs 2 times and c-e occurs 3 times, so c-e is taken. The search continues in this order, the medicines are connected in series, and b-c-e-f is output, which is an abnormal chain. That is, the association chain algorithm outputs an abnormal chain arranged by edge weight.
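The two worked examples can be reproduced with a small greedy sketch. This is one possible reading of the association chain algorithm, not a definitive implementation: it starts from the heaviest medicine pair and repeatedly attaches, at whichever end of the chain offers it, the unused medicine connected by the largest positive edge weight. The function name, the dictionary input format and the tie-breaking (first candidate found) are assumptions; the weights are copied from the two tables above.

```python
def association_chain(edge_weights):
    """Greedy chain from an edge-weight table {(medicine1, medicine2): weight}."""
    # Index positive-weight edges in both directions.
    adjacency = {}
    for (u, v), w in edge_weights.items():
        if w > 0:
            adjacency.setdefault(u, {})[v] = w
            adjacency.setdefault(v, {})[u] = w
    # The medicine combination with the highest edge weight starts the chain.
    (u, v), _ = max(edge_weights.items(), key=lambda item: item[1])
    chain = [u, v]
    while True:
        used = set(chain)
        candidates = []
        for end, at_head in ((chain[0], True), (chain[-1], False)):
            for medicine, w in adjacency.get(end, {}).items():
                if medicine not in used:
                    candidates.append((w, at_head, medicine))
        if not candidates:
            return chain
        # Attach the medicine with the next-highest edge weight to its end.
        _, at_head, medicine = max(candidates, key=lambda c: c[0])
        if at_head:
            chain.insert(0, medicine)
        else:
            chain.append(medicine)

normal_weights = {("a", "b"): 2, ("a", "c"): 1, ("a", "d"): 0,
                  ("b", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
abnormal_weights = {("b", "c"): 4, ("b", "e"): 1, ("b", "f"): 1,
                    ("c", "e"): 3, ("c", "f"): 2, ("e", "f"): 2}

normal_chain = association_chain(normal_weights)
abnormal_chain = association_chain(abnormal_weights)
print("-".join(normal_chain), "-".join(abnormal_chain))
```

The sketch returns a single chain per projection; producing the full set of chains described in this embodiment would require repeating the search over the medicines not yet traversed.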

Because the cosine similarity of the vectorized normal chain and abnormal chain is not 0, the medicines b and c, which the abnormal chain shares with the normal chain, are removed from the abnormal chain, and the fraud chain e-f is formed.

Definition of cosine similarity: cosine similarity evaluates the similarity of two vectors by calculating the cosine of the angle between them. The vectors are drawn in a vector space according to their coordinate values, and the cosine of the angle between the two vectors in that space is used as the measure of the difference between two individuals. The closer the cosine is to 1, the closer the angle is to 0 degrees, i.e. the more similar the two vectors are; the closer the cosine is to 0, the lower the similarity of the two vectors.

The formula:

$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$$

where the vector a is $[x_1, y_1]$ and the vector b is $[x_2, y_2]$; a is the vectorization of the normal chain and b is the vectorization of the abnormal chain. Comparison combinations whose similarity is 0 are thereby removed.
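The similarity filter and the fraud chain extraction can be sketched together. The text does not spell out how a chain is vectorized, so this sketch assumes a binary indicator vector over the medicines of both chains (1 if the medicine is on the chain); the function names are hypothetical, and the example chains are the a-b-c-d and b-c-e-f chains of this embodiment.

```python
import math

def vectorize(chain, vocabulary):
    """Assumed vectorization: 1 if the medicine is on the chain, else 0."""
    members = set(chain)
    return [1 if medicine in members else 0 for medicine in vocabulary]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fraud_chain(abnormal, normal):
    """Return the fraud chain of one comparison combination, or None if discarded."""
    vocabulary = sorted(set(abnormal) | set(normal))
    similarity = cosine_similarity(vectorize(abnormal, vocabulary),
                                   vectorize(normal, vocabulary))
    if similarity == 0:
        return None  # combinations with similarity 0 are removed
    # Drop the medicines shared with the normal chain, keep the rest in order.
    return [medicine for medicine in abnormal if medicine not in normal]

normal = ["a", "b", "c", "d"]
abnormal = ["b", "c", "e", "f"]
vocabulary = sorted(set(normal) | set(abnormal))
similarity = cosine_similarity(vectorize(normal, vocabulary),
                               vectorize(abnormal, vocabulary))
print(similarity, fraud_chain(abnormal, normal))
```

Under the indicator-vector assumption, two chains have similarity 0 exactly when they share no medicine, so the filter keeps only abnormal chains that overlap with a normal chain.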

Claims

1. The medical insurance fraud detection method based on the medicine purchase record is characterized by comprising the following steps of:

S8, synthesizing the rest medicines into a fraud chain, and outputting the fraud chain;

in step S2, setting the medicine purchasing information of a normal patient and the medicine purchasing information of a fraudulent patient with fraudulent activity as a patient node and a medicine node, and respectively constructing a medicine-patient undirected bipartite graph of the fraudulent patient and a medicine-patient undirected bipartite graph of the normal patient; performing first-round derivative feature extraction on the patient-medicine bipartite graph, wherein the features extracted in the first round comprise the total amount of the types of medicines and the total amount of the medicines used, and establishing a medicine single-mode projection relation according to the derivative features;

in step S3, performing second-round derivative feature extraction on the abnormal chain, wherein the features extracted in the second round comprise a species abnormality rate, a quantity abnormality rate and an abnormal medicine use rate in the abnormal chain;

in step S4, the association chain algorithm specifically includes: ordering the entries of the matrix corresponding to the bipartite graph according to the edge weights, starting from the medicine combination corresponding to the highest edge weight as the initial medicine combination in the abnormal chain, further searching for the medicines connected to the medicines of the combination by the next-highest edge weight, searching sequentially, connecting the medicines together in series into a medicine chain, inputting the edge-weight adjacency matrix, and outputting one chain;

in step S8, a third round of derived feature extraction is performed on the synthesized fraudulent chain, the features of the third round of extraction including a species abnormality rate, a quantity abnormality rate, and an abnormal drug usage rate in the abnormal chain.

2. The medical insurance fraud detection method based on the medicine purchasing record according to claim 1, wherein in step S1, patient information is integrated, a feature vector of the patient information is extracted by adopting a machine learning algorithm, the information value IV of each feature is calculated on the feature vector by using the supervised screening algorithm smbinning, and features with an information value IV larger than a set threshold are extracted and put into the machine learning algorithm to obtain a fraudster classification model.

3. The medical insurance fraud detection method based on the medicine purchase record according to claim 2, wherein in step S5, the cosine similarity formula is

$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$

where a and b are the vectors of the normal chain and the abnormal chain, respectively.