CN111105317B - Medical insurance fraud detection method based on medicine purchasing record - Google Patents

Medical insurance fraud detection method based on medicine purchasing record

Info

Publication number
CN111105317B
Authority
CN
China
Prior art keywords
medicine
chain
patient
abnormal
normal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911383476.7A
Other languages
Chinese (zh)
Other versions
CN111105317A (en)
Inventor
孙佰清
鲍鑫
王天辰
高稳
王思霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN201911383476.7A
Publication of CN111105317A
Application granted
Publication of CN111105317B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud

Abstract

The invention provides a medical insurance fraud detection method based on medicine purchase records and belongs to the field of medicine fraud detection methods. The method can accurately extract medical insurance fraud information, is convenient to operate and has high applicability. In the invention, a fraudster classification model is constructed through a machine learning algorithm; patient information and medicine purchasing information are input into the model, and a patient-medicine bipartite graph is established; a medicine single-mode projection graph is established from the patient-medicine bipartite graph to form medicine chains; the medicine chains are divided into normal chains and abnormal chains by an association chain algorithm; the similarity between the normal chains and the abnormal chains is calculated with the cosine similarity formula; the comparison combinations of an abnormal chain and a normal chain whose similarity is not 0 are retained; within each retained combination, the medicines common to the abnormal chain and the normal chain are removed and the other medicines are retained; the remaining medicines are synthesized into a fraud chain, and the fraud chain is output. The invention is mainly used for detecting the fraudulent behavior of fraudulent patients.

Description

Medical insurance fraud detection method based on medicine purchasing record
Technical Field
The invention belongs to the field of medicine fraud detection methods, and particularly relates to a medical insurance fraud detection method.
Background
Fraud is not common, but medical insurance fraud events often correspond to abnormal medicine purchase records, and medical insurance fraud cases often have the following characteristics:
(1) Rarity: fraud events are rare but costly, so the numbers of normal patients and fraudsters are extremely unbalanced.
(2) Knowledge sharing: fraudsters are often influenced by their allies and contacts, and in turn influence others. Fraud knowledge is thus transferred and reproduced in medicine purchasing behavior patterns.
(3) Behavior imitation: fraudulent patients also imitate the medicine purchasing behavior of normal participants to mask their fraudulent goals and make their purchasing behavior appear "normal".
Therefore, a medical insurance fraud detection method based on medicine purchasing records is needed that can accurately extract medical insurance fraud information, is convenient to operate and has high applicability.
Disclosure of Invention
To address the problems that existing medical insurance fraud modes are varied, that fraud information cannot be accurately determined and that manual extraction of fraud information is cumbersome, the invention provides a medical insurance fraud detection method based on medicine purchasing records that can accurately extract medical insurance fraud information, is convenient to operate and has high applicability.
The technical scheme of the medical insurance fraud detection method based on the medicine purchasing record is as follows:
The medical insurance fraud detection method based on the medicine purchase record comprises the following steps:
S1, constructing a fraudster classification model through a machine learning algorithm;
S2, inputting patient information and medicine purchasing information into the model, and establishing a patient-medicine bipartite graph, wherein the patient information covers normal patients and fraudulent patients;
S3, establishing a medicine single-mode projection relation from the patient-medicine bipartite graph to form medicine chains;
S4, dividing the medicine chains of step S3 into normal chains and abnormal chains by using an association chain algorithm;
S5, calculating the similarity between the normal chains and the abnormal chains with the cosine similarity formula;
S6, removing the normal chains whose similarity is 0, and retaining the comparison combinations of an abnormal chain and a normal chain whose similarity is not 0;
S7, removing the medicines common to the abnormal chain and the normal chain in each combination, and retaining the other medicines;
S8, synthesizing the remaining medicines into a fraud chain, and outputting the fraud chain.
Further: in step S1, the patient information is integrated, feature vectors of the patient information are extracted with a machine learning algorithm, the information value (IV) of each feature is calculated with a supervised feature screening algorithm, the features whose IV is greater than a threshold (0.05 in the embodiments) are extracted, and the extracted features are input into the machine learning algorithm to obtain the fraudster classification model.
Further: in step S2, the patients and medicines in the medicine purchasing information of normal patients and of fraudulent patients with fraudulent activity are set as patient nodes and medicine nodes, and a medicine-patient undirected bipartite graph is constructed for the fraudulent patients and for the normal patients respectively; a first round of derived feature extraction is performed on the patient-medicine bipartite graph, the first-round features comprising the total number of kinds of medicines used and the total amount of medicines used, and a medicine single-mode projection relation is established according to the derived features.
Further: in step S3, a second round of derived feature extraction is performed on the abnormal chain, the second-round features comprising the category abnormality rate, the quantity abnormality rate and the abnormal medicine usage rate in the abnormal chain.
Further: in step S4, the association chain algorithm is specifically as follows: the matrices corresponding to the two bipartite graphs are sorted by edge weight; the medicine combination with the highest edge weight is taken as the initial medicine combination in the abnormal chain; the medicines connected to the medicines in that combination by the next-highest edge weight are then searched; the search proceeds in sequence and the medicines found are connected in series into a medicine chain. The input is the edge-weight adjacency matrix and the output is one chain.
Further: in step S5, the cosine similarity formula is
Figure SMS_1
where a, b and c each denote a normal chain or an abnormal chain.
Further: in step S8, a third round of derived feature extraction is performed on the synthesized fraud chain, the third-round features comprising the category abnormality rate, the quantity abnormality rate and the abnormal medicine usage rate in the abnormal chain.
The beneficial effects of the medical insurance fraud detection method based on the medicine purchase record are as follows:
The method uses the bipartite graph and the single-mode projection relation derived from it, and adopts the association chain algorithm to extract the transfer of fraud patterns and hidden medicine purchasing targets; it is fast and accurate in terms of business logic and convenient to apply. In addition, the extracted fraud chains can help supervisory organizations establish supervision rules against fraudulent activities and prevent malicious fraud by fraudulent patients. The method analyzes the medicine purchasing records in medical insurance data and uses graph theory algorithms to construct effective derived features, so that it judges fraud with higher accuracy and can effectively detect changeable medical insurance fraud modes.
Drawings
FIG. 1 is a flow chart of the medical insurance fraud detection method based on a medicine purchase record of the present invention;
FIG. 2 is a flow chart of the medical insurance fraud detection method based on a medicine purchase record according to embodiment 2;
FIG. 3 is the patient-medicine bipartite graph of the normal patients in embodiment 2;
FIG. 4 is the patient-medicine bipartite graph of the fraudulent patients in embodiment 2.
Description of the embodiments
The following embodiments are used for further illustrating the technical scheme of the present invention, but not limited thereto, and all modifications and equivalents of the technical scheme of the present invention are included in the scope of the present invention without departing from the spirit and scope of the technical scheme of the present invention.
Example 1
This embodiment is described with reference to FIG. 1. The medical insurance fraud detection method based on a medicine purchase record according to this embodiment comprises the following steps:
S1, constructing a fraudster classification model through a machine learning algorithm: the patient information is integrated, feature vectors of the patient information are extracted with a machine learning algorithm, the information value (IV) of each feature is calculated with a supervised feature screening algorithm, the features whose IV exceeds the threshold are extracted, and the extracted features are put into the machine learning algorithm to obtain the fraudster classification model. The fraud detection model is thus constructed with a machine learning algorithm. The feature vector of each patient is integrated with the fraud mark Y of that patient as
Figure SMS_2
The information value (IV) of each feature is then calculated with the supervised feature screening algorithm smbinning, and the features whose IV is greater than 0.05 are extracted and put into the machine learning algorithm to obtain an effective classification model for fraudsters.
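The IV screening in step S1 can be illustrated with a minimal sketch. It assumes that each feature has already been discretized into bins (the binning that smbinning performs in R is not reproduced here) and uses the standard weight-of-evidence definition of IV; the function names, the smoothing constant `eps` and the toy data are hypothetical and do not come from the patent.

```python
import math

def information_value(bins, labels, eps=0.5):
    """IV of one pre-binned feature against a binary fraud mark (1 = fraud)."""
    counts = {}
    for b, y in zip(bins, labels):
        counts.setdefault(b, [0, 0])[y] += 1  # [normal count, fraud count]
    total_normal = sum(c[0] for c in counts.values())
    total_fraud = sum(c[1] for c in counts.values())
    k = len(counts)
    iv = 0.0
    for normal, fraud in counts.values():
        # eps is an assumed smoothing term so empty cells do not break the log
        p_normal = (normal + eps) / (total_normal + eps * k)
        p_fraud = (fraud + eps) / (total_fraud + eps * k)
        iv += (p_fraud - p_normal) * math.log(p_fraud / p_normal)
    return iv

def select_features(binned_features, labels, threshold=0.05):
    """Keep the features whose IV exceeds the threshold (0.05 in the text)."""
    return [name for name, bins in binned_features.items()
            if information_value(bins, labels) > threshold]

labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "informative": ["lo", "lo", "lo", "lo", "hi", "hi", "hi", "hi"],
    "noise": ["x", "y", "x", "y", "x", "y", "x", "y"],
}
selected = select_features(features, labels)
```

A feature whose bins are distributed identically among normal and fraudulent patients gets an IV of 0 and is dropped, while a feature that separates the two groups is kept.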
S2, inputting patient information and medicine purchasing information into the model, and establishing a patient-medicine bipartite graph, wherein the patient information covers normal patients and fraudulent patients. The patients and medicines in the medicine purchasing information of normal patients and of fraudulent patients with fraudulent activity are set as patient nodes and medicine nodes, and a medicine-patient undirected bipartite graph is constructed for the fraudulent patients and for the normal patients respectively; a first round of derived feature extraction is performed on the patient-medicine bipartite graph, the first-round features comprising the total number of kinds of medicines used and the total amount of medicines used, and a medicine single-mode projection relation is established according to the derived features.
Patients with fraudulent activity and patients with normal behavior are separated in the training data; each patient corresponds to a plurality of medication records. The medication records of the same patient are grouped together, the patient and the medicine are set as two different kinds of nodes based on graph theory, and the patient node (IDj) of each medication record is linked with the medicine node (Dk), so that a medicine-patient undirected bipartite graph is constructed for the fraudulent patients and for the normal patients respectively. For both graphs, the first-round features are extracted for the medicine purchasing behavior of each patient: (1) [Figure SMS_3]: the total number of kinds of medicines used; (2) [Figure SMS_4]: the total amount of medicines used.
The medicine single-mode projection relations corresponding to the medicine-patient bipartite graph of the fraudulent patients and to that of the normal patients are derived respectively, and the association chain algorithm is used to represent the medicine purchasing behavior of fraudulent patients as abnormal chains and that of normal patients as normal chains.
The single-mode projection algorithm of a bipartite graph is mainly used to study the relations among nodes of the same type: using the property that nodes of one type in the bipartite graph are connected through nodes of the other type, nodes of the same type are directly associated and clustered, generating a continuously growing network graph that contains a single type of node. The single-mode projection relation depends on the bipartite graph. First, the medicine nodes (Dk) under study are added to a new network one by one; taking each added medicine node as the starting point, the other medicine nodes connected to it through a patient are searched in the bipartite graph in order of edge weight from low to high, and the medicine nodes found are connected to the starting node. This process is repeated until all medicine nodes are connected, forming a new single-mode projection relation. The two projection relations are then processed with the association chain algorithm to obtain the abnormal chains and normal chains corresponding to patient behavior.
A second round of derived feature extraction is performed on the chains, the second-round features comprising the category abnormality rate, the quantity abnormality rate and the abnormal medicine usage rate in the abnormal chain. The second-round patient behavior features are derived from the abnormal chains corresponding to fraudulent patient behavior; for each abnormal chain, the following three features are extracted for each patient: (1) the ratio of the number of kinds of medicines used by the patient that also appear in the abnormal chain to the total number of kinds of medicines used by the patient; (2) the ratio of the patient's total amount of medicines in the abnormal chain to the patient's total amount of medicines; (3) the ratio of the patient's total amount of medicines in the abnormal chain to the number of kinds of abnormal-chain medicines used by the patient.
S4, dividing the medicine chains of step S3 into normal chains and abnormal chains by using the association chain algorithm. The association chain algorithm is specifically as follows: the matrices corresponding to the two bipartite graphs are sorted by edge weight; the medicine combination with the highest edge weight is taken as the initial medicine combination in the abnormal chain; the medicines connected to the medicines in that combination by the next-highest edge weight are then searched; the search proceeds in sequence and the medicines found are connected in series into a medicine chain. The input is the edge-weight adjacency matrix and the output is one chain.
S5, calculating the similarity between the normal chains and the abnormal chains with the cosine similarity formula; the cosine similarity formula is
Figure SMS_5
where a, b and c each denote a normal chain or an abnormal chain.
S6, removing the normal chains whose similarity is 0, and retaining the comparison combinations of an abnormal chain and a normal chain whose similarity is not 0. The cosine similarity between the abnormal chain corresponding to each fraudulent patient behavior and the normal chain corresponding to each normal patient behavior is calculated, and the comparison combinations of an abnormal chain and a normal chain whose similarity is 0 are removed. The other comparison combinations are retained; within each combination, the medicines on the abnormal chain that also appear on the normal chain are removed, and the remaining medicines on the abnormal chain are combined into a fraud chain.
S7, removing the medicines common to the abnormal chain and the normal chain in each combination, and retaining the other medicines;
S8, synthesizing the remaining medicines into a fraud chain, and performing a third round of derived feature extraction on the synthesized fraud chain, the third-round features comprising the category abnormality rate, the quantity abnormality rate and the abnormal medicine usage rate in the abnormal chain; outputting the fraud chain.
The third-round patient behavior features are derived from the fraud chains; for each fraud chain, the following three features are extracted for each patient: (1) the ratio of the number of kinds of medicines used by the patient that also appear in the fraud chain to the total number of kinds of medicines used by the patient; (2) the ratio of the patient's total amount of medicines in the fraud chain to the patient's total amount of medicines; (3) the ratio of the patient's total amount of medicines in the fraud chain to the number of kinds of fraud-chain medicines used by the patient.
Example 2
This embodiment is described with reference to FIG. 2, FIG. 3, FIG. 4 and embodiment 1. Medical insurance fraud detection data published by a certain city is adopted as the case, and the medical insurance fraud detection method is used to establish the corresponding patient-medicine bipartite graphs for fraudulent patients and normal patients respectively.
Patients with fraudulent activity and patients with normal behavior are separated in the training data; each patient corresponds to a plurality of medication records, for a total of 1368148 medication records of 15000 persons. The medication records of the same patient are grouped together, the patient and the medicine are set as two different kinds of nodes based on graph theory, and the patient node (IDj) of each medication record is linked with the medicine node (Dk), so that a medicine-patient undirected bipartite graph is constructed for the fraudulent patients and for the normal patients respectively. In the undirected bipartite graph constructed from medication records, patient nodes can be connected to each other only through medicine nodes and cannot be connected directly; likewise, medicine nodes can be connected to each other only through patient nodes and cannot be connected directly. Because only a simple purchasing relationship exists between a medicine node and a patient node, an undirected bipartite graph suffices to represent the medicine purchasing behavior. The edge weight in the undirected bipartite graph is the number of purchases of the corresponding medicine by the patient.
For the constructed medicine-patient undirected bipartite graph of the fraudulent patients and that of the normal patients, the first-round features are extracted for the medicine purchasing behavior of each patient: (1) [Figure SMS_6]: the total number of kinds of medicines used, where [Figure SMS_7] represents the number of medicine nodes to which patient node j is connected; (2) [Figure SMS_8]: the total amount of medicines used, where [Figure SMS_9] represents the sum of the edge weights of the medicine nodes to which patient node j is connected.
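The bipartite graph and the two first-round features can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: a medication record is assumed to be a `(patient_id, medicine_id)` pair, the edge weight is the number of such records, and the names `build_bipartite`, `n_kinds` and `total_amount` are hypothetical.

```python
from collections import defaultdict

def build_bipartite(records):
    """Map each patient node to its medicine nodes with purchase-count edge weights."""
    graph = defaultdict(lambda: defaultdict(int))
    for patient, medicine in records:
        graph[patient][medicine] += 1  # one more purchase on this edge
    return graph

def first_round_features(graph):
    """Per patient: number of connected medicine nodes and sum of edge weights."""
    return {
        patient: {
            "n_kinds": len(medicines),
            "total_amount": sum(medicines.values()),
        }
        for patient, medicines in graph.items()
    }

# Toy medication records (hypothetical data)
records = [("P1", "a"), ("P1", "a"), ("P1", "b"), ("P2", "b")]
features = first_round_features(build_bipartite(records))
```

In practice one such graph would be built for the fraudulent patients and another for the normal patients, as the text describes.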
The medicine single-mode projection relations corresponding to the medicine-patient bipartite graph of the fraudulent patients and to that of the normal patients are derived respectively, and the association chain algorithm is used to represent the medicine purchasing behavior of fraudulent patients as abnormal chains and that of normal patients as normal chains.
The single-mode projection algorithm of a bipartite graph is mainly used to study the relations among nodes of the same type: using the property that nodes of one type in the bipartite graph are connected through nodes of the other type, nodes of the same type are directly associated and clustered, generating a continuously growing network graph that contains a single type of node. The single-mode projection relation depends on the bipartite graph. First, the medicine nodes (Dk) under study are added to a new network one by one; taking each added medicine node as the starting point, the other medicine nodes connected to it through a patient are searched in the bipartite graph in order of edge weight from low to high, and the medicine nodes found are connected to the starting node. This process is repeated until all medicine nodes are connected, forming a new single-mode projection relation. Finally, the newly formed single-mode projection relation is represented as a matrix in which the edge weight of the edge connecting two nodes is the number of patients who use both medicines; the matrix has the following form:
Figure SMS_10
where m is the number of medicines, and the number of patients who use medicine [Figure SMS_11] and medicine [Figure SMS_12] simultaneously is [Figure SMS_13].
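The projection matrix can be illustrated with a short sketch in which the edge weight between two medicines is the number of patients who use both, as stated above. The data layout (a dict from patient to the medicines used) and the function names are assumptions made for the example, and the zero diagonal is a choice of this sketch, since the original matrix image is not recoverable.

```python
from collections import Counter
from itertools import combinations

def project_medicines(patient_medicines):
    """Count, for every medicine pair, the patients who use both medicines."""
    weights = Counter()
    for medicines in patient_medicines.values():
        for u, v in combinations(sorted(set(medicines)), 2):
            weights[(u, v)] += 1
    return weights

def to_matrix(weights, medicines):
    """Symmetric m x m edge-weight adjacency matrix (diagonal left at 0)."""
    index = {d: i for i, d in enumerate(medicines)}
    m = len(medicines)
    matrix = [[0] * m for _ in range(m)]
    for (u, v), w in weights.items():
        matrix[index[u]][index[v]] = w
        matrix[index[v]][index[u]] = w
    return matrix

# Toy data: three patients and the medicines each one uses
patient_medicines = {"P1": ["a", "b", "c"], "P2": ["b", "c"], "P3": ["c"]}
weights = project_medicines(patient_medicines)
matrix = to_matrix(weights, ["a", "b", "c"])
```

Here medicines b and c are shared by two patients, so their edge weight is 2, while every pair involving a is shared by one patient only.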
In this way, the medicine single-mode projection relations corresponding to the medicine-patient bipartite graph of the fraudulent patients and to that of the normal patients are obtained, and the two projection relations are processed with the association chain algorithm to obtain the abnormal chains and normal chains corresponding to patient behavior.
Take as an example the processing of the medicine-patient bipartite graph of the fraudulent patients to obtain abnormal chains. First, the matrix corresponding to the bipartite graph is sorted by edge weight; the medicine combination with the highest edge weight is taken as the initial medicine combination of the abnormal chain, and the medicines connected to the medicines in that combination by the next-highest edge weight are then searched. If such medicines exist, they are attached to the two ends of the initial medicine combination according to their connections, and this continues until no further medicine connected by the next-highest edge weight can be found for the medicines at the two ends of the chain. These steps are repeated until all medicines in the single-mode projection relation have been traversed, finally yielding the abnormal chain combinations, which carry the information of the fraudulent patients and contain no repeated medicines.
The medicine single-mode projection relation of the normal patients is processed in the same way, finally yielding the normal chain combinations, which carry the information of the normal patients and contain no repeated medicines.
The second-round patient behavior features are derived from the abnormal chains corresponding to fraudulent patient behavior; for each abnormal chain, the following three features are extracted for each patient: (1) the ratio of the number of kinds of medicines used by the patient that also appear in the abnormal chain to the total number of kinds of medicines used by the patient; (2) the ratio of the patient's total amount of medicines in the abnormal chain to the patient's total amount of medicines; (3) the ratio of the patient's total amount of medicines in the abnormal chain to the number of kinds of abnormal-chain medicines used by the patient. If the number of abnormal chains obtained is [Figure SMS_14], then [Figure SMS_15] derived features can be obtained for each patient, labeled [Figure SMS_16].
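The three per-chain ratios can be sketched as below. The source wording of the features is ambiguous, so this follows one reading: (1) shared kinds over the patient's kinds, (2) the patient's amount on chain medicines over the patient's total amount, and (3) the patient's amount on chain medicines over the number of chain kinds the patient uses. The function name, the dictionary keys and the toy data are hypothetical.

```python
def chain_features(patient_medicines, chain):
    """Three ratios for one patient against one chain.

    patient_medicines: dict medicine -> purchase count for a single patient.
    chain: list of medicines forming an abnormal (or fraud) chain.
    """
    chain_set = set(chain)
    shared = [d for d in patient_medicines if d in chain_set]
    amount_in_chain = sum(patient_medicines[d] for d in shared)
    total_kinds = len(patient_medicines)
    total_amount = sum(patient_medicines.values())
    return {
        # (1) share of the patient's medicine kinds that lie on the chain
        "category_rate": len(shared) / total_kinds if total_kinds else 0.0,
        # (2) share of the patient's purchase amount spent on chain medicines
        "quantity_rate": amount_in_chain / total_amount if total_amount else 0.0,
        # (3) average amount per chain medicine kind the patient uses
        "usage_rate": amount_in_chain / len(shared) if shared else 0.0,
    }

# Toy patient: two purchases of b, one of c, one of an unrelated medicine x
result = chain_features({"b": 2, "c": 1, "x": 1}, ["b", "c", "e", "f"])
```

Applying the function to every chain yields three features per chain for each patient, which is consistent with the feature count growing with the number of chains.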
The cosine similarity between the abnormal chain corresponding to each fraudulent patient behavior and the normal chain corresponding to each normal patient behavior is calculated, and the comparison combinations of an abnormal chain and a normal chain whose similarity is 0 are removed. The other comparison combinations are retained; within each combination, the medicines on the abnormal chain that also appear on the normal chain are removed, and the remaining medicines on the abnormal chain are combined into a fraud chain. This operation is carried out on every pair of an abnormal chain and a normal chain whose similarity is not 0, giving the corresponding fraud chains.
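The comparison and subtraction described above can be sketched as follows. The patent's own similarity formula is an unrecovered image, so this sketch assumes that each chain is encoded as a binary indicator vector over the medicines, in which case the standard cosine similarity reduces to the number of shared medicines divided by the square root of the product of the chain lengths. Function names are hypothetical.

```python
import math

def cosine_similarity(chain_a, chain_b):
    """Cosine similarity of two chains seen as binary medicine-indicator vectors."""
    a, b = set(chain_a), set(chain_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def fraud_chains(abnormal_chains, normal_chains):
    """For every pair with non-zero similarity, keep the abnormal-only medicines."""
    result = []
    for abnormal in abnormal_chains:
        for normal in normal_chains:
            if cosine_similarity(abnormal, normal) == 0:
                continue  # unrelated pair: removed from the comparison
            normal_set = set(normal)
            rest = [d for d in abnormal if d not in normal_set]
            if rest:
                result.append(rest)
    return result

normal = ["a", "b", "c", "d"]
abnormal = ["b", "c", "e", "f"]
similarity = cosine_similarity(normal, abnormal)
chains = fraud_chains([abnormal], [normal, ["x", "y"]])
```

With the two example chains a-b-c-d and b-c-e-f, the shared medicines b and c give a similarity of 0.5, and removing them from the abnormal chain leaves e-f as the fraud chain.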
The third-round patient behavior features are derived from the fraud chains; for each fraud chain, the following three features are extracted for each patient: (1) the ratio of the number of kinds of medicines used by the patient that also appear in the fraud chain to the total number of kinds of medicines used by the patient; (2) the ratio of the patient's total amount of medicines in the fraud chain to the patient's total amount of medicines; (3) the ratio of the patient's total amount of medicines in the fraud chain to the number of kinds of fraud-chain medicines used by the patient. If the number of fraud chains is r, then [Figure SMS_17] derived features can be obtained for each patient, labeled [Figure SMS_18].
The construction of the fraud detection model is accomplished using a machine learning algorithm.
In summary, the feature vector of each patient is integrated with the fraud mark Y of that patient as
Figure SMS_19
The information value (IV) of each feature is calculated with a supervised feature screening algorithm, and the features whose IV is greater than 0.05 are extracted and put into the machine learning model. The smbinning algorithm is a binning method in the R language that calculates the information value of the features of a data set by matching the feature vectors with the fraud mark. First, ten-fold cross-validation is performed on the data set. For the training data, a logistic regression algorithm is used to create a convex optimization objective over the classification coefficients, and the objective is updated iteratively by gradient descent. ROC and AUC are used as the measures of model performance. Comparing the results shown in Table 1 below gives the classification coefficient vector t with the relatively best performance, where [Figure SMS_20].
Fold           1     2     3     4     5     6     7     8     9     10
Training AUC   0.86  0.86  0.82  0.85  0.85  0.86  0.86  0.86  0.86  0.85
Test AUC       0.84  0.79  0.80  0.78  0.82  0.80  0.78  0.78  0.81  0.86
TABLE 1
Based on the classification coefficient vector t, the fraud probability is
Figure SMS_21
which gives
Figure SMS_22
The logistic function is then used to calculate the fraud probability of each patient, completing the judgment of whether the patient is a fraudulent patient. There are various ways of creating a new credit scoring model. To obtain a viable credit scoring model, this embodiment uses the attribute set of the new credit scoring model as a subset of the feasible domain of attribute sets. The algorithm used by the new credit scoring model is then determined from the properties of the attribute set. At present, many kinds of algorithms can be applied to generate credit scoring models, for example logistic regression, random forests and GBDT. In this embodiment, algorithm screening includes using a new algorithm obtained by algorithm fusion, with algorithm optimization implemented according to the following strategy. Logistic regression: the probability of occurrence of an event is obtained by studying the relationship between that probability and a plurality of factors. When the probability is greater than 0.5, the event is considered to occur; when it is below 0.5, the event is considered not to occur.
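The decision rule described above can be sketched with the standard logistic function. The exact formula in the original figures is not recoverable, so this is an assumed form in which the probability is the logistic function of the inner product of the coefficient vector t and the patient's feature vector x, with any intercept folded into x as a constant feature. The function names and example numbers are hypothetical.

```python
import math

def fraud_probability(t, x):
    """Logistic function of the inner product t . x (numerically stable form)."""
    z = sum(ti * xi for ti, xi in zip(t, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for very negative z
    return e / (1.0 + e)

def is_fraudulent(t, x, threshold=0.5):
    """Apply the 0.5 cut-off stated in the text."""
    return fraud_probability(t, x) > threshold

t = [1.5, -0.5]                          # hypothetical fitted coefficients
suspect = is_fraudulent(t, [2.0, 1.0])   # z = 2.5
ordinary = is_fraudulent(t, [0.0, 3.0])  # z = -1.5
```

A positive score z maps to a probability above 0.5 and a negative score to one below it, so the threshold of 0.5 is equivalent to testing the sign of the inner product.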
Correlation chain algorithm: the corresponding matrixes of the two graphs are ordered according to the side weights (the occurrence frequency of the medicine combinations), and medicines connected with the times Gao Bianquan in the combined medicines are further searched by starting from the medicine combination corresponding to the highest side weight as the initial medicine combination in the abnormal chain. And (3) sequentially searching, namely, connecting drug chains together in series. The side weight adjacency matrix is input and one chain is output. As shown in the two diagrams of fig. 3 and 4, the methamphetamine is a normal patient, the oxepin Xin Rengui is an abnormal patient, and the a, b, c, d, e and f are medicines.
Single mode projection relationship
Figure SMS_23
Schematic relation of the edge-weight adjacency matrix:
Head (beginning of medicine link) | Tail (end of medicine link) | Weight (frequency of medicine)
a medicine | b medicine | 2
a medicine | c medicine | 1
a medicine | d medicine | 0
b medicine | c medicine | 4
b medicine | d medicine | 1
c medicine | d medicine | 1
The edge-weight matrices of the two graphs are sorted by edge weight. The occurrence frequencies of the medicine combinations are: a-b 2 times, a-c 2 times, a-d 0 times, b-c 4 times, b-d 1 time, and c-d 1 time. The medicine combination with the highest edge weight, b-c, is taken as the initial medicine combination of the chain, and the medicines connected by the next-highest edge weight are then searched: a-b occurs twice and b-d once, so a-b is taken. Searching in sequence and connecting the medicines in series outputs a-b-c-d, which is the normal chain. That is, the correlation chain algorithm outputs a normal chain arranged by edge weight.
Schematic relation of the edge-weight adjacency matrix:
Head (beginning of medicine link) | Tail (end of medicine link) | Weight (frequency of medicine)
b medicine | c medicine | 4
b medicine | e medicine | 1
b medicine | f medicine | 1
c medicine | e medicine | 3
c medicine | f medicine | 2
e medicine | f medicine | 2
The edge-weight matrices of the two graphs are sorted by edge weight. The occurrence frequencies of the medicine combinations are: b-c 4 times, b-e 1 time, b-f 1 time, c-e 3 times, c-f 2 times, and e-f 2 times. The medicine combination with the highest edge weight, b-c, is taken as the initial medicine combination in the abnormal chain, and the medicines connected by the next-highest edge weight are then searched: c-f occurs 2 times and c-e 3 times, so c-e is taken. Searching in sequence and connecting the medicines in series outputs b-c-e-f, which is the abnormal chain. That is, the correlation chain algorithm outputs an abnormal chain arranged by edge weight.
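The two worked examples can be reproduced with a short Python sketch. It rests on one reading of the description, which is an assumption: the chain starts at the highest-weight pair and is then extended, one medicine at a time, at whichever end offers the highest-weight edge to a medicine not yet in the chain, ignoring pairs with weight 0. The function name `correlation_chain` and the tuple layout are hypothetical, and the weights are taken from the two tables (with a-c = 1).

```python
from collections import deque

def correlation_chain(edges):
    """Greedy chain construction from (head, tail, weight) rows."""
    # Keep only pairs that actually co-occur, strongest first.
    usable = sorted((e for e in edges if e[2] > 0),
                    key=lambda e: e[2], reverse=True)
    if not usable:
        return []
    head, tail, _ = usable[0]
    chain = deque([head, tail])
    visited = {head, tail}
    while True:
        best = None  # (weight, new medicine, attach at left end?)
        for u, v, w in usable:
            for end, at_left in ((chain[0], True), (chain[-1], False)):
                if end in (u, v):
                    other = v if end == u else u
                    # Strict ">" keeps the first candidate on ties.
                    if other not in visited and (best is None or w > best[0]):
                        best = (w, other, at_left)
        if best is None:
            break  # no chain end can be extended any further
        _, medicine, at_left = best
        if at_left:
            chain.appendleft(medicine)
        else:
            chain.append(medicine)
        visited.add(medicine)
    return list(chain)

normal_edges = [("a", "b", 2), ("a", "c", 1), ("a", "d", 0),
                ("b", "c", 4), ("b", "d", 1), ("c", "d", 1)]
abnormal_edges = [("b", "c", 4), ("b", "e", 1), ("b", "f", 1),
                  ("c", "e", 3), ("c", "f", 2), ("e", "f", 2)]

normal_chain = correlation_chain(normal_edges)
abnormal_chain = correlation_chain(abnormal_edges)
print("-".join(normal_chain))    # a-b-c-d
print("-".join(abnormal_chain))  # b-c-e-f
```

Tie-breaking is not specified in the description; this sketch keeps the first candidate encountered, so a different tie rule could produce a different chain when two candidate edges share the same weight.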
Because the cosine similarity of the vectorized normal and abnormal chains is not 0, the medicines b and c that the abnormal chain shares with the normal chain are removed from the abnormal chain, forming the fraud chain e-f.
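A minimal Python sketch of this step follows. The description does not spell out how a chain is vectorized, so the sketch assumes binary indicator vectors over the union of medicines in both chains; the function names are hypothetical.

```python
import math

def chain_vector(chain, vocabulary):
    """Assumed vectorization: 1 if the medicine is in the chain, else 0."""
    members = set(chain)
    return [1 if medicine in members else 0 for medicine in vocabulary]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def fraud_chain(normal_chain, abnormal_chain):
    """Return (similarity, fraud chain) for one normal/abnormal pair."""
    vocabulary = sorted(set(normal_chain) | set(abnormal_chain))
    similarity = cosine_similarity(chain_vector(normal_chain, vocabulary),
                                   chain_vector(abnormal_chain, vocabulary))
    if similarity == 0:
        # Pairs with similarity 0 are discarded, so no fraud chain is formed.
        return similarity, []
    shared = set(normal_chain)
    # Drop the medicines common to both chains and keep the rest, in order.
    return similarity, [m for m in abnormal_chain if m not in shared]

similarity, chain = fraud_chain(["a", "b", "c", "d"], ["b", "c", "e", "f"])
print(similarity, "-".join(chain))  # 0.5 e-f
```

Under this assumed vectorization, the similarity is 0 exactly when the two chains share no medicine, which matches the rule of discarding pairs with similarity 0.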
Cosine similarity definition: cosine similarity evaluates how similar two vectors are by calculating the cosine of the angle between them. The vectors are drawn in a vector space according to their coordinate values, and the cosine of the angle between the two vectors is used as the measure of the difference between the two individuals. The closer the cosine is to 1, the closer the angle is to 0 degrees, i.e. the more similar the two vectors are; conversely, the closer the cosine is to 0, the lower the similarity of the two vectors. This measure is called "cosine similarity".
The formula:
Figure SMS_24
wherein a, b, c are normal and abnormal chains;
wherein the a vector is [x1, y1] and the b vector is [x2, y2]; the a vector is the vectorization of the normal chain, and the b vector is the vectorization of the abnormal chain. Vectors with a similarity of 0 are thereby removed.
Figure SMS_25
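For reference, the textbook cosine similarity of two-dimensional vectors written as a = [x1, y1] and b = [x2, y2] can be stated as follows. It is an assumption that the formula figures correspond to this expression, since the figures themselves are not reproduced in the text; the numeric instance uses hypothetical vectors.

```latex
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}

% Hypothetical example: a = [1, 1], b = [1, 0]
\cos\theta
  = \frac{1\cdot 1 + 1\cdot 0}{\sqrt{2}\cdot\sqrt{1}}
  = \frac{1}{\sqrt{2}} \approx 0.707
```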

Claims (3)

1. A medical insurance fraud detection method based on medicine purchase records, characterized by comprising the following steps:
S1, constructing a fraudster classification model through a machine learning algorithm;
S2, inputting patient information and medicine purchasing information into the model, and establishing a patient-medicine bipartite graph, wherein the patient information comprises normal patients and fraudulent patients;
S3, establishing a single-mode projection relation of the medicines according to the patient-medicine bipartite graph to form medicine chains;
S4, dividing the medicine chains of step S3 into normal chains and abnormal chains by using a correlation chain algorithm;
S5, calculating the similarity of the normal chains and the abnormal chains through the cosine similarity formula;
S6, removing normal chains with a similarity of 0, and retaining the comparison combinations of abnormal chains and normal chains whose similarity is not 0;
S7, removing the medicines common to the abnormal chain and the normal chain in each combination, and retaining the other medicines;
S8, synthesizing the remaining medicines into a fraud chain, and outputting the fraud chain;
in step S2, the medicine purchasing information of normal patients and the medicine purchasing information of fraudulent patients with fraudulent activity are set as patient nodes and medicine nodes, and a medicine-patient undirected bipartite graph of the fraudulent patients and a medicine-patient undirected bipartite graph of the normal patients are constructed respectively; a first round of derived feature extraction is performed on the patient-medicine bipartite graph, wherein the features extracted in the first round comprise the total number of medicine types and the total amount of medicines used, and the medicine single-mode projection relation is established according to the derived features;
in step S3, a second round of derived feature extraction is performed on the abnormal chain, wherein the features extracted in the second round comprise the type abnormality rate, the quantity abnormality rate, and the abnormal medicine usage rate in the abnormal chain;
in step S4, the correlation chain algorithm specifically comprises: sorting the matrices corresponding to the two graphs by edge weight; starting from the medicine combination with the highest edge weight as the initial medicine combination in the abnormal chain, further searching for the medicines connected by the next-highest edge weight among the combined medicines; searching in sequence and connecting the medicines in series into a chain; the input being the edge-weight adjacency matrix and the output being one chain;
in step S8, a third round of derived feature extraction is performed on the synthesized fraud chain, wherein the features extracted in the third round comprise the type abnormality rate, the quantity abnormality rate, and the abnormal medicine usage rate in the abnormal chain.
2. The medical insurance fraud detection method based on medicine purchase records according to claim 1, wherein in step S1, the patient information is integrated, feature vectors of the patient information are extracted by a machine learning algorithm, the information quantity IV of each feature is calculated on the feature vectors by using the supervised screening algorithm smbinning, and features with the information quantity IV larger than the information quantity IV are extracted and put into the machine learning algorithm to obtain the fraudster classification model.
3. The medical insurance fraud detection method based on the medicine purchase record according to claim 2, wherein in step S5, the cosine similarity formula is
Figure QLYQS_1
wherein a, b, c are normal or abnormal chains, respectively.
CN201911383476.7A 2019-12-28 2019-12-28 Medical insurance fraud detection method based on medicine purchasing record Active CN111105317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911383476.7A CN111105317B (en) 2019-12-28 2019-12-28 Medical insurance fraud detection method based on medicine purchasing record

Publications (2)

Publication Number Publication Date
CN111105317A CN111105317A (en) 2020-05-05
CN111105317B true CN111105317B (en) 2023-05-12

Family

ID=70423707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911383476.7A Active CN111105317B (en) 2019-12-28 2019-12-28 Medical insurance fraud detection method based on medicine purchasing record

Country Status (1)

Country Link
CN (1) CN111105317B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant