CN111105317A - Medical insurance fraud detection method based on medicine purchase record - Google Patents


Info

Publication number
CN111105317A
Authority
CN
China
Prior art keywords
chain
medicine
patient
abnormal
normal
Legal status
Granted
Application number
CN201911383476.7A
Other languages
Chinese (zh)
Other versions
CN111105317B (en)
Inventor
孙佰清
鲍鑫
王天辰
高稳
王思霖
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority to CN201911383476.7A
Publication of CN111105317A
Application granted
Publication of CN111105317B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Technology Law (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a medical insurance fraud detection method based on medicine purchase records. It belongs to the field of medical fraud detection, accurately extracts medical insurance fraud information, is convenient and fast to operate, and is broadly applicable. A fraudster classification model is constructed with a machine learning algorithm; patient information and medicine purchase information are input into the model to build a patient-medicine bipartite graph; a single-mode projection of the medicines is derived from the bipartite graph to form medicine chains; an associated chain algorithm divides the medicine chains into normal chains and abnormal chains; the similarity between normal and abnormal chains is computed with the cosine similarity formula; the abnormal-normal chain pairs with nonzero similarity are kept; within each retained pair, the medicines that appear on both chains are removed and the remaining medicines are kept; and the remaining medicines are synthesized into a fraud chain, which is output. The invention is mainly used to detect fraud committed by fraudulent patients.

Description

Medical insurance fraud detection method based on medicine purchase record
Technical Field
The invention belongs to the field of medicine fraud detection methods, and particularly relates to a medical insurance fraud detection method.
Background
In recent years, the fight against medical insurance fraud in China has attracted considerable attention; notably, the first major action taken after the establishment of the National Healthcare Security Administration (a newly created government agency dedicated to managing the medical insurance fund) was a special campaign against medical insurance fraud. However, fraud detection still relies heavily on reports through "hotlines" and social media platforms and on case studies by domain experts. For regulators, an effective, automatic, data-driven model would therefore help improve the efficiency of medical reform and better serve the public. Although fraud is uncommon, medical insurance fraud events often correspond to abnormal drug purchase records, and medical insurance fraud cases typically have the following characteristics:
(1) Rarity: fraudulent events are rare but costly, so the numbers of normal patients and fraudsters are highly imbalanced.
(2) Knowledge sharing: fraudsters are often influenced by their allies and contacts and in turn influence others, so fraud knowledge is passed on and recurs in drug purchasing behavior patterns.
(3) Behavioral mimicry: fraudulent patients may imitate the purchasing behavior of normal participants to mask their fraudulent goals and make their own purchases look "normal".
A medical insurance fraud detection method based on medicine purchase records that accurately extracts fraud information, is convenient to operate, and is broadly applicable is therefore needed.
Disclosure of Invention
Aiming at the problems that medical insurance fraud takes many forms, that fraud information cannot be determined accurately, and that extracting fraud information manually is cumbersome, the invention provides a medical insurance fraud detection method based on medicine purchase records that accurately extracts medical insurance fraud information, is convenient to operate, and is broadly applicable.
The technical scheme of the invention is as follows.
The medical insurance fraud detection method based on medicine purchase records comprises the following steps:
step S1, building a fraud classification model with a machine learning algorithm;
step S2, inputting patient information and medicine purchase information into the model and building a patient-medicine bipartite graph, where the patients comprise normal patients and fraudulent patients;
step S3, deriving a medicine single-mode projection relation from the patient-medicine bipartite graph to form medicine chains;
step S4, dividing the medicine chains of step S3 into normal chains and abnormal chains with an associated chain algorithm;
step S5, calculating the similarity between the normal chains and the abnormal chains with the cosine similarity formula;
step S6, discarding the abnormal-normal chain pairs whose similarity is 0 and keeping the pairs whose similarity is nonzero;
step S7, within each retained pair, removing the medicines that appear on both the abnormal chain and the normal chain and keeping the remaining medicines;
and step S8, synthesizing the remaining medicines into a fraud chain and outputting the fraud chain.
Further: in step S1, patient information is integrated and a feature vector of the patient information is extracted with a machine learning algorithm; the information value IV of each feature is calculated with a supervised feature screening algorithm, and the features whose IV exceeds a threshold (0.05 in the embodiments) are fed into the machine learning algorithm to obtain the fraud classification model.
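The information-value screening of step S1 can be sketched in Python as follows. The patent does not give its IV formula or binning, so this assumes the common weight-of-evidence definition $IV = \sum_k (p^{normal}_k - p^{fraud}_k)\ln(p^{normal}_k / p^{fraud}_k)$ over equal-frequency bins, with a small smoothing constant for empty cells; `information_value` and `select_features` are hypothetical names.

```python
import math
from bisect import bisect_right

def information_value(values, labels, n_bins=5, eps=0.5):
    """Information value (IV) of one numeric feature against a 0/1 fraud label.

    labels: 1 = fraudulent patient, 0 = normal patient.
    Equal-frequency binning and the smoothing constant eps are assumptions;
    the patent does not state how IV is binned.
    """
    n = len(values)
    ordered = sorted(values)
    # Cut points at the 1/n_bins, 2/n_bins, ... quantiles (duplicates removed)
    cuts = sorted({ordered[k * n // n_bins] for k in range(1, n_bins)})
    n_fraud = sum(labels)
    n_normal = n - n_fraud
    counts = {}  # occupied bin index -> [normal count, fraud count]
    for v, y in zip(values, labels):
        counts.setdefault(bisect_right(cuts, v), [0, 0])[y] += 1
    k = len(counts)
    iv = 0.0
    for normal, fraud in counts.values():
        p_normal = (normal + eps) / (n_normal + eps * k)
        p_fraud = (fraud + eps) / (n_fraud + eps * k)
        # Gap between class shares weighted by the weight of evidence
        iv += (p_normal - p_fraud) * math.log(p_normal / p_fraud)
    return iv

def select_features(columns, labels, threshold=0.05):
    """Names of the features whose IV exceeds the threshold (0.05 in the embodiments)."""
    return [name for name, vals in columns.items()
            if information_value(vals, labels) > threshold]
```

Each term $(p^{normal}_k - p^{fraud}_k)\ln(p^{normal}_k/p^{fraud}_k)$ is non-negative, so a feature whose distribution is identical for both classes scores 0 and is dropped, while a feature that separates fraudsters from normal patients scores high.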
Further: in step S2, the patients and medicines in the purchase records of normal patients and of fraudulent patients are set as patient nodes and medicine nodes, and a medicine-patient undirected bipartite graph is constructed for the fraudulent patients and another for the normal patients; a first round of derived features is extracted from the bipartite graphs, namely the number of medicine types used and the total quantity of medicines used, and the medicine single-mode projection relation is then established.
Further: in step S3, a second round of derived features is extracted with reference to the abnormal chains, comprising the category anomaly rate, the quantity anomaly rate, and the abnormal drug usage rate with respect to each abnormal chain.
Further: in step S4, the associated chain algorithm is as follows: the edge-weight matrix corresponding to the bipartite graph is sorted by edge weight; the medicine pair with the highest edge weight is taken as the starting combination of the abnormal chain; the medicines linked to the medicines of this combination by the next-highest edge weights are then searched for and attached, and the search continues in this way so that the medicines are connected in series into a chain. The input is the edge-weight adjacency matrix and the output is a chain.
Further: in step S5, the cosine similarity formula is
[Formula image not reproduced in the source text]
where a, b and c each denote a normal chain or an abnormal chain.
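Because the formula image is missing, the following Python sketch uses the standard cosine similarity and assumes (the patent does not say so) that each chain is encoded as a 0/1 vector over all medicines. For such vectors $a\cdot b = |A\cap B|$ and $\|a\| = \sqrt{|A|}$, so the cosine reduces to $|A\cap B| / \sqrt{|A|\,|B|}$; a value of 0 means the two chains share no medicine, which is exactly the case step S6 discards. The function name `chain_cosine` is hypothetical.

```python
import math

def chain_cosine(chain_a, chain_b):
    """Cosine similarity of two medicine chains encoded as 0/1 vectors over drugs.

    With 0/1 indicator vectors the dot product is the number of shared drugs
    and each norm is the square root of the chain length, so the cosine is
    |A & B| / sqrt(|A| * |B|). Empty chains give 0.0.
    """
    a, b = set(chain_a), set(chain_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```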
Further: in step S8, a third round of derived features is extracted with reference to the synthesized fraud chains, comprising the category anomaly rate, the quantity anomaly rate, and the abnormal drug usage rate with respect to each fraud chain.
The beneficial effects of the invention are as follows.
The method uses a bipartite graph and the single-mode projection relation derived from it, together with an associated chain algorithm, to extract the transfer of fraud patterns and hidden drug purchasing targets; it is fast and accurate with respect to business logic and easy to apply. The extracted fraud chains can also help regulators establish monitoring rules that guard against fraudulent activity and prevent malicious fraud by fraudulent patients. By analyzing the drug purchase records in medical insurance data and constructing effective derived features with graph-theoretic algorithms, the method judges fraud with high accuracy and can effectively detect changing medical insurance fraud patterns.
Drawings
FIG. 1 is a flow chart of the medical insurance fraud detection method based on medicine purchase records according to the present invention;
FIG. 2 is a flow chart of the medical insurance fraud detection method based on medicine purchase records according to Example 2;
FIG. 3 is the drug-patient bipartite graph of normal patients in Example 2;
FIG. 4 is the drug-patient bipartite graph of fraudulent patients in Example 2.
Detailed Description
The technical solutions of the present invention are further described below with reference to the following examples, but the present invention is not limited thereto, and any modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Example 1
This example is described with reference to FIG. 1. The medical insurance fraud detection method based on medicine purchase records of this example comprises the following steps:
Step S1, building a fraud classification model with a machine learning algorithm: patient information is integrated, a feature vector of the patient information is extracted with a machine learning algorithm, the information value IV of each feature is calculated with a supervised feature screening algorithm, and the features with sufficiently large IV are fed into the machine learning algorithm to obtain the fraud classification model. Concretely, the feature vector of each patient and the patient's fraud label $Y$ are combined into $\{X_1, X_2, X_3, X_4, \ldots, X_{3q+2}, X_{3q+3}, X_{3q+4}, \ldots, X_{3q+3r+2}, Y\}$; the IV of each feature is computed with a supervised feature screening algorithm, and the features with IV greater than 0.05 are extracted and fed into a machine learning algorithm to obtain an effective classification model for fraudsters.
Step S2, inputting patient information and medicine purchase information into the model and building a patient-medicine bipartite graph, where the patients comprise normal patients and fraudulent patients: the patients and medicines in the purchase records of normal patients and of fraudulent patients are set as patient nodes and medicine nodes, and a medicine-patient undirected bipartite graph is constructed for the fraudulent patients and another for the normal patients; a first round of derived features, namely the number of medicine types used and the total quantity of medicines used, is extracted from the bipartite graphs, and the medicine single-mode projection relation is then established.
Patients with fraudulent behavior and patients with normal behavior are separated in the training data; each patient corresponds to several medication records. The medication records of each patient are grouped together, patients and medicines are treated as two different types of nodes in the graph-theoretic sense, and each medication record links a patient node ($ID_j$) to a medicine node ($D_k$), yielding one medicine-patient undirected bipartite graph for fraudulent patients and one for normal patients. From these two graphs, the first round of features is extracted from each patient's drug purchasing behavior: (1) $X_1$: the number of drug types used; (2) $X_2$: the total quantity of drugs used.
The medicine single-mode projection relations corresponding to the bipartite graph of fraudulent patients and to that of normal patients are derived separately; with the associated chain algorithm, the drug purchasing behavior of fraudulent patients is represented as abnormal chains and that of normal patients as normal chains.
The single-mode projection of a bipartite graph is used to study the relationships among nodes of the same type. Because nodes of the same type in a bipartite graph are connected only through nodes of the other type, they can be linked and clustered directly, producing a growing network that contains nodes of a single type; the single-mode projection therefore depends on the bipartite graph. First, the drug nodes ($D_k$) under study are added to a new network one by one. Taking each added drug node as a starting point, the other drug nodes connected to it in the bipartite graph (through a shared patient node) are searched in order of edge weight from low to high, and the drug nodes found are connected to the starting node. This is repeated until all drug nodes are connected, forming a new single-mode projection relation. The two projection relations are then processed with the associated chain algorithm to obtain the abnormal chains and normal chains corresponding to patient behavior.
Step S3, deriving a medicine single-mode projection relation from the patient-drug bipartite graph to form medicine chains, and extracting a second round of derived features with reference to the abnormal chains, the second-round features comprising the category anomaly rate, the quantity anomaly rate, and the abnormal drug usage rate with respect to each abnormal chain. Taking the abnormal chains corresponding to fraudulent patient behavior as the reference, the second round of patient behavior features is derived by extracting the following three features for each patient and each abnormal chain: (1) the number of drug types the patient uses that also appear on the abnormal chain, divided by the total number of drug types the patient uses; (2) the total quantity of the patient's drugs that appear on the abnormal chain, divided by the total quantity of drugs the patient uses; (3) the total quantity of abnormal-chain drugs used by the patient, divided by the number of abnormal-chain drug types the patient uses.
Step S4, dividing the medicine chains of step S3 into normal chains and abnormal chains with the associated chain algorithm, which works as follows: the edge-weight matrix corresponding to the bipartite graph is sorted by edge weight; the medicine pair with the highest edge weight is taken as the starting combination of the abnormal chain; the medicines linked to the medicines of this combination by the next-highest edge weights are then searched for and attached, and the search continues in this way so that the medicines are connected in series into a chain. The input is the edge-weight adjacency matrix and the output is a chain.
Step S5, calculating the similarity between the normal chains and the abnormal chains with the cosine similarity formula:
[Formula image not reproduced in the source text]
where a, b and c each denote a normal chain or an abnormal chain.
Step S6, discarding the abnormal-normal chain pairs whose similarity is 0 and keeping the pairs whose similarity is nonzero: the cosine similarity between the abnormal chain of each fraudulent patient behavior and the normal chain of each normal patient behavior is calculated, and the pairs with similarity 0 are removed. For the remaining pairs, the medicines on the abnormal chain that also appear on the normal chain are removed, and the remaining medicines on the abnormal chain are combined into a fraud chain.
Step S7, within each retained pair, removing the medicines common to the abnormal chain and the normal chain and keeping the other medicines;
Step S8, synthesizing the remaining medicines into a fraud chain, extracting a third round of derived features from the synthesized fraud chain, the third-round features comprising the category anomaly rate, the quantity anomaly rate, and the abnormal drug usage rate with respect to each fraud chain, and outputting the fraud chain.
Taking the fraud chains as the reference, the third round of patient behavior features is derived by extracting the following three features for each patient and each fraud chain: (1) the number of drug types the patient uses that also appear on the fraud chain, divided by the total number of drug types the patient uses; (2) the total quantity of the patient's drugs that appear on the fraud chain, divided by the total quantity of drugs the patient uses; (3) the total quantity of fraud-chain drugs used by the patient, divided by the number of fraud-chain drug types the patient uses.
Example 2
This example is described with reference to FIG. 2, FIG. 3, FIG. 4 and Example 1. Taking publicly released medical insurance drug purchase data from a city as a case, the medical insurance fraud detection method is applied, and a drug-patient bipartite graph is built separately for fraudulent patients and for normal patients.
Patients with fraudulent behavior and patients with normal behavior are separated in the training data; each patient corresponds to several medication records, for a total of 1,368,148 medication records from 15,000 patients. The medication records of each patient are grouped together, patients and medicines are treated as two different types of nodes, and each medication record links a patient node ($ID_j$) to a medicine node ($D_k$), yielding one medicine-patient undirected bipartite graph for fraudulent patients and one for normal patients. In an undirected bipartite graph built from medication records, patient nodes can be connected to one another only through drug nodes, never directly; likewise, drug nodes can be connected to one another only through patient nodes. Because a drug node and a patient node are related only by a simple purchase relationship, an undirected bipartite graph suffices to represent drug purchasing behavior. The weight of each edge in the undirected bipartite graph is the number of times the patient purchased the corresponding drug.
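The graph construction just described can be sketched in Python with nested dictionaries. The record layout (one `(patient_id, drug_id)` pair per purchase) and the function name `build_bipartite` are assumptions for illustration. Edges only ever join a patient to a drug, so the result is bipartite by construction, and each edge weight counts the patient's purchases of that drug.

```python
from collections import defaultdict

def build_bipartite(records):
    """Build a patient-drug undirected bipartite graph from medication records.

    records: iterable of (patient_id, drug_id) pairs, one per purchase
    (assumed layout). Returns {patient: {drug: weight}}, where weight is the
    number of times the patient bought that drug.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for patient, drug in records:
        graph[patient][drug] += 1  # one more purchase on this patient-drug edge
    return {p: dict(drugs) for p, drugs in graph.items()}
```

Running it once on the fraudulent patients' records and once on the normal patients' records gives the two graphs the text describes.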
For the two bipartite graphs (fraudulent patients and normal patients), the first round of features is extracted from each patient's drug purchasing behavior: (1) $X_1$: the number of drug types used, where $x_{1,j}$ is the number of drug nodes connected to patient node $j$; (2) $X_2$: the total quantity of drugs used, where $x_{2,j}$ is the sum of the weights of the edges between patient node $j$ and its drug nodes.
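Assuming the bipartite graph is stored as a hypothetical `{patient: {drug: weight}}` dictionary, the two first-round features are simply the degree and the weighted degree of each patient node. A minimal sketch (`first_round_features` is an illustrative name):

```python
def first_round_features(graph):
    """Return {patient: (x1, x2)} for a {patient: {drug: weight}} bipartite graph.

    x1 = number of drug nodes connected to the patient (degree)
    x2 = sum of the weights of the patient's edges (weighted degree)
    """
    return {p: (len(drugs), sum(drugs.values())) for p, drugs in graph.items()}
```

For a patient who bought D1 twice and D2 once, this gives $x_1 = 2$ and $x_2 = 3$.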
The medicine single-mode projection relations corresponding to the bipartite graph of fraudulent patients and to that of normal patients are derived separately; with the associated chain algorithm, the drug purchasing behavior of fraudulent patients is represented as abnormal chains and that of normal patients as normal chains.
The single-mode projection of a bipartite graph is used to study the relationships among nodes of the same type. Because nodes of the same type in a bipartite graph are connected only through nodes of the other type, they can be linked and clustered directly, producing a growing network that contains nodes of a single type; the single-mode projection therefore depends on the bipartite graph. First, the drug nodes ($D_k$) under study are added to a new network one by one. Taking each added drug node as a starting point, the other drug nodes connected to it in the bipartite graph (through a shared patient node) are searched in order of edge weight from low to high, and the drug nodes found are connected to the starting node. This is repeated until all drug nodes are connected, forming a new single-mode projection relation. Finally, the new single-mode projection relation is represented as a matrix in which the weight of the edge between two drug nodes is the number of patients who use both drugs; the matrix form is:
$$P = \left(p_{ij}\right)_{m \times m}$$
where $m$ is the number of drugs and $p_{ij}$ is the number of patients who use both drug $D_i$ and drug $D_j$; for example, $p_{12}$ is the number of patients who use both $D_1$ and $D_2$.
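The projection matrix $P$ can be computed directly from the bipartite graph by counting, for every pair of drugs, the patients connected to both. The Python sketch below assumes a hypothetical `{patient: {drug: weight}}` graph layout and an ordered drug list $D_1, \ldots, D_m$; the diagonal is left at 0 because the text does not define $p_{ii}$.

```python
from itertools import combinations

def drug_projection(graph, drugs):
    """Single-mode projection of a patient-drug bipartite graph onto drugs.

    graph: {patient: {drug: weight}}; drugs: ordered list D_1..D_m.
    Returns an m x m list of lists P with P[i][j] = number of patients who used
    both drugs[i] and drugs[j] (i != j). The diagonal is left at 0 (assumption).
    """
    index = {d: k for k, d in enumerate(drugs)}
    m = len(drugs)
    P = [[0] * m for _ in range(m)]
    for used in graph.values():
        ids = sorted(index[d] for d in used if d in index)
        for i, j in combinations(ids, 2):  # each co-used drug pair, counted once per patient
            P[i][j] += 1
            P[j][i] += 1
    return P
```

The matrix is symmetric, matching the undirected projection; note that the purchase counts on the bipartite edges do not enter $p_{ij}$, which counts patients only.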
In this way, the medicine single-mode projection relations corresponding to the bipartite graph of fraudulent patients and to that of normal patients are obtained, and the two projection relations are processed with the associated chain algorithm to obtain the abnormal chains and normal chains corresponding to patient behavior.
Take the derivation of abnormal chains from the fraudulent patients' graph as an example. First, the matrix corresponding to the fraudulent patients' bipartite graph is sorted by edge weight, and the drug pair with the highest edge weight is taken as the starting combination of an abnormal chain. Then, for each of the two medicines in the combination, the medicine connected to it by the next-highest edge weight is searched for. If such medicines exist, they are attached to the corresponding ends of the starting combination, and the search is repeated from the new ends until no further medicine can be found for either end of the chain. These steps are repeated until every medicine in the single-mode projection relation has been traversed, yielding a set of abnormal chains that carry the fraudulent patients' information and contain no repeated medicines.
The single-mode projection relation of normal patients is processed in the same way, yielding a set of normal chains that carry the normal patients' information and contain no repeated medicines.
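The chain-growing procedure can be sketched in Python as a greedy algorithm over the symmetric co-use matrix. This is one reading of the text, not the patent's own code: "next-highest edge weight" is interpreted as the strongest remaining link from a chain end to a not-yet-used drug, ties are broken by the smaller index, and drugs with no positive link to an unused drug are not placed in any chain. `associated_chains` is a hypothetical name.

```python
def associated_chains(P, drugs):
    """Greedy associated-chain extraction from a symmetric drug co-use matrix P.

    Start each chain from the unused drug pair with the highest edge weight,
    then repeatedly attach to each end the unused drug linked to that end by the
    highest remaining weight. When neither end can grow, start a new chain.
    No drug appears in more than one chain. Tie-breaking by index is assumed.
    """
    m = len(drugs)
    used = set()
    chains = []

    def best_neighbor(k):
        # Unused drug with the largest positive weight to drug k, or None
        cands = [(P[k][j], -j) for j in range(m)
                 if j != k and j not in used and P[k][j] > 0]
        return -max(cands)[1] if cands else None

    while True:
        pairs = [(P[i][j], -i, -j) for i in range(m) for j in range(i + 1, m)
                 if i not in used and j not in used and P[i][j] > 0]
        if not pairs:
            break
        _, ni, nj = max(pairs)  # highest edge weight starts the chain
        chain = [-ni, -nj]
        used.update(chain)
        grown = True
        while grown:
            grown = False
            left = best_neighbor(chain[0])
            if left is not None:  # extend the left end
                chain.insert(0, left)
                used.add(left)
                grown = True
            right = best_neighbor(chain[-1])
            if right is not None:  # extend the right end
                chain.append(right)
                used.add(right)
                grown = True
        chains.append([drugs[k] for k in chain])
    return chains
```

Applied to the fraudulent patients' matrix the output plays the role of the abnormal chains, and applied to the normal patients' matrix it plays the role of the normal chains.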
Taking the abnormal chains corresponding to fraudulent patient behavior as the reference, the second round of patient behavior features is derived by extracting the following three features for each patient and each abnormal chain: (1) the number of drug types the patient uses that also appear on the abnormal chain, divided by the total number of drug types the patient uses; (2) the total quantity of the patient's drugs that appear on the abnormal chain, divided by the total quantity of drugs the patient uses; (3) the total quantity of abnormal-chain drugs used by the patient, divided by the number of abnormal-chain drug types the patient uses. If $q$ abnormal chains are obtained, each patient has $3q$ derived features, denoted $X_3, X_4, \ldots, X_{3q+2}$.
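The three per-chain ratios can be sketched in Python as follows, producing the $3q$ values $X_3, \ldots, X_{3q+2}$ for one patient when given $q$ chains (the same function applies unchanged to the fraud chains of the third round). The patient representation `{drug: quantity}` and the choice to return 0.0 when a denominator is zero are assumptions; `chain_features` is a hypothetical name.

```python
def chain_features(patient_drugs, chains):
    """Three derived features per chain for one patient (3q values for q chains).

    patient_drugs: {drug: quantity} for the patient (bipartite edge weights).
    For each chain:
      f1 = (#drug types shared with the chain) / (#drug types the patient uses)
      f2 = (quantity of chain drugs used) / (total quantity the patient uses)
      f3 = (quantity of chain drugs used) / (#chain drug types the patient uses)
    Zero denominators give 0.0 (assumption; the text does not cover them).
    """
    n_types = len(patient_drugs)
    total_qty = sum(patient_drugs.values())
    feats = []
    for chain in chains:
        shared = [d for d in set(chain) if d in patient_drugs]
        qty = sum(patient_drugs[d] for d in shared)
        feats.append(len(shared) / n_types if n_types else 0.0)
        feats.append(qty / total_qty if total_qty else 0.0)
        feats.append(qty / len(shared) if shared else 0.0)
    return feats
```

For example, a patient with `{"a": 2, "b": 1, "x": 5}` and the chain `["a", "b", "c"]` gets $2/3$, $3/8$ and $3/2$.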
The cosine similarity between the abnormal chain of each fraudulent patient behavior and the normal chain of each normal patient behavior is calculated, and the abnormal-normal pairs with similarity 0 are removed. For each remaining pair, the medicines on the abnormal chain that also appear on the normal chain are removed, and the remaining medicines on the abnormal chain are combined into a fraud chain. This is done for every pair of abnormal and normal chains with nonzero similarity, yielding the corresponding fraud chains.
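Steps S6 to S8 amount to a filtered set difference over all abnormal-normal pairs. The Python sketch below encodes chains as lists of drug IDs and uses the 0/1-vector cosine as an assumed stand-in for the missing formula; dropping pairs whose remainder is empty is also an assumption. `fraud_chains` and `cosine` are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine of two drug sets encoded as 0/1 vectors (assumed encoding)."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def fraud_chains(abnormal, normal):
    """Steps S6 to S8 for lists of abnormal and normal chains (lists of drug IDs)."""
    result = []
    for ab in abnormal:
        for norm in normal:
            norm_set = set(norm)
            if cosine(set(ab), norm_set) == 0:
                continue  # S6: drop pairs that share no medicine
            rest = [d for d in ab if d not in norm_set]  # S7: remove shared medicines
            if rest:  # assumption: an empty remainder produces no fraud chain
                result.append(rest)  # S8: remaining medicines form a fraud chain
    return result
```

Only pairs that overlap are compared, so a fraud chain keeps exactly the medicines that a fraudster's chain adds on top of a similar normal purchasing pattern, which is the "behavioral mimicry" the background describes.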
Taking the fraud chains as the reference, the third round of patient behavior features is derived by extracting the following three features for each patient and each fraud chain: (1) the number of drug types the patient uses that also appear on the fraud chain, divided by the total number of drug types the patient uses; (2) the total quantity of the patient's drugs that appear on the fraud chain, divided by the total quantity of drugs the patient uses; (3) the total quantity of fraud-chain drugs used by the patient, divided by the number of fraud-chain drug types the patient uses. If $r$ fraud chains are obtained, each patient has $3r$ derived features, denoted $X_{3q+3}, X_{3q+4}, \ldots, X_{3q+3r+2}$.
Finally, the fraud detection model is built with a machine learning algorithm.
In summary, the feature vector of each patient and the patient's fraud label $Y$ are combined into $\{X_1, X_2, X_3, X_4, \ldots, X_{3q+2}, X_{3q+3}, X_{3q+4}, \ldots, X_{3q+3r+2}, Y\}$. The information value IV of each feature is computed with a supervised feature screening algorithm, and the features with IV greater than 0.05 are extracted and fed into a machine learning model; this embodiment uses logistic regression as an example. The classification is implemented in R: features with too little information value are dropped from the feature-label pairs, and ten-fold cross-validation is performed on the data set. For the training data of each fold, the classification coefficients are fitted with logistic regression, whose convex optimization objective is updated iteratively by gradient descent. ROC and AUC are used to evaluate model performance; the results are shown in Table 1 below, and the classification coefficient vector $t$ with the best relative performance is obtained, where
$t = [t_0, t_1, t_2, t_3, t_4, \dots, t_{3q+2}, t_{3q+3}, t_{3q+4}, \dots, t_{3q+3r+2}]^{T}$
Fold           1     2     3     4     5     6     7     8     9     10
Training AUC   0.86  0.86  0.82  0.85  0.85  0.86  0.86  0.86  0.86  0.85
Test AUC       0.84  0.79  0.80  0.78  0.82  0.80  0.78  0.78  0.81  0.86
TABLE 1
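The IV-based screening step described above can be sketched as follows. This is an illustrative Python sketch only; the embodiment itself works in R and also fits the logistic regression with ten-fold cross-validation, which is not reproduced here. It assumes each feature has already been discretized into bins, uses additive smoothing (`eps`, an assumption) so the logarithm stays finite for empty bins, and the names `information_value` and `screen_features` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def information_value(bins: Sequence[Hashable], y: Sequence[int], eps: float = 0.5) -> float:
    """Information value of one already-binned feature; y is 1 for fraud, 0 for normal."""
    levels = set(bins)
    bad = Counter(b for b, label in zip(bins, y) if label == 1)
    good = Counter(b for b, label in zip(bins, y) if label == 0)
    n_bad, n_good = sum(bad.values()), sum(good.values())
    iv = 0.0
    for level in levels:
        # Additive smoothing (an assumption) keeps the log finite for empty cells.
        p_bad = (bad[level] + eps) / (n_bad + eps * len(levels))
        p_good = (good[level] + eps) / (n_good + eps * len(levels))
        woe = math.log(p_good / p_bad)          # weight of evidence of this bin
        iv += (p_good - p_bad) * woe            # each term is non-negative
    return iv


def screen_features(binned: Dict[str, Sequence[Hashable]], y: Sequence[int],
                    threshold: float = 0.05) -> List[str]:
    """Keep the features whose IV exceeds the threshold (0.05 in the embodiment)."""
    return [name for name, col in binned.items() if information_value(col, y) > threshold]


if __name__ == "__main__":
    y = [1, 1, 1, 0, 0, 0]
    binned = {
        "X1": ["hi", "hi", "hi", "lo", "lo", "lo"],   # separates fraud from normal
        "X2": ["a", "a", "b", "a", "a", "b"],         # same distribution in both classes
    }
    print(screen_features(binned, y))   # ['X1']
```

Because X2 is distributed identically among fraudulent and normal patients, its IV is exactly 0 and it is dropped, while the perfectly separating X1 is kept.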
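According to the classification coefficient vector t, the linear score of patient j, from which the fraud probability is computed, is
$S_j = t_0 + \sum_{i=1}^{3q+3r+2} t_i X_i^{(j)}$, where $X_i^{(j)}$ is the value of feature $X_i$ for patient j.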
After $S_j$ is obtained, the fraud probability of the patient is computed with the logistic function, $P_j = 1/(1 + e^{-S_j})$, which completes the judgment of whether the patient is fraudulent. A new credit scoring model can be built in many ways. To obtain a feasible credit scoring model, the attribute set of the new model in this embodiment is chosen as a subset of the feasible domain of attribute sets, and the algorithm used by the model is then determined according to the properties of that attribute set. Many types of algorithms can be used to generate credit scoring models, for example logistic regression, random forests and GBDT. In this embodiment, algorithm screening includes using a new algorithm obtained by algorithm fusion, and the algorithm is selected according to the following strategy. Logistic regression: the probability of an event is obtained by modelling the relationship between that probability and multiple factors. When the probability is greater than 0.5, the event is considered to occur; when it is less than 0.5, it is considered not to occur.
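The scoring and decision rule can be sketched as follows. This is a minimal Python illustration, not the patent's code: the coefficient and feature values are hypothetical, t[0] is assumed to be the intercept $t_0$, and a probability of exactly 0.5 is assumed to count as "not fraudulent" because the text only specifies the cases above and below 0.5. The function names are hypothetical.

```python
import math
from typing import Sequence


def fraud_score(t: Sequence[float], x: Sequence[float]) -> float:
    """Linear score S_j = t_0 + sum_i t_i * X_i; t[0] is the intercept."""
    if len(t) != len(x) + 1:
        raise ValueError("t must have one more entry (the intercept) than x")
    return t[0] + sum(ti * xi for ti, xi in zip(t[1:], x))


def logistic(s: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^(-s))."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)               # avoids overflow of exp(-s) for very negative s
    return z / (1.0 + z)


def is_fraudulent(t: Sequence[float], x: Sequence[float], cutoff: float = 0.5) -> bool:
    """Flag the patient when the fraud probability exceeds the cutoff."""
    return logistic(fraud_score(t, x)) > cutoff


if __name__ == "__main__":
    t = [-1.0, 2.0, 0.5]          # hypothetical coefficients, not fitted values
    x = [1.0, 0.0]                # hypothetical screened features of one patient
    p = logistic(fraud_score(t, x))
    print(round(p, 4), is_fraudulent(t, x))   # 0.7311 True
```

Here $S_j = -1 + 2 \cdot 1 + 0.5 \cdot 0 = 1$, so the probability is about 0.7311 and the patient is flagged.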
Associated chain algorithm: the edge-weight matrix obtained from the bipartite graph is sorted by edge weight (the co-occurrence frequency of each medicine pair). The medicine pair with the highest edge weight is taken as the initial pair of the chain, and the chain is then extended with the medicine connected to it by the next-highest edge weight. Searching continues in this order, linking the medicines into a chain. The input is the edge-weight adjacency matrix and the output is one chain. In the bipartite graphs of FIG. 3 and FIG. 4, the normal patients and the abnormal patients are labelled with Chinese ordinal characters, and the medicines are labelled a, b, c, d, e and f.
[Figure: single-mode projection relationship of the medicines]
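How a single-mode projection turns the patient-medicine bipartite graph into medicine-pair edge weights can be sketched as follows. This is an illustrative Python sketch under an assumption: the weight of a medicine pair is taken to be the number of patients who bought both medicines, which is one reading of "occurrence frequency of the medicine combination". The patient data and the name `project_medicines` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def project_medicines(purchases: Dict[str, Iterable[str]]) -> Dict[Tuple[str, str], int]:
    """Project a patient-medicine bipartite graph onto the medicine nodes.

    purchases maps each (hypothetical) patient id to the medicines that patient bought.
    The weight of pair (u, v) counts the patients who bought both u and v.
    """
    weights: Counter = Counter()
    for medicines in purchases.values():
        # sorted() gives every pair a canonical orientation with u < v
        for u, v in combinations(sorted(set(medicines)), 2):
            weights[(u, v)] += 1
    return dict(weights)


if __name__ == "__main__":
    purchases = {"P1": ["a", "b", "c"], "P2": ["b", "c"], "P3": ["a", "b"]}
    print(project_medicines(purchases))
```

With these hypothetical purchases, a-b and b-c each get weight 2 and a-c gets weight 1; pairs that no patient bought together are simply absent, which corresponds to a weight of 0 in an edge-weight table.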
Edge-weight adjacency matrix (schematic, normal patients):
Head (first medicine of the link)   Tail (last medicine of the link)   Weight (co-occurrence frequency)
a                                   b                                  2
a                                   c                                  1
a                                   d                                  0
b                                   c                                  4
b                                   d                                  1
c                                   d                                  1
The edge-weight matrix of the bipartite graph is sorted by edge weight; the medicine pairs occur a-b 2 times, a-c once, a-d 0 times, b-c 4 times, b-d once and c-d once. Starting from the pair with the highest edge weight, b-c is taken as the initial pair of the chain. Among the medicines connected to this pair, the next-highest edge weight is then sought: a-b occurs twice and b-d once, so a-b is taken. Searching continues in the same way, the medicines are linked in series, and a-b-c-d is output as the normal chain. In other words, the associated chain algorithm outputs a normal chain ordered by edge weight.
Edge-weight adjacency matrix (schematic, abnormal patients):
Head (first medicine of the link)   Tail (last medicine of the link)   Weight (co-occurrence frequency)
b                                   c                                  4
b                                   e                                  1
b                                   f                                  1
c                                   e                                  3
c                                   f                                  2
e                                   f                                  2
The edge-weight matrix of the bipartite graph is sorted by edge weight; the medicine pairs occur b-c 4 times, b-e once, b-f once, c-e 3 times, c-f 2 times and e-f 2 times. Starting from the pair with the highest edge weight, b-c is taken as the initial pair of the abnormal chain. Among the medicines connected to this pair, the next-highest edge weight is then sought: c-e occurs 3 times and c-f twice, so c-e is taken. Searching continues in the same way, the medicines are linked in series, and b-c-e-f is output as the abnormal chain. In other words, the associated chain algorithm outputs an abnormal chain ordered by edge weight.
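The associated chain algorithm walked through in the two examples can be sketched as a greedy procedure. This Python sketch is one reading of the description, not the patent's code: it assumes the chain grows only at its two ends (so it stays a simple path), that a link of weight 0 is never used, and that ties keep the first candidate found. The name `associate_chain` is hypothetical. With these choices it reproduces both worked examples.

```python
from typing import Dict, List, Tuple


def associate_chain(weights: Dict[Tuple[str, str], float]) -> List[str]:
    """Greedy chain construction from an edge-weight table (sketch)."""
    if not weights:
        return []
    # Undirected lookup: weight of the pair {u, v}, 0 if the pair is absent.
    lookup = {frozenset(pair): w for pair, w in weights.items()}

    def weight(u: str, v: str) -> float:
        return lookup.get(frozenset((u, v)), 0)

    # Start from the medicine pair with the highest edge weight.
    (first, second), _ = max(weights.items(), key=lambda item: item[1])
    chain = [first, second]
    medicines = {m for pair in weights for m in pair}

    while True:
        best_weight, best_move = 0, None
        # Assumption: the chain is extended only at its two ends.
        for m in sorted(medicines - set(chain)):
            for position, end in ((0, chain[0]), (len(chain), chain[-1])):
                w = weight(end, m)
                if w > best_weight:               # zero-weight links are never used
                    best_weight, best_move = w, (position, m)
        if best_move is None:
            return chain
        position, m = best_move
        chain.insert(position, m)                 # prepend (0) or append (len)


if __name__ == "__main__":
    normal = {("a", "b"): 2, ("a", "c"): 1, ("a", "d"): 0,
              ("b", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
    abnormal = {("b", "c"): 4, ("b", "e"): 1, ("b", "f"): 1,
                ("c", "e"): 3, ("c", "f"): 2, ("e", "f"): 2}
    print(associate_chain(normal))     # ['a', 'b', 'c', 'd']
    print(associate_chain(abnormal))   # ['b', 'c', 'e', 'f']
```

On the abnormal table, for instance, the sketch starts from b-c, prefers c-e (3) over c-f (2) and b-e or b-f (1), and finally appends f through e-f (2).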
Because the cosine similarity between the vectorized normal chain and abnormal chain is not 0, the medicines b and c that the abnormal chain shares with the normal chain are removed, and the remaining medicines form the fraud chain e-f.
Cosine similarity definition: cosine similarity evaluates how similar two vectors are by computing the cosine of the angle between them. The vectors are placed in a vector space according to their coordinates, and the cosine of the angle between them is used as the measure of how much the two individuals differ. The closer the cosine is to 1, the closer the angle is to 0 degrees and the more similar the two vectors are; the closer it is to 0, the less similar they are.
The formula is:
$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|}$
where a and b are the vectorized normal chain and abnormal chain, respectively;
here the vector $\mathbf{a} = [x_1, y_1]$ is the vectorization of the normal chain and the vector $\mathbf{b} = [x_2, y_2]$ is the vectorization of the abnormal chain. Chain pairs whose similarity is 0 are then removed.
$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$
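Steps S5 to S8 for a single pair of chains can be sketched as follows. This Python sketch rests on an assumption the text leaves open: each chain is vectorized as a binary indicator vector over the union of medicines in the two chains (the two-coordinate formula above is the special case of 2-dimensional vectors). The names `cosine_similarity`, `vectorize` and `extract_fraud_chain` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = a.b / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vectorize(chain: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Binary indicator vector of a medicine chain (assumed vectorization)."""
    present = set(chain)
    return [1.0 if m in present else 0.0 for m in vocabulary]


def extract_fraud_chain(normal: Sequence[str], abnormal: Sequence[str]) -> Optional[List[str]]:
    """Return the fraud chain for one (normal, abnormal) pair, or None if discarded."""
    vocabulary = sorted(set(normal) | set(abnormal))
    sim = cosine_similarity(vectorize(normal, vocabulary), vectorize(abnormal, vocabulary))
    if sim == 0.0:
        return None                                   # S6: similarity 0, pair dropped
    shared = set(normal)
    return [m for m in abnormal if m not in shared]   # S7/S8: keep non-shared medicines


if __name__ == "__main__":
    print(extract_fraud_chain(["a", "b", "c", "d"], ["b", "c", "e", "f"]))  # ['e', 'f']
```

For the worked example, the indicator vectors over a to f are [1, 1, 1, 1, 0, 0] and [0, 1, 1, 0, 1, 1], whose cosine is 2 / (2 × 2) = 0.5; since this is not 0, the shared medicines b and c are removed and e-f remains.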

Claims (7)

1. A medical insurance fraud detection method based on medicine purchase records, characterized by comprising the following steps:
step S1, establishing a fraud classification model through a machine learning algorithm;
step S2, inputting patient information and medicine purchase information into the model and establishing a patient-medicine bipartite graph, wherein the patient information covers normal patients and fraudulent patients;
step S3, establishing a single-mode projection of the medicines from the patient-medicine bipartite graph to form medicine chains;
step S4, dividing the medicine chains of step S3 into normal chains and abnormal chains by using an associated chain algorithm;
step S5, calculating the similarity between the normal chain and the abnormal chain through a cosine similarity formula;
step S6, removing normal chains whose similarity is 0, and keeping for comparison each combination of an abnormal chain and a normal chain whose similarity is not 0;
step S7, removing from each retained combination the medicines that appear in both the abnormal chain and the normal chain, and keeping the other medicines;
and step S8, combining the remaining medicines into a fraud chain and outputting the fraud chain.
2. The medical insurance fraud detection method based on medicine purchase records of claim 1, wherein in step S1, patient information is integrated, feature vectors of the patient information are extracted by using a machine learning algorithm, the information value IV of each feature is calculated with a supervised screening algorithm, and the features whose information value IV is greater than the set threshold are extracted and input into the machine learning algorithm to obtain the fraud classification model.
3. The medical insurance fraud detection method based on medicine purchase records of claim 1 or 2, wherein in step S2, the medicine purchase information of normal patients and of fraudulent patients is represented by patient nodes and medicine nodes, and an undirected patient-medicine bipartite graph is constructed for the fraudulent patients and another for the normal patients; a first round of derived features is extracted from the patient-medicine bipartite graphs, the first-round features comprising the total number of medicine types used and the total quantity of medicines used, and the single-mode projection of the medicines is established according to these derived features.
4. The medical insurance fraud detection method based on medicine purchase records of claim 3, characterized in that in step S3, a second round of derived feature extraction is performed on the abnormal chain, and the second-round features comprise the type anomaly rate, the quantity anomaly rate and the abnormal drug usage rate of the abnormal chain.
5. The medical insurance fraud detection method based on medicine purchase records of claim 1 or 2, characterized in that in step S4, the associated chain algorithm is specifically: sorting the edge-weight matrix of the bipartite graph by edge weight, taking the medicine pair with the highest edge weight as the initial medicine combination of the abnormal chain, then searching for the medicine connected by the next-highest edge weight, continuing the search in this order and linking the medicines into a chain, wherein the input is the edge-weight adjacency matrix and the output is one chain.
6. The medical insurance fraud detection method based on medicine purchase records of claim 5, wherein in step S5, the cosine similarity formula is
$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|}$
where a and b are the vectorized normal chain and abnormal chain, respectively.
7. The medical insurance fraud detection method based on medicine purchase record of claim 4, wherein in step S8, a third round of derivative feature extraction is performed on the synthesized fraud chain, and the extracted features of the third round include type anomaly rate, quantity anomaly rate and abnormal drug usage rate in the anomaly chain.
CN201911383476.7A 2019-12-28 2019-12-28 Medical insurance fraud detection method based on medicine purchasing record Active CN111105317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911383476.7A CN111105317B (en) 2019-12-28 2019-12-28 Medical insurance fraud detection method based on medicine purchasing record

Publications (2)

Publication Number Publication Date
CN111105317A 2020-05-05
CN111105317B 2023-05-12

Family

ID=70423707

Country Status (1)

Country Link
CN (1) CN111105317B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991075A (en) * 2021-02-04 2021-06-18 浙江大学山东工业技术研究院 Package type medicine purchasing abnormity detection method based on FP-growth and graph network
CN113592517A (en) * 2021-08-09 2021-11-02 深圳前海微众银行股份有限公司 Method and device for identifying cheating passenger groups, terminal equipment and computer storage medium

Citations (12)

Publication number Priority date Publication date Assignee Title
CN105894038A (en) * 2016-04-22 2016-08-24 天云融创数据科技(北京)有限公司 Credit card fraud prediction method based on signal transmission and link mode
CN107133437A (en) * 2017-03-03 2017-09-05 平安医疗健康管理股份有限公司 The method and device that monitoring medicine is used
CN108596770A (en) * 2017-12-29 2018-09-28 山大地纬软件股份有限公司 Medicare fraud detection device and method based on outlier analysis
CN109523400A (en) * 2018-10-27 2019-03-26 平安医疗健康管理股份有限公司 Medicine method of specifying error and terminal device are taken based on data processing
CN109545316A (en) * 2018-10-30 2019-03-29 平安科技(深圳)有限公司 Purchase the processing method and Related product of medicine data
CN109559236A (en) * 2018-10-27 2019-04-02 平安医疗健康管理股份有限公司 The method and apparatus of drug reimbursement Information abnormity
CN109615547A (en) * 2018-12-13 2019-04-12 平安医疗健康管理股份有限公司 Recognition methods, device, terminal and the computer readable storage medium of abnormal purchase medicine
CN109615540A (en) * 2018-12-13 2019-04-12 平安医疗健康管理股份有限公司 Recognition methods, device, terminal and the computer readable storage medium of medicine are purchased in violation of rules and regulations
CN109636652A (en) * 2018-12-13 2019-04-16 平安医疗健康管理股份有限公司 Purchase monitoring method, monitoring service end and the storage medium of medicine abnormal behavior
CN109636635A (en) * 2018-12-13 2019-04-16 平安医疗健康管理股份有限公司 Recognition methods, device, terminal and the computer readable storage medium write a prescription in violation of rules and regulations
CN110349004A (en) * 2019-07-02 2019-10-18 北京淇瑀信息科技有限公司 Risk of fraud method for detecting and device based on user node relational network
CN110555455A (en) * 2019-06-18 2019-12-10 东华大学 Online transaction fraud detection method based on entity relationship

Non-Patent Citations (2)

Title
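蔡小雨, 陈可佳, 安琛: "Bipartite graph link prediction method using group information", Computer Engineering *
郭松, 张冬雯, 许云峰, 杨玉林, 郑雅洁, 柳晨光: "Evaluation of the CoDA community detection algorithm on directed networks", Journal of Hebei University of Science and Technology *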

Similar Documents

Publication Publication Date Title
CN108960833B (en) Abnormal transaction identification method, equipment and storage medium based on heterogeneous financial characteristics
Van Vlasselaer et al. Gotcha! network-based fraud detection for social security fraud
CN107657536A (en) The recognition methods of social security fraud and device
CN110084610A (en) A kind of network trading fraud detection system based on twin neural network
CN110415107B (en) Data processing method, data processing device, storage medium and electronic equipment
TWI752349B (en) Risk identification method and device
CN105095238A (en) Decision tree generation method used for detecting fraudulent trade
CN108648038B (en) Credit frying and malicious evaluation identification method based on subgraph mining
CN109472626B (en) Intelligent financial risk control method and system for mobile phone leasing service
CN110998608A (en) Machine learning system for various computer applications
Cui et al. Detecting community structure via the maximal sub-graphs and belonging degrees in complex networks
CN110992059B (en) Surrounding string behavior recognition analysis method based on big data
CN106845521A (en) A kind of block chain node clustering method of Behavior-based control time series
CN108960304A (en) A kind of deep learning detection method of network trading fraud
CN111105317A (en) Medical insurance fraud detection method based on medicine purchase record
CN106529110A (en) Classification method and equipment of user data
CN109934615A (en) Product marketing method based on depth sparse network
CN112927072A (en) Block chain-based anti-money laundering arbitration method, system and related device
CN110119980A (en) A kind of anti-fraud method, apparatus, system and recording medium for credit
Lo et al. Mining direct antagonistic communities in signed social networks
CN111275480B (en) Multi-dimensional sparse sales data warehouse oriented fraud behavior mining method
CN112529415A (en) Article scoring method based on combined multi-receptive-field-map neural network
CN110223786B (en) Method and system for predicting drug-drug interaction based on nonnegative tensor decomposition
CN115455457B (en) Chain data management method, system and storage medium based on intelligent big data
CN109472694A (en) A kind of suspicious trading activity discovery system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant