CN108596770A - Medicare fraud detection device and method based on outlier analysis - Google Patents

Info

Publication number
CN108596770A
Authority
CN
China
Prior art keywords
patient
outlier
data
score
similarity score
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711471001.4A
Other languages
Chinese (zh)
Other versions
CN108596770B (en)
Inventor
王新军
闫中敏
陈志勇
姜诚
于杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DAREWAY SOFTWARE Co Ltd
Original Assignee
DAREWAY SOFTWARE Co Ltd
Application filed by DAREWAY SOFTWARE Co Ltd
Priority to CN201711471001.4A
Publication of CN108596770A
Application granted
Publication of CN108596770B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08: Insurance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/2433: Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present invention provides a medical insurance fraud detection device and detection method based on outlier analysis. The device comprises a medical insurance data acquisition module, a medical insurance data preprocessing module, a similarity score calculation module, an outlier detection module, and a patient fraud detection module. The invention improves existing outlier analysis methods through data preprocessing and adjusts each patient's similarity score to suit the medical insurance domain. The method combines similarity with outlier analysis: a similarity score is first computed for each patient, and outlier analysis is then performed on the scores. Statistics show that the distribution of similarity scores is approximately normal, so a threshold is combined with a confidence interval to determine a flexible critical value for detecting outliers.

Description

Medicare fraud detection device and method based on outlier analysis
Technical Field
The invention belongs to the field of medical insurance and in particular relates to a medical insurance fraud detection method based on outlier analysis.
Background
The National Health Care Anti-Fraud Association (NHCAA) defines medical insurance fraud as "deliberate deception or false statements by an individual or organization so that he, she, or the organization obtains undue benefit". With the rapid development of social insurance, medical insurance fraud has also intensified. It poses a serious threat to the safety of medical insurance funds and hinders the implementation of medical insurance policy.
Traditional medical insurance fraud detection methods are mostly rule-based. As medical insurance has developed, the field has accumulated large amounts of data, including the diagnosis information, itemized treatment details, and itemized prescriptions held in medical insurance settlement systems, as well as digital medical records. Together these form medical service big data that conceals a great deal of knowledge and regularities about medical services. Based on this big data, the present invention proposes a medical insurance fraud detection method based on outlier analysis.
Summary of the Invention
To improve the effectiveness and precision of fraud detection, the present invention proposes a medical insurance fraud detection device and method based on outlier analysis.
The object of the invention is to provide a device and a method for medical insurance fraud detection based on outlier analysis. The innovation of the invention is that it computes a similarity score for each patient and combines a threshold with a confidence interval to design an evaluation algorithm that performs fraud detection in the medical insurance field.
To achieve this object, the invention adopts the following technical scheme:
A medical insurance fraud detection device based on outlier analysis comprises: a medical insurance data acquisition module 100, which obtains the hospitalization records, medication records, and diagnosis records of a regional medical insurance agency, including basic patient information, medication information, disease information, and treatment information; a medical insurance data preprocessing module 200, which preprocesses the raw data set using data cleaning techniques and a pharmacopoeia; a similarity score calculation module 300, which computes a similarity score for each patient over a heterogeneous network and adjusts the score according to the number of diseases that the patient's drugs can treat; an outlier detection module 400, which combines a fixed threshold with a confidence interval to set a flexible critical value and searches for outliers iteratively; and a patient fraud detection module 500, which identifies the outliers found by the outlier analysis as patients suspected of fraud.
A medical insurance fraud detection method based on outlier analysis comprises the following steps: step S1, obtain the actual medical insurance visit records of a region; step S2, preprocess the data set using data cleaning and a pharmacopoeia; step S3, extract patient, disease, and drug information and build a heterogeneous network; step S4, compute the similarity scores of the different patients by similarity calculation; step S5, perform outlier analysis by combining a threshold with a confidence interval to distinguish normal patients from fraudulent ones and finally determine the patients suspected of fraud.
Preferably, in the medical insurance fraud detection method based on outlier analysis, obtaining the actual medical insurance visit records of a region comprises the following steps: step S101, obtain a large volume of visit-related data from the region's medical insurance agency, retaining useful data and removing useless data; step S102, extract data such as basic patient information records, medication records, and treatment records.
Preferably, in the medical insurance fraud detection method based on outlier analysis, preprocessing the data set using data cleaning and a pharmacopoeia comprises the following steps: step S201, extract the visit data of the patients who are to undergo fraud detection; step S202, use data cleaning techniques to handle sensitive patient data and data with a high missing rate, and ensure that each patient has at least three visit records; step S203, classify the drug information by consulting the pharmacopoeia, merging many individual drugs into a common category according to the correspondence between drugs and categories.
Preferably, in the medical insurance fraud detection method based on outlier analysis, extracting patient, disease, and drug information and building a heterogeneous network comprises the following steps: step S301, analyze the preprocessed patient data set, focusing on basic patient information, the diseases each patient suffers from, and the drugs each patient takes; step S302, extract the basic patient information, drug information, and disease information from the data set and build the patients' heterogeneous information network.
Preferably, in the medical insurance fraud detection method based on outlier analysis, computing the similarity scores of the different patients by similarity calculation comprises the following steps: step S401, first analyze the constructed heterogeneous network; the similarity score computed for each patient reflects the correlation between patients. Because the raw score grows as the data volume grows, a visibility factor is taken into account, expressed in the following form:
$K(v_a, v_a) = |\pi_{P_{sym}}(v_a, v_a)|$
that is, the number of instances of the symmetric meta-path $P_{sym}$ that start and end at patient $v_a$.
Step S402, adjust the score according to the number of diseases treated by the patient's drugs: if a patient uses the same drug to treat many diseases, this is considered abnormal and points are deducted from the patient's score. After calculation and score adjustment, the similarity score between two patients can be expressed as: [formula not reproduced in this text]
where $N_K$ is the number of diseases treated with the k-th drug and t is the number of the patient's diseases. Step S403, compute the similarity scores between a candidate-group patient and all patients in the reference group, and take their average as the final score reflecting how similar the patient is to normal patients. The final score can be expressed as:
$\bar{S} = \frac{1}{m}\sum_{i=1}^{m} S_i$
where m is the number of patients in the reference group and $S_i$ is the similarity score computed against the i-th patient of the reference group.
Preferably, in the medical insurance fraud detection method based on outlier analysis, performing outlier analysis by combining a threshold with a confidence interval comprises the following steps: step S501, after the patients' similarity scores have been computed, analyze the scores statistically; the distribution of the similarity scores is found to be close to a normal distribution, so the threshold is computed using the properties of the normal distribution; step S502, use the confidence interval of the normal distribution to help determine a flexible critical value for detecting outliers, so that the critical value adjusts automatically to an appropriate value when other factors change. The critical value, that is, the threshold, can be expressed as: [formula not reproduced in this text]
where Y is the computed threshold and the remaining term of the formula is the mean of the patients' similarity scores. Step S503, iteratively compute the critical value and compare it with the similarity scores; a score below the critical value is considered an outlier, and outliers are screened in this way. One pass may find some outliers, but these may be only part of all the outliers. The outliers found are therefore deleted, the critical value is recomputed from the remaining data, and the new outliers are deleted. This process is repeated until no new outlier is found, and the last critical value is taken as the final value. Step S504, identify the outliers found as patients suspected of fraud.
Compared with the prior art, the beneficial effects of the invention are as follows:
(1) The invention improves existing outlier analysis methods through data preprocessing and adjusts each patient's similarity score to suit the medical insurance domain. The method combines similarity with outlier analysis: a similarity score is first computed for each patient, and outlier analysis is then performed on the scores.
(2) The invention designs an evaluation algorithm that combines a threshold with a confidence interval to perform fraud detection. Statistics show that the distribution of similarity scores is approximately normal, so the threshold is combined with a confidence interval to determine a flexible critical value for detecting outliers.
Brief Description of the Drawings
The accompanying drawings, which form a part of this application, provide a further understanding of the application. The illustrative embodiments of the application and their descriptions serve to explain the application and do not unduly limit it.
Fig. 1 is a structural schematic diagram of the medical insurance fraud detection device based on outlier analysis according to an exemplary embodiment of the inventive concept;
Fig. 2 is an overall flow chart of the medical insurance fraud detection method based on outlier analysis according to an exemplary embodiment of the inventive concept;
Fig. 3 is a flow chart of the medical insurance data acquisition step of the medical insurance fraud detection method based on outlier analysis according to an exemplary embodiment of the inventive concept;
Fig. 4 is a flow chart of the medical insurance data preprocessing step of the medical insurance fraud detection method based on outlier analysis according to an exemplary embodiment of the inventive concept;
Fig. 5 is a flow chart of the information extraction and heterogeneous network construction step of the medical insurance fraud detection method based on outlier analysis according to an exemplary embodiment of the inventive concept;
Fig. 6 is a flow chart of the similarity score calculation and adjustment step of the medical insurance fraud detection method based on outlier analysis according to an exemplary embodiment of the inventive concept;
Fig. 7 is a flow chart of the outlier analysis fraud detection step of the medical insurance fraud detection method based on outlier analysis according to an exemplary embodiment of the inventive concept.
Detailed Description
The invention is further described below with reference to the accompanying drawings and embodiments.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the application belongs.
It should also be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the illustrative embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. It should further be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
The invention addresses a phenomenon in actual medical practice that has been reported in the open literature. After analyzing the visit records of medical insurance patients in a region, it extracts the basic patient information, drug information, disease information, and so on, and detects patients suspected of fraud by means of similarity score calculation and flexible critical value calculation. The invention proposes a method of fraud detection by outlier analysis that has two aspects:
First, it considers similarity and outliers together: a similarity score is computed for each patient using similarity, and outlier analysis is then performed on the patients' scores.
Second, it combines threshold calculation with a confidence interval to determine a flexible critical value that adjusts itself as the data changes, in order to distinguish fraudulent patients from normal patients.
The terms used in the invention are defined as follows:
Heterogeneous network: a heterogeneous information network can be represented by a graph G = (V, E), where V is the set of vertices and E is the set of edges. It can be constructed from many interconnected, large-scale data sets in areas ranging from society and science to engineering. The medical field can be modeled as a medical information network whose vertices may include doctors, patients, diseases, treatments, equipment, and so on. The edges between vertices can describe relationships such as a patient taking a drug, a patient suffering from a disease, or a patient visiting a doctor.
Meta-path: a meta-path is a path formed by connecting multiple vertex types. It systematically reflects the relationships between different vertices in a heterogeneous information network.
Where no ambiguity arises, a meta-path can be written directly as its sequence of vertex types. For example, A->P->A can be written as APA. If $\text{Meta-Path} = (\text{Meta-Path})^{-1}$, the meta-path is symmetric; examples are APA, APVPA, and APTPA.
The invention designs a symmetric meta-path, PMDMP, over patients (P), drugs (M), and diseases (D). An instance of this symmetric meta-path means that patients $p_1$ and $p_2$ take drugs that are associated with the same disease. The number of symmetric meta-path instances can then be used as a score that reflects the similarity between patients.
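The symmetry condition above, a meta-path equal to its own reverse, can be checked mechanically. The following minimal sketch assumes each vertex type is written as a single character, as in the examples APA and PMDMP; the function name `is_symmetric` is hypothetical and does not come from the patent.

```python
def is_symmetric(meta_path: str) -> bool:
    """Return True if the meta-path equals its reverse.

    Assumes one character per vertex type, e.g. "PMDMP" for
    patient-drug-disease-drug-patient.
    """
    return meta_path == meta_path[::-1]


print(is_symmetric("PMDMP"))  # symmetric: reads the same in both directions
print(is_symmetric("PMD"))    # not symmetric: its reverse is "DMP"
```

A symmetric meta-path always starts and ends at the same vertex type, which is why PMDMP can relate one patient to another.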
Threshold: a threshold is a boundary, so it is also called a critical value. It is the lowest or highest value at which an effect can occur, and the concept is widely used across scientific fields. A well-chosen threshold leads to better results.
Confidence interval: an interval estimate of a population parameter constructed from sample statistics. In statistics, the confidence interval of a probability sample is an interval estimate of some population parameter of that sample. It expresses the degree to which the true value of the parameter falls, with a certain probability, around the measured result. The confidence interval gives the credibility of the measured value of the parameter, that is, the "certain probability" mentioned above.
Fig. 1 is a structural schematic diagram of the medical insurance fraud detection device based on outlier analysis according to an exemplary embodiment of the inventive concept.
As shown in Fig. 1, the medical insurance fraud detection device based on outlier analysis according to an exemplary embodiment of the inventive concept comprises:
A medical insurance data acquisition module 100, which obtains the hospitalization records, medication records, and diagnosis records of a regional medical insurance agency, including basic patient information, medication information, disease information, and treatment information;
A medical insurance data preprocessing module 200, which preprocesses the raw data set using data cleaning techniques and a pharmacopoeia; wherein,
The data set suffers from problems such as missing and inconsistent data; for example, basic patient information and drug dosages may be missing. The invention uses data cleaning techniques to de-identify sensitive data, which preserves the integrity and confidentiality of the information, and to handle data with a high missing rate. Preferably, the pharmacopoeia is the Chinese Pharmacopoeia (2015 edition), which is used to extract fine-grained drugs and group them into coarse-grained drug categories, solving the problem of processing drug information;
A similarity score calculation module 300, which computes a similarity score for each patient over the heterogeneous network; wherein,
In the similarity score calculation module, a similarity score is computed for each patient by analyzing the heterogeneous network, and the score is adjusted according to the number of diseases that the patient's drugs can treat;
An outlier detection module 400, which combines a fixed threshold with a confidence interval to set a flexible critical value and searches for outliers iteratively; wherein,
In the outlier detection module 400, statistics show that the distribution of similarity scores is approximately normal, so the confidence interval of the normal distribution is taken into account when computing the critical value. The flexible critical value is computed and compared with the similarity scores to find the outliers;
A patient fraud detection module 500, which identifies the outliers found by the outlier analysis as patients suspected of fraud; wherein,
In the fraud detection module 500, normal patients and fraudulent patients are distinguished according to the outliers obtained from the outlier analysis, and the outliers found are identified as patients suspected of fraud.
Fig. 2 is an overall flow chart of the medical insurance fraud detection method based on outlier analysis according to an exemplary embodiment of the inventive concept.
As shown in Fig. 2, the medical insurance fraud detection method based on outlier analysis according to an exemplary embodiment of the inventive concept comprises:
Step S1, obtain the actual medical insurance visit records of a region;
Specifically, in practice, the actual medical insurance visit records of a region are obtained, and the usable patient-related information they contain, such as basic patient information, drug information, and disease information, is extracted for medical fraud detection.
Step S2, preprocess the data set using data cleaning and a pharmacopoeia;
The data set suffers from problems such as missing and inconsistent data; for example, basic patient information and drug dosages may be missing, so most of the data cannot be used directly. In the medical field, for example, there are too many kinds of drugs and too little data for each individual drug, so it is difficult to analyze each drug individually;
Preferably, the invention uses the Chinese Pharmacopoeia (2015 edition) to solve the problem of processing drug information. By consulting the Chinese Pharmacopoeia (2015 edition), fine-grained drugs can be extracted and grouped into coarse-grained drug categories according to the drug classification already recorded there;
The invention uses data cleaning techniques to de-identify sensitive data and to delete data with a high missing rate, which preserves the integrity and confidentiality of the information while handling high-missing-rate data.
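The preprocessing described in step S2 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the record layout (`patient_id`, `disease`, `drug`), the `DRUG_CATEGORY` map, and the use of a truncated SHA-256 hash for de-identification are all hypothetical. In the patent the drug categories would come from the Chinese Pharmacopoeia (2015 edition), and the three-record minimum comes from step S202.

```python
import hashlib
from collections import defaultdict

# Hypothetical fine-grained drug -> coarse category map (illustrative values only).
DRUG_CATEGORY = {
    "amoxicillin capsule": "antibiotic",
    "cefaclor tablet": "antibiotic",
    "metformin tablet": "antidiabetic",
}


def pseudonymize(patient_id: str) -> str:
    """Replace a raw patient identifier with a short one-way hash."""
    return hashlib.sha256(patient_id.encode("utf-8")).hexdigest()[:12]


def preprocess(records, min_visits=3):
    """Clean raw visit records.

    Each record is a dict with keys 'patient_id', 'disease', 'drug'.
    Records with missing fields or unmapped drugs are dropped, and
    patients left with fewer than `min_visits` records are removed.
    """
    by_patient = defaultdict(list)
    for rec in records:
        if not all(rec.get(k) for k in ("patient_id", "disease", "drug")):
            continue  # drop incomplete records
        category = DRUG_CATEGORY.get(rec["drug"])
        if category is None:
            continue  # drop drugs the category map does not cover
        by_patient[pseudonymize(rec["patient_id"])].append(
            {"disease": rec["disease"], "drug_category": category}
        )
    return {p: v for p, v in by_patient.items() if len(v) >= min_visits}
```

Grouping drugs into categories before any scoring is what makes the later similarity counts meaningful when each individual drug appears only a few times.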
Step S3, extract patient, disease, and drug information and build a heterogeneous network;
Basic patient information, drug information, and disease information are extracted from the visit records obtained, and a heterogeneous information network is built from the connections among them. Each item appears as a vertex in the heterogeneous network, and the connections between vertices describe facts such as a patient taking a certain drug or a certain drug treating a certain disease.
Each time a patient takes a drug, it is because the patient suffers from some disease. In this process the information about patients, drugs, and diseases is interconnected and forms a huge network, which can be regarded as a heterogeneous network with these three kinds of entities as vertices.
In this medical scenario, the heterogeneous information network that is built consists of three kinds of vertices: patients (P), drugs (M), and diseases (D). A meta-path is then used to describe the relationships among the three, and can be designed as:
$\text{Meta-Path} = (V_1 V_2 \ldots V_n)$ or $(\text{Meta-Path})^{-1} = (V_n V_{n-1} \ldots V_1)$
The invention designs a symmetric meta-path of the form P-M-D-M-P, in preparation for the later calculation of each patient's similarity score.
Step S4, compute the similarity scores of the different patients by similarity calculation;
Using the constructed heterogeneous network, the similarity between each patient and the other patients is computed. Then, from the perspective of diseases, the computed score is further adjusted for the abnormal situation in which the same drug is used to treat many diseases. The scores between a candidate-group patient and all patients in the reference group are computed, and their average is taken as the final score reflecting how similar the patient is to normal patients.
Specifically, after the heterogeneous network has been built, the degree of similarity between patients is reflected by the patients' similarity scores. The connection between vertex $p_1$ and vertex $p_2$ can be expressed as the number of instances of the symmetric meta-path $P_{sym}$ between the two vertices, where:
$P_{sym} = (\text{Meta-Path})(\text{Meta-Path})^{-1} = (PMDMP)$
An instance of the symmetric meta-path means that patients $p_1$ and $p_2$ take drugs that are associated with the same disease. The number of symmetric meta-path instances can then be used as a score that reflects the similarity between patients, expressed by the following formula:
$K(v_a, v_b) = |\pi_{P_{sym}}(v_a, v_b)|$
When the similarity score is computed in this way, the score may grow as the data volume grows, which is a defect. The following formula is therefore designed to solve this problem:
$K(v_a, v_a) = |\pi_{P_{sym}}(v_a, v_a)|$
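The counts K(va, vb) and K(va, va) above can be computed directly from the two edge sets. The sketch below counts P-M-D-M-P instances and then divides by the self-counts in the style of PathSim, 2·K(a, b) / (K(a, a) + K(b, b)). That normalization is an assumption: it is one common way to keep the score from growing with data volume and may differ from the patent's own formula. All function names are hypothetical.

```python
from collections import Counter


def half_path_counts(drugs, drug_diseases):
    """Count P-M-D instances from one patient's drugs to each disease."""
    counts = Counter()
    for drug in set(drugs):
        for disease in set(drug_diseases.get(drug, ())):
            counts[disease] += 1
    return counts


def path_count(patient_drugs, drug_diseases, a, b):
    """K(a, b): number of P-M-D-M-P instances between patients a and b."""
    ca = half_path_counts(patient_drugs[a], drug_diseases)
    cb = half_path_counts(patient_drugs[b], drug_diseases)
    # Each shared disease joins every half-path of a with every half-path of b.
    return sum(n * cb[d] for d, n in ca.items())


def normalized_similarity(patient_drugs, drug_diseases, a, b):
    """Assumed PathSim-style score: 2*K(a,b) / (K(a,a) + K(b,b))."""
    denom = (path_count(patient_drugs, drug_diseases, a, a)
             + path_count(patient_drugs, drug_diseases, b, b))
    if denom == 0:
        return 0.0
    return 2.0 * path_count(patient_drugs, drug_diseases, a, b) / denom


patient_drugs = {"p1": {"m1", "m2"}, "p2": {"m2"}, "p3": {"m3"}}
drug_diseases = {"m1": {"d1"}, "m2": {"d1", "d2"}, "m3": {"d3"}}
print(path_count(patient_drugs, drug_diseases, "p1", "p2"))  # 3
```

In this toy network the three instances between p1 and p2 are m1-d1-m2, m2-d1-m2, and m2-d2-m2. With K(p1, p1) = 5 and K(p2, p2) = 2, the normalized score is 6/7, and a patient compared with itself always scores 1.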
In the medical insurance field, it is normal for one drug to be used to treat several diseases, although this also implies that the drug is not specific to a single disease. If a patient uses the same drug to treat many diseases, this is considered abnormal, and the patient's similarity score is adjusted by deducting points.
Specifically, after calculation and adjustment, the similarity score between two patients can be expressed as: [formula not reproduced in this text]
where the patient's score is adjusted by taking the number of diseases into account: $N_K$ is the number of diseases treated with the k-th drug, and t is the number of diseases the patient suffers from.
Step S5, perform outlier analysis by combining a threshold with a confidence interval to distinguish normal patients from fraudulent ones, and finally determine the patients suspected of fraud;
After the patients' similarity scores have been computed, the invention applies an evaluation algorithm for detecting outliers that combines a threshold with a confidence interval. Statistics show that the distribution of similarity scores is approximately normal. The invention therefore uses the confidence interval of the normal distribution, that is, the interval from the mean minus the standard deviation to the mean plus the standard deviation. The critical value is computed flexibly to determine the outliers and thereby find the fraudulent patients.
The steps of the medical insurance fraud detection method based on outlier analysis according to an exemplary embodiment of the inventive concept are described in detail below, as shown in Fig. 3 to Fig. 7:
Step S101, obtain a large volume of visit-related data from the region's medical insurance agency, retaining useful data and removing useless data;
Step S102, extract data such as basic patient information records, medication records, and treatment records.
Step S201, extract the visit data of the patients who are to undergo fraud detection;
Step S202, use data cleaning techniques to handle sensitive patient data and data with a high missing rate (ensuring that each patient has at least three visit records);
Step S203, classify the drug information by consulting the Chinese Pharmacopoeia (2015 edition), merging many individual drugs into a common category according to the correspondence between drugs and categories.
Step S301, analyze the preprocessed patient data set, focusing on basic patient information, the diseases each patient suffers from, and the drugs each patient takes;
Step S302, extract the basic patient information, drug information, and disease information from the data set and build the patients' heterogeneous information network.
Step S401, first analyze the constructed heterogeneous network; the similarity score computed for each patient reflects the correlation between patients.
The raw score grows as the data volume grows, which is unreasonable. To solve this problem, the invention adjusts for the visibility factor, which can be expressed in the following form:
$K(v_a, v_a) = |\pi_{P_{sym}}(v_a, v_a)|$
Step S402, adjust the score according to the number of diseases treated by the patient's drugs: if a patient uses the same drug to treat many diseases, this is considered abnormal and points are deducted from the patient's score. After calculation and score adjustment, the similarity score between two patients can be expressed as: [formula not reproduced in this text]
where $N_K$ is the number of diseases treated with the k-th drug and t is the number of the patient's diseases.
Step S403, compute the similarity scores between a candidate-group patient and all patients in the reference group, and take their average as the final score reflecting how similar the patient is to normal patients. The final score can be expressed as:
$\bar{S} = \frac{1}{m}\sum_{i=1}^{m} S_i$
where m is the number of patients in the reference group and $S_i$ is the similarity score computed against the i-th patient of the reference group.
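Step S403 reduces to taking a mean over the reference group. The sketch below assumes a pairwise similarity function is supplied by the caller; `final_score` and its parameters are hypothetical names, and the toy lookup table stands in for whatever adjusted similarity the method computes.

```python
def final_score(candidate, reference_group, similarity):
    """Average similarity of `candidate` to every patient in the reference group.

    similarity: callable (patient_a, patient_b) -> float.
    Returns 0.0 for an empty reference group.
    """
    scores = [similarity(candidate, ref) for ref in reference_group]
    return sum(scores) / len(scores) if scores else 0.0


# Toy pairwise scores for one candidate against three reference patients.
table = {("c", "r1"): 0.8, ("c", "r2"): 0.6, ("c", "r3"): 0.7}
score = final_score("c", ["r1", "r2", "r3"], lambda a, b: table[(a, b)])
```

Averaging over m reference patients keeps the final score on the same scale regardless of how large the reference group is.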
Step S501: once the patients' similarity scores have been calculated, statistical analysis shows that their distribution is close to a normal distribution, so the properties of the normal distribution are used in calculating the threshold;
Step S502: the confidence interval of the normal distribution is used to help determine a flexible critical value for detecting outliers, so that when other factors change the critical value adjusts automatically to an appropriate value. The critical value, i.e. the threshold, can be expressed by the formula,
where Y is the calculated threshold and the mean in the formula is the average of the patients' similarity scores.
Step S503: iteratively calculate the critical value and compare it with the similarity scores; a score below the critical value is considered an outlier, and outliers are screened in this way;
A single pass finds some outliers, but these may be only part of all the outliers. They are deleted, the critical value is recalculated from the remaining data, and the newly found outliers are deleted in turn. This process is repeated until no new outliers are found, and the last critical value is taken as the final value.
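The iterative screening described above can be sketched in Python. The threshold formula is not reproduced in this text, so the sketch assumes a lower bound of the form mean minus z times the standard deviation, with z = 1.96 as an illustrative confidence level; the function name `iterative_outliers` and the sample scores are likewise hypothetical.

```python
from statistics import mean, pstdev


def iterative_outliers(scores, z=1.96):
    """Repeatedly remove scores below an assumed threshold of mean - z * std.

    `scores` maps patient id -> similarity score. Returns the outliers found
    and the last threshold computed (None if fewer than two scores are given).
    """
    remaining = dict(scores)
    outliers = {}
    threshold = None
    while len(remaining) >= 2:
        values = list(remaining.values())
        threshold = mean(values) - z * pstdev(values)
        found = {p: s for p, s in remaining.items() if s < threshold}
        if not found:
            break  # no new outliers: the last threshold is the final value
        outliers.update(found)
        for p in found:
            del remaining[p]
    return outliers, threshold


# Ten ordinary patients and one with a markedly low similarity score.
scores = {
    "p0": 0.80, "p1": 0.82, "p2": 0.78, "p3": 0.81, "p4": 0.79,
    "p5": 0.80, "p6": 0.83, "p7": 0.77, "p8": 0.80, "p9": 0.80,
    "p10": 0.10,
}
outliers, threshold = iterative_outliers(scores)
```

Recomputing the threshold after each removal matters because an extreme score inflates the standard deviation and so lowers the first threshold; once it is removed, the threshold tightens around the remaining scores.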
Step S504: the outliers found are determined to be patients suspected of fraud.
In conclusion the present invention in the fraud detection to patient, utilizes the essential information of patient, medication in medical insurance data Information and disease information are analyzed and are pre-processed to the historgraphic data recording of patient, and disease is calculated by the heterogeneous network of structure The similarity score of people.Outlier analysis is carried out in such a way that confidence interval is combined threshold value again, so can according to The outlier arrived distinguishes normal patient and fraud patient.
The foregoing describes only preferred embodiments of the present application and is not intended to limit it; those skilled in the art may make various modifications and variations to the application. Any modification, equivalent replacement or improvement made within the spirit and principles of the application shall fall within its scope of protection.
Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the invention. Those skilled in the art should understand that any modification or variation that can be made on the basis of the technical solution of the present invention without creative effort remains within its scope of protection.

Claims (7)

1. A medical insurance fraud detection device based on outlier analysis, characterized by comprising:
a medical insurance data acquisition module 100, which obtains the hospitalization, medication and diagnosis records of a local medical insurance institution, including patients' basic information, medication information, disease information and medical treatment information;
a medical insurance data preprocessing module 200, which preprocesses the raw data set using data cleaning techniques and a pharmacopeia;
a similarity score calculation module 300, which calculates a similarity score for each patient through a heterogeneous network and adjusts the score according to the number of diseases treated by the patient's drugs;
an outlier detection module 400, which sets a flexible critical value by combining a fixed threshold with a confidence interval and searches for outliers iteratively;
a patient fraud detection module 500, which, through outlier analysis, determines the outliers found to be patients suspected of fraud.
2. A medical insurance fraud detection method based on outlier analysis, characterized by comprising the following steps:
Step S1: obtain actual medical insurance visit records from a given region;
Step S2: preprocess the data set using data cleaning and a pharmacopeia;
Step S3: extract patient, disease and drug information and build a heterogeneous network;
Step S4: calculate the similarity scores of different patients using a similarity measure;
Step S5: perform outlier analysis by combining a threshold with a confidence interval to distinguish normal patients from fraudulent ones, and finally determine the patients suspected of fraud.
3. The medical insurance fraud detection method based on outlier analysis according to claim 2, characterized in that obtaining the actual medical insurance visit records from a given region comprises the following steps:
Step S101: obtain a large volume of treatment-related data from a local medical insurance organization, retaining useful data and removing useless data;
Step S102: extract data such as patients' basic information records, medication records and treatment records.
4. The medical insurance fraud detection method based on outlier analysis according to claim 2, characterized in that preprocessing the data set using data cleaning and a pharmacopeia comprises the following steps:
Step S201: extract the visit data of the patients who are to undergo fraud detection;
Step S202: use data cleaning techniques to handle sensitive patient data and data with a high missing rate, ensuring that each patient has no fewer than three visit records;
Step S203: classify the drug information by consulting the pharmacopeia, unifying the many similar drugs into the same category according to the correspondence between drugs and categories.
5. The medical insurance fraud detection method based on outlier analysis according to claim 2, characterized in that extracting patient, disease and drug information and building a heterogeneous network comprises the following steps:
Step S301: analyze the preprocessed patient data set, focusing on the patients' basic information, disease information and medication records;
Step S302: extract the patients' basic information, drug information and disease information from the data set and build the patients' heterogeneous information network.
6. The medical insurance fraud detection method based on outlier analysis according to claim 2, characterized in that calculating the similarity scores of different patients using a similarity measure comprises the following steps:
Step S401: first analyze the constructed heterogeneous network and calculate a similarity score for each patient, which reflects how closely the patients are related,
to address the problem that the score grows as the volume of data grows, the visibility factor is expressed in the following form:
Step S402: adjust the score according to the number of diseases treated with each of the patient's drugs; if a patient uses the same drug to treat many diseases, this is considered abnormal and the patient's score is reduced; after the calculation and the score adjustment, the similarity score between two patients can be expressed as:
where N_k is the number of diseases treated with the k-th drug, and t is the number of the patient's diseases;
Step S403: calculate the similarity scores between the candidate group and all patients in the reference group, and take the average as the final score reflecting how similar the candidate is to normal patients; the final score can be expressed by the following formula:
where m is the number of patients in the reference group and the averaged quantity is the similarity score with respect to each of those patients.
7. The medical insurance fraud detection method based on outlier analysis according to claim 2, characterized in that performing outlier analysis by combining a threshold with a confidence interval comprises the following steps:
Step S501: once the patients' similarity scores have been calculated, statistical analysis shows that their distribution is close to a normal distribution, and the properties of the normal distribution are used in calculating the threshold;
Step S502: use the confidence interval of the normal distribution to help determine a flexible critical value for detecting outliers, so that when other factors change the critical value adjusts automatically to an appropriate value; the critical value, i.e. the threshold, can be expressed by the formula:
where Y is the calculated threshold and the mean in the formula is the average of the patients' similarity scores;
Step S503: iteratively calculate the critical value and compare it with the similarity scores; a score below the critical value is considered an outlier, and outliers are screened in this way,
a single pass finds some outliers, but these may be only part of all the outliers; these outliers are deleted, the critical value is recalculated from the remaining data, and the newly found outliers are deleted in turn; this process is repeated until no new outliers are found, and the last critical value is taken as the final value;
Step S504: the outliers found are determined to be patients suspected of fraud.
CN201711471001.4A 2017-12-29 2017-12-29 Medical insurance fraud detection device and method based on outlier analysis Active CN108596770B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711471001.4A CN108596770B (en) 2017-12-29 2017-12-29 Medical insurance fraud detection device and method based on outlier analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711471001.4A CN108596770B (en) 2017-12-29 2017-12-29 Medical insurance fraud detection device and method based on outlier analysis

Publications (2)

Publication Number Publication Date
CN108596770A (en) 2018-09-28
CN108596770B (en) 2022-04-01

Family

ID=63633408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711471001.4A Active CN108596770B (en) 2017-12-29 2017-12-29 Medical insurance fraud detection device and method based on outlier analysis

Country Status (1)

Country Link
CN (1) CN108596770B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359873A (en) * 2018-10-24 2019-02-19 哈工大机器人(山东)智能装备研究院 One kind being based on PCA-T2Ball screw assembly, health evaluating method
CN109616176A (en) * 2018-12-04 2019-04-12 平安医疗健康管理股份有限公司 Method, apparatus, equipment and the storage medium that auxiliary doctor prescribes
CN109615204A (en) * 2018-11-30 2019-04-12 平安医疗健康管理股份有限公司 Method for evaluating quality, device, equipment and the readable storage medium storing program for executing of medical data
CN109636645A (en) * 2018-12-13 2019-04-16 平安医疗健康管理股份有限公司 Medical insurance monitoring and managing method, unit and computer readable storage medium
CN109659000A (en) * 2018-12-13 2019-04-19 平安医疗健康管理股份有限公司 Recognition methods, device, terminal and the computer readable storage medium of violation prescription
CN109919780A (en) * 2019-01-23 2019-06-21 平安科技(深圳)有限公司 Claims Resolution based on figure computing technique is counter to cheat method, apparatus, equipment and storage medium
CN110322356A (en) * 2019-04-22 2019-10-11 山东大学 The medical insurance method for detecting abnormality and system of dynamic multi-mode are excavated based on HIN
CN110349662A (en) * 2019-05-23 2019-10-18 复旦大学 The outliers across image collection that result is accidentally surveyed for filtering pulmonary masses find method and system
CN110827159A (en) * 2019-11-11 2020-02-21 上海交通大学 Financial medical insurance fraud early warning method, device and terminal based on relational graph
CN111105317A (en) * 2019-12-28 2020-05-05 哈尔滨工业大学 Medical insurance fraud detection method based on medicine purchase record
CN111127207A (en) * 2019-12-28 2020-05-08 哈尔滨工业大学 Block chain-based drug sales fraud supervision system and supervision method thereof
WO2020119114A1 (en) * 2018-12-13 2020-06-18 平安医疗健康管理股份有限公司 Method, device, and equipment for test data screening, and storage medium
CN111462897A (en) * 2020-04-01 2020-07-28 山东大学 Patient similarity analysis method and system based on improved heterogeneous information network
CN111597336A (en) * 2020-05-14 2020-08-28 腾讯科技(深圳)有限公司 Processing method and device of training text, electronic equipment and readable storage medium
CN111612636A (en) * 2020-04-29 2020-09-01 山东大学 Abnormal medical insurance data detection system and method based on dual clustering algorithm
CN112435133A (en) * 2020-11-18 2021-03-02 厦门理工学院 Medical insurance combined fraud detection method, device and equipment based on graph analysis

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090024427A1 (en) * 2007-07-19 2009-01-22 Shan Jerry Z Analyzing time series data that exhibits seasonal effects
CN102013084A (en) * 2010-12-14 2011-04-13 江苏大学 System and method for detecting fraudulent transactions in medical insurance outpatient services
CN104111996A (en) * 2014-07-07 2014-10-22 山大地纬软件股份有限公司 Health insurance outpatient clinic big data extraction system and method based on hadoop platform
CN104408547A (en) * 2014-10-30 2015-03-11 浙江网新恒天软件有限公司 Data-mining-based detection method for medical insurance fraud behavior
CN105159948A (en) * 2015-08-12 2015-12-16 成都数联易康科技有限公司 Medical insurance fraud detection method based on multiple features
CN105868555A (en) * 2016-03-29 2016-08-17 陈杰 Similarity calculation method based on commercial medical insurance claim cases
CN106408141A (en) * 2015-07-28 2017-02-15 平安科技(深圳)有限公司 Abnormal expense automatic extraction system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
金华 等: "医保欺诈行为的主动发现", 《中国会议》 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359873A (en) * 2018-10-24 2019-02-19 哈工大机器人(山东)智能装备研究院 One kind being based on PCA-T2Ball screw assembly, health evaluating method
CN109359873B (en) * 2018-10-24 2021-04-20 哈工大机器人(山东)智能装备研究院 Based on PCA-T2Health assessment method for ball screw pair
CN109615204A (en) * 2018-11-30 2019-04-12 平安医疗健康管理股份有限公司 Method for evaluating quality, device, equipment and the readable storage medium storing program for executing of medical data
CN109615204B (en) * 2018-11-30 2023-02-03 平安医疗健康管理股份有限公司 Quality evaluation method, device and equipment of medical data and readable storage medium
CN109616176A (en) * 2018-12-04 2019-04-12 平安医疗健康管理股份有限公司 Method, apparatus, equipment and the storage medium that auxiliary doctor prescribes
CN109659000A (en) * 2018-12-13 2019-04-19 平安医疗健康管理股份有限公司 Recognition methods, device, terminal and the computer readable storage medium of violation prescription
WO2020119114A1 (en) * 2018-12-13 2020-06-18 平安医疗健康管理股份有限公司 Method, device, and equipment for test data screening, and storage medium
CN109636645A (en) * 2018-12-13 2019-04-16 平安医疗健康管理股份有限公司 Medical insurance monitoring and managing method, unit and computer readable storage medium
US20210326995A1 (en) * 2019-01-23 2021-10-21 Ping An Technology (Shenzhen) Co., Ltd. Claim settlement anti-fraud method, apparatus, device, and storage medium based on graph computation technology
CN109919780A (en) * 2019-01-23 2019-06-21 平安科技(深圳)有限公司 Claims Resolution based on figure computing technique is counter to cheat method, apparatus, equipment and storage medium
WO2020151321A1 (en) * 2019-01-23 2020-07-30 平安科技(深圳)有限公司 Graph computation-based claim anti-fraud method, apparatus and device, and storage medium
CN110322356A (en) * 2019-04-22 2019-10-11 山东大学 The medical insurance method for detecting abnormality and system of dynamic multi-mode are excavated based on HIN
CN110349662A (en) * 2019-05-23 2019-10-18 复旦大学 The outliers across image collection that result is accidentally surveyed for filtering pulmonary masses find method and system
CN110349662B (en) * 2019-05-23 2023-01-13 复旦大学 Cross-image set outlier sample discovery method and system for filtering lung mass misdetection results
CN110827159A (en) * 2019-11-11 2020-02-21 上海交通大学 Financial medical insurance fraud early warning method, device and terminal based on relational graph
CN110827159B (en) * 2019-11-11 2023-11-03 上海交通大学 Financial medical insurance fraud early warning method, device and terminal based on relation diagram
CN111105317A (en) * 2019-12-28 2020-05-05 哈尔滨工业大学 Medical insurance fraud detection method based on medicine purchase record
CN111105317B (en) * 2019-12-28 2023-05-12 哈尔滨工业大学 Medical insurance fraud detection method based on medicine purchasing record
CN111127207A (en) * 2019-12-28 2020-05-08 哈尔滨工业大学 Block chain-based drug sales fraud supervision system and supervision method thereof
CN111462897A (en) * 2020-04-01 2020-07-28 山东大学 Patient similarity analysis method and system based on improved heterogeneous information network
CN111462897B (en) * 2020-04-01 2021-05-11 山东大学 Patient similarity analysis method and system based on improved heterogeneous information network
CN111612636A (en) * 2020-04-29 2020-09-01 山东大学 Abnormal medical insurance data detection system and method based on dual clustering algorithm
CN111597336A (en) * 2020-05-14 2020-08-28 腾讯科技(深圳)有限公司 Processing method and device of training text, electronic equipment and readable storage medium
CN111597336B (en) * 2020-05-14 2023-12-22 腾讯科技(深圳)有限公司 Training text processing method and device, electronic equipment and readable storage medium
CN112435133A (en) * 2020-11-18 2021-03-02 厦门理工学院 Medical insurance combined fraud detection method, device and equipment based on graph analysis

Also Published As

Publication number Publication date
CN108596770B (en) 2022-04-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant