CN108596770B - Medical insurance fraud detection device and method based on outlier analysis - Google Patents
- Publication number
- CN108596770B (application number CN201711471001.4A)
- Authority
- CN
- China
- Prior art keywords
- patient
- data
- outlier
- medical insurance
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
Abstract
The invention provides a medical insurance fraud detection device and detection method based on outlier analysis. The device comprises the following modules: a medical insurance data acquisition module; a medical insurance data preprocessing module; a similarity score calculation module; an outlier detection module; and a patient fraud detection module. The invention improves the existing outlier analysis method through data preprocessing and adjusts the similarity score of each patient to make it suitable for the medical insurance field. On this basis, similarity and outlier analysis are combined: a similarity score is calculated for each patient, and the scores are then analyzed for outliers. Statistics show that the distribution of the similarity scores is approximately normal, so a flexible critical value for detecting outliers is determined by combining a threshold with a confidence interval.
Description
Technical Field
The invention belongs to the field of medical insurance, and particularly relates to a medical insurance fraud detection method based on outlier analysis.
Background
The National Health Care Anti-Fraud Association (NHCAA) defines medical insurance fraud as: "deliberately deceptive or fraudulent presentation by a person or organization to gain an illicit benefit to the person or organization".
Most traditional medical insurance fraud detection methods are rule-based. With the development of the medical insurance business, a large amount of data has accumulated in the medical insurance field, including medical diagnosis information, diagnosis and treatment details, prescription details and digital medical files in medical insurance settlement systems. This forms medical service big data, in which a large amount of medical service knowledge and rules are hidden. Based on this big data, the invention provides a medical insurance fraud detection method based on outlier analysis.
Disclosure of Invention
In order to improve the effect and accuracy of fraud detection, the invention provides a medical insurance fraud detection device and method based on outlier analysis. The innovation of the invention is an evaluation algorithm that realizes fraud detection in the medical insurance field by calculating a similarity score for each patient and applying a critical value that combines a threshold with a confidence interval.
In order to achieve the purpose, the invention adopts the following technical scheme:
An apparatus for detecting medical insurance fraud based on outlier analysis, comprising: a medical insurance data acquisition module 100, which acquires hospitalization records, medication records and treatment records from medical insurance institutions in a certain area, the medical insurance records comprising basic information, medication information, disease information, diagnosis and treatment information and the like of patients; a medical insurance data preprocessing module 200, which preprocesses the original data set by using a data cleaning technology and a pharmacopoeia; a similarity score calculation module 300, which calculates a similarity score for each patient through a heterogeneous network and adjusts the score according to the number of diseases treated by the patient's medicines; an outlier detection module 400, which sets a flexible critical value by combining a fixed threshold with a confidence interval and searches for outliers iteratively; and a patient fraud detection module 500, which determines the outliers found through outlier analysis to be suspected fraudulent patients.
A medical insurance fraud detection method based on outlier analysis comprises the following steps: step S1, obtaining the actual medical record in the medical insurance in a certain area; step S2, preprocessing the data set by using data cleaning and pharmacopoeia; step S3, extracting information of patients, diseases and medicines to construct a heterogeneous network; step S4, calculating similarity scores of different patients by using the similarity; step S5, performing outlier analysis by combining threshold and confidence intervals to distinguish normal and fraudulent patients, and finally determining patients suspected of being fraudulent.
Preferably, the method for detecting fraud in medical insurance based on outlier analysis, wherein the step of obtaining the actual medical insurance medical record in a certain area comprises the following steps: step S101, acquiring a large amount of medical care related data of a medical insurance institution in a certain area, reserving useful data and removing useless data; step S102, extracting data such as patient basic information record, medication record, treatment record and the like.
Preferably, the method for detecting fraud in medical insurance based on outlier analysis, wherein the pre-processing of the data set using data cleaning and pharmacopoeia comprises the following steps: step S201, extracting hospitalizing data of a patient to be subjected to fraud detection; step S202, processing patient sensitive data and data with a high missing rate by using a data cleaning technology, and ensuring that each patient has no fewer than three medical records; step S203, processing the medicine information by category by consulting the pharmacopoeia, and unifying a plurality of similar medicines into the same category according to the correspondence between medicines and categories.
Preferably, the method for detecting medical insurance fraud based on outlier analysis, wherein the extracting information of patients, diseases and drugs to construct a heterogeneous network comprises the following steps: step S301, analyzing the preprocessed patient data set, mainly analyzing basic information of the patient, diseased information of the patient and medication condition of the patient; step S302, extracting basic information, medicine information and disease information of the patient in the data set, and constructing a heterogeneous information network of the patient.
Preferably, the method for detecting medical insurance fraud based on outlier analysis, wherein the calculating of the similarity scores of different patients by using the similarities comprises the following steps: step S401, the constructed heterogeneous network is analyzed and a similarity score is calculated for each patient to reflect the relatedness between patients; because the raw score grows as the data volume grows, a visibility factor is introduced, expressed in the following form: [formula not reproduced in the source text]
step S402, the score is adjusted according to the number of diseases treated by the patient's medicines: if a patient treats many diseases with the same medicine, this is considered abnormal and the patient's score is reduced; the similarity score between two patients can be expressed by the formula: [formula not reproduced in the source text]
Preferably, the method for detecting fraud in medical insurance based on outlier analysis, wherein the outlier analysis by combining the threshold and the confidence interval comprises the following steps: step S501, after the similarity scores of the patients are calculated, the scores are analyzed statistically, their distribution is found to be close to a normal distribution, and the threshold is calculated using the properties of the normal distribution; step S502, a flexible critical value for detecting outliers is determined with the aid of the confidence interval of the normal distribution, so that the critical value adjusts automatically to a suitable value when other factors change; the critical value, namely the threshold, can be expressed by the formula: [formula not reproduced in the source text]
Compared with the prior art, the invention has the following beneficial effects:
(1) The present invention improves the existing outlier analysis method through data preprocessing and adjusts the similarity score of each patient to make it suitable for the medical insurance field. On this basis, similarity and outlier analysis are combined: a similarity score is calculated for each patient, and outlier analysis is then performed on the scores.
(2) The invention designs an evaluation algorithm combining a threshold value and a confidence interval for fraud detection. Through statistics, the distribution of the similarity scores can be found to be similar to the normal distribution, and a mode of combining a threshold value and a confidence interval is adopted to determine a flexible critical value for detecting outliers.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a schematic structural diagram of a medical insurance fraud detection apparatus based on outlier analysis according to an exemplary embodiment of the inventive concept;
FIG. 2 is a general flow diagram of a method for detecting medical insurance fraud based on outlier analysis, according to an exemplary embodiment of the inventive concept;
FIG. 3 is a flowchart of medical insurance data acquisition steps of a medical insurance fraud detection method based on outlier analysis, according to an exemplary embodiment of the inventive concept;
FIG. 4 is a flowchart of medical insurance data pre-processing steps of a medical insurance fraud detection method based on outlier analysis, according to an exemplary embodiment of the inventive concept;
FIG. 5 is a flowchart of the information extraction and heterogeneous network construction steps of a method for medical insurance fraud detection based on outlier analysis, according to an exemplary embodiment of the inventive concept;
FIG. 6 is a flowchart of similarity score calculation and adjustment steps of a method for detecting medical insurance fraud based on outlier analysis, according to an exemplary embodiment of the present inventive concept;
FIG. 7 is a flowchart of outlier analysis fraud detection steps of an outlier analysis-based medical insurance fraud detection method according to an exemplary embodiment of the inventive concept.
Detailed Description
The invention is further described with reference to the following figures and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The invention addresses a real-world medical phenomenon reported in the public literature. After the hospitalization records of medical insurance patients in a certain area are analyzed, basic information, medicine information, disease information and the like of the patients are extracted, and suspected fraudulent patients are detected by means of similarity score calculation and flexible critical value calculation. The invention provides a method for fraud detection by outlier analysis, characterized as follows:
First, the two factors of similarity and outlierness are considered together: a similarity score is calculated for each patient, and the patients' scores are then subjected to outlier analysis and evaluation.
Second, a flexible critical value is determined by combining threshold calculation with a confidence interval; it adjusts itself as the data change, so that fraudulent patients can be distinguished from normal patients.
The following terms are used in the invention:
Heterogeneous network: a heterogeneous information network can be represented by a graph G = (V, E), where V is the set of vertices and E is the set of edges. Such a network can be constructed from many interconnected, large-scale data sets in social, scientific, engineering and other domains. The medical domain can also be modeled as a medical information network whose vertices can include doctors, patients, diseases, treatments, devices, etc., and whose edges describe relationships such as patient to medication, patient to disease, patient to doctor visit, etc.
Meta-path: a meta-path is a path connecting a sequence of vertex types (for example, patient, medicine, patient), and it systematically reflects the association between different vertices in a heterogeneous information network.
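By way of illustration only, the following minimal sketch counts the instances of one meta-path, Patient to Medicine to Patient, on a toy set of edges. The edge list, the function name `count_pdp_paths` and the choice of this particular meta-path are assumptions made for the example; they do not come from the patent text.

```python
from collections import defaultdict

# Hypothetical toy edges: (patient, drug) pairs meaning "patient takes drug".
takes = [
    ("p1", "aspirin"), ("p1", "metformin"),
    ("p2", "aspirin"), ("p2", "metformin"),
    ("p3", "insulin"),
]

def count_pdp_paths(takes):
    """Count instances of the meta-path Patient -> Drug -> Patient."""
    patients_by_drug = defaultdict(set)
    for patient, drug in takes:
        patients_by_drug[drug].add(patient)
    counts = defaultdict(int)
    # Every pair of patients sharing a drug contributes one path instance.
    for patients in patients_by_drug.values():
        for a in patients:
            for b in patients:
                counts[(a, b)] += 1
    return counts

paths = count_pdp_paths(takes)
```

Two patients who share many medicines are joined by many path instances, while patients with no medicine in common are joined by none, which is the sense in which a meta-path reflects association between vertices.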
Threshold value: a threshold, also called a critical value, is a limit: the lowest or highest value at which an effect is produced. The concept is widely used across scientific fields, and a well-chosen threshold yields better results.
Confidence interval: an interval estimate of a population parameter, constructed from sample statistics. It gives a range around the measured value within which the true value of the parameter falls with a stated probability (the confidence level), and thus indicates how reliable the measured value is.
Fig. 1 is a schematic structural diagram of a medical insurance fraud detection apparatus based on outlier analysis according to an exemplary embodiment of the inventive concept.
As shown in fig. 1, the medical insurance fraud detection apparatus based on outlier analysis according to an exemplary embodiment of the inventive concept includes:
the medical insurance data acquisition module 100 is used for acquiring hospitalization records, medication records and treatment records of medical insurance institutions in certain areas, wherein the medical insurance records comprise basic information, medication information, disease information, diagnosis and treatment information and the like of patients;
the medical insurance data preprocessing module 200 is used for preprocessing the data of the original data set by using a data cleaning technology and pharmacopoeia; wherein,
because the data set suffers from problems such as missing and inconsistent data (for example, missing patient basic information or medicine dosages), the method uses a data cleaning technology to encrypt sensitive data, ensuring the integrity and confidentiality of the information, and to handle data with a high missing rate. Preferably, the pharmacopoeia is the Chinese Pharmacopoeia (2015 edition), which is used to group fine-grained medicines into coarse-grained categories, thereby solving the problem of medicine information processing;
a similarity score calculating module 300 for calculating a similarity score for each patient through a heterogeneous network; wherein,
in the similarity score calculating module, calculating a similarity score for each patient by analyzing a heterogeneous network, and adjusting the similarity score by considering the number of diseases which can be treated by the medicines of the patient;
the outlier detection module 400 sets a flexible critical value by combining a fixed threshold value and a confidence interval to perform outlier iterative search; wherein,
in the outlier detection module 400, statistics show that the distribution of the similarity scores is approximately normal, so the confidence interval of the normal distribution is taken into account in the threshold calculation. A flexible critical value is calculated and compared with the similarity scores to find the outliers among them;
a patient fraud detection module 500 that determines the found outliers as suspected fraudulent patients by outlier analysis; wherein,
in the fraud detection module 500, by outlier analysis, normal patients and fraudulent patients can be distinguished according to the obtained outliers, and the found outliers are determined as suspected fraudulent patients.
Fig. 2 is a general flowchart of a method for detecting medical insurance fraud based on outlier analysis according to an exemplary embodiment of the inventive concept.
As shown in fig. 2, the medical insurance fraud detection method based on outlier analysis according to an exemplary embodiment of the inventive concept includes:
step S1, obtaining the actual medical record in the medical insurance in a certain area;
in a specific implementation, a medical record of actual medical insurance in a certain area is obtained, and a lot of available patient-related information, such as patient basic information, medicine information, disease information and the like, is extracted from the medical record for medical fraud detection.
Step S2, preprocessing the data set by using data cleaning and pharmacopoeia;
because the data set suffers from problems such as missing and inconsistent data (for example, missing patient basic information or medicine dosages), most of the data cannot be used directly. In addition, there are so many kinds of medicine, each with so little data, that the specific situation of each individual medicine is difficult to analyze;
preferably, the invention uses the Chinese Pharmacopoeia (2015 edition) to solve the problem of medicine information processing. By consulting the Chinese Pharmacopoeia (2015 edition), fine-grained medicines can be grouped into coarse-grained categories according to the recorded medicine types;
the invention uses a data cleaning technology to encrypt sensitive data and to delete data with a high missing rate, ensuring the integrity and confidentiality of the information.
Step S3, extracting information of patients, diseases and medicines to construct a heterogeneous network;
basic information, medicine information and disease information of the patients are extracted from the acquired medical records, and a heterogeneous information network is established according to their mutual relations. Each piece of information appears in the network as a vertex, and the edges between vertices describe relations such as a patient taking a certain medicine or a medicine treating a certain disease.
A patient takes a certain medicine each time because of a certain disease. In this process the patients, medicines and diseases are linked to one another and form a large network, which can be regarded as a heterogeneous network with these three types of entity as vertices.
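The construction described in step S3 can be sketched as follows. This is a minimal illustration that assumes each cleaned record is a (patient, medicine category, disease) triple; the record layout, the edge-type names `takes`, `treats` and `has`, and the function `build_network` are hypothetical and are not specified in the patent text.

```python
# Hypothetical cleaned records: (patient_id, drug_category, disease).
records = [
    ("p1", "antihypertensive", "hypertension"),
    ("p1", "antidiabetic", "diabetes"),
    ("p2", "antihypertensive", "hypertension"),
]

def build_network(records):
    """Return typed vertex sets and typed edge sets of a heterogeneous network."""
    vertices = {"patient": set(), "drug": set(), "disease": set()}
    edges = {"takes": set(), "treats": set(), "has": set()}
    for patient, drug, disease in records:
        vertices["patient"].add(patient)
        vertices["drug"].add(drug)
        vertices["disease"].add(disease)
        edges["takes"].add((patient, drug))    # patient takes drug
        edges["treats"].add((drug, disease))   # drug is used to treat disease
        edges["has"].add((patient, disease))   # patient has disease
    return vertices, edges

vertices, edges = build_network(records)
```

Keeping vertices and edges separated by type is what makes the network heterogeneous: a later step can follow only `takes` edges, for example, when comparing the medicines of two patients.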
Step S4, calculating similarity scores of different patients by using the similarity;
the similarity of each patient to other patients is calculated through the constructed heterogeneous network. From the disease perspective, the calculated score is further adjusted for the abnormal situation in which the same medicine is used to treat many diseases. Scores are calculated between each candidate patient and all patients in the reference group, and their average is taken as the final score, which reflects how similar the candidate patient is to normal patients.
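The scoring formulas themselves are not reproduced in this text, so the following is only a sketch under stated assumptions. It substitutes a PathSim-style normalized overlap on the Patient to Medicine to Patient meta-path for the patent's similarity formula, and it subtracts a fixed penalty for every medicine that the candidate used for more than `max_diseases` diseases. The function names, `max_diseases` and `penalty` are all hypothetical values chosen for illustration.

```python
def pathsim(drugs_a, drugs_b):
    """PathSim-style score on the Patient-Drug-Patient meta-path (assumed form)."""
    denom = len(drugs_a) + len(drugs_b)
    return 2 * len(drugs_a & drugs_b) / denom if denom else 0.0

def adjusted_score(candidate, reference_group, usage, max_diseases=2, penalty=0.1):
    """Average similarity of `candidate` to a non-empty reference group, minus a penalty.

    usage maps patient -> {drug: set of diseases that patient used the drug for}.
    max_diseases and penalty are illustrative values, not taken from the patent.
    """
    cand_drugs = set(usage[candidate])
    sims = [pathsim(cand_drugs, set(usage[ref])) for ref in reference_group]
    mean_sim = sum(sims) / len(sims)
    # A drug used by the candidate for unusually many diseases is treated as suspicious.
    suspicious = sum(1 for d in usage[candidate].values() if len(d) > max_diseases)
    return mean_sim - penalty * suspicious

usage = {
    "p1": {"A": {"flu"}, "B": {"cough"}},
    "p2": {"A": {"flu"}, "B": {"cough"}},
    "p3": {"A": {"flu", "cough", "asthma"}, "C": {"gout"}},
}
score_p1 = adjusted_score("p1", ["p2"], usage)
score_p3 = adjusted_score("p3", ["p1", "p2"], usage)
```

Normalizing by the number of medicines of both patients keeps the score from growing simply because a patient has more records, which is the concern the description raises about the raw score.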
Step S5, performing outlier analysis by combining a threshold value and a confidence interval to distinguish normal and fraud patients, and finally determining suspected fraud patients;
after the similarity scores of the patients are calculated, the invention applies an evaluation algorithm to detect outliers. Statistics show that the distribution of the similarity scores is approximately normal, so the threshold is combined with a confidence interval. The invention takes the confidence interval of the normal distribution to be the range from the mean minus the standard deviation to the mean plus the standard deviation. The threshold is thus calculated flexibly to determine outliers and find fraudulent patients.
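As a small worked example of the interval described above, the sketch below computes the mean minus and plus one standard deviation for a list of scores and the share of scores that fall inside that interval. For a normal distribution about 68% of values lie within one standard deviation of the mean, so a share near that figure is consistent with the approximately normal distribution the description reports. The score values and function names are made up for illustration, and the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev

def one_sigma_interval(scores):
    """Return (mean - std, mean + std) for a list of similarity scores."""
    mu, sigma = mean(scores), pstdev(scores)
    return mu - sigma, mu + sigma

def coverage(scores, interval):
    """Fraction of scores falling inside the closed interval."""
    low, high = interval
    return sum(low <= s <= high for s in scores) / len(scores)

scores = [0.42, 0.48, 0.50, 0.51, 0.53, 0.55, 0.58, 0.61]  # made-up values
interval = one_sigma_interval(scores)
share = coverage(scores, interval)
```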
The steps of the medical insurance fraud detection method based on outlier analysis according to an exemplary embodiment of the inventive concept are specifically set forth below, as shown in fig. 3 to 7:
step S101, acquiring a large amount of medical care related data of a medical insurance institution in a certain area, reserving useful data and removing useless data;
step S102, extracting data such as patient basic information record, medication record, treatment record and the like.
Step S201, extracting hospitalizing data of a patient to be subjected to fraud detection;
step S202, processing patient sensitive data and data with a high missing rate by using a data cleaning technology (ensuring that each patient has no fewer than three medical records);
step S203, the medicine information is processed by category by inquiring Chinese pharmacopoeia (2015 edition), and a plurality of similar medicines are unified into the same category according to the corresponding relationship between the medicines and the categories.
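Steps S201 to S203 can be sketched as below. The sketch assumes that records are dictionaries with `patient`, `drug` and `disease` keys, that rows with a missing field are simply dropped, and that a small lookup table stands in for the pharmacopoeia; `DRUG_CATEGORY` and `preprocess` are hypothetical names. The handling of sensitive data is omitted.

```python
from collections import Counter

# Hypothetical drug-to-category table standing in for a pharmacopoeia lookup.
DRUG_CATEGORY = {
    "amoxicillin": "antibiotic",
    "cefalexin": "antibiotic",
    "metformin": "antidiabetic",
}

def preprocess(records, min_records=3):
    """Drop incomplete rows, map drugs to categories, keep patients with enough records."""
    complete = [r for r in records if all(r.get(k) for k in ("patient", "drug", "disease"))]
    # Unknown drugs keep their original name rather than being discarded.
    mapped = [{**r, "drug": DRUG_CATEGORY.get(r["drug"], r["drug"])} for r in complete]
    per_patient = Counter(r["patient"] for r in mapped)
    return [r for r in mapped if per_patient[r["patient"]] >= min_records]

raw = [
    {"patient": "p1", "drug": "amoxicillin", "disease": "pneumonia"},
    {"patient": "p1", "drug": "cefalexin", "disease": "pneumonia"},
    {"patient": "p1", "drug": "metformin", "disease": "diabetes"},
    {"patient": "p2", "drug": "metformin", "disease": None},
    {"patient": "p2", "drug": "metformin", "disease": "diabetes"},
]
clean = preprocess(raw)
```

In this toy input the second patient is removed because only one complete record remains, which mirrors the requirement of no fewer than three medical records per patient.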
Step S301, analyzing the preprocessed patient data set, mainly analyzing basic information of the patient, diseased information of the patient and medication condition of the patient;
step S302, extracting basic information, medicine information and disease information of the patient in the data set, and constructing a heterogeneous information network of the patient.
Step S401, the constructed heterogeneous network is analyzed, and a similarity score is calculated for each patient to reflect the relatedness between patients.
The raw score, however, increases as the amount of data increases, which is not reasonable. To solve this problem, the invention applies a visibility factor, which can be expressed in the following form: [formula not reproduced in the source text]
step S501, after calculating the similarity score of the patient, performing statistical analysis on the score to find that the distribution of the similarity score of the patient is close to normal distribution, and performing threshold calculation by considering the property of the normal distribution;
step S502, a flexible critical value for detecting outliers is determined with the aid of the confidence interval of the normal distribution, so that the critical value adjusts automatically to a suitable value when other factors change; the critical value, namely the threshold, can be expressed by a formula: [formula not reproduced in the source text]
step S503, the critical value is calculated iteratively and compared with the similarity scores; a score lower than the critical value is regarded as an outlier, and outliers are screened out in this way;
each iteration may find some outliers, but these may be only a fraction of all outliers. They are deleted, the critical value is recalculated from the remaining data, and the newly found outliers are deleted in turn. This process is repeated until no new outliers are found, and the last critical value is taken as the final value.
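The iterative search of step S503 can be sketched as follows. Because the critical-value formula is not reproduced in this text, the sketch assumes one plausible form, the larger of a fixed threshold and the mean minus `k` standard deviations; `fixed_threshold`, `k` and the function name `find_outliers` are hypothetical and are not taken from the patent.

```python
from statistics import mean, pstdev

def find_outliers(scores, fixed_threshold=0.2, k=2.0):
    """Iteratively flag patients whose score falls below a recomputed critical value.

    scores maps patient id -> similarity score. The critical value used here,
    max(fixed_threshold, mean - k * std), is an assumed form, not the patent's formula.
    """
    remaining = dict(scores)
    outliers = set()
    critical = fixed_threshold
    while remaining:
        values = list(remaining.values())
        critical = max(fixed_threshold, mean(values) - k * pstdev(values))
        new = {p for p, s in remaining.items() if s < critical}
        if not new:
            break  # no new outliers: the last critical value is final
        outliers |= new
        for p in new:
            del remaining[p]
    return outliers, critical

scores = {"p1": 0.50, "p2": 0.52, "p3": 0.48, "p4": 0.51,
          "p5": 0.49, "p6": 0.50, "p7": 0.05}
outliers, critical = find_outliers(scores)
```

Removing the flagged scores before recomputing matters because an extreme score inflates the standard deviation and lowers the critical value; once it is removed, the critical value tightens around the remaining patients.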
Step S504, the found outliers are determined to be suspected fraudulent patients.
In conclusion, in the fraud detection of the patient, the invention analyzes and preprocesses the historical data record of the patient by using the basic information, the medication information and the disease information of the patient in the medical insurance data, and calculates the similarity score of the patient through the constructed heterogeneous network. And then, outlier analysis is carried out in a mode of combining a threshold value and a confidence interval, so that normal patients and fraudulent patients can be distinguished according to the obtained outliers.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention, and those skilled in the art should understand that various modifications and variations can be made without inventive effort on the basis of the technical solution of the present invention.
Claims (5)
1. A medical insurance fraud detection method based on outlier analysis is characterized by comprising the following steps:
step S1, obtaining the actual medical record in the medical insurance in a certain area;
step S2, preprocessing the data set by using data cleaning and pharmacopoeia;
step S3, extracting information of patients, diseases and medicines to construct a heterogeneous network;
step S4, calculating similarity scores of different patients by using the similarity;
step S5, performing outlier analysis by combining a threshold value and a confidence interval to distinguish normal and fraud patients, and finally determining suspected fraud patients;
the outlier analysis combining the threshold value and the confidence interval comprises the following steps:
step S501, after the similarity scores of the patients are calculated, statistically analyzing the scores, finding that their distribution is close to a normal distribution, and calculating the threshold using the properties of the normal distribution;
step S502, using a confidence interval of the normal distribution to help determine a flexible critical value for detecting outliers, so that the critical value is automatically adjusted to a suitable value when other factors change, the critical value, namely the threshold value, being expressed by a formula:
step S503, iteratively calculating the critical value and comparing it with the similarity scores, a score lower than the critical value being regarded as an outlier, and screening outliers in this way; a single pass finds only a fraction of all outliers, so the outliers found are deleted, the threshold is recalculated from the remaining data, and the newly found outliers are deleted in turn, this process being repeated until no new outliers can be found, and the last threshold being taken as the final value;
step S504, the found outliers are determined to be suspected fraudulent patients.
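The iterative procedure of steps S501 to S504 can be sketched as follows. The formula for the critical value is not reproduced in this text, so the sketch assumes a one-sided bound of the form mean minus `k` standard deviations, with `k = 1.96` taken from the two-sided 95% confidence interval of a normal distribution. The function name `iterative_outliers` and its parameters are likewise hypothetical.

```python
import statistics


def iterative_outliers(scores, k=1.96):
    """Flag low similarity scores repeatedly until no new outliers appear.

    scores: dict mapping patient id -> similarity score.
    k: assumed multiplier on the standard deviation (not given in the source).
    Returns (outliers, final_threshold).
    """
    remaining = dict(scores)
    outliers = {}
    threshold = None
    while len(remaining) >= 2:
        values = list(remaining.values())
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)
        threshold = mu - k * sigma  # assumed form of the critical value
        new = {p: s for p, s in remaining.items() if s < threshold}
        if not new:
            break  # no new outliers: the last threshold is the final value
        outliers.update(new)
        for patient in new:
            del remaining[patient]  # recompute on the remaining data
    return outliers, threshold
```

Because extreme scores inflate the standard deviation, a single pass can hide milder outliers; recomputing the bound on the remaining data is what lets later passes find them.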
2. The method of claim 1, wherein the step of obtaining the medical record of the actual medical insurance in the area comprises the steps of:
step S101, acquiring a large amount of medical-care-related data from a medical insurance institution in a given region, retaining useful data and removing useless data;
step S102, extracting basic information record, medication record and treatment record data of the patient.
3. The method of claim 2, wherein preprocessing the data set using data cleaning and a pharmacopoeia comprises the following steps:
step S201, extracting the medical treatment data of the patients to be subjected to fraud detection;
step S202, processing sensitive patient data and data with a high missing rate by using a data cleaning technology, and ensuring that each patient has no fewer than three medical records;
step S203, processing the categories of the medicine information by consulting the pharmacopoeia, and unifying similar medicines into the same category according to the correspondence between medicines and categories.
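A minimal sketch of steps S202 and S203 is given below, under stated assumptions: records are dictionaries with hypothetical keys `patient_id`, `disease` and `drug`, incomplete records are simply dropped (the text does not say how missing data is handled), and the pharmacopoeia is represented by a plain drug-to-category dictionary.

```python
from collections import Counter

REQUIRED_FIELDS = ("patient_id", "disease", "drug")  # hypothetical schema


def preprocess(records, drug_to_category, min_records=3):
    """Drop incomplete records, keep patients with enough records,
    and replace each drug by its pharmacopoeia category."""
    # Assumed cleaning rule: discard any record with a missing field.
    complete = [r for r in records
                if all(r.get(field) for field in REQUIRED_FIELDS)]
    counts = Counter(r["patient_id"] for r in complete)
    cleaned = []
    for r in complete:
        if counts[r["patient_id"]] < min_records:
            continue  # fewer than three medical records for this patient
        # Unmapped drugs keep their original name.
        category = drug_to_category.get(r["drug"], r["drug"])
        cleaned.append({**r, "drug": category})
    return cleaned
```

Mapping drugs to categories before building the network keeps near-identical medicines from appearing as unrelated nodes.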
4. The method of claim 1, wherein the step of extracting information of patients, diseases and drugs to construct a heterogeneous network comprises the steps of:
step S301, analyzing the preprocessed patient data set, mainly the basic information, the disease information and the medication information of each patient;
step S302, extracting basic information, medicine information and disease information of the patient in the data set, and constructing a heterogeneous information network of the patient.
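One way to picture the heterogeneous information network of step S302 is as a graph whose nodes carry a type. The sketch below is an assumption-laden illustration: it takes records as `(patient_id, disease, drug_category)` tuples, stores the network as an adjacency map of `(type, value)` nodes, and omits the patient's basic information, which could be added as further node types. The name `build_hin` is hypothetical.

```python
from collections import defaultdict


def build_hin(records):
    """Build an undirected typed graph from (patient, disease, drug) tuples.

    Nodes are (node_type, value) pairs; edges link each patient to the
    diseases and drug categories that appear in that patient's records.
    """
    adjacency = defaultdict(set)
    for patient_id, disease, drug_category in records:
        patient = ("patient", patient_id)
        for neighbour in (("disease", disease), ("drug", drug_category)):
            adjacency[patient].add(neighbour)
            adjacency[neighbour].add(patient)
    return adjacency
```

With this layout, two patients are connected through every disease or drug node they share, which is the structure a similarity measure over the network can exploit.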
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711471001.4A CN108596770B (en) | 2017-12-29 | 2017-12-29 | Medical insurance fraud detection device and method based on outlier analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711471001.4A CN108596770B (en) | 2017-12-29 | 2017-12-29 | Medical insurance fraud detection device and method based on outlier analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108596770A (en) | 2018-09-28 |
CN108596770B (en) | 2022-04-01 |
Family
ID=63633408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711471001.4A Active CN108596770B (en) | 2017-12-29 | 2017-12-29 | Medical insurance fraud detection device and method based on outlier analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108596770B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109359873B (en) * | 2018-10-24 | 2021-04-20 | 哈工大机器人(山东)智能装备研究院 | Health assessment method for ball screw pair based on PCA-T2 |
CN109615204B (en) * | 2018-11-30 | 2023-02-03 | 平安医疗健康管理股份有限公司 | Quality evaluation method, device and equipment of medical data and readable storage medium |
CN109616176A (en) * | 2018-12-04 | 2019-04-12 | 平安医疗健康管理股份有限公司 | Method, apparatus, equipment and the storage medium that auxiliary doctor prescribes |
CN109669935A (en) * | 2018-12-13 | 2019-04-23 | 平安医疗健康管理股份有限公司 | Check data screening method, apparatus, equipment and storage medium |
CN109636645A (en) * | 2018-12-13 | 2019-04-16 | 平安医疗健康管理股份有限公司 | Medical insurance monitoring and managing method, unit and computer readable storage medium |
CN109659000A (en) * | 2018-12-13 | 2019-04-19 | 平安医疗健康管理股份有限公司 | Recognition methods, device, terminal and the computer readable storage medium of violation prescription |
CN109919780B (en) * | 2019-01-23 | 2024-07-09 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for settling claims and resisting fraud based on graph computing technology |
CN110322356B (en) * | 2019-04-22 | 2020-08-07 | 山东大学 | Medical insurance abnormity detection method and system based on HIN mining dynamic multi-mode |
CN110349662B (en) * | 2019-05-23 | 2023-01-13 | 复旦大学 | Cross-image set outlier sample discovery method and system for filtering lung mass misdetection results |
CN110827159B (en) * | 2019-11-11 | 2023-11-03 | 上海交通大学 | Financial medical insurance fraud early warning method, device and terminal based on relation diagram |
CN111105317B (en) * | 2019-12-28 | 2023-05-12 | 哈尔滨工业大学 | Medical insurance fraud detection method based on medicine purchasing record |
CN111127207B (en) * | 2019-12-28 | 2023-06-09 | 哈尔滨工业大学 | Pharmaceutical sales fraud supervision system and supervision method based on blockchain |
CN111462897B (en) * | 2020-04-01 | 2021-05-11 | 山东大学 | Patient similarity analysis method and system based on improved heterogeneous information network |
CN111612636A (en) * | 2020-04-29 | 2020-09-01 | 山东大学 | Abnormal medical insurance data detection system and method based on dual clustering algorithm |
CN111597336B (en) * | 2020-05-14 | 2023-12-22 | 腾讯科技(深圳)有限公司 | Training text processing method and device, electronic equipment and readable storage medium |
CN112435133A (en) * | 2020-11-18 | 2021-03-02 | 厦门理工学院 | Medical insurance combined fraud detection method, device and equipment based on graph analysis |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102013084A (en) * | 2010-12-14 | 2011-04-13 | 江苏大学 | System and method for detecting fraudulent transactions in medical insurance outpatient services |
CN104111996A (en) * | 2014-07-07 | 2014-10-22 | 山大地纬软件股份有限公司 | Health insurance outpatient clinic big data extraction system and method based on hadoop platform |
CN104408547A (en) * | 2014-10-30 | 2015-03-11 | 浙江网新恒天软件有限公司 | Data-mining-based detection method for medical insurance fraud behavior |
CN105159948A (en) * | 2015-08-12 | 2015-12-16 | 成都数联易康科技有限公司 | Medical insurance fraud detection method based on multiple features |
CN105868555A (en) * | 2016-03-29 | 2016-08-17 | 陈杰 | Similarity calculation method based on commercial medical insurance claim cases |
CN106408141A (en) * | 2015-07-28 | 2017-02-15 | 平安科技(深圳)有限公司 | Abnormal expense automatic extraction system and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7865389B2 (en) * | 2007-07-19 | 2011-01-04 | Hewlett-Packard Development Company, L.P. | Analyzing time series data that exhibits seasonal effects |
Non-Patent Citations (1)
Title |
---|
Active Discovery of Medical Insurance Fraud (医保欺诈行为的主动发现); Jin Hua (金华) et al.; China Conference (《中国会议》); 2015-11-29; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN108596770A (en) | 2018-09-28 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108596770B (en) | Medical insurance fraud detection device and method based on outlier analysis | |
US11967074B2 (en) | Method and system for computer-aided triage | |
US11631175B2 (en) | AI-based heat map generating system and methods for use therewith | |
US11158406B2 (en) | Automatic patient recruitment system | |
WO2021051938A1 (en) | Data anomaly analysis method and system employing graph analysis and computer device | |
Robinson Jr et al. | Clinical epidemiology and treatment patterns of patients with multicentric Castleman disease: results from two US treatment centres | |
Karthikeyan et al. | A Novel Deep Learning‐Based Black Fungus Disease Identification Using Modified Hybrid Learning Methodology | |
Manju et al. | Efficient multi-level lung cancer prediction model using support vector machine classifier | |
US12061994B2 (en) | Inference process visualization system for medical scans | |
Pathak et al. | A robust automated cataract detection algorithm using diagnostic opinion based parameter thresholding for telemedicine application | |
US20210326995A1 (en) | Claim settlement anti-fraud method, apparatus, device, and storage medium based on graph computation technology | |
US20120041784A1 (en) | Computerized Surveillance of Medical Treatment | |
CN111612636A (en) | Abnormal medical insurance data detection system and method based on dual clustering algorithm | |
Patel et al. | Combating COVID-19: Artificial Intelligence Technologies & Challenges | |
Hoo et al. | Exploring the implications of different approaches to estimate centre-level adherence using objective adherence data in an adult cystic fibrosis centre–a retrospective observational study | |
Bakthula et al. | Automated human bone age assessment using image processing methods-survey | |
Zhang et al. | The comfort of patients with different nasal packings after endoscopic sinus surgery for chronic rhinosinusitis: A protocol for network meta-analysis | |
Cheng et al. | Brain tumor feature extraction and edge enhancement algorithm based on U-Net network | |
Chengathir et al. | Prediction of dengue using data mining classification algorithms | |
CN115910374B (en) | Hospital infectious disease aggregation time early warning method and medium | |
CN118280576B (en) | Patient care grade intelligent evaluation system based on high-dimensional tumor data | |
Priya et al. | Histogram based multimodal minimum cross entropy thresholding method for magnetic resonance brain tissue segmentation | |
Soni et al. | Application of data mining to health care | |
Pham et al. | Collective Anomaly Detection: Application to Respiratory Artefact Removals | |
CN117725366A (en) | Cloud computing-based pharmaceutical and mechanical recommendation analysis algorithm | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||