CN111612636A

CN111612636A - Abnormal medical insurance data detection system and method based on dual clustering algorithm

Info

Publication number: CN111612636A
Application number: CN202010368770.7A
Authority: CN
Inventors: 李晖; 李瑞璨; 崔立真; 郭伟
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2020-04-29
Filing date: 2020-04-29
Publication date: 2020-09-01

Abstract

The disclosure provides a system and method for detecting abnormal co-occurrence medical visit data in medical insurance, based on a dual clustering algorithm. Visit information and demographic information are acquired; a P-TL graph is constructed from the medical visit records of insured persons; on the constructed P-TL graph, a dual clustering algorithm mines suspicious patient groups who frequently seek treatment at the same place at the same time, together with their suspicious visit records; normal patients are then filtered out of each suspicious patient group: isolated patients who are not linked to other patients by edges are filtered out, and the remaining patients, who are linked to other patients by edges, are considered fraudulent if their number exceeds a threshold. Normal patients who would be misjudged because of long-term regular medical visits can thus be filtered out, and medical insurance fraud can be identified more accurately.

Description

Abnormal medical insurance data detection system and method based on dual clustering algorithm

Technical Field

The disclosure belongs to the field of medical insurance computing, and particularly relates to a system and method for detecting abnormal co-occurrence medical visit data in medical insurance based on a dual clustering algorithm.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

The medical insurance system is a social insurance system established for compensating the economic loss of workers caused by the disease risk.

With the rapid development of the medical insurance industry, a small number of lawbreakers have begun to defraud medical insurance funds for profit.

Medical insurance data can be obtained from the medical insurance system, and abnormal data can be found by analyzing records such as medical insurance card swipes or hospitalization reimbursements. For example, when the medical insurance data of a given person shows repeated consumption at the same place and time, or repeated purchases of the same kind of medicine at the same place and time, the records can generally be flagged as abnormal. The abnormal data is then analyzed further, corresponding feedback is issued, or a stricter data supervision scheme is established.

In the course of research, the inventors found that current medical insurance anomaly detection mainly performs simple analysis of abnormal data, including acquiring and judging the time and place associated with the data. It does not consider that some chronic diseases lead to records of consumption at the same time and place on many occasions, so current detection accuracy suffers from a certain error. The main cause of inaccurate detection is that the factors considered when processing and detecting the medical insurance data are incomplete. The main technical problem addressed by the present disclosure is therefore how to detect abnormal medical insurance data in a big-data setting while allowing for records in which the same kind of medicine is legitimately purchased at the same time and place on multiple occasions.

Disclosure of Invention

In order to overcome the defects of the prior art, the abnormal medical insurance data detection method based on the dual cluster algorithm is provided, the dual cluster algorithm is utilized, and the health medical knowledge base is introduced, so that suspicious patient groups frequently hospitalized at the same time and the same place can be mined, normal patients wrongly judged due to long-term regular hospitalization can be filtered, and the medical insurance fraud behavior can be identified more accurately.

In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:

the abnormal medical insurance data detection method based on the dual clustering algorithm comprises the following steps:

collecting the medical visit record data of insured persons and constructing a P-TL graph, wherein the graph comprises two types of nodes: P represents the set of insured persons in the medical visit records; TL represents the set of visit time and visit location information in the medical visit records;

on the constructed P-TL graph, mining, through a dual clustering algorithm, suspicious patient groups who frequently seek treatment at the same place at the same time, together with their suspicious visit records;

filtering out normal visit records from the suspicious visit records: for each suspected fraudulent patient group in the obtained suspicious records, isolated patients who are not linked to other patients by edges are filtered out, while the remaining patients, who are linked to one another by edges, are regarded as abnormal medical data if their number exceeds a threshold.

On the other hand, the disclosure also discloses abnormal medical insurance data detection equipment based on the dual-clustering algorithm, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, and is characterized in that the processor realizes the steps of the abnormal medical insurance data detection method based on the dual-clustering algorithm when executing the program.

In another aspect, the present disclosure also discloses a computer readable storage medium, on which a computer program is stored, wherein the program is executed by a processor to execute the steps of the abnormal medical insurance data detection method based on the dual cluster algorithm.

On the other hand, the present disclosure further discloses an abnormal co-occurrence hospitalization medical insurance data detection system based on the dual clustering algorithm, which is characterized by comprising:

a medical visit record data preprocessing module, configured to collect the medical visit record data of insured persons and construct a P-TL graph, wherein the graph comprises two types of nodes: P represents the set of insured persons in the medical visit records; TL represents the set of visit time and visit location information in the medical visit records;

an abnormal medical data detection module, configured to mine, through a dual clustering algorithm on the constructed P-TL graph, suspicious patient groups who frequently seek treatment at the same place at the same time, together with their suspicious visit records;

The above one or more technical solutions have the following beneficial effects:

To address the inaccuracy of existing detection of medical insurance record data, the technical solution of the present disclosure applies data cleaning, normalization, and data encryption to the acquired medical insurance data, so that the processed data is complete medical insurance record data suitable for subsequent processing. When abnormal data is detected, suspicious patient groups who frequently seek treatment at the same time and place, together with their suspicious visit records, are mined through a dual clustering algorithm. Normal visit records are then filtered out of the suspicious records: for each suspected fraudulent patient group, isolated patients who are not linked to other patients by edges are filtered out, and the remaining patients, who are linked to other patients by edges, are regarded as abnormal medical data if their number exceeds a threshold. Misjudgment of abnormal medical data is thereby largely avoided and the accuracy of abnormal data detection is improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.

FIG. 1 is a flowchart of an abnormal co-occurrence hospitalization medical insurance data abnormal identification method based on a dual clustering algorithm according to an embodiment of the present disclosure;

FIG. 2 is a model diagram of a patient population for detecting frequent simultaneous hospitalizations based on a dual clustering algorithm according to an embodiment of the present disclosure;

FIG. 3 is a model diagram illustrating calculation of prescription similarity between suspicious patients according to an embodiment of the present disclosure.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.

The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.

Traditional medical insurance anti-fraud work relies mainly on rules: medical insurance fraud rules are formulated first, the medical visit behaviors of insured persons are then checked against the rules, and fraudsters and their fraudulent behaviors are determined. This approach depends heavily on expert experience, the corresponding rules can generally be formulated only after fraud has occurred, and medical insurance fraud cannot be identified quickly and efficiently.

Abnormal co-occurrence medical visit fraud refers specifically to fraudsters who obtain the medical insurance cards of multiple insured persons in some way, use the cards to purchase medicines, and then resell the medicines in order to obtain medical insurance funds fraudulently. To reduce the cost of fraud, these fraudsters typically use multiple medical insurance cards to purchase medicines in a single act of fraud.

Aiming at the behaviors, the conventional abnormal co-occurrence hospitalizing fraud behavior identification method only considers mining suspicious patient groups which frequently seek medical advice at the same time and place, but does not consider the situation that part of normal patients are misjudged due to long-term regular hospitalizing, so that the detection result is not accurate enough.

The general idea proposed by the present disclosure:

the method is based on a double clustering algorithm, and a suspicious patient group which frequently visits the same place at the same time is mined; meanwhile, a health medical knowledge base is introduced, normal patients which are misjudged due to long-term regular medical attendance are filtered, and fraudulent patients are obtained more accurately.

Example one

The embodiment discloses an abnormal medical insurance data detection method based on a dual cluster algorithm, which comprises the following steps:

Step (1): acquire visit information and demographic information.

The visit information of a patient mainly comprises disease data, medication data, and diagnosis and treatment data. The demographic information of a patient mainly comprises the patient's age, sex, personnel category, marital status, education level, occupation, place of residence, and the like.

The visit information may be obtained from the medical system through a communication means.

Step (2): data preprocessing.

The data set of this technical solution comes from a medical insurance information management system and comprises patient demographic information such as sex and age, together with personal numbers, visit numbers, disease names, disease codes, medicine names, visit times, examination items, and the like. Because medicine codes and disease codes differ between hospitals and between medical insurance institutions, and because staff errors sometimes cause erroneous or missing data, medical insurance data can suffer from inconsistency, missing values, and errors. In addition, for privacy reasons, sensitive information such as personal codes and disease codes needs to be desensitized.

First, the data is cleaned, and data with a high missing rate and erroneous data are handled; then the medicine codes and disease codes are standardized by mapping them to an international or national standard, which eliminates the data inconsistency; finally, the sensitive data is desensitized.

Sensitive data such as identity card number information, names, home addresses and other information are subjected to encryption processing by using an MD5 algorithm, namely the sensitive data are processed into meaningless character strings, so that sensitive information is prevented from being leaked when the data are used;

in medical data, since missing data cannot be filled, data having a missing rate higher than a set threshold value is deleted.

According to the international disease classification standard code ICD-10, the disease diagnosis code in the diagnosis information is converted into the corresponding international disease classification standard code ICD-10.

According to the Chinese pharmacopoeia (2015 edition), the medicine codes in the diagnosis information are converted into the corresponding medicine codes in the Chinese pharmacopoeia (2015 edition).

The specific data processing steps include:

1) data cleaning: in medical data, since missing data cannot be filled, data having a missing rate higher than a set threshold value is deleted. Data that is significantly erroneous is also deleted.

2) Data normalization:

a. Map the disease codes and disease names of the original data set to the International Classification of Diseases, version ICD-10. The mapping is divided into the following three cases:

If a disease name in the data set exactly matches a disease name in ICD-10, the disease name of the original data set is retained and its disease code is changed to the corresponding ICD-10 disease code.

If a disease name in the data set cannot be matched exactly to a disease name in ICD-10, the disease name is first segmented into words and converted into a word vector using Word2Vec; the disease names in ICD-10 are converted into word vectors with the same algorithm, and the similarity between the two word vectors is calculated. For disease names whose similarity exceeds the threshold, the disease code and disease name in the original data set are changed to the ICD-10 disease code and disease name.

Disease names whose similarity is below the threshold are mapped manually.

b. The drug codes and drug names in the original data set are mapped to the Pharmacopoeia of the People's Republic of China (2015 edition). The procedure is similar to that for disease names and is not repeated here.

3) Data encryption: sensitive data such as identity card number information, names, home addresses and other information are subjected to encryption processing by using an MD5 algorithm, namely the sensitive data are processed into meaningless character strings, so that sensitive information leakage during data use is avoided.
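The three preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosure's implementation: the missing-rate rule is applied per field, cosine similarity stands in for the unspecified word-vector similarity, the word vectors are passed in ready-made (as if produced by a Word2Vec model), and the function names, the `icd10` dictionary layout, and the 0.8 threshold are all hypothetical. Note that MD5 is strictly a hash function; the text describes its use here as encryption.

```python
import hashlib
import math


def drop_sparse_fields(records, max_missing_rate):
    """Delete every field whose share of missing (None) values exceeds the threshold."""
    fields = sorted({key for rec in records for key in rec})
    kept = [
        f for f in fields
        if sum(rec.get(f) is None for rec in records) / len(records) <= max_missing_rate
    ]
    return [{f: rec.get(f) for f in kept} for rec in records]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_to_icd10(name, vector, icd10, threshold=0.8):
    """Return (code, standard_name), or None when manual mapping is needed.

    icd10 maps each standard disease name to (code, word_vector); the
    threshold value is a hypothetical placeholder.
    """
    if name in icd10:                        # case 1: exact name match
        return icd10[name][0], name
    best = max(icd10, key=lambda n: cosine_similarity(vector, icd10[n][1]))
    if cosine_similarity(vector, icd10[best][1]) > threshold:
        return icd10[best][0], best          # case 2: similar enough
    return None                              # case 3: left for manual mapping


def md5_token(value):
    """Replace a sensitive string by its MD5 hex digest (a meaningless string)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()
```

In practice `md5_token` would be applied to fields such as identity card numbers, names, and home addresses before the data is used further.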

Step (3): construct the P-TL graph from the medical visit records of insured persons.

The graph has two types of nodes: $P$ denotes the set of insured persons in the medical visit records, and $TL$ denotes the set of visit time and visit location information in those records, each element being represented as <visit time, visit location>. The graph has two types of edges. The first connects two insured persons and is denoted $e(p_i, p_j)$, where $p_i, p_j \in P$; its weight $w(p_i, p_j)$ is calculated in step (5). The second connects an insured person and a time-location node and is denoted $e(p_i, tl_j)$, where $p_i \in P$ and $tl_j \in TL$; its weight $w(p_i, tl_j)$ depends on the visit time and visit location of the insured person. The specific steps are as follows:

The weight $w(p_i, tl_j)$ of edge $e(p_i, tl_j)$ is calculated using a time threshold $\Phi$, which the present disclosure sets to two days. Here $tl_j = \langle t_j, l_j \rangle$, where $t_j$ is the visit time of $tl_j$ and $l_j$ is the visit location of $tl_j$. Let $t_i$ denote the visit time of patient $p_i$.

When patient $p_i$ seeks treatment at location $l_j$ within a time interval $\Phi$ of $t_j$, i.e. $|t_j - t_i| < \Phi$, the weight $w(p_i, tl_j)$ is calculated as follows:

[equation missing in source]

Otherwise, when patient $p_i$ does not seek treatment at location $l_j$ within a time interval $\Phi$ of $t_j$, the weight is

$$w(p_i, tl_j) = 0.$$

and (4): in the P-TL map constructed in step (3), the suspicious patient population frequently hospitalized at the same place at the same time and their suspicious hospitalization records are mined by a novel double clustering algorithm, as shown in FIG. 2. The method comprises the following specific steps:

(4.1) constructing a matrix M with the size of n × M to represent a P-TL diagram, wherein n is the number of elements contained in the medical insurance participant set P, M is the number of elements contained in the medical time and medical place information set TL, and M is the number of elements contained in the medical time and medical place information set TL_i,jEqual to the edge e (P) in the P-TL graph_i,tl_j) Weight value of w (p)_i,tl_j)。
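As a rough illustration of how the matrix of step (4.1) might be filled from raw visit records, the sketch below applies the two-day threshold Φ and the same-location test from step (3). The non-zero weight value is not given in this text, so a constant `weight=1.0` is used purely as a placeholder assumption; the function name and the tuple layout of `visits` and `tl_nodes` are likewise hypothetical.

```python
from datetime import date

PHI_DAYS = 2  # time threshold Φ from the text (two days)


def build_ptl_matrix(visits, tl_nodes, weight=1.0):
    """Build the n x m matrix M of the P-TL graph.

    visits:   list of (patient_id, visit_date, location)
    tl_nodes: list of (t_j, l_j) time-location nodes
    weight:   placeholder for the non-zero edge weight (an assumption)
    """
    patients = sorted({p for p, _, _ in visits})
    row = {p: i for i, p in enumerate(patients)}
    M = [[0.0] * len(tl_nodes) for _ in patients]
    for p, t_i, loc in visits:
        for j, (t_j, l_j) in enumerate(tl_nodes):
            # Edge exists only for the same location and |t_j - t_i| < Φ.
            if loc == l_j and abs((t_j - t_i).days) < PHI_DAYS:
                M[row[p]][j] = weight
    return patients, M
```

Rows follow the sorted patient identifiers, so the row index of a mined cluster can be mapped back to a patient through the returned `patients` list.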

(4.2) The dual clustering algorithm clusters the rows and columns of the matrix simultaneously; in this way, suspicious patient groups who frequently seek treatment at the same place at the same time, together with their suspicious medical records, can be mined. Let the $n$-dimensional vector $\mathbf{u}$ and the $m$-dimensional vector $\mathbf{v}$ denote the left and right vectors obtained by matrix decomposition of $M$. The outer product of the two vectors should be as close as possible to the matrix $M$, i.e.

$$M \approx \mathbf{u}\mathbf{v}^{\top}.$$

The objective function to be solved is:

[equation missing in source]

where $\lVert\mathbf{u}\rVert_0$ is the number of non-zero entries of $\mathbf{u}$, $\lVert\mathbf{v}\rVert_0$ is the number of non-zero entries of $\mathbf{v}$, and $l_u$ and $l_v$ limit the maximum number of non-zero entries of $\mathbf{u}$ and $\mathbf{v}$, respectively. Minimizing the above objective function is mathematically equivalent to minimizing

[equation missing in source]

where $\lambda_u$ and $\lambda_v$ are the Lagrange multipliers at the optimum of $y$.
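One common formulation that is consistent with the description above measures closeness with the squared Frobenius norm. It is offered here as an assumption, not as the disclosure's exact formula; the vector names $\mathbf{u}$ and $\mathbf{v}$ and the choice of norm are not stated explicitly in this text.

```latex
\[
\min_{\mathbf{u},\mathbf{v}} \ \lVert M - \mathbf{u}\mathbf{v}^{\top} \rVert_F^{2}
\quad \text{s.t.} \quad
\lVert \mathbf{u} \rVert_0 \le l_u, \qquad \lVert \mathbf{v} \rVert_0 \le l_v
\]
\[
y(\mathbf{u},\mathbf{v}) = \lVert M - \mathbf{u}\mathbf{v}^{\top} \rVert_F^{2}
 + \lambda_u \lVert \mathbf{u} \rVert_0 + \lambda_v \lVert \mathbf{v} \rVert_0
\]
```

Under this reading the first expression is the constrained problem and the second is its penalized (Lagrangian) counterpart.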

In this embodiment, the above objective function is solved using the PALM algorithm, as follows:

(4.2.1) Every element of the left vector $\mathbf{u}$ and the right vector $\mathbf{v}$ is initialized to 1. Let $\mathbf{u}^{t}$ and $\mathbf{v}^{t}$ denote the vectors $\mathbf{u}$ and $\mathbf{v}$ at the $t$-th iteration.

(4.2.2) Compute the vector $\mathbf{u}^{t+1}$ from the vectors $\mathbf{u}^{t}$ and $\mathbf{v}^{t}$. First compute the partial derivative of $y$ with respect to $\mathbf{u}$ at the point $(\mathbf{u}^{t}, \mathbf{v}^{t})$, and then the Lipschitz modulus of that partial derivative [equations missing in source]. An indicator function of $\mathbf{u}$ is also defined piecewise, in terms of a quantity the source describes as the sum of the terms in the vector [definition missing in source].

Computing $\mathbf{u}^{t+1}$ requires solving an optimization function [equation missing in source] in which $\eta_u > 1$ is a constant, set to 2. The optimization function can be converted into a mathematically equivalent problem that has an analytical solution [equations missing in source].

It follows that keeping the $l_u$ elements of largest absolute value in the intermediate vector, and setting the others to 0, gives the optimal solution of the optimization function in (4.2.2). For example, if $l_u$ is 5, the elements are arranged in descending order of absolute value, the largest 5 are kept unchanged, and the remaining elements are set to 0. The present disclosure defines $\alpha$ as the $l_u$-th largest absolute value among the elements of the intermediate vector; each element of $\mathbf{u}^{t+1}$ then equals the corresponding element of the intermediate vector when its absolute value is at least $\alpha$, and 0 when its absolute value is less than $\alpha$.

(4.2.3) Compute the vector $\mathbf{v}^{t+1}$ from the vectors $\mathbf{u}^{t+1}$ and $\mathbf{v}^{t}$. First compute the partial derivative of $y$ with respect to $\mathbf{v}$ at that point, and then the Lipschitz modulus of that partial derivative [equations missing in source]. An indicator function of $\mathbf{v}$ is also defined piecewise, in terms of a quantity the source describes as the sum of the terms in the vector [definition missing in source].

Computing $\mathbf{v}^{t+1}$ requires solving an optimization function [equation missing in source] in which $\eta_v > 1$ is a constant, set to 2. The optimization function can be converted into a mathematically equivalent problem that similarly has an analytical solution [equations missing in source].

Define $\beta$ as the $l_v$-th largest absolute value among the elements of the intermediate vector; each element of $\mathbf{v}^{t+1}$ then equals the corresponding element of the intermediate vector when its absolute value is at least $\beta$, and 0 when its absolute value is less than $\beta$.

(4.2.4) Repeat steps (4.2.2) and (4.2.3) until the result converges. For example, the calculation stops when two conditions, one on each vector, hold simultaneously [conditions missing in source], with the tolerance set to 0.01.

For the resulting vectors $\mathbf{u}$ and $\mathbf{v}$, the rows and columns of the matrix $M$ corresponding to their non-zero entries are clustered to obtain a submatrix. The present disclosure sets two thresholds $\Psi$ and $Y$ to limit the minimum numbers of rows and columns of the submatrix; they are set to 2 and 10, respectively. The row set of the submatrix is the mined suspicious patient group and contains no fewer than $\Psi$ elements. The column set is the set of visit time and visit location information, which corresponds to the suspicious medical records of the suspicious patient group, and contains no fewer than $Y$ elements. For example, if $Y$ were set to 1, the patients in the group would only need to have sought treatment at the same place at the same time once, which is an insufficient basis for judging an abnormality.
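The iteration of steps (4.2.1) to (4.2.4) can be sketched as below. This is a sketch under explicit assumptions rather than the disclosure's exact procedure: the smooth part of the objective is taken to be the squared Frobenius norm of $M - \mathbf{u}\mathbf{v}^{\top}$, each update is a gradient step with step size $1/(\eta L)$ followed by hard thresholding, the v-step uses the freshly updated u, and convergence is tested on the largest element-wise change. The function names and the `max_iter` safeguard are hypothetical.

```python
def hard_threshold(x, k):
    """Keep the k entries of largest absolute value and set the rest to 0.

    Ties are broken by position, so exactly k entries survive.
    """
    if k >= len(x):
        return list(x)
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def palm_rank_one(M, l_u, l_v, eta=2.0, eps=0.01, max_iter=500):
    """Sparse rank-one approximation M ~ u v^T; returns (row indices, column indices)."""
    n, m = len(M), len(M[0])
    u, v = [1.0] * n, [1.0] * m                      # step (4.2.1): all ones
    for _ in range(max_iter):
        # u-step: gradient of ||M - u v^T||_F^2 w.r.t. u, Lipschitz modulus 2||v||^2
        vv = sum(x * x for x in v)
        if vv == 0.0:
            break
        Mv = [sum(M[i][j] * v[j] for j in range(m)) for i in range(n)]
        grad_u = [2.0 * (u[i] * vv - Mv[i]) for i in range(n)]
        c = eta * 2.0 * vv
        u_new = hard_threshold([u[i] - grad_u[i] / c for i in range(n)], l_u)
        # v-step: same construction with the updated u
        uu = sum(x * x for x in u_new)
        if uu == 0.0:
            break
        Mtu = [sum(M[i][j] * u_new[i] for i in range(n)) for j in range(m)]
        grad_v = [2.0 * (v[j] * uu - Mtu[j]) for j in range(m)]
        d = eta * 2.0 * uu
        v_new = hard_threshold([v[j] - grad_v[j] / d for j in range(m)], l_v)
        # Assumed stopping rule: both vectors change by less than eps.
        du = max(abs(a - b) for a, b in zip(u_new, u))
        dv = max(abs(a - b) for a, b in zip(v_new, v))
        u, v = u_new, v_new
        if du < eps and dv < eps:
            break
    rows = [i for i, x in enumerate(u) if x != 0.0]
    cols = [j for j, x in enumerate(v) if x != 0.0]
    return rows, cols
```

A caller would then accept the returned cluster only if it has at least Ψ rows and at least Y columns, as described above.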

(4.2.5) Step (4.2.4) mines only one suspicious patient group. To mine a further suspicious patient group, the elements of the rows of the matrix $M$ corresponding to the already-mined patients are set to zero. For example, if the patient corresponding to the $i$-th row of the matrix has been mined, then $M_{i,j} = 0$ for all $j$.

Step (4.2.4) is then performed on the updated matrix $M$ to mine a new suspicious patient group and its suspicious medical records.

Step (5): calculate the prescription similarity between patients, as shown in FIG. 3.

As mentioned in step (3), in the P-TL graph the weight $w(p_i, p_j)$ of edge $e(p_i, p_j)$ represents the prescription similarity between patient $p_i$ and patient $p_j$. Step (4) mines suspicious patient groups and their suspicious medical records. In this step, prescription similarity is calculated only between these suspicious patients, not between all patients, and the calculation considers only the suspicious medical records of those patients, not all of their medical records. The specific steps are as follows:

(5.1) Calculate the weight (AW) of each drug in the medical insurance records. Drugs of interest to fraudsters should be weighted more heavily, such as drugs with high reimbursement ratios, high selling prices, and a wide range of uses. Because the fraudulent group resells drugs, the present disclosure is concerned only with drugs that can be resold, and not with other kinds of items such as surgery or detection reagents. The specific steps are as follows:

(5.1.1) If the item is a resellable drug,

$$AW(\text{drug}) = \text{reimbursement ratio} \times \text{selling price} \times \text{total number of the drug in the data set} - (1 - \text{reimbursement ratio}) \times \text{selling price} \times \text{total number of the drug in the data set}.$$

(5.1.2) The weights of all drugs are then normalized to the interval [0, 1]:

$$AW_{\text{norm}}(\text{drug}) = \frac{AW(\text{drug}) - \min}{\max - \min},$$

where min is the minimum of all drug weights and max is the maximum of all drug weights.
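A small sketch of step (5.1) follows: the raw weight of a resellable drug and the min-max normalization to [0, 1]. The function names are hypothetical, and returning 0.0 for every drug when all raw weights are equal is an added assumption to avoid division by zero; this text does not cover that case.

```python
def drug_weight(reimbursement_ratio, price, count):
    """Raw AW of a resellable drug: reimbursed part minus self-paid part."""
    reimbursed = reimbursement_ratio * price * count
    self_paid = (1 - reimbursement_ratio) * price * count
    return reimbursed - self_paid


def normalize_weights(weights):
    """Min-max normalize a {drug: AW} mapping to the interval [0, 1]."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # Assumption: degenerate case, all drugs receive weight 0.0.
        return {drug: 0.0 for drug in weights}
    return {drug: (w - lo) / (hi - lo) for drug, w in weights.items()}
```

For instance, a drug with a 75% reimbursement ratio, a price of 8, and 5 occurrences gets a raw weight of 30 - 10 = 20 before normalization.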

(5.2) Calculate the similarity $s^v$ between different medical records. Each medical record can be expressed as

$$v = \{time, location, diagnose, medicine, dose\},$$

which represents the time, place, disease diagnosis, medicine, and medicine dosage of the visit.

For two medical records

$$v_i = \{time_i, location_i, diagnose_i, medicine_i, dose_i\},$$

$$v_j = \{time_j, location_j, diagnose_j, medicine_j, dose_j\},$$

the similarity $s^v$ between them is calculated as follows:

[equation missing in source]

where the function of $(x, y)$ appearing in the formula equals 1 when $x$ and $y$ are the same, and 0 when $x$ and $y$ are different.

(5.3) Calculate the similarity $s^p$ between the prescriptions of different patients. The prescription $V$ of each patient is a collection of medical records $v$, which can be represented as

$$V = \{v_1, v_2, \ldots, v_l\},$$

where $l$ is the number of medical records $v$ contained in the prescription $V$.

For the prescriptions $V_i$ and $V_j$ of two patients, the similarity $s^p$ is calculated as follows:

[equation missing in source]

where

$$|total(V_i, V_j)| = |V_i| + |V_j| - |same(V_i, V_j)|,$$

and $|same(V_i, V_j)|$ is defined as:

[equation missing in source]

where $s^v(V_{i,p}, V_{j,q})$ denotes the similarity between the $p$-th medical record of prescription $V_i$ and the $q$-th medical record of prescription $V_j$. $A$ is a matrix obtained by solving the following function:

[equation missing in source]

$$A_{p,q} \ge 0,$$

where $fre_{i,p}$ is the frequency of occurrence of the $p$-th medical record of prescription $V_i$, and $fre_{j,q}$ is the frequency of occurrence of the $q$-th medical record of prescription $V_j$. The function is solved as follows:

(5.3.1) Define the initial matrix [definition missing in source]. The number of rows of matrix $A$ equals the number of medical records in prescription $V_i$, and the number of columns of matrix $A$ equals the number of medical records in prescription $V_j$. The matrix $L$ has the same number of rows and columns as the matrix $A$.

(5.3.2) For the selected pair $(p, q)$ [selection rule missing in source]:

If $fre_{i,p} \le fre_{j,q}$, then $A_{p,q} = fre_{i,p}$ and, for any $u$, let $L_{u,p} = 0$;

If $fre_{i,p} > fre_{j,q}$, then $A_{p,q} = fre_{j,q}$ and, for any $v$, let $L_{q,v} = 0$;

$$fre_{i,p} = fre_{i,p} - A_{p,q},$$

$$fre_{j,q} = fre_{j,q} - A_{p,q}.$$

(5.3.3) If any element of $L$ equals 1, repeat step (5.3.2) until no element of $L$ equals 1.
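The greedy procedure of steps (5.3.1) to (5.3.3) can be illustrated as follows. Several details are assumptions made only for this sketch, because this text does not state them: $A$ starts as all zeros, $L$ starts as all ones and marks pairs that are still available, the pair $(p, q)$ chosen at each round is the available pair with the highest record similarity $s^v$, and exhausting record $p$ of $V_i$ masks row $p$ of $L$ while exhausting record $q$ of $V_j$ masks column $q$. The function name is hypothetical.

```python
def greedy_match(sim, fre_i, fre_j):
    """Greedily fill the matching matrix A.

    sim[p][q]: similarity s^v between record p of V_i and record q of V_j
    fre_i, fre_j: occurrence frequencies of the records of V_i and V_j
    """
    fre_i, fre_j = list(fre_i), list(fre_j)       # work on copies
    rows, cols = len(fre_i), len(fre_j)
    A = [[0] * cols for _ in range(rows)]          # assumed initial value: zeros
    L = [[1] * cols for _ in range(rows)]          # assumed: 1 = pair still available
    while any(1 in row for row in L):
        # Assumed selection rule: most similar pair that is still available.
        p, q = max(
            ((a, b) for a in range(rows) for b in range(cols) if L[a][b] == 1),
            key=lambda ab: sim[ab[0]][ab[1]],
        )
        if fre_i[p] <= fre_j[q]:
            A[p][q] = fre_i[p]
            for b in range(cols):                  # record p of V_i is used up
                L[p][b] = 0
        else:
            A[p][q] = fre_j[q]
            for a in range(rows):                  # record q of V_j is used up
                L[a][q] = 0
        fre_i[p] -= A[p][q]
        fre_j[q] -= A[p][q]
    return A
```

Each round masks at least one full row or column of `L`, so the loop ends after at most rows + cols rounds.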

And (6): normal patients are filtered out in a suspect patient population.

The similarity of the prescription between suspicious patients is obtained through the step (5). In the P-TL diagram, the edge e (P)_i,p_j) Weight w (p) of_i,p_j) The value is according to patient p_iAnd patient p_jThe similarity value of the prescriptions is set. The present disclosure sets a threshold min for prescription similarity between patients_wThe threshold is set to 0.35. The method comprises the following specific steps:

when patient p_iAnd patient p_jThe similarity value of the prescriptions is less than the threshold value min_wWhen, the edge e (p)_i,p_j) Weight w (p) of_i,p_j) Set to a value of 0, considered as edge e (p)_i,p_j) Is absent.

When patient p_iAnd patient p_jThe similarity value of the prescriptions is more than or equal to the threshold value min_wWhen, the edge e (p)_i,p_j) Weight w (p) of_i,p_j) Setting the value to patient p_iAnd patient p_jThe similarity value of the prescriptions between the two.

For each group of suspected fraudulent patient populations obtained in step (4), passing the edge e (p) with other patients_i,p_j) The linked isolated patients are filtered out, while the rest pass the edge e (p) with other patients_i,p_j) A group of interlinked patients, which are considered fraudulent if their population exceeds the threshold Ψ. These fraudulent patients may be further analyzed in conjunction with demographic information.
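The filtering rule of step (6) can be sketched as below, using the thresholds named in the text ($min_w = 0.35$, $\Psi = 2$). The function name and the dictionary layout of `similarity` are hypothetical, and the sketch takes the simplest reading of the rule: after isolated patients are dropped, all remaining patients of the group are counted together rather than per connected component.

```python
def filter_group(group, similarity, min_w=0.35, psi=2):
    """Return the patients of a suspicious group that are judged fraudulent.

    group:      list of patient ids mined in step (4)
    similarity: {(patient_a, patient_b): prescription similarity s^p}
    """
    members = set(group)
    linked = set()
    for (a, b), s in similarity.items():
        # An edge is kept only when the similarity reaches the threshold.
        if a in members and b in members and a != b and s >= min_w:
            linked.update((a, b))
    remaining = [p for p in group if p in linked]   # isolated patients dropped
    return remaining if len(remaining) > psi else []
```

A patient who merely happens to visit at the same times as the group, for example for a regular chronic-disease prescription that differs from the others, ends up isolated and is removed by this step.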

The traditional method for identifying abnormal co-occurrence medical visit fraud considers only the characteristic that multiple medical insurance cards are frequently used at the same time. The present method uses a dual clustering algorithm and also introduces a health and medical knowledge base. It therefore not only considers the frequent simultaneous use of multiple medical insurance cards, mining suspicious patient groups who frequently seek treatment at the same place at the same time, but can also filter out normal patients who would be misjudged because of long-term regular medical visits, so that medical insurance fraud is identified more accurately. The identification accuracy of the traditional method is 76%, whereas the present method improves it to 95%. The present disclosure helps to identify abnormal co-occurrence medical visit fraud and to protect medical insurance funds effectively.

Example two

The embodiment discloses abnormal medical insurance data detection equipment based on a double clustering algorithm, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the steps of the abnormal medical insurance data detection method based on the double clustering algorithm in the first embodiment.

Example three

An object of the present embodiment is to provide a computer-readable storage medium.

A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the abnormal medical insurance data detection method based on the dual clustering algorithm of Example one.

Example four

Referring to fig. 1, the present embodiment discloses an abnormal co-occurrence hospitalization medical insurance data detection system based on a dual clustering algorithm, which includes:

a visit information and demographic information acquisition module: acquiring visit information and demographic information;

the visit information preprocessing module: data preprocessing:

This embodiment uses a "clustering-Sim" model, a dual clustering model, to identify fraud.

The dual clustering method is used to mine suspicious patient groups that frequently seek medical care at the same place at the same time, together with their suspicious co-occurring visit records. Traditional dual clustering methods usually require the number of clusters to be set manually and cannot guarantee the quality of the final clustering result. The dual clustering method of the present disclosure does not require the number of clusters to be set in advance, and guarantees the quality of the mined clusters by placing constraints on the quality of the final clustering result.

The disclosure also provides a method for measuring the similarity of medical prescriptions. The prescription of each patient is a complex set of medical records. Unlike an ordinary set, computing the similarity of complex sets requires considering both the frequency of occurrence of the elements within each set and the degree of similarity between those elements. The present disclosure takes these factors into account and can therefore compute the similarity of medical prescriptions more accurately.
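To illustrate what it means for a similarity measure to account for both element frequency and element-to-element similarity, the sketch below uses the soft cosine measure on item counts. This is a stand-in technique chosen for illustration, not the similarity formula of the disclosure, which is not reproduced in this section. The function `item_sim`, the table `DRUG_CLASS`, and the value 0.5 are hypothetical; in practice the element-level similarity would presumably come from the health and medical knowledge base.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lookup standing in for a medical knowledge base.
DRUG_CLASS = {"aspirin": "nsaid", "ibuprofen": "nsaid", "amoxicillin": "antibiotic"}


def item_sim(a, b):
    """Assumed element-level similarity in [0, 1] (illustrative values only)."""
    if a == b:
        return 1.0
    class_a = DRUG_CLASS.get(a)
    if class_a is not None and class_a == DRUG_CLASS.get(b):
        return 0.5
    return 0.0


def soft_cosine(x, y, sim):
    """Soft cosine similarity between two prescriptions given as Counters.

    Counts capture how often each item occurs; sim(i, j) lets different
    but related items contribute partial credit.
    """
    def dot(u, v):
        return sum(cu * cv * sim(i, j) for i, cu in u.items() for j, cv in v.items())

    denom = sqrt(dot(x, x) * dot(y, y))
    return dot(x, y) / denom if denom else 0.0


# Toy usage: two prescriptions that share no identical item but share a drug class.
p1 = Counter({"aspirin": 2})
p2 = Counter({"ibuprofen": 1})
print(soft_cosine(p1, p2, item_sim))
```

With a plain set-based measure these two prescriptions would have similarity 0 because they share no identical item; the element-level similarity is what gives them a non-zero score here.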

The steps involved in the second, third and fourth embodiments above correspond to the method of the first embodiment; for details, see the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that causes the processor to perform any of the methods of the present disclosure.

Those skilled in the art will appreciate that the modules or steps of the present disclosure described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code executable by computing means, whereby the modules or steps may be stored in memory means for execution by the computing means, or separately fabricated into individual integrated circuit modules, or multiple modules or steps thereof may be fabricated into a single integrated circuit module. The present disclosure is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims

1. The abnormal medical insurance data detection method based on the dual clustering algorithm is characterized by comprising the following steps:

2. The abnormal medical insurance data detection method based on the dual clustering algorithm as claimed in claim 1, wherein there are two types of edges in the P-TL graph:

one is an edge connecting two medical insurance participants, denoted e(p_i, p_j), where p_i, p_j ∈ P;

the other is an edge between a medical insurance participant and a visit time and visit location pair, denoted e(p_i, tl_j), where p_i ∈ P, tl_j ∈ TL.

3. The abnormal medical insurance data detection method based on the dual clustering algorithm as claimed in claim 2, wherein the weight w(p_i, tl_j) of the edge e(p_i, tl_j) is calculated and a time threshold φ is set, wherein tl_j = <t_j, l_j>, t_j represents the visit time in tl_j, l_j represents the visit location in tl_j, and t_i represents the time at which patient p_i seeks medical care;

when patient p_i seeks medical care at location l_j within a time interval φ of t_j, the weight w(p_i, tl_j) is calculated as follows:

otherwise, the weight w(p_i, tl_j) is 0.

4. The method as claimed in claim 1, wherein before the dual clustering step, a matrix M of size n × m is constructed to represent the P-TL graph, wherein n is the number of elements in the medical insurance participant set P, m is the number of elements in the visit time and visit location information set TL, and M_{i,j} equals the weight of the edge e(p_i, tl_j) in the P-TL graph.

5. The abnormal medical insurance data detection method based on the dual clustering algorithm of claim 1, wherein the dual clustering algorithm clusters rows and columns of the matrix at the same time, and mines suspicious patient groups and their suspicious medical records that frequently visit the same place at the same time.

6. The method of claim 5, wherein the PALM algorithm is applied to solve the objective function of the dual clustering algorithm to obtain a suspicious patient group; if a further suspicious patient group is to be mined, the elements of the rows of the matrix M corresponding to the already-mined patients are set to zero, and the objective function is then solved again on the updated matrix M to obtain a new suspicious patient group and the corresponding suspicious visit records.

7. The abnormal medical insurance data detection method based on the dual clustering algorithm of claim 1, wherein the prescription similarity is calculated only between suspicious patients, and only the suspicious visit records of those patients are considered when calculating the prescription similarity;

in the P-TL graph, the weight w(p_i, p_j) of the edge e(p_i, p_j) is set according to the prescription similarity between patient p_i and patient p_j.

8. Abnormal medical insurance data detection equipment based on the double clustering algorithm comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, and is characterized in that the processor executes the program to realize the steps of the abnormal medical insurance data detection method based on the double clustering algorithm according to any one of claims 1 to 7.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for abnormal medical insurance data detection based on the double clustering algorithm according to any one of claims 1 to 7.

10. Abnormal co-occurrence medical insurance data detection system based on dual clustering algorithm, characterized by comprising: