WO2022057057A1 - Method for detecting medicare fraud, and system and storage medium

Info

Publication number: WO2022057057A1
Application number: PCT/CN2020/127183
Authority: WO
Inventors: 李坚强; 陈杰; 胡晓楠; 罗若恒
Original assignee: 深圳大学
Priority date: 2020-09-15
Filing date: 2020-11-06
Publication date: 2022-03-24
Also published as: CN112200684B; CN112200684A

Abstract

The present invention provides a method for detecting medicare fraud, and a system and a storage medium. The method comprises: acquiring a medical record of a patient, extracting a corresponding patient feature according to the acquired medical record, and according to the extracted patient feature and the correlation between the patient and a doctor, establishing a doctor-patient relationship neural network; inputting a pre-marked fraud sample into the established doctor-patient relationship neural network, training a fraud prediction model, and outputting, from the trained fraud prediction model, a predicted value of each patient node having a fraud behavior; and according to the output predicted value, determining whether the patient corresponding to the node has a fraud behavior. By means of a machine learning method, whether a patient has a fraud behavior is predicted, so that the difficulty of predicting a fraud behavior is reduced, a medicare fraud behavior can be effectively detected, and the health popularization of a medicare system is facilitated.

Description

A method, system and storage medium for detecting medical insurance fraud

Technical Field

The invention relates to the field of medical technology, and in particular, to a method, a system and a storage medium for detecting medical insurance fraud.

Background Art

Medical insurance is a social security program in China, established to compensate citizens or workers for economic losses caused by disease risks. However, with the popularization of medical insurance, cases of criminals exploiting universal medical insurance to commit medical insurance fraud have emerged one after another, resulting in additional growth in national fiscal expenditure on medical and health care.

Therefore, effective detection of medical insurance fraud is required. Existing detection methods include unsupervised learning and supervised learning. Unsupervised learning relies on outlier analysis to find potential anomalies in unlabeled data, but the methods used to detect anomalies are not suitable for highly skewed data such as medical insurance fraud data. Supervised learning requires a large number of labeled data points, marking fraudulent and non-fraudulent examples in order to make predictions, but because experts and medical fraud investigations are scarce, the actual number of labeled points can be very small, and effective detection cannot be achieved.

It can be seen that neither of the two current methods of medical insurance fraud detection can effectively detect real medical insurance fraud, which is not conducive to preventing the occurrence of medical insurance fraud.

Therefore, the existing technology has defects and needs to be improved and developed.

SUMMARY OF THE INVENTION

The technical problem to be solved by the present invention is to provide a method, system and storage medium for detecting medical insurance fraud in view of the above-mentioned defects of the prior art, aiming to solve the problem that the medical insurance fraud detection methods in the prior art cannot perform effective detection and therefore cannot prevent medical insurance fraud.

The technical scheme adopted by the present invention to solve the technical problem is as follows:

A method of detecting health insurance fraud, comprising:

Obtain the patient's medical records, extract the corresponding patient characteristics according to the obtained medical records, and establish a doctor-patient relationship neural network according to the extracted patient characteristics and the corresponding relationship between the patient and the doctor;

Input the pre-marked fraud samples into the established doctor-patient relationship neural network, train a fraud prediction model, and output the predicted value of fraudulent behavior of each patient node from the trained fraud prediction model;

Determine whether the patient of the corresponding node has fraudulent behavior according to the output prediction value.

It is possible to actively learn and predict patient nodes with fraudulent behaviors through machines, which facilitates effective management of medical insurance frauds and facilitates the healthy popularization of the medical insurance system.

Further, after the pre-labeled fraud samples are input into the established doctor-patient relationship neural network, the fraud prediction model is trained, and the predicted value that each patient node has fraudulent behavior is output from the trained fraud prediction model, the method further includes:

Monitor for new medical records;

If there is a new medical treatment record, input the patient node with the predicted value into the pre-established dynamic update network, and delete the invalid patient node;

Organize the remaining medical treatment records and the newly added medical treatment records after deleting the invalid node into the updated medical treatment records;

Continue to determine whether the patient corresponding to each node has fraudulent behavior according to the updated medical treatment record.

By updating the data in time to delete invalid nodes, the efficiency of prediction can be improved on the premise of ensuring the accuracy of prediction, and the system can run quickly to predict more patient nodes with fraudulent behavior.

Further, if there is a newly added medical treatment record, the patient node with the predicted value is input into the pre-established dynamic update network, and the invalid patient node is deleted, wherein the basis for judging the invalid patient node is:

Calculate the priority of each patient node separately according to the generation date and predicted value of the patient node with the predicted value;

Sort the priority of each patient node, and select the node with low priority as the invalid patient node.

Defining invalid patient nodes in an effective way can further improve the accuracy of prediction, ensure the validity of the data taken during prediction, and help improve the prediction rate.

Further, the deletion of invalid patient nodes specifically includes:

According to the number of new visits, delete the same number of invalid patient nodes in the order of the patient nodes with lower priority.

Further, inputting the pre-marked fraud samples into the established doctor-patient relationship neural network, wherein the step of obtaining the pre-marked fraud samples includes:

Select part of the medical records from the medical records in a preset way as the samples to be marked;

The selected samples to be marked are marked by experts, and the samples with fraudulent behaviors in the samples to be marked are identified to obtain pre-marked fraud samples.

By labeling the samples to be marked by experts, the authority of obtaining fraud samples is improved, and the predicted results are real and effective.

Further, selecting part of the medical treatment records from the medical treatment records in a preset manner as the samples to be marked, wherein the method of selecting the samples to be marked in a preset manner at least includes:

The entropy value of each patient is calculated by the maximum entropy selection strategy, and the maximum value of the calculated entropy values is selected as the sample to be marked;

Or, adopt a random strategy to randomly take part of the medical records in the medical records as the samples to be marked;

Alternatively, the probability value of each patient is calculated by the maximum probability strategy, and the maximum value among the calculated probability values is selected as the sample to be marked.

The samples to be marked are selected in a random manner, which maximizes the randomness of selection and helps to improve the accuracy of prediction.

Further, obtaining the patient's medical records, extracting the corresponding patient characteristics according to the obtained medical records, and establishing the doctor-patient relationship neural network according to the extracted patient characteristics and the corresponding relationship between the patient and the doctor includes:

The patient identity information in the patient medical treatment information is anonymized, and the processed medical treatment information is converted into a medical treatment record of a data structure type.

By anonymizing patient identity information, patient privacy can be protected and patient information leakage can be avoided.
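The anonymization step can be sketched as follows. This is only an illustration under stated assumptions: the field names, the `SECRET_KEY` value, and the use of a keyed hash (HMAC-SHA256) are hypothetical choices, not details taken from the source, and a keyed hash is strictly a form of pseudonymization rather than full anonymization.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored outside the code.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(patient_id: str) -> str:
    """Map a raw patient identifier to a stable token that hides the identifier."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def to_record(visit: dict) -> dict:
    """Convert raw visit information (hypothetical fields) into a structured record."""
    return {
        "patient": pseudonymize(visit["patient_id"]),
        "doctor": visit["doctor_id"],
        "insurance_type": visit["insurance_type"],
        "amount": float(visit["amount"]),
        "visit_date": visit["visit_date"],
    }


raw_visit = {
    "patient_id": "440301199001011234",
    "doctor_id": "d17",
    "insurance_type": "employee",
    "amount": "128.50",
    "visit_date": "2020-09-15",
}
record = to_record(raw_visit)
```

Because the same identifier always maps to the same token, visits by one patient can still be grouped into a single patient node after the identity information has been removed.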

Further, obtaining the patient's medical treatment record, extracting the corresponding patient characteristics according to the obtained medical treatment record, and establishing a doctor-patient relationship neural network according to the extracted patient characteristics and the corresponding relationship between the patient and the doctor, specifically including:

Obtain the patient's medical records, extract the corresponding patient characteristics from the medical records, and establish a patient characteristic degree matrix;

Analyze the doctor-patient relationship between doctors and patients in the medical records, and establish the corresponding doctor-patient relationship adjacency matrix;

According to the patient characteristic degree matrix and the doctor-patient relationship adjacency matrix, the doctor-patient relationship neural network is established.

The present invention also discloses a system comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the method for detecting medical insurance fraud as described above.

The present invention also discloses a storage medium, wherein the storage medium stores a computer program, and the computer program can be executed to implement the method for detecting medical insurance fraud as described above.

The present invention provides a method, system and storage medium for detecting medical insurance fraud, wherein the method includes: acquiring a patient's medical record, extracting corresponding patient characteristics according to the obtained medical record, and establishing a doctor-patient relationship neural network according to the extracted patient characteristics and the corresponding relationship between patients and doctors; inputting pre-labeled fraud samples into the established doctor-patient relationship neural network, training a fraud prediction model, and outputting from the trained fraud prediction model a predicted value that each patient node has fraudulent behavior; and determining, according to the output predicted value, whether the patient of the corresponding node has fraudulent behavior. Predicting whether patients have fraudulent behaviors through machine learning reduces the difficulty of predicting fraudulent behaviors and can effectively detect medical insurance fraud, which is conducive to the healthy popularization of the medical insurance system.

Description of Drawings

FIG. 1 is a flowchart of a preferred embodiment of the method for detecting medical insurance fraud in the present invention.

FIG. 2 is a flowchart of a specific embodiment of step S100 in the present invention.

FIG. 3 is a flow chart of a preferred embodiment of the present invention combined with a dynamic update network.

FIG. 4 is a flow chart showing a preferred embodiment of the fraud prediction model in relation to the dynamic update network in the present invention.

FIG. 5 is a flowchart of a specific embodiment of step S410 in FIG. 3 in the present invention.

FIG. 6 is a flow chart of a preferred embodiment of the execution process of the update algorithm in the present invention.

FIG. 7 is a comparison diagram of experimental results using and not using the dynamic update network in the present invention.

FIG. 8 is a functional principle block diagram of a preferred embodiment of the system of the present invention.

Detailed Description

In order to make the objectives, technical solutions and advantages of the present invention clearer and clearer, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

Medical insurance is a social security program in China, established to compensate citizens or workers for economic losses caused by disease risks. Premiums are paid by individuals and employers. When an insured person sees a doctor and incurs medical expenses, the medical insurance institution gives the patient a certain amount of economic compensation. By the end of 2018, the number of people participating in basic medical insurance in China had reached 1.35 billion, and the participation rate had exceeded 95%. At the same time, the medical insurance fund plays a pivotal role in people's livelihood. According to statistics from the Ministry of Human Resources and Social Security, China's medical expenditures increased from 1.45 trillion in 2008 to 4.10 trillion in 2015, with an average annual growth rate of 16%. However, while the pressure on the medical insurance fund is increasing, criminals exploit universal medical insurance to commit medical insurance fraud.

Medical insurance fraud is a fraudulent behavior in the process of medical services for the purpose of seeking benefits. The frauds here mainly include two categories: patients use some means to defraud medical insurance; patients and doctors jointly defraud medical insurance. From 2013 to 2017, the national fiscal medical and health expenditures totaled 5,950.2 billion yuan, with an average annual increase of 11.7%. While the country attaches great importance to medical and health care, the additional expenditures caused by medical insurance fraud are also increasing.

Existing approaches to detecting medical insurance fraud fall into two main branches: unsupervised learning methods and supervised learning methods. Unsupervised learning relies on outlier analysis to find potential anomalies in unlabeled data; however, outlier detection methods are not suitable for highly skewed data such as medical insurance fraud data. Supervised learning requires a large number of labeled data points, including fraudulent and non-fraudulent examples, to achieve good predictive performance; however, because domain experts are scarce and medical fraud investigations are expensive, there are very few labeled points. The labels are also extremely unbalanced, as non-fraud examples are often not explicitly disclosed in the real world. The one-class classification (OCC) algorithm is one solution for modeling medical fraud data when non-fraud examples are lacking; however, on medical fraud datasets the OCC method still suffers from an insufficient number of training points, which leads to poor prediction performance. Therefore, the above-mentioned unsupervised and supervised learning methods for detecting fraudulent activities in medical insurance have shortcomings and cannot effectively predict fraudulent behaviors.

Based on this, the present invention proposes a method for detecting medical insurance fraud by means of machine learning, thereby solving the problem that fraudulent behaviors cannot be effectively predicted in the prior art. The method described in the present invention is explained in detail below.

Please refer to FIG. 1, which is a flowchart of a method for detecting medical insurance fraud in the present invention. As shown in FIG. 1 , a method for detecting medical insurance fraud according to an embodiment of the present invention includes the following steps:

S100. Obtain a patient's medical treatment record, extract corresponding patient characteristics according to the obtained medical treatment record, and establish a doctor-patient relationship neural network according to the extracted patient characteristics and the corresponding relationship between the patient and the doctor.

Specifically, when a patient registers and uses medical insurance at an outpatient clinic, the hospital doctor enters the patient's medical information into the medical information system, so that the patient's medical record exists in the medical information system. The medical treatment information includes, but is not limited to, the patient's identity information, insurance type, purchased drug items, purchase quantity, and date of visit. Since the medical treatment information may grow with the patient's course of treatment and the insurance types are similar, this section does not explain the medical treatment information in detail. It should be understood that the above only illustrates part of the content covered by the medical treatment information and is not intended to limit the present invention.

In general, when analyzing medical insurance fraud, it is also crucial to analyze the relationship between doctors and patients. The patient's visit information also includes the doctor who handled the current visit. By obtaining the patient's medical record, the patient's detailed medical information and the corresponding doctor for each visit can be obtained.

By adopting machine learning and deep learning methods, the invention models the patient characteristics extracted from the medical treatment information and the relationship between patients and doctors, and then establishes a doctor-patient relationship neural network, which strengthens the connections between patient nodes and further facilitates the classification of patient nodes.

S200: Input the pre-marked fraud samples into the established doctor-patient relationship neural network, train a fraud prediction model, and output a prediction value of fraudulent behavior of each patient node from the trained fraud prediction model.

Specifically, the selected labeled fraud samples are input into the doctor-patient relationship neural network, and the model is then trained. Through an active learning strategy, more patient nodes are classified and labeled according to the pre-labeled fraud samples; that is, the predicted value of fraudulent behavior is calculated for all patient nodes. The fraudulent behavior of patients is then analyzed through the predicted values, which is conducive to detecting and supervising medical insurance fraud and to the healthy popularization of medical insurance.

S300. Determine whether the patient of the corresponding node has fraudulent behavior according to the output prediction value.

Specifically, the likelihood of fraudulent behavior can be judged from the size of the calculated predicted value. Generally, patients with larger predicted values are treated as key objects of analysis. The threshold on the predicted value above which behavior is deemed fraudulent may be set by domain experts, medical insurance administrators, or the hospital; it is not described in detail here, and this passage only illustrates that fraud can be determined from the predicted value.
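As a small illustration of step S300, the sketch below flags patient nodes whose predicted value reaches a threshold and ranks all nodes for review. The threshold of 0.5 and the example values are assumptions made for the example; the source leaves the threshold to domain experts, administrators, or the hospital.

```python
import numpy as np


def flag_fraud(predicted, threshold=0.5):
    """Return a boolean fraud flag per node and node indices ranked by predicted value.

    `threshold` is a hypothetical setting chosen by the operator.
    """
    predicted = np.asarray(predicted, dtype=float)
    flags = predicted >= threshold
    ranking = np.argsort(-predicted)  # highest predicted value first
    return flags, ranking


predicted_values = [0.12, 0.91, 0.47, 0.66]  # illustrative model outputs
flags, ranking = flag_fraud(predicted_values)
```

The ranking puts the nodes with the largest predicted values first, which matches the idea of treating them as key objects of analysis.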

In one embodiment, as shown in FIG. 2 , the step S100 specifically includes:

S110. Obtain a patient's medical treatment record, extract corresponding patient characteristics from the medical treatment record, and establish a patient characteristic degree matrix.

S120, analyze the doctor-patient relationship between the doctor and the patient in the medical treatment record, and establish a corresponding doctor-patient relationship adjacency matrix.

When a patient's visit information is entered into the medical information system, the system adds a corresponding medical record. The characteristics of each patient, and the doctors connected with each patient during visits, are extracted from all the medical records in the system. These two kinds of information are processed and represented as data, which is convenient for machine learning.
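As an illustration of steps S110 and S120, the following minimal sketch builds a patient feature matrix and a patient adjacency matrix from a handful of visit records. The record fields, the choice of features, and the rule that two patients are connected with a weight equal to the number of doctors they share are all assumptions made for this example; the source does not specify them.

```python
import numpy as np

# Hypothetical visit records: (patient_id, doctor_id, amount, number_of_items)
records = [
    ("p1", "d1", 120.0, 3),
    ("p2", "d1", 80.0, 1),
    ("p3", "d2", 300.0, 5),
    ("p1", "d2", 60.0, 2),
]


def build_graph(records):
    """Build a feature matrix X and a symmetric adjacency matrix A over patients."""
    patients = sorted({r[0] for r in records})
    index = {p: i for i, p in enumerate(patients)}
    n = len(patients)

    # Assumed features per patient: visit count, total amount, total items.
    X = np.zeros((n, 3))
    doctors_of = {p: set() for p in patients}
    for pid, did, amount, items in records:
        X[index[pid]] += (1.0, amount, items)
        doctors_of[pid].add(did)

    # Assumed edge weight: number of doctors shared by two patients.
    A = np.zeros((n, n))
    for a in patients:
        for b in patients:
            if a != b:
                A[index[a], index[b]] = len(doctors_of[a] & doctors_of[b])
    return patients, X, A


patients, X, A = build_graph(records)
```

In this toy data, `p1` shares one doctor with each of `p2` and `p3`, while `p2` and `p3` share none, so only the first row and column of `A` are non-zero.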

S130. Establish a doctor-patient relationship neural network according to the patient characteristic degree matrix and the doctor-patient relationship adjacency matrix.

The characteristic degree matrix is established from the patient characteristics, and the adjacency matrix is established from the relationship between doctors and patients. The algorithm that forms the doctor-patient relationship neural network from the degree matrix and the adjacency matrix is as follows. Specifically, define the relationship network (PD) of patients (P) and doctors (D) as an undirected graph

$$G = (\nu, \varepsilon, A)$$

where $\nu$ is the set of nodes and the number of nodes is $N = |\nu|$, $\varepsilon$ represents the connections between patients and doctors, and $A \in \mathbb{R}^{N \times N}$ represents the connection weights between patients and doctors.

Then, define a graph convolution operation $*_G$ based on spectral domain convolution, where the patient node signal is represented as $x \in \mathbb{R}^{N}$. Using $\Theta$ to represent a convolution kernel function, the present invention defines the convolution operation as follows:

$$\Theta *_G x = \Theta(L)\,x = \Theta(U \Lambda U^{T})\,x = U\,\Theta(\Lambda)\,U^{T} x$$

By performing the convolution operation on the patient nodes, information can be exchanged between interconnected patient nodes, and patient nodes of the same type are distributed more closely.

After that, the Laplacian matrix is used to process the degree matrix and the adjacency matrix to realize the eigendecomposition (that is, the spectral decomposition), defined as follows:

$$L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^{T}$$

where $I_N$ is the identity matrix; $D \in \mathbb{R}^{N \times N}$ is the degree matrix (representing patient characteristic information, such as date of consultation, type of medical insurance, amount, etc.), and $D_{ii} = \sum_j A_{ij}$ is the calculation formula of the degree matrix; $A$ is the adjacency matrix (representing the weight information between the patient and the doctor); $\Lambda$ is the diagonal matrix composed of the eigenvalues of the Laplacian matrix; $U$ is the matrix of eigenvectors of the Laplacian matrix; and the filter $\Theta(\Lambda)$ is a diagonal matrix with respect to the Laplacian matrix.

Since the complexity of the above formula is $O(n^2)$, it is improved by Chebyshev polynomial approximation and first-order approximation. The improved formula is as follows:

$$\Theta *_G x \approx \theta\, \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} x, \qquad \tilde{A} = A + I_N, \quad \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$$

After that, the spatial features of the topology graph are extracted through a Graph Convolution Network (GCN), which has the following layer-by-layer propagation rule:

$$H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right)$$

where $H^{(0)} = X$ is the patient node information, $H^{(l)}$ represents the output of the $l$-th layer of the graph convolutional neural network, $W^{(l)}$ is the weight matrix of the $l$-th layer of the graph convolutional neural network, and $\sigma(\cdot)$ is the sigmoid activation function.
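The layer-by-layer propagation described above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the standard renormalized first-order rule, in which self-loops are added to the adjacency matrix before symmetric normalization; the toy graph, the feature sizes, and the random weights are hypothetical.

```python
import numpy as np


def normalize_adjacency(A):
    """Add self-loops and apply symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    degree = A_tilde.sum(axis=1)          # >= 1 for non-negative weights
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gcn_layer(A_hat, H, W):
    """One propagation step: sigmoid(A_hat @ H @ W)."""
    return sigmoid(A_hat @ H @ W)


rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)    # toy patient graph
X = rng.normal(size=(3, 4))               # H^(0): 3 patients, 4 features
W0 = rng.normal(size=(4, 2))              # untrained first-layer weights

A_hat = normalize_adjacency(A)
H1 = gcn_layer(A_hat, X, W0)              # output of the first layer
```

Each row of `H1` mixes a patient's own features with those of its neighbors, which is how connected patient nodes exchange information.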

For the supervised learning model established in the present invention, training the prediction model generally requires a sufficient amount of labeled data, and the more labeled data there is, the higher the prediction accuracy. In practical applications, the labeling of fraudulent behaviors relies mainly on investigation by domain experts, which is undoubtedly very costly, and mobilizing experts to investigate is also very inefficient. Moreover, manual labeling of ever-increasing medical data is obviously not realistic. Therefore, in the present invention, the most valuable data is selected for labeling by the method of active learning, so as to reduce the manpower and financial resources required for labeling in existing methods.

In a specific embodiment, the present invention uses the method of active learning to mark fraud samples, including:

S210. Select a part of the medical treatment records from the medical treatment records in a preset manner as the samples to be marked.

S220: Perform expert annotation on the selected samples to be marked, identify samples with fraudulent behaviors among the samples to be marked, and obtain pre-marked fraud samples.

In a specific embodiment, the preset manner of selecting samples to be marked in step S210 includes at least one of a maximum entropy strategy, a random strategy, and a maximum probability strategy. Only these sampling methods are briefly introduced here. The listed strategies may also be combined with one another or with other means, for example by selecting the average point, mean point, or middle point of the overall sample. Such manners are extensions of those listed in this embodiment; they are not explained in detail here, and all fall within the scope of protection of this embodiment.

Method 1: Calculate the entropy value of each patient through the maximum entropy selection strategy, and select the patient with the maximum calculated entropy value as the sample to be marked.

The calculation formula of entropy is:

$$H(x) = -\sum_{i=1}^{n} p_i \log p_i$$

where $n$ is the number of distinct values of the random variable $x$, and $p_i$ is the probability that $x$ takes the $i$-th value.

We adopt the maximum entropy (most uncertain) strategy to select label nodes.

By using the maximum entropy selection strategy (Maximum Entropy selection, MEs) to select the medical records whose distribution is most uncertain, the randomness of the data can be guaranteed. Specifically, the confidence with which a sample point belongs to a category is described by the conditional entropy. The larger the conditional entropy, the less clear the classification of the sample point (the lower the classification confidence); the smaller the conditional entropy, the clearer the classification (the higher the classification confidence). The conditional entropy is calculated by the following formula: H(Y|Z)=H(Z,Y)-H(Z)

By calculating the entropy value of each patient node, and then sorting them, each time the node with the largest entropy is selected for fraud labeling.
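The maximum entropy selection described above can be sketched as follows. The example assumes that the current model outputs a class probability vector (non-fraud, fraud) for each patient node; the probabilities, the function names, and the parameter `k` are illustrative and not taken from the source.

```python
import numpy as np


def entropy(probs, eps=1e-12):
    """Shannon entropy of each row of class probabilities."""
    p = np.clip(probs, eps, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def select_max_entropy(probs, labeled, k=1):
    """Pick the k unlabeled nodes whose predicted class distribution is most uncertain."""
    h = entropy(probs)
    h[np.array(sorted(labeled), dtype=int)] = -np.inf  # skip already labeled nodes
    return np.argsort(-h)[:k]


# Hypothetical (non-fraud, fraud) probabilities for four patient nodes.
probs = np.array([
    [0.95, 0.05],
    [0.50, 0.50],
    [0.70, 0.30],
    [0.55, 0.45],
])
chosen = select_max_entropy(probs, labeled={1}, k=2)
```

Node 1 has the highest entropy but is already labeled, so the two nodes sent to the expert are node 3 and then node 2.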

Method 2: Adopt a random strategy to randomly select part of the medical records as the samples to be marked. That is, a preset number of medical visit records are randomly selected from all medical visit records as samples to be marked.

Method 3: Calculate the probability value of each patient through the maximum probability strategy, and select the patient with the maximum calculated probability value as the sample to be marked. Specifically, the probability value of each patient is calculated, and the sample to be marked is selected according to the size of the probability value. Since the calculation of the probability value belongs to the prior art, no example is given here.

It should be noted that in the above three methods, the maximum entropy strategy is preferentially selected to select the samples to be marked, which can select the most uncertain samples to be marked with the greatest randomness, thereby increasing the authenticity of the sample selection.

In one embodiment, the step of training the prediction model in step S200 includes:

The prediction model can be established by combining the doctor-patient relationship neural network with a variational auto-encoder model.

In general, the Bayesian formula

$$p(Z \mid X) = \frac{p(X \mid Z)\,p(Z)}{p(X)}$$

obtains the posterior probability of the hidden variable Z from the observed patient visit data X. In practice, however, only the data X is available and there is no distribution function of X; that is, p(X) is unknown, so p(Z|X) cannot be solved directly. This problem can be solved by a variational auto-encoder (VAE).

An auto-encoder consists of two parts: an encoder and a decoder. What can be directly observed in the present invention is the patient's visit data X, and X is generated by the hidden variable Z. The generative model Z→X is $p_\theta(X \mid Z)$, which is called the decoder; the recognition model X→Z is $q_\phi(Z \mid X)$, which is called the encoder. Assuming that all data are independent and identically distributed, the parameters $\theta$ of the generative model $p_\theta(X \mid Z)$ must be estimated in order to improve the generative model. In the present invention, the maximum log-likelihood method is used; that is, the maximum of the log-likelihood function over the $N$ data points is sought:

$$\log p_\theta\!\left(X^{(1)}, \dots, X^{(N)}\right) = \sum_{i=1}^{N} \log p_\theta\!\left(X^{(i)}\right)$$

By first obtaining the patient's visit records and then using the encoder $q_\phi(Z \mid X^{(i)})$ to approximate the true posterior probability $p_\theta(Z \mid X^{(i)})$, the distribution of the patient's visit records can be obtained. The present invention uses the encoder of the VAE (variational auto-encoder). By sampling from the distribution of patient nodes, the fraudulent behavior corresponding to a patient node can be obtained from that node. With the labeled fraud samples as input, the labels of all patient nodes are then generated by adjusting the hidden parameters, which better predicts the relationship between doctors and patients and at the same time addresses the problem of imbalanced data samples.

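The similarity between the two distributions, the encoder $q_\phi$ and the true posterior $p_\theta$, is measured by the KL divergence (Kullback–Leibler divergence), which gives the following formula:

$$KL\!\left(q_\phi(Z \mid X^{(i)}) \,\|\, p_\theta(Z \mid X^{(i)})\right) = \mathbb{E}_{q_\phi(Z \mid X^{(i)})}\!\left[\log q_\phi(Z \mid X^{(i)}) - \log p_\theta(Z \mid X^{(i)})\right]$$

The expression of the prediction model is obtained by further combining the doctor-patient relationship neural network with the variational auto-encoder:

$$\log p_\theta(X^{(i)}) = KL\!\left(q_\phi(Z \mid X^{(i)}) \,\|\, p_\theta(Z \mid X^{(i)})\right) + L(\theta, \phi, X^{(i)})$$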

The patient nodes can be predicted through the fraud prediction model. Specifically, the patient's medical record is used as the input data to establish the doctor-patient relationship neural network, the output of the doctor-patient relationship neural network is used as the input of the variational auto-encoder, and finally the prediction result is output. All patient nodes are trained in the fraud prediction model. When the preset number of training iterations is reached, the prediction and classification of the unknown patient nodes is complete; that is, the predicted value of fraudulent behavior of each patient node has been calculated. The predicted value lies between 0 and 1, where 0 indicates non-fraud and 1 indicates fraud.
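The decomposition of the log-likelihood into a KL term plus the term L(θ,φ,X) can be checked numerically on a toy example. The sketch below assumes that L is the usual variational lower bound, E_q[log p(X,Z) − log q(Z|X)], which the source does not spell out; the binary hidden variable and all probabilities are made up for the illustration.

```python
import numpy as np

# Toy model with a binary hidden variable Z and one fixed observation x.
prior = np.array([0.7, 0.3])        # p(Z)
likelihood = np.array([0.2, 0.9])   # p(x | Z) for the observed x
joint = prior * likelihood          # p(x, Z)
p_x = joint.sum()                   # marginal likelihood p(x)
posterior = joint / p_x             # true posterior p(Z | x)

q = np.array([0.5, 0.5])            # an arbitrary encoder distribution q(Z | x)

# KL(q || posterior) and the assumed lower bound L = E_q[log p(x, Z) - log q(Z | x)]
kl = np.sum(q * (np.log(q) - np.log(posterior)))
lower_bound = np.sum(q * (np.log(joint) - np.log(q)))

print(np.log(p_x), kl + lower_bound)
```

Because the KL term is never negative, maximizing the lower bound with respect to the encoder pushes it toward the true posterior, which is the reason the posterior does not have to be computed directly.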

In practical application scenarios, medical insurance fraud methods emerge in an endless stream, so it is necessary to update the medical insurance fraud prediction model in time. However, due to the complexity of the graph relationship network and a large number of nodes, each training update needs to consume a lot of time and computing resources. This leads to great limitations in practical applications.

And with the passage of time, the number of patient visits will also increase, resulting in more and more patient nodes in the doctor-patient graph relationship network, and the requirements for machine computing conditions (hardware, memory, CPU, etc.) will be higher. With the increase of patient nodes, the amount of calculation also increases sharply, which makes it more difficult for the system to predict the fraud value, which is difficult to apply in practice.

Therefore, for the above reasons, the present invention also proposes an online update strategy in which new data is automatically incorporated every day; by adding new nodes and deleting useless old nodes, the number of nodes in the graph is kept at a fixed level, so that the system can complete training in a short time and maintain good real-time performance.

In one embodiment, the strategy implementation steps for online update in the present invention are as follows:

As shown in FIG. 3, after the step S200, it further includes:

S400. Monitor whether there is a newly added medical treatment record.

If so, step S410 is performed to input patient nodes with predicted values into the pre-established dynamic update network, and delete invalid patient nodes.

Specifically, newly added medical treatment records are monitored. When there are new medical treatment records, they are added to the fraud prediction model so that fraud analysis can be carried out in a timely manner, and invalid data is removed from the original medical treatment records in the fraud prediction model to keep the system running efficiently. When no new medical treatment records are detected, fraud prediction is performed on all medical treatment records according to the originally set prediction period to ensure the timeliness of the calculated predicted values.

S420: Arrange the remaining medical treatment records and the newly added medical treatment records after deleting the invalid node into an updated medical treatment record.

S430. Continue to determine whether the patient corresponding to each node has fraudulent behavior according to the updated medical treatment record. Specifically, steps S100-S430 are performed cyclically by using the updated medical treatment record as the source data for obtaining the medical treatment record in step S100, so as to achieve the effect of continuously updating data and prediction nodes, and ensure the feasibility of the prediction model.

Specifically, through the online update strategy, the system can be updated regularly, and some nodes carrying relatively little information in the graph relationship network are deleted at each update. Through continuous iteration, the prediction accuracy of the model can be guaranteed while real-time performance and training efficiency are maintained.
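The cycle of steps S400 to S430 can be sketched as a loop that replaces deleted nodes with new records so that the number of records stays constant. This is only an illustration under stated assumptions: the function run_online_updates, the record layout with an "id" key, and the two callables predict and select_invalid are hypothetical placeholders for the fraud prediction model and the dynamic update network, not the implementation of the invention.

```python
from typing import Callable, Dict, Iterable, List, Set

Record = Dict[str, object]  # hypothetical record shape; each record carries a unique "id"


def run_online_updates(
    records: List[Record],
    new_batches: Iterable[List[Record]],
    predict: Callable[[List[Record]], Dict[str, float]],
    select_invalid: Callable[[List[Record], Dict[str, float], int], Set[str]],
) -> List[Record]:
    """Apply one update per batch of new records, keeping the record count fixed."""
    current = list(records)
    for batch in new_batches:
        if not batch:
            # No newly added records in this period: nothing to replace.
            continue
        # Predicted fraud value for every current patient node.
        scores = predict(current)
        # S410: pick as many invalid nodes as there are new records.
        invalid_ids = select_invalid(current, scores, len(batch))
        # S420: keep the remaining records and append the new ones.
        current = [r for r in current if r["id"] not in invalid_ids] + list(batch)
    return current
```

The count only stays fixed if select_invalid returns exactly as many identifiers as there are new records, which is what step S414 requires.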

As shown in Figure 4, in order to better represent the connection between the fraud prediction model established in the present invention and the dynamic update network, the overall flow is further illustrated as follows:

S10, start;

S20. Obtain the medical treatment record; obtain the medical treatment record from the data center;

S30, extracting patient characteristics; extracting patient characteristics from the medical treatment record;

S40, establishing a doctor-patient relationship neural network according to patient characteristics; establishing a doctor-patient relationship neural network according to the extracted patient characteristics and the relationship between doctors and patients;

S50. Obtain pre-marked fraud samples; select some medical treatment records from the data center for expert marking to mark fraud samples;

S60, train a fraud prediction model; establish a fraud prediction model according to the marked fraud samples and the doctor-patient relationship neural network;

S70, output the predicted value corresponding to each patient node; output the predicted value that all patient nodes have fraudulent behavior;

S80. Determine whether there is a new medical visit record; new medical visit records are input into the data center;

If so, execute S81, and input the predicted values of all patient nodes into the dynamic update network;

S82. Delete the invalid patient node, and form an updated medical visit record with the newly added medical visit record; update the data of the data center after forming the new medical visit record;

If not, execute S90, end;

Steps S20-S90 are looped.

In order to facilitate the description of all medical treatment records, the concept of a data center is introduced in this embodiment to describe the circulation process of medical treatment records.

In a further specific embodiment, as shown in FIG. 5 , the step S410 specifically includes:

S411. Input the patient node with the predicted value into the pre-established dynamic update network.

S412. Calculate the priority of each patient node respectively according to the generation date and the predicted value of the patient node with the predicted value.

S413. Sort the priority of each patient node, and select the patient node with a low priority as an invalid patient node.

S414. Delete invalid patient nodes in order starting from the lowest priority, the number of deleted nodes being equal to the number of newly added visits.

Among them, two factors determine whether a patient node is judged to be an invalid node: the patient's visit time (the earlier the visit, the sooner the node is deleted) and the probability that the patient is predicted to be fraudulent (the smaller the probability, the sooner the node is deleted).

Specifically, as shown in Figure 6, the execution flow of the update algorithm is as follows:

S42, input the medical treatment record V and the newly added data W of the hospital;

S43, generate a doctor-patient relationship neural network (adjacency matrix A and feature matrix X) according to V;

S44, use the variational autoencoder relationship model to predict all patient nodes in the doctor-patient relationship neural network, and output the predicted value p of each patient node;

S45, standardize the input date d of each patient node;

S46, combine the predicted value p of each patient node with the input date d, and calculate the priority set s by s = λ_p·p + λ_d·d;

S47, sort s, and delete the corresponding number of nodes in s according to the number of newly added data W;

S48, iteratively update the system by combining the deleted node and the newly added node.

Among them, s is the priority (the smaller the value, the sooner the node is deleted), p is the probability that the patient node is predicted to be fraudulent, and d is the patient's visit date. λ_p and λ_d are the weights of the probability and the date, respectively. s is calculated for all patient nodes, the nodes are sorted in ascending order of s, and the first k nodes are deleted (k is equal to the number of newly added nodes). The experimental comparison is shown in Figure 7 (wherein the circles indicate that the online dynamic update strategy is not used, and the diamonds indicate that the online dynamic update strategy is used). It can be seen from the figure that the dynamic update strategy makes the training of the fraud model at least 40 times faster than the method without this strategy, while accuracy and precision are maintained (with the dynamic strategy the model can be updated within 6 hours, so it has good applicability).
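The priority calculation of steps S45 to S47 can be sketched as follows. The description says only that the date d is standardized, so the min-max scaling to [0, 1] used here is an assumption, as are the function name deletion_order, the numeric encoding of dates (for example, days since a reference day) and the default weights of 0.5.

```python
from typing import List, Sequence, Tuple


def deletion_order(
    p: Sequence[float],
    d: Sequence[float],
    k: int,
    lambda_p: float = 0.5,
    lambda_d: float = 0.5,
) -> Tuple[List[int], List[float]]:
    """Return the indices of the k lowest-priority nodes and all priorities s."""
    if len(p) != len(d):
        raise ValueError("p and d must have the same length")
    if not 0 <= k <= len(p):
        raise ValueError("k must be between 0 and the number of nodes")
    if not p:
        return [], []
    # S45: standardize dates (assumed min-max scaling; earliest -> 0, latest -> 1).
    lo, hi = min(d), max(d)
    span = hi - lo
    d_norm = [(x - lo) / span if span > 0 else 0.0 for x in d]
    # S46: s = lambda_p * p + lambda_d * d for every patient node.
    s = [lambda_p * pi + lambda_d * di for pi, di in zip(p, d_norm)]
    # S47: sort ascending by s and take the first k nodes for deletion.
    order = sorted(range(len(s)), key=lambda i: s[i])
    return order[:k], s
```

With equal weights, an old record can be deleted even when its predicted fraud probability is high, so the ratio of λ_p to λ_d controls how strongly suspected fraud cases are retained.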

In an embodiment, before the step S100, it further includes:

By anonymizing patient identity information, patient privacy can be protected and leakage of patient information can be avoided. By using the doctor-patient relationship neural network and the variational autoencoder, the present invention can effectively predict the distribution of patient nodes, select the best samples for labeling through preset conditions, have them labeled by experts, and then input them into the model for training, which reduces the cost of manual labeling and increases the accuracy of fraud prediction. Further, the present invention also provides an online dynamic update network model, which can update the system in real time on the premise that the number of patient nodes to be predicted in the system is fixed, thereby improving the prediction efficiency and practicability of the fraud prediction model. In addition, by deleting invalid nodes, prediction time can be saved, excessive occupation of system resources can be avoided, and prediction accuracy can be improved.

The present invention also discloses a system which, as shown in FIG. 8, includes a memory 20 and one or more programs, wherein the one or more programs are stored in the memory 20 and are configured to be executed by one or more processors 10, and the one or more programs include instructions for performing the method for detecting medical insurance fraud as described above; the details are as described above.

The present invention also discloses a storage medium, wherein the storage medium stores a computer program, and the computer program can be executed to implement the above method for detecting medical insurance fraud; the details are as described above.

To sum up, the method, system and storage medium for detecting medical insurance fraud disclosed by the present invention obtain medical records, extract patient characteristics and the relationship between patients and doctors from the medical records, and then use machine learning and deep learning methods to model the extracted features and establish a fraud prediction model, so as to detect medical insurance fraud with a small amount of manual intervention, ensure the effectiveness of medical insurance fraud detection, and save a great deal of human, material and financial expenditure. Furthermore, the present invention also proposes an online dynamic update strategy, which dynamically updates the patient nodes in the graph neural network, thereby ensuring the real-time performance and accuracy of the prediction model and the operating efficiency of the system.

It should be understood that the application of the present invention is not limited to the above examples. For those of ordinary skill in the art, improvements or transformations can be made according to the above descriptions, and all these improvements and transformations should belong to the protection scope of the appended claims of the present invention.

Claims

A method for detecting medical insurance fraud, comprising:

Obtain the patient's medical records, extract the corresponding patient characteristics according to the obtained medical records, and establish a doctor-patient relationship neural network according to the extracted patient characteristics and the corresponding relationship between the patient and the doctor;

Input the pre-marked fraud samples into the established doctor-patient relationship neural network, train a fraud prediction model, and output the predicted value of fraudulent behavior of each patient node from the trained fraud prediction model;

Determine whether the patient of the corresponding node has fraudulent behavior according to the output prediction value.
The method for detecting medical insurance fraud according to claim 1, wherein after the pre-marked fraud samples are input into the established doctor-patient relationship neural network, the fraud prediction model is trained, and the predicted value of fraudulent behavior of each patient node is output from the trained fraud prediction model, the method further includes:

Monitor for new medical records;

If there is a new medical treatment record, input the patient node with the predicted value into the pre-established dynamic update network, and delete the invalid patient node;

Organize the remaining medical treatment records and the newly added medical treatment records after deleting the invalid node into the updated medical treatment records;

Continue to determine whether the patient corresponding to each node has fraudulent behavior according to the updated medical treatment record.
The method for detecting medical insurance fraud according to claim 2, wherein, in the step of inputting the patient node with the predicted value into the pre-established dynamic update network and deleting the invalid patient node if there is a newly added medical treatment record, the basis for determining invalid patient nodes is:

Calculate the priority of each patient node separately according to the generation date and predicted value of the patient node with the predicted value;

Sort the priority of each patient node, and select the node with low priority as the invalid patient node.
The method for detecting medical insurance fraud according to claim 3, wherein the deleting an invalid patient node specifically includes:

According to the number of new visits, delete the same number of invalid patient nodes in the order of the patient nodes with lower priority.
The method for detecting medical insurance fraud according to claim 1, wherein, in inputting the pre-marked fraud samples into the established doctor-patient relationship neural network, the step of obtaining the pre-marked fraud samples comprises:

Select part of the medical records from the medical records in a preset way as the samples to be marked;

The selected samples to be marked are marked by experts, and the samples with fraudulent behaviors in the samples to be marked are identified to obtain pre-marked fraud samples.
The method for detecting medical insurance fraud according to claim 1, wherein, in the selecting part of the medical treatment records from the medical treatment records in a preset manner as the samples to be marked, the preset manner of selecting the samples to be marked includes at least:

The entropy value of each patient is calculated by the maximum entropy selection strategy, and the maximum value of the calculated entropy values is selected as the sample to be marked;

Or, adopt a random strategy to randomly take part of the medical records in the medical records as the samples to be marked;

Alternatively, the probability value of each patient is calculated by the maximum probability strategy, and the maximum value among the calculated probability values is selected as the sample to be marked.
The method for detecting medical insurance fraud according to claim 1, wherein, before the obtaining a patient's medical treatment record, extracting corresponding patient characteristics according to the obtained medical treatment record, and establishing a doctor-patient relationship neural network according to the extracted patient characteristics and the corresponding relationship between the patient and the doctor, the method further includes:

The patient identity information in the patient medical treatment information is anonymized, and the processed medical treatment information is converted into a medical treatment record of a data structure type.
The method for detecting medical insurance fraud according to claim 1, wherein the obtaining a patient's medical treatment record, extracting corresponding patient characteristics according to the obtained medical treatment record, and establishing a doctor-patient relationship neural network according to the extracted patient characteristics and the corresponding relationship between the patient and the doctor includes:

Obtain the patient's medical records, extract the corresponding patient characteristics from the medical records, and establish a patient characteristic degree matrix;

Analyze the doctor-patient relationship between doctors and patients in the medical records, and establish the corresponding doctor-patient relationship adjacency matrix;

According to the patient characteristic degree matrix and the doctor-patient relationship adjacency matrix, the doctor-patient relationship neural network is established.
A system comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the method for detecting medical insurance fraud according to any one of claims 1-8.
A storage medium, characterized in that the storage medium stores a computer program, and the computer program can be executed to implement the method for detecting medical insurance fraud according to any one of claims 1-8.