CN113657548A - Medical insurance abnormity detection method and device, computer equipment and storage medium - Google Patents

Publication number
CN113657548A
Authority
CN
China
Prior art keywords
data
medical insurance
detection model
training
medical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111015971.XA
Other languages
Chinese (zh)
Inventor
李佳秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Medical and Healthcare Management Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Medical and Healthcare Management Co Ltd
Priority to CN202111015971.XA
Publication of CN113657548A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The application discloses a medical insurance abnormality detection method and device, computer equipment and a storage medium. It relates to the field of medical big data processing and is used to improve the efficiency and accuracy of detecting abnormal medical insurance behaviors. The medical insurance abnormality detection method comprises the following steps: acquiring medical insurance data openly shared by a medical insurance organization, and performing data preprocessing on the medical insurance data to obtain original data; describing the original data through preset dimensions to obtain target data; performing feature extraction on the original data and the target data by adopting a machine learning algorithm to obtain feature vectors; inputting the original data and the feature vectors as a training data set into a preset neural network for training to generate a medical insurance detection model; and inputting the medical insurance data to be detected into the medical insurance detection model, and detecting the type of the medical insurance data to be detected according to the medical insurance detection model. Using the trained medical insurance detection model, the method and the device can effectively improve the efficiency and accuracy of auditing abnormal medical insurance data.

Description

Medical insurance abnormity detection method and device, computer equipment and storage medium
Technical Field
The embodiment of the invention relates to the field of medical big data processing, in particular to a medical insurance abnormity detection method and device, computer equipment and a storage medium.
Background
With the continuous expansion of medical insurance coverage, the medical needs of insured persons keep increasing, expenditure from medical funds keeps rising, and the task of supervising medical insurance funds grows ever heavier. Some of this expenditure is caused by medical fraud. Traditional medical insurance fund supervision relies on experience-based auditing: the reimburser information, pathogenesis, treatment plan, medication, rehabilitation treatment and other data in medical records are mainly audited manually. This is time-consuming and labor-intensive, and because there is no specific auditing standard and the standard varies from person to person, auditing accuracy is low.
On the other hand, over time medical insurance fraud has become increasingly concealed, complex and changeable, and risk control has become steadily more difficult. Many violations cannot be accurately identified, which leads to missed audits and erroneous audits, so the auditing effect is poor.
Disclosure of Invention
The embodiment of the invention provides a medical insurance abnormality detection method and device, computer equipment and a storage medium, which can improve the detection efficiency and accuracy of medical insurance abnormal behaviors.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
in a first aspect, a medical insurance abnormality detection method is provided, including:
acquiring medical insurance data openly shared by a medical insurance organization, and performing data preprocessing on the medical insurance data to obtain original data;
describing the original data through a preset dimension to obtain target data;
performing feature extraction on the original data and the target data by adopting a machine learning algorithm to obtain feature vectors;
inputting the original data and the feature vectors as a training data set into a preset neural network for training to generate a medical insurance detection model;
and inputting the medical insurance data to be detected into the medical insurance detection model, and detecting the type of the medical insurance data to be detected according to the medical insurance detection model.
Optionally, after the step of inputting the medical insurance data to be detected into the medical insurance detection model and detecting the type of the medical insurance data to be detected according to the medical insurance detection model, the method further includes the following steps:
obtaining an audit result of a second audit performed on the to-be-detected medical insurance data classified as the abnormal type;
and adding the auditing result into the training data set as a marked training sample.
Optionally, after the step of adding the audit result as a labeled training sample to the training data set, the method further includes the following steps:
acquiring a target insured person corresponding to the to-be-detected medical insurance data whose audit result indicates a violation;
and adding the target insured person to a preset blacklist.
Optionally, the step of describing the original data by a preset dimension to obtain target data specifically includes the following steps:
describing data attributes of the original data from a disease diagnosis dimension, a medical action subject dimension, a medical action compliance dimension and a reflow marking data dimension to obtain the target data.
Optionally, the step of extracting features of the original data and the target data by using a machine learning algorithm to obtain a feature vector specifically includes the following steps:
extracting numerical data and classification data in the original data and the target data;
and respectively normalizing the numerical data and the classified data through a statistical distribution algorithm and a clustering algorithm to obtain the feature vector.
Optionally, after the step of inputting the raw data and the feature vector as a training data set to a preset neural network for training to generate a medical insurance detection model, the method further includes the following steps:
performing overfitting detection on the medical insurance detection model, and judging whether the medical insurance detection model is overfitting according to a detection result;
and when the medical insurance detection model is judged to be over-fitted, re-training the medical insurance detection model according to a preset training strategy until the medical insurance detection model is not over-fitted.
Optionally, after the step of inputting the raw data and the feature vector as a training data set to a preset neural network for training to generate a medical insurance detection model, the method further includes the following steps:
acquiring performance parameters of the medical insurance detection model, and judging whether the medical insurance detection model meets a preset model standard according to the performance parameters;
and when the medical insurance detection model is judged not to meet the preset model standard, adjusting the model parameters of the medical insurance detection model according to the performance parameters.
In a second aspect, to solve the above technical problem, an embodiment of the present invention further provides a medical insurance abnormality detection apparatus, including:
a data acquisition module, used for acquiring medical insurance data openly shared by a medical insurance institution and preprocessing the medical insurance data to obtain original data;
the data description module is used for describing the original data through a preset dimension to obtain target data;
the characteristic extraction module is used for extracting the characteristics of the original data and the target data by adopting a machine learning algorithm to obtain a characteristic vector;
the model training module is used for inputting the original data and the characteristic vector as a training data set into a preset neural network for training to generate a medical insurance detection model;
and the data detection module is used for inputting the medical insurance data to be detected into the medical insurance detection model and detecting the type of the medical insurance data to be detected according to the medical insurance detection model.
Optionally, the apparatus further comprises:
the audit result acquisition module is used for acquiring an audit result of performing double audit on the to-be-detected medical insurance data of the abnormal type;
and the data sample adding module is used for adding the auditing result into the training data set as a marked training sample.
Optionally, the apparatus further comprises:
the target insured person acquisition module is used for acquiring a target insured person corresponding to the to-be-detected medical insurance data whose audit result indicates a violation;
and the blacklist adding module is used for adding the target insured person to a preset blacklist.
Optionally, the data description module includes:
and the data description unit is used for describing data attributes of the original data from a disease diagnosis dimension, a medical action subject dimension, a medical action compliance dimension and a reflow marking data dimension to obtain the target data.
Optionally, the feature extraction module comprises:
the data extraction unit is used for extracting numerical data and classified data in the original data and the target data;
and the data processing unit is used for respectively normalizing the numerical data and the classified data through a statistical distribution algorithm and a clustering algorithm to obtain the feature vector.
Optionally, the apparatus further comprises:
the overfitting detection module is used for performing overfitting detection on the medical insurance detection model and judging whether the medical insurance detection model is overfitting or not according to a detection result;
and the model retraining module is used for retraining the medical insurance detection model according to a preset training strategy when judging that the medical insurance detection model is over-fitted until the medical insurance detection model is not over-fitted.
Optionally, the apparatus further comprises:
the parameter acquisition module is used for acquiring performance parameters of the medical insurance detection model and judging whether the medical insurance detection model meets a preset model standard or not according to the performance parameters;
and the parameter adjusting module is used for adjusting the model parameters of the medical insurance detection model according to the performance parameters when the medical insurance detection model is judged not to meet the preset model standard.
In a third aspect, to solve the above technical problem, an embodiment of the present invention further provides a computer device, including a memory and a processor, where the memory stores computer-readable instructions, and the computer-readable instructions, when executed by the processor, cause the processor to execute the steps of the medical insurance abnormality detection method.
The computer device may be a network device, or may be a part of an apparatus in the network device, such as a system-on-chip in the network device. The chip system is configured to support the network device to implement the functions related to the first aspect and any one of the possible implementations thereof, for example, to receive, determine, and shunt data and/or information related to the medical insurance anomaly detection method. The chip system includes a chip and may also include other discrete devices or circuit structures.
In a fourth aspect, to solve the above technical problem, an embodiment of the present invention further provides a storage medium storing computer readable instructions, where the computer readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the medical insurance abnormality detection method.
In a fifth aspect, a computer program product is provided, which when running on a computer, causes the computer to execute the medical insurance anomaly detection method according to the first aspect and any one of the possible design manners thereof.
It should be noted that all or part of the computer instructions may be stored on the first computer storage medium. The first computer storage medium may be packaged together with the processor of the medical insurance abnormality detection apparatus, or may be packaged separately from the processor of the medical insurance abnormality detection apparatus, which is not limited in this embodiment of the application.
For the description of the second, third, fourth and fifth aspects of the present invention, reference may be made to the detailed description of the first aspect; in addition, for the beneficial effects of the second aspect, the third aspect, the fourth aspect and the fifth aspect, reference may be made to the beneficial effect analysis of the first aspect, and details are not repeated here.
In the embodiment of the present application, the names of the above-mentioned medical insurance abnormality detection apparatuses do not limit the devices or the function modules themselves, and in an actual implementation, the devices or the function modules may appear by other names. Insofar as the functions of the respective devices or functional blocks are similar to those of the present invention, they are within the scope of the claims of the present invention and their equivalents.
These and other aspects of the invention will be more readily apparent from the following description.
The embodiment of the invention has the following beneficial effects: after the medical insurance data are obtained, preprocessing operations such as data cleaning and conversion are first carried out on the medical insurance data to obtain original data; the original data are then described from preset dimensions to obtain target data; feature extraction is carried out on the original data and the target data to obtain feature vectors; and the original data and the feature vectors are input into a neural network as a training data set to train and generate a medical insurance detection model. The trained medical insurance detection model can then be used to audit and mine anomalies in medical insurance data, which can effectively improve the efficiency and accuracy of auditing abnormal medical insurance data.
Drawings
Fig. 1 is a schematic flow chart of a medical insurance abnormality detection method provided in an embodiment of the present application;
Fig. 2 is a schematic flow chart of data description in the medical insurance abnormality detection method provided in the embodiment of the present application;
Fig. 3 is a schematic flow chart of feature extraction in the medical insurance abnormality detection method provided in the embodiment of the present application;
Fig. 4 is a schematic flow chart of adding a training sample in the medical insurance abnormality detection method provided in the embodiment of the present application;
Fig. 5 is a schematic flow chart of adding an insured person who has committed a violation to a blacklist in the medical insurance abnormality detection method provided in the embodiment of the present application;
Fig. 6 is a schematic flow chart of model overfitting detection in the medical insurance abnormality detection method provided in the embodiment of the present application;
Fig. 7 is a schematic flow chart of adjusting model parameters in the medical insurance abnormality detection method provided in the embodiment of the present application;
Fig. 8 is a schematic structural diagram of an embodiment of a medical insurance abnormality detection apparatus provided in the embodiment of the present application;
Fig. 9 is a block diagram of a basic structure of a computer device according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As described in the background art, the existing medical insurance anomaly detection method mainly depends on experience audit, has no specific audit standard, and has low audit accuracy and poor audit effect.
In order to solve the above problems, an embodiment of the present application provides a medical insurance abnormality detection method. After medical insurance data are obtained, preprocessing operations such as data cleaning and conversion are first performed on the medical insurance data to obtain raw data; the raw data are then described from preset dimensions to obtain target data; feature extraction is performed on the raw data and the target data to obtain feature vectors; and the raw data and the feature vectors are input into a neural network as a training data set to train and generate a medical insurance detection model. The trained medical insurance detection model can then be used to audit and mine anomalies in medical insurance data, which can effectively improve the efficiency and accuracy of auditing abnormal medical insurance data.
The medical insurance abnormality detection method can be applied to computer equipment. The computer equipment can be a device for medical insurance risk-control supervision, a chip in the device, or a system on chip in the device.
Optionally, the device may be a physical machine, for example a desktop computer, mobile phone, tablet computer, notebook computer, ultra-mobile personal computer (UMPC), netbook, personal digital assistant (PDA), or other terminal device.
Optionally, the computer device may also implement functions to be implemented by the computer device through a Virtual Machine (VM) deployed on a physical machine.
The medical insurance abnormality detection method provided by the embodiment of the application is described in detail below with reference to the accompanying drawings. As shown in fig. 1, the medical insurance abnormality detection method includes: S101-S105.
S101, medical insurance data openly shared by a medical insurance organization are obtained, and data preprocessing is carried out on the medical insurance data to obtain original data.
Optionally, the medical insurance institution is a department that stores relevant data of the national medical fund, such as the national medical insurance agency, or an institution or platform that is associated with the department that stores relevant data of the national medical fund and can acquire medical insurance data from the department, such as an enterprise website that periodically issues a medical insurance statistical bulletin, and the data source of the enterprise website is the national medical insurance agency.
In a possible implementation manner, the medical insurance data is medical data, such as personal health records, prescriptions, examination reports, and the like. Specifically, when acquiring the medical insurance data, the computer device first determines the dimensions of the required data, such as the disease diagnosis dimension, the medical action subject dimension, the medical action compliance dimension, and the reflow marking data dimension. The disease diagnosis dimension covers the disease diagnosis data of the reimburser, such as the patient's age, name, sex, reason for the visit, medication, medical history, and the like. The medical action subject dimension covers information related to the reimburser's visit, such as the basic information of the hospital, doctor and patient and the relations among them. The medical action compliance dimension covers information about the treatment process during the reimburser's visit, such as clinical pathways, expert rules, and the like. The reflow marking data dimension covers labeled data that has been fed back; for example, if drug A has an effect on symptom B and symptom C, then drug A has two labels, corresponding to symptom B and symptom C respectively. Of course, in implementation, the data may also include other dimensions, such as a visit number dimension, a department dimension, a physician dimension, and an insured unit dimension, which are not limited herein.
After the computer equipment acquires the medical insurance data, the data preprocessing operation such as cleaning and conversion can be carried out on the medical insurance data, and the medical insurance data after the data preprocessing is the original data.
For example, the cleaning operations performed on the medical insurance data include, but are not limited to, data standardization processing, missing value processing, abnormal value processing, and principal component dimension reduction analysis. Specifically, data preprocessing requires that the data samples meet certain standards. Data standardization processing converts data of different specifications into data of the same specification, so that data representing different attributes (different units) become comparable.
Illustratively, the data normalization processing can adopt a non-dimensionalization processing method, and the non-dimensionalization processing method includes a linear non-dimensionalization method, a non-linear non-dimensionalization method and a non-dimensionalization method of a qualitative index, wherein the linear non-dimensionalization method refers to that when an actual value of an index is converted into an index evaluation value which is not influenced by the dimension, a linear relation is assumed between the two, and the change of the actual value of the index causes a corresponding proportional change of the index evaluation value. Linear dimensionless methods include, but are not limited to, min-max normalization and Z-score.
Optionally, the formula of the Z-score (zero-mean normalization) method is as follows:
$$Y = \frac{y - \mu}{\delta} \tag{1}$$
In formula (1), y is the raw data, μ is the sample mean, and δ is the sample standard deviation. In the Z-score method, the mean μ and the standard deviation δ of each data item (index) are obtained first, and the normalized value Y of the original value is then calculated. The Z-score method removes dimensions (units) and thereby avoids the influence that the choice of different units would otherwise have on distance calculation.
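The Z-score normalization described above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `z_score` and the sample claim amounts are hypothetical, and the population standard deviation (`statistics.pstdev`) is assumed for δ; `statistics.stdev` could be substituted if the n − 1 form is intended.

```python
from statistics import mean, pstdev

def z_score(values):
    """Return (y - mu) / delta for every y in values.

    Assumption: delta is the population standard deviation.
    """
    mu = mean(values)
    delta = pstdev(values)
    if delta == 0:
        # All values are identical, so the Z-score is undefined.
        raise ValueError("standard deviation is zero")
    return [(y - mu) / delta for y in values]

# Hypothetical claim amounts used only for illustration.
costs = [120.0, 80.0, 100.0, 140.0, 60.0]
normalized = z_score(costs)
```

After the transformation the values have mean 0 and standard deviation 1, so attributes measured in different units can be compared on the same scale.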
In some optional embodiments, the data normalization processing may also use one-hot encoding. In implementation, some medical insurance data are numerical variables, which can be normalized as numerical variables. Other medical insurance data are categorical values, and there is no magnitude relationship between the values of a categorical variable. For such data, one-hot encoding can be used to convert the categorical variable into several binary columns. Taking drug types comprising drug A, drug B, drug C, drug D, and drug E as an example, the one-hot code of drug A is 10000, the one-hot code of drug B is 01000, the one-hot code of drug C is 00100, the one-hot code of drug D is 00010, and the one-hot code of drug E is 00001. One-hot encoded data can be used directly by a classifier, which solves the problem that classifiers do not process attribute (categorical) data well.
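The drug example above can be reproduced with a few lines of Python. This is an illustrative sketch; the helper name `one_hot` is an assumption and does not come from the source.

```python
def one_hot(categories):
    """Build an encoder that maps each category to a binary vector."""
    index = {category: i for i, category in enumerate(categories)}

    def encode(value):
        vector = [0] * len(categories)
        vector[index[value]] = 1  # raises KeyError for unknown categories
        return vector

    return encode

# Drug types A to E, as in the example in the text.
encode_drug = one_hot(["A", "B", "C", "D", "E"])
code_a = encode_drug("A")
code_e = encode_drug("E")
```

Each category becomes its own binary column, so no artificial ordering is imposed on the drug types.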
In other embodiments, min-max normalization applies a linear transformation to the raw data to map it into the interval [0, 1]. The formula for min-max normalization is as follows:
$$Y = \frac{y - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$
In formula (2), y is the original data, x_max is the maximum value of the sample, and x_min is the minimum value of the sample. Taking a set of height data ([2.5], [3.1], [1.4], [2.2], [3.2]) as an example, the result after min-max normalization is ([0.6111], [0.9444], [0], [0.4444], [1]). According to this embodiment, min-max normalization amplifies the differences between data values, which facilitates learning by the model.
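The height example can be checked with a short Python sketch. The function name `min_max` is hypothetical; the data are the values given in the text.

```python
def min_max(values):
    """Linearly map values into [0, 1] using the sample minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the range is zero and the mapping is undefined.
        raise ValueError("maximum equals minimum")
    return [(y - lo) / (hi - lo) for y in values]

heights = [2.5, 3.1, 1.4, 2.2, 3.2]
scaled = [round(v, 4) for v in min_max(heights)]
# scaled == [0.6111, 0.9444, 0.0, 0.4444, 1.0]
```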
In some embodiments, a failure in data collection or storage may result in missing data. For example, a data storage failure, damaged memory, or a mechanical fault may mean that data for a certain period of time is not collected or stored. Data may also be missing because of subjective factors, for example when an interviewee in a market survey refuses to answer a question, or when a data entry clerk omits data by mistake. Missing value processing handles such missing data. In implementation, missing value processing includes, but is not limited to, missing value completion, deletion of features containing missing values, and direct use of features containing missing values, and is not particularly limited herein.
Optionally, missing value completion includes, but is not limited to, mean interpolation, homogeneous (same-class) mean interpolation, median interpolation, mode interpolation, and the like. In some embodiments, mean interpolation fills the missing value with the mean of the valid values of the sample attribute. Taking visit numbers as an example, the visit numbers are ([12], [14], [15], [22], [ ], [21]), where the empty entry is missing. The mean of the valid values is 16.8, which can be rounded to 17, so the visit numbers after mean interpolation are ([12], [14], [15], [22], [17], [21]).
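A minimal Python sketch of mean interpolation on the visit-number example follows. Representing the missing entry as `None`, the function name `impute_mean`, and the `as_int` rounding flag are assumptions made for illustration.

```python
from statistics import mean

def impute_mean(values, as_int=False):
    """Fill None entries with the mean of the valid (non-None) values."""
    valid = [v for v in values if v is not None]
    fill = mean(valid)
    if as_int:
        fill = round(fill)  # e.g. 16.8 becomes 17 for count-like attributes
    return [fill if v is None else v for v in values]

visits = [12, 14, 15, 22, None, 21]
filled_visits = impute_mean(visits, as_int=True)
# filled_visits == [12, 14, 15, 22, 17, 21]
```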
Further, the above-mentioned mean interpolation is applicable when the distance between sample attribute values is measurable. When the distance is not measurable (non-numerical attributes), the mode of the valid values of the sample attribute may be used to fill the missing value, that is, the missing value is filled with the value that the sample attribute takes most often. Taking the department dimension as an example, the departments are ([paediatrics], [gynecology], [internal medicine], [andrology], [none], [paediatrics]), where "none" marks the missing data. Since paediatrics occurs most frequently, paediatrics is inserted at the position of the missing value to obtain ([paediatrics], [gynecology], [internal medicine], [andrology], [paediatrics], [paediatrics]).
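Mode interpolation for a non-numerical attribute can be sketched as follows. The missing entry is represented as `None`, and the function name `impute_mode` is hypothetical. When several values tie for the highest count, this sketch picks the one that appears first, which is a design choice the source does not address.

```python
from collections import Counter

def impute_mode(values):
    """Fill None entries with the most frequent valid value."""
    valid = [v for v in values if v is not None]
    # most_common(1) returns [(value, count)] for the top value.
    fill = Counter(valid).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

departments = ["paediatrics", "gynecology", "internal medicine",
               "andrology", None, "paediatrics"]
filled_departments = impute_mode(departments)
```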
Further, homogeneous mean interpolation first classifies the sample data and then interpolates each missing value using the mean of the samples in the same class.
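A minimal sketch of homogeneous (class-wise) mean interpolation follows. The record layout, the choice of department as the class, and the cost figures are hypothetical illustrations, not data from the application.

```python
from collections import defaultdict
from statistics import mean

def impute_class_mean(records, class_key, value_key):
    """Fill missing values with the mean of valid values in the same class."""
    groups = defaultdict(list)
    for r in records:
        if r[value_key] is not None:
            groups[r[class_key]].append(r[value_key])
    class_means = {c: mean(vs) for c, vs in groups.items()}
    filled = []
    for r in records:
        row = dict(r)
        if row[value_key] is None:
            # Stays None if the class has no valid value at all.
            row[value_key] = class_means.get(row[class_key])
        filled.append(row)
    return filled

# Hypothetical records: cost grouped by department.
records = [
    {"department": "paediatrics", "cost": 100.0},
    {"department": "paediatrics", "cost": 140.0},
    {"department": "paediatrics", "cost": None},
    {"department": "surgery", "cost": 900.0},
    {"department": "surgery", "cost": None},
]
filled_records = impute_class_mean(records, "department", "cost")
```

Here the missing paediatrics cost is filled from the paediatrics records only, so the much larger surgery cost does not distort it.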
Further, median interpolation sorts a set of data by size and takes the valid value at the middle position to interpolate the missing value. For example, for the above number of visits ([12], [14], [15], [22], [ ], [21]), the median of the valid values is 15, so the number of visits after median interpolation is: ([12],[14],[15],[22],[15],[21]).
It is understood that, in other embodiments, other missing value completion methods, such as hot-deck interpolation, regression interpolation, and multiple interpolation, may be used to complete the missing data, which is not limited herein.
In some embodiments, the medical insurance data may contain abnormal values, also called outliers, which are unreasonable values in the data set. For example, if the number of an insured person has 8 digits and an acquired number does not have 8 digits, that number is determined to be an abnormal value.
Alternatively, the determination of outliers includes, but is not limited to, boxplot analysis, the 3σ principle, simple statistical analysis, and the like.
Further, simple statistical analysis computes descriptive statistics of the attribute values to see which values are unreasonable. For example, an identity card number is specified to have 18 digits; if the identity card number in the sample data does not have 18 digits, the sample data is an abnormal value.
Further, when the data obeys a normal distribution, the 3σ principle can be used. By the definition of the normal distribution, the probability of a value lying more than 3σ from the mean is about 0.003, which is an extremely small probability event, so sample data more than 3σ from the mean can be considered abnormal. Of course, in other embodiments, when the data does not obey a normal distribution, abnormal values can be described by how many standard deviations they lie from the mean, with the multiple determined according to the actual situation. For example, if the probability of lying more than 3 standard deviations from the mean is 0.004, sample data more than 3 standard deviations from the mean can be considered abnormal.
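The standard-deviation rule can be sketched as below. This is an illustration only: the population standard deviation is used, the multiple `k` defaults to 3, and the sample values are hypothetical. Note that with very few samples a single extreme value cannot exceed 3 standard deviations, so the rule is meaningful only on reasonably large samples.

```python
from statistics import mean, pstdev

def sigma_outliers(values, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mu) > k * sigma]

# Hypothetical visit counts: 20 ordinary values and one extreme value.
visit_counts = [12, 14, 15, 13] * 5 + [200]
print(sigma_outliers(visit_counts))
```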
Further, to improve the accuracy of abnormal value determination, boxplot analysis may also be employed. A boxplot describes data using five statistics: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum. Q1, the median, and Q3 are calculated first: the data is sorted from small to large, the median is the number at the middle (50%) position, and Q1 and Q3 are the numbers at the 25% and 75% positions of the sorted data. Let IQR = Q3 - Q1; then values between Q1 - 1.5(IQR) and Q3 + 1.5(IQR) are within the acceptable range, and values outside this range are considered abnormal values.
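The boxplot rule can be sketched with the standard library as follows. Quartile conventions differ between tools; this sketch assumes the "inclusive" interpolation method of `statistics.quantiles`, and the sample values are hypothetical.

```python
from statistics import quantiles

def iqr_bounds(values, whisker=1.5):
    """Return (lower, upper) acceptance bounds: Q1 - w*IQR and Q3 + w*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr

def iqr_outliers(values, whisker=1.5):
    """Return the values falling outside the acceptance bounds."""
    low, high = iqr_bounds(values, whisker)
    return [v for v in values if v < low or v > high]

# Hypothetical visit counts containing one extreme value.
sample = [12, 14, 15, 21, 22, 13, 16, 90]
print(iqr_bounds(sample))    # Q1 = 13.75, Q3 = 21.25, IQR = 7.5
print(iqr_outliers(sample))
```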
Further, the abnormal value processing method includes, but is not limited to, deleting a sample containing the abnormal value, treating the abnormal value as a missing value, and the like, and is not particularly limited herein.
Alternatively, the outlier may be treated as a missing value and the missing value may be processed, and the above-mentioned step of filling the missing value may be referred to, for example, the outlier may be interpolated as a missing value by mean interpolation, homogeneous mean interpolation, median interpolation, mode interpolation, or the like.
In some optional embodiments, the principal component dimension reduction analysis may use the PCA (Principal Component Analysis) method to reduce the dimensionality of the medical insurance data. The algorithm steps of PCA are as follows:
Input: data set $X = \{x_1, x_2, x_3, \ldots, x_n\}$, which needs to be reduced to K dimensions.
De-meaning (i.e., centering), that is, subtracting from each feature its own mean;
computing the covariance matrix [formula image BDA0003239805420000101];
solving for the eigenvalues and eigenvectors of the covariance matrix [formula image BDA0003239805420000102] by the eigenvalue decomposition method;
sorting the eigenvalues from largest to smallest, selecting the eigenvectors corresponding to the K largest eigenvalues, and using these K eigenvectors as row vectors to form the eigenvector matrix P;
converting the data into the new space constructed by the K eigenvectors, i.e., $Y = PX$.
Taking the input data set X [formula image BDA0003239805420000103] as an example, the two rows of data are reduced to one row using the PCA method.
Since each row of the X matrix is already zero-mean, no de-averaging is required.
Solving for the covariance matrix: [formula image BDA0003239805420000111]
Solving for the eigenvalues and eigenvectors of the covariance matrix, the eigenvalues are $\lambda_1 = 2$ and [formula image BDA0003239805420000112].
The corresponding eigenvectors are: [formula image BDA0003239805420000113]
where each eigenvector is a general solution and $C_1$ and $C_2$ can be any real numbers. The normalized eigenvectors are: [formula image BDA0003239805420000114]
The eigenvector matrix P is: [formula image BDA0003239805420000115]
Multiplying the first row of the eigenvector matrix P by the data matrix X gives the data after dimensionality reduction: [formula image BDA0003239805420000116]
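The PCA steps can be sketched for the two-row case as follows. Several things here are assumptions rather than content of the application: the data matrix is a hypothetical one chosen so that the largest eigenvalue is 2, as stated in the text (the original matrix is in an image not reproduced here); the covariance matrix is normalized by 1/n; and the eigen-decomposition uses the closed form for a symmetric 2x2 matrix, so the sketch covers only the reduction of two rows to one.

```python
import math

def pca_2d_to_1d(X):
    """Reduce a 2 x n data matrix (rows are features) to a single row.

    Returns (largest eigenvalue, unit eigenvector, projected row).
    """
    n = len(X[0])
    # Centre each feature (row) by subtracting its own mean.
    means = [sum(row) / n for row in X]
    r0 = [v - means[0] for v in X[0]]
    r1 = [v - means[1] for v in X[1]]
    # Covariance matrix C = (1/n) X X^T = [[a, b], [b, d]] (assumed 1/n scaling).
    a = sum(v * v for v in r0) / n
    d = sum(v * v for v in r1) / n
    b = sum(u * v for u, v in zip(r0, r1)) / n
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    lam1 = (a + d) / 2 + math.hypot((a - d) / 2, b)
    # An eigenvector for lam1, then normalised to unit length.
    if b != 0:
        vec = (b, lam1 - a)
    else:
        vec = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(vec[0], vec[1])
    p = (vec[0] / norm, vec[1] / norm)
    # Project the centred data onto the eigenvector: Y = P X.
    y = [p[0] * u + p[1] * v for u, v in zip(r0, r1)]
    return lam1, p, y

# Hypothetical zero-mean data, two features by five samples.
X = [[-1, -1, 0, 2, 0],
     [-2, 0, 0, 1, 1]]
lam1, p, y = pca_2d_to_1d(X)
print(lam1, p, y)
```

For more than two features a general eigen-solver would be needed; the closed form is used here only to keep the sketch dependency-free.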
Applying dimensionality reduction by the PCA method removes data in the medical insurance data that carries no effective information, as well as features that duplicate other features, which improves data quality and thus model training efficiency.
The order in which the data is cleaned by data normalization processing, missing value processing, abnormal value processing, and principal component dimension reduction analysis is not fixed. For example, the non-dimensionalization processing, missing value processing, abnormal value processing, and discrete data processing may be executed in sequence, or the missing value processing, abnormal value processing, and discrete data processing may be executed first, or the missing value processing, abnormal value processing, and non-dimensionalization processing may be executed first; the present embodiment is not limited thereto.
S102, describing the original data through a preset dimension to obtain target data.
The computer device describes, in preset dimensions, the original data obtained by preprocessing the medical insurance data. The preset dimensions are predefined perspectives from which the medical insurance data is described, such as the disease diagnosis dimension, the medical action subject dimension, the medical action compliance dimension, and the reflow marking data dimension.
In some alternative embodiments, please refer to fig. 2, and fig. 2 is a flowchart illustrating data according to an embodiment of the present application.
As shown in fig. 2, step S102 specifically includes the following steps:
S1021, describing data attributes of the medical insurance data from a disease diagnosis dimension, a medical action subject dimension, a medical action compliance dimension, and a reflow marking data dimension to obtain target data.
In practice, the data attribute is the meaning of a data field in the medical insurance data; for example, the medical insurance data includes the patient name, identification number, medical insurance policy number, disease name, and the names and amounts of operations and medications in the treatment process.
Optionally, describing the medical insurance data from the disease diagnosis dimension means describing the medical data by diagnosis-related grouping from the perspective of medical resource consumption: patients are classified into diagnosis-related groups according to age, sex, days of hospitalization, clinical diagnosis, disease symptoms, surgery, disease severity, comorbidities, and complications.
Describing the medical insurance data from the medical action subject dimension means describing it through the basic information of, and the relationships among, hospitals, doctors, and patients.
Describing the medical insurance data from the medical action compliance dimension means describing it from the direction of clinical pathways, expert rules, and the like. A clinical pathway is a set of standardized treatment modes and procedures established for a certain disease; it is a comprehensive mode of clinical treatment and a method of promoting treatment organization and disease management based on evidence and guidelines. The expert rules include the contents of a medical knowledge base and the like.
Describing the medical insurance data from the reflow marking data dimension means describing it from data directions with different characteristics, such as discrete data, continuous data, and time-series data.
S103, extracting the features of the original data and the target data by adopting a machine learning algorithm to obtain a feature vector.
After describing the original data in the preset dimensions, the computer device extracts features from the original data and the target data using a machine learning algorithm. Machine learning is the discipline that studies how computers simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance.
In some embodiments, please refer to fig. 3, fig. 3 is a schematic flow chart illustrating feature extraction according to an embodiment of the present application.
As shown in fig. 3, step S103 specifically includes the following steps:
S1031, extracting numerical data and classified data in the original data and the target data;
S1032, respectively normalizing the numerical data and the classified data through a statistical distribution algorithm and a clustering algorithm to obtain the feature vector.
Specifically, the medical insurance data comprises numerical data and classified data. Numerical data is data whose values can be compared by size, such as the number of hospitalizations; classified data is character data whose values cannot be compared by size, such as disease names. The computer device can identify and extract the numerical data and the classified data in the medical insurance data and standardize them, where the data standardization uses a statistical distribution algorithm and a clustering algorithm.
Optionally, the numerical data includes cost, days of hospitalization, and the like; specifically, the cost and hospitalization data are normalized to the [0, 1] interval using a statistical distribution. For example, a score interval such as 10 to 90 is set; cost or hospitalization-day data between 10 and 90 is set to 1, and data less than 10 or greater than 90 is set to 0.
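The interval rule in the example can be sketched as a simple indicator function. This follows the example literally, treating the interval bounds as inclusive; how the bounds 10 and 90 are derived from the statistical distribution is not specified in the text, and the function name and sample values are hypothetical.

```python
def interval_flag(value, low=10, high=90):
    """Return 1 if the value lies within [low, high], otherwise 0."""
    return 1 if low <= value <= high else 0

# Hypothetical hospitalization-day values.
stay_days = [3, 10, 45, 90, 120]
stay_flags = [interval_flag(v) for v in stay_days]
print(stay_flags)
```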
Alternatively, the classified data includes medication records, examination records, assay records, and the like; specifically, Euclidean distances are calculated for these records by a clustering algorithm and then normalized. For example, using the International Classification of Diseases (ICD) as the grouping basis, the medication, examination, and assay records for the same disease are clustered: a drug type in the drug white list is set to 1 and otherwise to 0, an examination item in the examination white list is marked 1 and otherwise 0, and an assay item in the assay white list is marked 1 and otherwise 0.
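The white-list marking in the example can be sketched as follows. Only the marking step is shown, not the clustering or the Euclidean distance calculation. The white-list contents and the pairing of drugs with ICD codes are hypothetical placeholders for illustration, not clinical guidance.

```python
# Hypothetical drug white lists keyed by ICD code (illustrative only).
DRUG_WHITE_LIST = {
    "J18.9": {"amoxicillin", "azithromycin"},
    "E11.9": {"metformin"},
}

def whitelist_flags(icd_code, items, white_lists):
    """Mark each item 1 if it is on the white list for the disease, else 0."""
    allowed = white_lists.get(icd_code, set())
    return [1 if item in allowed else 0 for item in items]

medication_flags = whitelist_flags(
    "J18.9", ["amoxicillin", "metformin"], DRUG_WHITE_LIST
)
print(medication_flags)
```

The same function applies unchanged to examination and assay white lists, since only the lookup table differs.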
It should be noted that the numerical data is not limited to the above-mentioned cost and days of hospitalization, and may also include date, dosage, number of examinations, and other medical insurance related data; the classified data is not limited to the above-mentioned medication records, examination records, and assay records, and may also include hospital, patient name, department, attending physician, and the like, which is not limited herein.
In some optional embodiments, the expert rules may further be used as a feature input to obtain the returned result data of the expert rules. The returned result data is classified into several types of data with different characteristics, such as binary data and continuous data, and each type is then normalized to the [0, 1] interval. Data in the returned results with a higher degree of suspicion receives a higher score.
S104, inputting the original data and the feature vector as a training data set into a preset neural network for training to generate a medical insurance detection model;
In practice, the acquired medical insurance data and the extracted feature vectors are combined into a training data set to train a neural network. A neural network is a computing system of interconnected nodes; optionally, the neural network includes, but is not limited to, a feedforward neural network (FNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a recursive neural network, and the like. In implementation, taking a deep learning algorithm as an example and taking 6 dimensions of an M-dimensional vector as an example, the training data set includes the above-mentioned personnel number (ID) dimension, hospital dimension, visit number dimension, department dimension, physician dimension, and insured unit dimension. With the default step size set to 1, 2M+1 M-dimensional vectors are generated; these are processed by forests of preset types to generate 2M+1 3-dimensional vectors respectively, which are concatenated into a G-dimensional vector, where M is 2 and G = 3 × (2M+1) × (number of forests).
For the generated G-dimensional vector, each layer receives the feature information contained in the feature vector in a cascade manner and performs feature learning through a multi-layer combination of multiple types of forests. The output of each layer is concatenated with the generated G-dimensional vector and passed to the forest combination of the next layer for further feature learning. According to the number of training iterations and the convergence index, the parameters of each layer are retained as the medical insurance detection model.
Alternatively, the neural network may adopt an XGBoost classification model, and the medical insurance detection model is generated by importing the training data set into the XGBoost classification model for training and learning, which mainly involves model parameters and model evaluation. Specifically, the model parameters mainly include n_estimators (the number of weak estimators in the ensemble), eta (the step size when iterating the decision trees), max_depth, and objective (the objective function). Using the GridSearchCV module in sklearn.model_selection, an expected value range is preset for each parameter, all parameter combinations within those ranges are traversed, and the combination of parameter values that gives the best model performance is finally obtained.
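The exhaustive traversal of parameter combinations can be sketched without any third-party library as follows. This is not the GridSearchCV API: it only illustrates the idea of trying every combination in preset ranges and keeping the best one. The parameter ranges and the `toy_score` function are hypothetical; in practice the score would come from cross-validating a trained model on the training data set.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical preset ranges for the parameters named in the text.
param_grid = {
    "n_estimators": [50, 100, 200],
    "eta": [0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7],
}

def toy_score(params):
    """Placeholder standing in for a cross-validated model score."""
    return (-abs(params["n_estimators"] - 100) / 100
            - abs(params["eta"] - 0.1)
            - abs(params["max_depth"] - 5) / 10)

best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

With three candidate values per parameter the search evaluates 27 combinations, which shows why the preset ranges should be kept small.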
S105, inputting the medical insurance data to be detected into the medical insurance detection model, and detecting the type of the medical insurance data to be detected according to the medical insurance detection model.
The trained medical insurance detection model can detect the type of the medical insurance data to be detected. Optionally, the type of medical insurance data is the type of medical insurance behavior contained in the data, including normal, suspected, and illegal behavior. Medical insurance violations in the medical process can thus be detected accurately, promptly, and effectively in real time, improving the accuracy and efficiency of medical insurance fund supervision.
According to the embodiment of the application, after the medical insurance data is obtained, preprocessing operations such as data cleaning and conversion are first performed on it to obtain the original data; the original data is then described from the preset dimensions to obtain the target data; features are extracted from the original data and the target data to obtain the feature vector; and the original data and the feature vector are input as a training data set to the neural network for training to generate the medical insurance detection model. Abnormalities in medical insurance data can then be audited and mined with the trained medical insurance detection model, which can effectively improve the efficiency and accuracy of auditing abnormal medical insurance data.
In some alternative embodiments, please refer to fig. 4, fig. 4 is a schematic flow chart of adding training samples according to an embodiment of the present application.
As shown in fig. 4, after the step of inputting the medical insurance data to be detected into the medical insurance detection model and detecting the type of the medical insurance data to be detected according to the medical insurance detection model, the medical insurance abnormality detection method provided by the application further includes the following steps:
S106, obtaining an auditing result of re-auditing the abnormal-type medical insurance data to be detected;
S107, adding the auditing result into the training data set as a labeled training sample.
After the medical insurance data to be detected is input into the medical insurance detection model for detection, the data that belongs to the abnormal type is re-audited. The abnormal type refers to medical insurance data to be detected that the medical insurance detection model judges to be suspected or illegal. This can effectively improve the efficiency and accuracy of auditing medical insurance data. It also forms a self-learning closed loop for medical insurance violation detection, which improves the sensitivity of the medical insurance detection model so that new medical insurance violations can be detected more quickly; different violations can be prevented and warned of in advance, alerted during the process, and analyzed and controlled afterwards, ensuring real-time supervision of medical insurance behavior.
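The closed loop of steps S106 and S107 can be sketched as follows. The record layout, the label strings, and the function names are hypothetical; the re-audit itself is performed outside the code, and its outcome is simply supplied as labels.

```python
ABNORMAL_TYPES = {"suspected", "illegal"}

def select_for_reaudit(predictions):
    """Keep the records whose predicted type is suspected or illegal."""
    return [record for record, predicted in predictions
            if predicted in ABNORMAL_TYPES]

def add_audit_results(training_set, audit_results):
    """Append re-audited records to the training set as labeled samples."""
    training_set.extend(
        {"features": record, "label": label} for record, label in audit_results
    )
    return training_set

# Hypothetical model output: (record, predicted type).
predictions = [
    ({"claim_id": "A1"}, "normal"),
    ({"claim_id": "A2"}, "suspected"),
    ({"claim_id": "A3"}, "illegal"),
]
to_review = select_for_reaudit(predictions)

# Hypothetical re-audit outcome supplied by the auditors.
audit_results = [(to_review[0], "normal"), (to_review[1], "illegal")]
training_set = add_audit_results([], audit_results)
```

Note that a record flagged as suspected may be re-audited as normal; adding such corrected labels is what lets later retraining reduce false alarms.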
In some alternative embodiments, please refer to fig. 5, fig. 5 is a flowchart illustrating adding an illegal participant to a blacklist according to an embodiment of the present application.
As shown in fig. 5, after the step of adding the audit result as the labeled training sample to the training data set of the medical insurance detection model, the medical insurance anomaly detection method provided by the present application further includes the following steps:
S108, acquiring a target insured person corresponding to the medical insurance data to be detected whose auditing result is illegal;
S109, adding the target insured person into a preset blacklist.
After the abnormal-type medical insurance data to be detected is re-audited, the illegal data and the corresponding insured persons are identified. An insured person is a person who has purchased medical insurance. When the medical insurance data of an insured person is determined to be illegal, that person can be added to a blacklist; persons on the blacklist can subsequently be refused renewal of medical insurance, or their medical insurance data can be audited more strictly, and so on, which is not specifically limited here.
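Steps S108 and S109 can be sketched as a small set update. The identifiers, the label string "illegal", and the function name are hypothetical.

```python
def update_blacklist(blacklist, audit_results):
    """Add every insured person whose audited result is illegal."""
    for insured_id, audited_label in audit_results:
        if audited_label == "illegal":
            blacklist.add(insured_id)
    return blacklist

# Hypothetical re-audit results: (insured person number, audited label).
reaudited = [("10000001", "normal"), ("10000002", "illegal")]
blacklist = update_blacklist(set(), reaudited)
print(blacklist)
```

A set is used so that adding the same insured person twice has no effect.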
In some alternative embodiments, please refer to fig. 6, fig. 6 is a schematic flow chart of model overfitting detection according to an embodiment of the present application.
As shown in fig. 6, after the step of inputting the raw data and the feature vector as a training data set to a preset neural network for training to generate a medical insurance detection model, the medical insurance abnormality detection method provided by the present application further includes the following steps:
S110, performing overfitting detection on the medical insurance detection model, and judging whether the medical insurance detection model is overfitting according to a detection result;
S111, when the medical insurance detection model is judged to be over-fitted, retraining the medical insurance detection model according to a preset training strategy until the medical insurance detection model is not over-fitted.
After the medical insurance detection model is generated, overfitting detection can be performed on it. Overfitting means that the model performs well on the training and validation sets but poorly on the test set. In implementation, whether overfitting has occurred can be judged from the prediction results; when it has, the model parameters can be adjusted and the model retrained to correct it, for example by adding a regularization term and training the model iteratively until it no longer overfits.
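Steps S110 and S111 can be sketched as follows. The gap threshold of 0.05, the list of regularization strengths, and the stub `fake_train` (which stands in for real training and returns made-up scores) are all hypothetical; the text does not specify how overfitting is quantified.

```python
def is_overfitting(train_score, test_score, max_gap=0.05):
    """Treat a large train/test score gap as a sign of overfitting."""
    return (train_score - test_score) > max_gap

def retrain_until_fit(train_fn, reg_strengths, max_gap=0.05):
    """Retrain with increasing regularization until the gap is acceptable."""
    for reg in reg_strengths:
        model, train_score, test_score = train_fn(reg)
        if not is_overfitting(train_score, test_score, max_gap):
            return model, reg
    return None, None  # no candidate strength removed the overfitting

def fake_train(reg):
    """Stub for a training routine: returns (model, train score, test score)."""
    scores = {0.0: (0.99, 0.80), 0.1: (0.96, 0.88), 1.0: (0.93, 0.91)}
    train_score, test_score = scores[reg]
    return f"model(reg={reg})", train_score, test_score

model, chosen_reg = retrain_until_fit(fake_train, [0.0, 0.1, 1.0])
print(model, chosen_reg)
```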
In some alternative embodiments, please refer to fig. 7, and fig. 7 is a schematic flowchart illustrating a process of adjusting model parameters according to an embodiment of the present application.
As shown in fig. 7, after the step of inputting the raw data and the feature vector as a training data set to a preset neural network for training to generate a medical insurance detection model, the medical insurance abnormality detection method provided by the present application further includes the following steps:
S112, acquiring performance parameters of the medical insurance detection model, and judging whether the medical insurance detection model meets a preset model standard according to the performance parameters;
S113, when the medical insurance detection model is judged not to meet the preset model standard, adjusting the model parameters of the medical insurance detection model according to the performance parameters.
Model performance parameters are information used to evaluate and verify attributes of the model and, in practice, include the accuracy of the model.
Optionally, the accuracy of the trained medical insurance detection model may be calculated from the output of the confusion matrix, using the following formula:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
In the formula, TP + TN is the number of correctly classified samples and TP + TN + FP + FN is the number of all samples, so the accuracy is the proportion of correctly classified samples among all samples.
Further, when the calculated accuracy meets an expected preset threshold, for example when the accuracy reaches 99.9%, it is determined that the medical insurance detection model meets the preset model standard. In implementation, the model standard is stored in the computer device, and a medical insurance detection model that meets the model standard can be stored. If the medical insurance detection model does not meet the model standard, for example if the accuracy does not reach the preset threshold, the parameters of the model are adjusted and the model is trained again until its accuracy meets the preset threshold.
In other embodiments, the model criteria may further include precision, recall, and the F1 value. Precision is the proportion of samples predicted to belong to a class whose true class is that class; recall is the proportion of samples whose true class is that class that the model predicts correctly; and the F1 value is the harmonic mean of precision and recall. The specific type of performance evaluation is not limited herein.
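The metrics named above can be computed from the four confusion-matrix counts as in the following sketch. The counts in the example are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the text.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for the "illegal" class.
metrics = classification_metrics(tp=30, tn=60, fp=10, fn=0)
print(metrics)
```

On imbalanced data such as insurance claims, where violations are rare, accuracy alone can look high even when few violations are caught, which is why precision and recall are worth checking alongside it.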
The scheme provided by the embodiment of the application has been introduced mainly from the perspective of the method. To implement the above functions, the apparatus includes hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as combinations of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiment of the application, the medical insurance abnormality detection apparatus may be divided into the functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. Optionally, the division of the modules in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Referring to fig. 8, fig. 8 is a schematic view of a basic structure of the medical insurance abnormality detection apparatus of the present embodiment.
As shown in fig. 8, a medical insurance abnormality detection apparatus includes:
the data acquisition module 201 is used for acquiring medical insurance data openly shared by a medical insurance institution and performing data preprocessing on the medical insurance data to obtain original data;
the data description module 202 is configured to describe the original data through a preset dimension to obtain target data;
the feature extraction module 203 is configured to perform feature extraction on the original data and the target data by using a machine learning algorithm to obtain a feature vector;
the model training module 204 is configured to input the original data and the feature vector as a training data set to a preset neural network for training, so as to generate a medical insurance detection model;
the data detection module 205 is configured to input the medical insurance data to be detected into the medical insurance detection model, and detect the type of the medical insurance data to be detected according to the medical insurance detection model.
Optionally, the medical insurance abnormality detection apparatus provided by the application further includes:
the audit result acquisition module is used for acquiring an audit result of re-auditing the to-be-detected medical insurance data of the abnormal type;
and the data sample adding module is used for adding the auditing result into the training data set as a labeled training sample.
Optionally, the medical insurance abnormality detection apparatus provided by the application further includes:
the target insured person acquisition module is used for acquiring a target insured person corresponding to the medical insurance data to be detected whose auditing result is illegal;
and the blacklist adding module is used for adding the target insured person into a preset blacklist.
Optionally, the data description module includes:
and the data description unit is used for describing data attributes of the original data from a disease diagnosis dimension, a medical action subject dimension, a medical action compliance dimension, and a reflow marking data dimension to obtain the target data.
Optionally, the feature extraction module further comprises:
the data extraction unit is used for extracting numerical data and classified data in the original data and the target data;
and the data processing unit is used for respectively normalizing the numerical data and the classified data through a statistical distribution algorithm and a clustering algorithm to obtain the feature vector.
Optionally, the medical insurance abnormality detection apparatus provided by the application further includes:
the overfitting detection module is used for performing overfitting detection on the medical insurance detection model and judging whether the medical insurance detection model is overfitting or not according to a detection result;
and the model retraining module is used for retraining the medical insurance detection model according to a preset training strategy when judging that the medical insurance detection model is over-fitted until the medical insurance detection model is not over-fitted.
Optionally, the medical insurance abnormality detection apparatus provided by the application further includes:
the parameter acquisition module is used for acquiring performance parameters of the medical insurance detection model and judging whether the medical insurance detection model meets a preset model standard or not according to the performance parameters;
and the parameter adjusting module is used for adjusting the model parameters of the medical insurance detection model according to the performance parameters when the medical insurance detection model is judged not to meet the preset model standard.
In order to solve the above technical problem, an embodiment of the present invention further provides a computer device. Referring to fig. 9, fig. 9 is a block diagram of a basic structure of a computer device according to the present embodiment.
As shown in fig. 9, the internal structure of the computer device is schematically illustrated. The computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions, the database can store control information sequences, and the computer readable instructions can enable the processor to realize the medical insurance abnormality detection method when being executed by the processor. The processor of the computer device is used for providing calculation and control capability and supporting the operation of the whole computer device. The memory of the computer device may have computer readable instructions stored therein, which when executed by the processor, may cause the processor to perform a method of medical insurance exception detection. The network interface of the computer device is used for connecting and communicating with the terminal. Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In this embodiment, the processor is configured to execute the specific functions of the data obtaining module 201, the data describing module 202, the feature extracting module 203, the model training module 204, and the data detecting module 205 in fig. 8, and the memory stores the program code and data required to execute these modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores the program code and data required to execute all the sub-modules of the medical insurance abnormality detection device, and the server can call this program code and data to execute the functions of all the sub-modules.
After acquiring medical insurance data from a medical insurance data source, the computer device obtains basic data through preprocessing operations such as data cleaning and conversion, and obtains derived data by classifying, summarizing, and grouping the features of the basic data. The medical insurance data, the basic data, and the derived data are then integrated into a training data set. Because a general-purpose training data set is generated in a single pass, a large amount of indicator-processing time and manual labor is saved. The medical insurance data are data of preset dimensions in a database openly shared by a medical insurance institution; generating a large number of model indicators from medical insurance data of multiple dimensions makes the data more comprehensive, so that the developed model is more comprehensive and more accurate.
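The flow from medical insurance data to basic data, derived data, and a training data set can be sketched as follows. This is only an illustrative sketch: the field names (`insured_id`, `diagnosis`, `cost`), the cleaning rules, and the per-person grouping are assumptions and are not specified in the text.

```python
from collections import defaultdict
from statistics import mean


def preprocess(records):
    """Clean and convert raw records into 'basic data'.

    Assumed rules: drop records lacking an insured-person id or a cost,
    normalise the diagnosis text, and convert the cost to float.
    """
    basic = []
    for rec in records:
        if not rec.get("insured_id") or rec.get("cost") in (None, ""):
            continue
        basic.append({
            "insured_id": str(rec["insured_id"]).strip(),
            "diagnosis": str(rec.get("diagnosis", "unknown")).strip().lower(),
            "cost": float(rec["cost"]),
        })
    return basic


def derive(basic):
    """Group basic data per insured person and summarise it into 'derived data'."""
    groups = defaultdict(list)
    for rec in basic:
        groups[rec["insured_id"]].append(rec)
    derived = {}
    for insured_id, recs in groups.items():
        costs = [r["cost"] for r in recs]
        derived[insured_id] = {
            "visit_count": len(recs),
            "total_cost": sum(costs),
            "mean_cost": mean(costs),
            "distinct_diagnoses": len({r["diagnosis"] for r in recs}),
        }
    return derived


def build_training_set(records):
    """Integrate basic and derived data into one row per cleaned record."""
    basic = preprocess(records)
    derived = derive(basic)
    return [{**rec, **derived[rec["insured_id"]]} for rec in basic]


# Usage with made-up records; two of the four are dropped during cleaning.
raw = [
    {"insured_id": "A1", "diagnosis": " Flu ", "cost": "100.5"},
    {"insured_id": "A1", "diagnosis": "flu", "cost": 99.5},
    {"insured_id": "", "diagnosis": "x", "cost": 10},
    {"insured_id": "B2", "diagnosis": "Asthma", "cost": None},
]
training_set = build_training_set(raw)
```

Computing the derived columns once and attaching them to every row reflects the idea of generating a general-purpose training set in a single pass instead of recomputing indicators for each model.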
The invention also provides a storage medium storing computer readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps of any of the above-mentioned embodiments of the medical insurance anomaly detection method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or it may be a Random Access Memory (RAM).
Those of skill in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, other steps, measures, or schemes in various operations, methods, or flows that have been discussed in this application can be alternated, altered, rearranged, broken down, combined, or deleted. Further, steps, measures, schemes in the prior art having various operations, methods, procedures disclosed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A medical insurance abnormality detection method is characterized by comprising the following steps:
acquiring medical insurance data openly shared by a medical insurance organization, and performing data preprocessing on the medical insurance data to obtain original data;
describing the original data through a preset dimension to obtain target data;
performing feature extraction on the original data and the target data by adopting a machine learning algorithm to obtain feature vectors;
inputting the original data and the feature vectors as a training data set into a preset neural network for training to generate a medical insurance detection model;
and inputting the medical insurance data to be detected into the medical insurance detection model, and detecting the type of the medical insurance data to be detected according to the medical insurance detection model.
2. The medical insurance anomaly detection method according to claim 1, wherein after the step of inputting the medical insurance data to be detected into the medical insurance detection model and detecting the type of the medical insurance data to be detected according to the medical insurance detection model, the method further comprises the following steps:
obtaining an audit result from a double audit of the medical insurance data to be detected that is of the abnormal type;
and adding the audit result to the training data set as a labeled training sample.
3. The medical insurance anomaly detection method according to claim 2, wherein after the step of adding the audit result as a labeled training sample to the training data set of the medical insurance detection model, the method further comprises the steps of:
acquiring a target insured person corresponding to the medical insurance data to be detected whose audit result is a violation;
and adding the target insured person to a preset blacklist.
4. The medical insurance anomaly detection method according to claim 1, wherein the step of describing the original data by a preset dimension to obtain target data specifically comprises the steps of:
describing data attributes of the original data from a disease diagnosis dimension, a medical action subject dimension, a medical action compliance dimension, and a reflow labeled-data dimension to obtain the target data.
5. The medical insurance anomaly detection method according to claim 1, wherein the step of extracting features of the original data and the target data by using a machine learning algorithm to obtain feature vectors specifically comprises the following steps:
extracting numerical data and categorical data from the original data and the target data;
and normalizing the numerical data and the categorical data through a statistical distribution algorithm and a clustering algorithm, respectively, to obtain the feature vectors.
6. The medical insurance anomaly detection method according to claim 1, wherein after the step of inputting the raw data and the feature vectors as training data sets into a preset neural network for training to generate a medical insurance detection model, the method further comprises the following steps:
performing overfitting detection on the medical insurance detection model, and judging whether the medical insurance detection model is overfitting according to a detection result;
and when the medical insurance detection model is judged to be over-fitted, re-training the medical insurance detection model according to a preset training strategy until the medical insurance detection model is not over-fitted.
7. The medical insurance anomaly detection method according to claim 1, wherein after the step of inputting the raw data and the feature vectors as training data sets into a preset neural network for training to generate a medical insurance detection model, the method further comprises the following steps:
acquiring performance parameters of the medical insurance detection model, and judging whether the medical insurance detection model meets a preset model standard according to the performance parameters;
and when the medical insurance detection model is judged not to meet the preset model standard, adjusting the model parameters of the medical insurance detection model according to the performance parameters.
8. A medical insurance abnormality detection apparatus, characterized by comprising:
the data acquisition module is used for acquiring medical insurance data openly shared by a medical insurance institution and preprocessing the medical insurance data to obtain original data;
the data description module is used for describing the original data through a preset dimension to obtain target data;
the feature extraction module is used for extracting features of the original data and the target data by adopting a machine learning algorithm to obtain feature vectors;
the model training module is used for inputting the original data and the feature vectors as a training data set into a preset neural network for training to generate a medical insurance detection model;
and the data detection module is used for inputting the medical insurance data to be detected into the medical insurance detection model and detecting the type of the medical insurance data to be detected according to the medical insurance detection model.
9. A computer device comprising a memory and a processor, wherein computer readable instructions are stored in the memory, which when executed by the processor, cause the processor to perform the steps of the medical insurance anomaly detection method of any one of claims 1 to 7.
10. A non-volatile storage medium, characterized in that it stores a computer program implementing the medical insurance anomaly detection method according to any one of claims 1 to 7, and when the computer program is called by a computer, the steps included in the method are executed.
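As an illustration of the feedback steps recited in claims 2 and 3, the following minimal sketch records a double-audit result as a labeled training sample and adds the insured person to a blacklist when the audit finds a violation. The class names, field names, and the 1/0 label encoding are hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass, field


@dataclass
class AuditResult:
    insured_id: str      # hypothetical identifier of the insured person
    features: list       # feature vector of the audited medical insurance data
    is_violation: bool   # outcome of the double audit


@dataclass
class FeedbackStore:
    training_set: list = field(default_factory=list)  # (features, label) pairs
    blacklist: set = field(default_factory=set)

    def record(self, result: AuditResult) -> None:
        """Add the audit result as a labeled sample; blacklist violators."""
        label = 1 if result.is_violation else 0
        self.training_set.append((result.features, label))
        if result.is_violation:
            self.blacklist.add(result.insured_id)


# Usage with made-up audit results.
store = FeedbackStore()
store.record(AuditResult("A1", [0.2, 3.0], True))
store.record(AuditResult("B2", [0.1, 1.0], False))
```

Samples that the audit clears are kept as negative examples in this sketch, on the assumption that both outcomes are useful when the model is retrained.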
CN202111015971.XA 2021-08-31 2021-08-31 Medical insurance abnormity detection method and device, computer equipment and storage medium Pending CN113657548A (en)

Publications (1)

Publication Number Publication Date
CN113657548A true CN113657548A (en) 2021-11-16

Family

ID=78482603

Country Status (1)

Country Link
CN (1) CN113657548A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159948A (en) * 2015-08-12 2015-12-16 成都数联易康科技有限公司 Medical insurance fraud detection method based on multiple features
CN109118376A (en) * 2018-08-14 2019-01-01 平安医疗健康管理股份有限公司 Medical insurance premium calculation principle method, apparatus, computer equipment and storage medium
CN109636061A (en) * 2018-12-25 2019-04-16 深圳市南山区人民医院 Training method, device, equipment and the storage medium of medical insurance Fraud Prediction network
CN109934719A (en) * 2017-12-18 2019-06-25 北京亚信数据有限公司 The detection method and detection device of medical insurance unlawful practice, medical insurance control charge system
CN111340641A (en) * 2020-05-22 2020-06-26 浙江工业大学 Abnormal hospitalizing behavior detection method
CN111709845A (en) * 2020-06-01 2020-09-25 青岛国新健康产业科技有限公司 Medical insurance fraud behavior identification method and device, electronic equipment and storage medium
CN112801805A (en) * 2021-01-21 2021-05-14 浙江大学山东工业技术研究院 Medical insurance small card fraud detection method and system based on deep self-supervision neural network

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114567536A (en) * 2022-02-24 2022-05-31 北京百度网讯科技有限公司 Abnormal data processing method and device, electronic equipment and storage medium
CN114567536B (en) * 2022-02-24 2024-02-23 北京百度网讯科技有限公司 Abnormal data processing method, device, electronic equipment and storage medium
CN115080997A (en) * 2022-06-02 2022-09-20 武汉金豆医疗数据科技有限公司 Mobile checking method and device for medical insurance fund, computer equipment and storage medium
CN115080997B (en) * 2022-06-02 2024-01-09 武汉金豆医疗数据科技有限公司 Mobile checking method and device for medical insurance fund, computer equipment and storage medium
CN115796350A (en) * 2022-11-23 2023-03-14 长江大学 Method and system for predicting total organic carbon content of hydrocarbon source rock in few well regions in sea area
CN116070693A (en) * 2023-04-06 2023-05-05 北京亚信数据有限公司 Patient information and medical service relation detection model training and detection method and device
CN116300666A (en) * 2023-05-24 2023-06-23 科大智能物联技术股份有限公司 Power plant boiler operation control method based on XGBoost optimization
CN116701383A (en) * 2023-08-03 2023-09-05 中航信移动科技有限公司 Data real-time quality monitoring method, electronic equipment and storage medium
CN116701383B (en) * 2023-08-03 2023-10-27 中航信移动科技有限公司 Data real-time quality monitoring method, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11922348B2 (en) Generating final abnormality data for medical scans based on utilizing a set of sub-models
CN113657548A (en) Medical insurance abnormity detection method and device, computer equipment and storage medium
CN111339126B (en) Medical data screening method and device, computer equipment and storage medium
US11900473B2 (en) Method of personalizing, individualizing, and automating the management of healthcare fraud-waste-abuse to unique individual healthcare providers
US10956823B2 (en) Distributed rule-based probabilistic time-series classifier
US20140279754A1 (en) Self-evolving predictive model
Hosseinzadeh et al. Assessing the predictability of hospital readmission using machine learning
Ekina et al. Application of bayesian methods in detection of healthcare fraud
US20210319387A1 (en) Artificial intelligence based approach for dynamic prediction of injured patient health-state
US20210174968A1 (en) Visualization of Social Determinants of Health
CN113642672B (en) Feature processing method and device of medical insurance data, computer equipment and storage medium
Leevy et al. Investigating the relationship between time and predictive model maintenance
Ling et al. An error detecting and tagging framework for reducing data entry errors in electronic medical records (EMR) system
Sanii et al. Explainable Machine Learning Models for Pneumonia Mortality Risk Prediction Using MIMIC-III Data
Herland Big Data Analytics and Engineering for Medicare Fraud Detection
CN115545955B (en) Method and device for detecting abnormal data in medical archive data and electronic equipment
Zhang et al. Prescription fraud detection through statistic modeling
US20230409926A1 (en) Index for risk of non-adherence in geographic region with patient-level projection
Ahmadinejad et al. Distance based model to detect healthcare insurance fraud within unsupervised database
Martell A MACHINE LEARNING APPROACH FOR ALERT BEHAVIOR RESPONSE MODELING TO MITIGATE ALERT FATIGUE IN HEALTH INFORMATION SYSTEMS
CN116912007A (en) Medical insurance fraud identification method and device and electronic equipment
CN115545955A (en) Method and device for detecting abnormal data in medical archive data and electronic equipment
Wang Tackling Bias, Privacy, and Scarcity Challenges in Health Data Analytics
Huang Using Artificial Neural Networks to Predict One Year Population Mortality Rates
CN117271508A (en) Method and system for detecting abnormal aggregation of medical insurance cards

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination