CN113642672A

CN113642672A - Feature processing method and device of medical insurance data, computer equipment and storage medium

Info

Publication number: CN113642672A
Application number: CN202111009455.6A
Authority: CN
Inventors: 李佳秀
Original assignee: Ping An Medical and Healthcare Management Co Ltd
Current assignee: Ping An Medical and Healthcare Management Co Ltd
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2021-11-12

Abstract

The application discloses a feature processing method and device for medical insurance data, computer equipment and a storage medium, relates to the field of medical big data processing, and is used to improve the comprehensiveness of model training data. The feature processing method for medical insurance data comprises the following steps: acquiring original medical insurance data of preset dimensions from a medical insurance data source, wherein the medical insurance data source is a database opened and shared by a medical insurance organization; preprocessing the original medical insurance data to generate basic data; performing feature classification, summarization and grouping on the basic data according to the preset dimensions to obtain derivative data; and integrating the original medical insurance data, the basic data and the derivative data to generate a training data set. With this feature processing method, a large number of model indicators are generated from multi-dimensional medical insurance data, which improves the comprehensiveness of the data, so that the developed model is more comprehensive and more accurate.

Description

Feature processing method and device of medical insurance data, computer equipment and storage medium

Technical Field

The embodiment of the invention relates to the field of medical big data processing, in particular to a method and a device for processing characteristics of medical insurance data, computer equipment and a storage medium.

Background

With the rapid increase in medical expenses, the task of supervising the medical insurance fund is becoming heavier and heavier. Supervision of the medical insurance fund requires auditing medical data, such as reimburser information, cause of disease, treatment plan, medication, and rehabilitation treatment. Traditional supervision relies on experience-based auditing without specific audit standards, so audit accuracy is low and many violations slip through. In addition, fraud against the medical insurance fund is increasingly concealed and complex, which keeps raising the difficulty of risk control. Medical big data and machine learning technologies are gradually being used to uncover complex and concealed violations; they can greatly improve audit accuracy and enable refined, specialized management of the medical insurance fund.

There are many machine learning methods, but before any model is trained, the feature engineering step cannot be avoided. The data and features define the upper limit of a machine learning model, so feature selection and feature engineering play a crucial role in the whole modeling process. However, the current modeling process focuses mainly on the modeling method and neglects the processing of medical insurance data, so the trained model is strongly biased and has low accuracy.

Disclosure of Invention

The embodiment of the invention provides a method and a device for processing characteristics of medical insurance data, computer equipment and a storage medium, which can improve the comprehensiveness of model training data.

In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:

in a first aspect, a feature processing method for medical insurance data is provided, including:

acquiring original medical insurance data of a preset dimension in a medical insurance data source, wherein the medical insurance data source is a database which is opened and shared by a medical insurance organization;

preprocessing the original medical insurance data to generate basic data;

carrying out feature classification, summarization and grouping on the basic data according to the preset dimensionality to obtain derivative data;

and integrating the original medical insurance data, the basic data and the derivative data to generate a training data set.

Optionally, after the step of integrating the original medical insurance data, the basic data and the derived data to generate a training data set, the method further comprises the steps of:

inputting the training data set into a preset neural network for training to generate a medical insurance detection model;

and inputting the medical insurance data to be detected into the medical insurance detection model, and detecting the type of the medical insurance data to be detected.

Optionally, after the step of inputting the medical insurance data to be detected into the medical insurance detection model and detecting the type of the medical insurance data to be detected, the method further includes the following steps:

obtaining an audit result from re-auditing the medical insurance data to be detected whose detected type is abnormal;

and adding the auditing result into the training data set as a marked training sample.

Optionally, after the step of obtaining an audit result from re-auditing the medical insurance data to be detected whose detected type is abnormal, the method further includes the following steps:

acquiring the insured person's information for the medical insurance data to be detected whose audit result indicates a violation;

and acquiring all medical insurance history records matched with the information of the insured person, and performing re-examination on the medical insurance history records.

Optionally, after the step of inputting the training data set into a preset neural network for training to generate a medical insurance detection model, the method further includes the following steps:

acquiring performance parameters of the medical insurance detection model;

and adjusting the model parameters of the medical insurance detection model according to the performance parameters.

Optionally, the step of adjusting the model parameters of the medical insurance detection model according to the performance parameters includes the following steps:

judging whether the performance parameters meet preset model standards;

and deleting the medical insurance detection model when the performance parameters are judged not to meet the model standard.

performing overfitting detection on the medical insurance detection model, and judging whether the medical insurance detection model is overfitting according to a detection result;

and when the medical insurance detection model is judged to be over-fitted, retraining the medical insurance detection model according to a preset adjusting strategy until the medical insurance detection model is not over-fitted.

In a second aspect, a feature processing device for medical insurance data is provided, and the feature processing device for medical insurance data includes:

the medical insurance data acquisition module is used for acquiring original medical insurance data of preset dimensions from a medical insurance data source, wherein the medical insurance data source is a database opened and shared by a medical insurance organization;

the data preprocessing module is used for preprocessing the original medical insurance data to generate basic data;

the data derivation module is used for carrying out feature classification, summarization and grouping on the basic data according to the preset dimensionality to obtain derived data;

and the data integration module is used for integrating the original medical insurance data, the basic data and the derivative data to generate a training data set.

Optionally, the apparatus further comprises:

the model training module is used for inputting the training data set to a preset neural network for training to generate a medical insurance detection model;

and the data detection module is used for inputting the medical insurance data to be detected into the medical insurance detection model and detecting the type of the medical insurance data to be detected.

Optionally, the apparatus further comprises:

the data re-examination module is used for acquiring an audit result from re-auditing the medical insurance data to be detected whose detected type is abnormal;

and the training sample adding module is used for adding the auditing result into the training data set as a marked training sample.

Optionally, the apparatus further comprises:

the insured person information acquisition module is used for acquiring the insured person's information for the medical insurance data to be detected whose audit result indicates a violation;

and the medical insurance history record auditing module is used for acquiring all medical insurance history records matched with the information of the insured person and performing re-auditing on the medical insurance history records.

Optionally, the apparatus further comprises:

the model performance acquisition module is used for acquiring performance parameters of the medical insurance detection model;

and the model parameter adjusting module is used for adjusting the model parameters of the medical insurance detection model according to the performance parameters.

Optionally, the apparatus further comprises:

the model performance judging module is used for judging whether the performance parameters meet the preset model standard;

and the model deleting module is used for deleting the medical insurance detection model when the performance parameters are judged to be not in accordance with the model standard.

Optionally, the apparatus further comprises:

the overfitting detection module is used for performing overfitting detection on the medical insurance detection model and judging whether the medical insurance detection model is overfitting or not according to a detection result;

and the model circulating training module is used for retraining the medical insurance detection model according to a preset training strategy when judging that the medical insurance detection model is over-fitted until the medical insurance detection model is not over-fitted.

In a third aspect, to solve the above technical problem, an embodiment of the present invention further provides a computer device, including a memory and a processor, where the memory stores computer-readable instructions, and the computer-readable instructions, when executed by the processor, cause the processor to execute the steps of the feature processing method of medical insurance data.

The computer device may be a network device, or may be a part of an apparatus in the network device, such as a system-on-chip in the network device. The chip system is configured to support the network device to implement the functions related to the first aspect and any one of the possible implementations thereof, for example, to receive, determine, and distribute data and/or information related to the feature processing method of the medical insurance data. The chip system includes a chip and may also include other discrete devices or circuit structures.

In a fourth aspect, to solve the above technical problem, an embodiment of the present invention further provides a storage medium storing computer readable instructions, where the computer readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the above method for processing characteristics of medical insurance data.

In a fifth aspect, a computer program product is provided, which when run on a computer causes the computer to perform the method for characterizing medical insurance data according to the first aspect and any one of its possible designs.

It should be noted that all or part of the computer instructions may be stored on the first computer storage medium. The first computer storage medium may be packaged together with the processor of the feature processing device of the medical insurance data, or may be packaged separately from the processor of the feature processing device of the medical insurance data, which is not limited in this embodiment of the application.

For the description of the second, third, fourth and fifth aspects of the present invention, reference may be made to the detailed description of the first aspect; in addition, for the beneficial effects of the second aspect, the third aspect, the fourth aspect and the fifth aspect, reference may be made to the beneficial effect analysis of the first aspect, and details are not repeated here.

In the embodiment of the present application, the names of the feature processing devices of the medical insurance data do not limit the devices or the functional modules, and in practical implementation, the devices or the functional modules may be presented by other names. Insofar as the functions of the respective devices or functional blocks are similar to those of the present invention, they are within the scope of the claims of the present invention and their equivalents.

These and other aspects of the invention will be more readily apparent from the following description.

The embodiment of the invention has the following beneficial effects. After the original medical insurance data is obtained from the medical insurance data source, preprocessing operations such as data cleaning and conversion are performed on it to obtain basic data, and feature classification, summarization and grouping are performed on the basic data to obtain derivative data. The original medical insurance data, the basic data and the derivative data are then integrated into a training data set, so that a general-purpose training data set is generated in one pass, which saves a large amount of indicator processing time and labor. Because the original medical insurance data is data of preset dimensions taken from a database opened and shared by a medical insurance organization, a large number of model indicators are generated from multi-dimensional medical insurance data, which improves the comprehensiveness of the data, so that the developed model is more comprehensive and more accurate.

Drawings

Fig. 1 is a schematic flowchart of the feature processing method for medical insurance data provided in an embodiment of the present application;

Fig. 2 is a schematic flowchart of modeling and detecting medical insurance data anomalies in the feature processing method for medical insurance data provided in an embodiment of the present application;

Fig. 3 is a schematic flowchart of re-auditing the medical insurance data to be detected in the feature processing method for medical insurance data provided in an embodiment of the present application;

Fig. 4 is a schematic flowchart of re-auditing the medical insurance history records of an insured person found in violation in the feature processing method for medical insurance data provided in an embodiment of the present application;

Fig. 5 is a schematic flowchart of model parameter adjustment in the feature processing method for medical insurance data provided in an embodiment of the present application;

Fig. 6 is a schematic flowchart of handling the model in the feature processing method for medical insurance data provided in an embodiment of the present application;

Fig. 7 is a schematic flowchart of model overfitting detection in the feature processing method for medical insurance data provided in an embodiment of the present application;

Fig. 8 is a schematic structural diagram of an embodiment of the feature processing device for medical insurance data provided in an embodiment of the present application;

Fig. 9 is a block diagram of the basic structure of a computer device provided in an embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As described in the background, existing modeling processes for medical insurance data focus mainly on the modeling method and neglect the processing of the medical insurance data itself, so the trained model is strongly biased and has low accuracy.

In order to solve the above problems, an embodiment of the present application provides a feature processing method for medical insurance data. After original medical insurance data is obtained from a medical insurance data source, preprocessing operations such as data cleaning and conversion are performed on it to obtain basic data; feature classification, summarization and grouping are performed on the basic data to obtain derivative data; and the original medical insurance data, the basic data and the derivative data are integrated into a training data set. A general-purpose training data set is thus generated in one pass, which saves a large amount of indicator processing time and labor.

The feature processing method for medical insurance data can be applied to a computer device. The computer device can be a device used for medical insurance risk control supervision, a chip in that device, or a system on chip in that device.

Optionally, the device may be a physical machine, for example: a desktop computer, a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or another terminal device.

Optionally, the computer device may also implement functions to be implemented by the computer device through a Virtual Machine (VM) deployed on a physical machine.

The feature processing method for medical insurance data provided by the embodiments of the present application is described in detail below with reference to the drawings. As shown in Fig. 1, the feature processing method for medical insurance data includes steps S101-S104.

S101, original medical insurance data of a preset dimension in a medical insurance data source is obtained, wherein the medical insurance data source is a database which is opened and shared by medical insurance organizations.

Optionally, the medical insurance organization may be the national medical insurance bureau, or an organization or platform that can obtain medical insurance data from the national medical insurance bureau, for example, an enterprise website that regularly publishes medical insurance statistical bulletins and whose data source is the national medical insurance bureau.

In a possible implementation, the original medical insurance data is medical data such as personal health records, prescriptions, and examination reports. Specifically, when acquiring the original medical insurance data, the computer device first determines the dimensions of the required data. The preset dimensions include, but are not limited to, a personnel number ID dimension, a hospital dimension, a visit number dimension, a department dimension, a doctor dimension, and an insured unit dimension. Taking these six dimensions as examples: the personnel number ID dimension is the ID number of the reimburser, such as an identity card number; the hospital dimension is the full or short name of the hospital where the reimburser was treated; the visit number dimension is the number of the visit card the reimburser used at the hospital; the department dimension is the name of the hospital department the reimburser visited; the doctor dimension is the attending doctor at the reimburser's visit; and the insured unit dimension is the unit that handled the medical insurance enrollment procedures for the reimburser.

The computer device extracts data from the medical insurance data source according to the six dimensions, namely the personnel number ID dimension, the hospital dimension, the visit number dimension, the department dimension, the doctor dimension, and the insured unit dimension. Taking the hospital dimension as an example, the data related to the hospital in the medical insurance data source is extracted as original medical insurance data.

And S102, carrying out data preprocessing on the original medical insurance data to generate basic data.

Specifically, after the original medical insurance data is obtained, preprocessing operations such as cleaning and conversion can be performed on the original medical insurance data to obtain basic data.

For example, cleaning operations performed on the original medical insurance data include, but are not limited to, non-dimensionalization, missing value processing, abnormal value processing, and discrete data processing. Preprocessing requires the data samples to meet a certain standard. Non-dimensionalization converts data with different scales to the same scale, so that data representing different attributes (with different units) become comparable.

Non-dimensionalization methods include linear methods, non-linear methods, and methods for qualitative indicators. A linear non-dimensionalization method assumes a linear relationship between the actual value of an indicator and its dimension-free evaluation value, so that a change in the actual value causes a proportional change in the evaluation value. Linear non-dimensionalization methods include, but are not limited to, min-max normalization and Z-score normalization.

In the min-max normalization method, the original data is linearly transformed and mapped onto the interval [0, 1]. The formula for min-max normalization is as follows:

$$x' = \frac{x - \min}{\max - \min} \tag{I}$$

In formula (I), min is the minimum value of the sample, max is the maximum value of the sample, x is the original data, and x' is the normalized value. Taking a set of height data ([1.70], [1.71], [1.72], [1.70], [1.73]) as an example, the values after min-max normalization are ([0], [0.3333], [0.6667], [0], [1]). Min-max normalization amplifies the differences between the data points, which facilitates model learning.
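The min-max mapping can be checked against the height example with a short sketch. The function name `min_max_normalize` and the handling of a constant column are illustrative assumptions, not part of the described method.

```python
def min_max_normalize(values):
    """Linearly map values onto [0, 1] using the sample minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column has no spread, so map every value to 0.0.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Height data from the example in the text.
heights = [1.70, 1.71, 1.72, 1.70, 1.73]
normalized = [round(v, 4) for v in min_max_normalize(heights)]
print(normalized)
```

Rounded to four decimal places, the result reproduces the values listed in the text.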

In other embodiments, the formula for the Z-score (zero-mean normalization) method is as follows:

$$x' = \frac{x - u}{\sigma} \tag{II}$$

In formula (II), x is the original data, u is the sample mean, σ is the sample standard deviation, and x' is the normalized value. That is, the Z-score method first obtains the mean u and the standard deviation σ of each data item (indicator), and then calculates the normalized value x' of the raw data. The Z-score method removes dimensions and prevents the choice of units from affecting distance calculations.
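A minimal sketch of the Z-score transform follows. It assumes the standard deviation is computed over the whole sample with `statistics.pstdev`; `statistics.stdev` could be substituted if the n-1 form is intended. The function name and the constant-column fallback are hypothetical.

```python
import statistics


def z_score_normalize(values):
    """Subtract the mean u and divide by the standard deviation sigma."""
    u = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population form of the standard deviation
    if sigma == 0:
        # Assumption: a constant column is mapped to all zeros.
        return [0.0 for _ in values]
    return [(x - u) / sigma for x in values]


heights = [1.70, 1.71, 1.72, 1.70, 1.73]
z_scores = z_score_normalize(heights)
print([round(z, 4) for z in z_scores])
```

After the transform the values have zero mean and unit standard deviation, which is why distances computed on them no longer depend on the original units.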

In some embodiments, a failure in data collection or storage may cause data to be missing. For example, a storage failure, damaged memory, or a mechanical fault may prevent data from being collected or stored for a period of time. Missing data may also be caused by subjective factors, for example, an interviewee refusing to answer a question in a market survey, or a data entry operator omitting data by mistake. Missing value processing handles such missing data. In implementation, missing value processing includes, but is not limited to, missing value imputation (filling in), deleting features that contain missing values, and directly using features that contain missing values, and is not particularly limited herein.

Optionally, missing value imputation (interpolation) includes, but is not limited to, mean imputation, same-class mean imputation, median imputation, and mode imputation. In some embodiments, mean imputation fills a missing value with the mean of the valid values of the sample attribute. Taking the visit number dimension as an example, suppose the visit numbers are ([123], [124], [125], [126], [xxx], [122]), where xxx represents missing data. The mean of the valid values is 124, so the visit numbers after mean imputation are ([123], [124], [125], [126], [124], [122]).
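The mean-based filling of the visit number example can be sketched as follows. Representing a missing entry as `None` and the function name `impute_mean` are assumptions made for illustration.

```python
def impute_mean(values):
    """Replace None entries with the mean of the valid (non-missing) values."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("cannot impute: no valid values")
    mean = sum(valid) / len(valid)
    return [mean if v is None else v for v in values]


# Visit numbers from the example; None stands in for the missing entry "xxx".
visit_numbers = [123, 124, 125, 126, None, 122]
filled_mean = impute_mean(visit_numbers)
print(filled_mean)
```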

Further, the mean-based filling described above is applicable when distances between values of the sample attribute are measurable. When they are not measurable (non-numerical attributes), the mode of the valid values of the sample attribute may be used to fill the missing value, that is, the missing value is filled with the value that occurs most often. Taking the department dimension as an example, suppose it contains ([orthopedics], [surgery], [internal medicine], [neurology], [none], [surgery]), where [none] represents missing data. Since surgery occurs most frequently, surgery is inserted at the position of the missing value to obtain ([orthopedics], [surgery], [internal medicine], [neurology], [surgery], [surgery]).
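For non-numerical attributes, filling with the most frequent value can be sketched with `collections.Counter`. The `None` marker for missing data and the function name `impute_mode` are illustrative assumptions; when several values tie, this sketch picks the one encountered first.

```python
from collections import Counter


def impute_mode(values):
    """Replace None entries with the most frequent valid value."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("cannot impute: no valid values")
    mode, _count = Counter(valid).most_common(1)[0]
    return [mode if v is None else v for v in values]


# Department data from the example; None stands in for the missing entry.
departments = ["orthopedics", "surgery", "internal medicine", "neurology", None, "surgery"]
filled_mode = impute_mode(departments)
print(filled_mode)
```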

Further, same-class (homogeneous) mean imputation first classifies the sample data, and then fills each missing value with the mean of the samples in the same class.
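As a rough illustration of filling a missing value from the mean of its own class, the sketch below assumes each record is a `(class_label, value)` pair with `None` marking a missing value. The department labels and cost figures are made-up sample data, and how samples are assigned to classes is left outside the sketch.

```python
from collections import defaultdict


def impute_class_mean(records):
    """Fill a missing value with the mean of the valid values in the same class."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, value in records:
        if value is not None:
            sums[label] += value
            counts[label] += 1
    filled = []
    for label, value in records:
        if value is None and counts[label] > 0:
            value = sums[label] / counts[label]
        # Assumption: a class with no valid values keeps its missing entries.
        filled.append((label, value))
    return filled


# Hypothetical per-visit costs grouped by department.
records = [
    ("surgery", 100.0),
    ("surgery", None),
    ("surgery", 300.0),
    ("internal medicine", 50.0),
    ("internal medicine", None),
]
filled_records = impute_class_mean(records)
print(filled_records)
```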

Further, median imputation sorts a set of data by size and fills the missing value with the valid value at the middle position. For example, for the visit numbers ([123], [124], [125], [126], [xxx], [122]) above, the median of the valid values is 124, so the visit numbers after median imputation are ([123], [124], [125], [126], [124], [122]).
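Filling with the middle value of the sorted valid data can be sketched with `statistics.median`. The `None` marker and the function name `impute_median` are assumptions for illustration; for an even number of valid values, `statistics.median` returns the average of the two middle values.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the valid values."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("cannot impute: no valid values")
    median = statistics.median(valid)
    return [median if v is None else v for v in values]


# Visit numbers from the example; None stands in for the missing entry "xxx".
visit_numbers = [123, 124, 125, 126, None, 122]
filled_median = impute_median(visit_numbers)
print(filled_median)
```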

It is understood that in other embodiments, other missing value imputation methods, such as hot-deck imputation, regression imputation, and multiple imputation, can also be used to fill in the missing data.

In some embodiments, abnormal values may exist in the original medical insurance data. An abnormal value, also called an outlier, is an unreasonable value in the data set. For example, if the personnel number dimension is an identity card number, which has 18 digits, then a collected personnel number that does not have 18 digits is determined to be an abnormal value.

Optionally, methods for determining abnormal values include, but are not limited to, box plot analysis, the 3σ rule, and simple statistical analysis. Simple statistical analysis computes descriptive statistics on an attribute to see which values are unreasonable. For example, the rule for the identity card number attribute above is that the number has 18 digits; if an identity card number in the sample data does not have 18 digits, that sample is an abnormal value.
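The length rule for identity card numbers can be expressed as a simple predicate. The sketch checks only the length, as the text describes; the sample strings are obviously fake placeholders, and the function name is hypothetical.

```python
def is_abnormal_id(id_number, expected_length=18):
    """Flag an ID number as abnormal when its length differs from the expected length."""
    return len(id_number) != expected_length


# Placeholder ID strings (not real identity numbers).
sample_ids = ["0" * 18, "12345", "0" * 19]
flags = [is_abnormal_id(s) for s in sample_ids]
print(flags)
```

A production check would likely validate the character set and checksum as well; that goes beyond the rule stated here.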

Further, when the data follows a normal distribution, the 3σ rule can be used, where σ is the standard deviation. By the properties of the normal distribution, the probability of a value lying more than 3σ from the mean is about 0.003, which is an extremely low-probability event, so sample data more than 3σ from the mean can be considered abnormal. Of course, in other embodiments, when the data does not follow a normal distribution, abnormal values can be judged by how many standard deviations a value lies from the mean, with the multiple chosen according to the actual situation. For example, if the probability of a value lying more than 5 standard deviations from the mean is 0.001, sample data more than 5 standard deviations from the mean can be considered abnormal.
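The distance-from-mean test can be sketched as follows, with the multiple `k` left as a parameter so that 3, 5, or another value can be used. The claim amounts are made-up sample data, and the use of `statistics.pstdev` for the standard deviation is an assumption.

```python
import statistics


def sigma_outliers(values, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    u = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population form of the standard deviation
    return [x for x in values if abs(x - u) > k * sigma]


# Hypothetical claim amounts: nineteen ordinary values and one extreme value.
amounts = [10.0] * 19 + [100.0]
outliers_3sigma = sigma_outliers(amounts, k=3.0)
print(outliers_3sigma)
```

Note that with very small samples no point can lie 3 standard deviations from the mean, so the rule is only meaningful once enough data is available.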

Further, to improve the accuracy of abnormal value determination, box plot analysis may also be employed. A box plot describes data using five statistics: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum. Q1, the median, and Q3 are calculated first: the data is sorted from small to large, the number at the middle (50%) position is the median, and the first and third quartiles are the numbers at the 25% and 75% positions of the sorted data. Let IQR = Q3 - Q1; values between Q1 - 1.5 × IQR and Q3 + 1.5 × IQR are within the acceptable range, and values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are considered abnormal values.
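The box plot fences can be sketched with `statistics.quantiles`. Quartile conventions differ between tools; the `"inclusive"` method used here is one assumed choice, and the sample values are invented for illustration.

```python
import statistics


def iqr_outliers(values):
    """Return values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < lower or x > upper]


# Hypothetical sample with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
outliers_iqr = iqr_outliers(sample)
print(outliers_iqr)
```

Unlike the standard-deviation test, the quartiles are barely affected by the extreme value itself, which makes this check more robust on skewed data.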

Further, the abnormal value processing method includes, but is not limited to, deleting a sample containing the abnormal value, treating the abnormal value as a missing value, and the like, and is not particularly limited herein.

Optionally, when an abnormal value is treated as a missing value, it can be handled by the missing value filling steps described above, for example, by filling it using mean imputation, same-class mean imputation, median imputation, or mode imputation.

In some optional embodiments, discrete data processing may also be performed on the original medical insurance data. Discrete data processing converts categorical data into numerical values that can be fed into a model for calculation. The original medical insurance data is not always continuous; some of it consists of categorical values. The values of some categorical variables have an ordering, such as size: [X, XL, XXL], which can be handled with a numerical mapping {X: 1, XL: 2, XXL: 3}. The values of other categorical variables, such as hospital, have no ordering; one-hot encoding, a method of converting a categorical variable into several binary columns, can be used for such data. Taking a department dimension that includes orthopedics, internal medicine, surgery, neurology, and ophthalmology as an example, the one-hot code for orthopedics is 10000, for internal medicine 01000, for surgery 00100, for neurology 00010, and for ophthalmology 00001. One-hot encoded data can be used directly by a classifier, which addresses the problem that classifiers do not handle categorical attribute data well.
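Both encodings can be sketched in a few lines: an explicit mapping for ordered categories and one binary column per category for unordered ones. The function name `one_hot_encode` is hypothetical, and the category order is taken from the department example.

```python
def one_hot_encode(categories):
    """Map each category to a binary vector with a single 1 at its own position."""
    return {
        category: [1 if i == j else 0 for j in range(len(categories))]
        for i, category in enumerate(categories)
    }


# Ordered categories: an explicit numerical mapping preserves the ordering.
size_mapping = {"X": 1, "XL": 2, "XXL": 3}

# Unordered categories: one binary column per department.
departments = ["orthopedics", "internal medicine", "surgery", "neurology", "ophthalmology"]
codes = one_hot_encode(departments)
for name, vector in codes.items():
    print(name, "".join(str(bit) for bit in vector))
```

Using integers 1 to 5 for departments instead would imply an ordering that does not exist, which is the reason for the separate binary columns.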

The order in which non-dimensionalization, missing value processing, abnormal value processing and discrete data processing are executed to clean the data is not fixed. For example, all four operations may be executed together in a single pass, or missing value processing, abnormal value processing and discrete data processing may be executed first, or missing value processing, abnormal value processing and non-dimensionalization may be executed first; the present embodiment is not limited thereto.

S103, carrying out feature classification, summarization and grouping on the basic data according to the preset dimensionality to obtain derived data.

The computer device classifies the cleaned data according to different dimensions, summarizes it by group, computes statistics on the grouped data and derives interpretable summary fields. Optionally, the data is classified according to the preset dimensions of the medical insurance data, for example the six dimensions of personnel number ID, hospital, visit number, department, doctor and insured unit, to obtain the features of each corresponding dimension.

Further, feature grouping and summarization collects the features generated for different dimensions into N dimensional feature groups, for example the six dimensions above. Taking the personnel number ID dimension as an example, the data of a personnel number ID can be looked up in the cleaned data, and the data found is summarized under the personnel number ID dimension.

Specifically, the data may be classified by using a variance selection method, a chi-square test, or a recursive feature elimination method. The variance selection method first calculates the variance of each feature and then selects the features whose variance is larger than a threshold.
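The variance selection method can be sketched as follows. The use of the population variance, the feature names and the threshold value are assumptions made for illustration.

```python
import statistics

def select_by_variance(columns, threshold):
    """Return the names of features whose variance exceeds the threshold.

    `columns` maps a feature name to its list of values. Assumption: the
    population variance is used; a sample variance would work the same way.
    """
    return [
        name
        for name, values in columns.items()
        if statistics.pvariance(values) > threshold
    ]

# Hypothetical features: a constant column carries no information.
columns = {
    "visit_count": [1, 5, 9, 3],
    "is_insured": [1, 1, 1, 1],
    "cost": [100.0, 101.0, 100.0, 101.0],
}
selected = select_by_variance(columns, threshold=0.5)
print(selected)
```

Because variance depends on the scale of a feature, the threshold is only comparable across features after non-dimensionalization.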

In some embodiments, the chi-square test examines the correlation between a qualitative independent variable and a qualitative dependent variable. Assuming that the independent variable has N values and the dependent variable has M values, the test considers the difference between the observed and expected frequencies of samples for which the independent variable equals i and the dependent variable equals j. The SelectKBest class of the feature_selection library can be used in combination with the chi-square test to select features.
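To make the statistic concrete, the following sketch computes the Pearson chi-square statistic, the sum over all cells of (observed - expected)² / expected, directly from an N × M table of observed frequencies. It is a hand-written illustration and not the library call mentioned above; the function name and the sample tables are hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square statistic of an N x M contingency table.

    observed[i][j] is the number of samples whose independent variable
    takes value i and whose dependent variable takes value j.
    Assumption: no row or column total is zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Frequency expected if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    return statistic

# Hypothetical tables: the first shows a dependence, the second none.
dependent_table = [[10, 20], [20, 10]]
independent_table = [[10, 10], [10, 10]]
print(chi_square(dependent_table), chi_square(independent_table))
```

A larger statistic indicates a stronger association, so features can be ranked by it and the top k kept.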

Optionally, the recursive feature elimination method performs multiple rounds of training using a base model. After each round of training, several features are eliminated according to their weight coefficients, and the next round of training is performed on the new feature set. Specifically, feature selection may be performed using the RFE class of the feature_selection library.

Further, the statistical methods applied after feature grouping include a statistical analysis method and a proportion analysis method. The statistical analysis method performs statistical analysis on the features in each group to obtain statistical indexes of the features, including a maximum value, a minimum value, a mean value, a sum, a median, a first quartile, a third quartile and/or a variance.

Further, the proportion analysis method calculates the proportions within each dimension. Taking the hospital dimension as an example, the ratio of the number of visits to hospital a to the number of visits to all hospitals in a time period can be calculated, or the ratio of the medical cost at hospital b to the medical cost at all hospitals in the time period can be calculated.
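The grouping, statistical analysis and proportion analysis can be sketched together as follows. The record layout, the field names `hospital` and `cost`, and the function names are assumptions made for the example.

```python
import statistics
from collections import defaultdict

def group_statistics(records, dimension, field):
    """Group `field` by `dimension` and compute statistical indexes per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record[field])
    return {
        key: {
            "count": len(values),
            "sum": sum(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "max": max(values),
            "min": min(values),
        }
        for key, values in groups.items()
    }

def proportions(stats, index):
    """Share of each group in the total of one statistical index."""
    total = sum(group[index] for group in stats.values())
    return {key: group[index] / total for key, group in stats.items()}

# Hypothetical visit records within one time period.
visits = [
    {"hospital": "a", "cost": 100},
    {"hospital": "a", "cost": 300},
    {"hospital": "b", "cost": 600},
]
stats = group_statistics(visits, "hospital", "cost")
visit_share = proportions(stats, "count")  # share of visits per hospital
cost_share = proportions(stats, "sum")     # share of cost per hospital
```

Each entry of `stats`, `visit_share` and `cost_share` is one derived field, which is how a few raw columns expand into many model indexes.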

S104, integrating the original medical insurance data, the basic data and the derivative data to generate a training data set.

Specifically, after the derivative data is obtained, the computer device summarizes the original medical insurance data, the basic data and the derivative data. In implementation, the data can be summarized and integrated according to different dimensions to generate a training data set. The generated training data set can be used for subsequent model development. Because a large number of model indexes (data) are generated through the feature library, multidimensional data modeling and analysis becomes possible and the comprehensiveness of the data is increased.

According to the embodiment of the application, after the original medical insurance data in the medical insurance data source is obtained, preprocessing operations such as data cleaning and conversion are carried out on it to obtain basic data, and feature classification, summarization and grouping are carried out on the basic data to obtain derivative data. The original medical insurance data, the basic data and the derivative data are then integrated into a training data set, so that a universal training data set is generated in one pass, which saves a large amount of index processing time and manual effort. Because the original medical insurance data is medical insurance data of preset dimensions in a database opened and shared by a medical insurance organization, a large number of model indexes are generated from the multi-dimensional medical insurance data, which improves the comprehensiveness of the data and in turn makes the developed model comprehensive and highly accurate.

In some optional embodiments, please refer to fig. 2, fig. 2 is a schematic flow chart of modeling and detecting medical insurance data anomalies according to an embodiment of the present application.

As shown in fig. 2, after the step of integrating the original medical insurance data, the basic data and the derived data to generate the training data set, the feature processing method of medical insurance data provided by the present application further includes the following steps:

S105, inputting the training data set into a preset neural network for training to generate a medical insurance detection model;

S106, inputting the medical insurance data to be detected into the medical insurance detection model, and detecting the type of the medical insurance data to be detected, wherein the type comprises normal, suspected and illegal.

A neural network is a computing system with interconnected nodes, optionally including but not limited to a feedforward neural network (FNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a recursive neural network, and the like. In implementation, taking a deep learning algorithm as an example, and taking as an example an M-dimensional vector over the 6 dimensions included in the training data set, namely the above-mentioned personnel number ID dimension, hospital dimension, visit number dimension, department dimension, physician dimension and insured unit dimension: the default step size is set to 1 and (2M + 1) M-dimensional vectors are generated; the (2M + 1) M-dimensional vectors are processed through forests of a preset type to generate (2M + 1) 3-dimensional vectors respectively; and the (2M + 1) 3-dimensional vectors are concatenated to generate a G-dimensional vector, where M is 2 and G is 3 × (2M + 1) × the number of forests.

For the generated G-dimensional vector, each layer receives the feature information contained in the feature vector in a cascaded manner, and feature learning is performed by combining multiple layers and multiple types of forests. The output of each layer is concatenated with the generated G-dimensional vector, and feature learning is performed again by the forest combination of the following layer. According to the number of training rounds and the convergence index, the parameters of each layer are retained as the medical insurance detection model.

The medical insurance detection model can detect the type of the medical insurance data to be detected. Optionally, the type of the medical insurance data is the type of medical insurance behavior contained in the medical insurance data, including normal, suspected and illegal behavior. In this way, medical insurance violations in the medical process can be detected accurately, promptly and effectively in real time, which improves the accuracy and efficiency of medical insurance fund supervision.

In some optional embodiments, please refer to fig. 3, and fig. 3 is a schematic flow chart illustrating a process of performing a review on medical insurance data to be detected according to an embodiment of the present application.

As shown in fig. 3, after the step of inputting the medical insurance data to be detected into the medical insurance detection model and detecting the type of the medical insurance data to be detected, the characteristic processing method of the medical insurance data provided by the present application further includes the following steps:

S107, obtaining an audit result of re-auditing the detected medical insurance data to be detected whose type is abnormal;

S108, adding the audit result into the training data set as a labeled training sample.

After the medical insurance data to be detected is input into the medical insurance detection model for detection, data detected as abnormal, such as suspected or illegal, can be audited again. Optionally, the re-audit can be performed manually, which can effectively improve the efficiency and accuracy of auditing the medical insurance data. The result of the re-audit is added to the training data set and used for training the model, forming a self-learning closed loop for medical insurance violation detection. This improves the sensitivity of the medical insurance detection model, allows new kinds of medical insurance violations to be detected more quickly, and supports prevention and early warning before the event, warning prompts during the event, and analysis and control after the event for different violations, thereby ensuring real-time supervision of medical insurance behavior.
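The closed loop of steps S107 and S108 can be sketched as follows. The record layout, the label strings and the function names are hypothetical, and the audit outcome is hard-coded in place of a real manual review.

```python
ABNORMAL_TYPES = {"suspected", "illegal"}

def select_for_review(detections):
    """Keep only the records that the model flagged as abnormal."""
    return [record for record, predicted in detections
            if predicted in ABNORMAL_TYPES]

def add_audit_results(training_set, audit_results):
    """Append re-audited records to the training set as labeled samples."""
    for record, audited_type in audit_results:
        training_set.append({"features": record, "label": audited_type})
    return training_set

# Hypothetical model output: (record, predicted type).
detections = [
    ({"visit": 1}, "normal"),
    ({"visit": 2}, "suspected"),
    ({"visit": 3}, "illegal"),
]
to_review = select_for_review(detections)

# Hypothetical outcome of the manual re-audit of the flagged records.
audit_results = [(to_review[0], "normal"), (to_review[1], "illegal")]
training_set = add_audit_results([], audit_results)
```

Records whose audited label differs from the prediction, such as the first one here, are the samples that correct the model in the next training round.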

In some alternative embodiments, please refer to fig. 4, and fig. 4 is a flowchart illustrating a process of reviewing the medical insurance history data of insured persons found in violation according to an embodiment of the present application.

As shown in fig. 4, after the step of obtaining an audit result of performing a double audit on the detected medical insurance data to be detected, which belongs to an abnormal type, the feature processing method of the medical insurance data provided by the present application further includes the following steps:

S109, acquiring the insured person information of the medical insurance data to be detected whose audit result is illegal;

S110, acquiring all medical insurance history records matched with the insured person information, and re-auditing the medical insurance history records.

After the medical insurance data to be detected whose type is abnormal is re-audited, the illegal medical insurance data and the corresponding insured person information are identified, where the insured person information is the identity information of the person who purchased the medical insurance. When the medical insurance data of an insured person is determined to be illegal, the medical insurance history data of that insured person may also contain illegal data. The system can therefore obtain all medical insurance history records of the insured person and re-audit them, so as to find all illegal medical insurance behavior of the insured person. This facilitates subsequent measures with respect to the insured person, for example refusing renewal of the medical insurance of the insured person, or increasing the audit intensity applied to the medical insurance data of the insured person, which is not specifically limited here.

In some alternative embodiments, please refer to fig. 5, where fig. 5 is a schematic flow chart of model parameter tuning according to an embodiment of the present application.

As shown in fig. 5, after the step of inputting the training data set to a preset neural network for training to generate a medical insurance detection model, the feature processing method of medical insurance data provided by the present application further includes the following steps:

S111, acquiring performance parameters of the medical insurance detection model;

S112, adjusting the model parameters of the medical insurance detection model according to the performance parameters.

After generating the medical insurance detection model, the computer device may further evaluate the performance of the model to obtain performance parameters of the medical insurance detection model. Optionally, the performance parameters include, but are not limited to, the accuracy, precision, recall and F1 value of the model.

Further, the accuracy evaluation index of the medical insurance detection model can be calculated from the output of the confusion matrix. The accuracy calculation formula is as follows:

$$\text{Accuracy} = \frac{TP + TN}{\text{all data}}$$

In formula III, TP and TN represent correctly predicted samples and all data represents all samples, so the accuracy represents the proportion of correctly predicted samples among all samples.

Further, the precision represents the proportion of samples whose true category is a given class among the samples predicted to be that class, the recall represents the proportion of samples correctly predicted by the model among the samples whose true category is that class, and the F1 value is the harmonic mean of the precision and the recall. The specific type of performance evaluation is not limited herein.
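These four indexes can be computed from confusion matrix counts as sketched below. The sketch treats one class, for example illegal, as positive and all other classes as negative; this one-versus-rest view, the function name and the sample counts are assumptions made for the example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion matrix counts.

    tp / tn: correctly predicted positive / negative samples.
    fp / fn: samples wrongly predicted as positive / negative.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against empty denominators when a class is never predicted
    # or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for the "illegal" class on 100 test samples.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
print(metrics)
```

When violations are rare, accuracy alone can look high even if most violations are missed, which is why precision, recall and F1 are evaluated alongside it.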

Further, after the performance parameters of the model are obtained, parameters of the model, such as hyperparameters, may be adjusted according to the performance parameters. In implementation, the hyperparameters may be selected near the default parameters of the learning algorithm, or a tuning space containing a plurality of parameters may be configured and traversed to select the optimal parameters as the parameters of the model, so that the performance of the model is optimized. This is not specifically limited herein.
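Traversing a tuning space can be sketched as follows. The parameter names `n_trees` and `max_depth`, the candidate values and the scoring function are hypothetical; in practice `evaluate` would train the model with the given parameters and return a performance parameter such as the F1 value.

```python
from itertools import product

def grid_search(tuning_space, evaluate):
    """Try every parameter combination and return the best one with its score."""
    names = list(tuning_space)
    best_params, best_score = None, float("-inf")
    for combination in product(*(tuning_space[name] for name in names)):
        params = dict(zip(names, combination))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_evaluate(params):
    """Stand-in for 'train and score'; peaks at n_trees=100, max_depth=6."""
    return (-abs(params["n_trees"] - 100) / 100
            - abs(params["max_depth"] - 6) / 10)

# Hypothetical tuning space around assumed default values.
tuning_space = {"n_trees": [50, 100, 200], "max_depth": [4, 6, 8]}
best_params, best_score = grid_search(tuning_space, toy_evaluate)
print(best_params)
```

The number of combinations grows multiplicatively with each added parameter, so the tuning space is usually kept small and centered on the defaults.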

In some alternative embodiments, please refer to fig. 6, and fig. 6 is a schematic flow chart of processing the model according to an embodiment of the present application.

As shown in fig. 6, after the step of obtaining the performance parameters of the medical insurance detection model, the feature processing method of the medical insurance data provided by the present application further includes the following steps:

S113, judging whether the performance parameters meet preset model standards;

After obtaining the performance parameters of the medical insurance detection model, the computer device may further determine whether the performance parameters of the model meet a preset model standard, where the model standard is a pre-configured parameter threshold. For example, the parameter threshold of the accuracy includes a lowest value and a highest value, where the lowest value is 99%. When the calculated accuracy reaches 99%, the parameters of the model are adjusted to optimize the model and the model is stored. If the accuracy is lower than 99%, the performance of the model is poor, it is determined that the model does not meet the model standard, and step S114 is executed.

S114, deleting the medical insurance detection model. When the performance of the medical insurance detection model is poor, the computer device can delete it, so that poorly performing models are removed and well-performing models are retained, which can effectively improve the accuracy of data detection.
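The check against the model standard in steps S113 and S114 can be sketched as follows. Only the lowest value of each threshold is modeled; the model names, the dictionary layout and the function names are assumptions made for the example.

```python
def meets_standard(performance, standard):
    """True if every performance parameter reaches its configured lowest value."""
    return all(
        performance.get(name, 0.0) >= lowest
        for name, lowest in standard.items()
    )

def keep_qualified(models, standard):
    """Return only the models that meet the standard; the rest are dropped."""
    return {
        name: performance
        for name, performance in models.items()
        if meets_standard(performance, standard)
    }

# Lowest accuracy taken from the example in the text; models are hypothetical.
standard = {"accuracy": 0.99}
models = {
    "model_v1": {"accuracy": 0.995},
    "model_v2": {"accuracy": 0.97},
}
kept = keep_qualified(models, standard)
print(sorted(kept))
```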

In some alternative embodiments, please refer to fig. 7, fig. 7 is a schematic flow chart of model overfitting detection according to an embodiment of the present application.

As shown in fig. 7, after the step of inputting the training data set to a preset neural network for training to generate a medical insurance detection model, the feature processing method of medical insurance data provided by the present application further includes the following steps:

S115, performing overfitting detection on the medical insurance detection model;

S116, judging whether the medical insurance detection model is over-fitted according to the detection result;

After the medical insurance detection model is generated, overfitting detection can be performed on it. Overfitting means that the model performs well on the training set and the verification set but poorly on the test set. In implementation, whether overfitting has occurred can be judged from the prediction results; when overfitting has occurred, step S117 is executed, and otherwise step S118 is executed.

S117, retraining the medical insurance detection model according to a preset training strategy until no overfitting exists.

S118, saving the medical insurance detection model.

After the model is found to be over-fitted, the overfitting can be corrected by adjusting the model parameters and retraining, for example by enlarging the data set and adding a regularization term, and training the model in a loop until it is no longer over-fitted.
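The loop of steps S115 to S118 can be sketched as follows. The text does not state how overfitting is judged from the prediction results, so the sketch assumes a simple criterion: the model is considered over-fitted when the training score exceeds the test score by more than a gap `max_gap`. The scores, the regularization strengths and the function names are hypothetical.

```python
def is_overfitted(train_score, test_score, max_gap=0.05):
    """Assumed criterion: training score exceeds test score by more than max_gap."""
    return train_score - test_score > max_gap

def retrain_until_fitted(train_and_score, strategies, max_gap=0.05):
    """Retrain with each strategy in turn until the model is not over-fitted.

    `train_and_score(strategy)` trains a model and returns
    (train_score, test_score). Returns the first strategy that passes,
    or None if all of them still overfit.
    """
    for strategy in strategies:
        train_score, test_score = train_and_score(strategy)
        if not is_overfitted(train_score, test_score, max_gap):
            return strategy
    return None

# Hypothetical scores for increasing regularization strength.
SCORES = {0.0: (0.99, 0.80), 0.1: (0.97, 0.90), 1.0: (0.95, 0.93)}

def toy_train_and_score(regularization):
    return SCORES[regularization]

chosen = retrain_until_fitted(toy_train_and_score, [0.0, 0.1, 1.0])
print(chosen)
```

Returning `None` when every strategy still overfits keeps the loop bounded; the text itself only says that training is repeated until overfitting disappears.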

The scheme provided by the embodiment of the application has been introduced above mainly from the perspective of the method. To implement the above functions, the device includes corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiment of the application, the functional modules of the feature processing device of the medical insurance data can be divided according to the method example, for example, each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. Optionally, the division of the modules in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.

Referring to fig. 8, fig. 8 is a schematic view of a basic structure of a feature processing device for medical insurance data of the present embodiment.

As shown in fig. 8, a feature processing device of medical insurance data includes:

the medical insurance data acquisition module 1110 is configured to acquire original medical insurance data of a preset dimension in a medical insurance data source, where the medical insurance data source is a database opened and shared by a medical insurance organization;

the data preprocessing module 1120 is used for performing data preprocessing on the original medical insurance data to generate basic data;

a data derivation module 1130, configured to perform feature classification, summarization, and grouping on the basic data according to the preset dimension to obtain derived data;

a data integration module 1140, configured to integrate the original medical insurance data, the basic data, and the derivative data to generate a training data set.

Optionally, the feature processing device for medical insurance data provided by the application further includes:

In order to solve the above technical problem, an embodiment of the present invention further provides a computer device. Referring to fig. 9, fig. 9 is a block diagram of a basic structure of a computer device according to the present embodiment.

Fig. 9 schematically illustrates the internal structure of the computer device. The computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store control information sequences, and the computer readable instructions, when executed by the processor, enable the processor to implement the feature processing method of medical insurance data. The processor of the computer device provides calculation and control capability and supports the operation of the whole computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform the feature processing method of medical insurance data. The network interface of the computer device is used for connecting and communicating with a terminal. Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In this embodiment, the processor is configured to execute the specific functions of the medical insurance data acquisition module 1110, the data preprocessing module 1120, the data derivation module 1130, and the data integration module 1140 in fig. 8, and the memory stores the program codes and various data required for executing the modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores the program codes and data required for executing all the sub-modules in the feature processing device of medical insurance data, and the server can call these program codes and data to execute the functions of all the sub-modules.

The computer device obtains the original medical insurance data in the medical insurance data source, carries out preprocessing operations such as data cleaning and conversion on it to obtain basic data, and carries out feature classification, summarization and grouping on the basic data to obtain derivative data. It then integrates the original medical insurance data, the basic data and the derivative data into a training data set, so that a universal training data set is generated in one pass, which saves a large amount of index processing time and manual effort. Because the original medical insurance data comes from a database opened and shared by a medical insurance organization, a large number of model indexes are generated from the multi-dimensional medical insurance data, which increases the comprehensiveness of the data and in turn makes the developed model comprehensive and highly accurate.

The invention also provides a storage medium storing computer readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps of the method for processing characteristics of medical insurance data according to any one of the above embodiments.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).

Those of skill in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, other steps, measures, or schemes in various operations, methods, or flows that have been discussed in this application can be alternated, altered, rearranged, broken down, combined, or deleted. Further, steps, measures, schemes in the prior art having various operations, methods, procedures disclosed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted.

The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims

1. A characteristic processing method of medical insurance data is characterized by comprising the following steps:

preprocessing the original medical insurance data to generate basic data;

2. The method for characterizing medical insurance data according to claim 1, wherein after the step of integrating the raw medical insurance data, the basic data and the derived data to generate a training data set, the method further comprises the steps of:

3. The feature processing method of medical insurance data according to claim 2, wherein after the step of inputting the medical insurance data to be detected into the medical insurance detection model and detecting the type of the medical insurance data to be detected, the method further comprises the following steps:

4. The feature processing method of medical insurance data according to claim 3, wherein after the step of obtaining an audit result of a double audit of the detected medical insurance data to be detected with abnormal type, the method further comprises the steps of:

5. The method for processing characteristics of medical insurance data according to claim 2, wherein after the step of inputting the training data set into a preset neural network for training and generating a medical insurance detection model, the method further comprises the following steps:

acquiring performance parameters of the medical insurance detection model;

6. The method for characterizing medical insurance data according to claim 5, wherein after the step of obtaining the performance parameters of the medical insurance detection model, the method further comprises the steps of:

judging whether the performance parameters meet preset model standards;

7. The method for processing characteristics of medical insurance data according to claim 1, wherein after the step of inputting the training data set to a preset neural network for training and generating a medical insurance detection model, the method further comprises the steps of:

and when the medical insurance detection model is judged to be over-fitted, retraining the medical insurance detection model according to a preset training strategy until the medical insurance detection model is not over-fitted.

8. A characteristic processing device of medical insurance data is characterized by comprising:

9. A computer device comprising a memory and a processor, wherein computer readable instructions are stored in the memory, which when executed by the processor, cause the processor to perform the steps of the method of characterizing medical insurance data according to any one of claims 1 to 7.

10. A non-volatile storage medium, characterized in that it stores a computer program implementing the method for characterizing medical insurance data according to any one of claims 1 to 7, the computer program, when called by a computer, performing the steps included in the method.