CN112131277B

CN112131277B - Medical data anomaly analysis method and device based on big data and computer equipment

Info

Publication number: CN112131277B
Application number: CN202011039794.4A
Authority: CN
Inventors: 唐强
Original assignee: Shenzhen Ping An Medical Health Technology Service Co Ltd
Current assignee: Shenzhen Ping An Medical Health Technology Service Co Ltd
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2023-04-18
Anticipated expiration: 2040-09-28
Also published as: CN112131277A

Abstract

The application relates to artificial intelligence and provides a big-data-based medical data anomaly analysis method and device, a computer device, and a storage medium. The method comprises: when an anomaly analysis request sent by a terminal is received, acquiring medical data to be analyzed that correspond to the same disease category; combining the data of each type in the medical data according to preset data combination types to obtain medical data pairs for each data combination type; clustering the medical data pairs of each data combination type separately, and determining outliers in the medical data for each data combination type from the clustering results; and determining abnormal data in the medical data according to the abnormality degree of each outlier, and feeding the abnormal data back to the terminal for visual display. The method can improve the accuracy of medical data anomaly analysis.

Description

Medical data anomaly analysis method and device based on big data and computer equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to a medical data anomaly analysis method and apparatus based on big data, a computer device, and a storage medium.

Background

With the continuous development of internet technology, it has come to play an important role in the medical industry: processing medical data with internet technology can effectively improve processing efficiency. As demand for cross-regional medical treatment (treatment outside the region where the patient is insured) grows, the authenticity and reliability of cross-regional medical data must be accurately analyzed and judged, so that accurate medical data are obtained and medical institutions can provide better, more targeted medical services.

However, medical data comprise many types of data. At present, the authenticity and reliability of each type are analyzed one type at a time, so the relationships between the types of data cannot be effectively exploited, which limits the accuracy of medical data anomaly analysis.

Disclosure of Invention

In view of the above, it is desirable to provide a big-data-based medical data anomaly analysis method and apparatus, a computer device, and a storage medium that can improve the accuracy of medical data anomaly analysis.

A big-data-based medical data anomaly analysis method, the method comprising:

when an anomaly analysis request sent by a terminal is received, acquiring medical data to be analyzed that correspond to the same disease category;

combining the data of each type in the medical data according to preset data combination types to obtain the medical data pairs corresponding to each data combination type;

clustering the medical data pairs corresponding to each data combination type separately, and determining outliers in the medical data for each data combination type according to the clustering results;

and determining abnormal data in the medical data according to the abnormality degree of each outlier, and feeding the abnormal data back to the terminal for visual display at the terminal.

In one embodiment, acquiring, when an anomaly analysis request sent by a terminal is received, the medical data to be analyzed that correspond to the same disease category includes:

receiving the anomaly analysis request sent by the terminal, and determining a target user identifier according to the request;

acquiring the medical records corresponding to the target user identifier from a medical record library;

and extracting, from the medical records, the medical data to be analyzed that correspond to the same disease category.

In one embodiment, extracting the medical data to be analyzed that correspond to the same disease category from the medical records comprises:

determining the disease category to be analyzed, and screening out, from the medical records, the medical records belonging to that disease category;

parsing the screened medical records, and determining from the parsing result the treatment information, cost information, and insured location of the target user identifier for the disease category to be analyzed;

determining the out-of-region treatment distance from the location of the treating hospital in the treatment information and the insured location;

and obtaining the medical data to be analyzed from the treatment information, the cost information, the insured location, and the out-of-region treatment distance.

In one embodiment, clustering the medical data pairs of each data combination type and determining outliers in the medical data for each data combination type according to the clustering result includes:

clustering the medical data pairs of each data combination type separately with a density clustering algorithm to obtain a clustering result comprising at least one cluster;

determining preliminary outliers in the medical data pairs of each data combination type based on an outlier detection algorithm;

and determining, according to the clusters, the outliers in the medical data for each data combination type from among the preliminary outliers.

In one embodiment, determining abnormal data in the medical data according to the abnormality degree of each outlier comprises:

querying the anomaly factors respectively corresponding to the preset data combination types;

calculating the abnormality degree of the outliers of each data combination type according to its anomaly factor;

and determining abnormal data in the medical data according to the abnormality degree of each outlier.

In one embodiment, determining abnormal data in the medical data according to the abnormality degree of each outlier comprises:

adding up the abnormality degrees of all outliers corresponding to the same target user identifier to obtain a user abnormality degree for each target user identifier;

and when a user abnormality degree exceeds an abnormality degree threshold, determining that the medical data of the corresponding target user identifier are abnormal data.
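The per-user scoring above can be sketched as follows, assuming that an outlier's abnormality degree is its raw degree (for example, its distance from the clusters) multiplied by the anomaly factor of its data combination type. The factor values, the field names, and the multiplication rule are hypothetical; the source does not specify them.

```python
from collections import defaultdict

# Hypothetical anomaly factors per data combination type (not from the source).
ANOMALY_FACTORS = {
    ("total_cost", "hospitalizations"): 1.5,
    ("total_cost", "remote_distance_km"): 1.0,
}


def user_abnormality(outliers, factors=ANOMALY_FACTORS):
    """Add up the weighted abnormality degrees of all outliers of each user.

    outliers: iterable of (user_id, combination_type, raw_degree).
    Assumption: weighted degree = anomaly factor * raw degree.
    """
    totals = defaultdict(float)
    for user_id, combo, raw_degree in outliers:
        totals[user_id] += factors[combo] * raw_degree
    return dict(totals)


def abnormal_users(outliers, threshold, factors=ANOMALY_FACTORS):
    """Users whose user abnormality degree exceeds the threshold."""
    scores = user_abnormality(outliers, factors)
    return sorted(uid for uid, score in scores.items() if score > threshold)


outliers = [
    ("u2", ("total_cost", "hospitalizations"), 2.0),
    ("u2", ("total_cost", "remote_distance_km"), 1.2),
    ("u3", ("total_cost", "hospitalizations"), 0.4),
]
print(abnormal_users(outliers, threshold=1.0))  # ['u2']
```

The strict `>` comparison mirrors the wording "exceeds the abnormality degree threshold".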

In one embodiment, feeding the abnormal data back to the terminal for visual display at the terminal includes:

querying a preset anomaly grading condition;

determining the anomaly level of the abnormal data according to the anomaly grading condition;

determining a visual display mode for the abnormal data according to the anomaly level;

and feeding the abnormal data and the visual display mode back to the terminal, so that the terminal visually displays the abnormal data in that display mode.
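A minimal sketch of grading and display-mode selection, assuming the grading condition is an ordered list of lower bounds on the user abnormality degree. The bounds, level names, and display attributes are illustrative assumptions, not values from the source.

```python
# Hypothetical grading condition: (lower bound, level, display mode),
# checked from the most severe level down.
GRADING = [
    (10.0, "high", {"color": "red", "highlight": True}),
    (5.0, "medium", {"color": "orange", "highlight": True}),
    (0.0, "low", {"color": "yellow", "highlight": False}),
]


def grade(user_degree, grading=GRADING):
    """Return (anomaly level, visual display mode) for a user abnormality degree."""
    for lower, level, mode in grading:
        if user_degree >= lower:
            return level, mode
    raise ValueError("abnormality degree is below every grading bound")


def build_feedback(abnormal, grading=GRADING):
    """abnormal: dict user_id -> degree. Returns the payload for the terminal,
    most abnormal users first."""
    payload = []
    for user_id, degree in sorted(abnormal.items(), key=lambda kv: -kv[1]):
        level, mode = grade(degree, grading)
        payload.append({"user_id": user_id, "degree": degree,
                        "level": level, "display": mode})
    return payload


for item in build_feedback({"u2": 4.2, "u7": 11.0}):
    print(item["user_id"], item["level"])  # u7 high, then u2 low
```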

A big-data-based medical data anomaly analysis device, the device comprising:

the medical data acquisition module is used for acquiring, when an anomaly analysis request sent by the terminal is received, the medical data to be analyzed that correspond to the same disease category;

the data pair obtaining module is used for combining the data of each type in the medical data according to preset data combination types to obtain the medical data pairs corresponding to each data combination type;

the outlier determining module is used for clustering the medical data pairs of each data combination type separately and determining outliers in the medical data for each data combination type according to the clustering results;

and the abnormal data determining module is used for determining abnormal data in the medical data according to the abnormality degree of each outlier and feeding the abnormal data back to the terminal for visual display at the terminal.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the above big-data-based medical data anomaly analysis method when executing the computer program.

A computer storage medium on which a computer program is stored which, when executed by a processor, performs the steps of the above method, including:

combining the data of each type in the medical data according to preset data combination types to obtain the medical data pairs corresponding to each data combination type;

With the above big-data-based medical data anomaly analysis method and device, computer device, and storage medium, the data of each type in the medical data to be analyzed for the same disease category are combined according to preset data combination types, the resulting medical data pairs are clustered, the outliers in the medical data for each data combination type are determined from the clustering results, and the abnormal data in the medical data are then determined from the abnormality degree of each outlier. Because different types of medical data are combined before the anomaly analysis, the relationships between the types of data are effectively exploited, the abnormal data can be determined accurately, and the accuracy of medical data anomaly analysis is improved.

Drawings

FIG. 1 is a diagram of an application environment of the big-data-based medical data anomaly analysis method according to an embodiment;

FIG. 2 is a schematic flow chart diagram illustrating a big data-based medical data anomaly analysis method according to an embodiment;

FIG. 3 is a schematic flow chart illustrating the extraction of medical data according to one embodiment;

FIG. 4 is a block diagram of a big data-based medical data anomaly analysis device according to an embodiment;

FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present application and are not intended to limit it.

The big-data-based medical data anomaly analysis method provided by the present application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 over a network. The terminal 102 sends an anomaly analysis request to the server 104 to request anomaly analysis of medical data. On receiving the request, the server 104 acquires the medical data to be analyzed that correspond to the same disease category, combines the data of each type in those medical data according to preset data combination types, clusters the resulting medical data pairs, determines the outliers in the medical data for each data combination type from the clustering results, and determines the abnormal data in the medical data from the abnormality degree of each outlier. Finally, the server 104 feeds the abnormal data back to the terminal 102 for visual display. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device; the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.

In one embodiment, as shown in FIG. 2, a big-data-based medical data anomaly analysis method is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps:

Step 202: when an anomaly analysis request sent by the terminal is received, acquire the medical data to be analyzed that correspond to the same disease category.

Medical data are information related to medical services and may include, for example, treatment information, cost information, and medical insurance information. The treatment information may specifically include the treating hospital, disease, medication, hospitalization, and surgery information from the course of treatment; the cost information may include the expenses incurred during treatment, such as the total cost, the amount paid by medical insurance, and the self-paid amount; and the medical insurance information describes the insurance of the patient to whom the medical data belong, and may include the insured location, the insurance type, and the insurance coverage. Because patients increasingly seek treatment outside the region where they are insured, the authenticity and reliability of medical data need to be analyzed to determine whether they contain abnormal data. On the one hand, this ensures the accuracy of the medical data, so that medical institutions can provide targeted medical services and improve treatment outcomes; on the other hand, ensuring authentic and accurate medical data helps prevent medical insurance fraud. The medical data anomaly analysis method can therefore be applied to smart government services and promote the construction of smart cities. The anomaly analysis request is sent by the terminal to the server to request anomaly analysis of medical data; it may carry an acquisition path for the medical data to be analyzed, from which the server acquires those medical data.
A disease category is a type of disease. Different disease categories may involve different hospitalization and cost information, for example because different diagnostic and treatment methods and different drugs are used. The medical data therefore need to be analyzed separately by disease category, that is, anomaly analysis is performed on medical data of the same disease category, to ensure the accuracy of the analysis.

Specifically, the terminal sends an abnormality analysis request to the server, and when the server receives the abnormality analysis request, the server obtains medical data to be analyzed of the same disease type, so that abnormality analysis is performed on the medical data of the same disease type, and abnormal data in the medical data are obtained.

Step 204: combine the data of each type in the medical data according to preset data combination types to obtain the medical data pairs corresponding to each data combination type.

The medical data include several types of data, such as treatment information, cost information, and insurance information. Performing the anomaly analysis after combining different types of data makes effective use of the relationships between them and improves accuracy. A data combination type specifies which types of data in the medical data are combined; for example, if a data combination type is (total cost, number of hospitalizations), the total cost and the number of hospitalizations in the medical data are combined and the anomaly analysis is performed on the combined result. A medical data pair is the pair obtained by combining different types of data in the medical data, such as (total cost, number of hospitalizations).

Specifically, after acquiring the medical data to be analyzed, the server determines the preset data combination types and combines the data of each type in the medical data accordingly to obtain the medical data pairs for each data combination type. In a specific implementation, medical data generally belong to individual patients, so each patient's medical data can be combined according to the data combination types to obtain that patient's medical data pairs. The data combination types can be set flexibly according to actual requirements to construct medical data pairs from different types of data; for example, the types of data in the medical data may be combined two at a time, giving two-dimensional medical data pairs that each contain two types of data.
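The pairwise combination described above can be sketched as follows. The record schema and field names (`total_cost`, `hospitalizations`, `remote_distance_km`) are hypothetical; the source does not define a schema.

```python
from itertools import combinations

# Hypothetical field names; the source does not specify a record schema.
FIELDS = ["total_cost", "hospitalizations", "remote_distance_km"]


def build_pairs(patients, fields=FIELDS):
    """Combine each patient's data types two at a time.

    Returns a dict mapping each data combination type (a tuple of two field
    names) to a list of (patient_id, (value_a, value_b)) medical data pairs.
    """
    combo_types = list(combinations(fields, 2))
    pairs = {combo: [] for combo in combo_types}
    for pid, record in patients.items():
        for a, b in combo_types:
            pairs[(a, b)].append((pid, (record[a], record[b])))
    return pairs


patients = {
    "u1": {"total_cost": 12000.0, "hospitalizations": 2, "remote_distance_km": 0.0},
    "u2": {"total_cost": 98000.0, "hospitalizations": 9, "remote_distance_km": 2607.0},
}
pairs = build_pairs(patients)
print(pairs[("total_cost", "hospitalizations")])
# [('u1', (12000.0, 2)), ('u2', (98000.0, 9))]
```

Keeping the patient identifier next to each pair lets later steps attribute each outlier back to a target user.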

Step 206: cluster the medical data pairs of each data combination type separately, and determine the outliers in the medical data for each data combination type according to the clustering results.

Clustering is a machine learning technique for grouping data points: given a set of data points, a clustering algorithm assigns each point to a specific group. In theory, points in the same group should have similar attributes and/or features, while points in different groups should have markedly different attributes and/or features. The clustering result comprises the groups, called clusters, formed by clustering the medical data pairs, and contains at least one cluster. An outlier is a point with low correlation to the clusters in the clustering result, generally an isolated point far from any cluster. Because an outlier lies far from the clusters, it differs from typical medical data pairs and may be abnormal data.

Specifically, after the server combines the data of each type in the medical data, the server clusters each medical data pair, for example, by using a density clustering algorithm, to obtain a clustering result, and determines an outlier in the medical data corresponding to each data combination type according to the clustering result, where the outlier is a medical data pair suspected to be abnormal data.
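For illustration, here is a minimal pure-Python DBSCAN over the two-dimensional medical data pairs of one data combination type; label -1 marks points left outside every cluster. The `eps` and `min_pts` values in the example are hypothetical and would need tuning. In practice the two dimensions (for example, cost and visit count) have very different scales and would also need to be standardized first, which this sketch does not do.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster index (0, 1, ...) per point, or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def region(i):
        # eps-neighbourhood of point i (includes i itself)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: expand
    return labels


pairs = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.0), (1.1, 0.9), (8.0, 8.0)]
print(dbscan(pairs, eps=0.5, min_pts=3))  # [0, 0, 0, 0, -1]
```

The brute-force neighbourhood search is O(n²); a production system would use a spatial index or a library implementation.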

Step 208: determine the abnormal data in the medical data according to the abnormality degree of each outlier, and feed the abnormal data back to the terminal for visual display.

The abnormality degree reflects how anomalous an outlier is; it can be obtained, for example, from the distance between the outlier and the clusters in the clustering result. Abnormal data are data in the medical data whose authenticity and reliability are low, possibly because of data errors.

Specifically, after the server obtains the outliers in the medical data corresponding to each data combination type, the server further determines the abnormality degree corresponding to each outlier, determines the abnormal data in the medical data according to the abnormality degree, and feeds the abnormal data obtained through abnormality analysis back to the terminal for visual display, so that the abnormal data in the medical data can be visually displayed on the terminal.
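One possible way to turn "distance from the cluster" into an abnormality degree is sketched below, assuming each cluster is represented by its centroid and the degree is the distance to the nearest centroid. The source does not fix the exact measure, so this choice is an assumption.

```python
import math


def cluster_centroids(points, labels):
    """Centroid (per-dimension mean) of each cluster; noise points (label -1) are skipped."""
    members = {}
    for point, label in zip(points, labels):
        if label != -1:
            members.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coord) / len(group) for coord in zip(*group))
        for label, group in members.items()
    }


def abnormality_degree(point, centroids):
    """Hypothetical abnormality degree: distance to the nearest cluster centroid."""
    return min(math.dist(point, c) for c in centroids.values())


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
labels = [0, 0, -1]
centroids = cluster_centroids(points, labels)
print(centroids)                                   # {0: (1.0, 0.0)}
print(abnormality_degree((10.0, 0.0), centroids))  # 9.0
```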

In the above big-data-based medical data anomaly analysis method, the data of each type in the medical data to be analyzed for the same disease category are combined according to preset data combination types, the resulting medical data pairs are clustered, the outliers in the medical data for each data combination type are determined from the clustering results, and the abnormal data are then determined from the abnormality degree of each outlier. Combining different types of medical data for the anomaly analysis makes effective use of the relationships between the types of data, so the abnormal data can be determined accurately and the accuracy of medical data anomaly analysis is improved.

In one embodiment, acquiring, when an anomaly analysis request sent by a terminal is received, the medical data to be analyzed that correspond to the same disease category includes: receiving the anomaly analysis request sent by the terminal, and determining a target user identifier according to the request; acquiring the medical records corresponding to the target user identifier from a medical record library; and extracting, from the medical records, the medical data to be analyzed that correspond to the same disease category.

In this embodiment, the medical records are acquired from the medical record library according to the target user identifier determined from the anomaly analysis request, and the medical data to be analyzed are extracted from them. The target user identifier is identification information that distinguishes one patient from another, such as an ID number, name, medical system account, or mobile phone number. The medical record library is a database storing medical records, which record information about the patient's course of treatment, such as case records, medical insurance information, and payment information; the library may be deployed at a medical institution to manage this information.

Specifically, the server receives the anomaly analysis request sent by the terminal and determines the target user identifier from it. For example, the request may carry the target user identifier in a user identifier field; the server parses the fields of the received request and reads the content of the user identifier field to obtain the target user identifier. The server then queries the preset medical record library and acquires the medical records corresponding to the target user identifier. For example, the library may record a mapping between each target user identifier and its medical records, and the server looks up the records for the identifier in the request according to this mapping. After obtaining the medical records, the server extracts from them the medical data to be analyzed that correspond to the same disease category, such as the medical data for diabetes. In a specific implementation, the data structure of the medical records is set according to the management requirements of the medical institution, so the records must be parsed according to that institution's record format before the medical data for the disease category can be extracted.
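The request parsing, record lookup, and disease-category screening described above might look like the following, assuming a JSON request with hypothetical `user_id` and `disease` fields and an in-memory dictionary standing in for the medical record library. A real record library would be a database, and record formats vary by institution.

```python
import json

# Hypothetical in-memory medical record library: user identifier -> records.
RECORD_LIBRARY = {
    "ID-001": [
        {"disease": "diabetes", "hospital": "H1", "total_cost": 8200.0},
        {"disease": "cold", "hospital": "H2", "total_cost": 300.0},
        {"disease": "diabetes", "hospital": "H3", "total_cost": 9100.0},
    ],
}


def handle_request(raw_request, library=RECORD_LIBRARY):
    """Parse an anomaly analysis request and return the records of one disease category."""
    request = json.loads(raw_request)
    user_id = request["user_id"]        # hypothetical user identifier field
    disease = request["disease"]        # hypothetical disease category field
    records = library.get(user_id, [])  # mapping: identifier -> medical records
    return [r for r in records if r["disease"] == disease]


selected = handle_request('{"user_id": "ID-001", "disease": "diabetes"}')
print([r["hospital"] for r in selected])  # ['H1', 'H3']
```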

In this embodiment, the medical record is acquired from the medical record library according to the target user identifier determined by the abnormality analysis request, and the medical data to be analyzed is extracted from the medical record, so that the abnormality analysis can be performed on the specified medical data in response to the abnormality analysis request of the terminal, and the pertinence of the abnormality analysis of the medical data is improved.

In one embodiment, as shown in FIG. 3, the process of extracting medical data, that is, extracting the medical data to be analyzed that correspond to the same disease category from the medical records, includes:

Step 302: determine the disease category to be analyzed, and screen out, from the medical records, the medical records belonging to that disease category.

The disease category to be analyzed is the disease category of the medical data that require anomaly analysis. Medical data of different disease categories generally differ greatly, so they cannot be analyzed together effectively and accurately. For example, if the medical data for diabetes and for the common cold were combined for anomaly analysis, it would be difficult to obtain valid abnormal data and the analysis accuracy would be low. The screened medical records are the content of the medical records that corresponds to the disease category to be analyzed.

Specifically, when extracting the medical data from the medical records, the server determines the current disease category to be analyzed; the disease category may, for example, be read from a disease category field in the anomaly analysis request sent by the terminal. The server then screens the medical records to obtain the records belonging to that disease category.

Step 304: parse the screened medical records, and determine from the parsing result the treatment information, cost information, and insured location of the target user identifier for the disease category to be analyzed.

The treatment information may specifically include the treating hospital, disease, medication, hospitalization, and surgery information from the course of treatment; the cost information may include the expenses incurred during treatment, such as the total cost, the amount paid by medical insurance, and the self-paid amount; and the insured location is the region in which the user corresponding to the target user identifier is enrolled in medical insurance.

Specifically, after obtaining the screened medical records, the server parses them to identify their content, and determines from the parsing result the treatment information, cost information, and insured location of each target user identifier under the disease category to be analyzed. In a specific implementation, the server may parse the screened medical records by text recognition, semantic recognition, and similar techniques.

Step 306: determine the out-of-region treatment distance from the insured location and the location of the treating hospital in the treatment information.

The treating hospital is a hospital involved in the diagnosis and treatment of the patient corresponding to the target user identifier. From the patient's insured location and the location of the treating hospital in the treatment information, the server can determine whether the patient sought treatment outside the insured region and, if so, obtain the specific out-of-region treatment distance.

In a specific application, the out-of-region treatment distance may be determined as follows. The longitudes and latitudes of all hospitals are crawled automatically from the network with crawler technology, such as a Python crawler framework, and the crawled coordinates are taken as the hospital locations. The distances from the patient corresponding to the target user identifier to the different hospitals are then calculated from the patient's workplace or home address, and the distance divergence (the degree to which the patient's treatment locations are spread out) is calculated as

$$dist_{id} = \sum_{1 \le p < q \le m} dist(p, q)$$

where id denotes the patient corresponding to the target user identifier, m is the number of out-of-region hospitals at which the patient was treated, the sum runs over $C_m^2 = m(m-1)/2$ hospital pairs (the number of distances to be calculated), and dist(p, q) is the distance between hospitals p and q. The distance is then normalized by its maximum over all patients:

$$L_{id} = \frac{dist_{id}}{\max(dist_{id})}$$

L_id ranges from 0 to 1; a larger value indicates that the patient's treatment locations are more widely spread, which is more suspicious and indicates lower authenticity and reliability of the medical data. The out-of-region treatment distance may be taken as the sum of the distances between all out-of-region treating hospitals. For example, assuming a maximum distance sum of 11500 km, a patient who visited 5 out-of-region hospitals in total has $C_5^2 = 10$ distances, and the distance divergence is computed as (2607 + 1283.3 + 1331.7 + 559.3 + 1331.7 + 559. …) km / 11500 km ≈ 0.94, indicating highly suspicious activity.
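The distance-divergence calculation can be sketched as follows. The source does not say how the hospital-to-hospital distance is computed; this sketch assumes the great-circle (haversine) distance between crawled latitude/longitude coordinates with a mean Earth radius of 6371 km, and takes max(dist_id) over all patients in the batch.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for the haversine formula)


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_sum(hospitals):
    """dist_id: sum of dist(p, q) over all C(m, 2) pairs of the m hospitals."""
    return sum(haversine_km(p, q) for p, q in combinations(hospitals, 2))


def distance_divergence(hospitals_by_user):
    """L_id = dist_id / max(dist_id), a value in [0, 1] for every user."""
    sums = {uid: distance_sum(h) for uid, h in hospitals_by_user.items()}
    largest = max(sums.values())
    return {uid: (s / largest if largest > 0 else 0.0) for uid, s in sums.items()}


div = distance_divergence({
    "A": [(0.0, 0.0), (0.0, 1.0)],               # 1 pair
    "B": [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],   # C(3, 2) = 3 pairs
})
print(div)
```

Here user A's pairwise sum is one degree of arc and user B's is four, so A's divergence is about 0.25 and B's is 1.0.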

Step 308: obtain the medical data to be analyzed from the treatment information, the cost information, the insured location, and the out-of-region treatment distance.

The server assembles the medical data to be analyzed from the obtained treatment information, cost information, insured location, and out-of-region treatment distance, then performs anomaly analysis on them to judge whether the medical data of the target user identifier are abnormal.

In this embodiment, the medical records are screened by the disease category to be analyzed, the screened records are parsed, and the medical data to be analyzed are obtained from the treatment information, cost information, insured location, and out-of-region treatment distance in the parsing result. Medical data of different disease categories are thus analyzed separately, which ensures the accuracy of the anomaly analysis.

In one embodiment, clustering the medical data pairs of each data combination type and determining the outliers in the medical data for each data combination type according to the clustering result includes: clustering the medical data pairs of each data combination type separately with a density clustering algorithm to obtain a clustering result comprising at least one cluster; determining preliminary outliers in the medical data pairs of each data combination type based on an outlier detection algorithm; and determining, according to the clusters, the outliers in the medical data for each data combination type from among the preliminary outliers.
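As a data-structure sketch, the four kinds of extracted data for one target user and one disease category could be held in a record like the following. The class and field names are hypothetical; the source specifies only the categories of information.

```python
from dataclasses import dataclass, field


@dataclass
class MedicalData:
    """Medical data to be analyzed for one target user and one disease category.

    Field names are hypothetical; the source lists only the kinds of data.
    """
    user_id: str
    disease: str
    hospitals: list = field(default_factory=list)  # treatment information
    hospitalizations: int = 0                      # treatment information
    total_cost: float = 0.0                        # cost information
    insured_location: str = ""                     # insured location
    remote_distance_km: float = 0.0                # out-of-region treatment distance

    def features(self):
        """Numeric fields that can be combined into medical data pairs."""
        return {
            "total_cost": self.total_cost,
            "hospitalizations": self.hospitalizations,
            "remote_distance_km": self.remote_distance_km,
        }


record = MedicalData("ID-001", "diabetes", ["H1", "H3"], 2, 17300.0, "Region A", 1331.7)
print(record.features())
```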

A density clustering algorithm groups data according to the density of the data set's spatial distribution. It does not require the number of clusters to be set in advance, so it suits data sets whose structure is unknown. After clustering, medical data pairs that are closely related by density in their spatial distribution fall into the same cluster. An outlier detection algorithm such as LOF (Local Outlier Factor) can efficiently detect outliers, that is, data or behaviors that differ significantly from normal data behavior or characteristic attributes. The preliminary outliers are the direct output of the outlier detection algorithm, and combining them with the clustering result allows the qualifying outliers to be identified accurately.

Specifically, after combining the various types of data in the medical data into medical data pairs, the server clusters the medical data pairs of each data combination type separately. It uses a density clustering algorithm such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise) or OPTICS (Ordering Points To Identify the Clustering Structure) to obtain a clustering result comprising at least one cluster. The medical data pairs in the same cluster are densely distributed and share similar characteristics, which indicates that they are normal data.

In parallel, the server uses the outlier detection algorithm to determine the preliminary outliers in the medical data of each data combination type; the algorithm detects anomalous points from the distances between points. Specifically, the outlier detection algorithm computes an outlier factor for each point. The outlier factor of a point p is the average of the ratios of the local reachability density of each neighborhood point of p to the local reachability density of p itself. The preliminary outliers in the medical data are then determined from these outlier factors.

After obtaining the clustering result and the preliminary outliers, the server combines the two to determine the outliers in the medical data of each data combination type. For example, the server can check each preliminary outlier against the clustering result. If the preliminary outlier belongs to one of the clusters, it is determined not to be an outlier; otherwise, it is confirmed as an outlier.
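The combination rule above can be expressed in a few lines. This sketch assumes that cluster membership is given as one label per sample, with -1 meaning "in no cluster". That is the noise convention scikit-learn's `DBSCAN.labels_` uses, but the patent does not prescribe it. The function name is hypothetical.

```python
from typing import List, Sequence


def confirm_outliers(cluster_labels: Sequence[int], preliminary: Sequence[bool]) -> List[int]:
    """Return indices of preliminary outliers that belong to no cluster.

    cluster_labels[i] is the cluster of sample i, or -1 if it is in no cluster;
    preliminary[i] is True if the outlier detector flagged sample i.
    """
    if len(cluster_labels) != len(preliminary):
        raise ValueError("inputs must have the same length")
    return [i for i, (label, flagged) in enumerate(zip(cluster_labels, preliminary))
            if flagged and label == -1]
```

A flagged sample that DBSCAN placed inside a cluster (index 1 below) is discarded. An unflagged noise sample (index 4) is also not reported, because both signals must agree.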

In a specific application, the medical data can be divided into numerical (continuous) data and categorical (discrete) data. Numerical data, such as total medical cost, medical insurance reimbursement, and out-of-pocket cost, can be min-max normalized to the range [0, 1]. Non-numerical data, such as hospital grade and medical staff category, can be encoded, for example with one-hot encoding. For instance, hospitals can be divided into three grades, encoded as tertiary: (1,0,0), secondary: (0,1,0), and primary and others: (0,0,1).
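The two preprocessing rules can be sketched as below: min-max scaling to [0, 1] for numerical fields and the three-way one-hot code for hospital grade. The grade strings used as dictionary keys are assumptions, and any grade not listed falls into "primary and others".

```python
from typing import Dict, List, Tuple

# One-hot codes for hospital grade; unlisted grades map to "primary and others"
HOSPITAL_GRADE_CODES: Dict[str, Tuple[int, int, int]] = {
    "tertiary": (1, 0, 0),
    "secondary": (0, 1, 0),
}
OTHER_GRADE_CODE: Tuple[int, int, int] = (0, 0, 1)


def min_max(values: List[float]) -> List[float]:
    """Scale numerical values such as total cost into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def encode_grade(grade: str) -> Tuple[int, int, int]:
    """One-hot encode a hospital grade."""
    return HOSPITAL_GRADE_CODES.get(grade, OTHER_GRADE_CODE)
```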

During density clustering, a neighborhood sample set is determined from the distance measurements, and a core object set is determined from the neighborhood sample sets. The medical data pairs are then clustered around the core objects in the core object set to obtain the clustering result. Specifically, the medical data pairs may be clustered with the DBSCAN algorithm. The input is a data set $D = \{x_1, x_2, \ldots, x_m\}$ of medical data pairs together with the neighborhood parameters $(\epsilon, MinPts)$, and the output is the cluster partition $C$. Here $x_m$ is the m-th medical data pair and $\epsilon$ is the neighborhood radius around a point. $MinPts$ is the minimum number of points a point's neighborhood must contain for that point to be a core object. The steps are as follows:

1) Initialize the core object set $\Omega = \emptyset$, the cluster count $k = 0$, the unvisited sample set $\Gamma = D$, and the cluster partition $C = \emptyset$.

2) For $j = 1, 2, \ldots, m$, find all core objects as follows:

a) find the $\epsilon$-neighborhood subsample set $N_\epsilon(x_j)$ of sample $x_j$ using the distance measure;

b) if the subsample set satisfies $|N_\epsilon(x_j)| \geq MinPts$, add $x_j$ to the core object set: $\Omega = \Omega \cup \{x_j\}$.

3) If $\Omega = \emptyset$, end the algorithm; otherwise, go to step 4).

4) Randomly select a core object $o$ from $\Omega$. Initialize the current-cluster core object queue $\Omega_{cur} = \{o\}$, set the cluster index $k = k + 1$, initialize the current cluster sample set $C_k = \{o\}$, and update the unvisited sample set $\Gamma = \Gamma - \{o\}$.

5) If $\Omega_{cur} = \emptyset$, the current cluster $C_k$ is complete. Update the cluster partition $C = \{C_1, C_2, \ldots, C_k\}$, update the core object set $\Omega = \Omega - C_k$, and go to step 3). Otherwise, go to step 6).

6) Take a core object $o'$ out of $\Omega_{cur}$ and use the neighborhood distance threshold $\epsilon$ to find its neighborhood subsample set $N_\epsilon(o')$. Let $\Delta = N_\epsilon(o') \cap \Gamma$. Update $C_k = C_k \cup \Delta$, $\Gamma = \Gamma - \Delta$, and $\Omega_{cur} = \Omega_{cur} \cup (\Delta \cap \Omega) - \{o'\}$, then go to step 5).

Finally, the cluster partition $C = \{C_1, C_2, \ldots, C_k\}$ is output. This completes the clustering of the medical data pairs and yields the clustering result.
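Steps 1) to 6) can be followed almost line by line in a small pure-Python sketch. It assumes Euclidean distance and returns clusters as sets of sample indices, so samples in no cluster are noise. This sketch is for illustration only. A production system would more likely use a library implementation with a spatial index, such as scikit-learn's `DBSCAN`.

```python
import math
import random
from collections import deque
from typing import List, Sequence, Set


def dbscan(data: List[Sequence[float]], eps: float, min_pts: int, seed: int = 0) -> List[Set[int]]:
    """DBSCAN following steps 1)-6); returns clusters as sets of sample indices."""
    rng = random.Random(seed)
    m = len(data)

    def neighborhood(j: int) -> Set[int]:
        # N_eps(x_j): indices within distance eps of x_j, including j itself
        return {i for i in range(m) if math.dist(data[i], data[j]) <= eps}

    # Steps 1)-2): collect the core object set Omega
    omega = {j for j in range(m) if len(neighborhood(j)) >= min_pts}
    unvisited = set(range(m))  # Gamma
    clusters: List[Set[int]] = []  # C

    while omega:  # step 3): stop when Omega is empty
        o = rng.choice(sorted(omega))  # step 4): pick a random core object
        queue = deque([o])  # Omega_cur
        ck = {o}
        unvisited.discard(o)
        while queue:  # step 5): the cluster is complete when the queue is empty
            o_prime = queue.popleft()  # step 6)
            delta = neighborhood(o_prime) & unvisited
            ck |= delta
            unvisited -= delta
            queue.extend(delta & omega)  # newly reached core objects join the queue
        clusters.append(ck)
        omega -= ck
    return clusters
```

Two tight groups of three points form two clusters, and the isolated point stays unassigned:

```python
data = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
dbscan(data, eps=0.5, min_pts=3)  # two clusters: {0, 1, 2} and {3, 4, 5}
```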

In addition, the LOF algorithm effectively handles outliers that arise from regions of different density. Its core is the computation of the outlier factor:

- If the ratio is close to 1, the density of the medical data pair p is about the same as that of its neighborhood points, and p may belong to the same cluster as its neighbors.
- If the ratio is less than 1, the density of p is higher than that of its neighborhood points, and p is a dense point.
- If the ratio is greater than 1, the density of p is lower than that of its neighborhood points, and p is more likely to be an outlier.

The outliers can then be interpreted dimension by dimension:

- Total medical cost: an outlier in this dimension has a total cost clearly higher than other treatments of the same disease. This may indicate a minor illness being packaged as a very serious one to obtain high reimbursement.
- Number of hospitalizations: an outlier indicates that the patient has far more visits than other patients, suggesting the possible purchase of false invoices to defraud the medical insurance fund.
- Treatment trajectory: if an outlier shows that the patient's trajectory differs from that of others, for example in the place of insurance enrollment and the medical distance, there is a certain risk of fraud.
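The outlier factor can be computed with the standard LOF definition, which uses the k-distance, the reachability distance, and the local reachability density. This pure-Python sketch makes several assumptions. It uses Euclidean distance and assumes no duplicate points, because duplicates make the reachability density infinite. The neighborhood size k is chosen by the analyst, and the 1.5 threshold for turning scores into preliminary outliers is hypothetical.

```python
import math
from typing import List, Sequence


def lof_scores(data: List[Sequence[float]], k: int) -> List[float]:
    """Local outlier factor of every point (Euclidean distance, no duplicate points assumed)."""
    n = len(data)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    dist = [[math.dist(a, b) for b in data] for a in data]

    k_distance: List[float] = []
    neighbors: List[List[int]] = []
    for p in range(n):
        others = sorted((q for q in range(n) if q != p), key=lambda q: dist[p][q])
        kd = dist[p][others[k - 1]]  # distance to the k-th nearest neighbour
        k_distance.append(kd)
        # N_k(p) holds every point within the k-distance (ties may give more than k)
        neighbors.append([q for q in others if dist[p][q] <= kd])

    def reach_dist(p: int, o: int) -> float:
        # reachability distance of p with respect to o
        return max(k_distance[o], dist[p][o])

    # local reachability density: inverse of the mean reachability distance to neighbours
    lrd = [len(neighbors[p]) / sum(reach_dist(p, o) for o in neighbors[p]) for p in range(n)]

    # LOF(p): mean ratio of the neighbours' lrd to p's own lrd
    return [sum(lrd[o] for o in neighbors[p]) / (len(neighbors[p]) * lrd[p]) for p in range(n)]


def preliminary_outliers(scores: List[float], threshold: float = 1.5) -> List[bool]:
    """Flag points whose LOF exceeds a (hypothetical) threshold as preliminary outliers."""
    return [s > threshold for s in scores]
```

For the four corners of a unit square plus the distant point (10, 10) with k = 2, the square points score exactly 1. The distant point scores far above 1, consistent with the interpretation given above.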

In this embodiment, the clusters obtained with the density clustering algorithm are combined with the preliminary outliers found by the outlier detection algorithm to determine the outliers in the medical data of each data combination type. This effectively improves the accuracy of outlier identification and, in turn, the accuracy of the medical data anomaly analysis.

In one embodiment, determining abnormal data in the medical data according to the abnormality degree of each outlier includes the following: querying the preset abnormality factor corresponding to each data combination type; calculating, according to the abnormality factor, the abnormality degree of the outliers of that data combination type; and determining abnormal data in the medical data according to the abnormality degree of each outlier.

The abnormality factor is the anomaly weight of a data combination type. Different data combination types have different weights, which can be set flexibly according to actual needs to reflect how important each data combination type is to the anomaly judgment. The abnormality degree measures how anomalous an outlier is: the higher the abnormality degree, the more likely the outlier is abnormal data.

Specifically, when determining abnormal data in the medical data, the server queries the abnormality factor corresponding to each data combination type. The abnormality factor can be configured when each data combination type is set up. It adjusts the importance of different kinds of medical data combinations and thereby improves the accuracy of the anomaly analysis.

After obtaining the abnormality factors, the server calculates the abnormality degree of the outliers of each data combination type from the corresponding factor. The abnormality degree can be the product of the abnormality factor and the distance of the outlier. The distance of the outlier can be either the average distance between the outlier and the centers of all clusters in the clustering result, or the distance between the outlier and the center of the nearest cluster.

Once the abnormality degrees of the outliers for all data combination types are obtained, they are combined. For example, the anomaly evaluation result for each piece of medical data can be the sum of the abnormality degrees of its outliers across data combination types, and this result is used to judge whether that medical data is abnormal. The evaluation result may, for instance, include an abnormality score that is compared with a preset score threshold; if the score exceeds the threshold, the corresponding medical data is considered abnormal data.
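The product rule and the score threshold can be sketched as follows. The example assumes each cluster center is the coordinate-wise mean of its members and uses Euclidean distance. The patent names the "nearest center" and "mean distance to all centers" variants but does not define "cluster center", and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(cluster: List[Point]) -> Tuple[float, ...]:
    """Cluster center taken as the coordinate-wise mean of its members (assumption)."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def outlier_distance(x: Point, clusters: List[List[Point]], mode: str = "nearest") -> float:
    """Distance to the nearest cluster center, or the mean distance to all centers."""
    distances = [math.dist(x, centroid(c)) for c in clusters]
    if mode == "nearest":
        return min(distances)
    if mode == "mean":
        return sum(distances) / len(distances)
    raise ValueError(f"unknown mode: {mode}")


def abnormality_degree(factor: float, x: Point, clusters: List[List[Point]],
                       mode: str = "nearest") -> float:
    """Abnormality degree = abnormality factor x outlier distance."""
    return factor * outlier_distance(x, clusters, mode)


def is_abnormal(degrees: List[float], score_threshold: float) -> bool:
    """Sum the degrees over data combination types and compare with the score threshold."""
    return sum(degrees) > score_threshold
```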

In this embodiment, the abnormality degree of each outlier is determined using the abnormality factor preset for its data combination type. This adjusts the weight given to each outlier and further improves the accuracy of the medical data anomaly analysis.

In one embodiment, determining abnormal data in the medical data according to the abnormality degree of each outlier includes the following: superposing the abnormality degrees of all outliers corresponding to the same target user identifier to obtain a user abnormality degree for each target user identifier; and, when a user abnormality degree exceeds the abnormality degree threshold, determining that the medical data of the corresponding target user identifier is abnormal data.

In this embodiment, the server first calculates the abnormality degrees of the outliers of each data combination type from the abnormality factors. It then superposes (sums) the abnormality degrees of the outliers that correspond to the same target user identifier, that is, the outliers in medical data belonging to the same patient, to obtain a user abnormality degree for each target user identifier. The server compares each user abnormality degree with a preset abnormality degree threshold. If the user abnormality degree exceeds the threshold, that patient's medical data for the disease type is highly anomalous and has low authenticity and reliability. The medical data of the corresponding target user identifier is therefore determined to be abnormal data, which achieves accurate anomaly analysis of the medical data.
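The per-user superposition and threshold comparison reduce to a grouped sum. In this sketch, each outlier is represented as a hypothetical `(user_id, degree)` pair, and the threshold value is chosen by the caller.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def flag_users(outlier_degrees: Iterable[Tuple[str, float]],
               threshold: float) -> Tuple[Dict[str, float], Set[str]]:
    """Sum abnormality degrees per target user and flag users above the threshold."""
    totals: Dict[str, float] = defaultdict(float)
    for user_id, degree in outlier_degrees:
        totals[user_id] += degree  # superpose degrees of the same patient
    abnormal = {user for user, total in totals.items() if total > threshold}
    return dict(totals), abnormal
```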

In this embodiment, the abnormality degrees of the outliers belonging to the same patient are superposed, so the patient's medical data is analyzed using all of its outliers together. This effectively improves the accuracy of the anomaly analysis.

In one embodiment, feeding the abnormal data back to the terminal for visual display includes the following: querying a preset abnormality grading condition; determining the abnormality level of the abnormal data according to the grading condition; determining a visual display mode for the abnormal data according to the abnormality level; and feeding the abnormal data and the visual display mode back to the terminal, so that the terminal displays the abnormal data in that visual display mode.

In this embodiment, a corresponding visual display mode is determined according to the abnormal level of the abnormal data, so that the abnormal data is visually displayed on the terminal according to the corresponding visual display mode.

Specifically, when feeding the abnormal data back to the terminal, the server queries the preset abnormality grading conditions, which record the grading rules for abnormal data, and determines the abnormality level of the abnormal data accordingly. For example, the grading condition can be a set of value ranges for the abnormality degree, and the abnormality degree of the abnormal data is matched against these ranges to determine its level. The abnormality level reflects how anomalous the data is: the higher the level, the more anomalous the data and the lower its authenticity and reliability.

After determining the abnormality level, the server determines the corresponding visual display mode. For example, abnormal data with a higher abnormality degree can be displayed in a more conspicuous, highlighted, or intuitive way, such as with fonts or images in different colors. The server then feeds the abnormal data and the visual display mode back to the terminal. The terminal displays the abnormal data in that mode, so that the abnormal data in the medical data is presented visually.
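A grading condition expressed as abnormality-degree value ranges maps naturally onto a sorted list of boundaries. The boundaries and display styles below are hypothetical placeholders, because the patent does not give concrete values.

```python
import bisect
from typing import Dict, List

# Hypothetical grading condition: boundaries between levels 0, 1, 2, 3
LEVEL_BOUNDS: List[float] = [5.0, 10.0, 20.0]

# Hypothetical display modes; higher levels are rendered more conspicuously
DISPLAY_MODES: Dict[int, Dict[str, object]] = {
    0: {"color": "black", "bold": False, "highlight": False},
    1: {"color": "orange", "bold": False, "highlight": False},
    2: {"color": "red", "bold": False, "highlight": True},
    3: {"color": "red", "bold": True, "highlight": True},
}


def abnormal_level(degree: float, bounds: List[float] = LEVEL_BOUNDS) -> int:
    """Return the index of the value range that the abnormality degree falls in."""
    return bisect.bisect_right(bounds, degree)


def display_mode(degree: float) -> Dict[str, object]:
    """Attach the level and its display style to an abnormal record."""
    level = abnormal_level(degree)
    return {"level": level, **DISPLAY_MODES[level]}
```

With `bisect_right`, a degree that equals a boundary is placed in the higher level; use `bisect_left` if boundaries should belong to the lower level.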

In addition, all the analyzed medical data can be fed back to the terminal, visual display of the medical data can be carried out on the terminal, and the abnormal data can be highlighted according to the visual display mode corresponding to the abnormal level of the abnormal data.

It should be understood that although the steps in the flowcharts of figs. 2-3 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in figs. 2-3 may include multiple sub-steps or stages. These are not necessarily completed at the same moment but may be performed at different times, and they are not necessarily performed sequentially: they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 4, a big-data-based medical data anomaly analysis apparatus is provided, including a medical data acquisition module 402, a data pair acquisition module 404, an outlier determination module 406, and an abnormal data determination module 408, wherein:

a medical data obtaining module 402, configured to obtain medical data to be analyzed corresponding to the same disease category when receiving an abnormality analysis request sent by a terminal;

a data pair obtaining module 404, configured to combine data of each type in the medical data according to a preset data combination type, so as to obtain a medical data pair corresponding to each data combination type;

an outlier determining module 406, configured to cluster the medical data pairs corresponding to the data combination types, and determine outliers in the medical data corresponding to the data combination types according to a clustering result;

and the abnormal data determining module 408 is configured to determine abnormal data in the medical data according to the abnormality degree of each outlier, and feed the abnormal data back to the terminal for visual display at the terminal.
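The four modules can be pictured as a pipeline. The following is a hypothetical sketch, not the patent's implementation: each module is an injected callable. `make_data_pairs` shows one assumed way module 404 might pair two fields per preset data combination type.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple

Combo = Tuple[str, str]


def make_data_pairs(rows: List[dict], combos: Sequence[Combo]) -> Dict[Combo, List[Tuple[Any, Any]]]:
    """Module 404 (assumed form): one (field_a, field_b) pair per row for each combination type."""
    return {combo: [(row[combo[0]], row[combo[1]]) for row in rows] for combo in combos}


@dataclass
class AnomalyAnalysisApparatus:
    acquire_medical_data: Callable[[dict], List[dict]]  # module 402
    build_data_pairs: Callable[[List[dict]], Dict[Combo, list]]  # module 404
    determine_outliers: Callable[[Dict[Combo, list]], Dict[Combo, list]]  # module 406
    determine_abnormal_data: Callable[[Dict[Combo, list]], Any]  # module 408

    def handle_request(self, request: dict) -> Any:
        """Run one anomaly analysis request through modules 402 -> 404 -> 406 -> 408."""
        data = self.acquire_medical_data(request)
        pairs = self.build_data_pairs(data)
        outliers = self.determine_outliers(pairs)
        return self.determine_abnormal_data(outliers)
```

Injecting callables keeps each module replaceable, for example swapping the clustering step without touching request handling.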

In one embodiment, the medical data acquisition module 402 includes a user identification determination module, a medical record acquisition module, and a medical data extraction module; wherein: the user identifier determining module is used for receiving an abnormal analysis request sent by the terminal and determining a target user identifier according to the abnormal analysis request; the medical record acquisition module is used for acquiring a medical record corresponding to the target user identifier from the medical record library; and the medical data extraction module is used for extracting the medical data to be analyzed corresponding to the same disease species from the medical records.

In one embodiment, the medical data extraction module includes a disease type determination module, a medical record parsing module, a remote distance determination module, and a to-be-analyzed data acquisition module, wherein: the disease type determination module is configured to determine the disease type to be analyzed and screen out the medical records belonging to it; the medical record parsing module is configured to parse the screened medical records and determine, from the parsing result, the medical visit information, cost information, and place of insurance enrollment of the target user identifier for the disease type to be analyzed; the remote distance determination module is configured to determine the out-of-area medical treatment distance from the location of the hospital in the medical visit information and the place of insurance enrollment; and the to-be-analyzed data acquisition module is configured to obtain the medical data to be analyzed from the medical visit information, cost information, place of insurance enrollment, and out-of-area medical treatment distance.

In one embodiment, the outlier determination module 406 includes a clustering module, an outlier detection module, and an outlier acquisition module, wherein: the clustering module is configured to cluster the medical data pairs of each data combination type separately with a density clustering algorithm to obtain a clustering result comprising at least one cluster; the outlier detection module is configured to determine, based on an outlier detection algorithm, the preliminary outliers in the medical data pairs of each data combination type; and the outlier acquisition module is configured to determine, according to the clusters, the outliers in the medical data of each data combination type from the preliminary outliers.

In one embodiment, the abnormal data determination module 408 includes an abnormality factor module, an abnormality degree module, and an abnormal data acquisition module, wherein: the abnormality factor module is configured to query the preset abnormality factor corresponding to each data combination type; the abnormality degree module is configured to calculate, according to the abnormality factors, the abnormality degree of the outliers of each data combination type; and the abnormal data acquisition module is configured to determine abnormal data in the medical data according to the abnormality degree of each outlier.

In one embodiment, the abnormal data acquisition module includes an abnormality degree superposition module and an abnormality degree comparison module, wherein: the abnormality degree superposition module is configured to superpose the abnormality degrees of all outliers corresponding to the same target user identifier to obtain a user abnormality degree for each target user identifier; and the abnormality degree comparison module is configured to determine, when a user abnormality degree exceeds the abnormality degree threshold, that the medical data of the corresponding target user identifier is abnormal data.

In one embodiment, the abnormal data determination module 408 includes a grading condition query module, a level determination module, a display mode determination module, and an abnormal data feedback module, wherein: the grading condition query module is configured to query the preset abnormality grading conditions; the level determination module is configured to determine the abnormality level of the abnormal data according to the grading conditions; the display mode determination module is configured to determine a visual display mode for the abnormal data according to the abnormality level; and the abnormal data feedback module is configured to feed the abnormal data and the visual display mode back to the terminal, so that the terminal displays the abnormal data in that visual display mode.

For specific limitations of the big-data-based medical data anomaly analysis apparatus, refer to the limitations of the big-data-based medical data anomaly analysis method above; details are not repeated here. The modules in the apparatus can be implemented wholly or partially by software, hardware, or a combination of the two. In hardware form, the modules can be embedded in, or independent of, a processor in the computer device. In software form, they can be stored in a memory of the computer device, so that the processor can invoke them and perform the corresponding operations.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a big data based medical data anomaly analysis method.

Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of the parts of the structure relevant to the disclosed solution and does not limit the computer devices to which the solution applies. A particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.

In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program:

In one embodiment, the processor when executing the computer program further performs the steps of: receiving an abnormal analysis request sent by a terminal, and determining a target user identifier according to the abnormal analysis request; acquiring a medical record corresponding to the target user identifier from a medical record library; medical data to be analyzed corresponding to the same disease species is extracted from the medical records.

In one embodiment, the processor, when executing the computer program, further performs the following steps: determining the disease type to be analyzed, and screening out the medical records belonging to it; parsing the screened medical records, and determining, from the parsing result, the medical visit information, cost information, and place of insurance enrollment of the target user identifier for the disease type to be analyzed; determining the out-of-area medical treatment distance from the location of the hospital in the medical visit information and the place of insurance enrollment; and obtaining the medical data to be analyzed from the medical visit information, cost information, place of insurance enrollment, and out-of-area medical treatment distance.

In one embodiment, the processor, when executing the computer program, further performs the following steps: clustering the medical data pairs of each data combination type separately with a density clustering algorithm to obtain a clustering result comprising at least one cluster; determining, based on an outlier detection algorithm, the preliminary outliers in the medical data pairs of each data combination type; and determining, according to the clusters, the outliers in the medical data of each data combination type from the preliminary outliers.

In one embodiment, the processor, when executing the computer program, further performs the following steps: querying the preset abnormality factor corresponding to each data combination type; calculating, according to the abnormality factors, the abnormality degree of the outliers of each data combination type; and determining abnormal data in the medical data according to the abnormality degree of each outlier.

In one embodiment, the processor, when executing the computer program, further performs the following steps: superposing the abnormality degrees of all outliers corresponding to the same target user identifier to obtain a user abnormality degree for each target user identifier; and, when a user abnormality degree exceeds the abnormality degree threshold, determining that the medical data of the corresponding target user identifier is abnormal data.

In one embodiment, the processor, when executing the computer program, further performs the following steps: querying a preset abnormality grading condition; determining the abnormality level of the abnormal data according to the grading condition; determining a visual display mode for the abnormal data according to the abnormality level; and feeding the abnormal data and the visual display mode back to the terminal, so that the terminal displays the abnormal data in that visual display mode.

In one embodiment, a computer storage medium is provided, having a computer program stored thereon, the computer program, when executed by a processor, performing the steps of:

and determining abnormal data in the medical data according to the abnormality degree of each outlier, and feeding the abnormal data back to the terminal so as to perform visual display at the terminal.

In one embodiment, the computer program when executed by the processor further performs the steps of: receiving an abnormal analysis request sent by a terminal, and determining a target user identifier according to the abnormal analysis request; acquiring a medical record corresponding to the target user identifier from a medical record library; medical data to be analyzed corresponding to the same disease species is extracted from the medical record.

In one embodiment, the computer program, when executed by the processor, further performs the following steps: determining the disease type to be analyzed, and screening out the medical records belonging to it; parsing the screened medical records, and determining, from the parsing result, the medical visit information, cost information, and place of insurance enrollment of the target user identifier for the disease type to be analyzed; determining the out-of-area medical treatment distance from the location of the hospital in the medical visit information and the place of insurance enrollment; and obtaining the medical data to be analyzed from the medical visit information, cost information, place of insurance enrollment, and out-of-area medical treatment distance.

In one embodiment, the computer program, when executed by the processor, further performs the following steps: clustering the medical data pairs of each data combination type separately with a density clustering algorithm to obtain a clustering result comprising at least one cluster; determining, based on an outlier detection algorithm, the preliminary outliers in the medical data pairs of each data combination type; and determining, according to the clusters, the outliers in the medical data of each data combination type from the preliminary outliers.

In one embodiment, the computer program, when executed by the processor, further performs the following steps: querying the preset abnormality factor corresponding to each data combination type; calculating, according to the abnormality factors, the abnormality degree of the outliers of each data combination type; and determining abnormal data in the medical data according to the abnormality degree of each outlier.

In one embodiment, the computer program, when executed by the processor, further performs the following steps: superposing the abnormality degrees of all outliers corresponding to the same target user identifier to obtain a user abnormality degree for each target user identifier; and, when a user abnormality degree exceeds the abnormality degree threshold, determining that the medical data of the corresponding target user identifier is abnormal data.

In one embodiment, the computer program when executed by the processor further performs the steps of: querying a preset abnormal grading condition; determining an abnormal level corresponding to the abnormal data according to the abnormal grading condition; determining a visual display mode of abnormal data according to the abnormal level; and feeding back the abnormal data and the visual display mode to the terminal so as to visually display the abnormal data at the terminal according to the visual display mode.

It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any such combination should be considered within the scope of this specification as long as it contains no contradiction.

The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they shall not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims

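1. A big-data-based medical data anomaly analysis method, the method comprising:

when an abnormality analysis request sent by a terminal is received, acquiring medical data to be analyzed corresponding to the same disease category of a user;

combining the various types of data in the medical data according to preset data combination types to obtain, for each data combination type, medical data pairs comprising different types of data;

clustering the medical data pairs corresponding to each data combination type, and determining outliers in the medical data corresponding to each data combination type according to the clustering result;

querying preset abnormality factors respectively corresponding to the data combination types, each abnormality factor being the abnormality weight of the corresponding data combination type;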

calculating, according to the abnormality factors, the abnormality degree of each outlier corresponding to each data combination type;

superposing the abnormality degrees of all outliers corresponding to the same target user identification to obtain a user abnormality degree for each target user identification;

when a user abnormality degree exceeds an abnormality degree threshold, determining that the medical data of the corresponding target user identification is abnormal data, and feeding back the abnormal data to the terminal for visual display at the terminal.
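The pairing, clustering and weighting steps of claim 1 can be illustrated with the Python sketch below. The claim does not name a clustering algorithm, so this example assumes DBSCAN and treats the points it labels as noise as outliers. It further assumes min-max scaling of each data type, hypothetical field names and abnormality factors, and, as a simplification, uses the abnormality factor of a data combination type directly as the abnormality degree of each outlier in that combination.

```python
import math

# Hypothetical preset data combination types and abnormality factors (weights).
ABNORMALITY_FACTORS = {
    ("cost", "length_of_stay"): 0.6,
    ("cost", "distance_km"): 0.4,
}


def build_pairs(records, combination_type):
    """Form (user_id, (x, y)) medical data pairs for one data combination type."""
    first, second = combination_type
    return [(r["user_id"], (r[first], r[second])) for r in records]


def min_max_scale(points):
    """Scale each coordinate to [0, 1] so no single data type dominates distances."""
    columns = list(zip(*points))
    bounds = [(min(c), max(c)) for c in columns]
    return [
        tuple(0.0 if hi == lo else (v - lo) / (hi - lo) for v, (lo, hi) in zip(p, bounds))
        for p in points
    ]


def dbscan_noise(points, eps, min_pts):
    """Indices DBSCAN labels as noise: not core and not within eps of any core point."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    return [i for i in range(n) if i not in core and not any(j in core for j in neighbors[i])]


def weighted_outliers(records, eps=0.2, min_pts=3):
    """Return (user_id, combination_type, abnormality_degree) for every outlier.

    Assumption: an outlier's abnormality degree is simply the abnormality
    factor of its data combination type.
    """
    result = []
    for combination_type, factor in ABNORMALITY_FACTORS.items():
        pairs = build_pairs(records, combination_type)
        points = min_max_scale([point for _, point in pairs])
        for i in dbscan_noise(points, eps, min_pts):
            result.append((pairs[i][0], combination_type, factor))
    return result


# Hypothetical medical data of one disease category; u5 is far from the others.
records = [
    {"user_id": "u1", "cost": 1000, "length_of_stay": 5, "distance_km": 10},
    {"user_id": "u2", "cost": 1100, "length_of_stay": 6, "distance_km": 12},
    {"user_id": "u3", "cost": 1050, "length_of_stay": 5, "distance_km": 11},
    {"user_id": "u4", "cost": 1150, "length_of_stay": 6, "distance_km": 9},
    {"user_id": "u5", "cost": 9000, "length_of_stay": 40, "distance_km": 800},
]
print(weighted_outliers(records))
# [('u5', ('cost', 'length_of_stay'), 0.6), ('u5', ('cost', 'distance_km'), 0.4)]
```

The `eps` and `min_pts` values are illustrative only; on real data they, and the choice of scaling, would need tuning. The resulting (user, combination, degree) tuples are in the form that the per-user superposition step would then add up.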

2. The method according to claim 1, wherein acquiring, when the abnormality analysis request sent by the terminal is received, the medical data to be analyzed corresponding to the same disease category of the user comprises:

extracting, from the medical records, the medical data to be analyzed corresponding to the same disease category.

3. The method according to claim 2, wherein extracting, from the medical records, the medical data to be analyzed corresponding to the same disease category comprises:

determining a disease category to be analyzed, and screening out, from the medical records, the medical records belonging to the disease category to be analyzed;

parsing the screened medical records, and determining, according to the parsing result, the hospitalization information, the cost information and the insurance enrollment location of each target user identification corresponding to the disease category to be analyzed;

determining an out-of-area treatment distance according to the location of the treating hospital in the hospitalization information and the insurance enrollment location;

and obtaining the medical data to be analyzed according to the hospitalization information, the cost information, the insurance enrollment location and the out-of-area treatment distance.
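The screening and distance steps of claim 3 can be sketched as below. The sketch assumes that the treating hospital and the insurance enrollment (participation) location are both available as latitude/longitude pairs, and that the remote (out-of-area) treatment distance is the great-circle distance between them, computed with the haversine formula. The record field names are hypothetical; the patent does not state how locations are stored or which distance measure is used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def extract_analysis_data(medical_records, disease_category):
    """Screen records of one disease category and build the medical data to be analyzed.

    All record field names here are hypothetical.
    """
    rows = []
    for record in medical_records:
        if record["disease_category"] != disease_category:
            continue  # screening: keep only the disease category under analysis
        hospital_lat, hospital_lon = record["hospital_location"]
        insured_lat, insured_lon = record["insurance_location"]
        rows.append({
            "user_id": record["user_id"],
            "length_of_stay": record["length_of_stay"],
            "cost": record["cost"],
            "insurance_location": record["insurance_location"],
            "distance_km": haversine_km(hospital_lat, hospital_lon, insured_lat, insured_lon),
        })
    return rows


records = [
    # Hypothetical: treated in Shanghai, insured in Beijing.
    {"user_id": "u1", "disease_category": "diabetes", "length_of_stay": 7, "cost": 12000,
     "hospital_location": (31.2304, 121.4737), "insurance_location": (39.9042, 116.4074)},
    # Hypothetical: treated where insured, so the distance is zero.
    {"user_id": "u2", "disease_category": "diabetes", "length_of_stay": 5, "cost": 8000,
     "hospital_location": (39.9042, 116.4074), "insurance_location": (39.9042, 116.4074)},
    # Different disease category: screened out.
    {"user_id": "u3", "disease_category": "asthma", "length_of_stay": 3, "cost": 3000,
     "hospital_location": (22.5431, 114.0579), "insurance_location": (22.5431, 114.0579)},
]
for row in extract_analysis_data(records, "diabetes"):
    print(row["user_id"], round(row["distance_km"]))
```

A straight-line distance is only one plausible reading; a road or travel distance from a mapping service would be an equally valid substitute if the source data supports it.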

4. The method according to claim 1, wherein clustering the medical data pairs corresponding to the respective data combination types, and determining outliers in the medical data corresponding to the data combination types according to the clustering result, comprises:

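5. The method according to any one of claims 1 to 4, wherein feeding back the abnormal data to the terminal for visual display at the terminal comprises:

querying a preset abnormality grading condition;

determining an abnormality level corresponding to the abnormal data according to the abnormality grading condition;

determining a visual display mode of the abnormal data according to the abnormality level;

and feeding back the abnormal data and the visual display mode to the terminal, so that the terminal visually displays the abnormal data according to the visual display mode.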

6. A big-data-based medical data anomaly analysis apparatus, the apparatus comprising:

a medical data acquisition module, configured to acquire, when an abnormality analysis request sent by a terminal is received, medical data to be analyzed corresponding to the same disease category of a user;

a data pair obtaining module, configured to combine the various types of data in the medical data according to preset data combination types to obtain, for each data combination type, medical data pairs comprising different types of data;

an abnormal data determining module, configured to: query preset abnormality factors respectively corresponding to the data combination types, each abnormality factor being the abnormality weight of the corresponding data combination type; calculate, according to the abnormality factors, the abnormality degree of each outlier corresponding to each data combination type; superpose the abnormality degrees of all outliers corresponding to the same target user identification to obtain a user abnormality degree for each target user identification; and, when a user abnormality degree exceeds an abnormality degree threshold, determine that the medical data of the corresponding target user identification is abnormal data and feed back the abnormal data to the terminal for visual display at the terminal.

7. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.

8. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.