CN109767819B

CN109767819B - Medical record grouping method and device, storage medium and electronic equipment

Info

Publication number: CN109767819B
Application number: CN201811512148.8A
Authority: CN
Inventors: 王阳; 赵立军; 张霞
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2018-12-11
Filing date: 2018-12-11
Publication date: 2021-06-04
Anticipated expiration: 2038-12-11
Also published as: CN109767819A

Abstract

The disclosure relates to a medical record grouping method and device, a storage medium and an electronic device. The method comprises: extracting all target features from the medical record to be grouped; obtaining, according to a pre-established feature probability network and all the target features, a total energy value of each DRG group with respect to the medical record to be grouped, wherein the feature probability network is a network topology that takes DRG groups and feature groups as nodes, the association relations between DRG groups and feature groups as edges, and the correlation probability values between DRG groups and feature groups as edge weights, and the total energy value is the sum of the correlation probability values between the DRG group and the feature groups to which the target features belong; and determining the DRG group with the largest total energy value as the target DRG group for the medical record to be grouped. Medical records can thus be identified and grouped through a network structure built from the correlation between features and DRG groups, which avoids manual grouping and improves the efficiency and accuracy of medical record grouping.

Description

Medical record grouping method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of medical evaluation, and in particular, to a method and an apparatus for grouping medical records, a storage medium, and an electronic device.

Background

The DRG (Diagnosis Related Groups) system is one of the more advanced payment methods recognized worldwide today. It is a patient classification scheme and a classification and coding standard designed specifically for medical insurance prepayment systems. The system evaluates patients' medical records according to factors such as age, sex, length of hospital stay, clinical diagnosis, disease, operation, disease severity, complications and outcome, divides the medical records into roughly 500 to 600 DRG groups, and sets a fixed prepayment amount for each group through scientific measurement and calculation. In other words, under the DRG system the medical insurance institution agrees with the hospital on payment standards by patient type; when the hospital admits and treats a patient covered by medical insurance, the insurance institution pays the hospital according to the standard for that patient type, and the portion exceeding the standard is borne by the hospital. In recent years, as medical expenses have continued to grow, the pressure on medical insurance has gradually increased, and governments have progressively introduced DRG schemes into their medical insurance systems to manage and supervise medical resources. In the related art, medical records are usually divided into different DRG groups manually according to their content (i.e., features), and abnormal medical records are then identified within each DRG group by using a one-class SVM (one-class Support Vector Machine) or an isolation forest. However, the one-class SVM is strongly affected by its hyperparameters, and the isolation forest can only handle a small number of continuous variables. In addition, both methods only find outliers among already grouped medical records and cannot provide more accurate grouping information.

Disclosure of Invention

To overcome the problems in the related art, an object of the present disclosure is to provide a method, an apparatus, a storage medium, and an electronic device for grouping medical records.

In order to achieve the above object, according to a first aspect of the embodiments of the present disclosure, there is provided a method for grouping medical records, the method including:

extracting all target features in medical records to be grouped;

acquiring a total energy value of a first DRG group corresponding to the medical record to be grouped according to a pre-established characteristic probability network and all target characteristics, wherein the characteristic probability network is a network topology structure established by taking the DRG group and the characteristic group as nodes, taking an association relation between the DRG group and the characteristic group as an edge and taking a correlation probability value between the DRG group and the characteristic group as a weight of the edge, the first DRG group is any one of a plurality of DRG groups contained in a DRG library, the total energy value is the sum of the plurality of correlation probability values between the first DRG group and the plurality of characteristic groups to which all the target characteristics belong, and each characteristic group contains a plurality of characteristics meeting the same grouping condition;

and determining the DRG group with the maximum total energy value as a target DRG group corresponding to the medical record to be grouped.

Optionally, each DRG group in the DRG library corresponds to a plurality of grouped medical records, each grouped medical record includes a plurality of features, the plurality of features correspond to a plurality of feature classes, and before the extracting all target features in the medical records to be grouped, the method further includes:

aiming at a first feature class, acquiring all sample features belonging to the first feature class in a plurality of grouped medical records corresponding to a second DRG group, wherein the second DRG group is any one DRG group in the DRG library, and the first feature class is any one feature class in the plurality of feature classes;

dividing all the sample characteristics into a plurality of characteristic groups according to grouping conditions corresponding to the first characteristic class, so that the number of the sample characteristics in the plurality of characteristic groups is in a preset distribution state;

acquiring a correlation probability value between the second DRG group and each feature group in the plurality of feature groups according to the number of sample features in each feature group in the plurality of feature groups and a probability density function corresponding to the preset distribution state;

after the correlation probability values between the plurality of DRG groups and all the feature groups corresponding to the plurality of DRG groups are obtained, establishing the feature probability network by taking the plurality of DRG groups and all the feature groups corresponding to the plurality of DRG groups as nodes, taking the correlation relations between the plurality of DRG groups and all the feature groups corresponding to the plurality of DRG groups as edges, and taking the correlation probability values between the plurality of DRG groups and all the feature groups corresponding to the plurality of DRG groups as weights of the edges.

Optionally, the obtaining, according to the pre-established feature probability network and all the target features, a total energy value corresponding to the medical records to be grouped in the first DRG group includes:

determining all first feature groups having association relation with the first DRG group according to the feature probability network;

determining a plurality of second feature groups matching the all target features from the all first feature groups;

and determining the sum of a plurality of correlation probability values between the first DRG group and the plurality of second feature groups according to the feature probability network, wherein the sum is used as a total energy value of the first DRG group corresponding to the medical records to be grouped.

Optionally, the feature value type of the feature is a discrete type or a continuous type, all features corresponding to each feature class have the same feature value type, and when the feature value types of all sample features corresponding to the first feature class are discrete types, the dividing, according to the grouping condition corresponding to the first feature class, all the sample features into a plurality of feature groups so that the number of the sample features in the plurality of feature groups is in a preset distribution state includes:

dividing the sample features with the same feature value in all the sample features into a feature group to obtain a plurality of feature groups;

arranging the plurality of feature groups so that the number of sample features in the plurality of feature groups is in the preset distribution state;

and allocating numbers to the plurality of feature groups according to the sequence of arrangement.
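The discrete-type grouping steps above can be sketched in Python as follows. This is only an illustrative sketch: the text does not say how the feature groups are arranged to reach the preset distribution state, so the alternating placement used here (largest group in the middle, smaller groups toward both ends) is an assumption, as are the function name `arrange_discrete_groups` and the sample data.

```python
from collections import Counter, deque


def arrange_discrete_groups(sample_features):
    """Group identical feature values, then order the groups so that their
    sizes rise toward the middle and fall toward both ends.

    Returns a list of (group_number, feature_value, count) tuples.
    """
    counts = Counter(sample_features)
    ordered = deque()
    # most_common() yields groups from largest to smallest; alternating
    # between the right and left ends keeps the largest group in the middle.
    # This arrangement rule is an assumption, not taken from the text.
    for i, item in enumerate(counts.most_common()):
        if i % 2 == 0:
            ordered.append(item)
        else:
            ordered.appendleft(item)
    # Numbers are allocated in the order of arrangement, starting from 1.
    return [(number, value, count)
            for number, (value, count) in enumerate(ordered, start=1)]


# Illustrative sample features for one feature class; the counts are made up.
symptoms = (["hoarseness"] * 50 + ["pain"] * 40 + ["hyperplasia"] * 30
            + ["discomfort"] * 20 + ["redness"] * 10)
arranged = arrange_discrete_groups(symptoms)
```

With this data the group sizes, read in number order, rise to a single peak and then fall, which is the rough bell shape the later probability step relies on.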

Optionally, when the feature value types of all the sample features corresponding to the first feature class are continuous types, the dividing, according to the grouping condition corresponding to the first feature class, all the sample features into a plurality of feature groups, so that the number of the sample features in the plurality of feature groups is in a preset distribution state, includes:

obtaining the sample characteristic with the maximum characteristic value and the sample characteristic with the minimum characteristic value in all the sample characteristics;

equally dividing a plurality of value intervals between the maximum characteristic value and the minimum characteristic value;

dividing the sample features in the same value interval from all the sample features into the same feature group to obtain a plurality of feature groups;

and allocating numbers to the plurality of feature groups according to the size of the endpoint value of the value intervals corresponding to the plurality of feature groups.
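The continuous-type grouping steps above amount to equal-width binning between the minimum and maximum feature values. A minimal Python sketch is given below; the function name `equal_width_groups`, the choice of four intervals, the left-closed interval convention and the sample values are all assumptions made for illustration.

```python
def equal_width_groups(values, num_intervals):
    """Split continuous feature values into equal-width value intervals.

    Returns a list of (group_number, (low, high), members) tuples, numbered
    from 1 in ascending order of the interval endpoints.
    """
    if num_intervals < 1:
        raise ValueError("num_intervals must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_intervals
    groups = [[] for _ in range(num_intervals)]
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything falls in the first group
        else:
            # Intervals are treated as left-closed (an assumption); the maximum
            # value would land one past the end, so clamp it into the last one.
            idx = min(int((v - lo) / width), num_intervals - 1)
        groups[idx].append(v)
    return [(i + 1, (lo + i * width, lo + (i + 1) * width), members)
            for i, members in enumerate(groups)]


# Illustrative values for a continuous feature class such as a cost column.
costs = [1000.0, 1500.0, 2500.0, 3200.0, 4800.0, 5000.0]
groups = equal_width_groups(costs, 4)
```

Here each feature group is identified by its value interval, and the group numbers follow the interval endpoints from smallest to largest.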

Optionally, the preset distribution state is a normal distribution, and the obtaining of the correlation probability value between the second DRG group and each feature group corresponding to the second DRG group according to the number of sample features in each feature group in the plurality of feature groups and the probability density function corresponding to the preset distribution state includes:

taking the number assigned to each feature group as the input variable of a normal distribution probability density function to obtain a correlation probability value between the second DRG group and each feature group in the plurality of feature groups; wherein the normal distribution probability density function is:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

where x is the input variable, σ is the standard deviation of the number of sample features in the plurality of feature groups, and μ is the average of the number of sample features in the plurality of feature groups.
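As a rough illustration, the calculation can be sketched in Python by reading the definitions above literally: the input x is the number assigned to a feature group, while μ and σ are the mean and standard deviation of the per-group sample feature counts. The use of the population standard deviation (`statistics.pstdev`), the function names and the sample counts are assumptions; the text does not specify them.

```python
import math
from statistics import mean, pstdev


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def correlation_probabilities(counts):
    """Map each feature group number to a correlation probability value.

    counts[i] is the number of sample features in the feature group
    numbered i + 1 (numbers assumed to start at 1).
    """
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation: an assumption
    if sigma == 0:
        raise ValueError("all feature groups have the same size; sigma is zero")
    return {number: normal_pdf(number, mu, sigma)
            for number in range(1, len(counts) + 1)}


# Made-up counts for five feature groups of one feature class in one DRG group.
weights = correlation_probabilities([1, 2, 3, 2, 1])
```

Each returned value would serve as the weight of the edge between the DRG group and the corresponding feature group.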

Optionally, for a target feature of which a feature value type is a discrete type in all the target features, the determining, in all the first feature groups, a plurality of second feature groups that match all the target features includes:

determining a second feature class to which the target feature belongs;

determining a plurality of third feature groups corresponding to the second feature class among all the first feature groups;

determining, among the plurality of third feature groups, the feature group containing features with the same feature value as the target feature as the second feature group matched with the target feature;

determining a second feature group matching each of the target features as the plurality of second feature groups.

Optionally, for a target feature of which a feature value type is a continuous type among all the target features, the determining, in all the first feature groups, a plurality of second feature groups that match all the target features includes:

determining a third feature class to which the target feature belongs;

determining a plurality of fourth feature groups corresponding to the third feature class in all the first feature groups, wherein the plurality of fourth feature groups are a plurality of value intervals which are divided according to the maximum feature values and the minimum feature values of all the features corresponding to the third feature class;

determining a first value interval in which the target feature is located in the plurality of value intervals;

determining a feature group corresponding to the first value range in the plurality of fourth feature groups as the second feature group matched with the target feature;
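For a continuous target feature, the matching step above reduces to locating the value interval that contains the feature value. A minimal Python sketch using binary search follows; the function name `match_interval`, the left-closed interval convention and the handling of out-of-range values (returning `None`) are assumptions, since the text does not address them.

```python
from bisect import bisect_right


def match_interval(value, boundaries):
    """Return the 1-based number of the value interval containing `value`.

    `boundaries` is the sorted list of interval endpoints [b0, b1, ..., bn]
    that defines n consecutive intervals. Values outside [b0, bn] match no
    feature group, so None is returned (an assumption).
    """
    if value < boundaries[0] or value > boundaries[-1]:
        return None
    idx = bisect_right(boundaries, value) - 1
    # The maximum endpoint belongs to the last interval.
    return min(idx, len(boundaries) - 2) + 1


# Illustrative endpoints for four equal-width intervals.
boundaries = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
group_number = match_interval(3200.0, boundaries)
```

The returned number identifies the fourth feature group (value interval) that is taken as the second feature group for this target feature.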

According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for grouping medical records, the apparatus including:

the characteristic extraction module is used for extracting all target characteristics in medical records to be grouped;

an energy value obtaining module, configured to obtain, according to a pre-established feature probability network and all target features, a total energy value of a first DRG group corresponding to the medical record to be grouped, where the feature probability network is a network topology structure established by using the DRG group and the feature group as nodes, using an association relationship between the DRG group and the feature group as an edge, and using a correlation probability value between the DRG group and the feature group as a weight of the edge, the first DRG group is any one of a plurality of DRG groups included in a DRG library, the total energy value is a sum of a plurality of correlation probability values between the first DRG group and a plurality of feature groups to which all the target features belong, and each feature group includes a plurality of features that satisfy a same grouping condition;

and the medical record grouping module is used for determining the DRG group with the maximum total energy value as a target DRG group corresponding to the medical record to be grouped.

Optionally, each DRG group in the DRG library corresponds to a plurality of grouped medical records, each grouped medical record includes a plurality of features, and the plurality of features correspond to a plurality of feature classes, and the apparatus further includes:

a sample obtaining module, configured to obtain, for a first feature class, all sample features belonging to the first feature class in a plurality of grouped medical records corresponding to a second DRG group, where the second DRG group is any DRG group in the DRG library, and the first feature class is any feature class in the plurality of feature classes;

the characteristic grouping module is used for dividing all the sample characteristics into a plurality of characteristic groups according to the grouping condition corresponding to the first characteristic class so as to enable the number of the sample characteristics in the plurality of characteristic groups to be in a preset distribution state;

a correlation determination module, configured to obtain a correlation probability value between the second DRG group and each feature group in the plurality of feature groups according to the number of sample features in each feature group in the plurality of feature groups and a probability density function corresponding to the preset distribution state;

a network establishing module, configured to, after obtaining the correlation probability values between the multiple DRG groups and all feature groups corresponding to the multiple DRG groups, establish the feature probability network by using the multiple DRG groups and all feature groups corresponding to the multiple DRG groups as nodes, using the correlation relationships between the multiple DRG groups and all feature groups corresponding to the multiple DRG groups as edges, and using the correlation probability values between the multiple DRG groups and all feature groups corresponding to the multiple DRG groups as weights of the edges.

Optionally, the energy value obtaining module includes:

a first feature group determining submodule, configured to determine, according to the feature probability network, all first feature groups having an association relationship with the first DRG group;

a second feature group determination sub-module for determining a plurality of second feature groups that match the all of the target features among the all of the first feature groups;

and an energy value calculation submodule, configured to determine, according to the feature probability network, the sum of a plurality of correlation probability values between the first DRG group and the plurality of second feature groups, the sum being used as the total energy value of the first DRG group corresponding to the medical records to be grouped.

Optionally, the feature value type of the feature is a discrete type or a continuous type, all the features corresponding to each feature class have the same feature value type, and the feature grouping module includes:

a first feature group obtaining sub-module, configured to divide sample features having the same feature value among all the sample features into a feature group, so as to obtain the plurality of feature groups;

the feature group arrangement submodule is used for arranging the plurality of feature groups so as to enable the number of the sample features in the plurality of feature groups to be in the preset distribution state;

and the first characteristic group numbering submodule is used for distributing numbers for the plurality of characteristic groups according to the sequence of arrangement.

Optionally, the feature grouping module includes:

the characteristic value obtaining submodule is used for obtaining the sample characteristic with the maximum characteristic value and the sample characteristic with the minimum characteristic value in all the sample characteristics;

the interval division submodule is used for equally dividing a plurality of value intervals between the maximum characteristic value and the minimum characteristic value;

the second characteristic group obtaining submodule is used for dividing the sample characteristics in the same value interval from all the sample characteristics into the same characteristic group so as to obtain the plurality of characteristic groups;

and the second feature group numbering submodule is used for allocating numbers to the plurality of feature groups according to the sizes of the endpoint values of the value intervals corresponding to the plurality of feature groups.

Optionally, the preset distribution state is a normal distribution, and the correlation determination module is configured to:

Optionally, for a target feature of which a feature value type is a discrete type in all the target features, the second feature group determination submodule is configured to:

determining a second feature class to which the target feature belongs;

Optionally, for a target feature of which the feature value type is a continuous type among all the target features, the second feature group determining submodule is configured to:

determining a third feature class to which the target feature belongs;

According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor, implements the steps of the grouping method of medical records provided by the first aspect of the embodiments of the present disclosure.

According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:

a memory having a computer program stored thereon;

a processor configured to execute the computer program in the memory to implement the steps of the medical record grouping method provided in the first aspect of the embodiments of the present disclosure.

By the technical scheme, all target features in the medical records to be grouped can be extracted; acquiring a total energy value of a first DRG group corresponding to the medical record to be grouped according to a pre-established characteristic probability network and all target characteristics, wherein the characteristic probability network is a network topology structure established by taking the DRG group and the characteristic group as nodes, taking an association relation between the DRG group and the characteristic group as an edge and taking a correlation probability value between the DRG group and the characteristic group as a weight of the edge, the first DRG group is any one of a plurality of DRG groups contained in a DRG library, the total energy value is the sum of the correlation probability values between the first DRG group and the plurality of characteristic groups to which all the target characteristics belong, and each characteristic group contains a plurality of characteristics meeting the same grouping condition; and determining the DRG group with the maximum total energy value as a target DRG group corresponding to the medical record to be grouped. The medical records can be identified and grouped through a network structure established according to the correlation between the characteristics and the DRG group, so that the step of manual grouping is avoided, and the efficiency and the accuracy of medical record grouping are improved.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:

FIG. 1 is a flow diagram illustrating a method of grouping medical records according to an exemplary embodiment;

FIG. 2 is a flow diagram illustrating another method of grouping medical records according to the embodiment shown in FIG. 1;

FIG. 3 is a flow chart illustrating a total energy value acquisition method according to the embodiment shown in FIG. 2;

FIG. 4 is a flow diagram illustrating a method of feature group partitioning according to the embodiment shown in FIG. 2;

FIG. 5 is a flow chart of another method of feature group partitioning according to the embodiment shown in FIG. 2;

FIG. 6 is a block diagram illustrating a grouping apparatus of medical records according to an exemplary embodiment;

FIG. 7 is a block diagram of another grouping apparatus for medical records according to the embodiment shown in FIG. 6;

FIG. 8 is a block diagram of an energy value acquisition module according to the embodiment shown in FIG. 7;

FIG. 9 is a block diagram of a feature grouping module shown in accordance with the embodiment shown in FIG. 7;

FIG. 10 is a block diagram of another feature grouping module shown in accordance with the embodiment shown in FIG. 7;

FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Before the medical record grouping method provided by the present disclosure is described, the target application scenario of the embodiments is first introduced. The target application scenario includes a DRG (Diagnosis Related Groups) system that stores a plurality of DRG groups, each DRG group corresponding to a plurality of grouped medical records.

Fig. 1 is a flowchart illustrating a grouping method of medical records according to an exemplary embodiment, as shown in fig. 1, applied to the DRG system described above, the method including:

Step 101, extracting all target features in the medical records to be grouped.

For example, the medical records to be grouped are medical records submitted to the DRG system by a hospital. The DRG system needs to group them so that the medical resources they consume can be determined from the DRG group to which each is assigned. The target features are the contents recorded under each column of the medical records to be grouped, where each column is a feature class. The feature classes may include "symptoms", "drug numbers", "treatment means", "whether or not to perform an operation", "hospitalization costs", "operation costs", "days of hospitalization", and so on. In other words, the target features are the information filled in under columns such as "symptom", "medication number" and "hospitalization cost" in each medical record to be grouped.

Step 102, acquiring the total energy value of the first DRG group corresponding to the medical records to be grouped according to the pre-established feature probability network and all the target features.

The feature probability network is a network topology structure established by taking DRG groups and feature groups as nodes, taking the association relation between the DRG groups and the feature groups as edges, and taking the correlation probability value between the DRG groups and the feature groups as the weight of the edges. The first DRG group is any one of a plurality of DRG groups contained in a DRG library. Each DRG group in the DRG library corresponds to a plurality of grouped medical records, each grouped medical record comprises a plurality of features, and the plurality of features correspond to a plurality of feature classes. It can be understood that, before step 102, the degree of association between the existing DRG groups in the DRG system and all the features in the grouped medical records needs to be determined. Then, in step 102, the degree of association (i.e., the total energy value) between the target features of the medical record to be grouped and each DRG group is determined according to the correspondence between those target features and the features already present in the DRG system; this degree of association is the basis for grouping the medical record in step 103 below. The total energy value is the sum of the correlation probability values between the first DRG group and the plurality of feature groups to which all the target features belong, and each feature group comprises a plurality of features meeting the same grouping condition. A "feature group" in effect represents one feature that appears in the existing grouped medical records, and the number of features in a feature group is the number of times that feature appears in those grouped medical records.
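One possible way to represent the feature probability network and compute a total energy value is sketched below in Python. The nested-dictionary layout, the DRG group names, the `(feature class, group number)` keys, the edge weights and the rule that a feature group with no edge contributes zero are all assumptions made for illustration.

```python
# Hypothetical feature probability network stored as nested dictionaries:
# DRG group -> {feature group: correlation probability value (edge weight)}.
# A feature group is identified here by (feature class, group number).
network = {
    "DRG-1": {("symptom", 1): 0.30, ("symptom", 2): 0.10,
              ("hospitalization cost", 3): 0.25},
    "DRG-2": {("symptom", 2): 0.35, ("hospitalization cost", 3): 0.05},
}


def total_energy(network, drg_group, matched_feature_groups):
    """Sum the edge weights between one DRG group and the feature groups
    to which the target features of a medical record belong."""
    edges = network[drg_group]
    # Feature groups with no edge to this DRG group contribute nothing
    # (an assumption; the text does not say how such cases are handled).
    return sum(edges.get(group, 0.0) for group in matched_feature_groups)


# Feature groups matched by the target features of one medical record.
matched = [("symptom", 2), ("hospitalization cost", 3)]
energies = {drg: total_energy(network, drg, matched) for drg in network}
```

In this made-up example the record scores about 0.35 against "DRG-1" and about 0.40 against "DRG-2".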

Step 103, determining the DRG group with the maximum total energy value as the target DRG group corresponding to the medical record to be grouped.

For example, after the total energy value of each DRG group in the DRG system has been obtained in step 102, the total energy values of all the DRG groups can be compared, and the target DRG group with the largest total energy value (i.e., the highest degree of association with the medical record to be grouped) is selected. The medical record to be grouped is then assigned to the target DRG group, which completes the grouping process.
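The selection in step 103 is an arg-max over the per-group total energy values. A minimal Python sketch, with a hypothetical function name and made-up energy values, might look like this; how ties are broken is not stated in the text, so the first-encountered rule below is an assumption.

```python
def select_target_drg(total_energies):
    """Return the DRG group whose total energy value is largest.

    `total_energies` maps each DRG group to its total energy value.
    """
    if not total_energies:
        raise ValueError("no DRG groups to compare")
    # max() returns the first key encountered among equal maxima, so ties
    # are broken by dictionary order (an assumption, not from the text).
    return max(total_energies, key=total_energies.get)


# Made-up total energy values for three DRG groups.
candidate_energies = {"DRG-1": 0.35, "DRG-2": 0.40, "DRG-3": 0.12}
target = select_target_drg(candidate_energies)
```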

In summary, the present disclosure can extract all target features in medical records to be grouped; acquiring a total energy value of a first DRG group corresponding to the medical record to be grouped according to a pre-established characteristic probability network and all target characteristics, wherein the characteristic probability network is a network topology structure established by taking the DRG group and the characteristic group as nodes, taking an association relation between the DRG group and the characteristic group as an edge and taking a correlation probability value between the DRG group and the characteristic group as a weight of the edge, the first DRG group is any one of a plurality of DRG groups contained in a DRG library, the total energy value is the sum of the correlation probability values between the first DRG group and the plurality of characteristic groups to which all the target characteristics belong, and each characteristic group contains a plurality of characteristics meeting the same grouping condition; and determining the DRG group with the maximum total energy value as a target DRG group corresponding to the medical record to be grouped. The medical records can be identified and grouped through a network structure established according to the correlation between the characteristics and the DRG group, so that the step of manual grouping is avoided, and the efficiency and the accuracy of medical record grouping are improved.

Fig. 2 is a flowchart illustrating another medical record grouping method according to the embodiment shown in fig. 1, and as shown in fig. 2, before step 101, the method may further include:

Step 104, for the first feature class, acquiring all sample features belonging to the first feature class in the plurality of grouped medical records corresponding to the second DRG group.

Wherein, the second DRG group is any DRG group in the DRG library, and the first feature class is any one of the feature classes. It should be noted that the steps 104 to 106 are only described as an example of the process of counting the number of times and calculating the probability of all sample features in one feature class in one DRG group. In an actual DRG system scenario, it is necessary to perform frequency statistics and probability calculation on a large number of features under multiple feature classes in multiple DRG groups, where the frequency statistics and probability calculation process of the features under each feature class in each DRG group is the same as the frequency statistics and probability calculation process described in steps 104 to 106.

For example, all of the sample features are the contents recorded in each column of the plurality of grouped medical records, where each column corresponds to a feature class. The grouped medical records (and the medical records to be grouped) are set to use the same tabular format, that is, the feature classes contained in every medical record are exactly the same. For example, if the first feature class (i.e., one column of the grouped medical records corresponding to any DRG group) is "symptom", then every grouped medical record includes the feature class "symptom". The sample features belonging to the first feature class may include hoarseness, hyperplasia, discomfort, pain, redness and swelling, and so on. It should be noted that the features in the embodiments of the present disclosure fall into two types according to their feature values: discrete and continuous. Discrete feature values are character texts; for example, "hoarseness", "hyperplasia", "discomfort", "pain" and "redness" above are discrete features. The feature values under the feature class "hospitalization cost" are continuous numerical values. All the features in one feature class have the same type of feature value. The distinction between discrete and continuous features does not depend only on whether the feature text consists of characters or digits. For example, the feature "20140910348940" under the feature class "drug number" consists of digits but is not continuous, so it is a feature with a discrete feature value.

Step 105, dividing all the sample features into a plurality of feature groups according to the grouping condition corresponding to the first feature class, so that the number of the sample features in the plurality of feature groups is in a preset distribution state.

For example, the grouping condition corresponding to the first feature class is determined by the feature value type of the sample features under the first feature class. When the feature value type is discrete, the grouping condition is that sample features in the first feature class whose feature values (in practice, character texts) are exactly identical are divided into one feature group, and the number of features in the feature group is then counted as the number of times that sample feature appears in the plurality of grouped medical records. For example, the first feature class is "symptom", and all sample features under the first feature class are divided into 5 feature groups: A (hoarseness), B (hyperplasia), C (discomfort), D (pain), and E (redness). Feature group A includes 50 features of "hoarseness", that is, the feature "hoarseness" appears 50 times in the plurality of grouped medical records.
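The discrete grouping condition above amounts to counting identical feature values. The following is a minimal sketch of that idea, not the patented implementation; the function name `group_discrete` and the sample counts other than the 50 occurrences of "hoarseness" are assumptions made for illustration.

```python
from collections import Counter

def group_discrete(sample_features):
    """Group identical feature values; each count is the size of one feature group."""
    return Counter(sample_features)

# Hypothetical sample features of the feature class "symptom" for one DRG group.
symptoms = ["hoarseness"] * 50 + ["hyperplasia"] * 30 + ["pain"] * 20
groups = group_discrete(symptoms)
print(groups["hoarseness"])  # number of times "hoarseness" appears
```

Each key of `groups` plays the role of one feature group, and its value is the number of occurrences of that feature in the grouped medical records.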

Based on this, it can be understood that when the feature value type of the sample features under a feature class is discrete, only the number of occurrences of each feature (i.e., each character text) under that feature class actually needs to be counted. The separate terms "feature group", "feature" and "feature value" are used here only to stay consistent with the counting step for feature classes whose feature value type is continuous, which is described below. For example, for a feature with the discrete feature value "hoarseness" under the feature class "symptom", the "feature group", the "feature" and the "feature value" all have the same content, namely "hoarseness"; the difference is that "feature group" emphasizes the number of occurrences of the feature "hoarseness", while "feature value" emphasizes the feature value type of the feature "hoarseness" (i.e., discrete).

For example, unlike features having discrete feature values, features having continuous feature values cannot be grouped by matching character texts, and therefore need to be divided into a plurality of feature groups according to the value range in which each feature value lies. When the feature value type of the sample features under the feature class is continuous, the grouping condition is to obtain the maximum feature value and the minimum feature value of the sample features under the first feature class, equally divide the range between the minimum feature value and the maximum feature value into a preset number of value intervals, take each value interval as a feature group, and assign the feature values falling within a value interval to the corresponding feature group. For example, for the feature class "hospitalization cost", the maximum feature value of the sample features is "20000" and the minimum feature value is "5000". In this case, 3 value intervals [5000, 10000], [10001, 15000] and [15001, 20000] may be equally divided between the feature value "5000" and the feature value "20000", corresponding in sequence to feature group X, feature group Y, and feature group Z. Thus, the features having feature values "5040" and "6000" are classified into feature group X, the features having feature values "12000" and "13987" are classified into feature group Y, and the features having feature values "18234" and "17540" are classified into feature group Z.

As can be seen from this, for a feature having the continuous feature value "6000" in the feature class "hospitalization cost", the "feature group" is the value interval in which the feature "6000" lies, and the "feature value" not only indicates the feature value type (i.e., continuous) of the feature "6000", but also serves as the basis for determining the "feature group" to which the feature "6000" belongs.

Step 106, acquiring a correlation probability value between the second DRG group and each feature group in the plurality of feature groups according to the number of the sample features in each feature group in the plurality of feature groups and a probability density function corresponding to the preset distribution state.

Illustratively, the preset distribution state is a normal distribution, and the step 106 includes: taking the serial number of each feature group as an input variable of a normal distribution probability density function to acquire a correlation probability value between the second DRG group and each feature group in the plurality of feature groups; wherein the normally distributed probability density function (1) includes:
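The image of function (1) is not reproduced in this text. For reference only, the standard normal distribution probability density function has the form below. Reading $x$ as the number of a feature group, and $\mu$ and $\sigma$ as the mean and standard deviation of the numbered distribution of sample features, is an assumption made here and may differ from the notation of the original formula.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
```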

For example, after steps 104 to 106 are performed, the correlation probability values between one DRG group and the plurality of feature groups that correspond to the DRG group and belong to the same feature class can be obtained. One feature class is one calculation unit. Take n feature groups corresponding to one DRG group as an example, where the n feature groups belong to 3 feature classes: feature class A corresponds to a feature groups, feature class B corresponds to b feature groups, and feature class C corresponds to c feature groups, with n = a + b + c. It can be understood that, when calculating the correlation probability values between the DRG group and the feature groups, the probability calculation of step 106 needs to be performed with the a feature groups of feature class A (or the b feature groups of feature class B, or the c feature groups of feature class C) as one calculation unit, to obtain the correlation probability value between the DRG group and each of the a feature groups (or the b or c feature groups). To obtain the correlation probability value between the DRG group and each of the n feature groups, steps 104 to 106 need to be repeated three times. After the calculation of the correlation probability values between one DRG group and its related feature groups (for example, the above n feature groups) is completed, the correlation probability values between all DRG groups in the DRG library (i.e., the above plurality of DRG groups) and their related feature groups (i.e., all feature groups corresponding to the plurality of DRG groups) may be calculated by the same steps, as the basis for constructing the feature probability network in step 107 below.

In addition, after the above calculation step is completed, the concept of feature class may be discarded, each feature group is directly used as an independent unit, the association relationship and the corresponding weight between one feature group and one DRG group are simply obtained, and the step of establishing the feature probability network described in the following step 107 is further performed. Similarly, when the total energy value of each DRG group is obtained according to all target features in the medical record to be grouped in the step 102, the feature group to which each target feature belongs needs to be located according to the feature class to which each target feature belongs, and then the concept of the feature class is discarded as well, the correlation probability value between each feature group (the feature group includes any target feature) and each DRG group is determined, and the correlation probability values are summed to obtain the total energy value.

Step 107, after obtaining the correlation probability values between the plurality of DRG groups and all feature groups corresponding to the plurality of DRG groups, establishing the feature probability network by using the plurality of DRG groups and all feature groups corresponding to the plurality of DRG groups as nodes, using the correlation relationships between the plurality of DRG groups and all feature groups corresponding to the plurality of DRG groups as edges, and using the correlation probability values between the plurality of DRG groups and all feature groups corresponding to the plurality of DRG groups as weights of the edges.

Illustratively, the feature probability network may be denoted as $G(V, E, W)$. In the feature probability network $G(V, E, W)$, $V$ can be expressed as $V = V_{DRG} \cup V_f$, where $V_{DRG}$ represents the set of the plurality of DRG group nodes and $V_f$ represents the set of all feature group nodes; $E$ can be expressed as $E = \{e_{ij} \mid i \in V_f, j \in V_{DRG}\}$, where $i$ is any feature group node in $V_f$, $j$ is any DRG group node in $V_{DRG}$, and $e_{ij}$ is the edge connecting the feature group node $i$ and the DRG group node $j$ (namely, the association relation between the two); $W$ can be expressed as $W = \{\rho_{j,p}(i) \mid i \in V_f, j \in V_{DRG}\}$, where $\rho_{j,p}(i)$ represents the correlation probability value between the feature group node $i$ and the DRG group node $j$ under the feature class $p$.
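One possible in-memory representation of such a weighted bipartite network is a nested dictionary keyed by DRG group. The sketch below is an assumption for illustration: the function `build_network`, the DRG group names, and the probability values are all hypothetical and do not come from the disclosure.

```python
from collections import defaultdict

def build_network(correlations):
    """Build edge weights from (drg_group, feature_class, feature_group, probability) tuples."""
    # weights[drg][(feature_class, feature_group)] = correlation probability value
    weights = defaultdict(dict)
    for drg, feature_class, feature_group, rho in correlations:
        weights[drg][(feature_class, feature_group)] = rho
    return dict(weights)

# Hypothetical correlation probability values, for illustration only.
network = build_network([
    ("DRG-1", "symptom", "hoarseness", 0.30),
    ("DRG-1", "hospitalization cost", (5000, 10000), 0.20),
    ("DRG-2", "symptom", "pain", 0.25),
])
```

Keeping the feature class in the key mirrors the index $p$ in $\rho_{j,p}(i)$; an edge exists exactly where a key is present.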

Fig. 3 is a flowchart illustrating a method for obtaining a total energy value according to the embodiment shown in fig. 2, and as shown in fig. 3, the step 102 may include:

Step 1021, determining all first feature groups having an association relation with the first DRG group according to the feature probability network.

Step 1022, determining a plurality of second feature groups matching all the target features from all the first feature groups.

For example, for a target feature whose feature value type is discrete among all the target features, step 1022 may include: step 10221, determining a second feature class to which the target feature belongs; step 10222, determining a plurality of third feature groups corresponding to the second feature class among all the first feature groups; step 10223, determining, among the plurality of third feature groups, the feature group containing a feature that has the same feature value as the target feature, as the second feature group matched with the target feature; step 10224, determining the second feature groups matched with each of the target features as the plurality of second feature groups. For example, for a target feature with the feature value "red and swollen", the second feature class "symptom" to which the target feature "red and swollen" belongs is determined first; then, a plurality of third feature groups corresponding to the second feature class are determined among all the first feature groups (i.e., all the feature groups in the feature probability network associated with the first DRG group), where the feature class of each third feature group is "symptom"; then, among the plurality of third feature groups of the feature class "symptom", the feature group containing features with the feature value "red and swollen" is determined as the second feature group; finally, steps 10221 to 10223 are repeated to determine the feature groups matched with the other target features, which together form the plurality of second feature groups.

Alternatively, for a target feature whose feature value type is continuous among all the target features, step 1022 may include: step 10225, determining a third feature class to which the target feature belongs; step 10226, determining a plurality of fourth feature groups corresponding to the third feature class among all the first feature groups, where the plurality of fourth feature groups are a plurality of value intervals equally divided according to the maximum feature value and the minimum feature value of all the features corresponding to the third feature class; step 10227, determining, among the plurality of value intervals, a first value interval in which the target feature lies; step 10228, determining the feature group corresponding to the first value interval among the fourth feature groups as the second feature group matched with the target feature; step 10229, determining the second feature groups matched with each of the target features as the plurality of second feature groups. For example, for a target feature with the feature value "6000", the third feature class "hospitalization cost" to which the target feature "6000" belongs is determined first; then, a plurality of fourth feature groups corresponding to the third feature class are determined among all the first feature groups (i.e., all the feature groups in the feature probability network associated with the first DRG group), where the plurality of fourth feature groups correspond to a plurality of value intervals of the third feature class "hospitalization cost"; then, the value interval in which the feature value "6000" lies is determined among the plurality of value intervals, and the feature group corresponding to that value interval is taken as the second feature group; finally, steps 10225 to 10228 are repeated to determine the feature groups matched with the other target features, which together form the plurality of second feature groups.
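The two matching procedures above can be sketched in one helper. This is an illustrative assumption rather than the disclosed implementation: `match_feature_group` is a hypothetical name, discrete feature groups are represented as strings, and continuous feature groups as `(low, high)` tuples taken from the hospitalization cost example.

```python
def match_feature_group(value, groups):
    """Return the feature group matching one target feature, or None if no group matches.

    groups: the feature groups of the target feature's class; each is either a
    string (discrete feature value) or a (low, high) tuple (value interval).
    """
    for group in groups:
        if isinstance(group, tuple):
            low, high = group
            # continuous case: the target value must lie inside the interval
            if isinstance(value, (int, float)) and low <= value <= high:
                return group
        elif group == value:
            # discrete case: the feature values must be exactly identical
            return group
    return None

cost_groups = [(5000, 10000), (10001, 15000), (15001, 20000)]
symptom_groups = ["hoarseness", "pain", "red and swollen"]
print(match_feature_group(6000, cost_groups))
print(match_feature_group("red and swollen", symptom_groups))
```

A target feature that matches no group returns `None`, so it contributes nothing when the correlation probability values are later summed.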

Step 1023, determining, according to the feature probability network, the sum of the plurality of correlation probability values between the first DRG group and the plurality of second feature groups, and taking the sum as the total energy value of the first DRG group corresponding to the medical records to be grouped.
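A minimal sketch of step 1023, followed by the selection of the DRG group with the largest total energy value, might look as follows. The function names, the flat string keys used for feature groups, and all probability values are assumptions chosen for illustration.

```python
def total_energy(drg_weights, matched_groups):
    """Sum the edge weights between one DRG group and the matched feature groups."""
    # feature groups with no edge to this DRG group contribute 0
    return sum(drg_weights.get(group, 0.0) for group in matched_groups)

def best_drg(network, matched_groups):
    """Return the DRG group with the largest total energy value."""
    return max(network, key=lambda drg: total_energy(network[drg], matched_groups))

# Hypothetical edge weights (correlation probability values).
network = {
    "DRG-1": {"hoarseness": 0.30, "cost:5000-10000": 0.20},
    "DRG-2": {"hoarseness": 0.10, "pain": 0.25},
}
matched = ["hoarseness", "cost:5000-10000"]
print(best_drg(network, matched))
```

With these made-up weights, "DRG-1" accumulates 0.30 + 0.20 while "DRG-2" accumulates only 0.10, so "DRG-1" is selected.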

For example, the initialization energy value formula (2) of the DRG group and the feature group can be expressed as:

wherein $V_f$ is the set of all feature group nodes in the DRG system, and $V'_f$ is the set of feature groups corresponding to all target features in the medical records to be grouped. The meaning of formula (2) is: for the feature group nodes contained in the intersection of the set of all feature group nodes and the set of feature groups corresponding to all target features in the medical records to be grouped, the initialization energy value is set to 1; for a feature group node whose feature group exists in only one of the grouped medical records and the medical records to be grouped, the initialization energy value is set to 0. The practical effect is that feature groups existing in only one of the grouped medical records and the medical records to be grouped are ignored when calculating the total energy value.
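The image of formula (2) is not reproduced in this text. Based solely on the verbal description above, it can plausibly be reconstructed as the indicator below, where $Q_i$ denotes the initialization energy value of feature group node $i$; the exact typography of the original may differ.

```latex
Q_i =
\begin{cases}
1, & i \in V_f \cap V'_f \\
0, & \text{otherwise}
\end{cases}
```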

Illustratively, based on the initial energy value formula (2), formula (3) for calculating the total energy value may be expressed as:

wherein Q is_jRepresents the total energy value, Q, of the jth DRG group_iIndicating the initialization energy value, W, of the ith feature group_ijThe weight of the edge (correlation probability value) between the ith feature group and the jth DRG group.

Fig. 4 is a flowchart of a feature group division method according to the embodiment shown in fig. 2, where, as shown in fig. 4, when the feature value types of all sample features corresponding to the first feature class are discrete types, the step 105 may include:

Step 1051, dividing the sample features having the same feature value among all the sample features into one feature group, to obtain the plurality of feature groups.

Step 1052, arranging the plurality of feature groups so that the number of the sample features in the plurality of feature groups is in the preset distribution state.

For example, the preset distribution state is a normal distribution. Under a feature class containing discrete feature values, the numbers of sample features in arbitrarily ordered feature groups are in most cases not normally distributed. Therefore, in step 1052, the feature groups are arranged so that the numbers of sample features follow a normal distribution. The specific arrangement may be as follows: the feature group having the largest number of sample features among the plurality of feature groups is placed at the center, and the other feature groups are placed to its left and right in descending order of their numbers of sample features, so that the numbers of sample features are normally distributed.

Step 1053, assigning numbers to the plurality of feature groups according to the order of arrangement.

Illustratively, the number is used as an input variable of the probability density function, and actually represents the position of a feature group in the overall normal distribution.
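Steps 1052 and 1053 can be sketched as below. The text does not say how the remaining feature groups are split between the left and right sides, so strictly alternating sides is an assumption of this sketch, as are the function name `arrange_and_number` and all counts other than the 50 occurrences of "hoarseness".

```python
from collections import deque

def arrange_and_number(counts):
    """Arrange feature groups in a bell shape and number them from left to right.

    counts: dict mapping feature group -> number of sample features.
    Returns a dict mapping feature group -> 1-based number.
    """
    ordered = sorted(counts, key=counts.get, reverse=True)
    row = deque()
    for rank, group in enumerate(ordered):
        # rank 0 becomes the centre; later groups alternate left and right
        if rank % 2 == 0:
            row.append(group)
        else:
            row.appendleft(group)
    return {group: number for number, group in enumerate(row, start=1)}

numbers = arrange_and_number(
    {"hoarseness": 50, "hyperplasia": 30, "discomfort": 20, "pain": 10, "redness": 5}
)
print(numbers)
```

Read from left to right, the counts in this example are 10, 30, 50, 20, 5, so the largest feature group sits at the middle number 3.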

Fig. 5 is a flowchart of another feature group division method according to the embodiment shown in fig. 2, and as shown in fig. 5, when the feature value types of all sample features corresponding to the first feature class are continuous types, the step 105 may include:

Step 1054, obtaining the sample feature with the maximum feature value and the sample feature with the minimum feature value among all the sample features.

Step 1055, equally dividing the range between the minimum feature value and the maximum feature value into a plurality of value intervals.

Illustratively, this step 1055 may include: calculating the group distance of the value interval by the following group distance calculation formula (4):

wherein $g_{dis}$ represents the group distance, $V_{max}$ is the maximum feature value, $V_{min}$ is the minimum feature value, and $g_{num}$ is the number of value intervals, preferably 30. Thereafter, the value intervals $[V_{min} + i \cdot g_{dis},\ V_{min} + (i+1) \cdot g_{dis}]$ can be obtained, where $i \in [0, g_{num}]$, i.e., $i$ is the serial number of any value interval from 0 to $g_{num}$.
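The image of formula (4) is not reproduced in this text. Given that the value intervals are equal in width and span the range from $V_{min}$ to $V_{max}$, the group distance can plausibly be reconstructed as follows; this is an inference from the symbol definitions above, not a copy of the original formula.

```latex
g_{dis} = \frac{V_{max} - V_{min}}{g_{num}}
```

For the hospitalization cost example, with $V_{max} = 20000$, $V_{min} = 5000$ and $g_{num} = 3$, this gives $g_{dis} = (20000 - 5000) / 3 = 5000$, which matches the three value intervals of width 5000 described earlier.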

Step 1056, dividing the sample features lying in the same value interval among all the sample features into the same feature group, to obtain the plurality of feature groups.

Step 1057, assigning numbers to the plurality of feature groups according to the sizes of the endpoint values of the value intervals corresponding to the plurality of feature groups.

For example, unlike sample features having discrete feature values, when the plurality of feature groups are arranged according to the sizes of the endpoint values of their value intervals, the distribution of the numbers of sample features in the plurality of feature groups already approximates a normal distribution, and no rearrangement is needed. Therefore, numbers are assigned directly to the feature groups in this order, and the numbers serve the same purpose as the numbers in step 1053.
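Steps 1054 to 1057 can be sketched as an equal-width binning routine. This is a hedged illustration: the name `bin_continuous` is hypothetical, exactly `g_num` intervals are produced, and the maximum feature value is placed in the last interval, all of which are assumptions of this sketch. The sample values come from the hospitalization cost example, with the two endpoints added.

```python
def bin_continuous(values, g_num=30):
    """Split values into g_num equal-width intervals; return (intervals, counts)."""
    v_min, v_max = min(values), max(values)
    g_dis = (v_max - v_min) / g_num  # group distance
    intervals = [(v_min + i * g_dis, v_min + (i + 1) * g_dis) for i in range(g_num)]
    counts = [0] * g_num
    for v in values:
        # the maximum value lands on the last right endpoint, so clamp the index
        index = min(int((v - v_min) / g_dis), g_num - 1) if g_dis else 0
        counts[index] += 1
    return intervals, counts

costs = [5000, 5040, 6000, 12000, 13987, 17540, 18234, 20000]
intervals, counts = bin_continuous(costs, g_num=3)
print(intervals)
print(counts)
```

The position of an interval in `intervals` (plus one, if numbering starts at 1) can serve as the number assigned to the corresponding feature group, since the list is already ordered by endpoint value.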

In summary, the present disclosure can extract all target features in medical records to be grouped; acquiring a total energy value of a first DRG group corresponding to the medical record to be grouped according to a pre-established characteristic probability network and all target characteristics, wherein the characteristic probability network is a network topology structure established by taking the DRG group and the characteristic group as nodes, taking an association relation between the DRG group and the characteristic group as an edge and taking a correlation probability value between the DRG group and the characteristic group as a weight of the edge, the first DRG group is any one of a plurality of DRG groups contained in a DRG library, the total energy value is the sum of the correlation probability values between the first DRG group and the plurality of characteristic groups to which all the target characteristics belong, and each characteristic group contains a plurality of characteristics meeting the same grouping condition; and determining the DRG group with the maximum total energy value as a target DRG group corresponding to the medical record to be grouped. The medical records can be identified and grouped through a network structure established according to the correlation between the discrete type and continuous type characteristics contained in the medical records and the DRG group, the step of manual grouping is avoided, meanwhile, the application range of self-adaptive medical record grouping is expanded, and further the medical record grouping efficiency and accuracy are improved.

Fig. 6 is a block diagram illustrating a grouping apparatus for medical records according to an exemplary embodiment, as shown in fig. 6, applied to the DRG system described above, the apparatus 600 includes:

the feature extraction module 610 is configured to extract all target features in medical records to be grouped;

an energy value obtaining module 620, configured to obtain a total energy value of a first DRG group corresponding to the medical record to be grouped according to a pre-established feature probability network and all target features, where the feature probability network is a network topology structure established by taking the DRG group and the feature group as nodes, taking an association relationship between the DRG group and the feature group as an edge, and taking a correlation probability value between the DRG group and the feature group as a weight of the edge, the first DRG group is any one of a plurality of DRG groups included in a DRG library, the total energy value is a sum of a plurality of correlation probability values between the first DRG group and a plurality of feature groups to which all the target features belong, and each feature group includes a plurality of features meeting a same grouping condition;

and a medical record grouping module 630, configured to determine that the DRG group with the largest total energy value is the target DRG group corresponding to the medical record to be grouped.

Fig. 7 is a block diagram of another medical record grouping apparatus according to the embodiment shown in fig. 6, and as shown in fig. 7, each of the DRG groups in the DRG repository corresponds to a plurality of grouped medical records, each of the grouped medical records includes a plurality of features, and the plurality of features correspond to a plurality of feature classes, the apparatus 600 further includes:

a sample obtaining module 640, configured to obtain, for a first feature class, all sample features belonging to the first feature class in a plurality of grouped medical records corresponding to a second DRG group, where the second DRG group is any DRG group in the DRG library, and the first feature class is any feature class in the plurality of feature classes;

a feature grouping module 650, configured to divide all the sample features into a plurality of feature groups according to a grouping condition corresponding to the first feature class, so that the number of the sample features in the plurality of feature groups is in a preset distribution state;

a correlation determining module 660, configured to obtain a correlation probability value between the second DRG group and each feature group in the plurality of feature groups according to the number of sample features in each feature group in the plurality of feature groups and a probability density function corresponding to the preset distribution state;

a network establishing module 670, configured to, after obtaining the correlation probability values between the plurality of DRG groups and all feature groups corresponding to the plurality of DRG groups, establish the feature probability network by using the plurality of DRG groups and all feature groups corresponding to the plurality of DRG groups as nodes, using the correlation relationships between the plurality of DRG groups and all feature groups corresponding to the plurality of DRG groups as edges, and using the correlation probability values between the plurality of DRG groups and all feature groups corresponding to the plurality of DRG groups as weights of the edges.

Fig. 8 is a block diagram illustrating an energy value obtaining module according to the embodiment shown in fig. 7, wherein the energy value obtaining module 620, as shown in fig. 8, includes:

a first feature group determining submodule 621, configured to determine, according to the feature probability network, all first feature groups having an association relationship with the first DRG group;

a second feature group determination sub-module 622 for determining a plurality of second feature groups matching all the target features from all the first feature groups;

and an energy value calculation submodule 623, configured to determine, according to the feature probability network, the sum of the plurality of correlation probability values between the first DRG group and the plurality of second feature groups, as the total energy value of the first DRG group corresponding to the medical record to be grouped.

Fig. 9 is a block diagram of a feature grouping module according to the embodiment shown in fig. 7, where, as shown in fig. 9, the feature value types of the features are discrete types or continuous types, and all the features corresponding to each feature class have the same feature value type, the feature grouping module 650 includes:

a first feature group obtaining sub-module 651, configured to divide sample features having the same feature value among all the sample features into a feature group, so as to obtain the plurality of feature groups;

a feature group arrangement submodule 652 configured to arrange the plurality of feature groups so that the number of sample features in the plurality of feature groups is in the preset distribution state;

the first feature group numbering sub-module 653 is configured to assign numbers to the plurality of feature groups according to the order of arrangement.

Fig. 10 is a block diagram illustrating another feature grouping module according to the embodiment shown in fig. 7. As shown in fig. 10, the feature grouping module 650 includes:

the feature value obtaining submodule 654 is configured to obtain a sample feature having a maximum feature value and a sample feature having a minimum feature value from among all the sample features;

an interval division submodule 655, configured to equally divide a plurality of value intervals between the maximum eigenvalue and the minimum eigenvalue;

a second feature group obtaining sub-module 656, configured to divide sample features in the same value interval of all the sample features into the same feature group, so as to obtain the multiple feature groups;

the second feature group numbering sub-module 657 is configured to assign numbers to the plurality of feature groups according to the sizes of the endpoint values of the value intervals corresponding to the plurality of feature groups.

Optionally, the preset distribution state is a normal distribution, and the correlation determining module 660 is configured to:

taking the serial number of each feature group as an input variable of a normal distribution probability density function to acquire a correlation probability value between the second DRG group and each feature group in the plurality of feature groups; wherein the normally distributed probability density function includes:

Optionally, for a target feature whose feature value type is discrete among all the target features, the second feature group determining sub-module 622 is configured to:

determining a second feature class to which the target feature belongs;

determining a plurality of third feature groups corresponding to the second feature class in all the first feature groups;

determining, among the plurality of third feature groups, the feature group containing a feature that has the same feature value as the target feature, as the second feature group matched with the target feature;

a second feature group matching each of the target features is determined as the plurality of second feature groups.

Optionally, for the target feature with the continuous type of feature value type in all the target features, the second feature group determining sub-module 622 is configured to:

determining a third feature class to which the target feature belongs;

determining a plurality of fourth feature groups corresponding to the third feature class among all the first feature groups, wherein the plurality of fourth feature groups are a plurality of value intervals equally divided according to the maximum feature value and the minimum feature value of all the features corresponding to the third feature class;

determining a feature group corresponding to the first value range from the fourth feature groups as the second feature group matched with the target feature;

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 11 is a block diagram illustrating an electronic device 1100 in accordance with an example embodiment. As shown in fig. 11, the electronic device 1100 may include: a processor 1101, a memory 1102, multimedia components 1103, input/output (I/O) interfaces 1104, and communication components 1105.

The processor 1101 is configured to control the overall operation of the electronic device 1100, so as to complete all or part of the steps in the medical record grouping method. The memory 1102 is used to store various types of data to support operation of the electronic device 1100, such as instructions for any application or method operating on the electronic device 1100, as well as application-related data, such as contact data, messages, pictures, audio, video, and so forth. The memory 1102 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia components 1103 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 1102 or transmitted through the communication component 1105. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 1104 provides an interface between the processor 1101 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 1105 is used for wired or wireless communication between the electronic device 1100 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them, so the corresponding communication component 1105 may include a Wi-Fi module, a Bluetooth module, and an NFC module.

In an exemplary embodiment, the electronic Device 1100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the grouping method of medical records described above.

In another exemplary embodiment, a computer readable storage medium, such as the memory 1102, is also provided that includes program instructions executable by the processor 1101 of the electronic device 1100 to perform the medical record grouping method described above.

Preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and other embodiments of the present disclosure may be easily conceived by those skilled in the art within the technical spirit of the present disclosure after considering the description and practicing the present disclosure, and all fall within the protection scope of the present disclosure.

It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the present disclosure. Likewise, the different embodiments of the present disclosure may be combined with one another in any manner, and such combinations should also be regarded as content disclosed herein, provided that they do not depart from the idea of the present disclosure. The present disclosure is not limited to the precise structures described above, and the scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for grouping medical records, the method comprising:

extracting all target features in medical records to be grouped;

determining the DRG group with the maximum total energy value as a target DRG group corresponding to the medical record to be grouped;

each DRG group in the DRG library corresponds to a plurality of grouped medical records, each grouped medical record comprises a plurality of features, the features correspond to a plurality of feature classes, and before all target features in the medical records to be grouped are extracted, the method further comprises the following steps:

2. The method of claim 1, wherein the obtaining the total energy value corresponding to the medical records to be grouped in the first DRG group according to the pre-established feature probability network and all the target features comprises:

3. The method according to claim 1, wherein the feature value types of the features are discrete types or continuous types, all the features corresponding to each feature class have the same feature value type, and when the feature value types of all the sample features corresponding to the first feature class are discrete types, the dividing all the sample features into a plurality of feature groups according to the grouping condition corresponding to the first feature class so that the number of the sample features in the plurality of feature groups is in a preset distribution state comprises:

4. The method according to claim 3, wherein when the feature value types of all the sample features corresponding to the first feature class are continuous, the dividing all the sample features into a plurality of feature groups according to the grouping condition corresponding to the first feature class so that the number of the sample features in the plurality of feature groups is in a preset distribution state comprises:

5. The method of claim 4, wherein the preset distribution state is a normal distribution, and the obtaining the correlation probability value between the second DRG group and each feature group corresponding to the second DRG group according to the number of target features in each feature group in the plurality of feature groups and the probability density function corresponding to the preset distribution state comprises:

6. The method according to claim 2, wherein the determining, for the target feature of which the feature value type is a discrete type among the all target features, a plurality of second feature groups that match the all target features among the all first feature groups comprises:

determining a second feature class to which the target feature belongs;

7. The method according to claim 2, wherein the determining, for the target features of which the feature value types are continuous among the all target features, a plurality of second feature groups that match the all target features among the all first feature groups comprises:

determining a third feature class to which the target feature belongs;

8. An apparatus for grouping medical records, the apparatus comprising:

a medical record grouping module, configured to determine that the DRG group with the largest total energy value is the target DRG group corresponding to the medical record to be grouped;

each of the DRG groups in the DRG library corresponds to a plurality of grouped medical records, each of the grouped medical records includes a plurality of features, the plurality of features correspond to a plurality of feature classes, the apparatus further comprises:

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.

10. An electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 7.