WO2021139116A1

WO2021139116A1 - Method, apparatus and device for intelligently grouping similar patients, and storage medium

Info

Publication number: WO2021139116A1
Application number: PCT/CN2020/099566
Authority: WO
Inventors: Liao Xiyang (廖希洋); Ma Kaining (马凯宁); Ou Qiuyu (欧秋雨)
Original assignee: Ping An Technology (Shenzhen) Co., Ltd. (平安科技（深圳）有限公司)
Priority date: 2020-05-14
Filing date: 2020-06-30
Publication date: 2021-07-15
Also published as: CN111739634A; WO2021139116A9

Abstract

Provided are a method, apparatus and device for intelligently grouping similar patients, and a storage medium. The method comprises: acquiring new patient data to be matched, wherein the new patient data contains multiple pieces of disease feature data; performing vectorization processing on the disease feature data to obtain a disease feature word vector corresponding to the new patient; calculating the Mahalanobis distance between the new patient data and each piece of historical patient data in a preset disease feature database; sorting the Mahalanobis distances to obtain a sorting result; and determining, based on the sorting result, the disease information group matching the new patient data, wherein the disease information groups each contain different clinical outcome information. The method makes the greatest possible use of historical patient data, quickly determines the disease information group to which the new patient belongs according to the Mahalanobis distance, and assists the doctor in making decisions according to the features of that group, thereby improving the accuracy of the doctor's medical decisions.

Description

Method, device, equipment and storage medium for intelligent grouping of similar patients

This application claims priority to Chinese patent application No. 202010405737.7, filed with the Chinese Patent Office on May 14, 2020 and entitled "Similar patient intelligent clustering method, device, equipment and storage medium", the entire contents of which are incorporated herein by reference.

Technical field

This application relates to the field of database technology, and in particular to a method, apparatus, device and storage medium for intelligently grouping similar patients.

Background

With the development of technology, artificial intelligence is becoming increasingly common. In the medical field, when making medical decisions, doctors usually compare the characteristics and treatment courses of previously treated patients with the actual condition of the patient currently being treated, and combine the two to make a more appropriate decision. However, when making decisions for new patients, doctors do not fully utilize the data of existing patients.

The inventor realized that most approaches to making medical decisions for new patients from sample (historical patient) data rely on continuous data, such as test indicators and age, to obtain subgroups with large differences in clinical outcomes. Such approaches cannot make full use of the information that doctors consider when making decisions, and cannot support accurate medical decisions quickly.

Summary of the invention

The main purpose of this application is to solve the technical problem of how to intelligently group similar patients.

To achieve the above objective, a first aspect of the present application provides a method for intelligently grouping similar patients, including: acquiring new patient data to be matched, the new patient data including multiple pieces of disease feature data; performing vectorization processing on each piece of disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient; calculating, based on the disease feature word vector, the Mahalanobis distance between the new patient data and each piece of historical patient data in a preset disease feature database, wherein the disease feature database contains multiple disease information groups and similar disease features belong to the same disease information group; sorting the Mahalanobis distances to obtain a sorting result; and determining, based on the sorting result, the disease information group matching the new patient data, wherein the disease information groups each contain different clinical outcome information.

A second aspect of the present application provides a device for intelligently grouping similar patients, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions: acquiring new patient data to be matched, the new patient data containing multiple pieces of disease feature data; performing vectorization processing on each piece of disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient; calculating, based on the disease feature word vector, the Mahalanobis distance between the new patient data and each piece of historical patient data in a preset disease feature database, wherein the disease feature database contains multiple disease information groups and similar disease features belong to the same disease information group; sorting the Mahalanobis distances to obtain a sorting result; and determining, based on the sorting result, the disease information group matching the new patient data, wherein the disease information groups each contain different clinical outcome information.

A third aspect of the present application provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the following steps: acquiring new patient data to be matched, the new patient data containing multiple pieces of disease feature data; performing vectorization processing on each piece of disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient; calculating, based on the disease feature word vector, the Mahalanobis distance between the new patient data and each piece of historical patient data in a preset disease feature database, wherein the disease feature database contains multiple disease information groups and similar disease features belong to the same disease information group; sorting the Mahalanobis distances to obtain a sorting result; and determining, based on the sorting result, the disease information group matching the new patient data, wherein the disease information groups each contain different clinical outcome information.

A fourth aspect of the present application provides an apparatus for intelligently grouping similar patients, including: a first acquisition module, configured to acquire new patient data to be matched, the new patient data including multiple pieces of disease feature data; a first processing module, configured to perform vectorization processing on each piece of disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient; a first calculation module, configured to calculate, based on the disease feature word vector, the Mahalanobis distance between the new patient data and each piece of historical patient data in a preset disease feature database, wherein the disease feature database contains multiple disease information groups and similar disease features belong to the same disease information group; a sorting module, configured to sort the Mahalanobis distances to obtain a sorting result; and a determining module, configured to determine, based on the sorting result, the disease information group matching the new patient data, wherein the disease information groups each contain different clinical outcome information.

In the technical solution provided in this application, new patient data to be matched is acquired, the new patient data containing multiple pieces of disease feature data; each piece of disease feature data of the new patient is vectorized to obtain the corresponding disease feature word vector; based on the disease feature word vector, the Mahalanobis distance between the new patient data and each piece of historical patient data in the preset disease feature database is calculated, wherein the disease feature database contains multiple disease information groups and similar disease features belong to the same disease information group; the Mahalanobis distances are sorted to obtain a sorting result; and, based on the sorting result, the disease information group matching the new patient data is determined, wherein the disease information groups each contain different clinical outcome information. This solution can be applied in the field of smart healthcare, thereby promoting the construction of smart cities. It makes maximal use of the information in sample (patient) data that doctors consider when making medical decisions, determines the disease group of the new patient based on the Mahalanobis distance, and assists doctors in making decisions based on the characteristics of the corresponding group, which improves both the efficiency of determining the group to which a patient belongs and the accuracy of medical decision-making.

Description of the drawings

FIG. 1 is a schematic diagram of a first embodiment of the method for intelligently grouping similar patients in an embodiment of this application;

FIG. 2 is a schematic diagram of a second embodiment of the method for intelligently grouping similar patients in an embodiment of this application;

FIG. 3 is a schematic diagram of a third embodiment of the method for intelligently grouping similar patients in an embodiment of this application;

FIG. 4 is a schematic diagram of a first embodiment of the apparatus for intelligently grouping similar patients in an embodiment of this application;

FIG. 5 is a schematic diagram of a second embodiment of the apparatus for intelligently grouping similar patients in an embodiment of this application;

FIG. 6 is a schematic diagram of an embodiment of the device for intelligently grouping similar patients in an embodiment of this application.

Detailed description

The embodiments of the present application provide a method, apparatus, device and storage medium for intelligently grouping similar patients. When determining the disease group to which a new patient belongs, the new patient data is acquired, the Mahalanobis distance between the new patient data and each sample (patient) in each preset disease group is calculated, and the disease group to which the new patient data belongs is determined according to the values of these Mahalanobis distances. This solution belongs to the field of smart healthcare and can promote the construction of smart cities. It makes maximal use of the information in sample (patient) data that doctors consider when making medical decisions, determines the disease group to which the new patient belongs based on the Mahalanobis distance, and assists the doctor in making decisions based on the characteristics and other information of the corresponding group, improving both the efficiency of determining a patient's group and the accuracy of doctors' decisions.

To enable those skilled in the art to better understand the solution of the present application, the embodiments of the present application are described below with reference to the accompanying drawings.

The terms "first", "second", "third", "fourth", etc. (if any) in the description, claims and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such processes, methods, products or devices.

For ease of understanding, the specific process of an embodiment of the present application is described below. Referring to FIG. 1, a first embodiment of the method for intelligently grouping similar patients in the embodiments of the present application includes:

101. Obtain new patient data to be matched, where the new patient data includes multiple disease characteristic data;

In this embodiment, the new patient data to be matched refers to the data of a patient currently being treated by the doctor, for whom the doctor needs to draw on information from previous patients to make medical decisions. It contains both the personal information of the new patient and information about the patient's symptoms and disease characteristics, including gender, age, name, physical examination indicators, examination results, past medical history and other data. For example: Zhang San, male, Han ethnicity, aged 25, with a 10-year history of hepatitis B; chief complaint: frequent fatigue, lack of physical strength, lower-extremity edema, insomnia with excessive dreaming, upper abdominal discomfort, abdominal distension, yellowing of the skin and urine, dark urine, etc.

"Matching" in this embodiment refers to matching the disease and symptoms of the new patient against those of previous patients.

102. Perform vectorization processing on each disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient;

In this embodiment, the collected new patient data includes not only continuous data, such as test indicators and age, but also discrete or text data, such as gender and examination results. The collected data is therefore vectorized according to its data type to obtain the corresponding vectorized new patient data. For example, if the new patient data is a mixture of text, discrete and continuous data, the word vector method of natural language processing is used to apply one-hot encoding to the text data and discrete data to obtain vectorized data.

Continuous data, by contrast, requires no standardization or normalization preprocessing, and features of this type can be used directly, because the Mahalanobis distance used in the subsequent steps is independent of the measurement scale of each feature.
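A minimal sketch of this mixed-type vectorization, assuming one-hot encoding for discrete and text fields and pass-through for continuous fields. The field names (`gender`, `hbv_test`, `age`) and the helpers `build_vocab` and `vectorize` are hypothetical; the source does not specify the field layout, how the category vocabulary is built, or how unseen categories are handled (here they become an all-zero block).

```python
from typing import Dict, List, Sequence


def build_vocab(records: Sequence[dict], categorical_fields: Sequence[str]) -> Dict[str, List[str]]:
    """Collect the sorted set of observed categories for each discrete/text field."""
    return {f: sorted({str(r[f]) for r in records}) for f in categorical_fields}


def vectorize(record: dict, vocab: Dict[str, List[str]], continuous_fields: Sequence[str]) -> List[float]:
    """One-hot encode discrete/text fields; pass continuous fields through unchanged."""
    vec: List[float] = []
    for field, categories in vocab.items():
        value = str(record[field])
        # One slot per known category; an unseen category yields an all-zero block.
        vec.extend(1.0 if value == c else 0.0 for c in categories)
    for field in continuous_fields:
        # Continuous values are used as-is: no standardization or normalization.
        vec.append(float(record[field]))
    return vec


# Hypothetical historical records used only to build the vocabulary.
history = [
    {"gender": "male", "hbv_test": "positive", "age": 25},
    {"gender": "female", "hbv_test": "negative", "age": 61},
]
vocab = build_vocab(history, ["gender", "hbv_test"])
print(vectorize({"gender": "male", "hbv_test": "positive", "age": 25}, vocab, ["age"]))
```

Building the vocabulary from historical data keeps the vector layout identical for new and historical patients, which the distance calculation in the next step requires.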

103. Based on the disease feature word vector, calculate the Mahalanobis distance between the new patient data and each piece of historical patient data in a preset disease feature database, wherein the disease feature database includes multiple disease information groups, and similar disease features belong to the same disease information group;

In this embodiment, the Mahalanobis distance between the new patient data and each piece of historical patient data in the preset disease feature database is calculated from the patient feature word vector generated by vectorization. For example, if the preset disease feature database contains 7 disease information groups A, B, C, D, E, F and G, each with n samples (patients), namely A(a1, a2, a3, ..., an), B(b1, b2, b3, ..., bn), C(c1, c2, c3, ..., cn), D(d1, d2, d3, ..., dn), E(e1, e2, e3, ..., en), F(f1, f2, f3, ..., fn) and G(g1, g2, g3, ..., gn), then the Mahalanobis distance between the new patient data and each sample (patient) in each of the 7 disease information groups is calculated.
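The per-sample distance calculation can be sketched as follows. This assumes a single covariance matrix estimated from all historical samples (the source does not say which covariance is used) and uses the Moore-Penrose pseudo-inverse `np.linalg.pinv`, because one-hot blocks always sum to 1 and therefore make the covariance matrix singular. The function name `mahalanobis_to_history` is hypothetical.

```python
import numpy as np


def mahalanobis_to_history(new_vec, history, groups):
    """Mahalanobis distance from the new patient to every historical patient.

    new_vec: vectorized new patient, shape (n_features,)
    history: vectorized historical patients, shape (n_samples, n_features)
    groups:  disease-information-group label of each historical patient
    Returns a list of (distance, group_label, sample_index) tuples.
    """
    X = np.asarray(history, dtype=float)
    x = np.asarray(new_vec, dtype=float)
    # Assumption: one covariance matrix pooled over all historical samples.
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Pseudo-inverse, since one-hot columns make the covariance singular.
    cov_inv = np.linalg.pinv(cov)
    diffs = X - x
    # Row-wise quadratic form d_i^T S^+ d_i.
    sq = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)
    dists = np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative round-off
    return [(float(d), g, i) for i, (d, g) in enumerate(zip(dists, groups))]


# Toy example: four historical patients in two hypothetical groups.
history = [[0, 0], [1, 0], [0, 1], [1, 1]]
groups = ["A", "A", "B", "B"]
for dist, group, idx in mahalanobis_to_history([0.0, 0.0], history, groups):
    print(idx, group, round(dist, 3))
```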

In this embodiment, the disease feature database can be understood as a database containing the data of a large number of patients, organized into multiple groups for a given disease, for example groups whose outcome is diabetes with nephropathy, diabetes with hypertension, or diabetes with HbA1c at target. Each disease information group contains the data of a certain number of patients with that type of clinical outcome; in this embodiment, these patient data are also referred to as sample data.

104. Sort the Mahalanobis distances to obtain a sorting result;

In this embodiment, the Mahalanobis distances between the new patient data and each piece of historical patient data in the preset disease feature database are sorted by value to obtain the sorting result. The order may be descending or ascending. The Mahalanobis distance between two patients with similar outcomes is much smaller than that between two patients with dissimilar outcomes.

105. Based on the sorting result, determine a disease information group corresponding to the new patient data, wherein the disease information group respectively contains different clinical outcome information.

In this embodiment, a disease information group refers to a specific disease group containing a certain number of samples (patients) with that type of disease. Taking as an example the group of diabetic patients whose clinical outcome is HbA1c below 7%, each sample (patient) in that group carries information covering the whole course of the disease, such as personal information, disease characteristics, disease progression and outcome, including the history of present illness, past medical history, recent medications, past history, family history, physical examination findings and outcome.

In this embodiment, the smaller the Mahalanobis distance between the new patient data and a sample (patient), the more similar the outcomes of the two patients and the more likely they are to belong to the same disease information group. Therefore, the disease information group matching the new patient data can be determined from the sorting result of the Mahalanobis distances.
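Steps 104 and 105 can be sketched together. The source states only that a smaller distance means a higher likelihood of belonging to the same group; how many of the nearest samples are consulted is not specified. This sketch therefore assumes a nearest-neighbour rule by default (`k=1`) with an optional majority vote among the `k` nearest samples. The name `assign_group` and the tuple layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Scored = Tuple[float, str, int]  # (Mahalanobis distance, group label, sample index)


def assign_group(scored: Iterable[Scored], k: int = 1) -> Tuple[str, List[Scored]]:
    """Sort distances ascending and pick the group of the k nearest samples."""
    ranked = sorted(scored, key=lambda t: t[0])  # smallest distance = most similar
    if not ranked:
        raise ValueError("no historical samples to compare against")
    top = [group for _, group, _ in ranked[:k]]
    counts = Counter(top)
    best = max(counts.values())
    # Majority vote; a tie goes to the group whose sample ranks highest.
    winner = next(group for group in top if counts[group] == best)
    return winner, ranked


scored = [(2.0, "B", 0), (0.5, "A", 1), (1.0, "B", 2)]
print(assign_group(scored)[0])       # nearest single sample
print(assign_group(scored, k=3)[0])  # majority of the three nearest
```

A larger `k` makes the assignment less sensitive to a single atypical historical patient, at the cost of blurring small groups.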

In this embodiment, the Mahalanobis distance is used to measure the similarity between two data samples. Each sample is represented as a feature vector, and the distance between two vectors is computed using the inverse of the covariance matrix of the sample data; the smaller the resulting distance, the more similar the two samples are considered to be.
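For reference, the standard definition of the Mahalanobis distance between feature vectors x and y, where Σ is the covariance matrix of the sample data (which Σ this embodiment uses is not specified; a covariance pooled over all samples is a common choice), together with the distance from a sample to a group with mean μ and its reduction to the Euclidean distance:

```latex
d_M(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^{\top} \Sigma^{-1} (\mathbf{x} - \mathbf{y})},
\qquad
d_M(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})},
\qquad
\Sigma = I \;\Rightarrow\; d_M(\mathbf{x}, \mathbf{y}) = \lVert \mathbf{x} - \mathbf{y} \rVert_2 .
```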

It can be understood that the executing entity of this application may be an apparatus for intelligently grouping similar patients, or a terminal or a server, which is not specifically limited here. The embodiments of the present application are described using a server as the executing entity.

In the embodiments of this application, when determining the disease group to which a new patient belongs, the new patient data is acquired, the Mahalanobis distance between the new patient data and each sample (patient) in each preset disease group is calculated, and the disease group to which the new patient data belongs is determined according to the values of these Mahalanobis distances. This solution belongs to the field of smart healthcare and can promote the construction of smart cities. It makes maximal use of the information in sample (patient) data that doctors consider when making medical decisions, determines the disease group to which the new patient belongs based on the Mahalanobis distance, and assists the doctor in making decisions based on the characteristics and other information of the corresponding group, improving both the efficiency of determining a patient's group and the accuracy of doctors' decisions.

Referring to Fig. 2, another embodiment of the method for intelligent grouping of similar patients in the embodiment of the present application includes:

201. Obtain sample data including outcome variables;

In this embodiment, the outcome variable refers to the outcome of concern for a given disease. For a cold, the outcome of concern is whether it is cured; for type 2 diabetes, it is whether glycated hemoglobin (HbA1c) meets the target.

In this embodiment, sample data containing outcome variables refers to the data of patients who have been treated and whose treatment has ended. A large amount of historical patient data containing outcome variables is obtained as sample data through channels such as the hospital's electronic medical records, and the type of each item of sample data is determined. Such data includes, for example, basic information such as the patient's name, age and blood type, as well as the chief complaint, past medical history, family history, physical examination, medication information and outcome (whether cured).

202. Preprocess the sample data based on the type of the sample data to obtain a discretized word vector;

In this embodiment, the sample data is preprocessed according to its type; for example, discrete data or text data is vectorized to obtain data in the form of discrete word vectors.

In an optional embodiment, the type of the new patient data is acquired;

In this embodiment, in the medical field, new patient data includes not only continuous data, such as test indicators and age, but also discrete or text data, such as gender and examination results. Because discrete and text data must be converted into discrete word vectors before they can be used, the type of the new patient data must be determined.

In another optional embodiment, the vectorization processing corresponding to the new patient data is determined based on its type and then executed;

Wherein, the vectorization processing method includes:

A. When the type of the new patient data is text data, vectorize the text data;

In this embodiment, if the new patient data is text data, vectorization processing is performed on the text data.

In this embodiment, text data refers to characters that cannot take part in arithmetic operations, also called character data, such as gender and examination results.

In this embodiment, vectorization refers to converting words into a distributed representation, also called word vectors, so that a notion of "distance" exists between words and the representation carries more information.

B. When the type of the new patient data is discrete data, perform vectorization processing on the discrete data;

In this embodiment, as with text data, if the new patient data is discrete data, it is vectorized in the same manner into discrete word vectors.

C. When the type of the new patient data is continuous data, the data is not vectorized.

In this embodiment, if the new patient data is continuous data, there is no need to perform any standardization or normalization preprocessing on the continuous data, and it can be used directly.

In this embodiment, continuous data is a statistical concept, also known as a continuous variable: data that can take any value within an interval, so that the values are continuous and infinitely many values lie between any two adjacent values. For example, the dimensions of manufactured parts and a person's measured height, weight and chest circumference are continuous data; such values can only be obtained by measurement.

In this embodiment, different data types are processed differently: continuous data can be used directly without processing, while text or discrete data must be vectorized before it can be processed. Therefore, the vectorization processing corresponding to each type of sample data must be determined.

In this embodiment, vectorization processing is performed on the new patient data to obtain the patient feature word vector. Patient feature word vector refers to data in the form of a word vector containing patient information and characteristics.
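The difference between one-hot codes and distributed word vectors mentioned above can be illustrated with a toy comparison. The three-dimensional "embeddings" below are hand-picked, hypothetical values, not trained vectors; the point is only that with one-hot encoding every pair of distinct words is equally far apart, whereas distributed vectors can place related terms closer together.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# One-hot codes: every pair of distinct symptoms is exactly sqrt(2) apart.
one_hot = {"fatigue": [1, 0, 0], "weakness": [0, 1, 0], "edema": [0, 0, 1]}

# Hypothetical toy embeddings: related symptoms point in similar directions.
embedded = {
    "fatigue": [0.9, 0.1, 0.0],
    "weakness": [0.8, 0.2, 0.1],
    "edema": [0.1, 0.1, 0.9],
}

print(euclidean(one_hot["fatigue"], one_hot["weakness"]), euclidean(one_hot["fatigue"], one_hot["edema"]))
print(cosine(embedded["fatigue"], embedded["weakness"]), cosine(embedded["fatigue"], embedded["edema"]))
```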

203. Calculate the Mahalanobis distance between each sample in the sample data based on the discretized word vector;

In this embodiment, the Mahalanobis distance between each sample (patient) data in the sample data is calculated according to the discretized word vector.
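The pairwise distances of step 203 can be sketched as one vectorized computation. As before, this assumes a single covariance matrix estimated from all samples and uses the pseudo-inverse to cope with singular covariance from one-hot columns; `pairwise_mahalanobis` is a hypothetical name.

```python
import numpy as np


def pairwise_mahalanobis(X):
    """Return the (n, n) matrix of Mahalanobis distances between all rows of X.

    Assumption: one covariance matrix estimated from all samples; the
    pseudo-inverse handles the singular covariance produced by one-hot columns.
    """
    X = np.asarray(X, dtype=float)
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    diff = X[:, None, :] - X[None, :, :]  # (n, n, f): all pairwise differences
    sq = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative round-off


D = pairwise_mahalanobis([[0, 0], [1, 0], [0, 1], [1, 1]])
print(np.round(D, 3))
```

The intermediate `diff` array has n x n x f entries, which is fine for a few hundred patients but would call for a blocked computation on much larger sample sets.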

In this embodiment, the Mahalanobis distance is an effective way to calculate the distance between a sample and the "center of gravity" (mean) of a sample set, or to measure the similarity between two unknown sample sets. It takes the correlations between features into account and can thus eliminate interference from correlations between variables, and it is scale-independent, that is, independent of the measurement scale. When the covariance matrix Σ is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance. In summary, the Mahalanobis distance conveniently measures the distance between an observed sample and a known sample set, which is also why it is well suited to fault diagnosis.
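Two of the properties stated above can be checked numerically: with an identity covariance the distance equals the Euclidean distance, and rescaling a feature (here height from centimetres to metres, with the covariance rescaled accordingly) leaves the Mahalanobis distance unchanged while the Euclidean distance changes. The measurements and covariance values are hypothetical.

```python
import numpy as np


def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y under covariance matrix cov."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # solve(cov, d) computes cov^{-1} d without forming the inverse explicitly.
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))


x = np.array([170.0, 65.0])  # hypothetical height (cm), weight (kg)
y = np.array([160.0, 80.0])
cov = np.array([[50.0, 10.0], [10.0, 40.0]])  # hypothetical covariance

# 1) Identity covariance: Mahalanobis distance equals Euclidean distance.
print(mahalanobis(x, y, np.eye(2)), np.linalg.norm(x - y))

# 2) Scale independence: convert height to metres (S) and transform cov to S cov S^T.
S = np.diag([0.01, 1.0])
print(mahalanobis(S @ x, S @ y, S @ cov @ S.T), mahalanobis(x, y, cov))
print(np.linalg.norm(S @ x - S @ y), np.linalg.norm(x - y))  # Euclidean distance changes
```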

204. Cluster the sample data based on the Mahalanobis distance between each sample in the sample data to obtain a clustering result;

In this embodiment, clustering is a special classification process that divides sample data, for which prior knowledge is insufficient and uncertainties exist, into several classes. Data records with a high degree of similarity are placed in the same group, while the dissimilarity between data records in different groups is maximized. Clustering is a statistical method for studying classification problems (of samples or indicators). Each cluster produced by clustering is a set of data objects that are similar to the other objects in the same cluster and different from objects in other clusters.

In this embodiment, the sample data is clustered according to the Mahalanobis distances between the samples to obtain the clustering result. For example, if the sample data contains n samples (patients) M1, M2, M3, ..., Mn, the Mahalanobis distance between every pair of samples is calculated, and the sample data is clustered according to these distances to obtain multiple sample groups.
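The source does not fix the clustering algorithm at this step. As one possible sketch, k-medoids works directly on a precomputed pairwise distance matrix (such as one of Mahalanobis distances) and, like the third embodiment, starts from k randomly chosen samples as centers. The function name, the `seed` parameter and the iteration cap are assumptions; this simple alternating scheme can converge to a local optimum.

```python
import numpy as np


def k_medoids(D, k, max_iter=100, seed=0):
    """Cluster samples given only their (n, n) pairwise distance matrix D.

    Returns (labels, medoids): labels[i] is the cluster index of sample i,
    medoids[c] is the sample index used as the center of cluster c.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)  # k random samples as initial centers
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign each sample to its nearest center
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # Move the center to the member with the smallest total distance to its cluster.
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break  # centers stopped moving
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids


# Toy 1-D example with two well-separated groups; |a - b| stands in for the distance.
pts = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D = np.abs(pts[:, None] - pts[None, :])
labels, medoids = k_medoids(D, k=2)
print(labels, medoids)
```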

205. Based on the grouping result, obtain multiple disease information groups included in the sample data, and extract features of the disease information group;

In this embodiment, multiple disease information groups contained in the sample data are obtained from the clustering result, each containing different clinical outcome information for a given disease. For example, clustering 500 samples (patients) in the sample data according to the Mahalanobis distances between them yields seven disease information groups A, B, C, D, E, F and G with different clinical outcomes of diabetes. The features of the samples (patients) in each disease information group, such as demographic features and test and examination features, are then extracted and described, for example the age distribution and gender ratio of the population in a disease information group, so that these feature distributions can assist doctors in decision-making. The feature distribution of a disease information group in this embodiment consists of characteristics of the distribution of the sample data in that group; for example, in one group the average age of the samples (patients) is 50 years and 70% are male.

In this embodiment, multiple disease information groups are obtained from the grouping result, and the features of each group are extracted. These features include, but are not limited to, the gender (male/female) ratio, age distribution, test data, disease characteristics, disease progression, history of present illness and past medical history of the population. As an analogous example of feature extraction, the Iris flower data set contains 4 features: sepal length, sepal width, petal length and petal width, in centimeters. Through feature extraction, the characteristics of each disease group are obtained to help doctors make more accurate medical decisions.
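A minimal sketch of describing each group's feature distribution, limited to the two examples the text gives (average age and share of male patients). The record keys `group`, `age` and `gender` are hypothetical; real features would include test data and disease history as listed above.

```python
from collections import defaultdict
from statistics import mean


def describe_groups(patients):
    """Per-group summary: size, mean age and share of male patients.

    `patients` is a list of dicts with hypothetical keys "group", "age", "gender".
    """
    by_group = defaultdict(list)
    for p in patients:
        by_group[p["group"]].append(p)
    summary = {}
    for group, members in sorted(by_group.items()):
        summary[group] = {
            "n": len(members),
            "mean_age": mean(p["age"] for p in members),
            "male_share": sum(p["gender"] == "male" for p in members) / len(members),
        }
    return summary


patients = [
    {"group": "A", "age": 40, "gender": "male"},
    {"group": "A", "age": 60, "gender": "female"},
    {"group": "B", "age": 30, "gender": "male"},
]
print(describe_groups(patients))
```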

206. Based on the characteristics of the disease information group, query a preset disease condition description database, and output the disease condition description corresponding to the characteristics of the disease information group;

In this embodiment, the features refer to characteristic information specific to a disease, such as the gender distribution of the population, the distribution of test data, the characteristics of the disease and the characteristics of its progression. The preset disease condition description database is queried according to the feature distribution of the disease information group to determine the data of the corresponding condition and help doctors make more accurate medical decisions.

In the embodiments of this application, the disease condition description database is built from a large number of medical records in the hospital and contains, for each disease type, the condition features of patients of different ages, the disease progression, the medication and treatment process, and the final course of the disease. When a new patient is diagnosed, the new patient's condition features are determined from the chief complaint and the diagnosed disease, and these features are used as keys to query the preset disease condition description database to determine the disease type of the new patient. For example, if the patient presents with polyuria, polydipsia and polyphagia together with severe weight loss over a short period and lower-limb edema, the description that best matches these condition features can be retrieved from the preset database, thereby determining the disease type that best matches the new patient and helping doctors make more accurate diagnoses.
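The keyed lookup can be sketched as ranking stored descriptions by how well their key features overlap the query features. The source does not say how the "most matching" entry is chosen; Jaccard overlap is an assumption made here, and the database entries below are hypothetical placeholders rather than clinical content.

```python
def query_descriptions(features, database):
    """Rank descriptions by Jaccard overlap between query features and stored keys.

    database: mapping of description -> iterable of key features (hypothetical layout).
    Returns (description, score) pairs, best match first.
    """
    f = set(features)
    scored = []
    for description, keys in database.items():
        keys = set(keys)
        union = f | keys
        score = len(f & keys) / len(union) if union else 0.0
        scored.append((description, score))
    return sorted(scored, key=lambda t: -t[1])  # stable: ties keep database order


# Hypothetical database entries, for illustration only.
database = {
    "description X": {"polyuria", "polydipsia", "polyphagia"},
    "description Y": {"polyuria", "weight loss", "lower-limb edema"},
}
print(query_descriptions({"polyuria", "weight loss", "lower-limb edema"}, database))
```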

207. Obtain new patient data to be matched;

208. Perform vectorization processing on each disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient;

209. Based on the disease feature word vector, calculate the Mahalanobis distance between the new patient data and each historical patient data in the preset disease feature database;

210. Sort the Mahalanobis distances to obtain a sorting result;

211. Based on the sorting result, determine a matched disease information group corresponding to the new patient data.

In the embodiments of this application, when determining the disease group to which a new patient belongs, the new patient data is acquired, the Mahalanobis distance between the new patient data and each sample (patient) in each preset disease group is calculated, and the disease group to which the new patient data belongs is determined according to the values of these Mahalanobis distances. This solution belongs to the field of smart healthcare and can promote the construction of smart cities. It makes maximal use of the information in sample (patient) data that doctors consider when making medical decisions, determines the disease group to which the new patient belongs based on the Mahalanobis distance, and assists the doctor in making decisions based on the characteristics and other information of the corresponding group, improving both the efficiency of determining a patient's group and the accuracy of doctors' decisions.

Referring to Fig. 3, the third embodiment of the method for intelligent grouping of similar patients in the embodiment of the present application includes:

301. Obtain sample data containing outcome variables;

302. Preprocess the sample data based on the type of the sample data to obtain a discretized word vector.

303. Based on the discretized word vector, respectively calculate the Mahalanobis distance between each sample in the sample data.

304. Set the number of clusters to k, and randomly select k samples as initial cluster centers;

In this embodiment, clustering divides the input sample data into different parts according to their features, and a cluster center is the center of one such cluster.

In this embodiment, clustering is the process of dividing a collection of physical or abstract objects into multiple classes of similar objects. It is a statistical method for studying classification problems (of samples or indicators). Each cluster produced is a set of data objects that are similar to the other objects in the same cluster and different from the objects in other clusters.

In this embodiment, it is assumed that the samples (diabetic patients) in the sample data are divided into k groups with different clinical outcome information, and k samples are randomly selected from this batch of sample data as cluster centers. For example, suppose all the sample data can be divided into seven disease information groups A, B, C, D, E, F and G, representing seven different outcomes of diabetes. A, B, C, D, E, F and G also denote the cluster centers of these seven groups.

In this embodiment, the cluster centers are determined differently in the initial and non-initial cases. In the initial case, k samples are randomly selected from the sample data as the initial cluster centers, expressed as $m_p(1) = (V_{i1}, V_{i2}, \ldots, V_{ij})$, where $p = 1, 2, \ldots, k$ and k is the number of clusters.
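The random choice of initial centers in step 304 might look like the following sketch. The function name `initial_centers` and the optional `seed` parameter are hypothetical. The seed is there only to make a run reproducible.

```python
import random


def initial_centers(samples, k, seed=None):
    """Step 304 sketch: choose k distinct samples at random as the initial centers m_p(1), p = 1..k."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    rng = random.Random(seed)  # a seed makes the random choice reproducible
    return [list(sample) for sample in rng.sample(samples, k)]


samples = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [2.0, 1.0], [5.0, 5.0]]
centers = initial_centers(samples, 3, seed=42)
```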

305. Calculate the Mahalanobis distances from each sample in the sample data to each cluster center respectively.

In this embodiment, the Mahalanobis distance from each sample in the sample data to each cluster center is calculated separately. For example, suppose the sample data contain N samples N1, N2, N3, ..., NN. The Mahalanobis distances from each of them to the seven initial cluster centers A, B, C, D, E, F and G are calculated. The distances from N1 to A, B, C, D, E, F and G are a1, b1, c1, d1, e1, f1 and g1, respectively.
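A minimal sketch of step 305 follows. The covariance matrix is estimated from all samples, which is an assumption, since the text does not say which covariance is used. It is inverted with plain Gauss-Jordan elimination so that the example needs no third-party library. Each row of the result holds the distances of one sample to every center, that is, a1, b1, ... for N1. In practice, one-hot encoded discrete features often make the covariance matrix singular, in which case a pseudo-inverse or a small ridge term on the diagonal would be needed.

```python
import math


def covariance(samples):
    """Unbiased sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    return [[sum((s[a] - means[a]) * (s[b] - means[b]) for s in samples) / (n - 1)
             for b in range(d)] for a in range(d)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; raises ValueError if the matrix is singular."""
    d = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def distance_table(samples, centers):
    """Step 305 sketch: Mahalanobis distance from every sample to every cluster center.

    The covariance is estimated from all samples (an assumption of this sketch).
    """
    inv = invert(covariance(samples))
    d = len(inv)

    def dist(x, y):
        diff = [a - b for a, b in zip(x, y)]
        return math.sqrt(sum(diff[i] * inv[i][j] * diff[j] for i in range(d) for j in range(d)))

    return [[dist(x, c) for c in centers] for x in samples]


samples = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
table = distance_table(samples, centers=[[0.0, 0.0], [2.0, 1.0]])
```

In this toy data, the first feature spreads over 2 units and the second over 1. After scaling by the inverse covariance, a step of 2 along the first feature and a step of 1 along the second both have length sqrt(3).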

306. Based on the Mahalanobis distance from each sample to each cluster center, select the minimum Mahalanobis distance for each sample and assign the sample to the group of the cluster center with that minimum distance, until all samples in the sample data have been assigned, yielding the first clustering result;

In this embodiment, the minimum of each sample's Mahalanobis distances to the cluster centers is selected, and the sample is assigned to the group of the cluster center with that minimum distance. Once all samples in the sample data have been assigned, the first clustering result is generated. Continuing the example, suppose all the sample data can be divided into seven groups whose cluster centers A, B, C, D, E, F and G represent seven different clinical outcomes of a certain disease, and the sample data contain N samples N1, N2, N3, ..., NN. The Mahalanobis distances from N1 to the seven initial cluster centers are a1, b1, c1, d1, e1, f1 and g1. If a1 is the smallest, N1 is assigned to the clinical outcome group whose cluster center is A. The remaining samples are handled in the same way until all N samples have been assigned, which produces the first clustering result.
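The assignment rule of step 306 is sketched below, assuming the inverse covariance matrix is already known. The function name `assign` is hypothetical. Ties are broken in favour of the lower center index, a detail the text does not specify.

```python
import math


def mahalanobis(x, y, inv_cov):
    """Mahalanobis distance given the inverse covariance matrix S^-1."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return math.sqrt(sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(n) for j in range(n)))


def assign(samples, centers, inv_cov):
    """Step 306 sketch: give every sample the index of its nearest center (ties go to the lower index)."""
    labels = []
    for x in samples:
        distances = [mahalanobis(x, c, inv_cov) for c in centers]  # a1, b1, ... for this sample
        labels.append(min(range(len(centers)), key=distances.__getitem__))
    return labels


# Assumed S^-1 = diag(1, 0.01): the second feature's standard deviation is ten times the first's.
inv_cov = [[1.0, 0.0], [0.0, 0.01]]
centers = [[0.0, 0.0], [3.0, 10.0]]
print(assign([[3.0, 0.0], [0.0, 1.0]], centers, inv_cov))  # [1, 0]
```

Under plain Euclidean distance, the first sample would instead go to center 0 (distance 3 versus 10). This illustrates how the Mahalanobis distance rescales each feature by its spread.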

307. According to the Mahalanobis distance, calculate the sum of squared errors of the clusters corresponding to the first clustering result;

In this embodiment, the sum of squared errors of the clusters is calculated according to the Mahalanobis distance.

In this embodiment, when the data are clustered, both the density of the samples (patients) in the sample data and the differences in outcome between samples (patients) affect the clustering effect. For example, the clustering effect is better when the samples (patients) are densely concentrated and the disease characteristics of different disease information groups differ substantially.

In this embodiment, the sum of squared errors is the total, over all samples in the sample data to be clustered, of the squared error between each sample and the center of its cluster. The smaller the sum of squared errors, the more similar the samples within each disease information group.

308. In a non-initial case, calculate k non-initial cluster centers from the previous clustering result;

In this embodiment, in a non-initial case, the mean of the sample values in each group of the previous clustering result is calculated, giving k non-initial cluster centers.
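Recomputing the non-initial centers as group means could be sketched as follows. Labels are assumed to be center indices from `0` to `k - 1`. Raising an error on an empty group is this sketch's own choice, because the text does not cover that case.

```python
def update_centers(samples, labels, k):
    """Step 308 sketch: each non-initial center is the mean of the samples in its group."""
    dim = len(samples[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, label in zip(samples, labels):
        counts[label] += 1
        for j in range(dim):
            sums[label][j] += x[j]
    if 0 in counts:
        # The text does not cover empty groups; raising here is an assumption of this sketch.
        raise ValueError("a cluster became empty")
    return [[total / counts[p] for total in sums[p]] for p in range(k)]


print(update_centers([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]], [0, 0, 1], k=2))
# [[1.0, 1.0], [10.0, 10.0]]
```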

309. Calculate the Mahalanobis distance from each sample in the sample data to each non-initial cluster center, select the minimum Mahalanobis distance for each sample, and assign each sample to the group of the non-initial cluster center with that minimum distance, generating a new clustering result;

In this embodiment, the Mahalanobis distance between each sample (patient) in the sample data and each non-initial cluster center is calculated. The minimum distance for each sample is then selected, and the sample is assigned to the group of the non-initial cluster center with that minimum distance, generating a new clustering result. For example, suppose there are seven non-initial cluster centers S, F, H, B, P, R and K. The Mahalanobis distances from sample (patient) m to these seven centers are m1, m2, ..., m7. If m2 is the smallest, sample m is assigned to the group of the non-initial cluster center F. This continues until all samples in the sample data have been assigned, giving a new clustering result. Each round of clustering on the sample data generally produces a different result.

310. Based on the Mahalanobis distance, calculate and obtain the sum of squared errors of the clusters corresponding to the new clustering result;

In this embodiment, the density of the clustered data samples and the differences between clusters strongly influence the clustering effect. When the processed data are dense and the differences between classes are large, the clustering effect is good; otherwise it is worse. Clustering algorithms commonly use the square error criterion as their objective function.

Here, Jc(m) denotes the sum of squared errors of all samples (patients) in the sample data; the smaller Jc(m), the higher the similarity within each group. Xi denotes a point in the multidimensional space (a given sample, i.e. a patient), and Zj denotes the mean of cluster Cj. Updating the clusters (step S305): in a non-initial case, K non-initial cluster centers are calculated from the previous clustering result, which amounts to updating the mean of each cluster.
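The square error criterion and the cluster-mean update can be written out using the definitions of Jc(m), Xi, Zj and Cj above. Because the text measures errors with the Mahalanobis distance, the error term is assumed here to be the squared Mahalanobis distance $d_M$ with sample covariance matrix $S$. Under that assumption, one standard form is:

$$
J_c(m) = \sum_{j=1}^{k} \sum_{X_i \in C_j} d_M(X_i, Z_j)^2,
\qquad
d_M(X_i, Z_j) = \sqrt{(X_i - Z_j)^{\top} S^{-1} (X_i - Z_j)}
$$

$$
Z_j = \frac{1}{|C_j|} \sum_{X_i \in C_j} X_i, \qquad j = 1, 2, \ldots, k
$$

For a fixed $S$, the mean $Z_j$ is exactly the point that minimizes the inner sum for cluster $C_j$. This is because $S^{-1}$ is positive definite, so updating each center to its cluster mean cannot increase $J_c(m)$.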

311. Compare the sum of square errors of the clusters corresponding to the first grouping result with the sum of square errors of the clusters corresponding to the new grouping result, and obtain a comparison result.

In this embodiment, the initial k cluster centers are chosen at random, so it is difficult to pick representative data records as initial centers, and the clustering results are unstable. Therefore, starting from the initial clustering result, the Mahalanobis distance between each sample in the sample data and the new cluster centers is recalculated. The sum of squared errors for the new clustering result is computed from these distances and compared with the sum of squared errors of the previous clustering. The smaller the value, the more accurate the clustering result.

In this embodiment, the above iterative calculation is repeated, and the sums of squared errors of two consecutive clusterings are compared. When the sum of squared errors no longer changes noticeably, that is, when E - E' < ε, the iteration stops. Here E and E' are the sums of squared errors of two consecutive clusterings, E being the larger and E' the smaller, and ε is a small positive number.

In this step, the sum of squared errors serves as a measure of the error of a clustering result. Clustering is itself an iterative process, and this scheme aims to obtain a stable clustering result to use as the final result. The clustering result can therefore be regarded as sufficiently stable once the change between successive iterations is small enough.

In this embodiment, iterative calculation is a typical numerical method used for finding roots of equations, solving systems of equations, computing matrix eigenvalues, and so on. Its basic idea is successive approximation: start from a rough approximation, then repeatedly correct it with the same recurrence formula until the required accuracy is reached.
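To make successive approximation concrete, here is a classic example unrelated to the clustering itself. Newton's method for a square root starts from a rough guess and applies the same recurrence until successive values agree within a tolerance. This mirrors the stop rule E - E' < ε. The function name `newton_sqrt` and its tolerance are illustrative choices.

```python
def newton_sqrt(a, tol=1e-12, max_iter=100):
    """Approximate sqrt(a) by successive correction x <- (x + a / x) / 2 (Newton's method)."""
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0
    x = max(a, 1.0)  # rough initial approximation, never below the true root
    for _ in range(max_iter):
        nxt = 0.5 * (x + a / x)
        if abs(x - nxt) < tol:  # stop once successive values agree to the required accuracy
            return nxt
        x = nxt
    return x


print(newton_sqrt(2.0))
```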

312. Based on the comparison result, select, of the two clustering results, the one whose clusters have the smaller sum of squared errors as the final clustering result;

In this embodiment, a smaller sum of squared errors means higher similarity within groups. Therefore, among all the clustering results obtained, the one with the smallest sum of squared errors is the most accurate and is taken as the final clustering result.
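Steps 304 to 312 can be combined into one loop. The following is a minimal k-means-style sketch under stated assumptions. The inverse covariance matrix is given. The error term is the squared Mahalanobis distance. An empty group keeps its previous center. The clustering with the smallest sum of squared errors seen so far is kept. The function name `cluster` and its parameters are hypothetical.

```python
import random


def sq_mahalanobis(x, y, inv_cov):
    """Squared Mahalanobis distance (x - y)^T S^-1 (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(n) for j in range(n))


def cluster(samples, k, inv_cov, eps=1e-9, max_iter=100, seed=None):
    """Steps 304-312 sketch: k-means style iteration under the Mahalanobis distance.

    Returns (sse, labels, centers) for the clustering with the smallest sum of squared errors.
    """
    rng = random.Random(seed)
    centers = [list(s) for s in rng.sample(samples, k)]  # step 304: random initial centers
    best, prev_sse = None, None
    for _ in range(max_iter):
        # Steps 305-306 / 309: assign each sample to its nearest center.
        labels = [min(range(k), key=lambda p: sq_mahalanobis(x, centers[p], inv_cov)) for x in samples]
        # Steps 307 / 310: sum of squared (Mahalanobis) errors of this clustering.
        sse = sum(sq_mahalanobis(x, centers[p], inv_cov) for x, p in zip(samples, labels))
        # Steps 311-312: keep whichever clustering has the smaller sum of squared errors.
        if best is None or sse < best[0]:
            best = (sse, labels, [list(c) for c in centers])
        if prev_sse is not None and abs(prev_sse - sse) < eps:
            break  # E - E' < eps: the result has stabilised
        prev_sse = sse
        # Step 308: new centers are group means; an empty group keeps its old center (an assumption).
        for p in range(k):
            members = [x for x, label in zip(samples, labels) if label == p]
            if members:
                centers[p] = [sum(col) / len(members) for col in zip(*members)]
    return best


samples = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
sse, labels, centers = cluster(samples, k=2, inv_cov=[[1.0, 0.0], [0.0, 1.0]], seed=0)
```

With the identity matrix as `inv_cov`, the loop reduces to ordinary k-means, which makes the toy result easy to check by hand. The two well-separated groups of three points are recovered, with a total error of 8/3.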

313. Based on the grouping result, obtain multiple disease information groups included in the sample data, and extract features of the disease information group;

314. Based on the characteristics of the disease information group, query a preset disease condition description database, and output a disease condition description corresponding to the characteristics of the disease information group;

315. Obtain new patient data to be matched, where the new patient data includes multiple disease characteristic data;

316. Perform vectorization processing on each disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient;

317. Based on the disease feature word vector, calculate the Mahalanobis distance between the new patient data and each historical patient data in the preset disease feature database;

318. Sort the Mahalanobis distances to obtain a sorting result.

319. Based on the sorting result, determine a matching disease information group corresponding to the new patient data, wherein the disease information group respectively includes different clinical outcome information.

In this embodiment of the application, new patient data are acquired. To determine the disease group to which the new patient belongs, the Mahalanobis distance between the new patient data and each sample (patient) in each preset disease group is calculated, and the new patient's disease group is determined from the values of these distances. This solution belongs to the field of smart medical care and can promote the construction of smart cities. The application makes the fullest possible use of the information in the sample (patient) data that doctors consider when making medical decisions. Because it identifies the new patient's disease group from the Mahalanobis distance, it helps the doctor decide on the basis of that group's characteristics and other information. This improves the efficiency of determining which group a patient belongs to and the accuracy of the doctor's decisions.

The above describes the method for intelligent grouping of similar patients in the embodiments of the present application. The intelligent grouping device for similar patients in the embodiments of the present application is described below. Referring to FIG. 4, an embodiment of the intelligent grouping device for similar patients in the embodiments of the present application includes:

The first obtaining module 401 is used to obtain new patient data to be matched;

The first processing module 402 is configured to perform vectorization processing on each disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient;

The first calculation module 403 is configured to calculate the Mahalanobis distance between the new patient data and each historical patient data in the preset disease characteristic database based on the disease feature word vector;

The sorting module 404 is configured to sort the Mahalanobis distances to obtain a sorting result;

The determining module 405 is configured to determine the matched disease information group corresponding to the new patient data based on the sorting result, wherein the disease information group respectively contains different clinical outcome information.

Optionally, the first processing module 402 may also be specifically configured to:

Acquire the type of the new patient data, determine the corresponding vectorization processing based on that type, and execute it, wherein the vectorization processing includes:

A. When the type of the new patient data is text data, vectorize the text data;
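Claim 3 distinguishes three cases: text data and discrete data are vectorized, while continuous data are left as they are. The sketch below is hypothetical and assumes simple encodings the application does not specify. Text becomes bag-of-words counts over a fixed vocabulary, and discrete values become one-hot vectors. Real clinical text would more likely need a word segmenter or learned word vectors. The per-feature vectors are concatenated into one patient vector.

```python
def vectorize(value, kind, vocabulary=None, categories=None):
    """Hypothetical type-dependent vectorization of one disease feature.

    text       -> bag-of-words counts over an assumed vocabulary
    discrete   -> one-hot vector over an assumed list of categories
    continuous -> left as a single number, not vectorized further
    """
    if kind == "text":
        tokens = value.lower().split()  # whitespace tokenization is a simplification
        return [float(tokens.count(term)) for term in vocabulary]
    if kind == "discrete":
        return [1.0 if value == category else 0.0 for category in categories]
    if kind == "continuous":
        return [float(value)]
    raise ValueError(f"unknown data type: {kind}")


def patient_vector(features):
    """Concatenate the vectors of all (value, kind, options) features of one patient."""
    vector = []
    for value, kind, options in features:
        if kind == "text":
            vector += vectorize(value, kind, vocabulary=options)
        else:
            vector += vectorize(value, kind, categories=options)
    return vector


features = [
    ("Polyuria polydipsia polyuria", "text", ["polyuria", "polydipsia", "edema"]),
    ("female", "discrete", ["male", "female"]),
    (7.8, "continuous", None),
]
print(patient_vector(features))  # [2.0, 1.0, 0.0, 0.0, 1.0, 7.8]
```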

In this embodiment of the application, a method for intelligent grouping of similar patients is provided. The method acquires new patient data and, to determine the disease group to which the new patient belongs, calculates the Mahalanobis distance between the new patient data and each sample (patient) in each preset disease group. The new patient's disease group is then determined from the values of these distances. This solution belongs to the field of smart medical care and can promote the construction of smart cities. The application makes the fullest possible use of the information in the sample (patient) data that doctors consider when making medical decisions. Because it identifies the new patient's disease group from the Mahalanobis distance, it helps the doctor decide on the basis of that group's characteristics and other information. This improves the efficiency of determining which group a patient belongs to and the accuracy of the doctor's decisions.

Referring to Fig. 5, the second embodiment of the device for intelligent grouping of similar patients in the embodiment of the present application includes:

The first obtaining module 501 is used to obtain new patient data to be matched;

The first processing module 502 is configured to perform vectorization processing on each disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient;

The first calculation module 503 is configured to calculate the Mahalanobis distance between the new patient data and each historical patient data in the preset disease characteristic database based on the disease feature word vector;

The sorting module 504 is used to sort the Mahalanobis distances to obtain a sorting result;

The determining module 505 is configured to determine the matched disease information group corresponding to the new patient data based on the sorting result;

The second obtaining module 506 is used to obtain sample data including outcome variables;

The second processing module 507 is configured to preprocess the sample data based on the type of the sample data to obtain a discretized word vector;

The second calculation module 508 is configured to calculate the Mahalanobis distance between each sample in the sample data based on the discretized word vector;

The clustering module 509 is configured to cluster the sample data based on the Mahalanobis distance between each sample in the sample data to obtain a clustering result;

The extraction module 510 is configured to obtain multiple disease information groups contained in the sample data based on the grouping result, and extract the characteristics of the disease information groups;

The query module 511 is configured to query a preset disease condition description database based on the characteristics of the disease information group, and output the disease condition description corresponding to the characteristics of the disease information group.

Optionally, the first processing module 502 may also be specifically configured to:

A. When the type of the new patient data is text data, vectorize the text data;

Optionally, the clustering module 509 may be specifically used for:

Set the number of clusters to k, randomly select k samples as the initial cluster centers, and calculate the Mahalanobis distance from each sample in the sample data to each cluster center. Based on these distances, select the minimum Mahalanobis distance for each sample and assign the sample to the group of the cluster center with that minimum distance, until all samples in the sample data have been assigned, obtaining the first clustering result;

Optionally, the clustering module 509 may also be specifically used for:

Calculate, according to the Mahalanobis distance, the sum of squared errors of the clusters for the first clustering result. In a non-initial case, calculate K non-initial cluster centers from the previous clustering result, calculate the Mahalanobis distance from each sample in the sample data to each non-initial cluster center, select the minimum Mahalanobis distance for each sample, and assign each sample to the group of the non-initial cluster center with that minimum distance, generating a new clustering result;

Optionally, the clustering module 509 may also be specifically used for:

Calculate, based on the Mahalanobis distance, the sum of squared errors of the clusters for the new clustering result, and compare it with the sum of squared errors of the clusters for the first clustering result to obtain a comparison result. Based on the comparison result, select the one of the two clustering results with the smaller sum of squared errors as the final clustering result.

The above Figures 4 and 5 describe in detail the similar patient intelligent grouping device in the embodiment of the present application from the perspective of modular functional entities, and the following describes the similar patient intelligent grouping device in the embodiment of the present application in detail from the perspective of hardware processing.

FIG. 6 is a schematic structural diagram of an intelligent grouping device for similar patients provided by an embodiment of the present application. The intelligent grouping device 600 for similar patients may vary considerably with configuration or performance. It may include one or more processors (central processing units, CPU) 610, a memory 620, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 633 or data 632. The memory 620 and the storage medium 630 may provide transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the intelligent grouping device 600 for similar patients. Further, the processor 610 may be configured to communicate with the storage medium 630 and to execute the series of instruction operations in the storage medium 630 on the intelligent grouping device 600 for similar patients.

The intelligent grouping device 600 for similar patients may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux or FreeBSD. Those skilled in the art will understand that the structure shown in FIG. 6 does not limit the intelligent grouping device for similar patients. The device may include more or fewer components than shown, combine certain components, or arrange components differently.

The present application also provides an intelligent grouping device for similar patients, including a memory and at least one processor. The memory stores instructions, and the memory and the at least one processor are interconnected by wires. The at least one processor invokes the instructions in the memory so that the intelligent grouping device for similar patients executes the steps of the intelligent grouping method for similar patients.

The present application also provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer executes the following steps:

Acquiring new patient data to be matched, where the new patient data includes multiple disease feature data;

Performing vectorization processing on each disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient;

Based on the disease feature word vector, calculate the Mahalanobis distance between the new patient data and each piece of historical patient data in the preset disease feature database, wherein the disease feature database contains multiple disease information groups and features with similar symptoms belong to the same disease information group;

Sorting the Mahalanobis distances to obtain a sorting result;

Based on the sorting result, the matched disease information group corresponding to the new patient data is determined, wherein the disease information group respectively contains different clinical outcome information.

Those skilled in the art can clearly understand that, for the convenience and conciseness of the description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.

As stated above, the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. The present application has been described in detail with reference to the foregoing embodiments. Nevertheless, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

An intelligent grouping method for similar patients, which includes:

Acquiring new patient data to be matched, where the new patient data includes multiple disease feature data;

Performing vectorization processing on each disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient;

Based on the disease feature word vector, calculate the Mahalanobis distance between the new patient data and each piece of historical patient data in the preset disease feature database, wherein the disease feature database contains multiple disease information groups and features with similar symptoms belong to the same disease information group;

Sorting the Mahalanobis distances to obtain a sorting result;

Based on the sorting result, the matched disease information group corresponding to the new patient data is determined, wherein the disease information group respectively contains different clinical outcome information.
The method for intelligent grouping of similar patients according to claim 1, wherein before the step of acquiring new patient data to be matched, the method further comprises:

Obtain sample data containing outcome variables;

Preprocessing the sample data based on the type of the sample data to obtain a discretized word vector;

Based on the discretized word vector, respectively calculating the Mahalanobis distance between each sample in the sample data;

Clustering the sample data based on the Mahalanobis distance between each sample in the sample data to obtain a clustering result;

Based on the grouping result, acquiring multiple disease information groups included in the sample data, and extracting features of the disease information group;

Based on the characteristics of the disease information group, query a preset disease and symptom description database, and output the disease and symptom description corresponding to the characteristics of the disease information group.
The method for intelligent grouping of similar patients according to claim 1, wherein said performing vectorization processing on each disease feature data of said new patient to obtain a disease feature word vector corresponding to said new patient comprises:

Acquiring the type of the new patient data;

Based on the type of the new patient data, determine the vectorization processing corresponding to the data and execute the vectorization processing;

Wherein, the vectorization processing includes:

A. When the type of the new patient data is text data, vectorize the text data;

B. When the type of the new patient data is discrete data, perform vectorization processing on the discrete data;

C. When the type of the new patient data is continuous data, the data is not vectorized.
The method for intelligent clustering of similar patients according to claim 2, wherein the clustering of the sample data based on the Mahalanobis distance between each sample in the sample data to obtain a clustering result comprises:

Set the number of clusters to k, and randomly select k samples as the initial cluster centers;

Respectively calculating the Mahalanobis distance from each sample in the sample data to each cluster center;

Based on the Mahalanobis distance from each sample to each cluster center, selecting the minimum Mahalanobis distance for each sample and assigning each sample to the group of the cluster center with that minimum distance, until all samples in the sample data have been assigned, obtaining the first clustering result.
The method for intelligent grouping of similar patients according to claim 4, wherein, after assigning each sample to the group of the cluster center with the minimum Mahalanobis distance until all samples in the sample data have been assigned and the first clustering result is obtained, the method further comprises:

According to the Mahalanobis distance, calculate the sum of squared errors of the clusters corresponding to the first clustering result;

In the non-initial case, K non-initial clustering centers are calculated according to the clustering result generated last time;

Calculating the Mahalanobis distance from each sample in the sample data to each non-initial cluster center, selecting the minimum Mahalanobis distance for each sample, and assigning each sample to the group of the non-initial cluster center with that minimum distance, generating a new clustering result.
The method for intelligent grouping of similar patients according to claim 5, wherein, after the step of calculating the Mahalanobis distance from each sample in the sample data to each non-initial cluster center, selecting the minimum Mahalanobis distance for each sample, assigning each sample to the group of the non-initial cluster center with that minimum distance, and generating a new clustering result, the method further comprises:

Based on the Mahalanobis distance, calculate and obtain the sum of squared errors of the clusters corresponding to the new clustering results;

Comparing the sum of square errors of the clusters corresponding to the first grouping result and the sum of square errors of the clusters corresponding to the new grouping result, and obtaining a comparison result;

Based on the comparison result, selecting the one of the two grouping results whose clusters have the smaller sum of squared errors as the final grouping result.
An intelligent grouping device for similar patients, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:

Acquiring new patient data to be matched, wherein the new patient data includes multiple pieces of disease feature data;

Performing vectorization processing on each piece of disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient;

Calculating, based on the disease feature word vector, the Mahalanobis distance between the new patient data and each piece of historical patient data in a preset disease feature database, wherein the disease feature database contains multiple disease information groups and similar disease features belong to the same disease information group;

Sorting the Mahalanobis distances to obtain a sorting result;

Determining, based on the sorting result, the disease information group matched to the new patient data, wherein the disease information groups each contain different clinical outcome information.
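The matching steps can be sketched as follows. The claim does not say how the sorting result is turned into a group, so this sketch assumes a vote among the `top_n` nearest historical patients (with `top_n=1` meaning the single nearest), and it estimates the covariance from the historical database. The function name and parameters are hypothetical.

```python
from collections import Counter

import numpy as np


def match_disease_group(new_vector, history, groups, top_n=1):
    """Match a new patient to a disease information group.

    new_vector: disease feature vector of the new patient, shape (d,)
    history:    feature vectors of historical patients, shape (m, d)
    groups:     disease information group label of each historical patient
    top_n:      number of nearest historical patients that vote (hypothetical parameter)
    """
    history = np.asarray(history, dtype=float)
    # Inverse covariance estimated from the historical database (assumption);
    # pinv tolerates the singular covariance typical of one-hot features.
    VI = np.linalg.pinv(np.cov(history, rowvar=False))
    diff = history - np.asarray(new_vector, dtype=float)
    distances = np.sqrt(np.maximum(np.einsum("md,de,me->m", diff, VI, diff), 0.0))
    order = np.argsort(distances, kind="stable")       # sorting result: nearest first
    votes = Counter(groups[i] for i in order[:top_n])
    best_group = votes.most_common(1)[0][0]            # on a tie, the group met first (nearer) wins
    return best_group, order, distances
```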
The intelligent grouping device for similar patients according to claim 7, wherein the processor further implements the following steps when executing the computer-readable instructions:

Obtaining sample data containing outcome variables;

Preprocessing the sample data based on its type to obtain discretized word vectors;

Calculating, based on the discretized word vectors, the Mahalanobis distance between every pair of samples in the sample data;

Clustering the sample data based on the pairwise Mahalanobis distances to obtain a clustering result;

Acquiring, based on the clustering result, the multiple disease information groups contained in the sample data, and extracting the features of each disease information group;

Querying a preset disease and symptom description database based on the features of the disease information groups, and outputting the disease and symptom descriptions corresponding to those features.
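The pairwise-distance step, together with a simple per-group summary of the outcome variable, can be sketched in NumPy as below. Estimating the covariance from the samples themselves, using the pseudo-inverse, and summarizing each group by its size and mean outcome are assumptions for illustration.

```python
import numpy as np


def pairwise_mahalanobis(X):
    """Mahalanobis distance between every pair of samples (rows of X).

    The covariance is estimated from X itself and inverted with pinv so that
    singular covariances from one-hot columns do not raise (assumption).
    """
    X = np.asarray(X, dtype=float)
    VI = np.linalg.pinv(np.cov(X, rowvar=False))
    diff = X[:, None, :] - X[None, :, :]               # shape (n, n, d)
    d2 = np.einsum("ijd,de,ije->ij", diff, VI, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def outcome_rate_by_group(labels, outcomes):
    """Size and mean outcome (e.g. share of adverse outcomes) of each cluster."""
    labels = np.asarray(labels)
    outcomes = np.asarray(outcomes, dtype=float)
    return {
        int(g): (int(np.sum(labels == g)), float(outcomes[labels == g].mean()))
        for g in np.unique(labels)
    }
```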
The intelligent grouping device for similar patients according to claim 7, wherein the processor further implements the following steps when executing the computer-readable instructions:

Acquiring the type of the new patient data;

Determining, based on the type of the new patient data, the corresponding vectorization processing and executing it;

Wherein the vectorization processing includes:

A. When the type of the new patient data is text data, vectorize the text data;

B. When the type of the new patient data is discrete data, perform vectorization processing on the discrete data;

C. When the type of the new patient data is continuous data, the data is not vectorized.
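The type-dependent vectorization in steps A to C might look like the following pure-Python sketch. The claims speak of word vectors without naming a model, so bag-of-words counts over a fixed vocabulary stand in for text vectorization here, and one-hot encoding is assumed for discrete data; `schema` and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def vectorize_feature(value, data_type: str,
                      vocabulary: Optional[Sequence[str]] = None) -> List[float]:
    """Vectorize one disease feature according to its data type."""
    if data_type == "text":
        # A. Text: bag-of-words counts over a fixed vocabulary (stand-in for word vectors)
        tokens = str(value).lower().split()
        return [float(tokens.count(term)) for term in vocabulary]
    if data_type == "discrete":
        # B. Discrete: one-hot encoding over the known categories
        return [1.0 if value == category else 0.0 for category in vocabulary]
    if data_type == "continuous":
        # C. Continuous: used as-is, no vectorization
        return [float(value)]
    raise ValueError(f"unknown data type: {data_type!r}")


def vectorize_patient(record: Dict[str, object],
                      schema: Dict[str, Tuple[str, Optional[Sequence[str]]]]) -> List[float]:
    """Concatenate the per-feature vectors in schema order."""
    vector: List[float] = []
    for name, (data_type, vocabulary) in schema.items():
        vector.extend(vectorize_feature(record[name], data_type, vocabulary))
    return vector
```

For example, with a schema of text symptoms over `["cough", "fever", "pain"]`, a discrete sex field over `["F", "M"]` and a continuous age, the record `{"symptoms": "Fever and cough cough", "sex": "M", "age": 63}` becomes `[2.0, 1.0, 0.0, 0.0, 1.0, 63.0]`.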
The intelligent grouping device for similar patients according to claim 8, wherein the processor further implements the following steps when executing the computer-readable instructions:

Setting the number of clusters to k and randomly selecting k samples as the initial cluster centers;

Calculating the Mahalanobis distance from each sample in the sample data to each cluster center;

Based on the Mahalanobis distance from each sample to each cluster center, selecting the minimum Mahalanobis distance for each sample and assigning each sample to the cluster of the cluster center corresponding to that minimum distance, until all samples in the sample data have been assigned, thereby obtaining a first clustering result.
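The initial clustering step can be sketched as below: k distinct samples are drawn at random as centers, and every sample joins the center with the smallest Mahalanobis distance. The covariance estimate, the pseudo-inverse, and the `seed` parameter are assumptions added for illustration.

```python
import numpy as np


def initial_clustering(X, k, seed=0):
    """First clustering result: k random samples as centers, nearest-center assignment.

    Uses the Mahalanobis distance with the covariance of X (assumed estimate,
    pinv for robustness). `seed` is a hypothetical parameter for reproducibility.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    center_idx = rng.choice(len(X), size=k, replace=False)   # k distinct samples
    centers = X[center_idx]
    VI = np.linalg.pinv(np.cov(X, rowvar=False))
    diff = X[:, None, :] - centers[None, :, :]               # shape (n, k, d)
    d2 = np.einsum("nkd,de,nke->nk", diff, VI, diff)
    labels = d2.argmin(axis=1)      # argmin of squared distance equals argmin of distance
    return labels, centers, center_idx
```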
The intelligent grouping device for similar patients according to claim 10, wherein the processor further implements the following steps when executing the computer-readable instructions:

According to the Mahalanobis distance, calculate the sum of squared errors of the clusters corresponding to the first clustering result;

In each non-initial iteration, calculate k non-initial cluster centers from the clustering result generated in the previous iteration;

Calculate the Mahalanobis distance from each sample in the sample data to each non-initial cluster center, select the minimum Mahalanobis distance for each sample, and assign each sample to the cluster of the non-initial cluster center corresponding to that minimum Mahalanobis distance, thereby generating a new clustering result.
The intelligent grouping device for similar patients according to claim 11, wherein the processor further implements the following steps when executing the computer-readable instructions:

Based on the Mahalanobis distance, calculate the sum of squared errors of the clusters in the new clustering result;

Compare the sum of squared errors of the clusters in the first clustering result with that of the clusters in the new clustering result to obtain a comparison result;

Based on the comparison result, select whichever of the two clustering results has the smaller sum of squared errors as the final clustering result.
A computer-readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the following steps:

Acquiring new patient data to be matched, wherein the new patient data includes multiple pieces of disease feature data;

Performing vectorization processing on each piece of disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient;

Calculating, based on the disease feature word vector, the Mahalanobis distance between the new patient data and each piece of historical patient data in a preset disease feature database, wherein the disease feature database contains multiple disease information groups and similar disease features belong to the same disease information group;

Sorting the Mahalanobis distances to obtain a sorting result;

Determining, based on the sorting result, the disease information group matched to the new patient data, wherein the disease information groups each contain different clinical outcome information.
The computer-readable storage medium according to claim 13, wherein the computer instructions, when executed on the computer, further cause the computer to perform the following steps:

Obtaining sample data containing outcome variables;

Preprocessing the sample data based on its type to obtain discretized word vectors;

Calculating, based on the discretized word vectors, the Mahalanobis distance between every pair of samples in the sample data;

Clustering the sample data based on the pairwise Mahalanobis distances to obtain a clustering result;

Acquiring, based on the clustering result, the multiple disease information groups contained in the sample data, and extracting the features of each disease information group;

Querying a preset disease and symptom description database based on the features of the disease information groups, and outputting the disease and symptom descriptions corresponding to those features.
The computer-readable storage medium according to claim 13, wherein the computer instructions, when executed on the computer, further cause the computer to perform the following steps:

Acquiring the type of the new patient data;

Determining, based on the type of the new patient data, the corresponding vectorization processing and executing it;

Wherein the vectorization processing includes:

A. When the type of the new patient data is text data, vectorize the text data;

B. When the type of the new patient data is discrete data, perform vectorization processing on the discrete data;

C. When the type of the new patient data is continuous data, the data is not vectorized.
The computer-readable storage medium according to claim 14, wherein the computer instructions, when executed on the computer, further cause the computer to perform the following steps:

Setting the number of clusters to k and randomly selecting k samples as the initial cluster centers;

Calculating the Mahalanobis distance from each sample in the sample data to each cluster center;

Based on the Mahalanobis distance from each sample to each cluster center, selecting the minimum Mahalanobis distance for each sample and assigning each sample to the cluster of the cluster center corresponding to that minimum distance, until all samples in the sample data have been assigned, thereby obtaining a first clustering result.
The computer-readable storage medium according to claim 16, wherein the computer instructions, when executed on the computer, further cause the computer to perform the following steps:

According to the Mahalanobis distance, calculate the sum of squared errors of the clusters corresponding to the first clustering result;

In each non-initial iteration, calculate k non-initial cluster centers from the clustering result generated in the previous iteration;

Calculate the Mahalanobis distance from each sample in the sample data to each non-initial cluster center, select the minimum Mahalanobis distance for each sample, and assign each sample to the cluster of the non-initial cluster center corresponding to that minimum Mahalanobis distance, thereby generating a new clustering result.
The computer-readable storage medium according to claim 17, wherein the computer instructions, when executed on the computer, further cause the computer to perform the following steps:

Based on the Mahalanobis distance, calculate the sum of squared errors of the clusters in the new clustering result;

Compare the sum of squared errors of the clusters in the first clustering result with that of the clusters in the new clustering result to obtain a comparison result;

Based on the comparison result, select whichever of the two clustering results has the smaller sum of squared errors as the final clustering result.
An intelligent grouping device for similar patients, comprising:

A first acquisition module, configured to acquire new patient data to be matched, wherein the new patient data includes multiple pieces of disease feature data;

A first processing module, configured to perform vectorization processing on each piece of disease feature data of the new patient to obtain a disease feature word vector corresponding to the new patient;

A first calculation module, configured to calculate, based on the disease feature word vector, the Mahalanobis distance between the new patient data and each piece of historical patient data in a preset disease feature database, wherein the disease feature database contains multiple disease information groups and similar disease features belong to the same disease information group;

A sorting module, configured to sort the Mahalanobis distances to obtain a sorting result;

A determining module, configured to determine, based on the sorting result, the disease information group matched to the new patient data, wherein the disease information groups each contain different clinical outcome information.
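One way to mirror this module decomposition in code is a class with one method per module, as in the hypothetical sketch below. The class name, the caller-supplied `vectorize` function, and the nearest-neighbour rule in `determine` are assumptions; the claim itself only names the modules.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

import numpy as np


@dataclass
class SimilarPatientGrouper:
    """Hypothetical sketch: one method per module of the device."""
    history: np.ndarray                        # historical feature vectors, shape (m, d)
    groups: Sequence[str]                      # disease information group of each historical patient
    vectorize: Callable[[dict], List[float]]   # supplied by the caller (assumption)

    def acquire(self, record: dict) -> dict:              # first acquisition module
        return dict(record)

    def process(self, record: dict) -> np.ndarray:        # first processing module
        return np.asarray(self.vectorize(record), dtype=float)

    def calculate(self, vector: np.ndarray) -> np.ndarray:   # first calculation module
        VI = np.linalg.pinv(np.cov(self.history, rowvar=False))
        diff = self.history - vector
        return np.sqrt(np.maximum(np.einsum("md,de,me->m", diff, VI, diff), 0.0))

    def sort(self, distances: np.ndarray) -> np.ndarray:  # sorting module
        return np.argsort(distances, kind="stable")

    def determine(self, order: np.ndarray) -> str:        # determining module (nearest neighbour, assumption)
        return self.groups[int(order[0])]

    def run(self, record: dict) -> str:
        vector = self.process(self.acquire(record))
        return self.determine(self.sort(self.calculate(vector)))
```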
The intelligent grouping device for similar patients according to claim 19, further comprising:

A sample data acquisition module, configured to acquire sample data containing outcome variables;

A second processing module, configured to preprocess the sample data based on its type to obtain discretized word vectors;

A second calculation module, configured to calculate, based on the discretized word vectors, the Mahalanobis distance between every pair of samples in the sample data;

A clustering module, configured to cluster the sample data based on the pairwise Mahalanobis distances to obtain a clustering result;

An extraction module, configured to acquire, based on the clustering result, the multiple disease information groups contained in the sample data, and extract the features of each disease information group;

A query module, configured to query a preset disease and symptom description database based on the features of the disease information groups, and output the disease and symptom descriptions corresponding to those features.
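The extraction and query modules are described only at a high level. The sketch below assumes that a group's characteristics are the features whose group mean most exceeds the overall mean, and that the description database can be modelled as a dictionary keyed by sets of feature names. All names, the `top_m` parameter, and the toy data in the usage note are hypothetical.

```python
import numpy as np


def extract_group_features(X, labels, feature_names, top_m=2):
    """For each group, the feature names whose group mean most exceeds the overall mean.

    This definition of a group's characteristics is an assumption for illustration.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall = X.mean(axis=0)
    features = {}
    for g in np.unique(labels):
        lift = X[labels == g].mean(axis=0) - overall    # how much the group over-expresses each feature
        top = np.argsort(-lift, kind="stable")[:top_m]
        features[int(g)] = frozenset(feature_names[i] for i in top)
    return features


def query_descriptions(features, description_db, default="no matching description"):
    """Look up each group's characteristic features in a description database (a dict here)."""
    return {g: description_db.get(key, default) for g, key in features.items()}
```

With features `["fever", "cough", "rash"]` and two toy groups dominated by fever and by rash respectively, group 0 yields `frozenset({"fever", "cough"})` and group 1 yields `frozenset({"rash", "cough"})`; only keys present in the dictionary return a description.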