CN113793667A - Disease prediction method and device based on cluster analysis and computer equipment - Google Patents

Disease prediction method and device based on cluster analysis and computer equipment

Info

Publication number
CN113793667A
Authority
CN
China
Prior art keywords
clustering
matrix
patient
pathological
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111086515.4A
Other languages
Chinese (zh)
Inventor
徐啸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202111086515.4A
Publication of CN113793667A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The application discloses a disease prediction method and device based on cluster analysis, and computer equipment, and relates to the technical field of big data processing. It addresses the technical problem that current cluster analysis approaches cannot effectively combine patient information with pathological feature information, so that clustering is insufficiently accurate and inefficient and cannot provide effective data support for disease prediction. The method comprises the following steps: acquiring sample pathological data, and constructing a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject, and the column attribute of the cluster analysis matrix is a pathological feature; performing dimensionality reduction on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix; determining a patient clustering result and a pathological feature clustering result from the patient clustering matrix and the pathological feature clustering matrix, respectively; and determining a target predicted disease according to the patient clustering result and the pathological feature clustering result.

Description

Disease prediction method and device based on cluster analysis and computer equipment
Technical Field
The present application relates to the field of big data processing technologies, and in particular, to a disease prediction method and apparatus based on cluster analysis, and a computer device.
Background
With the rapid development of medical and electronic information technology, a patient's electronic medical record and historical visit information can now be stored in full in a hospital's medical record system. While recording each patient's condition and course of treatment, these records also give doctors data for diagnosing and treating patients of the same type.
Patients with similar pathological data are more likely to have the same class of disease; conversely, patients with the same class of disease often have similar pathological data. Cluster analysis of each patient's historical pathological data records can therefore reveal which patients may have similar diseases, and which physiological data share similar characteristics and are associated with those diseases.
The existing method treats the cluster analysis of pathological data and the cluster analysis of patient groups as two independent tasks: it first finds related pathological features from the similarity of the pathological data, and then clusters the patients according to the similarity of their pathological data. This ignores the relationship between patient clusters and pathological feature clusters. Pathological features and patients cannot be clustered simultaneously, and patient information cannot be effectively combined with pathological feature information, so the clustering result is insufficiently accurate, clustering efficiency is low, and effective data support cannot be provided for disease prediction.
Disclosure of Invention
In view of this, the present application provides a disease prediction method and apparatus based on cluster analysis, and a computer device, which can be used to solve the technical problem that the current cluster analysis method cannot effectively combine patient information and pathological feature information, which results in inaccurate clustering effect and low clustering efficiency, and further cannot provide effective data support for disease prediction.
According to an aspect of the present application, there is provided a method for predicting a disease based on cluster analysis, the method including:
acquiring sample pathological data, and constructing a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject, and the column attribute of the cluster analysis matrix is a pathological feature;
performing dimensionality reduction processing on the clustering analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix;
respectively determining a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix;
and determining a target prediction disease according to the patient clustering result and the pathological feature clustering result.
According to another aspect of the present application, there is provided a disease prediction apparatus based on cluster analysis, the apparatus including:
a construction module, configured to acquire sample pathological data and construct a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject, and the column attribute of the cluster analysis matrix is a pathological feature;
a processing module, configured to perform dimensionality reduction on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix;
a first determining module, configured to determine a patient clustering result and a pathological feature clustering result from the patient clustering matrix and the pathological feature clustering matrix, respectively;
and a second determining module, configured to determine a target predicted disease according to the patient clustering result and the pathological feature clustering result.
According to yet another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above cluster analysis-based disease prediction method.
According to yet another aspect of the present application, there is provided a computer device comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, the processor implementing the above-mentioned cluster analysis-based disease prediction method when executing the program.
With the above technical solution, and in contrast to current disease prediction approaches, the disease prediction method, device, and computer equipment based on cluster analysis first construct a cluster analysis matrix from sample pathological data and perform dimensionality reduction on it according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix. A patient clustering result and a pathological feature clustering result are then determined from the patient clustering matrix and the pathological feature clustering matrix, respectively, and a target predicted disease is determined according to these two results. With the technical solution of this application, cluster analysis of patients and of pathological features yields both clustering results in a single step, which improves clustering efficiency. Because the mutual relationship and influence between patient information and pathological feature information are also taken into account, the clustering result can be more accurate, which provides stronger data support for disease prediction.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the application can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the application are easier to understand, specific embodiments of the application are set out below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application to the disclosed embodiment. In the drawings:
Fig. 1 is a schematic flow chart of a disease prediction method based on cluster analysis according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of another disease prediction method based on cluster analysis according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a disease prediction apparatus based on cluster analysis according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of another disease prediction apparatus based on cluster analysis according to an embodiment of the present application.
Detailed Description
According to the embodiments of the application, disease prediction can be carried out on the basis of blockchain technology. Specifically, the sample pathological data and the pathological data of the target patient can be stored in nodes of a blockchain, so as to ensure the privacy and security of the medical data. The blockchain referred to in the application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
To address the technical problem that current cluster analysis approaches cannot effectively combine patient information with pathological feature information, so that clustering is insufficiently accurate and inefficient and cannot provide effective data support for disease prediction, the application provides a disease prediction method based on cluster analysis. As shown in Fig. 1, the method comprises the following steps:
101. Acquiring sample pathological data, and constructing a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject, and the column attribute of the cluster analysis matrix is a pathological feature.
The sample pathological data is medical data that has the same feature dimensions as the pathological data of the patient whose disease is to be predicted, such as age, sex, height, weight, number of hospitalization days, clinical diagnosis, disease symptoms, examination indexes, surgery, disease severity, and cost. The cluster analysis matrix is the binary matrix obtained by binarizing the sample pathological data. In the cluster analysis matrix, the row attributes are patient subjects and the column attributes are pathological features. The pathological data of each sample patient forms one row, an m-dimensional numerical vector, so that the records of n patients form an n × m cluster analysis matrix $R_{n \times m}$. Each row holds the pathological data of one sample patient across all m pathological features, and each column holds the values of one pathological feature across the different patients.
The executing entity of the application may be a disease prediction apparatus, which can be deployed on a client or on a server. It first constructs a cluster analysis matrix from sample pathological data and performs dimensionality reduction on it according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix. It then determines a patient clustering result and a pathological feature clustering result from these two matrices, respectively, and determines a target predicted disease according to the two clustering results.
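As an illustration of the matrix described above, the following minimal sketch builds a 0/1 cluster analysis matrix with patients as rows and features as columns. The function name `build_cluster_analysis_matrix`, the feature names, and the toy records are hypothetical; the source does not specify how individual features are binarized, so the records here are assumed to be already reduced to sets of present features.

```python
import numpy as np

def build_cluster_analysis_matrix(records, feature_names):
    """Return an n x m 0/1 matrix: rows are patients, columns are features."""
    R = np.zeros((len(records), len(feature_names)), dtype=np.int8)
    column_of = {name: j for j, name in enumerate(feature_names)}
    for i, present in enumerate(records):
        for name in present:
            R[i, column_of[name]] = 1  # feature is present for patient i
    return R

# Hypothetical toy data: each record lists the binarized features a patient exhibits.
feature_names = ["age_over_60", "male", "fever", "cough", "surgery"]
records = [
    {"age_over_60", "fever", "cough"},
    {"male", "surgery"},
    {"fever", "cough"},
]
R = build_cluster_analysis_matrix(records, feature_names)
print(R)
```

Here n = 3 and m = 5, so `R` has shape (3, 5).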
102. And performing dimensionality reduction processing on the clustering analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix.
The preset matrix decomposition algorithm may be solved by stochastic gradient descent and/or alternating least squares (ALS). The principle of matrix decomposition is to factor one matrix into a product of several matrices such that the product approximates the original matrix as closely as possible.
In a specific application scenario, after the cluster analysis matrix $R_{n \times m}$ is obtained, an existing matrix decomposition algorithm can be applied to perform dimensionality-reduction decomposition on it, yielding a patient clustering matrix $P_{n \times k}$, a pathological feature clustering matrix $Q_{l \times m}$, and a relation feature matrix $E_{k \times l}$. The patient clustering matrix $P_{n \times k}$ characterizes the clustering result of the patients, and the pathological feature clustering matrix $Q_{l \times m}$ characterizes the clustering result of the pathological features. The relation feature matrix $E_{k \times l}$ is an intermediate matrix that captures the feature relationship shared between $P_{n \times k}$ and $Q_{l \times m}$. Including $E_{k \times l}$ in the decomposition yields more accurate clustering results from $P_{n \times k}$ and $Q_{l \times m}$, so that the product of $P_{n \times k}$, $E_{k \times l}$, and $Q_{l \times m}$ approximates the cluster analysis matrix as closely as possible. For this embodiment, once $P_{n \times k}$, $Q_{l \times m}$, and $E_{k \times l}$ satisfying the constraint of the matrix decomposition algorithm have been obtained, $P_{n \times k}$ and $Q_{l \times m}$ can be extracted from the decomposition result, so that patient clustering and pathological feature clustering are analyzed simultaneously on the basis of these two matrices.
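The decomposition described above can be written compactly as an optimization problem. This is a sketch inferred from the text and the dimensions of the three factors; the source does not state any additional constraints (for example non-negativity), so none are shown.

```latex
\min_{P,\,E,\,Q}\; \left\| R_{n \times m} - P_{n \times k}\, E_{k \times l}\, Q_{l \times m} \right\|_F^2
```

The factor order $P\,E\,Q$ is the only one in which the inner dimensions ($k$ and $l$) match, giving an $n \times m$ product comparable with $R_{n \times m}$.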
103. And respectively determining a patient clustering result and a pathological characteristic clustering result by using the patient clustering matrix and the pathological characteristic clustering matrix.
For the present embodiment, when the patient clustering matrix $P_{n \times k}$ is used to determine the patient clustering result for each patient, note that $P_{n \times k}$ has n rows and k columns, where each row represents the patient identity features (such as age, sex, occupation, height, and weight) of one patient and the k columns represent the patient cluster classes. The patient clustering result for each patient can therefore be taken as the column that holds the maximum value in that patient's row of $P_{n \times k}$. Similarly, when the pathological feature clustering matrix $Q_{l \times m}$ is used to determine the clustering result for each pathological feature, note that $Q_{l \times m}$ has l rows and m columns, where the l rows represent the pathological feature cluster classes and each column represents one pathological feature (such as number of hospitalization days, clinical diagnosis, symptoms, examination indexes, surgery, disease severity, and cost). The pathological feature clustering result for each feature can therefore be taken as the row that holds the maximum value in that feature's column of $Q_{l \times m}$.
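The maximum-value rule above amounts to a row-wise argmax over the patient clustering matrix and a column-wise argmax over the pathological feature clustering matrix. The sketch below uses small hypothetical matrices whose values are invented purely for illustration.

```python
import numpy as np

# Hypothetical factor matrices (values invented for illustration only).
P = np.array([[0.9, 0.1],
              [0.2, 0.7],
              [0.8, 0.3]])             # n=3 patients, k=2 patient clusters
Q = np.array([[0.6, 0.1, 0.7, 0.2],
              [0.3, 0.8, 0.1, 0.9]])   # l=2 feature clusters, m=4 features

# Patient i belongs to the cluster whose column holds the largest value in row i.
patient_clusters = np.argmax(P, axis=1)
# Feature j belongs to the cluster whose row holds the largest value in column j.
feature_clusters = np.argmax(Q, axis=0)

print(patient_clusters)   # [0 1 0]
print(feature_clusters)   # [0 1 0 1]
```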
104. And determining a target prediction disease according to the patient clustering result and the pathological feature clustering result.
In a specific application scenario, after the patient clustering result and the pathological feature clustering result are determined, a disease condition knowledge base can optionally be created from them. Through this knowledge base, the two clustering results can be applied during the disease treatment stage to clinical pathological diagnosis, online pathological diagnosis, and customization of a treatment plan for a new patient, as well as to other feasible medical scenarios such as predicting the treatment effect and treatment cost that may result when a new patient follows a clinical pathway.
For this embodiment, the patient clustering result and the pathological feature clustering result can also be combined with user portrait technology to predict, from a target patient's pathological data, the potential or overt disease type of that patient. User portrait technology builds a mathematical model of real-world users from user data. Its core task is to label users, where each label is a highly refined feature identifier obtained by analyzing user information, so that analysis and decisions about users can be made on the basis of labels. For this embodiment, after the patient clustering result and the pathological feature clustering result are determined, the clustering information corresponding to the target patient's pathological data is determined from them, and a target user portrait of the target patient is then determined from that clustering information. The preset disease with the highest feature similarity to the target user portrait is selected from a preset disease list and taken as the target predicted disease for the target patient.
With the disease prediction method based on cluster analysis in this embodiment, a cluster analysis matrix is constructed from sample pathological data, and dimensionality reduction is performed on it according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix. A patient clustering result and a pathological feature clustering result are then determined from these two matrices, respectively, and a target predicted disease is determined according to the two results. With the technical solution of this application, cluster analysis of patients and of pathological features yields both clustering results in a single step, which improves clustering efficiency. Because the mutual relationship and influence between patient information and pathological feature information are also taken into account, the clustering result can be more accurate, which provides stronger data support for disease prediction.
Further, as a refinement and an extension of the specific implementation of the above embodiment, in order to fully illustrate the specific implementation process in this embodiment, another disease prediction method based on cluster analysis is provided, as shown in fig. 2, the method includes:
201. Acquiring sample pathological data, and constructing a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject, and the column attribute of the cluster analysis matrix is a pathological feature.
For this embodiment, 0-1 binarization may be applied to the sample pathological data to obtain a binary matrix, namely the cluster analysis matrix $R_{n \times m}$. The pathological data of each sample patient forms one row, an m-dimensional numerical vector, so that the records of n patients form an n × m cluster analysis matrix. Accordingly, when constructing the cluster analysis matrix from the sample pathological data, step 201 of this embodiment may specifically include: binarizing the sample pathological data to obtain the pathological features and the patient identity features of each patient subject; and constructing the cluster analysis matrix from the pathological features and the patient identity features, so that its row attribute is a patient subject and its column attribute is a pathological feature.
202. Decomposing the cluster analysis matrix, according to a preset matrix decomposition algorithm, into a product of the patient clustering matrix, the relation feature matrix, and the pathological feature clustering matrix, wherein the Frobenius norm of the difference between the cluster analysis matrix and the product is smaller than a preset threshold.
The patient clustering matrix $P_{n \times k}$ has n rows and k columns, where the row attribute is the patient identity feature and the column attribute is the patient cluster class. The pathological feature clustering matrix $Q_{l \times m}$ has l rows and m columns, where the row attribute is the pathological feature cluster class and the column attribute is the pathological feature. The relation feature matrix $E_{k \times l}$ has k rows and l columns, where the row attribute is the patient cluster class and the column attribute is the pathological feature cluster class.
For the present embodiment, when dimensionality-reduction decomposition is performed on the cluster analysis matrix $R_{n \times m}$, matrix parameter values may be set in advance for the factor matrices; for example, these may include the values of n, k, l, and m and the preset value range of the elements in each matrix. The patient clustering matrix $P_{n \times k}$, the pathological feature clustering matrix $Q_{l \times m}$, and the relation feature matrix $E_{k \times l}$ are then created according to these parameter values, and each of their elements is initialized with a random constant. Next, the existing stochastic gradient descent method or alternating least squares method is used to iteratively update $P_{n \times k}$, $Q_{l \times m}$, and $E_{k \times l}$ under a Frobenius norm constraint. Training is judged complete when the loss function converges. At that point, the Frobenius norm of the difference between the cluster analysis matrix $R_{n \times m}$ and the product of the three matrices $P_{n \times k}$, $E_{k \times l}$, and $Q_{l \times m}$ is minimal, that is, the product of $P_{n \times k}$, $E_{k \times l}$, and $Q_{l \times m}$ best recovers the cluster analysis matrix $R_{n \times m}$.
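The training procedure above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it uses plain full-batch gradient descent in place of the stochastic gradient descent or alternating least squares named in the text, and the function name `tri_factorize`, the learning rate, the tolerance, and the toy matrix are all assumptions.

```python
import numpy as np

def tri_factorize(R, k, l, lr=0.01, max_iter=5000, tol=1e-8, seed=0):
    """Approximate R (n x m) by P (n x k) @ E (k x l) @ Q (l x m)."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    # Random initialization of every element, as described in the text.
    P = rng.uniform(0.0, 1.0, size=(n, k))
    E = rng.uniform(0.0, 1.0, size=(k, l))
    Q = rng.uniform(0.0, 1.0, size=(l, m))
    prev_loss = np.inf
    for _ in range(max_iter):
        D = P @ E @ Q - R                 # residual of the current product
        loss = float(np.sum(D ** 2))      # squared Frobenius norm of the residual
        if abs(prev_loss - loss) < tol:   # treat a stalled loss as convergence
            break
        prev_loss = loss
        # Gradients of the squared Frobenius loss with respect to each factor.
        grad_P = 2.0 * D @ (E @ Q).T
        grad_E = 2.0 * P.T @ D @ Q.T
        grad_Q = 2.0 * (P @ E).T @ D
        P -= lr * grad_P
        E -= lr * grad_E
        Q -= lr * grad_Q
    final_loss = float(np.sum((P @ E @ Q - R) ** 2))
    return P, E, Q, final_loss

# Hypothetical toy matrix with two obvious patient groups and two feature groups.
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
P, E, Q, loss = tri_factorize(R, k=2, l=2)
print(P.shape, E.shape, Q.shape, loss)
```

All three factors are updated together in each iteration. An alternating scheme such as ALS would instead fix two factors and solve for the third in turn.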
The Frobenius norm is characterized by the formula:
$$\|R\|_F^2 = \sum_{i}\sum_{j} |r_{ij}|^2$$
where $\|R\|_F$ is the Frobenius norm of the matrix $R$, whose square is the sum of the squares of the absolute values of the elements $r_{ij}$ of $R$. In the present application, the matrix $R$ in this formula is the residual $R_{n \times m} - P_{n \times k} E_{k \times l} Q_{l \times m}$.
203. And respectively determining a patient clustering result and a pathological characteristic clustering result by using the patient clustering matrix and the pathological characteristic clustering matrix.
As an optional manner for this embodiment, step 203 may specifically include: for each patient identity feature (row) in the patient clustering matrix, finding the column with the maximum value and taking the cluster class of that column as the patient clustering result for that patient; and for each pathological feature (column) in the pathological feature clustering matrix, finding the row with the maximum value and taking the cluster class of that row as the pathological feature clustering result for that feature.
204. Acquiring pathological data of a target patient, and extracting patient identity data and pathological feature data from the pathological data of the target patient.
Wherein, the target patient is a patient subject to be subjected to disease detection according to pathological data of the patient.
For the embodiment, after obtaining the pathological data of the target patient, the patient identification data and the pathological feature data may be further extracted from the pathological data of the patient according to a preset keyword or feature data extraction template, where the patient identification data may include age, sex, work, height, weight, and the like, and the pathological feature data may include days of stay, clinical diagnosis, symptoms, examination indexes, surgery, severity of disease, cost, and the like.
205. And determining first clustering information corresponding to the patient identity data according to the patient clustering result, and determining second clustering information corresponding to the pathological feature data according to the pathological feature clustering result.
The patient clustering result includes the clustering result corresponding to each patient identity, and the pathological feature clustering result includes the clustering result corresponding to each pathological feature. In this embodiment, the first clustering information corresponding to the patient identity data can therefore be determined from the patient clustering result, and the second clustering information corresponding to the pathological feature data can be determined from the pathological feature clustering result. The first clustering information reflects the population attribute classification of the target patient, and the second clustering information reflects the attribute classification of the target patient's pathological features. Because the clustering results in both dimensions are obtained in a single step, the disease can then be analyzed across multiple dimensions, which supports more accurate disease prediction.
206. Determining, based on the user portrait technology, a target predicted disease that matches the first clustering information and the second clustering information.
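The source does not spell out how a new target patient is mapped onto the existing clusters, so the following is only one possible sketch under stated assumptions. It assumes the target patient is encoded as a binary vector over the same m features, treats each row of the product of the relation feature matrix and the pathological feature clustering matrix as a prototype for one patient cluster, and picks the nearest prototype. The function name `assign_target_patient` and all values are hypothetical.

```python
import numpy as np

def assign_target_patient(r_new, E, Q, feature_clusters):
    """Return (first clustering info, second clustering info) for one patient."""
    # Assumption: row c of E @ Q is the feature-space profile of patient cluster c.
    prototypes = E @ Q
    distances = np.sum((prototypes - r_new) ** 2, axis=1)
    first_info = int(np.argmin(distances))       # nearest patient cluster
    # Feature clusters of the features that are present for this patient.
    present = np.flatnonzero(r_new)
    second_info = sorted({int(feature_clusters[j]) for j in present})
    return first_info, second_info

# Hypothetical factors and labels (values invented for illustration only).
E = np.eye(2)
Q = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
feature_clusters = np.argmax(Q, axis=0)          # [0, 0, 1, 1]
r_new = np.array([0, 0, 1, 0])                   # target patient shows only feature 2

first_info, second_info = assign_target_patient(r_new, E, Q, feature_clusters)
print(first_info, second_info)   # 1 [1]
```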
For this embodiment, when determining the target predicted disease matched with the first clustering information and the second clustering information, as an optional implementation, the step 206 of this embodiment may specifically include: generating a target user portrait of the target patient according to the first clustering information and the second clustering information; and screening a preset disease with the highest similarity with the target user portrait characteristics in a preset disease list, and taking the preset disease as a target prediction disease.
In a specific application scenario, after the first clustering information and the second clustering information of the target patient are determined, each of them can be used as an independent portrait dimension to construct the target user portrait. Accordingly, when generating the target user portrait of the target patient according to the first clustering information and the second clustering information, as an optional implementation, the embodiment may specifically include: extracting a first feature tag of the first clustering information and a second feature tag of the second clustering information according to a preset tag extraction rule; and generating, according to the first feature tag and the second feature tag, a feature tag set of the target patient as the target user portrait of the target patient.
Correspondingly, a preset disease list may be created in advance. The list stores a plurality of preset diseases constructed from different user portraits, and each preset disease is configured with a feature tag set according to the user portrait it matches. To determine the target predicted disease from the target user portrait, feature similarity may be calculated between the target user portrait and the user portraits configured in the preset disease list. Specifically, the feature similarity between the feature tag set of the target patient and the preset feature tag set of each preset disease in the preset disease list may be calculated, and the preset disease with the highest feature similarity to the target user portrait is then selected from the list as the target predicted disease. As an optional implementation, this step may specifically include: calculating, by using a preset feature distance calculation formula, the feature similarity between the feature tag set of the target patient and the preset feature tag set of each preset disease in the preset disease list, to obtain the feature similarity between the target user portrait and each preset disease; and determining the preset disease with the highest feature similarity to the target user portrait as the target predicted disease. The preset feature distance calculation formula may be any distance function suitable for the measurement, such as the Euclidean distance, the Manhattan distance, the Jaccard distance, or the Mahalanobis distance.
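The matching of a feature tag set against the preset disease list can be illustrated with a minimal Python sketch. It assumes the Jaccard measure (one of the distance functions named above) applied to plain sets of tags; the function names, the tag strings, and the two example diseases are hypothetical and do not come from the application.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined here as 1.0 for two empty sets."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def predict_disease(target_tags: set, preset_disease_list: dict) -> tuple:
    """Return (disease, similarity) for the preset disease whose tag set
    is most similar to the target user portrait."""
    scores = {
        disease: jaccard_similarity(target_tags, tags)
        for disease, tags in preset_disease_list.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical preset disease list: each disease maps to its preset feature tag set.
preset_disease_list = {
    "disease_A": {"patient_cluster_0", "feature_cluster_1", "feature_cluster_2"},
    "disease_B": {"patient_cluster_1", "feature_cluster_0"},
}

# Hypothetical target user portrait built from the first and second clustering information.
target_portrait = {"patient_cluster_0", "feature_cluster_1"}

best_disease, best_score = predict_disease(target_portrait, preset_disease_list)
print(best_disease, round(best_score, 3))
```

A set-based measure is used here only because the portrait is described as a tag set; the Euclidean, Manhattan, or Mahalanobis distance would require the tags to be encoded as numeric vectors first.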
With the disease prediction method based on cluster analysis described above, a cluster analysis matrix is constructed based on sample pathological data, and dimension reduction processing is performed on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix; a patient clustering result and a pathological feature clustering result are then determined by using the patient clustering matrix and the pathological feature clustering matrix respectively, and a target predicted disease is determined according to the patient clustering result and the pathological feature clustering result. The technical solution applies a matrix decomposition algorithm and a user profile technique to realize intelligent disease prediction. When cluster analysis is performed on patients and pathological features, both clustering results are obtained simultaneously in a single step, which improves clustering efficiency. In addition, because the interrelation and mutual influence between patient information and pathological feature information are taken into account, the clustering results can be more accurate, which provides strong data support for disease prediction.
Further, as a specific implementation of the method shown in fig. 1 and fig. 2, an embodiment of the present application provides a disease prediction apparatus based on cluster analysis, as shown in fig. 3, the apparatus includes: a construction module 31, a processing module 32, a first determination module 33, a second determination module 34;
the construction module 31 is configured to acquire sample pathological data and construct a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject, and the column attribute of the cluster analysis matrix is a pathological feature;
the processing module 32 is configured to perform dimension reduction processing on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix;
the first determination module 33 is configured to determine a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix, respectively;
and the second determination module 34 is configured to determine a target predicted disease according to the patient clustering result and the pathological feature clustering result.
In a specific application scenario, when constructing the cluster analysis matrix according to the sample pathological data, as shown in fig. 4, the construction module 31 may specifically include: a processing unit 311, a construction unit 312;
the processing unit 311 is configured to perform binarization processing on the sample pathological data to obtain the pathological features and the patient identity features of each patient subject;
the construction unit 312 is configured to construct the cluster analysis matrix by using the pathological features and the patient identity features, so that the row attribute of the cluster analysis matrix is the patient subject and the column attribute of the cluster analysis matrix is the pathological feature.
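The binarization and matrix construction performed by units 311 and 312 can be sketched as follows. This is only an illustration under stated assumptions: the sample pathological data is assumed to be a mapping from a patient identifier to the set of pathological features observed for that patient, and the function name, identifiers, and feature names are hypothetical.

```python
def build_cluster_analysis_matrix(records: dict) -> tuple:
    """Build a 0/1 matrix whose rows are patient subjects and whose
    columns are pathological features.

    records maps a patient identifier to the set of features observed
    for that patient (an assumed input format).
    """
    patients = sorted(records)
    features = sorted({f for feats in records.values() for f in feats})
    column_of = {f: j for j, f in enumerate(features)}

    matrix = [[0] * len(features) for _ in patients]
    for i, patient in enumerate(patients):
        for feature in records[patient]:
            matrix[i][column_of[feature]] = 1  # binarization: feature present
    return matrix, patients, features


# Hypothetical sample pathological data.
sample_records = {
    "patient_1": {"fever", "cough"},
    "patient_2": {"cough"},
    "patient_3": {"rash", "fever"},
}

matrix, patients, features = build_cluster_analysis_matrix(sample_records)
for patient, row in zip(patients, matrix):
    print(patient, row)
```

Sorting the patients and features only fixes a reproducible row and column order; any stable ordering would serve equally well.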
Correspondingly, when performing dimension reduction processing on the cluster analysis matrix according to the preset matrix decomposition algorithm to obtain the patient clustering matrix and the pathological feature clustering matrix, the processing module 32 may be specifically configured to: decompose the cluster analysis matrix into a product of a patient clustering matrix, a pathological feature clustering matrix and a relation feature matrix according to the preset matrix decomposition algorithm, such that the Frobenius norm of the difference between the product and the cluster analysis matrix is smaller than a preset threshold; wherein the row attribute of the patient clustering matrix is a patient identity feature, the column attribute of the patient clustering matrix is a patient clustering category, the row attribute of the pathological feature clustering matrix is a pathological feature clustering category, the column attribute of the pathological feature clustering matrix is a pathological feature, the row attribute of the relation feature matrix is a patient clustering category, and the column attribute of the relation feature matrix is a pathological feature clustering category.
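One possible realization of this decomposition is sketched below with NumPy. The row and column attributes listed above imply the shapes X (n x m) ≈ F (n x k) · S (k x l) · G (l x m), where F is the patient clustering matrix, S the relation feature matrix, and G the pathological feature clustering matrix. The application does not name the specific algorithm here, so the non-negative multiplicative update rules, the symbols F, S, G, the function name, and the default parameters are all assumptions made for illustration.

```python
import numpy as np


def tri_factorize(X, k, l, threshold=1e-2, max_iter=500, seed=0):
    """Approximate X (n x m) by F @ S @ G with non-negative factors.

    F: n x k, S: k x l, G: l x m. Iteration stops once the Frobenius
    norm of X - F @ S @ G falls below `threshold` or after `max_iter`
    rounds. The multiplicative updates are a swapped-in, standard choice,
    not a rule taken from the application.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k)) + 0.1
    S = rng.random((k, l)) + 0.1
    G = rng.random((l, m)) + 0.1
    eps = 1e-9  # guards against division by zero

    err = np.linalg.norm(X - F @ S @ G, ord="fro")
    for _ in range(max_iter):
        if err < threshold:
            break
        F *= (X @ G.T @ S.T) / (F @ S @ G @ G.T @ S.T + eps)
        S *= (F.T @ X @ G.T) / (F.T @ F @ S @ G @ G.T + eps)
        G *= (S.T @ F.T @ X) / (S.T @ F.T @ F @ S @ G + eps)
        err = np.linalg.norm(X - F @ S @ G, ord="fro")
    return F, S, G, err


# Hypothetical binarized cluster analysis matrix: 4 patients x 4 features.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

F, S, G, err = tri_factorize(X, k=2, l=2)
print(F.shape, S.shape, G.shape)
```

The numbers of patient clusters k and feature clusters l are chosen by the caller; the result depends on the random initialization, so the seed is fixed for reproducibility.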
In a specific application scenario, when the patient clustering matrix and the pathological feature clustering matrix are used to determine the patient clustering result and the pathological feature clustering result respectively, as shown in fig. 4, the first determination module 33 may specifically include: a first extraction unit 331, a second extraction unit 332;
the first extraction unit 331 is configured to extract, for each patient identity feature in the patient clustering matrix, the maximum value among its column attributes, and determine the patient clustering category corresponding to that maximum value as the patient clustering result of the patient identity feature;
the second extraction unit 332 is configured to extract, for each pathological feature in the pathological feature clustering matrix, the maximum value among its row attributes, and determine the pathological feature clustering category corresponding to that maximum value as the pathological feature clustering result of the pathological feature.
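The two extraction units amount to an arg-max over each row of the patient clustering matrix and over each column of the pathological feature clustering matrix. A minimal sketch, assuming both matrices are given as nested lists and using hypothetical function names and values:

```python
def patient_cluster_results(patient_matrix):
    """For each row (patient identity feature), return the index of the
    column (patient clustering category) holding the largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in patient_matrix]


def feature_cluster_results(feature_matrix):
    """For each column (pathological feature), return the index of the
    row (pathological feature clustering category) holding the largest value."""
    n_rows = len(feature_matrix)
    n_cols = len(feature_matrix[0])
    return [
        max(range(n_rows), key=lambda r, c=c: feature_matrix[r][c])
        for c in range(n_cols)
    ]


# Hypothetical patient clustering matrix: 3 patients x 2 patient clusters.
patient_matrix = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
# Hypothetical pathological feature clustering matrix: 2 feature clusters x 3 features.
feature_matrix = [[0.6, 0.1, 0.5], [0.2, 0.9, 0.4]]

print(patient_cluster_results(patient_matrix))   # [0, 1, 0]
print(feature_cluster_results(feature_matrix))   # [0, 1, 0]
```

Ties are resolved in favor of the lowest index, which is simply the behavior of Python's `max`; the application does not specify a tie-breaking rule.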
Correspondingly, when determining the target predicted disease according to the patient clustering result and the pathological feature clustering result, as shown in fig. 4, the second determination module 34 may specifically include: a third extraction unit 341, a first determination unit 342, a second determination unit 343, a third determination unit 344;
the third extraction unit 341 is configured to acquire patient pathological data of a target patient, and extract patient identity data and pathological feature data from the patient pathological data;
the first determination unit 342 is configured to determine first clustering information corresponding to the patient identity data according to the patient clustering result;
the second determination unit 343 is configured to determine second clustering information corresponding to the pathological feature data according to the pathological feature clustering result;
the third determination unit 344 is configured to determine a target predicted disease matching the first clustering information and the second clustering information based on a user profile technique.
Accordingly, when determining the target predicted disease matching the first clustering information and the second clustering information based on the user profile technique, the third determination unit 344 is specifically configured to: generate a target user portrait of the target patient according to the first clustering information and the second clustering information; and select, from a preset disease list, the preset disease with the highest feature similarity to the target user portrait, and take that preset disease as the target predicted disease.
In a specific application scenario, when generating the target user portrait of the target patient according to the first clustering information and the second clustering information, the third determination unit 344 is specifically configured to: extract a first feature tag of the first clustering information and a second feature tag of the second clustering information according to a preset tag extraction rule; and generate, according to the first feature tag and the second feature tag, a feature tag set of the target patient as the target user portrait of the target patient;
correspondingly, when determining the target predicted disease with the highest feature similarity to the target user portrait, the third determination unit 344 is specifically configured to: calculate, by using a preset feature distance calculation formula, the feature similarity between the feature tag set of the target patient and the preset feature tag set of each preset disease in the preset disease list, to obtain the feature similarity between the target user portrait and each preset disease; and determine the preset disease with the highest feature similarity to the target user portrait as the target predicted disease.
It should be noted that other corresponding descriptions of the functional units related to the disease prediction apparatus based on cluster analysis provided in this embodiment may refer to the corresponding descriptions in fig. 1 to fig. 2, and are not repeated herein.
Based on the method shown in fig. 1 to fig. 2, correspondingly, the present embodiment further provides a storage medium, which may be volatile or nonvolatile, and has computer readable instructions stored thereon, and when the computer readable instructions are executed by a processor, the method for predicting a disease based on cluster analysis shown in fig. 1 to fig. 2 is implemented.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions that enable a computer device (such as a personal computer, a server, or a network device) to execute the method of the embodiments of the present application.
Based on the method shown in fig. 1 to fig. 2 and the virtual apparatus embodiments shown in fig. 3 and fig. 4, in order to achieve the above object, this embodiment further provides a computer device, which includes a storage medium and a processor; the storage medium is configured to store a computer program; the processor is configured to execute the computer program to implement the disease prediction method based on cluster analysis shown in fig. 1 to fig. 2.
Optionally, the computer device may further include a user interface, a network interface, a camera, radio frequency (RF) circuitry, a sensor, audio circuitry, a Wi-Fi module, and so forth. The user interface may include a display screen and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Wi-Fi interface), and the like.
It will be understood by those skilled in the art that the computer device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine some components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device described above, supporting the operation of information handling programs and other software and/or programs. The network communication module is used for realizing communication among components in the storage medium and communication with other hardware and software in the information processing entity device.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, and can also be implemented by hardware.
By applying the technical solution of the present application, compared with the prior art, a cluster analysis matrix is first constructed based on sample pathological data, and dimension reduction processing is performed on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix; a patient clustering result and a pathological feature clustering result are then determined by using the patient clustering matrix and the pathological feature clustering matrix respectively, and a target predicted disease is determined according to the patient clustering result and the pathological feature clustering result. The technical solution applies a matrix decomposition algorithm and a user profile technique to realize intelligent disease prediction. When cluster analysis is performed on patients and pathological features, both clustering results are obtained simultaneously in a single step, which improves clustering efficiency. In addition, because the interrelation and mutual influence between patient information and pathological feature information are taken into account, the clustering results can be more accurate, which provides strong data support for disease prediction.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above serial numbers of the present application are for description only and do not represent the relative merits of the implementation scenarios. The above discloses only a few specific implementation scenarios of the present application; however, the present application is not limited thereto, and any variation that can be conceived by those skilled in the art shall fall within the scope of protection of the present application.

Claims (10)

1. A disease prediction method based on cluster analysis, characterized by comprising:
acquiring sample pathological data, and constructing a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject, and the column attribute of the cluster analysis matrix is a pathological feature;
performing dimensionality reduction processing on the clustering analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix;
respectively determining a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix;
and determining a target prediction disease according to the patient clustering result and the pathological feature clustering result.
2. The method of claim 1, wherein said constructing a cluster analysis matrix from the sample pathology data comprises:
performing binarization processing on the sample pathological data to obtain the pathological features and the patient identity features of each patient subject;
and constructing the cluster analysis matrix by using the pathological features and the patient identity features, so that the row attribute of the cluster analysis matrix is the patient subject, and the column attribute of the cluster analysis matrix is the pathological feature.
3. The method of claim 1, wherein the performing the dimensionality reduction on the cluster analysis matrix according to a predetermined matrix decomposition algorithm to obtain a patient cluster matrix and a pathological feature cluster matrix comprises:
decomposing the clustering analysis matrix into a product of a patient clustering matrix, a pathological feature clustering matrix and a relation feature matrix according to a preset matrix decomposition algorithm, wherein the Frobenius norm calculation result corresponding to the product and the clustering analysis matrix is smaller than a preset threshold value;
the row attribute of the patient clustering matrix is a patient identity characteristic, the column attribute of the patient clustering matrix is a patient clustering category, the row attribute of the pathological feature clustering matrix is a pathological feature clustering category, the column attribute of the pathological feature clustering matrix is a pathological feature, the row attribute of the relation feature matrix is the patient clustering category, and the column attribute of the relation feature matrix is the pathological feature clustering category.
4. The method of claim 3, wherein said determining patient cluster results and pathology cluster results using said patient cluster matrix and said pathology cluster matrix, respectively, comprises:
extracting the maximum column attribute value of each patient identity feature in the patient clustering matrix, and determining the maximum column attribute value as a patient clustering result of the patient identity feature;
and extracting the row attribute maximum value of each pathological feature in the pathological feature clustering matrix, and determining the row attribute maximum value as a pathological feature clustering result of the pathological feature.
5. The method of claim 1, wherein determining a target predicted disease from the patient cluster results and the pathology cluster results comprises:
acquiring patient pathological data of a target patient, and extracting patient identity data and pathological feature data from the patient pathological data;
determining first clustering information corresponding to the patient identity data according to the patient clustering result;
determining second clustering information corresponding to the pathological feature data according to the pathological feature clustering result;
determining a target predicted disease matching the first and second cluster information based on a user profile technique.
6. The method of claim 5, wherein the determining a target predicted disease matching the first cluster information and the second cluster information based on a user profile technique comprises:
generating a target user representation of the target patient according to the first clustering information and the second clustering information;
and screening a preset disease with the highest similarity with the target user portrait characteristics in a preset disease list, and taking the preset disease as a target prediction disease.
7. The method of claim 6, wherein generating the target user representation of the target patient based on the first cluster information and the second cluster information comprises:
extracting a first feature label of the first clustering information and a second feature label of the second clustering information according to a preset label extraction rule;
generating a feature tag set of the target patient as a target user representation of the target patient according to the first feature tag and the second feature tag;
wherein the screening of a preset disease with the highest similarity to the target user portrait features in a preset disease list, and taking the preset disease as a target predicted disease, specifically comprises:
calculating the feature similarity between the feature tag set of the target patient and the preset feature tag set of each preset disease in a preset disease list by using a preset feature distance calculation formula to obtain the feature similarity between the target user portrait and each preset disease;
and determining the preset disease with the highest similarity to the target user portrait characteristics as a target predicted disease.
8. A disease prediction apparatus based on cluster analysis, comprising:
a construction module, configured to acquire sample pathological data and construct a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject, and the column attribute of the cluster analysis matrix is a pathological feature;
a processing module, configured to perform dimension reduction processing on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix;
a first determination module, configured to determine a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix, respectively;
and a second determination module, configured to determine a target predicted disease according to the patient clustering result and the pathological feature clustering result.
9. A storage medium having stored thereon a computer program, which when executed by a processor implements the method for cluster analysis based disease prediction according to any of claims 1 to 7.
10. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the cluster analysis-based disease prediction method of any one of claims 1 to 7 when executing the program.
CN202111086515.4A 2021-09-16 2021-09-16 Disease prediction method and device based on cluster analysis and computer equipment Pending CN113793667A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111086515.4A CN113793667A (en) 2021-09-16 2021-09-16 Disease prediction method and device based on cluster analysis and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111086515.4A CN113793667A (en) 2021-09-16 2021-09-16 Disease prediction method and device based on cluster analysis and computer equipment

Publications (1)

Publication Number Publication Date
CN113793667A true CN113793667A (en) 2021-12-14

Family

ID=79183571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111086515.4A Pending CN113793667A (en) 2021-09-16 2021-09-16 Disease prediction method and device based on cluster analysis and computer equipment

Country Status (1)

Country Link
CN (1) CN113793667A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1977283A (en) * 2003-08-14 2007-06-06 美国医软科技公司 Methods and system for intelligent qualitative and quantitative analysis for medical diagnosis
KR20140090483A (en) * 2013-01-09 2014-07-17 경희대학교 산학협력단 Method for clustering health-information
CN104915560A (en) * 2015-06-11 2015-09-16 万达信息股份有限公司 Method for disease diagnosis and treatment scheme based on generalized neural network clustering
CN107658023A (en) * 2017-09-25 2018-02-02 泰康保险集团股份有限公司 Disease forecasting method, apparatus, medium and electronic equipment
CN108986908A (en) * 2018-05-31 2018-12-11 平安医疗科技有限公司 Interrogation data processing method, device, computer equipment and storage medium
CN109360658A (en) * 2018-11-01 2019-02-19 北京航空航天大学 A kind of the disease pattern method for digging and device of word-based vector model
CN109686442A (en) * 2018-12-25 2019-04-26 刘万里 Method and system are determined based on the gastroesophageal reflux disease risk factor of machine learning
CN110189803A (en) * 2019-06-05 2019-08-30 南京理工大学 The disease risk factor extracting method combined based on cluster with classification
CN110993113A (en) * 2019-11-21 2020-04-10 广西大学 LncRNA-disease relation prediction method and system based on MF-SDAE

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115376698A (en) * 2022-10-25 2022-11-22 北京鹰瞳科技发展股份有限公司 Apparatus, method, and storage medium for predicting progression of fundus disease
CN116798646A (en) * 2023-08-17 2023-09-22 四川互慧软件有限公司 Snake injury prognosis prediction method and device based on clustering algorithm and electronic equipment
CN116798646B (en) * 2023-08-17 2023-11-24 四川互慧软件有限公司 Snake injury prognosis prediction method and device based on clustering algorithm and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination