CN113793667A

CN113793667A - Disease prediction method and device based on cluster analysis and computer equipment

Info

Publication number: CN113793667A
Application number: CN202111086515.4A
Authority: CN
Inventors: 徐啸
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-09-16
Filing date: 2021-09-16
Publication date: 2021-12-14

Abstract

The application discloses a disease prediction method and device based on cluster analysis, and computer equipment, relating to the technical field of big data processing. It addresses the technical problem that current cluster analysis approaches cannot effectively combine patient information with pathological feature information. As a result, clustering is inaccurate and inefficient, and it fails to provide effective data support for disease prediction. The method comprises: acquiring sample pathological data and constructing from it a cluster analysis matrix whose row attribute is a patient subject and whose column attribute is a pathological feature; performing dimensionality reduction on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix; determining a patient clustering result and a pathological feature clustering result from the patient clustering matrix and the pathological feature clustering matrix, respectively; and determining a target predicted disease according to the patient clustering result and the pathological feature clustering result.

Description

Disease prediction method and device based on cluster analysis and computer equipment

Technical Field

The present application relates to the field of big data processing technologies, and in particular, to a disease prediction method and apparatus based on cluster analysis, and a computer device.

Background

With the rapid development of medical and electronic information technology, a patient's electronic medical records and visit history can now be stored completely in a hospital's records system. These records document each patient's condition and course of treatment. They also give doctors data for diagnosing and treating patients of the same type.

Patients with similar pathological data are more likely to have the same class of disease; conversely, patients with the same class of disease often have similar pathological data. Performing cluster analysis on each patient's historical pathological records can therefore reveal which patients are likely to have similar diseases. It can also reveal which physiological data share similar characteristics and contribute to those diseases.

Existing methods treat the cluster analysis of pathological data and the cluster analysis of patient groups as two independent tasks. First, relevant pathological features are found from the similarity of the pathological data. Then, patients are clustered by the similarity of their pathological data. This ignores the relationship between patients and pathological feature clusters. Pathological features and patients therefore cannot be clustered simultaneously, and patient information cannot be effectively combined with pathological feature information. Consequently, the clustering results are not accurate enough, clustering efficiency is low, and effective data support cannot be provided for disease prediction.

Disclosure of Invention

In view of this, the present application provides a disease prediction method and apparatus based on cluster analysis, and a computer device. They solve the technical problem that current cluster analysis methods cannot effectively combine patient information with pathological feature information. That problem leads to inaccurate and inefficient clustering, which in turn fails to provide effective data support for disease prediction.

According to an aspect of the present application, there is provided a method for predicting a disease based on cluster analysis, the method including:

acquiring sample pathological data, and constructing a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject and the column attribute of the cluster analysis matrix is a pathological feature;

performing dimensionality reduction on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix;

determining a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix, respectively;

and determining a target predicted disease according to the patient clustering result and the pathological feature clustering result.

According to another aspect of the present application, there is provided a disease prediction apparatus based on cluster analysis, the apparatus including:

a construction module, configured to acquire sample pathological data and construct a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject and the column attribute of the cluster analysis matrix is a pathological feature;

a processing module, configured to perform dimensionality reduction on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix;

a first determination module, configured to determine a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix, respectively;

and a second determination module, configured to determine a target predicted disease according to the patient clustering result and the pathological feature clustering result.

According to yet another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above cluster analysis-based disease prediction method.

According to yet another aspect of the present application, there is provided a computer device comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, the processor implementing the above-mentioned cluster analysis-based disease prediction method when executing the program.

By means of the above technical solutions, the cluster analysis-based disease prediction method, apparatus and computer device differ from current disease prediction approaches in the following way. First, they construct a cluster analysis matrix based on sample pathological data. They then perform dimensionality reduction on it according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix. Next, a patient clustering result and a pathological feature clustering result are determined from the patient clustering matrix and the pathological feature clustering matrix, respectively. Finally, a target predicted disease is determined from the two clustering results. With this technical solution, clustering patients and pathological features yields both clustering results simultaneously in a single step, which improves clustering efficiency. At the same time, the interrelationship and mutual influence between patient information and pathological feature information are taken into account. This makes the clustering results more accurate and thereby provides strong data support for disease prediction.

The foregoing is only an overview of the technical solutions of the present application. Detailed embodiments are described below so that the technical means of the present application can be understood more clearly and implemented according to the description. They also make the above and other objects, features and advantages of the present application more readily apparent.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application to the disclosed embodiment. In the drawings:

fig. 1 is a schematic flow chart illustrating a disease prediction method based on cluster analysis according to an embodiment of the present application;

fig. 2 is a schematic flow chart of another disease prediction method based on cluster analysis according to an embodiment of the present application;

fig. 3 is a schematic structural diagram illustrating a disease prediction apparatus based on cluster analysis according to an embodiment of the present application;

fig. 4 shows a schematic structural diagram of another disease prediction apparatus based on cluster analysis according to an embodiment of the present application.

Detailed Description

In the embodiments of the present application, disease prediction may be implemented on blockchain technology. Specifically, the sample pathological data and the pathological data of the target patient may be stored in blockchain nodes to ensure the privacy and security of the medical data. The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods. Each block contains information about a batch of network transactions. That information is used to verify its validity (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.

The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

Current cluster analysis approaches cannot effectively combine patient information with pathological feature information. This makes clustering inaccurate and inefficient and fails to provide effective data support for disease prediction. To address this technical problem, the present application provides a disease prediction method based on cluster analysis. As shown in fig. 1, the method comprises the following steps:

101. Acquiring sample pathological data, and constructing a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject and the column attribute is a pathological feature.

The sample pathological data is medical data with the same feature dimensions as the pathological data of the patient whose disease is to be predicted. Examples include feature information such as age, sex, height, weight, length of hospital stay, clinical diagnosis, disease symptoms, examination indices, surgery, disease severity and cost. The cluster analysis matrix is a binary matrix obtained by binarizing the sample pathological data. In the cluster analysis matrix, the row attributes are patient subjects and the column attributes are pathological features. The pathological data of each sample patient forms one m-dimensional row vector of the matrix. Hence n patient records form an $n\times m$ cluster analysis matrix $R^{n\times m}$. The values in each row represent the pathological data of one sample patient, which comprises m pathological features in total. The values in each column represent the values that different patients take for one pathological feature.

The execution subject of the present application may be a disease prediction apparatus, which may be deployed on either the client side or the server side. The apparatus first constructs a cluster analysis matrix based on sample pathological data. It then performs dimensionality reduction on that matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix. Next, it determines a patient clustering result and a pathological feature clustering result from the patient clustering matrix and the pathological feature clustering matrix, respectively. Finally, it determines a target predicted disease according to the two clustering results.

102. Performing dimensionality reduction on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix.

The preset matrix decomposition algorithm may be stochastic gradient descent and/or alternating least squares (ALS). The principle of matrix decomposition is to decompose one matrix into the product of several matrices such that the product approximates the original matrix as closely as possible.

In a specific application scenario, after the cluster analysis matrix $R^{n\times m}$ is obtained, an existing matrix decomposition algorithm can be applied to perform dimensionality reduction decomposition on $R^{n\times m}$. This yields a patient clustering matrix $P^{n\times k}$, a pathological feature clustering matrix $Q^{l\times m}$ and a relation feature matrix $E^{k\times l}$. The patient clustering matrix $P^{n\times k}$ characterizes the clustering result of patients. The pathological feature clustering matrix $Q^{l\times m}$ characterizes the clustering result of pathological features. The relation feature matrix $E^{k\times l}$ is an intermediate matrix that represents the feature relationships shared between $P^{n\times k}$ and $Q^{l\times m}$. During decomposition, $E^{k\times l}$ is used to obtain more accurate clustering results in $P^{n\times k}$ and $Q^{l\times m}$. The aim is for the product $P^{n\times k}E^{k\times l}Q^{l\times m}$ to approximate the cluster analysis matrix as closely as possible. In this embodiment, the dimensionality reduction yields $P^{n\times k}$, $Q^{l\times m}$ and $E^{k\times l}$ that satisfy the constraints of the matrix decomposition algorithm. $P^{n\times k}$ and $Q^{l\times m}$ can then be extracted from the decomposition result. Patient clustering and pathological feature clustering are thus both obtained from them at the same time.

103. Determining a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix, respectively.

In this embodiment, the patient clustering matrix $P^{n\times k}$ is used to determine the clustering result of each patient. $P^{n\times k}$ has n rows and k columns. Each row corresponds to one patient, described by patient identity features such as age, sex, occupation, height and weight. The k columns correspond to the k patient clusters. Accordingly, in each row of $P^{n\times k}$, the column holding the maximum value is taken as the clustering result of the patient in that row. Likewise, the pathological feature clustering matrix $Q^{l\times m}$ is used to determine the clustering result of each pathological feature. $Q^{l\times m}$ has l rows and m columns. The l rows correspond to the l pathological feature clusters. Each column corresponds to one pathological feature, such as length of hospital stay, clinical diagnosis, symptoms, examination indices, surgery, disease severity or cost. Accordingly, in each column of $Q^{l\times m}$, the row holding the maximum value is taken as the clustering result of that pathological feature.
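A minimal Python sketch of this maximum-value assignment follows. It assumes that P and Q are already available as nested lists; the function names `patient_clusters` and `feature_clusters` and the sample values are hypothetical. Ties go to the lowest index, which is an assumption the text does not address.

```python
from typing import List


def argmax(values: List[float]) -> int:
    """Index of the largest value; Python's max keeps the first index on ties."""
    return max(range(len(values)), key=lambda i: values[i])


def patient_clusters(P: List[List[float]]) -> List[int]:
    """P is n x k: each row is a patient, each column a patient cluster."""
    return [argmax(row) for row in P]


def feature_clusters(Q: List[List[float]]) -> List[int]:
    """Q is l x m: each row is a feature cluster, each column a pathological feature."""
    l, m = len(Q), len(Q[0])
    return [argmax([Q[r][c] for r in range(l)]) for c in range(m)]


# Hypothetical decomposition output: 3 patients, 2 patient clusters,
# 2 feature clusters, 3 pathological features.
P = [[0.9, 0.1], [0.2, 0.7], [0.6, 0.5]]
Q = [[0.8, 0.1, 0.3], [0.2, 0.9, 0.4]]
print(patient_clusters(P))  # [0, 1, 0]
print(feature_clusters(Q))  # [0, 1, 1]
```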

104. Determining a target predicted disease according to the patient clustering result and the pathological feature clustering result.

In a specific application scenario, after the patient clustering result and the pathological feature clustering result are determined, a disease knowledge base may optionally be created from them. Through this knowledge base, the clustering results can support clinical pathological diagnosis, online pathological diagnosis and customized treatment plans for new patients during the treatment stage. They can also serve other feasible medical scenarios, such as predicting the treatment effect and treatment cost a new patient may incur when following a clinical pathway.

In this embodiment, the patient clustering result and the pathological feature clustering result can further be combined with user portrait (user profile) technology. The combination predicts, from a target patient's pathological data, the potential or manifest disease type of that patient. User portrait technology builds a mathematical model of real-world users from user data. Its core task is to label users with tags. Tags are highly refined feature identifiers obtained by analyzing user information, and they support tag-based analysis and decisions about users. Specifically, once the patient clustering result and the pathological feature clustering result are available, they are used to determine the clustering information corresponding to the target patient's pathological data. A target user portrait of the target patient is then determined from this clustering information. The preset disease whose features are most similar to the target user portrait is selected from a preset disease list. That preset disease is taken as the target predicted disease of the target patient.

The disease prediction method based on cluster analysis in this embodiment proceeds as follows. A cluster analysis matrix is constructed from sample pathological data. It is reduced in dimensionality according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix. A patient clustering result and a pathological feature clustering result are then determined from these two matrices, respectively. A target predicted disease is determined from the two clustering results. With this technical solution, clustering patients and pathological features yields both clustering results simultaneously in a single step, which improves clustering efficiency. At the same time, the interrelationship and mutual influence between patient information and pathological feature information are taken into account. This makes the clustering results more accurate and thereby provides strong data support for disease prediction.

Further, another disease prediction method based on cluster analysis is provided as a refinement and extension of the above embodiment, to fully illustrate its specific implementation. As shown in fig. 2, the method comprises:

201. Acquiring sample pathological data, and constructing a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject and the column attribute is a pathological feature.

In this embodiment, the sample pathological data may be binarized into 0-1 values to obtain a binary matrix, namely the cluster analysis matrix $R^{n\times m}$. In this matrix, the pathological data of each sample patient forms one m-dimensional row vector. Hence n patient records form an $n\times m$ cluster analysis matrix. When the cluster analysis matrix is constructed from the sample pathological data, step 201 of this embodiment may specifically include the following. First, binarize the sample pathological data to obtain the pathological features and patient identity features of each patient subject. Then, construct the cluster analysis matrix from these pathological features and patient identity features. The row attribute of the resulting matrix is a patient subject and the column attribute is a pathological feature.
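The 0-1 binarization of step 201 can be sketched in Python as below, for illustration only. The application does not specify the binarization rules. The column names, thresholds and record fields (`age`, `sex`, `stay_days`, `diagnoses`, `symptoms`) are therefore hypothetical assumptions. Each rule maps one patient record to a 0 or 1 for one column of $R^{n\times m}$.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical binarization rules: (column name, predicate on one patient record).
# The thresholds and field names are illustrative assumptions, not from the application.
RULES: List[Tuple[str, Callable[[Record], bool]]] = [
    ("age>=60", lambda r: r["age"] >= 60),
    ("sex=F", lambda r: r["sex"] == "F"),
    ("stay_days>7", lambda r: r["stay_days"] > 7),
    ("diag=diabetes", lambda r: "diabetes" in r["diagnoses"]),
    ("symptom=fever", lambda r: "fever" in r["symptoms"]),
]


def build_cluster_matrix(records: List[Record]) -> Tuple[List[List[int]], List[str]]:
    """Return the n x m 0-1 matrix R (rows = patients, columns = binary features)."""
    columns = [name for name, _ in RULES]
    matrix = [[1 if pred(rec) else 0 for _, pred in RULES] for rec in records]
    return matrix, columns


records = [
    {"age": 67, "sex": "M", "stay_days": 10, "diagnoses": {"diabetes"}, "symptoms": {"fever"}},
    {"age": 34, "sex": "F", "stay_days": 3, "diagnoses": set(), "symptoms": {"cough"}},
]
R, cols = build_cluster_matrix(records)
print(cols)
print(R)  # [[1, 0, 1, 1, 1], [0, 1, 0, 0, 0]]
```

In practice the rule list would come from whatever feature template the deployment defines. Keeping the rules as data makes the mapping from column index back to feature name explicit, which the later cluster assignment relies on.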

202. Decomposing the cluster analysis matrix, according to a preset matrix decomposition algorithm, into the product of the patient clustering matrix, the relation feature matrix and the pathological feature clustering matrix, such that the Frobenius norm of the difference between the cluster analysis matrix and this product is smaller than a preset threshold.

The patient clustering matrix $P^{n\times k}$ has n rows and k columns; its row attribute is the patient identity feature and its column attribute is the patient cluster category. The pathological feature clustering matrix $Q^{l\times m}$ has l rows and m columns; its row attribute is the pathological feature cluster category and its column attribute is the pathological feature. The relation feature matrix $E^{k\times l}$ has k rows and l columns; its row attribute is the patient cluster category and its column attribute is the pathological feature cluster category.

In this embodiment, matrix parameter values may be set in advance before the dimensionality reduction decomposition of the cluster analysis matrix $R^{n\times m}$. Examples are the values of n, k, l and m and a preset value interval for the elements of each matrix. The patient clustering matrix $P^{n\times k}$, the pathological feature clustering matrix $Q^{l\times m}$ and the relation feature matrix $E^{k\times l}$ are then created according to these parameter values, and every element is initialized to a random constant. Next, an existing stochastic gradient descent or alternating least squares method iteratively updates $P^{n\times k}$, $Q^{l\times m}$ and $E^{k\times l}$ under a Frobenius norm constraint. Training is judged complete when the loss function converges. At that point, the Frobenius norm of the difference between $R^{n\times m}$ and the product $P^{n\times k}E^{k\times l}Q^{l\times m}$ is minimal. In other words, the product of $P^{n\times k}$, $E^{k\times l}$ and $Q^{l\times m}$ best recovers the cluster analysis matrix $R^{n\times m}$.
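The training described above can be sketched as follows. This is a minimal, assumption-laden illustration, not the applicant's implementation, and it makes several choices of its own. It uses plain full-batch projected gradient descent in place of stochastic gradient descent or ALS. It initializes elements randomly in [0, 1]. It clips elements at zero as a stand-in for the "preset value interval". It treats a stalled loss as convergence. The names `tri_factorize`, `lr`, `tol` and `max_iter` are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def residual(R: Matrix, P: Matrix, E: Matrix, Q: Matrix) -> Matrix:
    """D = P @ E @ Q - R."""
    approx = matmul(matmul(P, E), Q)
    return [[x - r for x, r in zip(ar, rr)] for ar, rr in zip(approx, R)]


def sq_frobenius(a: Matrix) -> float:
    """Squared Frobenius norm (sum of squared entries), used as the loss."""
    return sum(x * x for row in a for x in row)


def step(M: Matrix, G: Matrix, lr: float) -> Matrix:
    """Gradient step, clipped at 0 (assumed non-negative value interval)."""
    return [[max(0.0, m - lr * g) for m, g in zip(mr, gr)] for mr, gr in zip(M, G)]


def tri_factorize(R: Matrix, k: int, l: int, lr: float = 0.01, max_iter: int = 2000,
                  tol: float = 1e-6, seed: int = 0) -> Tuple[Matrix, Matrix, Matrix, List[float]]:
    """Approximate R (n x m) by P (n x k) @ E (k x l) @ Q (l x m)."""
    rng = random.Random(seed)

    def rand(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    P, E, Q = rand(len(R), k), rand(k, l), rand(l, len(R[0]))  # random initial constants
    history: List[float] = []
    for _ in range(max_iter):
        D = residual(R, P, E, Q)
        history.append(sq_frobenius(D))
        if len(history) > 1 and history[-2] - history[-1] < tol:
            break  # loss has stalled (or risen): treat as converged
        EQ, PE = matmul(E, Q), matmul(P, E)
        # Gradients of ||P E Q - R||_F^2 with respect to P, E and Q.
        gP = [[2 * x for x in row] for row in matmul(D, transpose(EQ))]
        gE = [[2 * x for x in row] for row in matmul(matmul(transpose(P), D), transpose(Q))]
        gQ = [[2 * x for x in row] for row in matmul(transpose(PE), D)]
        P, E, Q = step(P, gP, lr), step(E, gE, lr), step(Q, gQ, lr)
    history.append(sq_frobenius(residual(R, P, E, Q)))  # loss of the returned factors
    return P, E, Q, history


R = [[1, 1, 0, 0, 0],
     [1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0],
     [0, 0, 0, 1, 1],
     [0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1]]
P, E, Q, history = tri_factorize(R, k=2, l=2)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Clipping at zero keeps the factors non-negative. This is one common way to make the row-wise and column-wise maximum rule of step 203 interpretable. With a different value interval, the clip in `step` would change accordingly. A production system would more likely use a numerical library than nested lists.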

The Frobenius norm is defined as:

$$\|R\|_F = \sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m} |r_{ij}|^2}$$

where $\|R\|_F$ is the Frobenius norm, i.e., the square root of the sum of the squared absolute values of all elements $r_{ij}$ of the matrix R. In the present application, the matrix R in this formula is the residual $R^{n\times m} - P^{n\times k}E^{k\times l}Q^{l\times m}$.
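The formula can be checked with a small worked example. The helper `within_threshold` is a hypothetical illustration of the "smaller than a preset threshold" test in step 202; the threshold values used here are arbitrary.

```python
import math
from typing import List

Matrix = List[List[float]]


def frobenius_norm(a: Matrix) -> float:
    """Square root of the sum of squared absolute values of all entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in a for x in row))


def within_threshold(R: Matrix, approx: Matrix, threshold: float) -> bool:
    """Hypothetical helper: True when ||R - approx||_F is below the preset threshold."""
    diff = [[r - p for r, p in zip(rr, pr)] for rr, pr in zip(R, approx)]
    return frobenius_norm(diff) < threshold


print(frobenius_norm([[3, -4], [0, 0]]))  # 5.0
print(within_threshold([[1, 0], [0, 1]], [[0.9, 0.1], [0.0, 1.0]], threshold=0.5))  # True
```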

203. Determining a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix, respectively.

As an optional implementation, step 203 of this embodiment may specifically include the following. For each patient identity feature (row) in the patient clustering matrix, find the column attribute holding the maximum value and take it as the patient clustering result of that patient identity feature. For each pathological feature (column) in the pathological feature clustering matrix, find the row attribute holding the maximum value and take it as the pathological feature clustering result of that pathological feature.

204. Obtaining pathological data of a target patient, and extracting patient identity data and pathological feature data from the pathological data of the target patient.

The target patient is the patient subject whose disease is to be detected from his or her pathological data.

In this embodiment, once the target patient's pathological data is obtained, patient identity data and pathological feature data can be extracted from it according to preset keywords or a feature data extraction template. The patient identity data may include age, sex, occupation, height, weight and the like. The pathological feature data may include length of hospital stay, clinical diagnosis, symptoms, examination indices, surgery, disease severity, cost and the like.
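A minimal sketch of template-based extraction follows. It assumes that the target patient's record arrives as a key-value mapping. The key names in `IDENTITY_KEYS` and `PATHOLOGY_KEYS` are hypothetical stand-ins for the preset extraction template, and fields that match neither list are dropped.

```python
from typing import Any, Dict, Tuple

# Hypothetical extraction template: which record keys count as identity data
# and which count as pathological feature data.
IDENTITY_KEYS = ("age", "sex", "occupation", "height", "weight")
PATHOLOGY_KEYS = ("stay_days", "diagnosis", "symptoms", "exam_results",
                  "surgery", "severity", "cost")


def split_patient_record(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a target patient's record into identity data and pathological feature data."""
    identity = {k: record[k] for k in IDENTITY_KEYS if k in record}
    pathology = {k: record[k] for k in PATHOLOGY_KEYS if k in record}
    return identity, pathology


record = {"age": 58, "sex": "M", "weight": 82, "stay_days": 9,
          "diagnosis": "type 2 diabetes", "symptoms": ["polyuria"], "ward": "B3"}
identity, pathology = split_patient_record(record)
print(identity)   # {'age': 58, 'sex': 'M', 'weight': 82}
print(pathology)  # {'stay_days': 9, 'diagnosis': 'type 2 diabetes', 'symptoms': ['polyuria']}
```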

205. Determining first clustering information corresponding to the patient identity data according to the patient clustering result, and determining second clustering information corresponding to the pathological feature data according to the pathological feature clustering result.

The patient clustering result contains the clustering result for each patient identity, and the pathological feature clustering result contains the clustering result for each pathological feature. Therefore, in this embodiment, first clustering information corresponding to the patient identity data can be determined from the patient clustering result. Second clustering information corresponding to the pathological feature data can be determined from the pathological feature clustering result. The first clustering information reflects the population attribute classification of the target patient. The second clustering information reflects the attribute classification of the target patient's pathological features. In this embodiment, clustering results in both dimensions are obtained simultaneously in a single step. Analysis across multiple dimensions then supports accurate disease prediction.
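The application does not state how a new patient is mapped onto the existing clusters. One possible sketch rests on explicitly assumed rules. The first clustering information is taken from the nearest sample patient, measured by Hamming distance over binarized identity features. The second clustering information is the set of feature clusters of the pathological features the patient exhibits. All names and data below are hypothetical.

```python
from typing import Dict, List, Set


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two 0-1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def first_cluster_info(target_identity: List[int], sample_identity: List[List[int]],
                       patient_clusters: List[int]) -> int:
    """Assumed rule: reuse the patient cluster of the closest sample patient."""
    best = min(range(len(sample_identity)),
               key=lambda i: hamming(target_identity, sample_identity[i]))
    return patient_clusters[best]


def second_cluster_info(target_features: Set[str], feature_names: List[str],
                        feature_clusters: List[int]) -> Set[int]:
    """Feature clusters of the pathological features the target patient exhibits."""
    lookup: Dict[str, int] = dict(zip(feature_names, feature_clusters))
    return {lookup[f] for f in target_features if f in lookup}


# Hypothetical results of step 203.
sample_identity = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
patient_clusters = [0, 1, 0]
feature_names = ["fever", "cough", "high_glucose", "polyuria"]
feature_clusters = [1, 1, 0, 0]

c1 = first_cluster_info([1, 0, 0], sample_identity, patient_clusters)
c2 = second_cluster_info({"high_glucose", "polyuria"}, feature_names, feature_clusters)
print(c1, c2)  # 0 {0}
```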

206. Determining, based on user portrait technology, a target predicted disease matching the first clustering information and the second clustering information.

In this embodiment, as an optional implementation, step 206 of determining the target predicted disease matching the first and second clustering information may specifically include the following. First, generate a target user portrait of the target patient according to the first clustering information and the second clustering information. Then, select from a preset disease list the preset disease whose features are most similar to the target user portrait, and take it as the target predicted disease.

In a specific application scenario, once the first clustering information and the second clustering information of the target patient are determined, each can serve as an independent portrait dimension for constructing the target user portrait. As an optional implementation, generating the target user portrait from the first and second clustering information may specifically include the following. First, extract a first feature tag from the first clustering information and a second feature tag from the second clustering information according to a preset tag extraction rule. Then, generate a feature tag set of the target patient from the first feature tag and the second feature tag, and use it as the target user portrait of the target patient.
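The following is a hedged sketch of one possible "preset tag extraction rule". It assumes that the clustering information is represented as integer cluster ids. Each id becomes a tag prefixed with its portrait dimension, and the union of the tags forms the feature tag set. Both the rule and the tag format are assumptions made for illustration.

```python
from typing import Set


def build_portrait(first_info: int, second_info: Set[int]) -> Set[str]:
    """Hypothetical tag rule: one tag per cluster id, prefixed by its portrait dimension."""
    first_tags = {f"patient_cluster:{first_info}"}
    second_tags = {f"feature_cluster:{c}" for c in second_info}
    return first_tags | second_tags


portrait = build_portrait(0, {0, 2})
print(sorted(portrait))  # ['feature_cluster:0', 'feature_cluster:2', 'patient_cluster:0']
```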

Correspondingly, a preset disease list can be created in advance. It stores a plurality of preset diseases constructed for different user portraits, and each disease is configured with the feature tag set of the user portrait it matches. To determine a target predicted disease from the target user portrait, feature similarity is calculated between the target user portrait and the user portraits configured in the preset disease list. Specifically, the feature similarity is calculated between the target patient's feature tag set and the preset feature tag set of each preset disease in the list. The preset disease with the highest feature similarity to the target user portrait is selected as the target predicted disease. As an optional implementation, this step may specifically include the following. First, use a preset feature distance formula to calculate the feature similarity between the target patient's feature tag set and the preset feature tag set of each preset disease in the list. This gives the feature similarity between the target user portrait and each preset disease. Then, take the preset disease with the highest similarity to the target user portrait as the target predicted disease. The preset feature distance formula may be any distance function suitable for the measurement, such as the Euclidean distance, Manhattan distance, Jaccard distance or Mahalanobis distance.
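Of the distance formulas listed, the Jaccard distance fits tag sets most directly. The following sketch therefore uses Jaccard similarity (1 minus the Jaccard distance) to select the most similar preset disease. The disease names and tag sets are hypothetical.

```python
from typing import Dict, Set, Tuple


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 1 minus this is the Jaccard distance."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def predict_disease(portrait: Set[str], disease_list: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return the preset disease whose tag set is most similar to the portrait."""
    scores = {d: jaccard_similarity(portrait, tags) for d, tags in disease_list.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


disease_list = {  # hypothetical preset disease list
    "disease_A": {"patient_cluster:0", "feature_cluster:0"},
    "disease_B": {"patient_cluster:1", "feature_cluster:1", "feature_cluster:2"},
}
portrait = {"patient_cluster:0", "feature_cluster:0", "feature_cluster:2"}
print(predict_disease(portrait, disease_list))  # ('disease_A', 0.6666666666666666)
```

Here disease_A shares two of three distinct tags with the portrait (similarity 2/3). disease_B shares one of five (similarity 0.2), so disease_A is selected.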

With the disease prediction method based on cluster analysis, a cluster analysis matrix is constructed from sample pathological data, and dimensionality reduction is performed on it according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix; a patient clustering result and a pathological feature clustering result are then determined from the patient clustering matrix and the pathological feature clustering matrix, respectively, and a target predicted disease is determined according to these two clustering results. The technical scheme combines a matrix decomposition algorithm with user portrait technology to realize intelligent disease prediction. Because the patients and the pathological features are clustered in a single decomposition step, both clustering results are obtained simultaneously, which improves clustering efficiency. At the same time, the interrelation and mutual influence between patient information and pathological feature information are taken into account, which makes the clustering result more accurate and thus provides strong data support for disease prediction.

Further, as a specific implementation of the method shown in fig. 1 and fig. 2, an embodiment of the present application provides a disease prediction apparatus based on cluster analysis. As shown in fig. 3, the apparatus includes: a construction module 31, a processing module 32, a first determining module 33, and a second determining module 34;

the construction module 31 is configured to obtain sample pathological data and construct a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject, and the column attribute of the cluster analysis matrix is a pathological feature;

the processing module 32 is configured to perform dimension reduction processing on the clustering analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix;

the first determining module 33 is configured to determine a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix, respectively;

and the second determining module 34 is configured to determine a target predicted disease according to the patient clustering result and the pathological feature clustering result.

In a specific application scenario, when constructing the cluster analysis matrix according to the sample pathological data, as shown in fig. 4, the construction module 31 may specifically include: a processing unit 311 and a construction unit 312;

the processing unit 311 is configured to perform binarization processing on the sample pathological data to obtain pathological features and patient identity features of each patient;

the construction unit 312 may be configured to construct a cluster analysis matrix by using the pathological features and the identity features of the patients, so that the row attribute of the cluster analysis matrix is the patient subject, and the column attribute of the cluster analysis matrix is the pathological feature.
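As a minimal Python sketch of this construction step: the patent does not state the binarization rule, so the per-feature cutoff used here (a value at or above the cutoff is encoded as 1), the feature names, the values and the helper `build_cluster_matrix` are all hypothetical, and patient identity features are reduced to a patient ID for brevity. Rows of the resulting matrix follow the patient subjects and columns follow the pathological features.

```python
import numpy as np


def build_cluster_matrix(records, thresholds):
    """Build a binary patients x features matrix (hypothetical binarization rule).

    records: list of (patient_id, {feature_name: numeric_value})
    thresholds: {feature_name: cutoff}; a value >= cutoff is encoded as 1.
    """
    patients = [pid for pid, _ in records]   # row attribute: patient subject
    features = sorted(thresholds)             # column attribute: pathological feature
    col = {name: j for j, name in enumerate(features)}
    X = np.zeros((len(patients), len(features)))
    for i, (_, values) in enumerate(records):
        for name, value in values.items():
            # Binarize: 1 if the feature is known and reaches its cutoff, otherwise stay 0.
            if name in col and value >= thresholds[name]:
                X[i, col[name]] = 1.0
    return X, patients, features


# Illustrative data only (values and cutoffs are made up).
records = [
    ("p1", {"glucose": 8.2, "bmi": 22.0}),
    ("p2", {"glucose": 5.1, "bmi": 31.5}),
    ("p3", {"glucose": 9.0, "bmi": 29.0}),
]
X, patients, features = build_cluster_matrix(records, {"glucose": 7.0, "bmi": 28.0})
print(features)  # ['bmi', 'glucose']
print(X)
```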

Correspondingly, when performing dimension reduction processing on the cluster analysis matrix according to the preset matrix decomposition algorithm to obtain the patient clustering matrix and the pathological feature clustering matrix, the processing module 32 may be specifically configured to: decompose the cluster analysis matrix, according to the preset matrix decomposition algorithm, into the product of a patient clustering matrix, a relation feature matrix and a pathological feature clustering matrix (in that order, so that the matrix dimensions match), such that the Frobenius norm of the difference between this product and the cluster analysis matrix is smaller than a preset threshold. The row attribute of the patient clustering matrix is the patient identity feature and its column attribute is the patient clustering category; the row attribute of the relation feature matrix is the patient clustering category and its column attribute is the pathological feature clustering category; and the row attribute of the pathological feature clustering matrix is the pathological feature clustering category and its column attribute is the pathological feature.
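The patent does not name the preset matrix decomposition algorithm. One technique that yields exactly this three-factor form is nonnegative matrix tri-factorization, and the sketch below swaps it in as an assumption: it minimizes the squared Frobenius norm of X - USV with standard multiplicative updates and stops once the Frobenius norm falls below the threshold or an iteration cap is reached. U, S and V stand for the patient clustering matrix, the relation feature matrix and the pathological feature clustering matrix; the function name, cluster counts, threshold and iteration cap are all illustrative.

```python
import numpy as np


def tri_factorize(X, k, l, threshold=1e-3, max_iter=1000, seed=0, eps=1e-10):
    """Nonnegative tri-factorization X ~= U @ S @ V (assumed choice of algorithm).

    X: (n_patients, n_features) nonnegative cluster analysis matrix
    k: number of patient clusters, l: number of pathological feature clusters
    Returns U (n x k), S (k x l), V (l x m) and the final Frobenius error.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k)) + eps   # patient clustering matrix
    S = rng.random((k, l)) + eps   # relation feature matrix
    V = rng.random((l, m)) + eps   # pathological feature clustering matrix
    err = np.inf
    for _ in range(max_iter):
        # Multiplicative updates for min ||X - USV||_F^2; they keep all factors nonnegative.
        U *= (X @ V.T @ S.T) / (U @ S @ V @ V.T @ S.T + eps)
        V *= (S.T @ U.T @ X) / (S.T @ U.T @ U @ S @ V + eps)
        S *= (U.T @ X @ V.T) / (U.T @ U @ S @ V @ V.T + eps)
        err = np.linalg.norm(X - U @ S @ V, "fro")
        if err < threshold:        # preset threshold on the Frobenius norm
            break
    return U, S, V, err


# Illustrative binary matrix with two patient groups and two feature groups.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
U, S, V, err = tri_factorize(X, k=2, l=2)
print(U.shape, S.shape, V.shape)  # (4, 2) (2, 2) (2, 4)
```

Multiplicative updates are used here because they keep the factors nonnegative, so each entry can be read as a cluster membership strength. On real data the threshold may never be reached, which is why the sketch also caps the number of iterations.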

In a specific application scenario, when the patient clustering matrix and the pathological feature clustering matrix are used to determine the patient clustering result and the pathological feature clustering result respectively, as shown in fig. 4, the first determining module 33 may specifically include: a first extraction unit 331, a second extraction unit 332;

the first extraction unit 331 is configured to identify, for each patient identity in the patient clustering matrix, the column attribute (patient clustering category) with the maximum value, and to determine that category as the patient clustering result of the patient identity;

the second extraction unit 332 may be configured to identify, for each pathological feature in the pathological feature clustering matrix, the row attribute (pathological feature clustering category) with the maximum value, and to determine that category as the pathological feature clustering result of the pathological feature.
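Taking each clustering result to be the category whose entry is largest, the two extraction units amount to a row-wise argmax over the patient clustering matrix and a column-wise argmax over the pathological feature clustering matrix. A minimal Python sketch, with made-up matrices and the hypothetical helper `extract_clusters`:

```python
import numpy as np


def extract_clusters(U, V, patient_ids, feature_names):
    """Assign each patient / feature to the cluster holding its largest value."""
    # Patient clustering matrix U: rows = patients, columns = patient clustering categories.
    patient_result = {pid: int(c) for pid, c in zip(patient_ids, U.argmax(axis=1))}
    # Pathological feature clustering matrix V: rows = feature clustering categories, columns = features.
    feature_result = {f: int(c) for f, c in zip(feature_names, V.argmax(axis=0))}
    return patient_result, feature_result


U = np.array([[0.9, 0.1], [0.2, 0.7], [0.6, 0.3]])
V = np.array([[0.8, 0.1, 0.4], [0.2, 0.9, 0.5]])
patient_result, feature_result = extract_clusters(U, V, ["p1", "p2", "p3"], ["f1", "f2", "f3"])
print(patient_result)  # {'p1': 0, 'p2': 1, 'p3': 0}
print(feature_result)  # {'f1': 0, 'f2': 1, 'f3': 1}
```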

Correspondingly, when determining the target predicted disease according to the patient clustering result and the pathological feature clustering result, as shown in fig. 4, the second determining module 34 may specifically include: a third extraction unit 341, a first determining unit 342, a second determining unit 343, and a third determining unit 344;

the third extraction unit 341 is configured to acquire patient pathological data of a target patient, and to extract patient identity data and pathological feature data from the patient pathological data;

the first determining unit 342 is configured to determine first clustering information corresponding to the patient identity data according to the patient clustering result;

the second determining unit 343 is configured to determine, according to the pathological feature clustering result, second clustering information corresponding to the pathological feature data;

and the third determining unit 344 is configured to determine, based on a user portrait technique, a target predicted disease matching the first clustering information and the second clustering information.
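A minimal Python sketch of how the first and second clustering information might be looked up for a target patient. It assumes, hypothetically, that the patient identity data is a key present in the patient clustering result and that the pathological feature data is already binarized. The patent does not say how patients outside the sample data are handled, so the sketch simply returns None for the first clustering information in that case; the function name and data are illustrative.

```python
def cluster_info_for_patient(record, patient_result, feature_result):
    """Return (first clustering information, second clustering information) for one patient.

    record: (patient_id, {feature_name: 0 or 1}) for the target patient
    patient_result / feature_result: mappings produced by the clustering step
    """
    patient_id, features = record
    # First clustering information: the patient's cluster, or None if the patient
    # was not part of the sample data (handling of unseen patients is an assumption).
    first = patient_result.get(patient_id)
    # Second clustering information: clusters of the pathological features the patient has.
    second = sorted({feature_result[name] for name, present in features.items()
                     if present and name in feature_result})
    return first, second


patient_result = {"p1": 0, "p2": 1}
feature_result = {"f1": 0, "f2": 1, "f3": 1}
print(cluster_info_for_patient(("p1", {"f1": 1, "f2": 0, "f3": 1}), patient_result, feature_result))
# (0, [0, 1])
```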

Accordingly, when determining the target predicted disease matching the first clustering information and the second clustering information based on the user portrait technique, the third determining unit 344 is specifically configured to generate a target user portrait of the target patient according to the first clustering information and the second clustering information, and to select, from a preset disease list, the preset disease with the highest similarity to the target user portrait features as the target predicted disease.

In a specific application scenario, when a target user portrait of a target patient is generated according to the first clustering information and the second clustering information, the third determining unit 344 is specifically configured to extract a first feature tag of the first clustering information and a second feature tag of the second clustering information according to a preset tag extraction rule, and to generate a feature tag set of the target patient according to the first feature tag and the second feature tag, the feature tag set serving as the target user portrait of the target patient;

correspondingly, when determining the target predicted disease with the highest feature similarity to the target user portrait, the third determining unit 344 is specifically configured to calculate, by using a preset feature distance calculation formula, the feature similarity between the feature tag set of the target patient and the preset feature tag set of each preset disease in the preset disease list, so as to obtain the feature similarity between the target user portrait and each preset disease, and to determine the preset disease with the highest similarity to the target user portrait features as the target predicted disease.

It should be noted that, for other descriptions of the functional units of the disease prediction apparatus based on cluster analysis provided in this embodiment, reference may be made to the corresponding descriptions of fig. 1 to fig. 2, which are not repeated herein.

Based on the method shown in fig. 1 to fig. 2, correspondingly, the present embodiment further provides a storage medium, which may be volatile or nonvolatile, and has computer readable instructions stored thereon, and when the computer readable instructions are executed by a processor, the method for predicting a disease based on cluster analysis shown in fig. 1 to fig. 2 is implemented.

Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the method of the embodiments of the present application.

Based on the method shown in fig. 1 to fig. 2 and the virtual device embodiments shown in fig. 3 and fig. 4, and in order to achieve the above object, the present embodiment further provides a computer device including a storage medium and a processor, where the storage medium is configured to store a computer program, and the processor is configured to execute the computer program to implement the disease prediction method based on cluster analysis shown in fig. 1 to fig. 2.

Optionally, the computer device may further include a user interface, a network interface, a camera, radio frequency (RF) circuitry, a sensor, audio circuitry, a Wi-Fi module, and so forth. The user interface may include a display screen and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Wi-Fi interface), and the like.

Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine some components, or arrange the components differently.

The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device described above, supporting the operation of information handling programs and other software and/or programs. The network communication module is used for realizing communication among components in the storage medium and communication with other hardware and software in the information processing entity device.

Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, and can also be implemented by hardware.

By applying the technical scheme of the present application, compared with the prior art, a cluster analysis matrix can first be constructed from sample pathological data, and dimensionality reduction can be performed on it according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix; a patient clustering result and a pathological feature clustering result are then determined from the patient clustering matrix and the pathological feature clustering matrix, respectively, and a target predicted disease is determined according to these two clustering results. The technical scheme combines a matrix decomposition algorithm with user portrait technology to realize intelligent disease prediction. Because the patients and the pathological features are clustered in a single decomposition step, both clustering results are obtained simultaneously, which improves clustering efficiency. At the same time, the interrelation and mutual influence between patient information and pathological feature information are taken into account, which makes the clustering result more accurate and thus provides strong data support for disease prediction.

Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario, and that the blocks or flows in the figures are not necessarily required to practice the present application. Those skilled in the art will also appreciate that the modules of the apparatus in an implementation scenario may be distributed in that apparatus as described, or may, with corresponding changes, be located in one or more apparatuses different from that of the present implementation scenario. The modules of the implementation scenario may be combined into one module or further split into a plurality of sub-modules.

The serial numbers of the above embodiments are for description only and do not indicate the superiority or inferiority of the implementation scenarios. The above disclosure presents only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variation conceivable by those skilled in the art is intended to fall within the scope of the present application.

Claims

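1. A disease prediction method based on cluster analysis, characterized by comprising the following steps:

acquiring sample pathological data, and constructing a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject, and the column attribute of the cluster analysis matrix is a pathological feature;

performing dimensionality reduction on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix;

determining a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix, respectively;

and determining a target predicted disease according to the patient clustering result and the pathological feature clustering result.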

2. The method of claim 1, wherein said constructing a cluster analysis matrix from the sample pathology data comprises:

carrying out binarization processing on the sample pathological data to obtain the pathological features and the patient identity features of each patient subject;

and constructing a cluster analysis matrix by using the pathological features and the patient identity features, so that the row attribute of the cluster analysis matrix is the patient subject, and the column attribute of the cluster analysis matrix is the pathological feature.

3. The method of claim 1, wherein the performing the dimensionality reduction on the cluster analysis matrix according to a predetermined matrix decomposition algorithm to obtain a patient cluster matrix and a pathological feature cluster matrix comprises:

decomposing the cluster analysis matrix, according to a preset matrix decomposition algorithm, into a product of a patient clustering matrix, a relation feature matrix and a pathological feature clustering matrix, wherein the Frobenius norm of the difference between the product and the cluster analysis matrix is smaller than a preset threshold value;

the row attribute of the patient clustering matrix is a patient identity characteristic, the column attribute of the patient clustering matrix is a patient clustering category, the row attribute of the pathological feature clustering matrix is a pathological feature clustering category, the column attribute of the pathological feature clustering matrix is a pathological feature, the row attribute of the relation feature matrix is the patient clustering category, and the column attribute of the relation feature matrix is the pathological feature clustering category.

4. The method of claim 3, wherein determining a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix, respectively, comprises:

for each patient identity feature in the patient clustering matrix, extracting the column attribute with the maximum value, and determining that column attribute as the patient clustering result of the patient identity feature;

and for each pathological feature in the pathological feature clustering matrix, extracting the row attribute with the maximum value, and determining that row attribute as the pathological feature clustering result of the pathological feature.

5. The method of claim 1, wherein determining a target predicted disease according to the patient clustering result and the pathological feature clustering result comprises:

acquiring patient pathological data of a target patient, and extracting patient identity data and pathological feature data from the patient pathological data;

determining first clustering information corresponding to the patient identity data according to the patient clustering result;

determining second clustering information corresponding to the pathological feature data according to the pathological feature clustering result;

determining, based on a user portrait technique, a target predicted disease matching the first clustering information and the second clustering information.

6. The method of claim 5, wherein determining, based on a user portrait technique, a target predicted disease matching the first clustering information and the second clustering information comprises:

generating a target user portrait of the target patient according to the first clustering information and the second clustering information;

and selecting, from a preset disease list, the preset disease with the highest similarity to the target user portrait features as the target predicted disease.

7. The method of claim 6, wherein generating the target user portrait of the target patient according to the first clustering information and the second clustering information comprises:

extracting a first feature label of the first clustering information and a second feature label of the second clustering information according to a preset label extraction rule;

generating, according to the first feature label and the second feature label, a feature tag set of the target patient as the target user portrait of the target patient;

and wherein selecting, from the preset disease list, the preset disease with the highest similarity to the target user portrait features as the target predicted disease specifically comprises:

calculating the feature similarity between the feature tag set of the target patient and the preset feature tag set of each preset disease in a preset disease list by using a preset feature distance calculation formula to obtain the feature similarity between the target user portrait and each preset disease;

and determining the preset disease with the highest similarity to the target user portrait characteristics as a target predicted disease.

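8. A disease prediction apparatus based on cluster analysis, comprising:

a construction module, configured to acquire sample pathological data and construct a cluster analysis matrix according to the sample pathological data, wherein the row attribute of the cluster analysis matrix is a patient subject, and the column attribute of the cluster analysis matrix is a pathological feature;

a processing module, configured to perform dimensionality reduction on the cluster analysis matrix according to a preset matrix decomposition algorithm to obtain a patient clustering matrix and a pathological feature clustering matrix;

a first determining module, configured to determine a patient clustering result and a pathological feature clustering result by using the patient clustering matrix and the pathological feature clustering matrix, respectively;

and a second determining module, configured to determine a target predicted disease according to the patient clustering result and the pathological feature clustering result.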

9. A storage medium having stored thereon a computer program, which when executed by a processor implements the method for cluster analysis based disease prediction according to any of claims 1 to 7.

10. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the cluster analysis-based disease prediction method of any one of claims 1 to 7 when executing the program.