CN111223570A

CN111223570A - Pathological data analysis method, device, equipment and storage medium

Info

Publication number: CN111223570A
Application number: CN202010005182.7A
Authority: CN
Inventors: 蔡金成
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-01-03
Filing date: 2020-01-03
Publication date: 2020-06-02
Also published as: WO2021135063A1

Abstract

The invention relates to the field of machine learning and discloses a pathological data analysis method, device, equipment and storage medium. The method comprises the following steps: acquiring a clustering result of a pathological data sample set; calculating an adjusted contour coefficient from the clustering result; determining the quality of the clustering result according to its adjusted contour coefficient; when the clustering result is excellent, acquiring a pathological data sample to be processed; and classifying the pathological data sample to be processed according to the clustering result, and generating the corresponding pathological analysis data. The method addresses the excessive time complexity of evaluating clustering results: it greatly reduces the amount of calculation in the evaluation, greatly improves the efficiency of clustering result evaluation, and speeds up the judgment of pathological data clustering results, so that the optimal pathological data clustering result can be determined quickly.

Description

Pathological data analysis method, device, equipment and storage medium

Technical Field

The invention relates to the field of machine learning, in particular to a pathological data analysis method, a pathological data analysis device, pathological data analysis equipment and a storage medium.

Background

In the medical field, with the development of technology, hospital management systems collect pathological data from a large number of patients. With a clustering algorithm, the pathological data can be divided into a plurality of sets, each corresponding to one disease condition, thereby helping doctors accurately diagnose patients with difficult and complicated diseases.

Clustering is an unsupervised method for grouping data. Also called cluster analysis, it is a statistical analysis method for studying data classification problems and an important means of data mining.

After a clustering algorithm divides a given data set into different groups, the clustering result needs to be evaluated to assess its quality. The contour coefficient (Silhouette Coefficient) is an evaluation method for assessing the effect of unsupervised clustering algorithms, and is also used to determine the number of clusters (i.e., groups) during clustering. It evaluates the clustering effect by combining the cohesion (degree of agglomeration) and the separation of the clusters. The value range of the contour coefficient is [-1, 1], and the larger the value, the better the clustering effect.

However, the time complexity of the contour coefficient is very high: it is quadratic in n, i.e., O(n²), where n is the number of samples. When a large-scale data set is processed, computing the contour coefficient of a clustering result requires a very large amount of calculation, and the result is difficult to obtain in a short time. In particular, when the contour coefficient is used to determine the number of clusters, the contour coefficients of a plurality of clustering results must be calculated, and the whole process takes even longer.

After the pathological data is clustered, a plurality of different clustering results are generally calculated. Because the amount of pathological data is huge, the detection indexes are also many, so that unpredictable errors often occur when the conventional contour coefficients are used for evaluating pathological data clustering results, or the calculation time is too long, and the required evaluation results cannot be obtained in time.

Disclosure of Invention

Therefore, it is necessary to provide a pathological data analysis method for solving the problem of too high time complexity in the clustering result evaluation and calculation process, improving the calculation speed of clustering result evaluation, and quickly determining the quality of the clustering result, so as to classify pathological data samples according to the clustering result and obtain the pathological analysis data required to be obtained.

A method of pathological data analysis, comprising:

acquiring a clustering result of a pathological data sample set, wherein the clustering result divides the pathological data sample set into a plurality of clusters, each cluster consists of a plurality of pathological sample points i, and the number of the pathological sample points i in the pathological data sample set is greater than a preset number threshold;

calculating the central point of each cluster according to the clustering result;

calculating the distance between a pathological sample point i and the central point of each cluster;

calculating an adjustment contour coefficient of the pathological sample point i according to the distance between the pathological sample point i and the central point of each cluster, wherein the calculation formula is as follows:

$$s_c(i) = \frac{b_c(i) - a_c(i)}{\max\{a_c(i),\ b_c(i)\}}$$

in the above formula, s_c(i) represents the adjustment contour coefficient of the pathological sample point i; a_c(i) represents the distance between the pathological sample point i and the center point of the cluster where the pathological sample point i is located; b_c(i) represents the distance between the pathological sample point i and the closest center point among the other clusters;

calculating the average number of the adjustment contour coefficients of all the pathological sample points i to obtain the adjustment contour coefficients of the clustering result;

determining the quality of the clustering result according to the adjustment contour coefficient of the clustering result;

when the clustering result is excellent, acquiring a pathological data sample to be processed;

classifying the pathological data samples to be processed according to the clustering result, and generating pathological analysis data corresponding to the pathological data samples to be processed.

A pathological data analysis device comprising:

an acquisition result module, configured to acquire a clustering result of a pathological data sample set, wherein the clustering result divides the pathological data sample set into a plurality of clusters, each cluster consists of a plurality of pathological sample points i, and the number of the pathological sample points i in the pathological data sample set is greater than a preset number threshold;

the central point calculation module is used for calculating the central point of each cluster according to the clustering result;

the distance calculation module is used for calculating the distance between a pathological sample point i and the center point of each cluster;

a sample point coefficient calculating module, configured to calculate an adjustment contour coefficient of the pathological sample point i according to a distance between the pathological sample point i and a center point of each cluster, where the calculation formula is as follows:

$$s_c(i) = \frac{b_c(i) - a_c(i)}{\max\{a_c(i),\ b_c(i)\}}$$

a result coefficient calculating module, configured to calculate an average of the adjusted contour coefficients of all the pathological sample points i, and obtain the adjusted contour coefficient of the clustering result;

the result evaluation module is used for determining the quality of the clustering result according to the adjustment contour coefficient of the clustering result;

the sample obtaining module is used for obtaining a pathological data sample to be processed when the clustering result is excellent;

and the sample analysis module is used for classifying the pathological data samples to be processed according to the clustering result and generating pathological analysis data corresponding to the pathological data samples to be processed.

A computer device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the above-mentioned pathology data analysis method when executing said computer program.

A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, implements the above-mentioned pathology data analysis method.

According to the pathological data analysis method, the pathological data analysis device, the computer equipment and the storage medium, a clustering result of the pathological data sample set is acquired, where the clustering result divides the pathological data sample set into a plurality of clusters, each cluster consists of a plurality of pathological sample points i, and the number of the pathological sample points i in the pathological data sample set is greater than a preset number threshold; this provides the result of the cluster analysis. The center point of each cluster is calculated from the clustering result to determine its position. The distance between the pathological sample point i and the center point of each cluster is then calculated; because only the distances to the cluster center points are computed, rather than the distances between the pathological sample point i and all other pathological sample points, the amount of calculation is greatly reduced. The adjusted contour coefficient of each individual pathological sample point i is calculated from these distances, with less calculation than the method before improvement. The adjusted contour coefficients of all the pathological sample points i are averaged to obtain the adjusted contour coefficient of the clustering result; since this is a simple mean, it is fast to compute.
The quality of the clustering result is determined according to its adjusted contour coefficient. Because this coefficient can be calculated quickly, the quality of the clustering result can be judged quickly; the higher the adjusted contour coefficient, the more accurate the clustering result. When the clustering result is excellent, the pathological data samples to be processed are acquired so that they can be classified using the clustering result. They are then classified according to the clustering result, and the corresponding pathological analysis data are generated, producing valuable data that indicate the pathological risks of patients. The method addresses the excessive time complexity of evaluating clustering results: it greatly reduces the amount of calculation in the evaluation, greatly improves the efficiency of clustering result evaluation, and speeds up the judgment of pathological data clustering results, so that the optimal pathological data clustering result can be determined quickly.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of an application environment of the method for analyzing pathological data according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method for analyzing pathological data according to an embodiment of the present invention;

FIG. 3 is a schematic diagram comparing the calculation paths before and after the improvement;

FIG. 4 is a flow chart of a method for analyzing pathological data according to an embodiment of the present invention;

FIG. 5 is a flow chart of a method for analyzing pathological data according to an embodiment of the present invention;

FIG. 6 is a flow chart of a method for analyzing pathological data according to an embodiment of the present invention;

FIG. 7 is a flow chart of a method for analyzing pathological data according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a pathological data analysis device according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a computer device according to an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The pathological data analysis method provided by the embodiment can be applied to the application environment shown in fig. 1, in which the client communicates with the server through a network. The client includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server can be implemented by an independent server or a server cluster composed of a plurality of servers.

In an embodiment, as shown in fig. 2, a pathological data analysis method is provided, which is described by taking the application of the method to the server side in fig. 1 as an example, and includes the following steps:

S10, obtaining a clustering result of a pathological data sample set, wherein the clustering result divides the pathological data sample set into a plurality of clusters, each cluster is composed of a plurality of pathological sample points i, and the number of the pathological sample points i in the pathological data sample set is larger than a preset number threshold;

S20, calculating the center point of each cluster according to the clustering result;

S30, calculating the distance between the pathological sample point i and the central point of each cluster;

S40, calculating the adjustment contour coefficient of the pathological sample point i according to the distance between the pathological sample point i and the central point of each cluster, wherein the calculation formula is as follows:

$$s_c(i) = \frac{b_c(i) - a_c(i)}{\max\{a_c(i),\ b_c(i)\}}$$

S50, calculating the average of the adjustment contour coefficients of all the pathological sample points i to obtain the adjustment contour coefficient of the clustering result;

S60, determining the quality of the clustering result according to the adjustment contour coefficient of the clustering result;

S70, when the clustering result is excellent, acquiring a pathological data sample to be processed;

and S80, classifying the pathological data samples to be processed according to the clustering result, and generating pathological analysis data corresponding to the pathological data samples to be processed.

In this embodiment, the clustering result may be the result obtained by performing a clustering task on the pathological data sample set. It can be obtained by a partition-based or agglomerative hierarchical clustering method, such as K-Means or Agglomerative clustering. The preset number threshold can be set according to actual needs, for example to 5 thousand, 10 or other values. Each pathological sample point i in the pathological data sample set includes a plurality of detection indexes, such as a first detection index, a second detection index, and so on, so it may be regarded as a point in a multidimensional space. The spatial dimension of every pathological sample point i is the same before clustering; that is, all pathological sample points i in the pathological data sample set contain the same number of detection indexes. The clustering result divides the pathological data sample set into a plurality of clusters, and each cluster has one or more pathological sample points i. A cluster here may be understood as a group or subset. Usually, the samples in the same cluster correspond to the same disease.

Since the value of the pathological sample point i is known, it can be expressed in the form of coordinates, such as (x_i, y_i), and the cluster center point c can therefore be solved. The coordinate value of the center point c of a cluster is equal to the average of the coordinate values of all the sample points of the cluster. For example, if cluster N is denoted as {i_1, i_2, ..., i_n} and each sample i_j is represented as (x_j, y_j), the coordinates of the center point c of the cluster may be:

$$c = \left(\frac{1}{n}\sum_{j=1}^{n} x_j,\ \frac{1}{n}\sum_{j=1}^{n} y_j\right)$$

After solving the cluster center points, the distance of each pathological sample point i from each cluster center point can be calculated. If the number of clusters is k, k distances can be calculated for each pathological sample point i: one intra-cluster distance (the distance between the pathological sample point i and the center point of its own cluster) and k-1 extra-cluster distances (the distances between the pathological sample point i and the center points of the other clusters).
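The center-point and distance steps described above can be sketched as follows. This is a minimal illustration only: the function names `cluster_centers` and `distances_to_centers`, the use of Euclidean distance, and the toy coordinates are assumptions, not part of the original disclosure.

```python
import math

def cluster_centers(points, labels):
    """Return {label: center}, each center being the coordinate-wise mean of its cluster."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for d, value in enumerate(p):
            acc[d] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: tuple(v / counts[lab] for v in acc) for lab, acc in sums.items()}

def distances_to_centers(point, centers):
    """Return {label: Euclidean distance from point to that cluster's center} (k values)."""
    return {lab: math.dist(point, c) for lab, c in centers.items()}

# Toy data (assumed): two clusters of two-dimensional sample points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centers = cluster_centers(points, labels)
dists = distances_to_centers(points[0], centers)  # one intra-cluster, k-1 extra-cluster
```

For each sample point only k distances are computed, one per cluster center, regardless of how many other sample points exist.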

Then, the adjustment contour coefficient of the sample point can be calculated according to the distance between the sample point and the center point of each cluster. The adjustment contour coefficient of the pathological sample point i is calculated by the following formula:

$$s_c(i) = \frac{b_c(i) - a_c(i)}{\max\{a_c(i),\ b_c(i)\}}$$

In the above formula, s_c(i) represents the adjustment contour coefficient of the pathological sample point i; a_c(i) represents the distance between the pathological sample point i and the center point of the cluster where the pathological sample point i is located; b_c(i) represents the distance between the pathological sample point i and the closest center point among the other clusters.

In the solving process, b_c(i) is the minimum of the k-1 extra-cluster distances. The adjustment contour coefficient of each sample point can thus be solved. The calculated adjustment contour coefficient of the pathological sample point i is a numerical value in the range [-1, 1].

The adjusted contour coefficients of all the sample points can be calculated according to the formula in the previous step, and then the average of the adjusted contour coefficients of all the sample points is calculated, so that the adjusted contour coefficients of the clustering result can be obtained. Similarly, the adjustment contour coefficient of the clustering result is a numerical value with a value range of [ -1,1 ].
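A minimal sketch of the per-point coefficient and its mean is given below. It assumes the coefficient takes the silhouette-style form (b - a) / max(a, b) applied to center-point distances, with a the distance to the point's own cluster center and b the smallest distance to any other cluster center; the function name, the Euclidean metric, and the zero-denominator convention are illustrative assumptions.

```python
import math

def adjusted_contour_coefficient(points, labels):
    """Mean of (b - a) / max(a, b) over all points, using distances to cluster centers."""
    # Center of each cluster: coordinate-wise mean of its members.
    members = {}
    for p, lab in zip(points, labels):
        members.setdefault(lab, []).append(p)
    if len(members) < 2:
        raise ValueError("at least two clusters are required")
    centers = {lab: tuple(sum(axis) / len(ps) for axis in zip(*ps))
               for lab, ps in members.items()}

    scores = []
    for p, lab in zip(points, labels):
        a = math.dist(p, centers[lab])                      # intra-cluster distance
        b = min(math.dist(p, c)                             # nearest other center
                for other, c in centers.items() if other != lab)
        denom = max(a, b)
        # Assumed convention: a point coinciding with both centers scores 0.
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = adjusted_contour_coefficient(points, [0, 0, 1, 1])   # compact, well separated
bad = adjusted_contour_coefficient(points, [0, 1, 0, 1])    # clusters interleaved
```

In this toy example the labelling that groups nearby points together scores higher than the interleaved labelling, which is the behaviour the evaluation relies on.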

After the adjustment contour coefficient is calculated, the quality of the clustering result can be determined from it: the larger the value, the better the clustering effect of the clustering result. The clustering result can be graded according to specified numerical ranges, for example (0.5, 1] is good, (0, 0.5] is average, and [-1, 0] is poor.
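The grading by numerical range can be sketched as a small lookup. The function name `grade_clustering`, the label strings, and the assignment of the boundary values 0.5 and 0 to the lower grade are assumptions made for illustration.

```python
def grade_clustering(score):
    """Map an adjusted contour coefficient in [-1, 1] to a coarse quality grade."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("coefficient must lie in [-1, 1]")
    if score > 0.5:
        return "good"
    if score > 0.0:
        return "average"
    return "poor"

grades = [grade_clustering(s) for s in (0.9, 0.5, 0.2, 0.0, -0.4)]
```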

Compared with the original contour coefficient, the time complexity of the adjustment contour coefficient is reduced from O(n²) to O(n), which greatly reduces the amount of calculation required to evaluate a clustering result. When a large-scale data set is processed, a plurality of clustering results can be evaluated rapidly to determine their quality.
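To make the difference concrete, the following sketch counts distance evaluations under each approach. It assumes the original coefficient needs every pairwise distance, n(n-1)/2 in total, while the adjusted one needs one distance per point per cluster center, n·k in total, which is linear in n when the number of clusters k is treated as a constant. The sample sizes are hypothetical.

```python
def pairwise_distance_count(n):
    """Distances needed when every pair of sample points is compared."""
    return n * (n - 1) // 2

def center_distance_count(n, k):
    """Distances needed when each sample point is compared with k cluster centers."""
    return n * k

n, k = 100_000, 5  # hypothetical sample count and cluster count
original_count = pairwise_distance_count(n)
adjusted_count = center_distance_count(n, k)
```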

By using the pathological data analysis method provided by the embodiment, the judgment on the pathological data clustering result can be accelerated, so that the optimal pathological data clustering result can be quickly determined.

After the optimal clustering result of the pathological data is determined, the pathological data samples to be processed can be obtained, then the pathological data samples to be processed are classified according to the clustering result, and the pathological analysis data corresponding to the pathological data samples to be processed are generated. In some cases, the pathology analysis data may be a pathology risk cue report for the patient.
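The text does not specify how a new sample is classified according to the clustering result. One plausible reading, sketched below purely as an assumption, is to assign the sample to the cluster whose center point is nearest; the function name `classify_sample`, the cluster labels, and the coordinates are hypothetical.

```python
import math

def classify_sample(sample, centers):
    """Return the label of the cluster center closest to the sample (assumed rule)."""
    return min(centers, key=lambda lab: math.dist(sample, centers[lab]))

# Hypothetical centers taken from an accepted clustering result.
centers = {"cluster_A": (0.0, 1.0), "cluster_B": (10.0, 1.0)}
assigned = classify_sample((9.0, 0.0), centers)
```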

In order to compare the original contour coefficient with the adjusted contour coefficient of the present embodiment, a schematic diagram of the calculation paths is provided in FIG. 3. FIG. 3-a shows the paths used to calculate the cohesion before improvement (the distances from pathological sample point i to the sample points within its cluster); FIG. 3-b shows the paths used to calculate the separation before improvement (the distances from pathological sample point i to the sample points outside its cluster); FIG. 3-c shows the path used to calculate the cohesion after improvement (the distance from pathological sample point i to the center point of its cluster); FIG. 3-d shows the paths used to calculate the separation after improvement (the distances from pathological sample point i to the center points of the other clusters).

In an application example, the original contour coefficient calculation method and the contour coefficient adjustment method are respectively used for evaluating the clustering result of the same pathological data sample set, and the results are shown in table 1.

TABLE 1 calculation time consumption of different evaluation methods for processing clustering results of the same pathology data sample set

The configuration of the server for calculating the test results of table 1 is: 20-core CPU, maximum speed 2.39 GHz; 256G memory, speed: 2400 MHz.

In terms of computation, compared with the original contour coefficient, the adjustment contour coefficient measures the compactness within a cluster by the average distance from the pathological sample points i of each cluster to the center point of that cluster, instead of the average distance between every two samples in the cluster. This greatly reduces the time and space cost of calculating the sample distance matrix, saves a large amount of computing resources, and improves the running speed. Taking the sample set P as an example, the calculation time is reduced from the original 22871.75766 seconds to 2.728480302 seconds after improvement, and the calculation efficiency is improved by about 8382.6 times. But also in the accuracy of the distance calculation.
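The reported improvement factor can be checked directly from the two timings quoted for sample set P; the variable names are illustrative.

```python
original_seconds = 22871.75766   # original contour coefficient, sample set P
adjusted_seconds = 2.728480302   # adjusted contour coefficient, sample set P

speedup = original_seconds / adjusted_seconds
print(f"speed-up: {speedup:.1f}x")
```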

The method provided by the embodiment is also suitable for other sample sets with large data processing amount and high dimensionality, such as the financial data processing field, the medicine data analysis field, the image data identification field and the like.

In steps S10-S80, a clustering result of the pathological data sample set is obtained, where the clustering result divides the pathological data sample set into a plurality of clusters, each cluster is composed of a plurality of pathological sample points i, and the number of the pathological sample points i in the pathological data sample set is greater than a preset number threshold; this provides the result of the cluster analysis. The center point of each cluster is calculated from the clustering result to determine its position. The distance between the pathological sample point i and the center point of each cluster is then calculated; because only the distances to the cluster center points are computed, rather than the distances between the pathological sample point i and all other pathological sample points, the amount of calculation is greatly reduced. The adjusted contour coefficient of each individual pathological sample point i is calculated from these distances, with less calculation than the method before improvement. The adjusted contour coefficients of all the pathological sample points i are averaged to obtain the adjusted contour coefficient of the clustering result; since this is a simple mean, it is fast to compute. The quality of the clustering result is determined according to its adjusted contour coefficient. Because this coefficient can be calculated quickly, the quality of the clustering result can be judged quickly; the higher the adjusted contour coefficient, the more accurate the clustering result.
When the clustering result is excellent, the pathological data samples to be processed are acquired so that they can be classified using the clustering result. They are then classified according to the clustering result, and the corresponding pathological analysis data are generated, producing valuable data that indicate the pathological risks of patients.

Optionally, as shown in fig. 4, after step S50, the method further includes:

S51, calculating the adjustment contour coefficients of the plurality of clustering results;

and S52, determining the clustering result with the highest adjustment contour coefficient as the optimal clustering result of the pathological data sample set.

In this embodiment, since the calculation amount of the adjustment contour coefficient is greatly reduced, the computer can calculate the adjustment contour coefficients of a plurality of clustering results in a short time. And then determining the optimal clustering result according to the size of the adjusted contour coefficient. The larger the value of the adjustment contour coefficient is, the better the clustering effect of the clustering result is, so that the clustering result with the highest adjustment contour coefficient can be determined as the optimal clustering result of the pathological data sample set.

In steps S51-S52, the adjusted contour coefficients of a plurality of clustering results are calculated, which can be done quickly. The clustering result with the highest adjusted contour coefficient is then determined as the optimal clustering result of the pathological data sample set; because the adjusted contour coefficient of a clustering result is fast to compute, the optimal clustering result can be determined quickly.
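Steps S51-S52 amount to taking the maximum over already-computed coefficients. In the sketch below, the function name `select_best_result` and the scores keyed by number of clusters are hypothetical values chosen only to illustrate the selection.

```python
def select_best_result(scores_by_result):
    """Return (result_id, score) for the highest adjusted contour coefficient."""
    if not scores_by_result:
        raise ValueError("no clustering results to compare")
    best = max(scores_by_result, key=scores_by_result.get)
    return best, scores_by_result[best]

# Hypothetical adjusted contour coefficients for clustering results with k = 2..5.
scores = {2: 0.41, 3: 0.63, 4: 0.58, 5: 0.37}
best_k, best_score = select_best_result(scores)
```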

Optionally, as shown in fig. 5, after step S50, the method further includes:

S53, judging whether the adjustment contour coefficient of the clustering result is larger than a preset coefficient threshold value;

and S54, if the adjustment contour coefficient of the clustering result is larger than a preset coefficient threshold value, determining the clustering result as the optimal clustering result of the pathological data sample set.

In some cases, an expected value, i.e. a preset coefficient threshold value, may be set, and when the adjusted contour coefficient is greater than the preset coefficient threshold value, the clustering result may be determined as a preferred clustering result of the pathology data sample set. For example, in one example, the predetermined coefficient threshold may be set to 0.5.

In steps S53-S54, the calculated adjusted contour coefficient of the clustering result is compared with the preset coefficient threshold. If it is greater than the preset coefficient threshold, the clustering result is determined as the optimal clustering result of the pathological data sample set; in this way, a clustering result whose adjusted contour coefficient exceeds the preset coefficient threshold is selected as the optimal clustering result.
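The threshold variant in steps S53-S54 can be sketched as accepting the first clustering result whose coefficient strictly exceeds the preset threshold. The function name, the candidate identifiers, and the scores are hypothetical; checking candidates in order and stopping at the first match is an assumption about how the comparison would be used.

```python
def first_acceptable_result(scored_results, threshold=0.5):
    """Return the id of the first result whose coefficient exceeds the threshold, else None."""
    for result_id, score in scored_results:
        if score > threshold:
            return result_id
    return None

# Hypothetical (result id, adjusted contour coefficient) pairs.
candidates = [("k=2", 0.41), ("k=3", 0.63), ("k=4", 0.58)]
accepted = first_acceptable_result(candidates, threshold=0.5)
```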

Optionally, as shown in fig. 6, before step S10, the method further includes:

S11, acquiring the pathological data sample set;

S12, calculating the clustering result of the pathological data sample set based on a K-Means clustering algorithm.

In this embodiment, K-Means is an iterative clustering analysis algorithm. The calculation process is as follows. First, the number of clusters is determined and their respective center points are randomly initialized; to choose the number of clusters, it helps to take a quick look at the data and try to identify distinct groupings. Each center point is a vector of the same length as each data point vector. Second, each data point is classified by calculating its distance to the center of each group and assigning it to the group whose center is closest. Third, based on this assignment, the average of all points in each group is calculated and taken as the new group center. These steps are repeated for a set number of iterations, or until the group centers change little between iterations (by less than a set threshold). Alternatively, the group centers may be randomly initialized several times, and the initialization that gives the best result is selected.

The advantage of K-Means is that it is very fast: only the distances between the points and the group centers need to be calculated, so the amount of calculation is small and the time complexity is O(n).

In steps S11-S12, the pathological data sample set to be processed is acquired, and its clustering result is calculated based on the K-Means clustering algorithm to obtain the clustering result to be evaluated.
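The K-Means procedure described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the embodiment's implementation: the function name `k_means`, the seeded random initialization from existing points, the Euclidean metric, and the rule of keeping the old center when a group becomes empty are all assumptions.

```python
import math
import random

def k_means(points, k, max_iter=100, tol=1e-9, seed=0):
    """Return (labels, centers) after iterating assignment and center updates."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]  # random initial centers
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the group with the closest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: each center becomes the mean of its group's points.
        new_centers = []
        for j in range(k):
            group = [p for p, lab in zip(points, labels) if lab == j]
            if group:
                new_centers.append(tuple(sum(axis) / len(group) for axis in zip(*group)))
            else:
                new_centers.append(centers[j])  # assumed: keep center of an empty group
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # centers barely moved: stop iterating
            break
    return labels, centers

# Toy data (assumed): two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = k_means(data, k=2)
```

Each iteration computes only point-to-center distances, which is why the cost per iteration grows linearly with the number of points for a fixed k.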

Optionally, as shown in fig. 7, before step S10, the method further includes:

S11, acquiring the pathological data sample set;

S13, calculating the clustering result of the pathological data sample set based on an agglomerative hierarchical clustering algorithm.

The coacervation hierarchical clustering algorithm is to combine two most similar data points by calculating the similarity between every two data points and iterate the process repeatedly until the set requirement of the number of clusters is met. The smaller the distance, the higher the similarity. The distance may be a euclidean distance or the like.

The specific steps of the Agglomerative include: firstly, each sample is taken as a class, and the distance between every two classes is calculated; forming a new category by combining two categories with the minimum distance (most similar); recalculating the distance between each category; iterating the two steps until a cluster is formed; the process of the agglomerative hierarchical clustering is to establish a tree, a threshold value, namely the number of clusters formed, can be set according to requirements, and when the number of categories is equal to the threshold value, the iteration can be terminated.

In steps S11 and S13, the pathological data sample set is acquired to obtain the sample set to be processed, and the clustering result of the pathological data sample set is calculated based on the agglomerative hierarchical clustering algorithm to obtain the clustering result to be evaluated.

In this embodiment, the center point of each cluster is obtained after clustering, and the distance between each sample point and the center point of each cluster is calculated; the adjusted contour coefficient of the sample point is calculated from these distances; and the average of the adjusted contour coefficients of all the pathological sample points i is calculated to obtain the adjusted contour coefficient of the clustering result, from which the quality of the clustering result is determined. This embodiment addresses the excessive time complexity of evaluating clustering results, greatly reduces the amount of calculation in the evaluation, greatly improves the efficiency of clustering result evaluation, and speeds up the judgment of pathological data clustering results so that the optimal pathological data clustering result can be determined quickly.
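The evaluation summarized above can be illustrated with a short sketch. The formula itself is not reproduced in this passage, so the example ASSUMES the usual form of the coefficient (often called the silhouette coefficient), s_c(i) = (b_c(i) - a_c(i)) / max(a_c(i), b_c(i)), applied to point-to-center distances; it also assumes that b_c(i) refers to the nearest center of a cluster other than the point's own. The function name and arguments are hypothetical.

```python
import math


def adjusted_contour_coefficient(points, labels, centers):
    """Mean center-based coefficient over all sample points.

    ASSUMPTION: s_c(i) = (b_c(i) - a_c(i)) / max(a_c(i), b_c(i)).
    Requires at least two clusters.
    """
    scores = []
    for p, lab in zip(points, labels):
        # a_c(i): distance from the point to the center of its own cluster.
        a = math.dist(p, centers[lab])
        # b_c(i): distance to the nearest center of any other cluster
        # (an interpretation of "closest cluster").
        b = min(math.dist(p, c) for j, c in enumerate(centers) if j != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    # The coefficient of the clustering result is the average over points.
    return sum(scores) / len(scores)
```

With n sample points and k clusters, this sketch computes n times k point-to-center distances, whereas a coefficient based on pairwise distances between samples needs on the order of n squared distances; that difference is the source of the reduced calculation amount.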

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

In one embodiment, a pathological data analysis device is provided, which corresponds one-to-one to the pathological data analysis method in the above embodiments. As shown in Fig. 8, the pathological data analysis device includes an acquisition result module 10, a central point calculation module 20, a distance calculation module 30, a sample point coefficient calculation module 40, a result coefficient calculation module 50, a result evaluation module 60, an acquisition sample module 70, and a sample analysis module 80. The functional modules are described in detail as follows:

an acquisition result module 10, configured to obtain a clustering result of a pathological data sample set, where the clustering result divides the pathological data sample set into a plurality of clusters, each cluster is composed of a plurality of pathological sample points i, and the number of pathological sample points i in the pathological data sample set is greater than a preset number threshold;

a central point calculating module 20, configured to calculate a central point of each cluster according to the clustering result;

a distance calculating module 30, configured to calculate a distance between a pathological sample point i and a center point of each cluster;

a sample point coefficient calculating module 40, configured to calculate an adjustment contour coefficient of the pathological sample point i according to a distance between the pathological sample point i and a center point of each cluster, where the calculation formula is as follows:

a result coefficient calculating module 50, configured to calculate an average of the adjusted contour coefficients of all the pathological sample points i, so as to obtain the adjusted contour coefficient of the clustering result;

a result evaluation module 60, configured to determine the quality of the clustering result according to the adjusted contour coefficient of the clustering result;

an acquisition sample module 70, configured to obtain a pathological data sample to be processed when the clustering result is excellent;

and the sample analysis module 80 is configured to classify the pathological data samples to be processed according to the clustering result, and generate pathological analysis data corresponding to the pathological data samples to be processed.
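The text does not state how a sample to be processed is classified "according to the clustering result". One plausible reading, shown here purely as an assumption, is to assign the new sample to the cluster whose center point is nearest. The function name `classify_sample` is hypothetical.

```python
import math


def classify_sample(sample, centers):
    """Return the index of the cluster whose center is nearest to sample.

    ASSUMPTION: nearest-center assignment with Euclidean distance.
    """
    return min(
        range(len(centers)),
        key=lambda c: math.dist(sample, centers[c]),
    )
```

The returned cluster index could then be used to look up whatever pathological analysis data is associated with that cluster.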

Optionally, the pathological data analysis device further includes:

a multi-result calculating module, configured to calculate the adjusted contour coefficients of a plurality of clustering results;

and the optimal result determining module is used for determining the clustering result with the highest adjustment contour coefficient as the optimal clustering result of the pathological data sample set.
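Selecting the optimal result among several clustering results reduces to taking the maximum of their coefficients. A minimal sketch, assuming each candidate is represented as a hypothetical `(name, coefficient)` pair whose coefficient has already been calculated:

```python
def best_clustering(candidates):
    """Return the (name, coefficient) pair with the highest coefficient.

    candidates: a non-empty iterable of (name, coefficient) pairs,
    for example one pair per tried number of clusters.
    """
    return max(candidates, key=lambda item: item[1])
```

If two candidates tie, `max` returns the first one encountered, so the order of the candidates acts as the tie-break.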

Optionally, the pathological data analysis device further includes:

the coefficient judgment module is used for judging whether the adjustment contour coefficient of the clustering result is greater than a preset coefficient threshold value or not;

and the optimal result determining module is used for determining the clustering result as the optimal clustering result of the pathological data sample set if the adjustment contour coefficient of the clustering result is greater than a preset coefficient threshold value.
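The threshold variant can be sketched in the same way. The example below assumes, beyond what the text states, that candidate clustering results are checked one after another and that the first one exceeding the preset coefficient threshold is accepted; the names `first_acceptable` and `threshold` are hypothetical.

```python
def first_acceptable(candidates, threshold):
    """Return the name of the first candidate whose coefficient exceeds
    the threshold, or None if no candidate does.

    candidates: iterable of (name, coefficient) pairs.
    """
    for name, coefficient in candidates:
        # The comparison is strict, matching "greater than" in the text.
        if coefficient > threshold:
            return name
    return None
```

Compared with taking the maximum over all candidates, this stops as soon as one result is good enough, at the cost of possibly missing a better one later in the list.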

Optionally, the pathological data analysis device further includes:

a sample set acquisition module for acquiring the pathological data sample set;

the first clustering calculation module is used for calculating the clustering result of the pathology data sample set based on a K-Means clustering algorithm.

Optionally, the pathological data analysis device further includes:

a sample set acquisition module for acquiring the pathological data sample set;

and the second clustering calculation module is used for calculating the clustering result of the pathological data sample set based on an agglomerative hierarchical clustering algorithm.

For specific limitations of the pathological data analysis device, reference may be made to the above limitations of the pathological data analysis method, which are not repeated here. The modules in the pathological data analysis device may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in Fig. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data related to pathological data clustering result evaluation. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a pathological data analysis method.

In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A method of pathological data analysis, comprising:

2. The pathological data analysis method according to claim 1, wherein the calculating an average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficients of the clustering result further comprises:

calculating the adjustment contour coefficients of a plurality of clustering results;

and determining the clustering result with the highest adjustment contour coefficient as the optimal clustering result of the pathological data sample set.

3. The pathological data analysis method according to claim 1, wherein the calculating an average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficients of the clustering result further comprises:

judging whether the adjustment contour coefficient of the clustering result is larger than a preset coefficient threshold value or not;

and if the adjustment contour coefficient of the clustering result is greater than a preset coefficient threshold value, determining the clustering result as the optimal clustering result of the pathological data sample set.

4. The pathological data analysis method of claim 1, wherein before obtaining the clustering result that divides the pathological data sample set into a number of clusters, the method comprises:

acquiring the pathological data sample set;

calculating the clustering result of the pathology data sample set based on a K-Means clustering algorithm.

5. The pathological data analysis method of claim 1, wherein before obtaining the clustering result that divides the pathological data sample set into a number of clusters, the method comprises:

acquiring the pathological data sample set;

calculating the clustering result of the pathological data sample set based on an agglomerative hierarchical clustering algorithm.

6. A pathological data analysis device, comprising:

in the above formula, s_c(i) represents the adjusted contour coefficient of the pathological sample point i; a_c(i) represents the distance between the pathological sample point i and the center point of the cluster to which it belongs; b_c(i) represents the distance between the pathological sample point i and the center point of the cluster closest to the pathological sample point i;

7. The pathological data analysis device of claim 6, further comprising:

8. The pathological data analysis device of claim 6, further comprising:

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the pathology data analysis method according to any one of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the pathology data analysis method according to any one of claims 1 to 5.