WO2021135063A1

WO2021135063A1 - Pathological data analysis method and apparatus, and device and storage medium

Info

Publication number: WO2021135063A1
Application number: PCT/CN2020/093328
Authority: WO
Inventors: 蔡金成
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-01-03
Filing date: 2020-05-29
Publication date: 2021-07-08
Also published as: CN111223570A

Abstract

A method and apparatus for analysis of pathological data, and a device and a storage medium, relating to the field of artificial intelligence. The method comprises: acquiring a clustering result of a pathological data sample set (S10); calculating an adjustment silhouette coefficient according to the clustering result, and determining the quality of the clustering result according to the adjustment silhouette coefficient of the clustering result (S60); when the clustering result is good, acquiring pathological data samples to be processed (S70); and classifying, according to the clustering result, the pathological data samples to be processed, and generating pathological analysis data corresponding to the pathological data samples to be processed (S80). By means of the method, the problem of excessive time complexity during a clustering result evaluation calculation process is solved, the data calculation amount during the evaluation calculation process is greatly reduced, the clustering result evaluation efficiency is greatly improved, and the determination of a pathological data clustering result can be accelerated, so as to quickly determine the optimal pathological data clustering result.

Description

Pathological data analysis method, device, equipment and storage medium

This application claims priority to Chinese patent application No. 202010005182.7, filed with the Chinese Patent Office on January 3, 2020 and entitled "Pathological data analysis method, device, equipment and storage medium", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of machine learning in the field of artificial intelligence, and in particular to a pathological data analysis method, device, equipment, and storage medium.

Background

In the medical field, as technology has developed, hospital management systems have collected large amounts of patient pathological data. A clustering algorithm can divide this pathological data into multiple sets, each set corresponding to a condition, which can help doctors diagnose patients with intractable diseases.

A clustering algorithm groups data without supervision. Clustering, also known as cluster analysis, is a statistical analysis method for studying data classification problems and an important means of data mining.

After a clustering algorithm has divided a given data set into groups, the clustering result needs to be evaluated to judge its quality. The silhouette coefficient, referred to as the contour coefficient in the rest of this description, is a clustering evaluation method used to assess the effect of an unsupervised clustering algorithm and thereby determine the number of clusters (i.e., groups) in the clustering process. The contour coefficient combines the cohesion and the separation of the clusters to evaluate the clustering effect. Its value range is [-1,1]; the larger the value, the better the clustering effect.

However, the time complexity of the contour coefficient is very high: it is quadratic in n, that is, O(n^2), where n is the number of samples. For large-scale data sets, computing the contour coefficient of a clustering result requires a very large amount of calculation, and it is difficult to obtain the result in a short time. In particular, when the contour coefficient is used to determine the number of clusters, the contour coefficients of multiple clustering results must be calculated, and the whole process takes even longer.
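To make the quadratic cost concrete, the following is a minimal sketch of the standard silhouette (contour) coefficient in plain Python. The function name `silhouette_coefficient`, the use of Euclidean distance via `math.dist`, and the convention of scoring a singleton cluster as 0 are assumptions made for illustration and are not taken from this application; at least two clusters are assumed.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean standard silhouette coefficient; visits every pair of samples."""
    n = len(points)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        # Sum and count of distances from point i to the members of each cluster.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):  # inner loop over all samples: O(n^2) overall
            if i != j:
                sums[labels[j]] += math.dist(points[i], points[j])
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            scores.append(0.0)  # assumed convention for a singleton cluster
            continue
        a = sums[own] / counts[own]  # cohesion: mean distance within own cluster
        b = min(  # separation: smallest mean distance to another cluster
            sums[c] / counts[c] for c in clusters if c != own and counts[c] > 0
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n
```

The nested loops over all samples are what make this evaluation slow on large sample sets.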

The inventor found that clustering pathological data usually yields multiple different clustering results. Because the amount of pathological data is huge and there are many detection indicators, evaluating these clustering results with the existing contour coefficient often produces unforeseen errors, or the calculation takes too long to deliver the required evaluation results in time.

Summary of the application

Based on this, it is necessary to provide a pathological data analysis method that addresses the above technical problems: it reduces the high time complexity of evaluating clustering results, speeds up the evaluation so that the quality of a clustering result can be determined quickly, and then classifies pathological data samples according to the clustering result to obtain the required pathological analysis data.

A pathological data analysis method, including:

Obtain a clustering result of a pathological data sample set, where the clustering result divides the pathological data sample set into several clusters, each cluster is composed of multiple pathological sample points i, and the number of pathological sample points i in the pathological data sample set is greater than a preset number threshold;

Calculating the center point of each of the clusters according to the clustering result;

Calculate the distance between the pathological sample point i and the center point of each cluster;

Calculate the adjusted contour coefficient of the pathological sample point i according to the distance between the pathological sample point i and the center point of each cluster, using the following formula:

$$s_c(i) = \frac{b_c(i) - a_c(i)}{\max\{a_c(i),\, b_c(i)\}}$$

In the above formula, s_c(i) represents the adjusted contour coefficient of pathological sample point i; a_c(i) represents the distance between pathological sample point i and the center point of its own cluster; b_c(i) represents the distance between pathological sample point i and the center point of the nearest other cluster;

Calculate the average of the adjusted contour coefficients of all pathological sample points i, and obtain the adjusted contour coefficients of the clustering result;

Determining the pros and cons of the clustering result according to the adjusted contour coefficient of the clustering result;

When the clustering result is excellent, obtaining a sample of pathological data to be processed;

Classify the pathological data sample to be processed according to the clustering result, and generate pathological analysis data corresponding to the pathological data sample to be processed.

A pathological data analysis device, including:

The result acquisition module is used to obtain a clustering result of a pathological data sample set, where the clustering result divides the pathological data sample set into several clusters, each cluster is composed of multiple pathological sample points i, and the number of pathological sample points i in the pathological data sample set is greater than a preset number threshold;

A central point calculation module, configured to calculate the central point of each of the clusters according to the clustering result;

The distance calculation module is used to calculate the distance between the pathological sample point i and the center point of each cluster;

The sample point coefficient calculation module is used to calculate the adjusted contour coefficient of the pathological sample point i according to the distance between the pathological sample point i and the center point of each cluster, using the following formula:

$$s_c(i) = \frac{b_c(i) - a_c(i)}{\max\{a_c(i),\, b_c(i)\}}$$

The result coefficient calculation module is used to calculate the average of the adjusted contour coefficients of all the pathological sample points i, and obtain the adjusted contour coefficient of the clustering result;

The result evaluation module is used to determine the pros and cons of the clustering result according to the adjusted contour coefficient of the clustering result;

The sample obtaining module is used to obtain a sample of pathological data to be processed when the clustering result is excellent;

The sample analysis module is configured to classify the pathological data sample to be processed according to the clustering result, and generate pathological analysis data corresponding to the pathological data sample to be processed.

A computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, and the processor implements the following steps when the processor executes the computer-readable instructions:

One or more readable storage media storing computer readable instructions, when the computer readable instructions are executed by one or more processors, the one or more processors execute the following steps:

The details of one or more embodiments of the present application are presented in the following drawings and description, and other features and advantages of the present application will become apparent from the description, drawings and claims.

Beneficial effect

The application solves the problem of high time complexity in evaluating clustering results, greatly reduces the amount of calculation in the evaluation process, greatly improves the efficiency of evaluating clustering results, and speeds up the judgment of pathological data clustering results, so that the best pathological data clustering result can be determined quickly.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the following will briefly introduce the drawings that need to be used in the description of the embodiments of the present application. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative labor.

FIG. 1 is a schematic diagram of an application environment of a pathological data analysis method in an embodiment of the present application;

FIG. 2 is a schematic flowchart of a pathological data analysis method in an embodiment of the present application;

FIG. 3 is a schematic diagram of the calculation paths before and after the improvement, provided for comparison;

FIG. 4 is a schematic flowchart of a pathological data analysis method in an embodiment of the present application;

FIG. 5 is a schematic flowchart of a pathological data analysis method in an embodiment of the present application;

FIG. 6 is a schematic flowchart of a pathological data analysis method in an embodiment of the present application;

FIG. 7 is a schematic flowchart of a pathological data analysis method in an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a pathological data analysis device in an embodiment of the present application;

FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The pathological data analysis method provided in this embodiment can be applied in an application environment as shown in FIG. 1, where the client communicates with the server through the network. Among them, the client includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server can be implemented with an independent server or a server cluster composed of multiple servers.

In an embodiment, as shown in FIG. 2, a pathological data analysis method is provided. The method is applied to the server in FIG. 1 as an example for description, including the following steps:

S10. Obtain a clustering result of a pathological data sample set, where the clustering result divides the pathological data sample set into several clusters, each cluster is composed of multiple pathological sample points i, and the number of pathological sample points i in the pathological data sample set is greater than a preset number threshold;

S20: Calculate the center point of each of the clusters according to the clustering result;

S30. Calculate the distance between the pathological sample point i and the center point of each cluster;

S40. Calculate the adjusted contour coefficient of the pathological sample point i according to the distance between the pathological sample point i and the center point of each cluster, using the following formula:

$$s_c(i) = \frac{b_c(i) - a_c(i)}{\max\{a_c(i),\, b_c(i)\}}$$

S50: Calculate the average of the adjusted contour coefficients of all the pathological sample points i, and obtain the adjusted contour coefficient of the clustering result;

S60: Determine the quality of the clustering result according to the adjusted contour coefficient of the clustering result;

S70. When the clustering result is excellent, obtain a pathological data sample to be processed;

S80. Classify the pathological data sample to be processed according to the clustering result, and generate pathological analysis data corresponding to the pathological data sample to be processed.

In this embodiment, the clustering result may be the result obtained after performing a clustering task on the pathological data sample set. It can be obtained with partition-based or agglomerative hierarchical clustering methods, such as K-Means and Agglomerative clustering. The preset number threshold can be set according to actual needs, for example to 50,000, 10 or other values. Each pathological sample i in the pathological data sample set includes multiple detection indexes, such as a first detection index, a second detection index, and so on, so the pathological sample i can be regarded as a point in a multi-dimensional space. In particular, before the pathological data sample set is clustered, every pathological sample point i has the same spatial dimension; in other words, all pathological sample points i in the pathological data sample set contain the same number of detection indexes. The clustering result divides the pathological data sample set into several clusters, and each cluster has one or more pathological sample points i. Here, a cluster can mean a group or a subset. Normally, the samples in the same cluster correspond to the same disease type.

Since the values of the pathological sample point i are known, the point can be expressed in coordinate form, such as (x_i, y_i), and the center point c of the cluster can therefore be solved. The coordinates of the center point c of a cluster equal the average of the coordinates of all sample points in the cluster. For example, if the cluster N is represented as {i_1, i_2, ..., i_n} and each sample is represented as (x_i, y_i), the coordinates of the center point c of the cluster can be:

$$c = \left(\frac{1}{n}\sum_{i \in N} x_i,\ \frac{1}{n}\sum_{i \in N} y_i\right)$$
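The center point calculation described above can be sketched as follows. The function name `cluster_centers` and the representation of samples as tuples with a parallel list of cluster labels are assumptions made for illustration; the sketch works for any number of dimensions, not only two.

```python
def cluster_centers(points, labels):
    """Return {cluster label: center point}, the per-coordinate mean of its members."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        for d, value in enumerate(point):
            sums[label][d] += value  # accumulate each coordinate separately
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in sums[label])
        for label in sums
    }
```

Each sample is visited once, so the cost of this step grows linearly with the number of samples.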

After the center point c of each cluster has been solved, the distance between each pathological sample point i and the center point of each cluster can be calculated. If the number of clusters is k, then k distances are calculated for each pathological sample point i: one intra-cluster distance (the distance between the pathological sample point i and the center point of its own cluster) and k-1 outside-cluster distances (the distances between the pathological sample point i and the center points of the other clusters).
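As a minimal sketch of this step, the k distances of one sample can be split into the intra-cluster distance and the k-1 outside-cluster distances. The name `distances_to_centers`, the dictionary of centers keyed by cluster label, and the choice of Euclidean distance are assumptions for illustration.

```python
import math


def distances_to_centers(point, own_label, centers):
    """Return (intra-cluster distance, {other label: outside-cluster distance})."""
    intra = math.dist(point, centers[own_label])
    outside = {
        label: math.dist(point, center)
        for label, center in centers.items()
        if label != own_label
    }
    return intra, outside
```

With k clusters the returned dictionary always holds k-1 entries, one per cluster other than the sample's own.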

Then, the adjusted contour coefficient of the sample point can be calculated according to the distances between the sample point and the center points of the clusters. The adjusted contour coefficient of pathological sample point i is calculated by the following formula:

$$s_c(i) = \frac{b_c(i) - a_c(i)}{\max\{a_c(i),\, b_c(i)\}}$$

In the above formula, s_c(i) represents the adjusted contour coefficient of pathological sample point i; a_c(i) represents the distance between pathological sample point i and the center point of its own cluster; b_c(i) represents the distance between pathological sample point i and the center point of the nearest other cluster.

In the process of solving, b _c (i) is the smallest value among the k-1 distances outside the cluster. Thus, the adjusted contour coefficient of the sample points can be solved. The calculated adjusted contour coefficient of pathological sample point i is a value, and its value range is [-1,1].

The adjusted contour coefficient of all sample points can be calculated according to the formula in the previous step, and then the average of the adjusted contour coefficients of all sample points can be calculated to obtain the adjusted contour coefficient of the clustering result. Similarly, the adjusted contour coefficient of the clustering result is a value, and its value range is [-1,1].
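The whole evaluation can be sketched in a few lines of plain Python. This sketch assumes that the per-point formula takes the usual silhouette form, s_c(i) = (b_c(i) - a_c(i)) / max(a_c(i), b_c(i)), with Euclidean distances; the function name `adjusted_contour_coefficient` and the zero score used when both distances are zero are likewise assumptions, and at least two clusters are required.

```python
import math


def adjusted_contour_coefficient(points, labels):
    """Average of s_c(i) over all points, using point-to-center distances only."""
    # Center point of each cluster: per-coordinate mean of its members.
    members = {}
    for point, label in zip(points, labels):
        members.setdefault(label, []).append(point)
    centers = {
        label: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for label, pts in members.items()
    }
    scores = []
    for point, label in zip(points, labels):
        a = math.dist(point, centers[label])  # distance to own center
        b = min(  # smallest of the k-1 distances to other centers
            math.dist(point, center)
            for other, center in centers.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

For four points forming two tight, well-separated pairs, labelling each pair as its own cluster scores close to 1, while a labelling that mixes the pairs scores much lower.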

After the adjusted contour coefficient has been calculated, the quality of the clustering result can be determined from it. The larger the value, the better the clustering effect. Specified numerical ranges can be set to grade the clustering result, for example (0.5,1] for excellent, (0,0.5] for general, and [-1,0] for poor.
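The example ranges above translate directly into a small grading function. The name `grade_clustering` and the decision to reject values outside [-1,1] are assumptions; the boundaries are the example ranges given in this embodiment and can be changed as needed.

```python
def grade_clustering(score):
    """Map an adjusted contour coefficient to the example quality grades."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("the adjusted contour coefficient must lie in [-1, 1]")
    if score > 0.5:
        return "excellent"  # (0.5, 1]
    if score > 0.0:
        return "general"  # (0, 0.5]
    return "poor"  # [-1, 0]
```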

Compared with the original contour coefficient, the time complexity of the adjusted contour coefficient is reduced from O(n^2) to O(n), which greatly reduces the amount of calculation required to evaluate a clustering result. When processing large-scale data sets, multiple clustering results can be evaluated quickly to determine their quality.
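As a rough illustration of the reduction, the number of distance evaluations can be counted for each method. The values n = 100,000 and k = 10 below are hypothetical and are not taken from Table 1; the count for the adjusted coefficient assumes that the cluster center points have already been computed.

```latex
\begin{aligned}
\text{original:}\quad & \frac{n(n-1)}{2} = \frac{10^{5}\,(10^{5}-1)}{2} \approx 5.0 \times 10^{9} \ \text{pairwise distances} \\
\text{adjusted:}\quad & n \cdot k = 10^{5} \times 10 = 10^{6} \ \text{point-to-center distances}
\end{aligned}
```

For a fixed number of clusters k, the second count grows linearly with n, which is consistent with the O(n) figure stated above.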

Using the pathological data analysis method provided in this embodiment can speed up the determination of the pathological data clustering result, so as to quickly determine the best pathological data clustering result.

After the best pathological data clustering result has been determined, the pathological data samples to be processed can be obtained and classified according to that clustering result, and pathological analysis data corresponding to the pathological data samples to be processed is generated. In some cases, the pathological analysis data may be a pathological risk prompt report for the patient.
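This embodiment does not spell out how a new sample is classified against the clustering result. One plausible reading, shown here purely as an assumption, is to assign the sample to the cluster whose center point is nearest. The function name `classify_sample` and the cluster labels in the usage example are hypothetical.

```python
import math


def classify_sample(sample, centers):
    """Return the label of the cluster whose center point is closest to the sample."""
    return min(centers, key=lambda label: math.dist(sample, centers[label]))
```

For example, with centers `{"A": (0, 0), "B": (10, 10)}`, the sample `(1, 2)` would be assigned to cluster `"A"`.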

To facilitate comparison between the original contour coefficient and the adjusted contour coefficient of this embodiment, a schematic diagram of the calculation paths is provided in FIG. 3. FIG. 3-a shows the path used to calculate the degree of cohesion before the improvement (the distances between pathological sample point i and the other sample points in its cluster); FIG. 3-b shows the path used to calculate the degree of separation before the improvement (the distances between pathological sample point i and the sample points outside its cluster); FIG. 3-c shows the path used to calculate the degree of cohesion after the improvement (the distance between pathological sample point i and the center point of its cluster); FIG. 3-d shows the path used to calculate the degree of separation after the improvement (the distances between pathological sample point i and the center points of the other clusters).

In an application example, the original contour coefficient calculation method and the adjusted contour coefficient method were used to evaluate the clustering results of the same pathological data sample set. The results are shown in Table 1.

Table 1 Calculation time consumption of different evaluation methods for processing clustering results of the same pathological data sample set

The configuration of the server used to calculate the test results in Table 1 is: 20-core CPU, maximum speed 2.39GHz; 256G memory, speed: 2400MHz.

In terms of precision and accuracy, compared with the original contour coefficient, the adjusted contour coefficient measures the compactness of a cluster by the average distance from each pathological sample point i of the cluster to the center point of the cluster, instead of the average distance between every two samples in the cluster. This greatly reduces the time and space overhead of calculating the sample distance matrix, saves a lot of computing resources, and improves the running speed. Taking the sample set P as an example, the calculation time is reduced from 22871.75766 seconds to 2.728480302 seconds after the improvement, an efficiency gain of about 8382.6 times. However, the accuracy of the distance calculation is also reduced.
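The reported efficiency gain can be checked with one division. The two timings are the figures quoted for sample set P; the variable names are illustrative.

```python
original_seconds = 22871.75766  # original contour coefficient, sample set P
adjusted_seconds = 2.728480302  # adjusted contour coefficient, sample set P

speedup = original_seconds / adjusted_seconds
print(f"speedup: {speedup:.1f}x")
```

The ratio agrees with the figure of roughly 8382.6 times quoted above.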

The method provided in this embodiment is also applicable to other sample sets with a large amount of processed data and high dimensionality, such as the financial data processing field, the drug data analysis field, and the image data recognition field.

In steps S10-S80, a clustering result of the pathological data sample set is obtained, that is, the result produced by the cluster analysis; the clustering result divides the pathological data sample set into several clusters, each cluster is composed of a plurality of pathological sample points i, and the number of pathological sample points i in the pathological data sample set is greater than the preset number threshold. The center point of each cluster is calculated according to the clustering result to determine the position of each cluster's center. The distance between pathological sample point i and the center point of each cluster is then calculated; because only the distances between pathological sample point i and the cluster center points are calculated, rather than the distances between pathological sample point i and all other pathological sample points, the amount of calculation is greatly reduced. The adjusted contour coefficient of each single pathological sample point i is calculated from these distances, with less calculation than the method before the improvement. The adjusted contour coefficients of all pathological sample points i are averaged to obtain the adjusted contour coefficient of the clustering result; because this is an averaging operation, it is relatively fast. The quality of the clustering result is determined according to its adjusted contour coefficient; since the adjusted contour coefficient can be calculated quickly, the quality of the clustering result can be determined quickly, and the higher the adjusted contour coefficient, the more accurate the clustering result. When the clustering result is excellent, a pathological data sample to be processed is obtained so that the clustering result can be used to classify it. The pathological data sample to be processed is classified according to the clustering result, and pathological analysis data corresponding to it is generated, providing valuable data that indicates the pathological risk of the patient.

Optionally, as shown in FIG. 4, after step S50, the method further includes:

S51: Calculate the adjusted contour coefficients of multiple clustering results;

S52: Determine the clustering result with the highest adjusted contour coefficient as the optimal clustering result of the pathological data sample set.

In this embodiment, since the calculation amount of the adjusted contour coefficient is greatly reduced, the computer can calculate the adjusted contour coefficients of multiple clustering results in a relatively short time. The optimal clustering result is then determined according to the size of the adjusted contour coefficient. Because a larger adjusted contour coefficient indicates a better clustering effect, the clustering result with the highest adjusted contour coefficient can be determined as the optimal clustering result of the pathological data sample set.

In steps S51-S52, the adjusted contour coefficients of the multiple clustering results are calculated to quickly calculate the adjusted contour coefficients of the multiple clustering results. The clustering result with the highest adjusted contour coefficient is determined as the optimal clustering result of the pathological data sample set. Since the adjusted contour coefficient of the clustering result has a fast calculation speed, the optimal clustering result can be quickly determined.
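Steps S51-S52 amount to taking the maximum over already computed coefficients. A minimal sketch follows; the function name `select_optimal` and the mapping from a result name to its adjusted contour coefficient are assumptions, and the scores in the usage note are made-up values for illustration only.

```python
def select_optimal(scores_by_result):
    """Return (name, score) of the clustering result with the highest coefficient."""
    if not scores_by_result:
        raise ValueError("at least one clustering result is required")
    return max(scores_by_result.items(), key=lambda item: item[1])
```

For instance, `select_optimal({"k=2": 0.41, "k=3": 0.67, "k=4": 0.58})` would pick the hypothetical result named `"k=3"`.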

Optionally, as shown in FIG. 5, after step S50, the method further includes:

S53: Determine whether the adjusted contour coefficient of the clustering result is greater than a preset coefficient threshold;

S54. If the adjusted contour coefficient of the clustering result is greater than a preset coefficient threshold, the clustering result is determined as a preferred clustering result of the pathological data sample set.

In some cases, an expected value, that is, a preset coefficient threshold, can be set. When the adjusted contour coefficient is greater than the preset coefficient threshold, it can be determined that the clustering result is the preferred clustering result of the pathological data sample set. For example, in an example, the preset coefficient threshold may be set to 0.5.

In steps S53-S54, the calculated adjusted contour coefficient of the clustering result is compared with the preset coefficient threshold. If the adjusted contour coefficient of the clustering result is greater than the preset coefficient threshold, the clustering result is determined as a preferred clustering result of the pathological data sample set; in this way, clustering results whose adjusted contour coefficient exceeds the preset coefficient threshold are selected as preferred clustering results.
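The threshold test of steps S53-S54 can be sketched as a simple filter. The function name `preferred_results` and the dictionary input are assumptions; the default of 0.5 is the example threshold mentioned in this embodiment, and the comparison is strict because the text requires the coefficient to be greater than the threshold.

```python
def preferred_results(scores_by_result, threshold=0.5):
    """Keep clustering results whose adjusted contour coefficient exceeds the threshold."""
    return {
        name: score
        for name, score in scores_by_result.items()
        if score > threshold
    }
```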

Optionally, as shown in FIG. 6, before step S10, the method further includes:

S11. Obtain the pathological data sample set;

S12: Calculate the clustering result of the pathological data sample set based on the K-Means clustering algorithm.

In this embodiment, K-Means is an iterative clustering analysis algorithm. The calculation process is as follows: first, determine the number of clusters and randomly initialize their respective center points. To choose the number of clusters, it is best to take a quick look at the data and try to identify any distinct groupings. Each center point is a vector with the same length as the vector of each data point. Next, classify each data point by calculating the distance between that point and the center of each group and assigning the point to the group whose center is closest. Then, based on the result of this iteration, calculate the average of all points in each group as the new group center. Repeat these steps for a set number of iterations, or until the group centers change little between iterations (by less than a set threshold). In addition, the group centers can be randomly initialized several times, and the initialization that gives the best result can be selected.

The advantage of K-Means is that it is very fast: because it only needs to calculate the distances between points and group centers, the amount of calculation is small, and its time complexity is O(n).
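The K-Means procedure described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in the application: the function name `k_means`, the parameters `max_iter`, `tol` and `seed`, the choice of initial centers drawn from the data, and the handling of empty groups are all assumptions. It performs a single random initialization and does not include the repeated restarts mentioned above.

```python
import math
import random


def k_means(points, k, max_iter=100, tol=1e-6, seed=0):
    """Cluster a list of coordinate tuples into k groups; return (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    for _ in range(max_iter):
        # Assignment step: each point joins the group with the closest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center becomes the mean of its group's points.
        new_centers = []
        for c in range(k):
            group = [p for p, label in zip(points, labels) if label == c]
            if group:
                new_centers.append(tuple(sum(v) / len(group) for v in zip(*group)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty group
        shift = max(math.dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # centers barely moved: stop iterating
            break
    return labels, centers
```

Each iteration computes n times k point-to-center distances, which is why the cost per iteration is linear in the number of samples for a fixed k.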

In steps S11-S12, the pathological data sample set is obtained to obtain the pathological data sample set to be processed. The clustering result of the pathological data sample set is calculated based on the K-Means clustering algorithm to obtain the clustering result that needs to be evaluated.

Optionally, as shown in FIG. 7, before step S10, the method further includes:

S11. Obtain the pathological data sample set;

S13: Calculate the clustering result of the pathological data sample set based on the agglomerative hierarchical clustering algorithm.

The agglomerative hierarchical clustering algorithm calculates the similarity between data points, merges the two most similar ones, and iterates this process until the set number of clusters is reached. The smaller the distance, the higher the similarity. The distance can be measured with a metric such as the Euclidean distance.

The specific steps of Agglomerative clustering include: first, treat each sample as its own category and calculate the distance between every two categories; merge the two categories with the smallest distance (the most similar) into a new category; recalculate the distances between the categories; and iterate the last two steps until a single cluster is formed. The process of agglomerative hierarchical clustering builds a tree. A threshold, namely the number of clusters to be formed, can be set as needed; when the number of categories equals this threshold, the iteration can be terminated.
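The merge loop described above can be sketched as follows. The text does not say how the distance between two categories is measured, so this sketch assumes single linkage (the distance between the closest pair of members) with Euclidean distance; the function name `agglomerative` is also an assumption. The implementation is deliberately naive and recomputes distances on every pass, so it is only suitable for small examples.

```python
import math


def agglomerative(points, n_clusters):
    """Merge the two closest categories until n_clusters remain; return labels."""
    clusters = [[i] for i in range(len(points))]  # each sample starts as a category

    def linkage(a, b):
        # Single linkage (an assumption): distance between the closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of categories with the smallest distance and merge them.
        x, y = min(
            (
                (p, q)
                for p in range(len(clusters))
                for q in range(p + 1, len(clusters))
            ),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

Stopping the loop once the requested number of categories remains corresponds to the cluster-count threshold mentioned in the steps.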

In steps S11 and S13, the pathological data sample set is obtained to obtain the pathological data sample set to be processed. The clustering result of the pathological data sample set is calculated based on the agglomerative hierarchical clustering algorithm to obtain the clustering result that needs to be evaluated.

In this embodiment, the center point of each cluster is obtained after clustering, and the distance between each sample point and the center point of each cluster is calculated; the adjusted contour coefficient of each sample point is calculated according to these distances; the adjusted contour coefficients of all the pathological sample points i are averaged to obtain the adjusted contour coefficient of the clustering result; and the quality of the clustering result is determined according to that coefficient. This embodiment solves the problem of high time complexity in evaluating clustering results, greatly reduces the amount of calculation in the evaluation process, greatly improves the efficiency of evaluating clustering results, and speeds up the judgment of pathological data clustering results so that the best pathological data clustering result can be determined quickly.

It should be understood that the size of the sequence number of each step in the foregoing embodiment does not mean the order of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation to the implementation process of the embodiment of the present application.

In one embodiment, a pathological data analysis device is provided, and the pathological data analysis device corresponds one-to-one to the pathological data analysis method in the above embodiment. As shown in FIG. 8, the pathological data analysis device includes a result acquisition module 10, a center point calculation module 20, a distance calculation module 30, a sample point coefficient calculation module 40, a result coefficient calculation module 50, a result evaluation module 60, a sample obtaining module 70 and a sample analysis module 80. The detailed description of each functional module is as follows:

The result acquisition module 10 is used to obtain a clustering result of a pathological data sample set, wherein the clustering result divides the pathological data sample set into several clusters, each cluster is composed of multiple pathological sample points i, and the number of pathological sample points i in the pathological data sample set is greater than a preset number threshold;

The center point calculation module 20 is configured to calculate the center point of each of the clusters according to the clustering result;

The distance calculation module 30 is used to calculate the distance between the pathological sample point i and the center point of each cluster;

The sample point coefficient calculation module 40 is configured to calculate the adjusted contour coefficient of the pathological sample point i according to the distance between the pathological sample point i and the center point of each cluster, and the calculation formula is as follows:

The result coefficient calculation module 50 is configured to calculate the average of the adjusted contour coefficients of all the pathological sample points i, and obtain the adjusted contour coefficient of the clustering result;

The result evaluation module 60 is configured to determine the quality of the clustering result according to the adjusted contour coefficient of the clustering result;

The sample acquisition module 70 is configured to obtain a pathological data sample to be processed when the clustering result is excellent;

The sample analysis module 80 is configured to classify the pathological data sample to be processed according to the clustering result, and generate pathological analysis data corresponding to the pathological data sample to be processed.
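As a rough illustration of how the sample analysis module 80 might use an accepted clustering result, the sketch below assigns a new sample to the cluster whose center point is nearest. The text here does not spell out the classification rule or the format of the pathological analysis data, so the nearest-center rule, the Euclidean distance, the function name `analyze_sample`, and the fields of the returned dictionary are all assumptions.

```python
import math


def analyze_sample(sample, centers):
    """Assign `sample` to the nearest cluster center (assumed rule).

    `centers` maps a cluster label to its center point. The returned
    dictionary is a hypothetical stand-in for the pathological analysis data.
    """
    label = min(centers, key=lambda lab: math.dist(sample, centers[lab]))
    return {
        "cluster": label,
        "distance_to_center": math.dist(sample, centers[label]),
    }


# Hypothetical center points taken from a previously accepted clustering result.
centers = {"cluster_a": (0.0, 1.0), "cluster_b": (10.0, 1.0)}
record = analyze_sample((8.5, 0.2), centers)
```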

Optionally, the pathological data analysis device further includes:

A multi-result calculation module, used to calculate the adjusted contour coefficients of multiple clustering results;

The optimal result determining module is used to determine the clustering result with the highest adjusted contour coefficient as the optimal clustering result of the pathological data sample set.
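One possible way to realize these two optional modules is sketched below: every candidate clustering result is scored, and the one with the highest coefficient is kept. The coefficient formula (b - a) / max(a, b), the Euclidean distance, and the names `adjusted_coefficient` and `best_clustering` are illustrative assumptions, not part of the application. Each candidate is assumed to contain at least two clusters.

```python
import math
from collections import defaultdict


def adjusted_coefficient(points, labels):
    """Mean of the assumed per-point coefficient (b - a) / max(a, b)."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    centers = {lab: tuple(sum(col) / len(g) for col in zip(*g))
               for lab, g in groups.items()}
    scores = []
    for p, lab in zip(points, labels):
        a = math.dist(p, centers[lab])  # distance to own cluster center
        b = min(math.dist(p, c) for other, c in centers.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_clustering(points, candidates):
    """Return (name, coefficient) of the candidate with the highest coefficient.

    `candidates` maps a hypothetical result name to its list of cluster labels.
    """
    scored = {name: adjusted_coefficient(points, labels)
              for name, labels in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
candidates = {"split_by_x": [0, 0, 1, 1], "split_by_y": [0, 1, 0, 1]}
name, score = best_clustering(points, candidates)
```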

Optionally, the pathological data analysis device further includes:

A coefficient judgment module for judging whether the adjusted contour coefficient of the clustering result is greater than a preset coefficient threshold;

The preferred result determining module is configured to determine the clustering result as the preferred clustering result of the pathological data sample set if the adjusted contour coefficient of the clustering result is greater than the preset coefficient threshold.

Optionally, the pathological data analysis device further includes:

The sample set acquisition module is used to acquire the pathological data sample set;

The first clustering calculation module is configured to calculate the clustering result of the pathological data sample set based on the K-Means clustering algorithm.
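For orientation, a minimal sketch of the standard K-Means (Lloyd) iteration is given below. The application only states that K-Means is used; the initialization from randomly chosen sample points, the iteration limit, the fixed seed, and the function name `k_means` are assumptions made for illustration.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Plain Lloyd iteration; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization: k sample points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are stable too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical sample points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = k_means(points, k=2)
```

The returned labels can then be passed to the coefficient calculation, and the run can be repeated for several values of k to produce the multiple clustering results that are compared.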

Optionally, the pathological data analysis device further includes:

The second clustering calculation module is configured to calculate the clustering result of the pathological data sample set based on the agglomerative hierarchical clustering algorithm.
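A minimal sketch of agglomerative hierarchical clustering follows. The application names the algorithm without stating a linkage criterion, so the single-linkage rule, the Euclidean distance, the stopping condition of k remaining clusters, and the function name `agglomerative` are assumptions. The implementation is deliberately naive and intended only to show the merging idea.

```python
import math
from itertools import combinations


def agglomerative(points, k):
    """Merge the two closest clusters (single linkage, assumed) until k remain."""
    clusters = [[i] for i in range(len(points))]  # each cluster holds point indices

    def linkage(c1, c2):
        # Single linkage: smallest distance between any two members.
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid

    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = agglomerative(points, k=2)
```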

For the specific limitations of the pathological data analysis device, refer to the limitations of the pathological data analysis method above, which are not repeated here. Each module in the above-mentioned pathological data analysis device can be implemented in whole or in part by software, hardware, or a combination thereof. The above-mentioned modules may be embedded in, or independent of, the processor in the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides calculation and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium. The database of the computer device stores the data involved in evaluating the pathological data clustering result. The network interface of the computer device communicates with an external terminal through a network connection. When executed by the processor, the computer-readable instructions implement a pathological data analysis method. The readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.

In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored on the memory and capable of running on the processor, and the processor implements the following steps when the processor executes the computer-readable instructions:

In one embodiment, one or more computer-readable storage media storing computer-readable instructions are provided. The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media. The readable storage media store computer-readable instructions, and when the computer-readable instructions are executed by one or more processors, the following steps are implemented:

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above-mentioned embodiments can be implemented by computer-readable instructions that instruct the relevant hardware. The computer-readable instructions can be stored in a non-volatile readable storage medium or a volatile readable storage medium, and when executed, may include the processes of the above-mentioned method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the division of the above functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as required; that is, the internal structure of the device is divided into different functional units or modules to complete all or part of the functions described above.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and should be included within the scope of protection of the present application.

Claims

1. A pathological data analysis method, comprising:

Obtaining a clustering result of a pathological data sample set, wherein the clustering result divides the pathological data sample set into several clusters, each cluster is composed of multiple pathological sample points i, and the number of pathological sample points i in the pathological data sample set is greater than a preset number threshold;

Calculating the center point of each of the clusters according to the clustering result;

Calculating the distance between the pathological sample point i and the center point of each cluster;

Calculating the adjusted contour coefficient of the pathological sample point i according to the distance between the pathological sample point i and the center point of each cluster, the calculation formula being as follows:

In the above formula, s_c(i) represents the adjusted contour coefficient of pathological sample point i; a_c(i) represents the distance between pathological sample point i and the center point of its cluster; b_c(i) represents the distance between pathological sample point i and the center point of the cluster closest to pathological sample point i;

Calculating the average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficient of the clustering result;

Determining the quality of the clustering result according to the adjusted contour coefficient of the clustering result;

When the clustering result is excellent, obtaining a pathological data sample to be processed;

Classifying the pathological data sample to be processed according to the clustering result, and generating pathological analysis data corresponding to the pathological data sample to be processed.
2. The pathological data analysis method according to claim 1, wherein, after said calculating the average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficient of the clustering result, the method further comprises:

Calculating the adjusted contour coefficients of multiple clustering results;

Determining the clustering result with the highest adjusted contour coefficient as the optimal clustering result of the pathological data sample set.
3. The pathological data analysis method according to claim 1, wherein, after said calculating the average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficient of the clustering result, the method further comprises:

Judging whether the adjusted contour coefficient of the clustering result is greater than a preset coefficient threshold;

If the adjusted contour coefficient of the clustering result is greater than the preset coefficient threshold, determining the clustering result as the preferred clustering result of the pathological data sample set.
4. The pathological data analysis method according to claim 1, wherein, before said obtaining the clustering result of the pathological data sample set, the method further comprises:

Acquiring the pathological data sample set;

Calculating the clustering result of the pathological data sample set based on the K-Means clustering algorithm.
5. The pathological data analysis method according to claim 1, wherein, before said obtaining the clustering result of the pathological data sample set, the method further comprises:

Acquiring the pathological data sample set;

Calculating the clustering result of the pathological data sample set based on an agglomerative hierarchical clustering algorithm.
6. A pathological data analysis device, comprising:

A result acquisition module, used to obtain a clustering result of a pathological data sample set, wherein the clustering result divides the pathological data sample set into several clusters, each cluster is composed of multiple pathological sample points i, and the number of pathological sample points i in the pathological data sample set is greater than a preset number threshold;

A center point calculation module, configured to calculate the center point of each of the clusters according to the clustering result;

A distance calculation module, used to calculate the distance between the pathological sample point i and the center point of each cluster;

A sample point coefficient calculation module, used to calculate the adjusted contour coefficient of the pathological sample point i according to the distance between the pathological sample point i and the center point of each cluster, the calculation formula being as follows:

In the above formula, s_c(i) represents the adjusted contour coefficient of pathological sample point i; a_c(i) represents the distance between pathological sample point i and the center point of its cluster; b_c(i) represents the distance between pathological sample point i and the center point of the cluster closest to pathological sample point i;

A result coefficient calculation module, used to calculate the average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficient of the clustering result;

A result evaluation module, used to determine the quality of the clustering result according to the adjusted contour coefficient of the clustering result;

A sample acquisition module, used to obtain a pathological data sample to be processed when the clustering result is excellent;

A sample analysis module, configured to classify the pathological data sample to be processed according to the clustering result, and generate pathological analysis data corresponding to the pathological data sample to be processed.
7. The pathological data analysis device according to claim 6, further comprising:

A multi-result calculation module, used to calculate the adjusted contour coefficients of multiple clustering results;

The optimal result determining module is used to determine the clustering result with the highest adjusted contour coefficient as the optimal clustering result of the pathological data sample set.
8. The pathological data analysis device according to claim 6, further comprising:

A coefficient judgment module for judging whether the adjusted contour coefficient of the clustering result is greater than a preset coefficient threshold;

The preferred result determining module is configured to determine the clustering result as the preferred clustering result of the pathological data sample set if the adjusted contour coefficient of the clustering result is greater than the preset coefficient threshold.
9. The pathological data analysis device according to claim 6, further comprising:

The sample set acquisition module is used to acquire the pathological data sample set;

The first clustering calculation module is configured to calculate the clustering result of the pathological data sample set based on the K-Means clustering algorithm.
10. The pathological data analysis device according to claim 6, further comprising:

The sample set acquisition module is used to acquire the pathological data sample set;

The second clustering calculation module is configured to calculate the clustering result of the pathological data sample set based on the agglomerative hierarchical clustering algorithm.
11. A computer device, comprising a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:

Obtaining a clustering result of a pathological data sample set, wherein the clustering result divides the pathological data sample set into several clusters, each cluster is composed of multiple pathological sample points i, and the number of pathological sample points i in the pathological data sample set is greater than a preset number threshold;

Calculating the center point of each of the clusters according to the clustering result;

Calculating the distance between the pathological sample point i and the center point of each cluster;

Calculating the adjusted contour coefficient of the pathological sample point i according to the distance between the pathological sample point i and the center point of each cluster, the calculation formula being as follows:

In the above formula, s_c(i) represents the adjusted contour coefficient of pathological sample point i; a_c(i) represents the distance between pathological sample point i and the center point of its cluster; b_c(i) represents the distance between pathological sample point i and the center point of the cluster closest to pathological sample point i;

Calculating the average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficient of the clustering result;

Determining the quality of the clustering result according to the adjusted contour coefficient of the clustering result;

When the clustering result is excellent, obtaining a pathological data sample to be processed;

Classifying the pathological data sample to be processed according to the clustering result, and generating pathological analysis data corresponding to the pathological data sample to be processed.
12. The computer device according to claim 11, wherein, after said calculating the average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficient of the clustering result, the processor further implements the following steps when executing the computer-readable instructions:

Calculating the adjusted contour coefficients of multiple clustering results;

Determining the clustering result with the highest adjusted contour coefficient as the optimal clustering result of the pathological data sample set.
13. The computer device according to claim 11, wherein, after said calculating the average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficient of the clustering result, the processor further implements the following steps when executing the computer-readable instructions:

Judging whether the adjusted contour coefficient of the clustering result is greater than a preset coefficient threshold;

If the adjusted contour coefficient of the clustering result is greater than the preset coefficient threshold, determining the clustering result as the preferred clustering result of the pathological data sample set.
14. The computer device according to claim 11, wherein, before said obtaining the clustering result of the pathological data sample set, the processor further implements the following steps when executing the computer-readable instructions:

Acquiring the pathological data sample set;

Calculating the clustering result of the pathological data sample set based on the K-Means clustering algorithm.
15. The computer device according to claim 11, wherein, before said obtaining the clustering result of the pathological data sample set, the processor further implements the following steps when executing the computer-readable instructions:

Acquiring the pathological data sample set;

Calculating the clustering result of the pathological data sample set based on an agglomerative hierarchical clustering algorithm.
16. One or more readable storage media storing computer-readable instructions, wherein, when the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:

Obtaining a clustering result of a pathological data sample set, wherein the clustering result divides the pathological data sample set into several clusters, each cluster is composed of multiple pathological sample points i, and the number of pathological sample points i in the pathological data sample set is greater than a preset number threshold;

Calculating the center point of each of the clusters according to the clustering result;

Calculating the distance between the pathological sample point i and the center point of each cluster;

Calculating the adjusted contour coefficient of the pathological sample point i according to the distance between the pathological sample point i and the center point of each cluster, the calculation formula being as follows:

In the above formula, s_c(i) represents the adjusted contour coefficient of pathological sample point i; a_c(i) represents the distance between pathological sample point i and the center point of its cluster; b_c(i) represents the distance between pathological sample point i and the center point of the cluster closest to pathological sample point i;

Calculating the average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficient of the clustering result;

Determining the quality of the clustering result according to the adjusted contour coefficient of the clustering result;

When the clustering result is excellent, obtaining a pathological data sample to be processed;

Classifying the pathological data sample to be processed according to the clustering result, and generating pathological analysis data corresponding to the pathological data sample to be processed.
17. The readable storage medium according to claim 16, wherein, after said calculating the average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficient of the clustering result, when the computer-readable instructions are executed by one or more processors, the one or more processors further execute the following steps:

Calculating the adjusted contour coefficients of multiple clustering results;

Determining the clustering result with the highest adjusted contour coefficient as the optimal clustering result of the pathological data sample set.
18. The readable storage medium according to claim 16, wherein, after said calculating the average of the adjusted contour coefficients of all the pathological sample points i to obtain the adjusted contour coefficient of the clustering result, when the computer-readable instructions are executed by one or more processors, the one or more processors further execute the following steps:

Judging whether the adjusted contour coefficient of the clustering result is greater than a preset coefficient threshold;

If the adjusted contour coefficient of the clustering result is greater than the preset coefficient threshold, determining the clustering result as the preferred clustering result of the pathological data sample set.
19. The readable storage medium according to claim 16, wherein, before said obtaining the clustering result of the pathological data sample set, when the computer-readable instructions are executed by one or more processors, the one or more processors further execute the following steps:

Acquiring the pathological data sample set;

Calculating the clustering result of the pathological data sample set based on the K-Means clustering algorithm.
20. The readable storage medium according to claim 16, wherein, before said obtaining the clustering result of the pathological data sample set, when the computer-readable instructions are executed by one or more processors, the one or more processors further execute the following steps:

Acquiring the pathological data sample set;

Calculating the clustering result of the pathological data sample set based on an agglomerative hierarchical clustering algorithm.