CN115064266B

CN115064266B - Cancer diagnosis system, device and medium based on incomplete multi-omics data

Info

Publication number: CN115064266B
Application number: CN202210867454.3A
Authority: CN
Inventors: 余国先; 王星泽; 王峻; 郭伟; 崔立真
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2022-07-21
Filing date: 2022-07-21
Publication date: 2024-04-26
Anticipated expiration: 2042-07-21
Also published as: CN115064266A

Abstract

The present disclosure provides a cancer diagnosis system based on incomplete multi-omics data, belonging to the technical fields of artificial-intelligence data mining, classification and bioinformatics. The system comprises: a data acquisition module configured to acquire all available omics data to be diagnosed for the same patient; an omics data feature extraction module configured to extract features from each of the acquired omics data types; a missing omics data generation module configured to generate the patient's missing omics data from the patient's available omics data using a generative adversarial strategy, and to extract features from the generated data; and a multi-omics feature fusion and diagnosis module configured to fuse the features extracted from the patient's omics data with the features of the generated omics data, and to input the fused features into a pre-trained diagnosis network model to obtain a diagnosis result.

Description

Cancer diagnosis system, device and medium based on incomplete multi-omics data

Technical Field

The disclosure belongs to the technical fields of artificial-intelligence data mining, classification and bioinformatics, and particularly relates to a cancer diagnosis system, device and medium based on incomplete multi-omics data.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Given the current prevalence of cancer, it is particularly important to classify a patient's cancer type precisely from the patient's omics data, because different cancer types often require different treatments. Compared with diagnosis from a single type of omics data, fusing multiple types of omics data gives a richer characterization of the patient and can thereby further improve diagnostic accuracy. However, because some detection methods are costly, invasive, or subject to legal and ethical constraints, incomplete multi-omics data are ubiquitous in real biological settings. How to diagnose cancer more accurately from incomplete multi-omics data remains a difficulty on which current machine learning techniques for cancer diagnosis still need to improve.

The inventors found that current methods for diagnosis under incomplete multi-omics data include: training and diagnosing after excluding samples with missing omics data; requiring that one of the patient's omics data types be complete before training and diagnosing; constructing separate models according to which omics data are available and then training and diagnosing; and mapping each type of omics data into a feature space of the same dimension, then training and diagnosing after fusion. However, the preconditions on which these methods depend severely limit their practical clinical application. In addition, fusing in a feature space of the same dimension pursues only the features common to the omics data types and ignores the individual features of each type. Machine learning techniques therefore still have considerable room for improvement in the accurate diagnosis of cancer.

Disclosure of Invention

To solve the above problems, the present disclosure provides a cancer diagnosis system, device and medium based on incomplete multi-omics data. The scheme first extracts omics features through attention-based feature extraction networks and, through joint optimization of a sharing loss and a personality loss, gives the omics feature representations good diversity; the attention parameters in the feature extraction networks not only alleviate the overfitting caused by the high dimensionality of omics data but also give the system good interpretability and authenticity. The scheme then generates the patient's missing omics data through a generative adversarial strategy, which enriches the patient's feature representation and allows flexible cancer diagnosis even when only one type of omics data is available. Finally, the features of all omics data types are fused and input into a diagnosis network based on the fused features, yielding a more accurate cancer diagnosis for the patient.

According to a first aspect of embodiments of the present disclosure, there is provided a cancer diagnosis system based on incomplete multi-omics data, comprising:

a data acquisition module configured to: acquire all available omics data to be diagnosed for the same patient;

an omics data feature extraction module configured to: extract features from each of the acquired omics data types;

a missing omics data generation module configured to: generate the patient's missing omics data from the patient's available omics data using a generative adversarial strategy, and extract features from the generated data;

a multi-omics feature fusion and diagnosis module configured to: fuse the features extracted from the patient's omics data with the features of the generated omics data, and input the fused features into a pre-trained diagnosis network model to obtain a diagnosis result.

Further, extracting features from each of the acquired omics data types specifically comprises:

constructing an attention parameter layer for the features of each omics data type;

constructing a feature extraction network corresponding to each omics data type to perform feature extraction;

calculating a sharing loss from the extracted features;

and calculating a personality loss from the extracted features.

Further, generating the patient's missing omics data from the patient's available omics data using the generative adversarial strategy specifically comprises:

generating the missing omics data from the extracted omics data features;

calculating a generation loss and an adversarial loss from the real data of the available omics data and the corresponding generated omics data;

integrating the generation loss and the adversarial loss of each omics data type to calculate the generative adversarial loss.

Further, generating the missing omics data from the extracted omics data features comprises the following steps:

calculating, from the features extracted from the patient's available omics data, the patient's latent feature corresponding to each omics data type;

inputting the latent feature corresponding to each omics data type into the generation network G^v(·) corresponding to that omics data type to generate the patient's missing omics data, expressed as follows:

where G^v(·) is the generator that generates the v-th omics data, and its output is the generated missing data of the patient's v-th omics type.

Further, calculating the generation loss and the adversarial loss from the real data of the available omics data and the corresponding generated omics data comprises: inputting the patient's real data for an available omics type, together with the data generated for that omics type, into the discriminator network D^v(·) corresponding to that omics type, and calculating the generation loss and the adversarial loss for that omics type from the discrimination results, expressed as follows:

where D^v(·) is the discriminator corresponding to the v-th omics data; its output lies between 0 and 1, and a smaller value means the discriminator considers it more likely that the omics data are generated data. An indicator matrix marks whether each omics data type is missing.

Further, fusing the features extracted from the patient's omics data with the features of the generated omics data specifically comprises: concatenating the patient's omics data features with the generated omics data features.

Further, the generation loss and the adversarial loss of each omics data type are integrated, and the generative adversarial loss GLoss is calculated as follows:

By optimizing GLoss, the performance of the generative adversarial network is continuously improved during the generation and adversarial process, so that missing omics data similar to the real data are generated.

According to a second aspect of the embodiments of the present disclosure, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, the processor implementing the following process when executing the program:

acquiring all available omics data to be diagnosed for the same patient;

extracting features from each of the acquired omics data types;

generating the patient's missing omics data from the patient's available omics data using a generative adversarial strategy, and extracting features from the generated data;

fusing the features extracted from the patient's omics data with the features of the generated omics data, and inputting the fused features into a pre-trained diagnosis network model to obtain a diagnosis result.

According to a third aspect of embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the following process:

acquiring all available omics data to be diagnosed for the same patient;

extracting features from the acquired omics data;

generating the patient's missing omics data from the patient's available omics data using a generative adversarial strategy;

and fusing the features extracted from the patient's available omics data with the features of the generated omics data, and inputting the fused features into a pre-trained diagnosis network model to obtain a diagnosis result.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when run on one or more processors, performs the following steps:

acquiring all available omics data to be diagnosed for the same patient;

extracting features from the acquired omics data;

Compared with the prior art, the beneficial effects of the present disclosure are:

(1) The scheme first extracts omics features through attention-based feature extraction networks and, through joint optimization of the sharing loss and the personality loss, gives the omics feature representations good diversity; the attention parameters in the feature extraction networks not only alleviate the overfitting caused by the high dimensionality of omics data but also give the system good interpretability and authenticity. The scheme then generates the patient's missing omics data through a generative adversarial strategy, which enriches the patient's feature representation and allows flexible cancer diagnosis even when only one type of omics data is available. Finally, the features of all omics data types are fused and input into a diagnosis network based on the fused features, yielding a more accurate cancer diagnosis for the patient.

(2) The system takes only the patient's omics data as input and produces the patient's diagnosis result without complicated operating steps, so it is easy to use.

Additional aspects of the disclosure will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the disclosure.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain the disclosure, and do not constitute an undue limitation on the disclosure.

FIG. 1 is a schematic diagram of the cancer diagnosis system based on incomplete multi-omics data according to an embodiment of the present disclosure;

FIG. 2 is a process flow diagram of the cancer diagnosis system based on incomplete multi-omics data according to an embodiment of the present disclosure.

Detailed Description

The disclosure is further described below with reference to the drawings and examples.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.

Embodiment one:

The object of this embodiment is to provide a cancer diagnosis system based on incomplete multi-omics data.

A cancer diagnosis system based on incomplete multi-omics data, comprising:

calculating a sharing loss from the extracted features;

and calculating a personality loss from the extracted features.

where D^v(·) is the discriminator corresponding to the v-th omics data; its output lies between 0 and 1, and a smaller value means the discriminator considers it more likely that the omics data are generated data. An indicator matrix marks whether each omics data type is missing: one value of an entry indicates that the omics data of that type are available, and the other value indicates that the omics data of that type are missing.

Specifically, for ease of understanding, the scheme of this embodiment is described in detail below with reference to the accompanying drawings:

As shown in FIG. 2, there is provided a cancer diagnosis system based on incomplete multi-omics data, comprising:

S101: acquiring all available omics data to be diagnosed for the same patient, where the omics data to be diagnosed may include, but are not limited to: copy number variation data, DNA methylation data, diagnostic pathological slide image data, gene expression data, and the like.
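One way to picture the input of step S101 is a per-patient record in which each omics type is either present or absent, together with a 0/1 availability vector. The sketch below is illustrative only: the names `OMICS_TYPES` and `availability_mask`, the dictionary layout, and the convention that 1 means available are assumptions, not taken from the patent.

```python
# Hypothetical omics type names, following the examples listed in S101.
OMICS_TYPES = (
    "copy_number_variation",
    "dna_methylation",
    "pathology_image",
    "gene_expression",
)


def availability_mask(record):
    """Return a 0/1 list marking which omics types are present in a patient record.

    `record` maps an omics type name to its raw feature vector, or to None
    (or omits the key) when that omics type was not measured.
    Assumption: 1 means available and 0 means missing.
    """
    return [1 if record.get(name) is not None else 0 for name in OMICS_TYPES]


patient = {
    "copy_number_variation": [0.0, 1.0, -1.0],
    "gene_expression": [5.2, 0.3],
    "dna_methylation": None,  # measured for other patients, missing here
}
mask = availability_mask(patient)
```

A mask of this kind can play the role of one row of the indicator matrix that later steps use to separate available from missing omics types.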

S102: extracting features from the available omics data.

Specifically, an attention parameter layer is first constructed for the features of each omics data type to capture the important omics features; a feature extraction network is then constructed for each omics data type, and the omics data weighted by the attention parameters are input into the feature extraction network for feature extraction; finally, the sharing loss and the personality loss are calculated from the extracted features and used to optimize each attention-based feature extraction network in the training stage.

Step S102 is implemented as follows:

S1021: an attention parameter layer is constructed for the features of each omics data type, and the patient's omics data are input into the attention layer of the corresponding omics type to capture the important feature sites in the omics data, yielding the patient's omics data as weighted by the attention parameters and thereby alleviating the overfitting caused by the high dimensionality of omics data.

S1022: a feature extraction network is constructed for each omics data type, and the omics data weighted by the attention parameters are input into the feature extraction network of the corresponding omics type to obtain feature vectors that have the same dimension for every omics type, defined as follows:

where the two quantities in the formula are the raw data and the extracted features of the i-th patient under the v-th omics type; the feature extraction network of the v-th omics data has its own parameters; the product in the formula is element-wise multiplication; and ω^v is the attention weight of the v-th omics type, obtained by normalization with the Softmax function. Introducing the attention parameters lets the feature extraction network adaptively capture important mutation sites in the patient's omics data and assign them larger attention weights during feature extraction, which also gives the network a degree of interpretability. Normalizing ω^v with the Softmax function prevents the local-optimum problem caused by excessively large weights on individual sites in the omics data.
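The formula for S1022 is not reproduced in this text. The following is a minimal sketch of the mechanism as described (Softmax-normalized attention weights, element-wise product with the raw data, then a feature extraction network), assuming a single ReLU-activated linear layer stands in for the network. The function names, the layer shape, and the toy numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_extract(x, attn_scores, weight, bias):
    """Weight the raw omics vector by attention, then apply one ReLU linear layer.

    x           : raw omics vector of one patient (length d)
    attn_scores : unnormalized attention parameters for this omics type (length d)
    weight, bias: hypothetical single-layer extractor (k rows of length d, length k)
    """
    omega = softmax(attn_scores)                    # attention weights, sum to 1
    attended = [w * xi for w, xi in zip(omega, x)]  # element-wise product
    features = []
    for row, b in zip(weight, bias):
        pre_activation = sum(r * a for r, a in zip(row, attended)) + b
        features.append(max(0.0, pre_activation))   # ReLU
    return omega, features


omega, h = attention_extract(
    x=[2.0, 0.0, 1.0],
    attn_scores=[0.0, 0.0, 0.0],  # equal scores give uniform attention
    weight=[[3.0, 3.0, 3.0], [-3.0, 0.0, 0.0]],
    bias=[0.0, 0.0],
)
```

Because the weights come out of a Softmax, they sum to 1 and no single site can receive an unbounded weight, which matches the stated motivation for the normalization. After training, the largest entries of `omega` would point to the sites the model relies on most.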

S1023: the sharing loss is calculated from the extracted features. A feature evaluation network is constructed for the extracted features of each omics type, and the sharing loss SLoss is calculated from the cancer type output by the evaluation network and the patient's true cancer type, as follows:

where y_i is the cancer type of the i-th patient, and the loss function in the formula measures the capacity of the extracted features (cross-entropy loss is used here). SLoss aims to induce consistent predictions from the omics features extracted by the feature extraction networks. By jointly optimizing the attention-based feature extraction networks and the feature evaluation networks through SLoss, informative features that help diagnose the cancer type can be extracted from the patient's various omics data, and the interpretability and authenticity of the system are improved.
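The SLoss formula is missing from this text. Based on the description (one evaluation network per omics type, cross entropy against the true cancer type), a hedged sketch could look like the following. Averaging over all available (omics, patient) pairs is an assumption, since the text does not show whether the original sums or averages, and the function names are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Cross entropy of softmax(logits) against an integer class label."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_norm - logits[label]


def sharing_loss(eval_logits, labels):
    """Average cross entropy over every available (omics type, patient) pair.

    eval_logits[v][i] holds the evaluation-network logits for omics type v and
    patient i, or None when patient i lacks omics type v.
    """
    total, count = 0.0, 0
    for per_omics in eval_logits:
        for logits, label in zip(per_omics, labels):
            if logits is None:
                continue  # missing omics data contribute nothing
            total += cross_entropy(logits, label)
            count += 1
    return total / count if count else 0.0


loss = sharing_loss(
    eval_logits=[
        [[0.0, 0.0], [0.0, 0.0]],  # omics type 0: both patients available
        [None, [0.0, 0.0]],        # omics type 1: missing for patient 0
    ],
    labels=[0, 1],
)
```

Since every omics type is scored against the same label, minimizing this quantity pushes the per-omics features toward predictions that agree with each other.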

S1024: the personality loss is calculated from the extracted features. First, a feature prototype of the patient is calculated from the features extracted from each omics type and taken as an approximation of the commonality feature. Then, using a personality factor λ, the similarity between the extracted features of each omics type and the feature prototype is converted into the personality loss ILoss, which measures the balance between personality and commonality among the extracted features, defined as follows:

where the similarity term is the cosine similarity between the extracted feature and the feature prototype. When the similarity is larger than the personality factor λ (-1 ≤ λ ≤ 1), the extracted feature is close to the commonality feature, and the ReLU(·) activation function sets the loss to 0; otherwise, the extracted feature contains too many individual characteristics, and optimizing ILoss alleviates this excessive personalization.

Through joint optimization of the sharing loss SLoss and the personality loss ILoss, the system can extract both commonality and personality features from the patient's multi-omics data, which preserves the diversity of patient features across different cancer types and helps the system diagnose more accurately.
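The ILoss formula is also absent from this text. A form consistent with the description is a hinge on cosine similarity, ReLU(λ − cos(feature, prototype)), which is zero whenever the similarity exceeds λ. The sketch below assumes that form and assumes the prototype is the mean of the patient's per-omics features; neither detail is confirmed by the source, and the names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)  # small constant avoids division by zero


def personality_loss(features, lam):
    """Sum of ReLU(lam - cos(feature, prototype)) over one patient's omics features.

    features: per-omics feature vectors of one patient, all of the same dimension
    lam     : personality factor, expected in [-1, 1]
    Assumption: the prototype is the element-wise mean of the features.
    """
    dim = len(features[0])
    prototype = [sum(f[j] for f in features) / len(features) for j in range(dim)]
    return sum(max(0.0, lam - cosine(f, prototype)) for f in features)


aligned = personality_loss([[1.0, 0.0], [1.0, 0.0]], lam=0.5)     # identical features
divergent = personality_loss([[1.0, 0.0], [0.0, 1.0]], lam=0.9)   # orthogonal features
```

A larger λ tolerates less deviation from the prototype, while a smaller λ leaves each omics type more room for features of its own.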

S103: generating the missing omics data.

Specifically, the patient's latent feature for each omics type is calculated from the features extracted from the patient's available omics data; the latent features are input into the generation networks to generate data for each omics type; the generated data and the real data of the available omics types are input into the discriminator networks to calculate the generation loss and the adversarial loss; and the generation and adversarial losses calculated for each omics type are integrated to obtain the generative adversarial loss.

Step S103 is implemented as follows:

S1031: the missing omics data are generated from the extracted features. The patient's latent feature corresponding to each omics type is calculated from the features extracted from the patient's available omics data, as follows:

Then, the latent feature corresponding to each omics type is input into the generation network G^v(·) of that omics type to generate the patient's missing omics data, as follows:

where, to ensure the generation capability of the generation network, the extracted features of the v-th omics data are not used in calculating the latent feature for the v-th omics type. G^v(·) is the generator that generates the v-th omics data, and its output is the generated missing data of the patient's v-th omics type.
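Neither formula in S1031 survives in this text. What the description does fix is that the latent feature for omics type v is built from the extracted features of the other available omics types and never from type v itself. The sketch below assumes a plain mean over those features and a single linear layer as the generator; both choices, and all names, are hypothetical stand-ins.

```python
def latent_feature(features, available, v):
    """Mean of the extracted features of all available omics types except v.

    features : per-omics feature vectors of one patient (same dimension);
               entries for unavailable omics types are ignored
    available: booleans, True where the omics type was measured
    v        : index of the omics type to be generated
    """
    others = [features[u] for u in range(len(features)) if u != v and available[u]]
    if not others:
        raise ValueError("no other available omics type to build the latent feature")
    dim = len(others[0])
    return [sum(f[j] for f in others) / len(others) for j in range(dim)]


def generate(latent, weight, bias):
    """Hypothetical one-layer generator mapping a latent feature to omics data."""
    return [sum(w * z for w, z in zip(row, latent)) + b for row, b in zip(weight, bias)]


features = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
available = [True, True, False]              # omics type 2 is missing
latent_2 = latent_feature(features, available, v=2)
generated_2 = generate(latent_2, weight=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], bias=[0.0, 0.0, 0.0])
latent_0 = latent_feature(features, available, v=0)  # excludes omics type 0 itself
```

Excluding type v from its own latent feature also means the generator can be trained on patients for whom type v is available, since the real data can then be compared with data generated purely from the other types.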

S1032: the generation loss and the adversarial loss are calculated from the real data and the generated data of the available omics types. The patient's real data for an available omics type, together with the data generated for that omics type, are input into the discriminator network D^v(·) of that omics type, and the generation loss and the adversarial loss for that omics type are calculated from the discrimination results, defined as follows:

where D^v(·) is the discriminator corresponding to the v-th omics data; its output lies between 0 and 1, and a smaller value means the discriminator considers it more likely that the omics data are generated data.

S1033: the generation loss and the adversarial loss of each omics type are integrated to obtain the generative adversarial loss GLoss of the generative adversarial network over all available omics data, defined as follows:

By optimizing GLoss, the performance of the generative adversarial network is continuously improved during the generation and adversarial process, so that it generates missing omics data more similar to the real data.
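The loss formulas of S1032 and S1033 are not present in this text. As an illustration only, the sketch below uses the common binary cross-entropy GAN losses (a swapped-in, standard choice, not confirmed as the patent's exact form): the generation loss rewards generated data that the discriminator scores near 1, and the adversarial loss rewards a discriminator that scores real data near 1 and generated data near 0. Restricting the sums to patients with real data plays the role of the indicator matrix. How the patent combines the two terms into GLoss is not visible here, so the plain sum is a placeholder.

```python
import math


def omics_gan_losses(d_real, d_fake):
    """Generation and adversarial loss for one omics type.

    d_real[i]: discriminator output (0..1) on patient i's real data, or None
               when patient i lacks this omics type
    d_fake[i]: discriminator output on the data generated for patient i
    """
    eps = 1e-12  # guards against log(0)
    gen_loss, adv_loss, count = 0.0, 0.0, 0
    for real, fake in zip(d_real, d_fake):
        if real is None:
            continue  # no real data to compare against
        gen_loss += -math.log(fake + eps)  # generator wants D(fake) near 1
        adv_loss += -(math.log(real + eps) + math.log(1.0 - fake + eps))
        count += 1
    if count == 0:
        return 0.0, 0.0
    return gen_loss / count, adv_loss / count


def generative_adversarial_loss(per_omics_outputs):
    """Sum generation and adversarial losses over omics types (assumed form of GLoss)."""
    total = 0.0
    for d_real, d_fake in per_omics_outputs:
        gen_loss, adv_loss = omics_gan_losses(d_real, d_fake)
        total += gen_loss + adv_loss
    return total


gen, adv = omics_gan_losses([0.5, None], [0.5, 0.9])
gloss = generative_adversarial_loss([([0.5, None], [0.5, 0.9])])
```

In a typical GAN training loop the discriminator and generator are updated against each other in alternating steps, so a real implementation would likely route the two terms to different parameter groups rather than minimize one scalar.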

S104: omics feature fusion and diagnosis.

Specifically, the data generated for each missing omics type are input into the attention-based feature extraction network of that omics type for feature extraction, and the extracted features serve as the patient's approximate features for that omics type. The features of all of the patient's omics types are then fused and input into the cancer diagnosis network based on the fused features to obtain the patient's diagnosis result, and the diagnosis loss is calculated from the diagnosis result.

Step S104 is implemented as follows:

S1041: the feature vector of each of the patient's omics types is calculated. The patient's generated data are input into the attention-based feature extraction network of the corresponding omics type for feature extraction, and the representative feature vector of the v-th omics type to be used for cancer diagnosis is obtained according to whether that omics data type is available, defined as follows:

where the quantity defined is the representative feature vector of the v-th omics data used for cancer diagnosis. When the omics data are available, the features extracted from the real data are used directly for the subsequent feature fusion; otherwise, the feature extraction network is used to extract features from the generated data for the subsequent feature fusion.

S1042: the features of the omics types are fused. All of the patient's omics feature vectors used for diagnosis are fused by concatenation to obtain the fused feature z_i, defined as follows:
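The selection rule of S1041 and the concatenation of S1042 can be sketched together as below; the formulas themselves are missing from this text. The function name `fuse` and the list-based layout are hypothetical.

```python
def fuse(real_features, generated_features, available):
    """Concatenate one representative feature vector per omics type.

    For each omics type, use the features extracted from the real data when the
    type is available, and otherwise the features extracted from generated data.
    """
    fused = []
    for real, generated, is_available in zip(real_features, generated_features, available):
        fused.extend(real if is_available else generated)
    return fused


z = fuse(
    real_features=[[1.0, 2.0], None, [5.0, 6.0]],           # omics type 1 is missing
    generated_features=[[9.0, 9.0], [3.0, 4.0], [9.0, 9.0]],
    available=[True, False, True],
)
```

Because every omics type contributes a vector of the same dimension, the fused feature has a fixed length regardless of which types were actually measured, so a single diagnosis network can serve all patients.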

S1043: cancer diagnosis is performed on the fused feature and the diagnosis loss is calculated. The fused feature z_i is input into the cancer diagnosis network based on the fused features, and the diagnosis loss DLoss is calculated from the diagnosis result, defined as follows:

where f_Ψ(·) is the cancer diagnosis network with parameters Ψ, and CE(f_Ψ(z_i), y_i) is the cross-entropy function.
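The DLoss formula is not reproduced here, but the description fixes its ingredients: a diagnosis network f_Ψ applied to z_i and a cross entropy against y_i. A minimal sketch follows, assuming a one-layer Softmax classifier for f_Ψ and a mean over patients; both are assumptions, and the names are hypothetical.

```python
import math


def diagnose(z, weight, bias):
    """Hypothetical one-layer diagnosis network: class probabilities for fused feature z."""
    logits = [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(weight, bias)]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def diagnosis_loss(batch_probs, labels):
    """Mean cross entropy between predicted probabilities and true cancer types."""
    losses = [-math.log(p[y] + 1e-12) for p, y in zip(batch_probs, labels)]
    return sum(losses) / len(losses)


probs = diagnose([1.0, 2.0], weight=[[0.0, 0.0], [0.0, 0.0]], bias=[0.0, 0.0])
loss = diagnosis_loss([probs], [1])
predicted_type = max(range(len(probs)), key=lambda c: probs[c])  # index of the likeliest class
```

At inference time only `diagnose` is needed; the loss is used during training, where it is combined with the other loss terms.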

Illustratively, the training of the incomplete multi-omics data cancer diagnosis system comprises:

constructing a first training set, which consists of all available omics data of patients whose diagnosis results are known to the system.

Specifically, each type of omics data in the first training set is input into the attention-based feature extraction network of the corresponding omics type to obtain the sharing loss SLoss and the personality loss ILoss. The generated data and the real data for the available omics types in the first training set are input into the omics-specific discriminator networks, and the generative adversarial loss GLoss is calculated from the discriminator outputs. The fused features derived from the available data and the generated data are input into the cancer diagnosis network based on the fused features to obtain the patient's final diagnosis result, and the diagnosis loss DLoss is calculated from that result. Finally, these losses are combined into the final objective loss function L of the system, defined as follows:

L = min_{Φ,Ω,G,Ψ} (SLoss + ILoss + GLoss + DLoss)    (12)

By optimizing the objective loss function L, the network parameters of each module of the incomplete multi-omics data cancer diagnosis system are updated, which improves the diagnostic performance of the system.

Further, a first validation set and a first test set are constructed. Both consist of all available omics data of patients whose diagnosis results are unknown to the incomplete multi-omics data cancer diagnosis system.

Specifically, all available omics data in the first validation set are input into the system to obtain the diagnosis results it predicts; the diagnostic accuracy of the system on the first validation set is calculated from the predicted results and the patients' true diagnosis results; and the system parameters that achieve the highest validation accuracy during training are selected as the final parameters of the system.

Further, all available omics data in the first test set are input into the system with the final parameters to obtain the predicted diagnosis results; the diagnostic accuracy on the first test set is calculated from the predicted results and the patients' true diagnosis results, and this accuracy is taken as an approximation of the system's diagnostic accuracy in future diagnosis tasks.
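The validation and test procedure amounts to checkpoint selection by validation accuracy followed by a single evaluation on held-out data. A small sketch, assuming predictions for each saved parameter set are already available as label lists; the function names and the checkpoint labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of patients whose predicted cancer type matches the true one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def select_final_parameters(checkpoints, val_labels):
    """Pick the checkpoint with the highest validation accuracy.

    checkpoints: list of (params, val_predictions) pairs saved during training.
    Ties keep the earliest checkpoint.
    """
    best_params, best_acc = None, -1.0
    for params, val_predictions in checkpoints:
        acc = accuracy(val_predictions, val_labels)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc


checkpoints = [
    ("epoch_1", [0, 0, 1, 1]),
    ("epoch_2", [0, 1, 1, 0]),
    ("epoch_3", [0, 1, 1, 1]),
]
val_labels = [0, 1, 1, 0]
final_params, val_acc = select_final_parameters(checkpoints, val_labels)
```

The test set is then scored once with `accuracy` using the selected parameters; keeping it out of the selection step is what lets the test accuracy stand in for performance on future patients.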

In summary, the embodiments of the present invention propose an incomplete multi-omics data cancer diagnosis system. The system first extracts omics features through attention-based feature extraction networks and, through joint optimization of the sharing loss and the personality loss, gives the omics feature representations good diversity; the attention parameters in the networks not only alleviate the overfitting caused by the high dimensionality of omics data but also give the system good interpretability and authenticity. The system then generates the patient's missing omics data through a generative adversarial strategy, which enriches the patient's feature representation and allows flexible cancer diagnosis even when only one type of omics data is available. Finally, the features of all omics data types are fused and input into a diagnosis network based on the fused features, yielding a more accurate cancer diagnosis for the patient. In addition, the system takes only the patient's omics data as input and produces the patient's diagnosis result without complicated operating steps, so it is easy to use.

Embodiment two:

The object of this embodiment is to provide an electronic device.

An electronic device comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, the processor implementing the following process when executing the program:

acquiring all available omics data to be diagnosed for the same patient;

extracting features from each of the acquired omics data types;

Further, for details of the implementation steps of this embodiment, see the first embodiment; they are not repeated here.

Embodiment three:

The object of this embodiment is to provide a non-transitory computer readable storage medium.

A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the following process:

acquiring all available omics data to be diagnosed for the same patient;

extracting features from the acquired omics data;

Embodiment four:

The object of this embodiment is to provide a computer program product.

A computer program product comprising a computer program which, when run on one or more processors, performs the following steps:

acquiring all available omics data to be diagnosed for the same patient;

extracting features from the acquired omics data;

The cancer diagnosis system, device and medium based on incomplete multi-omics data provided by the above embodiments can be implemented and have broad application prospects.

The foregoing is only a description of the preferred embodiments of the present disclosure and is not intended to limit the disclosure; those skilled in the art may make various modifications and changes to the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure shall fall within its scope of protection.

Claims

1. A cancer diagnosis system based on incomplete multi-omics data, comprising:

a multi-omics feature fusion and diagnosis module configured to: fuse the features extracted from the patient's omics data with the features of the generated omics data, and input the fused features into a pre-trained diagnosis network model to obtain a diagnosis result;

wherein generating the patient's missing omics data from the patient's available omics data using the generative adversarial strategy specifically comprises:

integrating the generation loss and the adversarial loss of each omics data type to calculate the generative adversarial loss;

wherein generating the missing omics data from the extracted omics data features specifically comprises:

calculating, from the features extracted from the patient's available omics data, the patient's latent feature corresponding to each omics data type;

wherein the output of the generation network is the generated missing data of the patient's v-th omics type;

wherein calculating the generation loss and the adversarial loss from the real data of the available omics data and the corresponding generated omics data specifically comprises: inputting the patient's real data for an available omics type, together with the data generated for that omics type, into the discriminator network D^v(·) corresponding to that omics type, and calculating the generation loss and the adversarial loss for that omics type from the discrimination results, expressed as follows:

wherein the output of D^v(·) lies between 0 and 1, a smaller value meaning that the discriminator considers it more likely that the omics data are generated data; and an indicator matrix marks whether each omics data type is missing.

2. The cancer diagnosis system based on incomplete multi-omics data according to claim 1, wherein extracting features from each of the acquired omics data types specifically comprises:

calculating a sharing loss from the extracted features;

and calculating a personality loss from the extracted features.

3. The cancer diagnosis system based on incomplete multi-omics data according to claim 1, wherein the generation loss and the adversarial loss of each omics data type are integrated, and the generative adversarial loss is calculated as follows:

4. The cancer diagnosis system based on incomplete multi-omics data according to claim 1, wherein fusing the features extracted from the patient's omics data with the features of the generated omics data specifically comprises: concatenating the patient's omics data features with the generated omics data features.

5. An electronic device comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, the processor implementing the following process when executing the program:

acquiring all available omics data to be diagnosed for the same patient;

extracting features from each of the acquired omics data types;

fusing the features extracted from the patient's omics data with the features of the generated omics data, and inputting the fused features into a pre-trained diagnosis network model to obtain a diagnosis result;

wherein the output of the generation network is the generated missing data of the patient's v-th omics type;

6. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the following process:

acquiring all available omics data to be diagnosed for the same patient;

extracting features from the acquired omics data;

fusing the features extracted from the patient's available omics data with the features of the generated omics data, and inputting the fused features into a pre-trained diagnosis network model to obtain a diagnosis result;

wherein the output of the generation network is the generated missing data of the patient's v-th omics type;

7. A computer program product comprising a computer program which, when run on one or more processors, performs the following steps:

acquiring all available omics data to be diagnosed for the same patient;

extracting features from the acquired omics data;

wherein the output of the generation network is the generated missing data of the patient's v-th omics type;