CN116206756B - Lung adenocarcinoma data processing method, system, equipment and computer readable storage medium - Google Patents
- Publication number
- CN116206756B (application CN202310503341.XA)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/40—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to mechanical, radiation or invasive therapies, e.g. surgery, laser therapy, dialysis or acupuncture
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/20—ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The application relates to a lung adenocarcinoma data processing method, system, device and computer readable storage medium. The method comprises: acquiring clinical sample data of a lung adenocarcinoma patient; extracting features from the clinical sample data; classifying the samples based on the features to obtain a first pre-classification result dividing samples into an AAH/AIS/MIA group and an IAC group; classifying the samples based on the features to obtain a second pre-classification result dividing samples into an AAH/AIS group and an MIA/IAC group; and reclassifying the IAC-group samples of the first pre-classification result and the AAH/AIS-group samples of the second pre-classification result, wherein reclassification performs a three-way classification of the samples based on the features to obtain an AAH/AIS, MIA or IAC classification result. The application effectively addresses the low accuracy of existing diagnostic models in assessing lung adenocarcinoma invasiveness and has significant clinical application value.
Description
Technical Field
The application belongs to the technical field of medical data processing, and in particular relates to a lung adenocarcinoma data processing method, system, device and computer readable storage medium, and applications thereof.
Background
With the worldwide adoption of lung cancer screening, more early-stage lung adenocarcinomas are being detected, and they often appear as ground-glass nodules on CT. Compared with solid nodules, ground-glass nodules grow slowly and rarely produce lymph node or distant metastases, and are generally managed with minimally invasive treatment. The prognosis of a ground-glass nodule and the choice of surgical approach depend mainly on the pathological invasiveness determined during or after surgery. Differential diagnosis of the invasiveness of early lung adenocarcinoma is therefore the key to preoperative evaluation and determines the subsequent clinical treatment. For example, pre-invasive lesions such as atypical adenomatous hyperplasia (AAH) and adenocarcinoma in situ (AIS) have a good prognosis, and the optimal timing of resection can be determined through follow-up; minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma (IAC), which have poorer prognoses, require immediate active intervention. Accurate non-invasive assessment of ground-glass nodule invasiveness thus aids preoperative clinical decisions, but the large overlap of imaging features makes this a challenging task.
To date, most invasiveness-prediction studies model quantitative high-dimensional image information with radiomics methods (extracting image features such as nodule size and solid proportion) and/or deep learning methods to predict the invasiveness of ground-glass nodules, yet their prediction performance on multi-center data sets remains below that of senior radiologists. Existing artificial intelligence research is mostly limited to improving model accuracy with novel computer vision techniques, lacks substantive consideration of the factors closely related to early lung adenocarcinoma invasion, has reached a bottleneck, and in particular suffers from low accuracy in diagnosing MIA.
Disclosure of Invention
In view of these problems, the application aims to provide a lung adenocarcinoma data processing method, system, device, computer readable storage medium and applications thereof that effectively address the low accuracy of existing diagnostic models in assessing lung adenocarcinoma invasiveness, and in particular the high misclassification rate for MIA.
According to a first aspect of the present application, there is provided a lung adenocarcinoma data processing method comprising:
acquiring clinical sample data of a lung adenocarcinoma patient;
extracting features from the clinical sample data;
classifying the samples based on the features to obtain a first pre-classification result dividing samples into an AAH/AIS/MIA group and an IAC group;
classifying the AAH/AIS/MIA group based on the features to obtain a first classification result comprising AAH/AIS and MIA;
classifying the samples based on the features to obtain a second pre-classification result dividing samples into an AAH/AIS group and an MIA/IAC group;
classifying the MIA/IAC group based on the features to obtain a second classification result comprising MIA and IAC;
reclassifying the IAC-group samples of the first pre-classification result and the AAH/AIS-group samples of the second pre-classification result, wherein reclassification performs a three-way classification of the samples based on the features to obtain an AAH/AIS, MIA or IAC classification result; and
outputting, based on the first classification result, the second classification result and the reclassification result, a final classification of each sample as AAH/AIS, MIA or IAC.
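The two-branch cascade above can be sketched as follows. This is a hypothetical illustration only: the five classifier objects, the rule that routes a sample to the three-way reclassifier, and the rule for merging the three results are assumptions, since the steps above do not fix how the results are combined.

```python
def cascade_classify(feats, pre1, sub1, pre2, sub2, tri):
    """Hypothetical sketch of the dual-branch cascade.
    pre1: AAH/AIS/MIA vs IAC pre-classifier; sub1: AAH/AIS vs MIA;
    pre2: AAH/AIS vs MIA/IAC pre-classifier; sub2: MIA vs IAC;
    tri:  three-way AAH/AIS vs MIA vs IAC reclassifier."""
    g1 = pre1(feats)   # "AAH/AIS/MIA" or "IAC"
    g2 = pre2(feats)   # "AAH/AIS" or "MIA/IAC"
    if g1 == "IAC" and g2 == "AAH/AIS":
        # the two branches contradict each other maximally:
        # hand the sample to the three-way reclassifier
        return tri(feats)
    first = sub1(feats) if g1 == "AAH/AIS/MIA" else None   # "AAH/AIS" | "MIA"
    second = sub2(feats) if g2 == "MIA/IAC" else None      # "MIA" | "IAC"
    if first is not None and second is not None:
        # both branches resolved the sample; one assumed merge rule
        # is to fall back to the overlapping class MIA on disagreement
        return first if first == second else "MIA"
    return first if first is not None else second
```

With toy classifiers, a sample that branch 1 calls IAC but branch 2 calls AAH/AIS is resolved by the three-way reclassifier, while samples on which the branches agree are resolved directly.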
Further, the method comprises:
acquiring image sample data of a lung adenocarcinoma patient;
extracting radiomics features from the image sample data;
classifying the samples based on the radiomics features to obtain a first pre-classification result dividing samples into an AAH/AIS/MIA group and an IAC group; classifying the AAH/AIS/MIA group based on the radiomics features to obtain a first classification result comprising AAH/AIS and MIA;
classifying the samples based on the radiomics features to obtain a second pre-classification result dividing samples into an AAH/AIS group and an MIA/IAC group; classifying the MIA/IAC group based on the radiomics features to obtain a second classification result comprising MIA and IAC;
reclassifying the IAC-group samples of the first pre-classification result and the AAH/AIS-group samples of the second pre-classification result, wherein reclassification performs a three-way classification of the samples based on the radiomics features to obtain an AAH/AIS, MIA or IAC classification result; and
outputting, based on the first classification result, the second classification result and the reclassification result, a final classification of each sample as AAH/AIS, MIA or IAC.
Further, extracting the radiomics features of the image sample data means inputting the image sample data into a deep learning algorithm model to extract the features. The deep learning algorithm model comprises a deep learning model augmented with a discriminative filter learning module, a machine learning model for extracting radiomics features, and a fusion module. The discriminative filter learning module extracts details of the nodule: the image sample data are input into the deep learning model with the discriminative filter learning module to extract global features and local detail features of the nodule in the sample; the image sample data are input into the machine learning model to obtain the radiomics features; and the fusion module performs feature fusion on the global features, the local detail features and the radiomics features of the nodule in the sample.
Preferably, the deep learning model uses one of the following algorithms, or a partial network structure thereof, to extract the global and local features of the image: convolutional neural networks, recurrent neural networks, fully connected neural networks, ResNet, attention models, long short-term memory (LSTM) networks, Hopfield networks, DenseNet, FCN, SegNet, ENet, SegViT, RTFormer, RefineNet, Dense-UNet, H-DenseUNet, Inception modules, etc.
DenseNet (Dense Convolutional Network) is a deep convolutional neural network model proposed by Gao Huang et al. in 2017. DenseNet combines convolutional layers through dense connections: each layer receives the outputs of all preceding layers as input, so information flows through the network more fully and its feature extraction capability improves. The Inception module is a multi-branch convolutional structure used in GoogLeNet that applies convolution kernels and pooling layers of different sizes in parallel and concatenates the resulting feature maps; this structure improves the nonlinear expressive capacity of the network while reducing the number of model parameters. ResNet (Residual Network) is a deep convolutional neural network model proposed by Kaiming He et al. in 2015. ResNet replaces plain stacks of convolutional layers with residual blocks, each consisting of convolutional layers plus a skip connection (shortcut connection). The skip connection passes the input of a block directly to its output, bypassing the intermediate convolutional layers, so that gradient information propagates more easily into the shallow layers of the network.
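The dense connectivity just described can be pictured with a toy sketch (an assumption-laden illustration, not the patent's network: plain Python lists stand in for 3-D feature maps and averaging stands in for convolution):

```python
def dense_block(features, num_layers, growth_rate):
    """Toy dense block: each 'layer' reads the concatenation of all
    previous feature lists and appends growth_rate new features,
    mimicking DenseNet's dense connections (numbers stand in for
    feature maps; real models use convolutions)."""
    for _ in range(num_layers):
        # stand-in "convolution": average of everything seen so far
        new = [sum(features) / len(features)] * growth_rate
        features = features + new  # dense concatenation
    return features
```

With 4 input features, 3 layers and a growth rate of 2, the block emits 4 + 3 × 2 = 10 features, showing how the channel count grows linearly with depth while every earlier feature remains visible to later layers.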
Preferably, the deep learning algorithm model consists of a 3D DenseNet augmented with a discriminative filter learning (DFL) module, a machine learning (ML) model that extracts radiomics features, and a fusion module.
Further, the deep learning model with the discriminative filter learning module consists, in order, of a convolution layer, three dense block modules, a DFL module and a pooling layer; the input image sample data pass through these stages in sequence to yield 1024 features describing the global appearance and local details of the nodule.
The fusion module performs feature fusion using any one or more of the following methods: hard-voting fusion, weighted fusion, stacking fusion and blending fusion.
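Two of the fusion strategies listed can be sketched in a few lines (a minimal illustration with made-up probability vectors and equal weights; the patent does not specify the weights or voting details):

```python
def weighted_fusion(prob_vectors, weights):
    """Weighted fusion: combine per-model class-probability vectors
    using one weight per model (weights assumed to sum to 1)."""
    n_classes = len(prob_vectors[0])
    return [sum(w * p[i] for w, p in zip(weights, prob_vectors))
            for i in range(n_classes)]

def hard_voting(labels):
    """Hard-voting fusion: majority label across model predictions."""
    return max(set(labels), key=labels.count)
```

Stacking and blending differ in that the per-model outputs are fed to a second-level learner rather than combined by a fixed rule.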
Further, the machine learning model for extracting radiomics features comprises a region segmentation network module and a feature extraction module: the region segmentation network module segments the image sample data to obtain a region of interest, and the feature extraction module extracts features from the region of interest to obtain the radiomics features. The fusion module is a multi-layer neural network classifier that fuses the global features, the local detail features and the radiomics features of the nodule in the sample.
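The fusion step can be pictured as concatenating the deep and radiomics feature vectors and feeding them to a neural classifier. The sketch below uses a single linear layer plus softmax in pure Python as a stand-in for the multi-layer network; the weights, biases and dimensions are made-up assumptions.

```python
import math

def fuse_and_classify(deep_feats, radiomics_feats, W, b):
    """Concatenate deep and radiomics features, then apply one
    linear layer + softmax (stand-in for the multi-layer fusion
    classifier; W holds one weight row per class, b the biases)."""
    x = deep_feats + radiomics_feats                  # feature fusion
    logits = [sum(wi * xi for wi, xi in zip(row, x)) + bi
              for row, bi in zip(W, b)]
    m = max(logits)                                   # numeric stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]                      # class probabilities
```

A real fusion head would have more layers and learned weights, but the concatenate-then-classify shape is the same.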
Further, the method comprises: the number of acquired image samples of the lung adenocarcinoma patient, clinical information and/or genetic information of the patient; advancing the characteristics of the number of image samples, clinical information and/or genetic information of the patient with lung adenocarcinoma, wherein the characteristics comprise image histology characteristics, clinical characteristics and/or genome characteristics; classifying the samples based on the characteristics to obtain first pre-classification results of the AAH/AIS/MIA group and the IAC group; classifying the AAH/AIS/MIA group based on the characteristics to obtain a first classification result comprising AAH/AIS and MIA; classifying the samples based on the characteristics to obtain second pre-classification results of the AAH/AIS group and the MIA/IAC group; classifying the MIA/IAC based on the characteristics to obtain a second classification result comprising MIA and IAC; reclassifying IAC group samples in the first pre-classification result and AAH/AIS group samples in the second pre-classification result, wherein the reclassifying is to perform three classification on the samples based on the characteristics to obtain an AAH/AIS, MIA, IAC classification result; and outputting a classification result of which the sample is AAH/AIS, MIA or IAC based on the first classification result, the second classification result and the reclassification result.
Further, the method comprises:
acquiring clinical sample data of a lung adenocarcinoma patient;
extracting features from the clinical sample data;
classifying the samples based on the non-invasiveness features among the extracted features to obtain a first pre-classification result dividing samples into an AAH/AIS/MIA group and an IAC group; classifying the AAH/AIS/MIA group based on the invasiveness features to obtain a first classification result comprising AAH/AIS and MIA;
classifying the samples based on the invasiveness features to obtain a second pre-classification result dividing samples into an AAH/AIS group and an MIA/IAC group; classifying the MIA/IAC group based on the non-invasiveness features to obtain a second classification result comprising MIA and IAC; and
reclassifying the IAC-group samples of the first pre-classification result and the AAH/AIS-group samples of the second pre-classification result, wherein reclassification performs a three-way classification of the samples based on the features to obtain an AAH/AIS, MIA or IAC classification result.
Optionally, classifying the samples based on the radiomics features to obtain the first pre-classification result dividing samples into the AAH/AIS/MIA group and the IAC group is performed by a first classifier, whose training process comprises:
acquiring training image sample data of lung adenocarcinoma patients together with labels, where the first label is the AAH/AIS/MIA group and the second label is the IAC group;
inputting the image sample data into the deep learning algorithm model to extract radiomics features; and
inputting the radiomics features into a classifier to obtain a preliminary classification result, computing the loss between the preliminary classification result and the labels to obtain a loss value, and optimizing the classifier according to the loss value to obtain the trained first classifier.
Optionally, classifying the samples based on the radiomics features to obtain the second pre-classification result dividing samples into the AAH/AIS group and the MIA/IAC group is performed by a second classifier, whose training process comprises:
acquiring training image sample data of lung adenocarcinoma patients together with labels, where the first label is the AAH/AIS group and the second label is the MIA/IAC group;
inputting the image sample data into the deep learning algorithm model to extract radiomics features; and
inputting the radiomics features into a classifier to obtain a preliminary classification result, computing the loss between the preliminary classification result and the labels to obtain a loss value, and optimizing the classifier according to the loss value to obtain the trained second classifier.
Optionally, the reclassification performs a three-way classification of the samples based on the radiomics features to obtain an AAH/AIS, MIA or IAC classification result, using a third classifier whose training process comprises:
acquiring training image sample data of lung adenocarcinoma patients together with labels, where the first label is the AAH/AIS group, the second label is the MIA group and the third label is the IAC group;
inputting the image sample data into the deep learning algorithm model to extract radiomics features; and
inputting the radiomics features into a classifier to obtain a preliminary classification result, computing the loss between the preliminary classification result and the labels to obtain a loss value, and optimizing the classifier according to the loss value to obtain the trained third classifier.
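The training loop described for each classifier (forward pass, loss computation, optimization) can be sketched with the simplest possible learner. The logistic-regression stand-in below is an assumption for illustration; the patent does not specify the classifier family, loss function or optimizer.

```python
import math

def train_binary_classifier(xs, ys, lr=0.1, epochs=200):
    """Minimal training loop on 1-D features: predict, compute the
    loss gradient, update the parameters (stochastic gradient
    descent on the logistic loss)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # forward pass
            grad = p - y                 # dLoss/dlogit for log-loss
            w -= lr * grad * x           # optimize classifier
            b -= lr * grad
    return w, b
```

Trained on toy data where larger feature values correspond to the positive label, the learned weight becomes positive and the decision boundary separates the two groups.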
A lung adenocarcinoma data processing system comprising machine readable program instructions which when executed by a processor are adapted to carry out a lung adenocarcinoma data processing method as described above.
A lung adenocarcinoma data processing system, comprising:
an acquisition unit for acquiring clinical sample data of a lung adenocarcinoma patient;
a feature extraction unit for extracting features of the clinical sample data;
a first pre-classification unit, configured to classify the samples based on the features to obtain a first pre-classification result of the AAH/AIS/MIA group and the IAC group;
a first classification unit, configured to classify the AAH/AIS/MIA group based on the features to obtain a first classification result comprising AAH/AIS and MIA;
a second pre-classification unit, configured to classify the samples based on the features to obtain a second pre-classification result of the AAH/AIS group and the MIA/IAC group;
a second classification unit, configured to classify the MIA/IAC group based on the features to obtain a second classification result comprising MIA and IAC;
a reclassification unit, configured to reclassify the IAC-group samples in the first pre-classification result and the AAH/AIS-group samples in the second pre-classification result, wherein the reclassification performs three-class classification on the samples based on the features to obtain an AAH/AIS, MIA, IAC classification result;
and an output unit, configured to output a classification result of whether the sample is AAH/AIS, MIA or IAC based on the first classification result, the second classification result and the reclassification result.
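The interplay of the units above condenses into a rule-based fusion: the two binary pre-classifications agree on three of the four possible combinations, and only the contradictory combination is handed to the three-class reclassifier. A plausible sketch of that rule (the application's exact fusion logic may differ):

```python
def fuse(pre1, pre2, reclassify):
    """Fuse the two binary pre-classifications into the final label.

    pre1: 'AAH/AIS/MIA' or 'IAC'   (binary task 1)
    pre2: 'AAH/AIS' or 'MIA/IAC'   (binary task 2)
    reclassify: callable returning 'AAH/AIS', 'MIA' or 'IAC' for
                nodules on which the two binary tasks disagree.
    """
    if pre1 == 'AAH/AIS/MIA' and pre2 == 'AAH/AIS':
        return 'AAH/AIS'            # both tasks agree: pre-invasive
    if pre1 == 'AAH/AIS/MIA' and pre2 == 'MIA/IAC':
        return 'MIA'                # only MIA is consistent with both
    if pre1 == 'IAC' and pre2 == 'MIA/IAC':
        return 'IAC'                # both tasks agree: invasive
    # contradictory case (task 1 says IAC, task 2 says AAH/AIS):
    return reclassify()
```

The three agreeing branches never invoke the ternary model, which is what keeps the disputed-nodule workload small.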
A lung adenocarcinoma data processing apparatus, the apparatus comprising: a memory and a processor; the memory is used for storing program instructions; the processor is used for calling program instructions, and when the program instructions are executed, the processor is used for executing the lung adenocarcinoma data processing method.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the lung adenocarcinoma data processing method described above.
Further, the acquired image sample data of the lung adenocarcinoma patient comprise chest computed tomography (CT) images, chest X-rays (chest radiographs), positron emission tomography (PET-CT) images, magnetic resonance imaging (MRI) and ultrasound images of the lung adenocarcinoma patient.
Further, image sample data of a lung adenocarcinoma patient and clinical information and/or genetic information of the patient are acquired. The features of the text (clinical) information include age, sex, smoking history, family history, degree of pathological differentiation, squamous carcinoma, adenocarcinoma, carcinoembryonic antigen and the like, and the features of the genetic information include EGFR mutation, ALK mutation, HER2 and the like; inputting the image sample data of the lung adenocarcinoma patient and the text information, genetic information and/or pathological information of the patient into a deep learning algorithm model to extract features, wherein the features comprise image features, clinical features and/or genetic features; classifying the samples based on the features to obtain a first pre-classification result of the AAH/AIS/MIA group and the IAC group; classifying the AAH/AIS/MIA group based on the features to obtain a first classification result comprising AAH/AIS and MIA; classifying the samples based on the features to obtain a second pre-classification result of the AAH/AIS group and the MIA/IAC group; classifying the MIA/IAC group based on the features to obtain a second classification result comprising MIA and IAC; reclassifying the IAC-group samples in the first pre-classification result and the AAH/AIS-group samples in the second pre-classification result, wherein the reclassification performs three-class classification on the samples based on the features to obtain an AAH/AIS, MIA, IAC classification result; and outputting a classification result of whether the sample is AAH/AIS, MIA or IAC based on the first classification result, the second classification result and the reclassification result.
Further, the image features include global features and local detail features of the nodules in the sample extracted based on a deep learning model with the addition of a discriminant filter learning module, and radiological features based on a machine learning model.
Further, the radiomic features include any one or more of the following types: first-order features, second-order features and higher-order features. The first-order features include energy, entropy, kurtosis, standard deviation, minimum, mean, skewness, volume, surface area, sphericity, compactness, diameter, flatness, longest diameter, and the like. The second-order features include gray-level co-occurrence matrix features, gray-level size-zone matrix features, gray-level run-length matrix features, neighboring gray-tone difference matrix features and gray-level dependence matrix features. The higher-order features include wavelet features, Fourier features, Laplace features, and the like; the wavelet features include boundaries, free and bound areas, and the like.
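Several of the listed first-order features are simple statistics of the ROI voxel intensities. A NumPy sketch of their computation follows; the histogram bin count and exact formulas are illustrative choices, and a radiomics toolkit such as PyRadiomics defines each feature precisely:

```python
import numpy as np

def first_order_features(roi):
    """Compute a few first-order radiomic features from ROI voxel
    intensities. Definitions follow common radiomics conventions;
    a specific toolkit's formulas may differ in normalisation."""
    x = roi.ravel().astype(float)
    mean = x.mean()
    std = x.std()
    # normalised intensity histogram for the entropy feature
    hist, _ = np.histogram(x, bins=32)
    p = hist / hist.sum()
    p = p[p > 0]
    return {
        "energy": float((x ** 2).sum()),
        "entropy": float(-(p * np.log2(p)).sum()),
        "mean": float(mean),
        "std": float(std),
        "skewness": float(((x - mean) ** 3).mean() / std ** 3),
        "kurtosis": float(((x - mean) ** 4).mean() / std ** 4),
        "minimum": float(x.min()),
    }

# toy 3x3x3 "ROI" with a symmetric intensity ramp
feats = first_order_features(np.arange(27).reshape(3, 3, 3))
```

On this symmetric toy input the skewness is exactly zero, which is a quick sanity check of the implementation.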
In some embodiments, the nodules include any one or more of the following: solid lung nodules, pure ground glass nodules, mixed ground glass nodules.
The application has the advantages that:
1. Lung adenocarcinoma data processing and prediction are performed based on image processing technology, and the accuracy of the AAH/AIS, MIA, IAC classification of lung adenocarcinoma, in particular of the MIA classification, is improved through workflow innovation, which is of great significance in assisting lung adenocarcinoma diagnosis and in helping establish personalized treatment strategies;
2. the application creatively optimizes the deep learning model and fuses it with a radiomic machine learning model, improving model performance; further, the applicant upgrades the framework of the deep learning model by adding a discriminant filter learning module for nodule detail feature extraction, so that the upgraded model mines, through its backbone module, complex semantic information and the relations among different local features;
3. the application creatively integrates binary and ternary classification models in the intelligent diagnosis of lung adenocarcinoma images: two binary classification models are first constructed to classify the images separately and obtain preliminary classification results, and a ternary classification model then performs three-class classification, based on the images, on the group that is disputed in the preliminary results; the results show that the accuracy of this method is greatly improved over directly performing ternary classification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a lung adenocarcinoma data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a module connection of a lung adenocarcinoma data processing system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a computer analysis apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a lung adenocarcinoma data processing flow and a schematic diagram of six model structures according to an embodiment of the present invention;
FIG. 5 shows ROC curves provided by one embodiment of the invention; A in FIG. 5: the internal test set ROC curves; B in FIG. 5: the external test set ROC curves;
FIG. 6 shows the confusion matrices of the six lung adenocarcinoma classification models on the external test set, provided by one embodiment of the invention, comprising the confusion matrices of the 6 models (A-F in FIG. 6) and the Kappa and F1 scores of the 6 models (G in FIG. 6). The confusion matrices are derived from the ML model (A in FIG. 6), the DL model (B in FIG. 6) and the DL-ML model (C in FIG. 6), respectively; the truth-table-based confusion matrices apply to the Bi-ML model (D in FIG. 6), the Bi-DL model (E in FIG. 6) and the Bi-DL-ML model (F in FIG. 6), where the rows of the left matrix are the results of binary classification task 1, the columns of the left matrix are the results of binary classification task 2, the four large cells at the intersections of rows and columns are the AAH/AIS, MIA, IAC and disputed-nodule results of the rule-based fusion of the binary classification models, and each large cell is further divided into three parts according to the actual pathological classification; the right matrix is the confusion matrix of the ternary model that further identifies the disputed nodules (i.e., the nodules predicted as IAC in binary task 1 but predicted as AAH/AIS in binary task 2).
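The Kappa and F1 scores reported in G in FIG. 6 can be computed directly from a confusion matrix; a small sketch follows (the matrix values below are made up for illustration, not results from the application):

```python
import numpy as np

def cohen_kappa(cm):
    """Cohen's kappa from a confusion matrix (rows = truth, cols = prediction)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                        # observed agreement
    pe = (cm.sum(0) * cm.sum(1)).sum() / n ** 2  # chance agreement
    return (po - pe) / (1 - pe)

def macro_f1(cm):
    """Macro-averaged F1 over the classes of a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    f1s = []
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        prec = tp / cm[:, k].sum() if cm[:, k].sum() else 0.0
        rec = tp / cm[k, :].sum() if cm[k, :].sum() else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

# illustrative 3-class confusion matrix (AAH/AIS, MIA, IAC)
cm = [[8, 1, 1],
      [1, 8, 1],
      [0, 1, 9]]
kappa = cohen_kappa(cm)
f1 = macro_f1(cm)
```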
Detailed Description
In order to enable those skilled in the art to better understand the present invention, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present invention with reference to the accompanying drawings.
In some of the flows described in the specification and claims of the present invention and in the above figures, a plurality of operations appearing in a particular order are included, but it should be clearly understood that the operations may be performed in other than the order in which they appear herein or in parallel, the sequence numbers of the operations such as S101, S102, etc. are merely used to distinguish between the various operations, and the sequence numbers themselves do not represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel.
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments according to the invention without any creative effort, are within the protection scope of the invention.
The embodiment of the application provides a lung adenocarcinoma data processing method, system, device, computer-readable storage medium and applications thereof. The corresponding training apparatus for implementing the lung adenocarcinoma data processing method can be integrated in computer equipment, which can be a terminal, a server or other equipment. The terminal can be a smart phone, a tablet computer, a notebook computer, a personal computer or other terminal equipment. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data and artificial intelligence platforms. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
Fig. 1 is a schematic flow chart of a lung adenocarcinoma data processing method provided by an embodiment of the application. Specifically, the following operations are included as shown in fig. 1:
S101: clinical sample data of a lung adenocarcinoma patient is obtained.
In one embodiment, the acquired image sample data of a lung adenocarcinoma patient include chest computed tomography (CT) images, chest X-rays (chest radiographs), positron emission tomography (PET-CT) images, magnetic resonance imaging (MRI) and ultrasound images of the lung adenocarcinoma patient.
In one embodiment, image sample data, clinical information and/or genetic information of a lung adenocarcinoma patient are obtained, wherein the image sample data include chest computed tomography (CT) images, chest X-rays (chest radiographs), positron emission tomography (PET-CT) images, magnetic resonance imaging (MRI) and ultrasound images; the text information features include age, sex, smoking history, family history, degree of pathological differentiation, squamous carcinoma, adenocarcinoma, carcinoembryonic antigen and the like, and the genetic information features include EGFR mutation, ALK mutation, HER2 and the like.
In one embodiment, the acquired image sample data of lung adenocarcinoma patients comprise 361 lung-nodule CT images from 281 lung adenocarcinoma patients confirmed by postoperative pathology between December 2019 and 2021. Each nodule is given a true label that is pathologically confirmed and meets the inclusion criteria of the lung adenocarcinoma classification (AAH, AIS, MIA, IAC): pre-invasive lesions (AAH/AIS), minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma (IAC). Table 1 summarizes the characteristics of the patients and pulmonary nodules, including age, gender, size and location.
Note: SD, standard deviation; Q1, first quartile; Q3, third quartile; RUL, right upper lobe; RML, right middle lobe; RLL, right lower lobe; LUL, left upper lobe; LLL, left lower lobe.
Atypical adenomatous hyperplasia (AAH): a precancerous lesion of lung adenocarcinoma. The lesion is a single row of non-invasive atypical epithelial cells lining the alveolar wall, a localized mild-to-moderate atypical cell proliferation that may involve the respiratory bronchioles; the focal lesion of the peripheral alveoli is generally ≤5 mm, without interstitial inflammation or fibrotic change.
Adenocarcinoma in situ (AIS): when solid changes appear within a pure ground-glass density, the lesion is pathologically an AIS. Size is used to distinguish AAH from AIS: AIS is generally >5 mm, while AAH is ≤5 mm. The AAH-to-AIS process is continuous, with no clear boundary between atypical hyperplasia and carcinoma in situ.
Minimally invasive adenocarcinoma (MIA): an early lung adenocarcinoma ≤3 cm in diameter that grows mainly in a lepidic (adherent) pattern, in which the maximum diameter of any invasive focus within the lesion is ≤5 mm, without invasion of the pleura, blood vessels or lymphatic vessels and without tumor necrosis.
Invasive adenocarcinoma (IAC): a lung cancer of high malignancy. As MIA continues to grow, the alveoli collapse to form an irregular nest-like structure and the interstitium is infiltrated; the lesion is classified as IAC when the range of infiltrative solid transformation exceeds 5 mm or the nodule becomes entirely solid soft-tissue density. Contrast-enhanced CT may show enhancement of the lobulated solid nodule, tumor microvascular signs at the margin of the nodule, and pleural indentation; fine spiculation may also appear at the periphery of the nodule.
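The size criteria quoted above can be condensed into a rough rule of thumb. This is only a toy illustration of the numeric thresholds; the actual AAH/AIS/MIA/IAC grading is pathological, not purely metric:

```python
def classify_by_size_criteria(lesion_mm, invasive_mm):
    """Toy rule of thumb from the size criteria above.

    lesion_mm: largest diameter of the whole lesion (mm)
    invasive_mm: largest diameter of any invasive focus (mm, 0 if none)
    """
    if invasive_mm == 0:
        # no invasive focus: AAH vs AIS distinguished by lesion size
        return "AAH" if lesion_mm <= 5 else "AIS"
    if lesion_mm <= 30 and invasive_mm <= 5:
        return "MIA"   # <=3 cm lesion with <=5 mm invasive focus
    return "IAC"       # invasive transformation beyond 5 mm
```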
S102: features of the clinical sample data are extracted.
In some embodiments, the clinical sample data are image data; extracting image features of the image sample data; classifying the samples based on the image features to obtain a first pre-classification result of the AAH/AIS/MIA group and the IAC group; classifying the AAH/AIS/MIA group based on the image features to obtain a first classification result comprising AAH/AIS and MIA; classifying the samples based on the image features to obtain a second pre-classification result of the AAH/AIS group and the MIA/IAC group; classifying the MIA/IAC group based on the image features to obtain a second classification result comprising MIA and IAC; reclassifying the IAC-group samples in the first pre-classification result and the AAH/AIS-group samples in the second pre-classification result, wherein the reclassification performs three-class classification on the samples based on the image features to obtain an AAH/AIS, MIA, IAC classification result; and outputting a classification result of whether the sample is AAH/AIS, MIA or IAC based on the first classification result, the second classification result and the reclassification result.
In some embodiments, obtaining image sample data of a lung adenocarcinoma patient and clinical information and/or genetic information of the patient; extracting features of the image sample data, clinical information and/or genetic information of the lung adenocarcinoma patient, wherein the features comprise image features, clinical features and/or genetic features; classifying the samples based on the features to obtain a first pre-classification result of the AAH/AIS/MIA group and the IAC group; classifying the AAH/AIS/MIA group based on the features to obtain a first classification result comprising AAH/AIS and MIA; classifying the samples based on the features to obtain a second pre-classification result of the AAH/AIS group and the MIA/IAC group; classifying the MIA/IAC group based on the features to obtain a second classification result comprising MIA and IAC; reclassifying the IAC-group samples in the first pre-classification result and the AAH/AIS-group samples in the second pre-classification result, wherein the reclassification performs three-class classification on the samples based on the features to obtain an AAH/AIS, MIA, IAC classification result; and outputting a classification result of whether the sample is AAH/AIS, MIA or IAC based on the first classification result, the second classification result and the reclassification result.
In some embodiments, extracting the image features of the image sample data comprises inputting the image sample data into a deep learning algorithm model to extract the image features, wherein the deep learning algorithm model comprises a deep learning model to which a discriminant filter learning module is added, a machine learning model for extracting radiomic features, and a fusion module; the discriminant filter learning module is used to extract nodule detail features; the deep learning model with the added discriminant filter learning module receives the image sample data and extracts the global features and local detail features of the nodules in the sample; the image sample data are also input into the machine learning model to obtain the radiomic features; and the fusion module performs feature fusion on the global features, the local detail features and the radiomic features of the nodules in the sample.
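The fusion step can be illustrated as concatenating the three feature streams and passing them through a small classification head. All shapes and weights below are illustrative stand-ins; in the described system the head would be trained jointly with the classifier loss:

```python
import numpy as np

rng = np.random.default_rng(42)

# stand-ins for the three feature streams (dimensions are assumptions)
global_feats = rng.normal(size=(5, 128))    # DL backbone: whole-nodule context
detail_feats = rng.normal(size=(5, 64))     # discriminant-filter local details
radiomic_feats = rng.normal(size=(5, 107))  # hand-crafted radiomic vector

# fusion by concatenation along the feature axis
fused = np.concatenate([global_feats, detail_feats, radiomic_feats], axis=1)

# small MLP head: ReLU layer then softmax over the three classes
W1 = rng.normal(scale=0.1, size=(fused.shape[1], 32))
W2 = rng.normal(scale=0.1, size=(32, 3))
hidden = np.maximum(fused @ W1, 0)
logits = hidden @ W2
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```

Concatenation is the simplest fusion scheme; attention-weighted or gated fusion are common alternatives, though the application describes a multi-layer neural network classifier.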
In some embodiments, the discriminant filter learning module is used to extract nodule detail features; optionally, the deep learning model with the added discriminant filter learning module is a densely connected convolutional network into which the discriminant filter learning module is inserted, the densely connected convolutional network being used to mine complex semantic information and the relations among different local details in the image.
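The core idea of the densely connected convolutional network is that each layer receives the concatenation of the outputs of all earlier layers. A one-dimensional sketch with random linear maps standing in for convolutions:

```python
import numpy as np

def dense_block(x, n_layers, growth, rng):
    """Dense connectivity: each layer sees the concatenation of every
    earlier output along the channel axis. This is a 1-D sketch of
    DenseNet's core idea; real blocks use (3-D) convolutions."""
    feats = [x]
    for _ in range(n_layers):
        inp = np.concatenate(feats, axis=-1)          # all earlier outputs
        w = rng.normal(scale=0.1, size=(inp.shape[-1], growth))
        feats.append(np.maximum(inp @ w, 0))          # `growth` new channels
    return np.concatenate(feats, axis=-1)

rng = np.random.default_rng(0)
# 2 samples, 16 input channels, 4 layers with growth rate 12 -> 16 + 4*12 = 64
out = dense_block(np.ones((2, 16)), n_layers=4, growth=12, rng=rng)
```

The channel count grows linearly with depth (input channels plus layers times growth rate), which is why DenseNet interleaves transition layers to keep the width manageable.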
Further, the algorithm adopted by the deep learning model comprises the following algorithms, or partial network structures of these algorithms, for extracting global and local features of the image: convolutional neural networks, recurrent neural networks, fully connected neural networks, residual networks, attention models, long short-term memory networks, Hopfield networks, DenseNet, FCN, SegNet, ENet, SegViT, RTFormer, RefineNet, Dense-UNet, H-DenseUNet. Preferably, the deep learning algorithm model consists of a 3D DenseNet with an added discriminative filter learning (DFL) module, a machine learning (ML) model extracting radiomic features, and a fusion module.
FCN is the pioneering work of fully convolutional networks in the field of semantic segmentation; its main idea is to turn an image classification network into a semantic segmentation network by replacing the classifier (fully connected layers) with upsampling layers to restore spatial resolution.
The SegNet backbone network is VGG16 with the fully connected layers removed, forming a corresponding encoder-decoder architecture; it proposes max-pooling indices for upsampling, saving memory in the inference stage.
ENet contains a large encoder and a small decoder. It is a lightweight semantic segmentation network, a fast implementation of semantic segmentation that considers real-time performance alongside segmentation accuracy. Its downsampling process uses dilated convolutions, which balance image resolution against receptive field well, enlarging the receptive field of the image target without reducing the resolution of the feature map.
SegViT proposes an attention-to-mask (ATM) decoder module that uses the spatial information in the spatial attention of a Vision Transformer (ViT) to generate mask predictions for each category, applying ATM decoder modules in a cascaded manner to a plain, non-hierarchical ViT backbone.
RTFormer, an efficient dual-resolution Transformer for real-time semantic segmentation, achieves a better trade-off between performance and efficiency: it attains high inference efficiency with GPU-Friendly Attention of linear complexity, and its cross-resolution attention gathers the global context for the high-resolution branch more efficiently by propagating high-level knowledge from the low-resolution branch.
RefineNet is a generic multi-path refinement network that exploits multi-level abstractions for high-resolution semantic segmentation; all components are constructed with identity mappings and residual connections, and a chained residual pooling module is proposed that captures background context from a larger image region and fuses it using residual connections and learned weights, capturing richer background context information in an efficient manner.
Dense-UNet replaces the encoder and decoder of a conventional U-Net network with specially designed dense connectivity modules (Dense_Block).
H-DenseUNet is a hybrid densely connected UNet comprising a 2D DenseUNet for efficiently extracting intra-slice features and a 3D counterpart for hierarchically aggregating volumetric contexts.
Convolutional neural networks use convolutional and pooling layers to reduce the dimensionality of an image; their convolutional layers are trainable but have significantly fewer parameters than standard hidden layers, enabling them to highlight the important parts of an image and propagate them forward.
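The convolution-plus-pooling dimensionality reduction mentioned above can be shown in a few lines of NumPy (single channel, "valid" padding, non-overlapping pooling); a minimal sketch, not a framework implementation:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most
    deep learning frameworks) of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling, shrinking each spatial dimension."""
    h, w = img.shape[0] // size, img.shape[1] // size
    return img[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)
edge = conv2d(img, np.array([[1.0, -1.0]]))   # horizontal gradient filter
pooled = max_pool(edge)
```

On the constant-gradient toy image the filter response is uniform, and pooling halves each spatial dimension, illustrating the dimensionality reduction.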
S103: and classifying the samples based on the characteristics to obtain a first pre-classification result of the AAH/AIS/MIA group and the IAC group.
In one embodiment, classifying the samples based on the non-invasive features among the features obtains a first pre-classification result of the AAH/AIS/MIA group and the IAC group. Preferably, the non-invasive features are non-invasive features of the nodule, including one or more of the following: smooth edge, uniform density, low density, small nodule diameter.
In one embodiment, classifying the samples to obtain the first pre-classification result of the AAH/AIS/MIA group and the IAC group uses a first classifier, whose training process includes: acquiring training clinical sample data and labels of lung adenocarcinoma patients, wherein the first label is the AAH/AIS/MIA group and the second label is the IAC group; extracting features of the clinical sample data; and inputting the features into a classifier to obtain a preliminary classification result, computing a loss between the preliminary classification result and the label to obtain a loss value, and optimizing the classifier according to the loss value to obtain a trained first classifier.
In one embodiment, classifying the samples based on the image features to obtain the first pre-classification result of the AAH/AIS/MIA group and the IAC group uses a first classifier applied to the image features, whose training process includes: acquiring training image sample data and labels of lung adenocarcinoma patients, wherein the first label is the AAH/AIS/MIA group and the second label is the IAC group; inputting the image sample data into a deep learning algorithm model to extract image features; and inputting the image features into a classifier to obtain a preliminary classification result, computing a loss between the preliminary classification result and the label to obtain a loss value, and optimizing the classifier according to the loss value to obtain a trained first classifier.
In one embodiment, when the clinical sample data are image sample data of a lung adenocarcinoma patient together with clinical information and/or genetic information of the patient, the features of the clinical sample data can be extracted separately or jointly using statistical methods, machine learning algorithms and/or deep learning models, and the chosen approach can be adjusted to the data; for example, when the clinical sample data are clinical data, regression analysis, logistic analysis or a machine learning algorithm can be selected to extract their features; when the clinical sample data comprise both clinical data and image sample data, statistical methods, machine learning algorithms and/or deep learning models can be used to extract the features of the clinical information, and a deep learning model can be used to extract the features of the image sample data.
S104: and classifying the AAH/AIS/MIA group based on the characteristics to obtain a first classification result comprising AAH/AIS and MIA.
In a specific embodiment, the AAH/AIS/MIA group is classified based on the invasiveness features among the features to obtain a first classification result comprising AAH/AIS and MIA. Optionally, the invasiveness features refer to invasiveness features of the nodule, including one or more of the following signs: spiculation, irregular shape, uneven density, invasion of vascular spaces, disappearance of the fat plane around blood vessels, pleural effusion, lobulation, vacuole sign, air bronchogram, vessel convergence, pleural traction.
In a specific embodiment, classifying the AAH/AIS/MIA group based on the features to obtain a first classification result comprising AAH/AIS and MIA uses a fourth classifier, whose training process includes: acquiring training clinical sample data and labels of lung adenocarcinoma patients, wherein the first label is the AAH/AIS group and the second label is the MIA group; extracting features of the clinical sample data; and inputting the features into a classifier to obtain a preliminary classification result, computing a loss between the preliminary classification result and the label to obtain a loss value, and optimizing the classifier according to the loss value to obtain a trained fourth classifier.
S105: and classifying the samples based on the characteristics to obtain second pre-classification results of the AAH/AIS group and the MIA/IAC group.
In a specific embodiment, classifying the samples based on the invasiveness features among the features obtains a second pre-classification result of the AAH/AIS group and the MIA/IAC group. Preferably, the invasiveness features refer to invasiveness features of the nodule, including one or more of the following signs: spiculation, irregular shape, uneven density, invasion of vascular spaces, disappearance of the fat plane around blood vessels, pleural effusion, lobulation, vacuole sign, air bronchogram, vessel convergence, pleural traction.
In a specific embodiment, classifying the samples to obtain the second pre-classification result of the AAH/AIS group and the MIA/IAC group uses a second classifier, whose training process includes: acquiring training clinical sample data and labels of lung adenocarcinoma patients, wherein the first label is the AAH/AIS group and the second label is the MIA/IAC group; extracting features of the clinical sample data; and inputting the features into a classifier to obtain a preliminary classification result, computing a loss between the preliminary classification result and the label to obtain a loss value, and optimizing the classifier according to the loss value to obtain a trained second classifier.
S106: classifying the MIA/IAC based on the features to obtain a second classification result comprising MIA and IAC.
In a specific embodiment, classifying the MIA/IAC based on non-invasive features of the features results in a second classification result comprising MIA and IAC. Optionally, the non-invasive feature refers to a non-invasive feature of the nodule, including one or more of the following: smooth edges, uniform density, low density, small nodule diameter.
In a specific embodiment, classifying the MIA/IAC group based on the features to obtain a second classification result comprising MIA and IAC uses a fifth classifier, whose training process includes: acquiring training clinical sample data and labels of lung adenocarcinoma patients, wherein the first label is the MIA group and the second label is the IAC group; extracting features of the clinical sample data; and inputting the features into a classifier to obtain a preliminary classification result, computing a loss between the preliminary classification result and the label to obtain a loss value, and optimizing the classifier according to the loss value to obtain a trained fifth classifier.
S107: reclassifying the IAC-group samples in the first pre-classification result and the AAH/AIS-group samples in the second pre-classification result, wherein the reclassification performs three-class classification on the samples based on the features to obtain an AAH/AIS, MIA, IAC classification result.
In a specific embodiment, the reclassification classifies the samples with a third classifier to obtain an AAH/AIS, MIA, IAC classification result, and the training process of the third classifier includes: acquiring training clinical sample data and labels of lung adenocarcinoma patients, wherein the first label is the AAH/AIS group, the second label is the MIA group and the third label is the IAC group; extracting features of the clinical sample data; and inputting the features into a classifier to obtain a preliminary classification result, computing a loss between the preliminary classification result and the label to obtain a loss value, and optimizing the classifier according to the loss value to obtain a trained third classifier.
S108: outputting a classification result of whether the sample is AAH/AIS, MIA or IAC based on the first classification result, the second classification result and the reclassification result.
In some embodiments, the machine learning model for extracting the radiomic features comprises a region segmentation network module and a feature extraction module, wherein the region segmentation network module is used to segment the image sample data to obtain a region of interest, and the feature extraction module is used to extract features of the region of interest to obtain the radiomic features; the fusion module is a multi-layer neural network classifier that performs feature fusion on the global features, local detail features and radiomic features in the sample.
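The segmentation-then-extraction pipeline reduces to masking the volume with the segmented ROI and summarising the masked voxels. A toy sketch with made-up Hounsfield-unit values (the actual module outputs a full radiomic feature vector, not these three statistics):

```python
import numpy as np

def roi_statistics(volume, mask):
    """Apply a segmentation mask to a CT volume and summarise the ROI,
    mimicking the region-segmentation -> feature-extraction hand-off."""
    voxels = volume[mask.astype(bool)]
    return {
        "volume_voxels": int(voxels.size),
        "mean_hu": float(voxels.mean()),
        "max_hu": float(voxels.max()),
    }

vol = np.full((4, 4, 4), -800.0)   # lung parenchyma background (HU, made up)
vol[1:3, 1:3, 1:3] = 40.0          # a small solid "nodule"
mask = np.zeros_like(vol)
mask[1:3, 1:3, 1:3] = 1            # segmentation output: the nodule ROI
stats = roi_statistics(vol, mask)
```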
Further, the region segmentation may be implemented by any one or more of the following models: U-Net++, PSPNet, DeepLab v1/v2/v3/v3+, YOLO, SSD, Faster R-CNN, Mask R-CNN, ResNet.
U-Net++ adds redesigned skip pathways on a U-Net basis, inserting dense blocks and convolutional layers between the encoder and decoder to improve segmentation accuracy.
PSPNet proposes a pyramid pooling module together with dilated (atrous) convolution; its pyramid pooling fuses features at four scales, incorporating multi-scale information.
DeepLab v1/v2/v3/v3+ form the DeepLab series: DeepLab v1 uses dilated convolution to enlarge the receptive field and a conditional random field to refine boundaries; DeepLab v2 adds a parallel structure of dilated convolutions; DeepLab v3 adds multi-grid and image-level features, improving cascaded network performance; DeepLab v3+ adds a decoder module, with an Aligned Xception backbone (using depthwise separable convolutions).
YOLO is a real-time object detection algorithm, among the first to balance detection quality and speed; it detects on a feature encoding of the input image, with one or more output layers producing the model predictions.
SSD is a single-shot detection deep neural network that combines the regression idea of YOLO with the anchor mechanism of Faster R-CNN to extract multi-scale target features of different aspect ratios and sizes.
Faster R-CNN consists of a deep convolutional region proposal network for generating region candidate boxes and a Fast R-CNN detection head that classifies and refines those boxes.
Mask R-CNN integrates the advantages of Faster R-CNN and the FCN algorithm and is a later standout among two-stage instance segmentation algorithms; its distinctive network design achieves high segmentation accuracy on target images.
ResNet introduces residual connections to mitigate the vanishing-gradient problem in deep neural networks, enabling very deep network structures.
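As a minimal stand-in for the segmentation-then-feature-extraction pipeline described above, the sketch below substitutes a fixed intensity threshold for the listed segmentation networks and computes a handful of hand-crafted ROI statistics in place of a full radiomic feature set (real pipelines such as PyRadiomics produce hundreds of features). Everything here is illustrative, not the patent's implementation.

```python
import numpy as np

def segment_roi(image, threshold):
    """Toy stand-in for the region-segmentation module: a fixed intensity
    threshold instead of U-Net++ / Mask R-CNN etc. Returns a boolean mask."""
    return image > threshold

def radiomic_like_features(image, mask):
    """A few hand-crafted statistics over the ROI, sketching what the
    ML feature-extraction module produces from the region of interest."""
    roi = image[mask]
    if roi.size == 0:
        return np.zeros(4)  # empty ROI: return a zero feature vector
    return np.array([
        roi.mean(),          # mean intensity
        roi.std(),           # intensity dispersion
        float(roi.size),     # ROI area in pixels
        roi.max() - roi.min()  # intensity range
    ])
```

A real implementation would replace `segment_roi` with one of the trained segmentation networks listed above and `radiomic_like_features` with a full radiomics extractor.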
In one specific embodiment, we collected CT images of 361 lung nodules from 281 consecutive patients with postoperative pathologically confirmed lung adenocarcinoma at our hospital between December 2019 and January 2021; the collected nodules were used to externally evaluate the diagnostic performance of the different predictive models. A further 4929 lung nodules collected from 5 hospitals in the Dr. Wise system, pathologically proven to be lung adenocarcinoma, were divided into a development set (n=3384), a validation set (n=579) and an internal test set (n=966). The deep learning framework upgrades a traditional 3D DenseNet; after the upgrade, a DL model, an ML model and a DL-ML model were constructed, and a Bi-DL-ML model was then built by integrating a binary classification model and a ternary classification model on top of the DL-ML model. The constructed DL-ML model is a fusion model: 1024 DL features are extracted from the last fully connected layer of the DL model structure, 1454 radiomic features are extracted by the ML model, and a multi-layer neural network classifier then combines the two into a 2478-dimensional feature vector for classifying nodule invasiveness. Two binary tasks are used: binary task 1 (distinguishing IAC from AAH/AIS and MIA), based on the first classifier, tends to identify non-invasive features, while binary task 2 (distinguishing IAC and MIA from AAH/AIS), based on the second classifier, tends to identify invasive features. The final upgrade of our diagnostic model logically combines the ternary classification model with the two binary classification models to generate the Bi-DL-ML model, aimed at achieving well-balanced performance; the rules of the binary models are described in FIG. 4. The paradox nodule group generated by our rules (classified as IAC in task 1 and as AAH/AIS in task 2) is further classified by the direct ternary DL-ML model. Likewise, we combined the ML and DL methods with the generated Bi-ML and Bi-DL models to verify the validity of the strategy. For MIA nodules exhibiting both non-invasive and invasive features, combining binary models that each focus on one of the two features increases the attention paid to MIA and improves correct classification compared with a direct ternary model. The six model structures are shown in FIG. 4: the ML model is a machine learning model extracting radiomic features; the DL model is a deep learning model with an added discriminative filter learning module; the DL-ML model comprises a DL module, an ML module and a fusion module, where the fusion module fuses the radiomic features extracted by the ML model and the image features extracted by the DL model; the Bi-ML model is a machine learning model integrating the binary and ternary classification models, which extracts radiomic features, performs the two binary tasks on them, and classifies the resulting paradox nodule group with the ternary classification model; the Bi-DL model is a deep learning model with the added discriminative filter learning module integrating the binary and ternary classification models, which extracts image histology features, performs the two binary tasks on them, and classifies the resulting paradox nodule group with the ternary classification model; the Bi-DL-ML model comprises a DL module, an ML module and a fusion module, where the fusion module fuses the radiomic features extracted by the ML module and the image features extracted by the DL module into fusion features, the two binary tasks are performed on the fusion features, and the resulting paradox nodule group is classified with the ternary classification model.
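The fusion step of the DL-ML model (concatenating the 1024 DL features and the 1454 radiomic features into a 2478-dimensional vector, then classifying with a multi-layer neural network) can be sketched as a plain forward pass. The hidden-layer size, ReLU activations and softmax head below are illustrative assumptions; only the 1024 + 1454 = 2478 concatenation comes from the text.

```python
import numpy as np

def init_weights(sizes, seed=0):
    """Random small weights for a feed-forward classifier; `sizes` lists
    layer widths, e.g. [2478, 64, 3] (the 64-unit hidden layer is an assumption)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0, 0.01, (o, i)), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

def fuse_and_classify(dl_feats, ml_feats, weights):
    """Sketch of the fusion module: concatenate the 1024-d DL features and the
    1454-d radiomic features into a 2478-d vector, then run a small MLP."""
    x = np.concatenate([dl_feats, ml_feats])   # 2478-dimensional fusion feature
    for w, b in weights[:-1]:
        x = np.maximum(0.0, w @ x + b)         # hidden layers with ReLU
    w, b = weights[-1]
    logits = w @ x + b                         # e.g. 3 logits: AAH/AIS, MIA, IAC
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()
```

In the patent's workflow this classifier head would be trained end-to-end with the loss-and-optimize loop described for the individual classifiers.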
The results showed that the AUC of the ML, DL and DL-ML models in the pGGN external test set was 0% to 4% lower than in the internal test set (A and B in FIG. 5); these reductions, in the ML model (internal vs external test set: 0.817 [95% CI, 0.799-0.835] vs 0.813 [95% CI, 0.784-0.842]; P=0.65), the DL model (internal vs external test set: 0.864 [95% CI, 0.848-0.880] vs 0.824 [95% CI, 0.759-0.852]; P=0.35) and the DL-ML model (internal vs external test set: 0.888 [95% CI, 0.874-0.902] vs 0.849 [95% CI, 0.822-0.875]; P=0.39), were not statistically significant. The Bi-ML, Bi-DL and Bi-DL-ML models using the strategy of the present application each produce only a single point in the ROC graph, with no significant difference between the two test sets. Among the six models, the Bi-DL-ML model, the fusion model upgraded with the strategy and framework of the present application, achieved the best performance in all diagnostic indices when averaging over the internal and external test sets, with statistical significance. In addition, the diagnostic indices of the feature-fusion DL-ML model were higher than those of the DL or ML model alone. Models using the strategy of the present application achieved better results than the corresponding direct ternary models: the Bi-ML model outperformed the ML model, the Bi-DL model outperformed the DL model, and the Bi-DL-ML model outperformed the DL-ML model.
Results of the individual methods in the external test set: the final fusion model, the Bi-DL-ML model upgraded stepwise with the strategy and framework of the present application, showed a range across the individual methods comparable to the DL and Bi-DL models (Bi-DL-ML model vs DL model/Bi-DL model: 0.017 [95% CI, 0.003-0.061] vs 0.006 [95% CI, 0.006-0.064]/0.014 [95% CI, 0.006-0.061]; P=0.55 and 0.71), meaning that the Bi-DL-ML model has the most balanced performance in the external test set (Table 2 and FIG. 6). In the first two upgrades, in addition to high accuracy and kappa values, the DL-ML model with multimodal fusion features achieved high sensitivity and high specificity in all three classifications simultaneously (Table 2). Specifically, the sensitivity and specificity of the DL-ML model were above 0.69 in all classification tasks, except for a sensitivity of 0.629 [95% CI, 0.536-0.726] in the MIA classification. In contrast, the ML model showed low sensitivity in both the AAH+AIS and MIA classifications (0.587 [95% CI, 0.495-0.669] and 0.361 [95% CI, 0.267-0.457]), while the DL model showed low sensitivity in the MIA classification (Table 2). The third upgrade, using the strategy of the invention, improved performance in all three classifications, particularly the MIA classification, shown in detail in the confusion matrices (C-D in FIG. 6). Taking the DL-ML (C in FIG. 6) and Bi-DL-ML (D in FIG. 6) models as an example, 9 AAH/AIS, 5 MIA and 8 IAC nodules were correctly diagnosed through the rule-based fusion of the two binary tasks, and among the paradox nodules, 7 MIA and 1 AAH/AIS were correctly classified. The good balance of models using the strategy of the application is also reflected in sensitivity, accuracy and F1 score (Table 2), with a large improvement for MIA and slight to moderate improvements for AAH/AIS and IAC. Misclassification among the three categories came mainly from the MIA category. Furthermore, the paradox nodule group is smallest in the Bi-DL-ML model (D in FIG. 6).
Note: a. The "average" row gives the average of the three diagnostic indices for each model method. b. The average P value reflects the comparison of a model's average diagnostic index with that of the Bi-DL-ML model, i.e., a many-to-one test. c. The "range" row gives the range (maximum minus minimum) of the three diagnostic indices measured with a single method of a model. d. The range P value gives the comparison of a model's range diagnostic index with that of the Bi-DL-ML model.
The lung adenocarcinoma data processing system provided by an embodiment of the invention comprises a computer program which, when executed, implements the lung adenocarcinoma data processing method described above.
FIG. 2 is a schematic diagram showing the connection of modules of a lung adenocarcinoma data processing system according to an embodiment of the present invention, which includes:
an acquisition unit for acquiring clinical sample data of a lung adenocarcinoma patient;
a feature extraction unit for extracting features of the clinical sample data;
the first pre-classification unit is used for classifying the samples based on the characteristics to obtain first pre-classification results of the AAH/AIS/MIA group and the IAC group;
the first classification unit is used for classifying the AAH/AIS/MIA group based on the characteristics to obtain a first classification result comprising AAH/AIS and MIA;
the second pre-classification unit is used for classifying the samples based on the features to obtain second pre-classification results of the AAH/AIS group and the MIA/IAC group;
a second classification unit, configured to classify the MIA/IAC based on the feature to obtain a second classification result including MIA and IAC;
the reclassifying unit is used for reclassifying the IAC group samples in the first pre-classification result and the AAH/AIS group samples in the second pre-classification result, wherein the reclassifying performs ternary classification on the samples based on the features to obtain an AAH/AIS, MIA or IAC classification result;
and the output unit is used for outputting a classification result of which the sample is AAH/AIS, MIA or IAC based on the first classification result, the second classification result and the reclassifying result.
Fig. 3 is a lung adenocarcinoma data processing apparatus provided in an embodiment of the present invention, the apparatus including: a memory and a processor; the memory is used for storing program instructions; the processor is used for calling program instructions, and when the program instructions are executed, the processor is used for executing the lung adenocarcinoma data processing method.
In some embodiments, the memory may be understood as any device holding a program and the processor may be understood as a device using the program. The apparatus may further comprise input means and output means.
The present application provides a computer-readable storage medium having stored thereon a computer program for performing lung adenocarcinoma data processing, which when executed by a processor, implements the above-described lung adenocarcinoma data processing method.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system, apparatus and module may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative; for another example, the division of the modules is just one logical function division, and other division modes may be adopted in actual implementation; as another example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or modules, and may be electrical, mechanical or in other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. And selecting part or all of the modules according to actual needs to realize the purpose of the scheme of the embodiment.
In addition, in the embodiments of the present invention, each functional module may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module can be realized in a hardware form or a software functional module form.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and the storage medium may include read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and the like. The entity executing the computer program or the method is a computer device, which may be a mobile phone, a server, an industrial personal computer, a single-chip microcomputer, a smart-appliance processor, or the like.
Those of ordinary skill in the art will appreciate that all or some of the steps in the methods of the above embodiments may be implemented by a program, where the program may be stored in a computer readable storage medium, and the storage medium may be a read only memory, a magnetic disk, or an optical disk.
While the invention has been described in detail with reference to specific embodiments, those skilled in the art will appreciate that modifications and equivalent substitutions may be made based on the teaching of the present invention. In summary, the present description should not be construed as limiting the invention.
Claims (7)
1. A method of lung adenocarcinoma data processing, the method comprising:
acquiring clinical sample data of a lung adenocarcinoma patient;
extracting features of the clinical sample data;
classifying samples by adopting a first classifier based on non-invasive features in the features to obtain first pre-classification results of an AAH/AIS/MIA group and an IAC group;
classifying the AAH/AIS/MIA group by adopting a fourth classifier based on the invasiveness features among the features to obtain a first classification result comprising AAH/AIS and MIA;
classifying the samples by adopting a second classifier based on the invasiveness features among the features to obtain second pre-classification results of the AAH/AIS group and the MIA/IAC group;
classifying the MIA/IAC group by adopting a fifth classifier and based on non-invasive features in the features to obtain a second classification result comprising MIA and IAC;
reclassifying the IAC group samples in the first pre-classification result and the AAH/AIS group samples in the second pre-classification result, wherein the reclassifying adopts a third classifier and performs ternary classification on the samples based on the features to obtain a reclassification result of AAH/AIS, MIA or IAC;
outputting a classification result of which the sample is AAH/AIS, MIA or IAC based on the first classification result, the second classification result and the reclassification result;
wherein the clinical sample data is image sample data;
extracting characteristics of the clinical sample data is extracting image histology characteristics of the image sample data;
the extraction of image histology features from the image sample data comprises inputting the image sample data into a deep learning algorithm model, the deep learning algorithm model comprising a deep learning model with an added discriminative filter learning module, a machine learning model extracting radiomic features, and a fusion module; the discriminative filter learning module is used for extracting details of the nodules; the image sample data is input into the deep learning model with the added discriminative filter learning module to extract global features and local details of the nodules in the sample; the image sample data is input into the machine learning model extracting radiomic features to obtain the radiomic features; and the fusion module performs feature fusion on the global features and local detail features of the nodules in the sample and the radiomic features;
wherein the non-invasive features refer to features of the nodule indicating non-invasiveness, and the invasiveness features refer to features of the nodule indicating invasiveness;
the AAH is atypical adenomatous hyperplasia, the AIS is in situ adenocarcinoma, the MIA is micro-invasive adenocarcinoma, and the IAC is invasive adenocarcinoma.
2. The lung adenocarcinoma data processing method according to claim 1, wherein image sample data of a lung adenocarcinoma patient and clinical information and/or genetic information of the patient are acquired; features of the image sample data, clinical information and/or genetic information of the lung adenocarcinoma patient are extracted, the features comprising image histology features, clinical features and/or genetic features; the samples are classified by a first classifier based on the non-invasive features among the features to obtain first pre-classification results of the AAH/AIS/MIA group and the IAC group; the AAH/AIS/MIA group is classified by a fourth classifier based on the invasiveness features among the features to obtain a first classification result comprising AAH/AIS and MIA; the samples are classified by a second classifier based on the invasiveness features among the features to obtain second pre-classification results of the AAH/AIS group and the MIA/IAC group; the MIA/IAC group is classified by a fifth classifier based on the non-invasive features among the features to obtain a second classification result comprising MIA and IAC; the IAC group samples in the first pre-classification result and the AAH/AIS group samples in the second pre-classification result are reclassified, wherein the reclassifying adopts a third classifier and performs ternary classification on the samples based on the features to obtain a reclassification result of AAH/AIS, MIA or IAC; and a classification result indicating whether the sample is AAH/AIS, MIA or IAC is output based on the first classification result, the second classification result and the reclassification result.
3. The method of claim 1, wherein the invasiveness features comprise one or more of the following: spiculation, irregular shape with uneven density, invasion of vascular spaces, disappearance of the fat gap between vessels, pleural effusion, lobulation, cavitation, air bronchogram (bronchiole inflation), vessel convergence and pleural traction; the non-invasive features comprise one or more of the following: smooth edges, uniform density, low density, and small nodule diameter.
4. A lung adenocarcinoma data processing method according to any of claims 1-3, characterized in that the training process of the first classifier comprises:
acquiring training clinical sample data and a label of a lung adenocarcinoma patient, wherein the first label is an AAH/AIS/MIA group, and the second label is an IAC group;
extracting features of the clinical sample data;
inputting the characteristics into a classifier to obtain a preliminary classification result, carrying out loss calculation on the preliminary classification result and the label to obtain a loss value, and optimizing the classifier according to the loss value to obtain a trained first classifier.
5. A lung adenocarcinoma data processing system, characterized in that the system comprises machine-readable program instructions which, when executed by a processor, implement the lung adenocarcinoma data processing method according to any one of claims 1-4.
6. A lung adenocarcinoma data processing apparatus, characterized in that the apparatus comprises a memory and a processor; the memory is used for storing program instructions; and the processor is used for invoking the program instructions which, when executed, perform the lung adenocarcinoma data processing method according to any one of claims 1-4.
7. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the lung adenocarcinoma data processing method according to any one of claims 1-4 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310503341.XA CN116206756B (en) | 2023-05-06 | 2023-05-06 | Lung adenocarcinoma data processing method, system, equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116206756A CN116206756A (en) | 2023-06-02 |
CN116206756B true CN116206756B (en) | 2023-10-27 |
Family
ID=86517723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310503341.XA Active CN116206756B (en) | 2023-05-06 | 2023-05-06 | Lung adenocarcinoma data processing method, system, equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116206756B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112215799A (en) * | 2020-09-14 | 2021-01-12 | 北京航空航天大学 | Automatic classification method and system for grinded glass lung nodules |
CN112560927A (en) * | 2020-12-08 | 2021-03-26 | 上海市胸科医院 | Construction method of lung adenocarcinoma infiltrative image omics classification model |
CN112767393A (en) * | 2021-03-03 | 2021-05-07 | 常州市第一人民医院 | Machine learning-based bimodal imaging omics ground glass nodule classification method |
CN112951426A (en) * | 2021-03-15 | 2021-06-11 | 山东大学齐鲁医院 | Construction method and evaluation system of pancreatic ductal adenoma inflammatory infiltration degree judgment model |
CN115206497A (en) * | 2022-07-27 | 2022-10-18 | 天津医科大学 | Lung adenocarcinoma subtype analysis method based on deep learning method and non-diagnosis purpose and interpretation method thereof |
CN115456961A (en) * | 2022-08-24 | 2022-12-09 | 四川大学华西医院 | Prediction system for benign pulmonary nodules, rapidly-progressing lung cancer and slowly-progressing lung cancer |
CN115984251A (en) * | 2023-02-14 | 2023-04-18 | 成都泰莱生物科技有限公司 | Pulmonary nodule classification method and product based on pulmonary CT and polygenic methylation |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10111632B2 (en) * | 2017-01-31 | 2018-10-30 | Siemens Healthcare Gmbh | System and method for breast cancer detection in X-ray images |
US20210142904A1 (en) * | 2019-05-14 | 2021-05-13 | Tempus Labs, Inc. | Systems and methods for multi-label cancer classification |
AU2020274091A1 (en) * | 2019-05-14 | 2021-12-09 | Tempus Ai, Inc. | Systems and methods for multi-label cancer classification |
-
2023
- 2023-05-06 CN CN202310503341.XA patent/CN116206756B/en active Active
Non-Patent Citations (5)
Title |
---|
3D Deep Learning from CT Scans Predicts Tumor Invasiveness of Subcentimeter Pulmonary Adenocarcinomas; Wei Zhao et al.; Convergence and Technologies; Vol. 78, No. 24; full text *
A Multi-Classification Model for Predicting the Invasiveness of Lung Adenocarcinoma Presenting as Pure Ground-Glass Nodules; Fan Song et al.; Sec. Thoracic Oncology; Vol. 12; full text *
Determining the invasiveness of ground-glass nodules using a 3D multi-task network; Ye Yu et al.; European Radiology; full text *
Hybrid Clinical-Radiomics Model for Precisely Predicting the Invasiveness of Lung Adenocarcinoma Manifesting as Pure Ground-Glass Nodule; Lan Song et al.; Academic Radiology; Vol. 28, No. 09; full text *
Clinical study of an artificial intelligence-assisted diagnosis system for predicting invasive subtypes of early lung adenocarcinoma presenting as pulmonary nodules; Su Zhipeng et al.; Chinese Journal of Lung Cancer; Vol. 25, No. 04; full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||