CN113705595A - Method, device and storage medium for predicting degree of abnormal cell metastasis


Info

Publication number: CN113705595A
Application number: CN202110239635.7A
Authority: CN (China)
Prior art keywords: features, modality, medical data, fusion, modal
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 杨帆, 李航, 姚建华, 邢小涵, 赵宇
Original and current assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110239635.7A
Publication of CN113705595A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. ICT SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10056 Microscopic image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20092 Interactive image processing based on input by user
    • G06T 2207/20104 Interactive definition of region of interest [ROI]
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30024 Cell structures in vitro; Tissue sections in vitro

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Quality & Reliability (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

A method for predicting the degree of abnormal cell metastasis from medical data using artificial intelligence is provided. The method comprises the following steps: acquiring medical data of a plurality of modalities related to abnormal cells; extracting single-modality features from the medical data of each modality; obtaining cross-modality fusion features of the medical data of the plurality of modalities by fusing the extracted single-modality features; adjusting each single-modality feature using the cross-modality fusion features as global features; obtaining multi-modality features of the medical data of the plurality of modalities based on the adjusted single-modality features; and predicting the degree of abnormal cell metastasis from the multi-modality features. The method and device fuse pathological information from the patient's different modalities, make full use of the patient's multi-modality information resources, and improve the accuracy of predicting abnormal cell lymph node metastasis.

Description

Method, device and storage medium for predicting degree of abnormal cell metastasis
Technical Field
The present invention relates to the field of medical image processing, and in particular to a method, an apparatus, a computing device and a computer-readable storage medium for predicting the degree of abnormal cell metastasis.
Background
At present, artificial intelligence technology is widely used in many areas of medical engineering. Artificial intelligence networks have been used to assist physicians in making decisions based on the characteristics of different diseases.
In clinical diagnosis, a physician typically makes decisions based on pathological images of the patient's focal tissue together with the patient's clinical diagnostic information. In some related techniques, neural network models (e.g., fully connected neural networks) are used to predict the state of a patient's focal tissue from the patient's clinical information. However, because clinical information is only a limited description of the patient's condition, it cannot fully reflect the microenvironment of the focal tissue, which limits the performance of such models. In other related techniques, pathological images are used to predict the state of the focal tissue. However, because a pathological image reflects only the pathological environment at the lesion site, the patient's overall physiological status cannot be incorporated into the decision process. Schemes that consider both the clinical information and the pathological images of a patient typically require manual interpretation of the pathological images (in particular the pathological features). Such interpretation is demanding and time-consuming for the pathologist, and its results vary from one laboratory to another, so the interpreted indices are unstable. This may lead to erroneous decisions.
Disclosure of Invention
When prior-art artificial intelligence classifies abnormal cells, classification is mostly performed according to only one type of pathological information, without taking into account the overall picture given by other pathological information (such as the patient's clinical information), so classification accuracy is low. Moreover, the artificial intelligence networks used in the prior art are structurally simple, and their classification logic is too coarse to fully exploit multiple classes of data, which also limits classification accuracy.
In view of the above, the present invention provides methods, apparatus, computing devices and computer-readable storage media for predicting the degree of abnormal cell metastasis, which are intended to mitigate or overcome some or all of the above-mentioned deficiencies, as well as possibly others.
According to a first aspect of the present invention, a method for predicting the extent of abnormal cell metastasis based on medical data is provided. The method comprises the following steps: acquiring medical data of a plurality of modalities related to abnormal cells; extracting single-modality features of medical data of each modality from the medical data of the plurality of modalities respectively; obtaining cross-modal fusion features of the medical data of multiple modalities by fusing the extracted single-modal features; adjusting each single mode feature by using the cross-mode fusion feature as a global feature; obtaining multi-modal features of the medical data of the plurality of modalities based on the adjusted individual single-modal features; and predicting the degree of abnormal cell metastasis according to the multi-modal characteristics.
In some embodiments, the plurality of modalities includes at least a first modality and a second modality; the medical data of the first modality corresponds to image data of the abnormal cells, and the medical data of the second modality corresponds to clinical information data of the patient to whom the abnormal cells belong. The medical data of the first modality corresponds to image data of the abnormal cells at a plurality of zoom scales, and the image data at each zoom scale includes a plurality of images.
In some embodiments, extracting the single-modality features of the medical data of each modality comprises: extracting features of the first modality using a first feature extraction network for the first modality; and extracting features of the second modality using a second feature extraction network for the second modality.
In some embodiments, the number of images of the abnormal cells may differ between zoom scales.
In some embodiments, the first feature extraction network comprises an image feature extractor pre-trained based on a first data set, and the second feature extraction network comprises a table feature extractor pre-trained based on a second data set.
In some embodiments, obtaining the cross-modality fusion features of the medical data of the plurality of modalities by fusing the extracted single-modality features includes: performing weighted fusion on the features of the plurality of images at each zoom scale in the first modality to obtain a weighted fusion feature for each scale; and splicing the weighted fusion features of all scales in the first modality with the features of the second modality to obtain the cross-modality fusion features (also called the multi-modality global features).
In some embodiments, the weighted fusion is performed by applying average pooling to the features of the plurality of images at each zoom scale in the first modality, yielding the weighted fusion feature for that scale.
In some embodiments, adjusting each single-modality feature using the cross-modality fusion features as global features comprises: performing local-global information fusion between the cross-modality fusion features, serving as global features, and the features of the images at each scale in the first modality, to obtain local-global fused features of the images at each scale; taking a weighted average of the local-global fused features of the images at each scale to obtain the adjusted features for each scale in the first modality; and adjusting the features of the second modality based on the cross-modality fusion features as global features to obtain the adjusted features of the second modality.
In some embodiments, adjusting the features of the second modality comprises: passing the multi-modality global features through a fully connected network, obtaining a scaling coefficient from a Sigmoid activation function, and using the coefficient to scale the features of the second modality into the adjusted features of the second modality.
In some embodiments, deriving the multi-modality features of the medical data from the adjusted single-modality features comprises splicing the adjusted features of all scales in the first modality with the adjusted features of the second modality.
In some embodiments, predicting the degree of abnormal cell metastasis from the multi-modality features comprises inputting the multi-modality features into a classifier that predicts the degree of metastasis; the classifier is trained in advance with the features of training-set samples as input and the samples' labels as supervision signals.
In some embodiments, the classifier includes at least a main classifier and an auxiliary classifier.
In some embodiments, acquiring the medical data of the plurality of modalities further comprises: determining a region of interest in a tissue slice; and acquiring images of the region of interest at different zoom scales by sliding a window over the region of interest.
According to a second aspect of the present invention, there is provided an apparatus for predicting the degree of abnormal cell metastasis based on medical data. The device includes an acquisition module, a single-modality feature extraction module, a cross-modality fusion module, an adjustment module, a multi-modality feature module and a prediction module. The acquisition module is configured to acquire medical data of a plurality of modalities related to abnormal cells; the single-modality feature extraction module is configured to extract single-modality features of the medical data of each modality from the medical data of the plurality of modalities, respectively; the cross-modality fusion module is configured to obtain cross-modality fusion features of the medical data of the plurality of modalities by fusing the extracted single-modality features; the adjustment module is configured to adjust each single-modality feature using the cross-modality fusion features as global features; the multi-modality feature module is configured to derive multi-modality features of the medical data of the plurality of modalities based on the adjusted single-modality features; and the prediction module is configured to predict the degree of abnormal cell metastasis from the multi-modality features.
According to a third aspect of the present invention, there is provided a computer device characterized by comprising a memory in which a computer program is stored and a processor. The computer program, when executed by a processor, causes the processor to perform the steps of the method described in the above aspect.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium characterized in that a computer program is stored on the computer-readable storage medium. The computer program, when executed by a processor, causes the processor to perform the steps of the method described in the above aspect.
The artificial-intelligence-based method and device for predicting the degree of abnormal cell metastasis from medical data fuse pathological information from the patient's different modalities, make full use of the patient's multi-modality information resources, and improve the accuracy of predicting abnormal cell metastasis. Specifically, the technical scheme of the invention adopts end-to-end learning, and the clinical information used is easy to obtain and requires no additional collection. The invention combines multi-modality data analysis into a multiple-instance learning framework, uses the patient's clinical information together with the corresponding pathological images to predict lymph node metastasis at an early stage, and assists the physician in preoperative decision-making, thereby providing treatment solutions more accurately and improving the patient's quality of life. The multi-modality multiple-instance method takes the clinical gold standard for cancer diagnosis (the biopsy pathological section) together with the patient's clinical diagnostic information, applies deep learning to combine the two, predicts before surgery whether lymph node metastasis will occur, and effectively assists preoperative decision-making.
These and other advantages of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.
Drawings
Embodiments of the present application will now be described in more detail and with reference to the accompanying drawings, in which:
FIG. 1 illustrates an exemplary application scenario according to one embodiment of the present invention;
FIG. 2 shows a flow chart of a method of predicting the extent of abnormal cell metastasis based on medical data;
FIG. 3 shows a schematic block diagram of a method according to an embodiment of the invention;
FIG. 4 shows a schematic diagram of a specific structure of the multi-modal multi-instance module of FIG. 3;
FIG. 5 shows heat maps of images processed by the method of the present invention and by an image-only single-modality method, respectively;
FIG. 6 shows a schematic diagram of an apparatus for predicting the degree of abnormal cell metastasis; and
FIG. 7 illustrates a schematic block diagram of a computing system capable of implementing a prediction method for the degree of abnormal cell metastasis in accordance with some embodiments of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms; the terms are used only to distinguish one component from another. Thus, a first component discussed below could be termed a second component without departing from the teachings of the present concepts. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be appreciated by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or processes shown in the drawings are not necessarily required to practice the present application and are, therefore, not intended to limit the scope of the present application.
Before describing embodiments of the present invention in detail, some relevant concepts are explained first:
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason and make decisions.
Clinical Information (CI) is information characterizing a patient's physiological indicators, including but not limited to height, weight, blood pressure, past medical history, genomic information, transcriptomics, metabolomics, proteomics, and clinical test results.
A whole slide image (WSI) is the digital modality closest to direct diagnosis under the microscope and is currently the mainstream technique in digital pathology. Pathological analysis based on WSI is an important tool for diagnosing diseases, particularly malignant tumors. WSI meets the requirements of high definition, high speed and high throughput, makes digitization of traditional pathological sections possible, and lays a solid foundation for the development and application of digital pathology. In digital pathology, a common digitization strategy is to scan the pathological section at 200× or 400× magnification; the data generated by the scan is referred to as the WSI. The rapid development of digital pathology based on whole slide images has enabled many advances in artificial intelligence (AI) for computer-aided diagnosis.
A Region of Interest (ROI) is the region to be processed, delineated on an image in machine-vision image processing as a box, circle, ellipse, irregular polygon, or the like. In the present invention, the ROI is a region of the WSI related to diseased (e.g., cancerous) tissue.
The TabNet feature extraction network is a high-performance, interpretable deep learning network for tabular data proposed by Google. TabNet uses sequential attention to select which features to reason from at each decision step, focusing the model's learning capacity on the most salient features, which enables interpretability and more efficient learning.
In the medical field (taking cancer as an example), doctors often need pathological images of a patient's cancerous tissue to develop the next step of the treatment plan. For example, highly suspect focal tissue may be obtained directly by needle biopsy. In the pathology department, the obtained lesion tissue is paraffin-embedded and sectioned, then stained with hematoxylin and eosin to prepare a pathological biopsy, from which a biopsy pathology image is generated. On the other hand, when preparing a treatment plan, the doctor also examines the patient's overall health and analyzes the patient's clinical information to determine the patient's condition.
Breast cancer is the most common cancer in women. Most breast cancers are found only at an invasive stage, meaning the abnormal cells have already spread beyond the breast ducts or lobules into the surrounding tissue. In the treatment of breast cancer, some patients' conditions call for axillary lymph node dissection, which can be associated with a number of short-term and long-term complications. Therefore, before performing the dissection, the metastatic status of the lymph nodes needs to be accurately classified to avoid unnecessary surgery. At present, for the task of classifying the degree of breast cancer lymph node metastasis, routine clinical detection uses only pathological images or only clinical information, and so cannot achieve high accuracy. Schemes that consider the patient's clinical information and pathological images at the same time still require manual interpretation of the pathological images, so an end-to-end automated decision process cannot be realized.
According to the invention, by simulating the physician's interpretation process with a multi-modality model and applying deep learning to multi-modality data, multi-modality data (such as multi-scale pathological image information and clinical information) can be incorporated into the decision process simultaneously, so that prediction of the focal tissue state (such as breast cancer lymph node metastasis) can be completed end to end without interpretation by a pathologist.
FIG. 1 illustrates an exemplary application scenario 100 according to one embodiment of the present invention. The application scenario 100 includes data 101, a server 102, a network 103, and a display 104. Data 101 refers to data input to the server 102 for training or testing the abnormal cell metastasis prediction model; for example, data 101 may include a training set, a validation set, and a test set. The data types may include, but are not limited to, image information, table information, and sound information. In an application scenario of the present invention, data 101 characterizing a needle-biopsy image of the patient's cancerous tissue, echo data of the relevant body part, the patient's clinical information, and so on may be transmitted to the server 102 via the network 103, or input to the server 102 directly via an input device of the server 102. Network 103 may be, for example, a wide area network (WAN), a local area network (LAN), a wireless network, a public telephone network, an intranet, or any other type of network known to those skilled in the art. It will be appreciated that in other application scenarios, the data 101 may be input directly into the server 102 without going through the network 103. The server 102, upon receiving the data, classifies it with the neural network of an embodiment of the present invention and visually outputs the intermediate computation results to the display 104, thereby assisting the doctor in classification. The server 102 is a hardware device to which the technical solution of an embodiment of the present invention can be applied, and stores at least a pre-trained neural network according to an embodiment of the present invention. As will be appreciated by those of ordinary skill in the art, the server 102 is not limited to a single server device; it may be a server cluster or a distributed device. The display 104 may receive display data from the server 102 and display the analysis result of the degree of abnormal cell metastasis accordingly.
Fig. 2 shows a flow diagram of a method 200 for predicting the degree of abnormal cell metastasis based on medical data. The method 200 may be implemented on the server 102 shown in FIG. 1 using artificial intelligence according to the following steps.
In step 201, the server 102 obtains medical data of a plurality of modalities associated with the abnormal cells from a tissue slice. In one embodiment, the plurality of modalities includes at least a first modality and a second modality. The medical data of the first modality corresponds to image data related to the abnormal cells, and the medical data of the second modality corresponds to clinical information data of the patient to whom the abnormal cells belong. Here, an operator (e.g., a pathologist) first collects the patient's biopsy pathology picture, delineates the cancer region as an ROI, and at the same time acquires the corresponding clinical information. In one embodiment, the medical data of the first modality corresponds to image data of the abnormal cells at a plurality of zoom scales: the operator scales the WSI to 20×, 10× and 5× magnification, respectively. The pathological image data may also use another number of magnifications (e.g., 5 zoom scales) or other magnification values. For each zoom scale, a sliding window of 512 × 512 pixels is slid over the effective ROI area with a stride of 512, and the resulting small pictures (Patches) at the different magnifications are saved to disk for subsequent processing. Other Patch sizes may also be used for pathological image acquisition. In one embodiment, the abnormal-cell image data at each zoom scale includes a plurality of images. In addition, the table of clinical information is cleaned and quantified by category for the subsequent numerical computation. In the context of this document, a modality is one form of a patient's pathological information. As those skilled in the art will understand, the medical data of the plurality of modalities is not limited to the first and second modalities; data of other modalities, such as medical data of a gene sequencing modality or of a proteomics modality, may also be included. The ROI sizes at different magnifications are not necessarily consistent, and the ROI sizes of different patients also differ; accordingly, in one embodiment, the number of images of the abnormal cells may differ between zoom scales.
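As an illustration only (not the patent's code), the sliding-window patch extraction can be sketched as below. The sketch assumes the ROI at one magnification is already loaded as an RGB NumPy array; in practice a WSI reader such as OpenSlide would supply the pixel data, and `load_roi` is a hypothetical helper.

```python
# Minimal sketch of the 512 x 512 sliding-window patch extraction (stride 512).
# Assumes the ROI at a given magnification is an RGB NumPy array; `load_roi`
# below is a hypothetical helper standing in for a real WSI reader.
import numpy as np

def extract_patches(roi: np.ndarray, patch_size: int = 512, stride: int = 512):
    """Slide a patch_size x patch_size window over the ROI with the given
    stride; windows that would cross the ROI border are skipped."""
    h, w = roi.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(roi[y:y + patch_size, x:x + patch_size])
    return patches

# One patch list per zoom scale (20x, 10x, 5x); counts differ across scales
# and across patients, which is why a feature-selection mechanism is needed.
# patches_per_scale = {s: extract_patches(load_roi(s)) for s in (20, 10, 5)}
```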
In step 202, single-modality features of the medical data of each modality are extracted from the medical data of the plurality of modalities, respectively. In one embodiment, a first feature extraction network is used to extract the features of the first modality. In one embodiment, the first feature extraction network includes an image feature extractor pre-trained on a first data set. For the pathology image part, features are extracted from all Patches using EfficientNet-B0 pre-trained on the ImageNet data set as the image feature extractor. All features acquired from each WSI at the different magnifications are packaged together and saved to disk for subsequent training or prediction. The image feature extractor in this scheme may also use other convolutional neural networks (e.g., ResNet, DenseNet, Inception, VGG), and the network may be initialized by ImageNet pre-training, by random initialization, or by pre-training on other data sets. The subsequent algorithm sets up a separate network branch for each magnification. In one embodiment, the features of the second modality are extracted using a second feature extraction network for the second modality. The table-model feature extractor TabNet performs feature extraction on the patient's clinical information, generating an embedded feature representation of the clinical information. Other table models or fully connected neural networks may also be used for clinical information feature extraction.
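The two extractors might be set up as in the following hedged sketch. It uses timm's EfficientNet-B0 (with `num_classes=0` the model returns pooled 1280-dimensional features) for the image patches, and substitutes a small fully connected network for the table branch, a replacement the text explicitly permits; the count of 32 clinical fields and the 128-dimensional table embedding are illustrative assumptions.

```python
# Sketch of the two single-modality feature extractors (step 202).
# EfficientNet-B0 comes from the timm library; the fully connected table
# encoder is a stand-in for TabNet, as the text allows. The 32 clinical
# fields and the 128-d table embedding are illustrative assumptions.
import torch
import torch.nn as nn
import timm

image_encoder = timm.create_model("efficientnet_b0", pretrained=True,
                                  num_classes=0)  # pooled 1280-d features
image_encoder.eval()

table_encoder = nn.Sequential(
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 128),
)

with torch.no_grad():
    patches = torch.randn(16, 3, 512, 512)    # 16 patches at one magnification
    patch_feats = image_encoder(patches)      # -> (16, 1280)
    clinical_row = torch.randn(1, 32)         # one patient's cleaned table row
    table_feat = table_encoder(clinical_row)  # -> (1, 128)
```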
In the pathological image part, image features are extracted from images at a plurality of magnifications. Because the ROI dimensions differ across magnifications and across patients, the number of features extracted in this step varies greatly, so an effective feature-selection mechanism is needed to retain informative sample features. At the same time, the method uses data of several modalities and must effectively integrate the information of the different modalities to complete the prediction jointly. The multi-modality multiple-instance module solves both the feature-selection and the modality-integration problem. Accordingly, in step 203, cross-modality fusion features of the medical data of the plurality of modalities are obtained by fusing the extracted single-modality features. There is a variable number of image features at each magnification, so the multiple features at each magnification are first fused into a single feature by global average pooling. Then the table feature and the pooled image features of the different magnifications are connected, and a fully connected network layer performs feature fusion on them to obtain the cross-modality fusion features. In other words, weighted fusion is performed on the features of the plurality of abnormal-cell images at each zoom scale in the first modality to obtain a weighted fusion feature per scale, and the weighted fusion features of all scales in the first modality are spliced with the features of the second modality to obtain the cross-modality fusion features. Here, the weighted fusion may be realized as global average pooling over the features of the plurality of images at each zoom scale in the first modality.
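A minimal sketch of this fusion step, under the dimensions assumed in the previous snippet: the per-scale patch features are reduced by global average pooling, concatenated with the table feature, and passed through a fully connected layer to give the cross-modality fusion feature.

```python
# Sketch of step 203: global average pooling per scale, concatenation with
# the table feature, and a fully connected fusion layer. The dimensions
# (1280, 128, 256) follow the assumptions of the earlier snippets.
import torch
import torch.nn as nn

d_img, d_tab, d_global = 1280, 128, 256
fusion_layer = nn.Linear(3 * d_img + d_tab, d_global)

def cross_modal_fusion(feats_per_scale, table_feat):
    # feats_per_scale: one (N_i, d_img) tensor per zoom scale; N_i varies,
    # so average pooling collapses each bag of patch features to (d_img,)
    pooled = [f.mean(dim=0) for f in feats_per_scale]
    joint = torch.cat(pooled + [table_feat.squeeze(0)], dim=0)
    return fusion_layer(joint)  # F_global, shape (d_global,)
```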
In step 204, each single-modality feature is adjusted using the cross-modality fusion features as global features. In one embodiment, the adjustment process includes: performing local-global information fusion between the cross-modality fusion features, serving as global features, and the features of the images at each scale in the first modality, to obtain local-global fused features of the images at each scale; taking a weighted average of the local-global fused features of the images at each scale to obtain the adjusted features for each scale in the first modality; and adjusting the features of the second modality based on the cross-modality fusion features as global features to obtain the adjusted features of the second modality. In one embodiment, adjusting the features of the second modality comprises passing the multi-modality global features through a fully connected network, obtaining a scaling coefficient from a Sigmoid activation function, and using the coefficient to scale the features of the second modality. That is, in the network branch of the table features, after the cross-modality fusion features pass through a fully connected network layer, a Sigmoid activation function yields the scaling coefficient of the branch features, and the table features are scaled by it. This scaling embodies the guidance that the cross-modality data provides to the single-modality branch's learning process. In the network branch of the image features, the features at each magnification are processed separately to adjust the effective features: the cross-modality fusion features are spliced with the Patch features at each magnification, a fully connected network layer fuses the local and global features, the fused features are processed by a multilayer fully connected network followed by a Softmax activation function to obtain weights representing the relative importance of each Patch, and for each magnification the weights are multiplied back onto the Patch features, which are then summed to obtain the embedded feature representation of the pathological image at that magnification.
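The adjustment step could be sketched as below, again under the assumed dimensions: a Sigmoid gate computed from the fusion feature rescales the table branch, while in each image branch the fusion feature is concatenated to every Patch feature, fused by a fully connected layer, scored by Softmax attention, and the weighted Patch features are summed into one embedded feature per magnification. The hidden size 64 of the scoring network is an assumption.

```python
# Sketch of step 204: Sigmoid gating of the table branch and Softmax
# attention over Patch features in each image branch. Layer sizes are
# illustrative assumptions, not values taken from the patent.
import torch
import torch.nn as nn

class ModalityAdjuster(nn.Module):
    def __init__(self, d_img=1280, d_tab=128, d_global=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(d_global, d_tab), nn.Sigmoid())
        self.local_global = nn.Linear(d_img + d_global, d_img)
        self.score = nn.Sequential(nn.Linear(d_img, 64), nn.ReLU(),
                                   nn.Linear(64, 1))

    def forward(self, feats_per_scale, table_feat, f_global):
        # table branch: rescale by the Sigmoid gating coefficients
        adj_table = (table_feat * self.gate(f_global)).reshape(-1)
        adj_scales = []
        for f in feats_per_scale:                        # f: (N_i, d_img)
            g = f_global.unsqueeze(0).expand(f.size(0), -1)
            fused = torch.relu(self.local_global(torch.cat([f, g], dim=1)))
            w = torch.softmax(self.score(fused), dim=0)  # per-Patch weights
            adj_scales.append((w * f).sum(dim=0))        # weighted sum
        return adj_scales, adj_table
```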
In step 205, multi-modality features of the medical data of the plurality of modalities are derived from the adjusted single-modality features. In one embodiment, this comprises splicing the adjusted features of all scales in the first modality with the adjusted features of the second modality to obtain the multi-modality features of the medical data of the plurality of modalities.
In step 206, the degree of abnormal cell metastasis is predicted from the multi-modality features. Through the foregoing processing, the fused table embedded feature and the fused embedded feature of the image at each magnification have been obtained. In the classification output part, the table features and the image features are spliced, and a multilayer fully connected network serves as the classifier Gc (also called the main classifier) of the whole model to predict breast cancer lymph node metastasis, yielding the probability that metastasis occurs. In one embodiment, predicting the degree of abnormal cell metastasis from the multi-modality features comprises inputting the multi-modality features into a classifier that predicts the degree of metastasis; the classifier is trained in advance with the features of training-set samples as input and the samples' labels as supervision signals. In another embodiment, the classifier includes at least a main classifier and an auxiliary classifier. The auxiliary classifier is used because the multiple-instance fusion part of the network may fluctuate during training; to stabilize this process, based on the idea of deep supervision, another classifier Gt adds extra supervision information to the table-feature branch during training.
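As an illustration only (not the patent's code), the classification output could look like the following sketch; the layer sizes and two-class output follow the assumptions of the earlier snippets, and `gc`/`gt` are hypothetical names for the main and auxiliary classifiers.

```python
# Sketch of steps 205-206: concatenate the adjusted features and classify.
# Gc is the main classifier over the joint feature; Gt is the auxiliary
# classifier over the table branch, used for deep supervision in training.
import torch
import torch.nn as nn

d_img, d_tab = 1280, 128
gc = nn.Sequential(nn.Linear(3 * d_img + d_tab, 256), nn.ReLU(),
                   nn.Linear(256, 2))   # main classifier Gc
gt = nn.Linear(d_tab, 2)                # auxiliary classifier Gt

def predict_metastasis(adj_scales, adj_table):
    # adj_scales: list of (d_img,) tensors; adj_table: (d_tab,) tensor
    multi_modal = torch.cat(adj_scales + [adj_table], dim=0)
    probs = torch.softmax(gc(multi_modal), dim=-1)
    return probs[1]  # predicted probability that metastasis occurs
```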
The method fuses pathological information from the patient's different modalities, makes full use of the patient's multi-modality information resources, and improves the accuracy of predicting abnormal cell metastasis. As noted above, the scheme adopts end-to-end learning, and the clinical information used is easy to obtain and requires no additional collection. It combines multi-modality data analysis into a multiple-instance learning framework, uses the patient's clinical information together with the corresponding pathological images to predict lymph node metastasis at an early stage, and assists the physician in preoperative decision-making, thereby providing treatment solutions more accurately and improving the patient's quality of life. The multi-modality multiple-instance method takes the clinical gold standard for cancer diagnosis (the biopsy pathological section) together with the patient's clinical diagnostic information, applies deep learning to combine the two, predicts before surgery whether lymph node metastasis will occur, and effectively assists preoperative decision-making.
Fig. 3 shows a schematic block diagram of a method according to an embodiment of the invention. In this method, a biopsy pathology picture of the patient is first collected, an operator (e.g., a pathologist) delineates the cancer region as an ROI, and the other information is acquired at the same time.
Then the obtained multi-modality information is preprocessed. The operator scales the WSI of the ROI to different zoom sizes; here the zoom sizes are 20×, 10× and 5× magnification. Over the pictures 301, 302 and 303 at 20×, 10× and 5× magnification, a sliding window is slid within the effective ROI area with a fixed stride to obtain the small pictures 304, 305 and 306 at the different magnifications. For the different magnifications, the sliding window may be 512 (pixels) × 512 (pixels) with a stride of 512; the sliding window and stride may also be set to other values. The small pictures at the different magnifications are saved to disk for subsequent processing. The table 310 of clinical information is cleaned and quantified by category for the subsequent numerical computation.
Next, medical data of different modalities, i.e., pathological images and clinical information, are subjected to feature extraction, respectively. For medical data of the first modality (i.e., pathological image data), feature extraction is performed on all the small pictures here using the feature extraction network shown in fig. 3, resulting in extracted embedded features 307, 308, and 309 of the images at different magnifications. The feature extraction network herein may employ EfficientNet-B0 pre-trained with a first data set (e.g., ImageNet data set). All features acquired by each WSI at different magnifications are packaged together and saved to disk for subsequent training or prediction. For medical data of the second modality (i.e., clinical information data 310), the clinical information of the patient is feature extracted using a table feature extraction network (e.g., a TabNet network), generating an embedded feature representation 311 of the clinical information.
Finally, the features of the modalities are fused by the multi-modality multiple-instance module 312, which fuses the embedded features 307, 308 and 309 of the images and the embedded feature representation 311 of the clinical information, obtaining the fused table embedded feature 316 and the fused image embedded features 313, 314 and 315 at each magnification. The specific structure of the multi-modality multiple-instance module 312 is explained in more detail in FIG. 4. At 317, the fused image embedded features 313, 314 and 315 and the fused table embedded feature 316 are spliced. A classifier Gc 318 classifies the spliced features to predict the lymph node metastasis of the abnormal cells and obtain the probability of metastasis; the classifier 318 may employ a multilayer fully connected network. In the multiple-instance fusion part, the training process of the network may fluctuate. To stabilize this process, an auxiliary classifier 319 is used during training, based on the idea of deep supervision: the auxiliary classifier 319 classifies the fused table embedded feature 316 to add extra supervision information.
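During training, the auxiliary supervision described above could be combined as in the following sketch: the same label supervises both Gc and Gt. The auxiliary weight of 0.3 is an assumption, as the text does not specify how the two losses are balanced.

```python
# Sketch of deep supervision with the auxiliary classifier Gt: the label
# supervises both classifier outputs. The 0.3 weighting is assumed.
import torch
import torch.nn.functional as F

def training_loss(logits_gc: torch.Tensor, logits_gt: torch.Tensor,
                  label: torch.Tensor, aux_weight: float = 0.3):
    loss_main = F.cross_entropy(logits_gc, label)  # supervises Gc
    loss_aux = F.cross_entropy(logits_gt, label)   # stabilizes table branch
    return loss_main + aux_weight * loss_aux

# Example with a batch of one sample and binary labels:
# loss = training_loss(torch.randn(1, 2), torch.randn(1, 2), torch.tensor([1]))
```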
Fig. 4 shows the specific structure of the multi-modality multiple-instance module 312 in Fig. 3, taking images at 3 zoom scales as an example. The module first aggregates the embedded features 307, 308 and 309 of the images separately, since there may be a different number of image features at each magnification. The feature aggregation process is shown on the left side of Fig. 4: the multiple features 307, 308 and 309 at each magnification are fused into single features u1, u2 and u3 by global average pooling. Then the table feature ht is connected with the fused image features u1, u2 and u3 of the different magnifications. At 401, the connected features are fused using a fully connected network, yielding the cross-modality fusion feature Fglobal.
Next, each single-modality feature is adjusted using the cross-modality fusion feature as a global feature. In one embodiment, this comprises: performing local-global information fusion between the cross-modality fusion features, serving as global features, and the features of the images at each scale in the first modality, to obtain local-global fused features of the images at each scale; taking a weighted average of the local-global fused features of the images at each scale to obtain the adjusted features for each scale in the first modality; and adjusting the features of the second modality based on the cross-modality fusion features as global features to obtain the adjusted features of the second modality.
In another embodiment, adjusting the features of the second modality comprises passing the multi-modality global features through a fully connected network, obtaining a scaling coefficient from a Sigmoid activation function, and using the coefficient to scale the features of the second modality into the adjusted features of the second modality.
In the network branch of the table features, after the cross-modality fusion feature Fglobal passes through a fully connected network layer at 403, a Sigmoid activation function yields the scaling coefficient of the branch, and the table feature is scaled by this coefficient. The scaling embodies the guidance that the cross-modality data provides to the single-modality branch's learning process.
In the image-feature network branch, the features at each magnification are processed separately to screen out the effective features. The obtained cross-modality fusion feature is divided into as many sub-features o1, o2 and o3 as there are zoom scales, and o1, o2 and o3 are spliced with the Patch features at the corresponding magnification. Then a fully connected network fuses the local and global features; the fused features are processed by a multilayer fully connected network followed by a Softmax activation function, yielding weights that represent the relative importance of each Patch. For each zoom scale, the weights are multiplied back onto the Patch features, which are summed to obtain the embedded feature representation of the pathological image at that magnification.
The method thus obtains a weight representing the relative importance of each Patch at each magnification. Compared with a traditional image-only single-modality algorithm, the method judges the importance of the Patches better. On this basis, the weights can be associated with the interpretation of the corresponding positions, the influence of Patches at different positions on the result can be visualized, and the doctor can be further assisted in making preoperative decisions.
After training is completed, the trained model weights are loaded in the testing stage, the above processes are repeated, and the likelihood of lymph node metastasis is predicted.
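A minimal sketch of this testing stage, assuming the full pipeline has been assembled into a single `nn.Module` and saved; the model interface and checkpoint path are illustrative placeholders, not names from the patent.

```python
# Sketch of the test stage: load trained weights, run the pipeline once,
# and return the predicted probability of lymph node metastasis.
import torch
import torch.nn as nn

def load_and_predict(model: nn.Module, ckpt_path: str,
                     patches_per_scale, clinical_row) -> float:
    model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
    model.eval()
    with torch.no_grad():
        logits = model(patches_per_scale, clinical_row)  # assumed shape (2,)
        return torch.softmax(logits, dim=-1)[1].item()
```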
The multi-modality fusion method in this scheme can be replaced by fully connected network fusion that yields a multi-modality representation, or by a weighted sum of single-modality prediction probabilities. The task in this scheme is breast cancer lymph node metastasis prediction; it can be replaced by lymph node metastasis prediction or classification tasks for other cancer types.
The method of the present invention and other methods are evaluated in comparison below.
In this comparative evaluation, the method of the present invention and six other methods were compared, respectively, and the six methods were:
(1) A single-modality method based on the table modality. In this method, only the table modality characterizing the patient's clinical information is used as the input for classification. The network uses a TabNet feature extraction network to extract the features of the table modality and classifies directly on those features to obtain the classification result.
(2) A single-modality method based on the image modality. In this method, only the 5× image modality of the WSI characterizing the patient's abnormal cells is used as the input for classification. The network uses EfficientNet-B0 to extract the features of the 5× image modality and classifies directly on those features to obtain the classification result.
(3) A single-modality method based on the image modality. In this method, only the 10× image modality of the WSI characterizing the patient's abnormal cells is used as the input for classification. The network uses EfficientNet-B0 to extract the features of the 10× image modality and classifies directly on those features to obtain the classification result.
(4) A single-modality method based on the image modality. In this method, only the 20× image modality of the WSI characterizing the patient's abnormal cells is used as the input for classification. The network uses EfficientNet-B0 to extract the features of the 20× image modality and classifies directly on those features to obtain the classification result.
(5) A multi-modality method based on image modalities. In this method, the 5×, 10× and 20× image modalities of the WSI characterizing the patient's abnormal cells are used as the inputs for classification. The network uses EfficientNet-B0 to extract the features of the three image modalities, fuses the three sets of features into one feature representation through a neural network, and classifies directly on that representation to obtain the classification result. Like the methods above, this method exemplifies the prior-art approach that classifies according to one kind of pathological information only: it does not combine the overall picture given by the other kinds of pathological information (such as clinical information), so the available information resources are not fully utilized.
(6) A simple multi-modality method. In this method, the 5×, 10× and 20× image modalities of the WSI characterizing the patient's abnormal cells and the table modality characterizing the patient's clinical information are used as the inputs for classification. The network uses EfficientNet-B0 to extract the features of the three image modalities and a TabNet feature extraction network to extract the features of the table modality. The features of the four modalities are fused into one feature representation through a fully connected network, and classification is performed directly on that representation. This method exemplifies the prior-art simple multi-modality network whose classification logic and network structure are too coarse and simple to adequately accommodate multiple classes of data.
The results of the comparative evaluation are shown in Table 1, where AUC, F1-score, precision and recall are indices commonly used in the prior art to evaluate classification performance. Precision, denoted Pr, is the fraction of the samples predicted positive that are truly positive. For example, with 10 positive cases and 5 negative cases, if 8 of the 10 positives are predicted correctly and 2 of the 5 negatives are predicted positive, then Pr = 8/(8+2) = 0.8. Recall, denoted R, is the fraction of all positive cases that are predicted correctly. A composite value, denoted F1, is derived from the precision Pr and the recall R; for example, it may be the harmonic mean of the two, as shown in the following equation:
F1 = 2 × Precision × Recall / (Precision + Recall)    (1)
The greater the precision, recall or F1 value, the better the corresponding model performs. In general, the F1 value represents the classification effect best, because it integrates the other two evaluation indices.
AUC (area under the receiver operating characteristic curve) is also used as an evaluation index to measure the effect of each scheme. AUC measures the probability that, given a randomly drawn positive sample (y = 1) and a randomly drawn negative sample (y = 0), the classifier scores the positive sample higher than the negative sample. The larger a classifier's AUC, the higher its accuracy.
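These metrics can be checked with a short scikit-learn sketch that reproduces the worked example above (8 of 10 positives predicted correctly, 2 of 5 negatives predicted positive); the data are the example's, not results from the patent.

```python
# Worked check of precision, recall and F1 on the text's example; AUC would
# normally be computed from predicted probabilities rather than hard labels.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1] * 10 + [0] * 5
y_pred = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 3  # 8 TP, 2 FN, 2 FP, 3 TN

print(precision_score(y_true, y_pred))  # 8 / (8 + 2) = 0.8
print(recall_score(y_true, y_pred))     # 8 / 10      = 0.8
print(f1_score(y_true, y_pred))         # 2*0.8*0.8 / (0.8 + 0.8) = 0.8
```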
Table 1. Comparative evaluation results (AUC, F1-score, precision and recall for each method; the table is reproduced as an image in the original publication).
By comparison, the AUC, F1-score and precision of the invention are clearly higher than those of any other method, and the recall also performs well. The method therefore improves classification accuracy.
Fig. 5 shows heat maps of images processed by the method of the present invention and by an image-only single-modality method, respectively. Block 501 in Fig. 5 is the heat map of the 5× image processed with the single-modality method based on the a-MIL model; block 502 is the heat map of the 10× image processed with the same method; block 503 is the heat map of the 20× image processed with the same method. Block 504 is the heat map of the 5× image processed with the method of the invention; block 505 is the heat map of the 10× image; block 506 is the heat map of the 20× image. Different gray levels in Fig. 5 correspond to different weight coefficients. The colors in blocks 501-503 are all quite close, so different image patches cannot be well distinguished. Blocks 504-506 distinguish the patches better and can filter out uninformative patches. Therefore, compared with a single-modality method at the same magnification, the method of the invention integrates the information of each modality and distinguishes the degree of abnormal cell metastasis more accurately.
Fig. 6 shows an apparatus 600 for predicting the degree of abnormal cell metastasis. The apparatus 600 comprises: the system comprises an acquisition module 601, a single-modal feature extraction module 602, a cross-modal fusion module 603, an adjustment module 604, a multi-modal feature module 605 and a prediction module 606. The acquisition module 601 is configured to acquire medical data of a plurality of modalities associated with abnormal cells. The single modality feature extraction module 602 is configured to extract single modality features of the medical data of each modality from the medical data of the plurality of modalities, respectively. The cross-modality fusion module 603 is configured to obtain cross-modality fusion features of the medical data of the plurality of modalities by fusing the extracted individual single-modality features. The adjustment module 604 is configured to adjust each of the single-modality features using the cross-modality fusion feature as a global feature. The multi-modal features module 605 is configured to derive multi-modal features of the medical data of the plurality of modalities based on the adjusted individual single-modal features. The prediction module 606 is configured to predict the degree of abnormal cell migration based on the multi-modal features.
The apparatus integrates pathological information from a patient's different modalities, makes full use of the patient's multi-modal information resources, and improves classification accuracy. In particular, it adopts an end-to-end learning approach, and the clinical information it uses is easy to obtain and requires no additional collection. By combining multi-modal data analysis with a multiple-instance learning framework, the apparatus uses a patient's clinical information and the corresponding pathological images to predict early lymph node metastasis, assisting the physician in preoperative decision-making so that a treatment plan can be proposed more accurately and the patient's quality of life improved. The multi-modal multiple-instance method takes the clinical gold standard of cancer diagnosis (biopsy pathological sections), combines it with the patient's clinical diagnostic information by means of deep learning, and predicts before surgery whether lymph node metastasis will occur, effectively supporting preoperative decisions.
Fig. 7 illustrates an example system 700 that includes an example computing device 710 that represents one or more systems and/or devices that can implement the various techniques described herein. Computing device 710 may be, for example, a server of a service provider, a device associated with a server, a system on a chip, and/or any other suitable computing device or computing system. The apparatus 600 for predicting the degree of abnormal cell metastasis described above with reference to fig. 6 may take the form of a computing device 710. Alternatively, the apparatus 600 for predicting the degree of abnormal cell metastasis may be implemented as a computer program in the form of an application 716.
The example computing device 710 as illustrated includes a processing system 711, one or more computer-readable media 712, and one or more I/O interfaces 713 communicatively coupled to each other. Although not shown, the computing device 710 may also include a system bus or other data and command transfer system that couples the various components to one another. A system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. Various other examples are also contemplated, such as control and data lines.
The processing system 711 represents functionality to perform one or more operations using hardware. Thus, the processing system 711 is illustrated as including hardware elements 714 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application specific integrated circuit or other logic device formed using one or more semiconductors. Hardware element 714 is not limited by the material from which it is formed or the processing mechanism employed therein. For example, a processor may be comprised of semiconductor(s) and/or transistors (e.g., electronic Integrated Circuits (ICs)). In such a context, processor-executable instructions may be electronically-executable instructions.
The computer-readable medium 712 is illustrated as including a memory/storage 715. Memory/storage 715 represents memory/storage capacity associated with one or more computer-readable media. Memory/storage 715 may include volatile media (such as Random Access Memory (RAM)) and/or nonvolatile media (such as Read Only Memory (ROM), flash memory, optical disks, magnetic disks, and so forth). Memory/storage 715 may include fixed media (e.g., RAM, ROM, a fixed hard drive, etc.) as well as removable media (e.g., flash memory, a removable hard drive, an optical disk, and so forth). The computer-readable medium 712 may be configured in various other ways as further described below.
One or more I/O interfaces 713 represent functionality that allows a user to enter commands and information to computing device 710 using various input devices and optionally also allows information to be presented to the user and/or other components or devices using various output devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone (e.g., for voice input), a scanner, touch functionality (e.g., capacitive or other sensors configured to detect physical touch), a camera (e.g., motion that may not involve touch may be detected as gestures using visible or invisible wavelengths such as infrared frequencies), and so forth. Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, a haptic response device, and so forth. Accordingly, the computing device 710 may be configured in various ways to support user interaction, as described further below.
Computing device 710 also includes application 716. Application 716 may be, for example, a software instance of apparatus 600 that predicts the extent of abnormal cell metastasis, and implements the techniques described herein in combination with other elements in computing device 710.
Various techniques may be described herein in the general context of software, hardware elements, or program modules. Generally, these modules include routines, programs, objects, elements, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The terms "module," "functionality," and "component" as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.
An implementation of the described modules and techniques may be stored on or transmitted across some form of computer readable media. Computer readable media can include a variety of media that can be accessed by computing device 710. By way of example, and not limitation, computer-readable media may comprise "computer-readable storage media" and "computer-readable signal media".
"computer-readable storage medium" refers to a medium and/or device, and/or a tangible storage apparatus, capable of persistently storing information, as opposed to mere signal transmission, carrier wave, or signal per se. Accordingly, computer-readable storage media refers to non-signal bearing media. Computer-readable storage media include hardware such as volatile and nonvolatile, removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits or other data. Examples of computer readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage devices, tangible media, or an article of manufacture suitable for storing the desired information and accessible by a computer.
"computer-readable signal medium" refers to a signal-bearing medium configured to transmit instructions to the hardware of computing device 710, such as via a network. Signal media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave, data signal or other transport mechanism. Signal media also includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
As before, hardware element 714 and computer-readable medium 712 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware form that may be used in some embodiments to implement at least some aspects of the techniques described herein. The hardware elements may include integrated circuits or systems-on-chips, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Complex Programmable Logic Devices (CPLDs), and other implementations in silicon or components of other hardware devices. In this context, a hardware element may serve as a processing device that performs program tasks defined by instructions, modules, and/or logic embodied by the hardware element, as well as a hardware device for storing instructions for execution, such as the computer-readable storage medium described previously.
Combinations of the foregoing may also be used to implement the various techniques and modules described herein. Thus, software, hardware, and program modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage medium and/or by one or more hardware elements 714. The computing device 710 may be configured to implement particular instructions and/or functions corresponding to software and/or hardware modules. Thus, a module executable by the computing device 710 as software may be implemented at least partially in hardware, for example, by using computer-readable storage media of the processing system and/or hardware elements 714. The instructions and/or functions may be executable/operable by one or more articles of manufacture (e.g., one or more computing devices 710 and/or processing systems 711) to implement the techniques, modules, and examples described herein.
In various implementations, the computing device 710 may assume a variety of different configurations. For example, the computing device 710 may be implemented as a computer-class device, including a personal computer, a desktop computer, a multi-screen computer, a laptop computer, a netbook, and so forth. The computing device 710 may also be implemented as a mobile-class device, including mobile devices such as mobile phones, portable music players, portable gaming devices, tablet computers, multi-screen computers, and the like. The computing device 710 may also be implemented as a television-class device, that is, a device having or connected to a generally larger screen in a casual viewing environment, such as a television, a set-top box, or a game console.
The techniques described herein may be supported by these various configurations of computing device 710 and are not limited to specific examples of the techniques described herein. Functionality may also be implemented in whole or in part on "cloud" 720 through the use of a distributed system, such as through platform 722 as described below.
Cloud 720 includes and/or is representative of platform 722 for resources 724. The platform 722 abstracts underlying functionality of hardware (e.g., servers) and software resources of the cloud 720. The resources 724 may include applications and/or data that may be used when executing computer processes on servers remote from the computing device 710. Resources 724 may also include services provided over the internet and/or over a subscriber network such as a cellular or Wi-Fi network.
Platform 722 may abstract resources and functionality to connect computing device 710 with other computing devices. The platform 722 may also serve to abstract the scaling of resources, providing a level of scale that corresponds to the demand encountered for the resources 724 implemented via the platform 722. Accordingly, in interconnected device embodiments, implementation of the functionality described herein may be distributed throughout the system 700. For example, the functionality may be implemented in part on the computing device 710 and in part through the platform 722 that abstracts the functionality of the cloud 720.
It should be appreciated that for clarity, embodiments of the application have been described with reference to different functional units. However, it will be apparent that the functionality of each functional unit may be implemented in a single unit, in a plurality of units or as part of other functional units without detracting from the application. For example, functionality illustrated to be performed by a single unit may be performed by a plurality of different units. Thus, references to specific functional units are only to be seen as references to suitable units for providing the described functionality rather than indicative of a strict logical or physical structure or organization. Thus, the present application may be implemented in a single unit or may be physically and functionally distributed between different units and circuits.
Although the present application has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present application is limited only by the accompanying claims. Additionally, although individual features may be included in different claims, these may advantageously be combined, and their inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. The order of features in the claims does not imply any specific order in which the features must be performed. Furthermore, in the claims, the word "comprising" does not exclude other elements, and the terms "a" or "an" do not exclude a plurality. Reference signs in the claims are provided merely as clarifying examples and shall not be construed as limiting the scope of the claims in any way.

Claims (15)

1. A method of predicting the extent of abnormal cell metastasis based on medical data, comprising:
acquiring medical data of a plurality of modalities related to abnormal cells;
extracting a single-modality feature of the medical data of each modality from the medical data of the plurality of modalities respectively;
obtaining cross-modal fusion features of the medical data of the plurality of modalities by fusing the extracted single-modal features;
adjusting each single-modality feature using the cross-modality fusion features as global features;
obtaining multi-modal features of the medical data of the plurality of modalities based on the adjusted individual single-modality features;
and predicting the degree of the abnormal cell metastasis according to the multi-modal features.
2. The method of claim 1, wherein the plurality of modalities includes at least a first modality and a second modality, medical data of the first modality corresponding to image data of abnormal cells, medical data of the second modality corresponding to clinical information data of a patient to which the abnormal cells belong,
wherein the medical data of the first modality corresponds to image data of abnormal cells at a plurality of zoom scales, the image of abnormal cells at each of the plurality of zoom scales comprising a plurality of images.
3. The method of claim 2, wherein the extracting the single-modality feature of the medical data of each modality from the medical data of the plurality of modalities, respectively, comprises:
extracting features of a first modality using a first feature extraction network for the first modality; and
features of the second modality are extracted using a second feature extraction network for the second modality.
4. The method of claim 2, wherein the numbers of images of the abnormal cells at the respective zoom scales differ from one another.
5. The method of claim 3, wherein the first feature extraction network comprises an image feature extractor pre-trained based on a first data set and the second feature extraction network comprises a table feature extractor pre-trained based on a second data set.
6. The method according to claim 3, wherein the obtaining cross-modality fusion features of the medical data of the plurality of modalities by fusing the extracted individual single-modality features comprises:
performing weighted fusion on the features of the plurality of images of the abnormal cells at each zoom scale in the first modality to obtain a weighted fusion feature of the abnormal cells at each scale; and
concatenating the weighted fusion features of all scales in the first modality with the features of the second modality to obtain the cross-modality fusion features.
7. The method of claim 6, wherein the performing weighted fusion on the features of the plurality of images of the abnormal cells at each zoom scale in the first modality to obtain a weighted fusion feature of the abnormal cells at each scale comprises:
performing average pooling on the features of the plurality of images of the abnormal cells at each zoom scale in the first modality to obtain the weighted fusion feature of the abnormal cells at each scale.
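Read together, claims 6 and 7 amount to mean-pooling the patch features at each zoom scale and concatenating the pooled per-scale features with the second-modality features. A minimal sketch, with assumed tensor shapes:

```python
# Hedged sketch of claims 6-7 (shapes are assumptions): per-scale average
# pooling of patch features, then concatenation with the clinical feature.
import torch

def cross_modal_fusion(scale_feats, clinical_feat):
    # scale_feats: list of (n_i, d) patch-feature tensors, one per zoom scale
    pooled = [f.mean(dim=0) for f in scale_feats]        # claim 7: average pooling
    return torch.cat(pooled + [clinical_feat], dim=-1)   # claim 6: concatenation

fused = cross_modal_fusion(
    [torch.randn(40, 512), torch.randn(80, 512), torch.randn(160, 512)],
    torch.randn(32),
)
print(fused.shape)  # torch.Size([1568])
```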
8. The method according to claim 2, wherein the adjusting individual single-modality features using the cross-modality fusion features as global features comprises:
performing local-global information fusion on the cross-modality fusion features, serving as global features, and the features of the images of each scale in the first modality, to obtain local-global information fusion features of the images of each scale in the first modality;
performing a weighted average on the local-global information fusion features of the images of each scale in the first modality to obtain the adjusted features of each scale in the first modality; and
adjusting the features of the second modality based on the cross-modality fusion features as global features to obtain the adjusted features in the second modality.
9. The method according to claim 8, wherein the adjusting the features of the second modality based on the cross-modality fusion features as global features to obtain the adjusted features in the second modality comprises:
passing the multi-modal global features through a fully connected network, and adjusting the features of the second modality with scaling coefficients obtained from a Sigmoid activation function to obtain the adjusted features in the second modality.
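Claim 9 describes a gating step reminiscent of squeeze-and-excitation: a fully connected network maps the multi-modal global feature to Sigmoid-activated scaling coefficients that rescale the second-modality features. A hedged sketch with illustrative dimensions:

```python
# Illustrative gating of the clinical (second-modality) feature per claim 9;
# the class name and dimensions are assumptions, not the patent's design.
import torch
import torch.nn as nn

class ClinicalGate(nn.Module):
    def __init__(self, global_dim: int, clin_dim: int):
        super().__init__()
        self.fc = nn.Linear(global_dim, clin_dim)  # fully connected network

    def forward(self, global_feat, clin_feat):
        scale = torch.sigmoid(self.fc(global_feat))  # scaling coefficients in (0, 1)
        return clin_feat * scale                     # adjusted second-modality feature

gate = ClinicalGate(global_dim=1568, clin_dim=32)
adjusted = gate(torch.randn(1568), torch.randn(32))
```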
10. The method of claim 8, wherein the deriving multi-modal features of the medical data of the plurality of modalities based on the adjusted individual single-modality features comprises:
concatenating the adjusted features of each scale in the first modality with the features of the second modality to obtain the multi-modal features of the medical data of the plurality of modalities.
11. The method of claim 1, wherein the predicting the degree of abnormal cell metastasis based on the multi-modal features comprises:
inputting the multi-modal features into a classifier to predict the degree of abnormal cell metastasis;
wherein the classifier has been trained in advance with the features of samples in a training set as inputs and the labels of those samples as supervision signals.
12. The method of any one of claims 1-11, wherein the acquiring medical data of a plurality of modalities associated with abnormal cells further comprises:
determining a region of interest in a tissue slice; and
acquiring images of the region of interest through a sliding window based on the region of interest in the tissue slice.
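Claim 12's sliding-window acquisition can be sketched as follows; the window size, stride, and the use of a plain NumPy array for the region of interest are assumptions made for illustration.

```python
# Minimal sliding-window patch extraction over a region of interest
# (window/stride values assumed; roi is a hypothetical H x W x 3 crop).
import numpy as np

def extract_patches(roi: np.ndarray, window: int = 256, stride: int = 256):
    """Yield square patches covering the region of interest in a tissue slice."""
    h, w = roi.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            yield roi[y:y + window, x:x + window]

roi = np.zeros((1024, 1024, 3), dtype=np.uint8)
patches = list(extract_patches(roi))
print(len(patches))  # 16 non-overlapping 256x256 windows
```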
13. An apparatus for predicting the degree of abnormal cell metastasis based on medical data, comprising:
an acquisition module configured to acquire medical data of a plurality of modalities related to abnormal cells;
a single modality feature extraction module configured to extract a single modality feature of medical data of each modality from the medical data of the plurality of modalities, respectively;
a cross-modality fusion module configured to obtain cross-modality fusion features of the medical data of the plurality of modalities by fusing the extracted individual single-modality features;
an adjustment module configured to adjust each of the single-modality features using the cross-modality fusion features as global features;
a multi-modal features module configured to derive multi-modal features of the medical data of the plurality of modalities based on the adjusted individual single-modal features;
a prediction module configured to predict the degree of abnormal cell metastasis based on the multi-modal features.
14. A computer device, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1-12.
15. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1-12.
CN202110239635.7A 2021-03-04 2021-03-04 Method, device and storage medium for predicting degree of abnormal cell metastasis Pending CN113705595A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110239635.7A CN113705595A (en) 2021-03-04 2021-03-04 Method, device and storage medium for predicting degree of abnormal cell metastasis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110239635.7A CN113705595A (en) 2021-03-04 2021-03-04 Method, device and storage medium for predicting degree of abnormal cell metastasis

Publications (1)

Publication Number Publication Date
CN113705595A (en) 2021-11-26

Family

ID=78647850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110239635.7A Pending CN113705595A (en) 2021-03-04 2021-03-04 Method, device and storage medium for predicting degree of abnormal cell metastasis

Country Status (1)

Country Link
CN (1) CN113705595A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926716A (en) * 2022-04-08 2022-08-19 山东师范大学 Learning participation degree identification method, device and equipment and readable storage medium
CN115984622A (en) * 2023-01-10 2023-04-18 深圳大学 Classification method based on multi-mode and multi-example learning, prediction method and related device
CN115984622B (en) * 2023-01-10 2023-12-29 深圳大学 Multi-mode and multi-example learning classification method, prediction method and related device
CN118429734A (en) * 2024-07-05 2024-08-02 之江实验室 Magnetic resonance data classification system

Similar Documents

Publication Publication Date Title
WO2020215984A1 (en) Medical image detection method based on deep learning, and related device
CN110599476B (en) Disease grading method, device, equipment and medium based on machine learning
KR101857624B1 (en) Medical diagnosis method applied clinical information and apparatus using the same
CN111462036A (en) Pathological image processing method based on deep learning, model training method and device
CN113705595A (en) Method, device and storage medium for predicting degree of abnormal cell metastasis
CN110796199B (en) Image processing method and device and electronic medical equipment
CN110246109B (en) Analysis system, method, device and medium fusing CT image and personalized information
Zewdie et al. Classification of breast cancer types, sub-types and grade from histopathological images using deep learning technique
US20210145389A1 (en) Standardizing breast density assessments
US11721023B1 (en) Distinguishing a disease state from a non-disease state in an image
CN116740435A (en) Breast cancer ultrasonic image classifying method based on multi-mode deep learning image group science
CN114445356A (en) Multi-resolution-based full-field pathological section image tumor rapid positioning method
Yang et al. AX-Unet: A deep learning framework for image segmentation to assist pancreatic tumor diagnosis
RU2732895C1 (en) Method for isolating and classifying blood cell types using deep convolution neural networks
CN113724185B (en) Model processing method, device and storage medium for image classification
CN114078137A (en) Colposcope image screening method and device based on deep learning and electronic equipment
CN112801940A (en) Model evaluation method, device, equipment and medium
CN116645326A (en) Glandular cell detection method, glandular cell detection system, electronic equipment and storage medium
CN110853012B (en) Method, apparatus and computer storage medium for obtaining cardiac parameters
Vinta et al. Segmentation and Classification of Interstitial Lung Diseases Based on Hybrid Deep Learning Network Model
Jabbar et al. Liver fibrosis processing, multiclassification, and diagnosis based on hybrid machine learning approaches
Roni et al. Deep convolutional comparison architecture for breast cancer binary classification
Kalbhor et al. DeepCerviCancer-deep learning-based cervical image classification using colposcopy and cytology images
CN113077440A (en) Pathological image processing method and device, computer equipment and storage medium
CN112086174A (en) Three-dimensional knowledge diagnosis model construction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination