CN114864076A - Multi-modal breast cancer classification training method and system based on graph attention network - Google Patents
Multi-modal breast cancer classification training method and system based on graph attention network
- Publication number
- CN114864076A CN114864076A CN202210489883.1A CN202210489883A CN114864076A CN 114864076 A CN114864076 A CN 114864076A CN 202210489883 A CN202210489883 A CN 202210489883A CN 114864076 A CN114864076 A CN 114864076A
- Authority
- CN
- China
- Prior art keywords
- features
- patient
- pathological
- pathological image
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G16H50/20—ICT specially adapted for medical diagnosis; for computer-aided diagnosis, e.g. based on medical expert systems
- G06F18/2415—Classification techniques based on parametric or probabilistic models
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/253—Fusion techniques of extracted features
- G06F40/279—Recognition of textual entities
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/048—Activation functions
- G06T7/0012—Biomedical image inspection
- G06V10/82—Image or video recognition or understanding using neural networks
- G16H10/60—ICT for patient-specific data, e.g. electronic patient records
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/30068—Mammography; Breast
- G06T2207/30096—Tumor; Lesion
- G06V2201/03—Recognition of patterns in medical or anatomical images
Abstract
The invention discloses a multi-modal breast cancer classification training method and system based on a graph attention network. The method comprises the following steps: first, pathological features are extracted from the electronic medical record and described in text to form a medical record text, and text features are obtained with a pre-trained language model; meanwhile, high-order features are extracted from the patient's pathological image set with a graph attention network; the image, text, and pathological features are then fused through a multi-modal adaptive gating unit to obtain the patient's multi-modal fusion feature; finally, the fused multi-modal feature is fed into a multi-layer perceptron for classification prediction, and the model is trained by defining a cross-entropy loss function. The method classifies breast cancer by integrating features of the three modalities of image, text, and pathology; the proposed network structure significantly outperforms single-modality methods and improves breast cancer classification accuracy.
Description
Technical Field
The invention belongs to the field of deep learning and disease classification, and particularly relates to a multi-modal breast cancer classification training method and system based on a graph attention network.
Background
Breast cancer is one of the most serious diseases threatening human life and health and is a medical concern worldwide. According to data published in 2020 by the International Agency for Research on Cancer (IARC) of the World Health Organization (WHO), new breast cancer cases reached 2.26 million, exceeding the 2.2 million new cases of lung cancer; breast cancer has thus replaced lung cancer as the most common cancer in the world. Breast cancer can develop in both men and women, but more than 98% of breast cancer patients are women. Its incidence ranks among the highest worldwide, and the yearly rise in incidence and the trend toward younger onset have severely affected women's health. In clinical medicine, compared with X-ray, magnetic resonance, and other imaging, pathological images remain the gold standard for breast cancer diagnosis. Early recognition of benign and malignant classes in breast tumor pathological images is important for clinicians formulating personalized treatment plans. Traditional manual classification of breast cancer pathological images extracts features by hand and completes classification with classifiers such as support vector machines and random forests; such methods demand extensive professional knowledge, feature extraction is time-consuming, and high-quality features are difficult to obtain.
At present, manual classification of breast cancer pathological images is complex and labor-intensive; results are easily influenced by the subjective factors of pathologists, and the generalization ability of such classification in practical application is poor. In recent years, deep learning methods have shown growing advantages in various medical image analysis tasks. Compared with manual pathological image classification, they reduce the demand for professional knowledge, can continuously learn image features and classify pathological images as benign or malignant, improve diagnosis efficiency, and provide doctors with more objective and accurate diagnosis results. However, these methods still have shortcomings: (1) a patient may have multiple pathological images of various parts of the breast, with interactions between the images; using a single pathological image discards these interactions. (2) Existing research mostly feeds pathological images into a convolutional neural network, but classifying breast cancer as benign or malignant from single-modality image data alone cannot satisfy the requirements of clinical diagnosis. (3) Data of different modalities are correlated, and simple fusion methods cannot fully exploit the complementarity between modalities.
Disclosure of Invention
Purpose of the invention: in view of the problems in the prior art, the object of the present invention is to provide a multi-modal breast cancer classification training method and system based on a graph attention network, which comprehensively considers the patient's pathological features, pathological text description features, and the features of multiple pathological images, and adaptively fuses them while accounting for the relationships among the modal features, so as to improve the accuracy of breast cancer classification.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme. A breast cancer classification training method based on a graph attention network comprises the following steps:
step 1, extracting representative pathological features from the patient's electronic medical record (EMR), numericizing them, and describing them in text to form a medical record text;
step 2, extracting features from each pathological image of the patient to obtain node-level pathological image features, forming the patient's pathological image set into a fully connected graph with the node-level features as initial features, obtaining high-order node features with a graph attention network, and average-pooling the node-level and high-order features respectively before concatenating them into the patient's final pathological image feature;
step 3, extracting the patient's diagnosis text features from the medical record text formed from the EMR with a pre-trained language model;
step 4, fusing the patient's pathological image features, text features, and pathological features through a multi-modal adaptive gating unit; the adaptive gating unit fuses the three modal features with an attention gate and performs a weighted summation of the fused feature and the pathological image feature to obtain the final multi-modal fusion feature;
and step 5, performing classification prediction on the multi-modal fusion feature through a multi-layer perceptron, and training the model by defining a cross-entropy loss function.
Further, step 1 extracts representative pathological features from the patient's electronic medical record, including age, sex, disease course type, individual tumor history, pectoral adhesion, family tumor history, orange-peel appearance, previous treatment, breast deformation, neoadjuvant chemotherapy, dimple sign, skin redness and swelling, skin ulcer, tumor, axillary lymph node enlargement, nipple change, nipple discharge, lymph node enlargement, tumor location, tenderness, number of tumors, tumor size, tumor texture, tumor boundary, surface smoothness, tumor morphology, mobility, envelope, skin adhesion, and diagnosis. Each feature is first numericized; the extracted features are then described in text according to clinical medical rules to obtain the patient's medical record text.
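As a non-authoritative sketch of step 1, the numericization and rule-based text description could look like the following; the feature subset, value codes, and sentence templates are illustrative assumptions, not the patent's actual coding scheme.

```python
# Sketch of step 1: numericize EMR features and render a rule-based
# medical record text. The feature subset, value codes, and templates
# below are illustrative assumptions, not the patent's actual scheme.

def numericize(record: dict) -> list:
    """Map categorical EMR fields to numeric codes (hypothetical coding)."""
    coding = {
        "sex": {"female": 0, "male": 1},
        "skin_ulcer": {"no": 0, "yes": 1},
        "nipple_discharge": {"no": 0, "yes": 1},
    }
    vec = [record["age"]]
    for field in ("sex", "skin_ulcer", "nipple_discharge"):
        vec.append(coding[field][record[field]])
    return vec

def describe(record: dict) -> str:
    """Render the fields as medical record text by simple clinical rules."""
    parts = ["The patient is a %d-year-old %s." % (record["age"], record["sex"])]
    if record["skin_ulcer"] == "yes":
        parts.append("Skin ulceration is present.")
    if record["nipple_discharge"] == "yes":
        parts.append("Nipple discharge is observed.")
    return " ".join(parts)

record = {"age": 52, "sex": "female", "skin_ulcer": "no", "nipple_discharge": "yes"}
print(numericize(record))       # [52, 0, 0, 1]
print(describe(record))
```

The numeric vector feeds the pathological-feature branch, while the generated sentence plays the role of the medical record text handed to the language model.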
Further, the specific process of acquiring the pathological image features of the patient in step 2 includes:
step 2-1, suppose a breast cancer patient has k pathological images; the pathological image set is denoted X = {x_i | i = 1, 2, 3, ..., k}, x_i ∈ R^P, where P is the dimension of each image. Node-level pathological image features V = {v_i | i = 1, 2, 3, ..., k}, v_i ∈ R^F, are obtained through a DenseNet model, where F is the dimension of each image's node-level feature;
step 2-2, forming the patient's pathological image set into a fully connected graph to capture the correlation between the pathological images; each vertex of the graph is a pathological image, whose initial feature is the node-level feature obtained in step 2-1;
step 2-3, extracting high-order features of the patient's images with a graph attention network (GAT), wherein the node-level features V = {v_i | i = 1, 2, 3, ..., k}, v_i ∈ R^F, serve as the input of the GAT, and the final high-order node features V' = {v'_i | i = 1, 2, 3, ..., k}, v'_i ∈ R^{F'}, are obtained through a multi-layer GAT model. The detailed process is as follows:
First, the attention coefficient e_ij of node j's features with respect to node i is calculated:
e_ij = LeakyReLU(a^T [W v_i ‖ W v_j])
where ‖ denotes the concatenation operation, a ∈ R^{2F'} is a parameterized weight vector implemented by a fully connected layer, LeakyReLU is a nonlinear activation function, and W is a weight matrix;
Then, the coefficients e_ij are normalized with a Softmax function to obtain the attention weight of node j with respect to node i:
α_ij = exp(e_ij) / Σ_{m ∈ N_i} exp(e_im)
where N_i is the neighborhood of node i in the graph. Finally, the normalized attention coefficients α_ij are used to compute a weighted sum of the associated features, giving the final output feature of each node:
v'_i = ELU(Σ_{j ∈ N_i} α_ij W_1 v_j)
where ELU is a nonlinear activation function and W_1 is a weight matrix. The final graph-level feature is obtained by average pooling over the features in the set V'.
The patient's node-level pathological image features are likewise average-pooled, and the result is concatenated with the graph-level feature to obtain the patient's final pathological image feature G, G ∈ R^{F'+F}.
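The attention computation of step 2-3 can be sketched in NumPy as follows (single attention head over a fully connected graph; the dimensions, random parameters, and LeakyReLU slope are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
k, F, Fp = 4, 8, 6             # k images per patient, input dim F, output dim F'

V = rng.normal(size=(k, F))    # node-level features (stand-in for DenseNet output)
W = rng.normal(size=(F, Fp))   # shared projection weight matrix W
W1 = rng.normal(size=(F, Fp))  # aggregation weight matrix W_1
a = rng.normal(size=2 * Fp)    # attention vector a in R^{2F'}

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def elu(x):
    return np.where(x > 0, x, np.exp(x) - 1.0)

WV = V @ W
# e_ij = LeakyReLU(a^T [W v_i || W v_j]) over a fully connected graph
E = np.array([[float(leaky_relu(a @ np.concatenate([WV[i], WV[j]])))
               for j in range(k)] for i in range(k)])
# alpha_ij: softmax of e_ij over the neighborhood of i (all k nodes here)
alpha = np.exp(E) / np.exp(E).sum(axis=1, keepdims=True)
# v'_i = ELU(sum_j alpha_ij W_1 v_j)
Vp = elu(alpha @ (V @ W1))
print(Vp.shape)                # (4, 6)
```

Each row of `alpha` sums to one, so every high-order node feature is a normalized mixture of the projected features of all images in the patient's graph.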
Further, the specific process of fusing the features of the three modalities through the multi-modal adaptive gating unit in step 4 includes:
step 4-1, first, two gating weights are computed from the patient's pathological image feature G, diagnosis text feature T, and pathological feature C:
g_t = ReLU(W_gt [G ‖ T] + b_t)
g_c = ReLU(W_gc [G ‖ C] + b_c)
where W_gt and W_gc are weight matrices, b_t and b_c are bias vectors, ‖ denotes the concatenation operation, and ReLU is a nonlinear activation function;
step 4-2, a vector H is obtained from the two weights, the diagnosis text feature T, and the pathological feature C:
H = g_t · (W_t T) + g_c · (W_c C) + b_H
where W_t and W_c are weight matrices and b_H is a bias vector;
step 4-3, finally, the patient's final multi-modal fusion feature M is obtained as a weighted sum of the pathological image feature G and the vector H:
M = G + αH,  with α = min(‖G‖_2 / ‖H‖_2 · β, 1)
where β is a randomly initialized model hyperparameter, and ‖G‖_2 and ‖H‖_2 denote the L_2 norms of G and H, respectively.
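The adaptive gate of steps 4-1 to 4-3 can be sketched as follows; the dimensions and random parameters are placeholders, and the norm-ratio scaling of H is one plausible reading of the weighted summation rather than a confirmed detail of the patent:

```python
import numpy as np

rng = np.random.default_rng(1)
dG, dT, dC = 10, 12, 29        # dims of image feature G, text feature T, pathological feature C

G = rng.normal(size=dG)        # pathological image feature
T = rng.normal(size=dT)        # diagnosis text feature
C = rng.normal(size=dC)        # numericized pathological feature

relu = lambda x: np.maximum(x, 0.0)
W_gt = rng.normal(size=(dG, dG + dT)); b_t = np.zeros(dG)
W_gc = rng.normal(size=(dG, dG + dC)); b_c = np.zeros(dG)
W_t = rng.normal(size=(dG, dT))
W_c = rng.normal(size=(dG, dC))
b_H = np.zeros(dG)
beta = 0.5                     # hyperparameter (placeholder value)

g_t = relu(W_gt @ np.concatenate([G, T]) + b_t)   # text gate
g_c = relu(W_gc @ np.concatenate([G, C]) + b_c)   # pathology gate
H = g_t * (W_t @ T) + g_c * (W_c @ C) + b_H       # shift vector H
# assumed scaling: cap H's contribution by the norm ratio of G and H
alpha = min(np.linalg.norm(G) / np.linalg.norm(H) * beta, 1.0)
M = G + alpha * H                                  # fused multi-modal feature
print(M.shape)                 # (10,)
```

The gates g_t and g_c decide, element by element, how much text and pathology information is allowed to shift the image representation, keeping G as the dominant modality.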
Further, the classification prediction in step 5 specifically includes:
step 5-1, predicting the two classes of breast cancer, benign and malignant, with a Softmax layer, namely:
P̂ = Softmax(W_s M + b_s)
where W_s and b_s are the parameters of the classification layer;
step 5-2, computing the loss function of the benign/malignant binary classification task with cross entropy:
L = -(1/t) Σ_{n=1}^{t} [P_n log P̂_n + (1 - P_n) log(1 - P̂_n)]
where t is the total number of patients in the data set, and P_n and P̂_n denote the actual and predicted values for the n-th patient, respectively.
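A minimal NumPy sketch of the Softmax prediction and cross-entropy loss of step 5 (the classification-layer parameters and labels are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
t, d = 5, 10                   # t patients, fused-feature dimension d
M = rng.normal(size=(t, d))    # multi-modal fusion features
W = rng.normal(size=(d, 2))    # classification-layer weights (placeholder)
b = np.zeros(2)

P_hat = softmax(M @ W + b)[:, 1]         # predicted probability of the malignant class
P = np.array([0, 1, 1, 0, 1], float)     # illustrative ground-truth labels

eps = 1e-12                    # guard against log(0)
loss = -np.mean(P * np.log(P_hat + eps) + (1 - P) * np.log(1 - P_hat + eps))
```

With two classes, taking the second Softmax column as P̂ makes the cross-entropy formula above reduce to the familiar binary cross-entropy.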
A multi-modal breast cancer classification system based on a graph attention network comprises:
a preprocessing module, for extracting representative pathological features from the patient's electronic medical record (EMR), numericizing the features, and describing them in text to obtain a medical record text;
a pathological image feature generation module, for extracting features of each pathological image of the patient to obtain node-level pathological image features, forming the patient's pathological image set into a fully connected graph with the node-level features as initial features, obtaining high-order node features with a graph attention network, and average-pooling the initial and high-order node features respectively before concatenating them into the patient's final pathological image feature;
a text feature generation module, for extracting the patient's diagnosis text features from the medical record text formed from the EMR with a pre-trained language model;
a multi-modal feature fusion module, for fusing the patient's pathological image features, text features, and pathological features through a multi-modal adaptive gating unit; the adaptive gating unit fuses the three modal features with an attention gate and performs a weighted summation of the fused feature and the pathological image feature as the final multi-modal fusion feature;
a training module, for performing classification prediction of breast cancer from the multi-modal fusion feature through a multi-layer perceptron and training the model by defining a cross-entropy loss function; and
a prediction module, for inputting the patient's pathological image set, medical record text, and pathological features into the trained model to obtain a breast cancer classification prediction result.
A computer system comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program when loaded into the processor implementing the steps of the graph attention network based multimodal breast cancer classification training method.
A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the graph attention network based multimodal breast cancer classification training method.
Beneficial effects: compared with the prior art, the invention has the following notable advantages: 1) the proposed model integrates features of the three modalities of image, text, and pathology to classify breast cancer, and the performance of the network structure is superior to single-modality methods; 2) the invention adopts a graph attention network (GAT), forms a graph with the patient's pathological images as nodes, and combines node-level and graph-level pathological image features, improving classification performance; 3) the invention proposes a multi-modal adaptive gate fusion method, which combines the features of the three modalities to obtain a multi-modal feature dominated by the pathological image features with text and pathological features adaptively superimposed; 4) experiments show that the invention obtains more accurate breast cancer classification results, with classification accuracy reaching 93.62%.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a screenshot of the main numericized features of patient S0000004;
FIG. 3 is a screenshot of the main diagnosis description text of patient S0000004;
fig. 4 is a schematic diagram of the multi-modal adaptive gate according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
With reference to fig. 1, a schematic flow chart of a first embodiment of the present invention provides a breast cancer classification training method based on a graph attention network, which mainly includes the following steps:
step 1, extracting representative pathological features from the patient's electronic medical record (EMR), numericizing them, and describing them in text to form a medical record text;
step 2, extracting features from each pathological image of the patient to obtain node-level pathological image features, forming the patient's pathological image set into a fully connected graph with the node-level features as initial features, obtaining high-order node features with a graph attention network, and average-pooling the node-level and high-order features respectively before concatenating them into the patient's final pathological image feature;
step 3, extracting the patient's diagnosis text features from the medical record text formed from the EMR with a pre-trained language model;
step 4, fusing the patient's pathological image features, text features, and pathological features through a multi-modal adaptive gating unit; the adaptive gating unit fuses the three modal features with an attention gate and performs a weighted summation of the fused feature and the pathological image feature to obtain the final multi-modal fusion feature;
and step 5, performing classification prediction on the fused multi-modal features through a multi-layer perceptron, and training the model by defining a cross-entropy loss function.
Further, the process of step 1 is as follows:
step 1-1, 29 representative features are extracted from the patient's Electronic Medical Record (EMR). Specifically, the 29 features include age, sex, disease course type, individual tumor history, pectoral adhesion, family tumor history, orange-peel appearance, previous treatment, breast deformation, neoadjuvant chemotherapy, dimple sign, skin redness and swelling, skin ulcer, tumor, axillary lymph node enlargement, nipple change, nipple discharge, lymph node enlargement, tumor location, tenderness, number of tumors, tumor size, tumor texture, tumor boundary, surface smoothness, tumor morphology, mobility, envelope, skin adhesion, and diagnosis. According to the actual situation, each feature is quantized into a specific numerical value. These features are closely related to the clinical medical theory of breast cancer diagnosis, and the structured data are used to describe the patient's condition. Taking patient S0000004 as an example, the main numericized features are shown in FIG. 2.
step 1-2, the 29 features are described in text according to clinical medical rules to obtain the patient's medical record text. The main diagnosis description of patient S0000004 is shown in FIG. 3.
Further, obtaining the high-order pathological image features in step 2 includes:
step 2-1, suppose a breast cancer patient has k pathological images; the pathological image set is denoted X = {x_i | i = 1, 2, 3, ..., k}, x_i ∈ R^P, where P is the dimension of each image. Node-level pathological image features V = {v_i | i = 1, 2, 3, ..., k}, v_i ∈ R^F, are obtained through a DenseNet model, where F is the dimension of each image's node-level feature;
step 2-2, forming the patient's pathological image set into a fully connected graph to capture the correlation between the pathological images; each vertex of the graph is a pathological image, whose initial feature is the node-level feature obtained in step 2-1;
step 2-3, extracting high-order features of the patient's images with a graph attention network (GAT), wherein the node-level features V = {v_i | i = 1, 2, 3, ..., k}, v_i ∈ R^F, serve as the input of the GAT, and the final high-order node features V' = {v'_i | i = 1, 2, 3, ..., k}, v'_i ∈ R^{F'}, are obtained through a multi-layer GAT model. The detailed process is as follows:
First, the attention coefficient e_ij of node j's features with respect to node i is calculated:
e_ij = LeakyReLU(a^T [W v_i ‖ W v_j])
where ‖ denotes the concatenation operation; a ∈ R^{2F'} is a parameterized weight vector implemented by a fully connected layer with a LeakyReLU nonlinearity; E ∈ R^{k×k} is the matrix of attention coefficients over the k image nodes; and W denotes a weight matrix.
Then, the coefficients e_ij are normalized with a Softmax function to obtain the attention weight of node j with respect to node i:
α_ij = exp(e_ij) / Σ_{m ∈ N_i} exp(e_im)
where N_i is the neighborhood of node i in the graph. Finally, the normalized attention coefficients α_ij are used to compute a weighted sum of the associated features, giving the final output feature of each node:
v'_i = ELU(Σ_{j ∈ N_i} α_ij W_1 v_j)
where ELU is a nonlinear activation function and W_1 is a weight matrix. The final graph-level feature is obtained by average pooling over the features in the set V'.
The patient's node-level pathological image features are likewise average-pooled, and the result is concatenated with the graph-level feature to obtain the patient's final pathological image feature G, G ∈ R^{F'+F}.
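The average pooling and concatenation that produce the final pathological image feature G can be sketched as follows (shapes are placeholder assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
k, F, Fp = 4, 8, 6              # k images, node-level dim F, high-order dim F'

V = rng.normal(size=(k, F))     # node-level features (from DenseNet)
Vp = rng.normal(size=(k, Fp))   # high-order node features (from the GAT)

v_node = V.mean(axis=0)         # average-pooled node-level feature
v_graph = Vp.mean(axis=0)       # average-pooled graph-level feature
G = np.concatenate([v_graph, v_node])   # final image feature, G in R^{F'+F}
print(G.shape)                  # (14,)
```

Concatenating the two pooled vectors keeps both the fine-grained node-level information and the graph-level summary in the patient's image representation.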
Further, in step 3, obtaining the text features includes:

step 3-1, using the BERT model, the patient's diagnosis text description I obtained in step 1-2 is taken as input to obtain the patient's medical record text feature T ∈ R^(F_1), where F_1 is the dimension of the medical record text encoded by BERT.

In addition, 29 representative pathological features, defined as C, are selected from the patient's EMR, with dimension 29 × 1.
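The numerical encoding of structured EMR features into the vector C can be sketched as below. The feature names, value mappings, and scaling here are hypothetical examples (only a handful of the 29 features are shown); the actual encoding rules are defined clinically.

```python
import numpy as np

# Hypothetical encoding rules for a few of the structured EMR features;
# the real feature list and value mappings follow clinical medical rules.
BINARY = {"no": 0.0, "yes": 1.0}

def encode_emr(record):
    """Map a raw EMR record (dict) to a fixed-length numeric feature vector."""
    return np.array([
        record["age"] / 100.0,                      # scaled continuous value
        BINARY[record["personal_tumor_history"]],   # yes/no flag
        BINARY[record["family_tumor_history"]],
        record["tumor_size_cm"],                    # continuous, in cm
        BINARY[record["nipple_discharge"]],
    ])

c = encode_emr({
    "age": 54,
    "personal_tumor_history": "no",
    "family_tumor_history": "yes",
    "tumor_size_cm": 2.5,
    "nipple_discharge": "no",
})
print(c.shape)   # (5,) — the full patent vector would be (29,)
```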
Further, in step 4, a schematic diagram of the multi-modal adaptive gate fusion is shown in FIG. 4, and the specific process includes:
step 4-1, firstly, according to the obtained pathological image characteristics G, the diagnosis text characteristics T and the pathological characteristics C of the patient, calculating to obtain two weights:
g t =ReLU(W gt [G||T]+b t )
g c =ReLU(W gc [G||C]+b c )
where W_gt and W_gc are the weight matrices of the text and pathology modalities, b_t and b_c are bias vectors, || denotes the concatenation operation, and ReLU is a nonlinear activation function;
step 4-2, a vector H is obtained from the two weights, the diagnosis text feature T and the pathological feature C:
H=g t ·(W t T)+g c ·(W c C)+b H
where W_t and W_c are the weight matrices of the text and pathology information, respectively, and b_H is a bias vector;
step 4-3, finally, the final multi-modal fusion feature M of the patient is obtained by a weighted summation of the pathological image feature G and the vector H:
M = G + α·H,  with  α = min( (‖G‖_2 / ‖H‖_2) · β, 1 )

where β is a hyperparameter randomly initialized by the model, and ‖G‖_2 and ‖H‖_2 denote the L2 norms of G and H, respectively.
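Steps 4-1 to 4-3 can be sketched in NumPy as follows. Dimensions and random initialization are illustrative, and the scaling factor α = min(‖G‖₂/‖H‖₂·β, 1) is an assumption reconstructed from the text's mention of β and the L2 norms, following the common multimodal adaptation gate formulation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def adaptive_gate_fusion(G, T, C, p, beta):
    """Multi-modal adaptive gate (sketch of steps 4-1 to 4-3).

    G, T, C : image, text and structured-pathology feature vectors
    p       : dict of weight matrices / bias vectors (random here)
    beta    : scalar hyperparameter controlling the shift magnitude
    """
    # step 4-1: two gate weights from concatenated modality pairs
    g_t = relu(p["W_gt"] @ np.concatenate([G, T]) + p["b_t"])
    g_c = relu(p["W_gc"] @ np.concatenate([G, C]) + p["b_c"])
    # step 4-2: displacement vector H from the two gated modalities
    H = g_t * (p["W_t"] @ T) + g_c * (p["W_c"] @ C) + p["b_H"]
    # step 4-3: alpha keeps the shift small relative to G (assumed form)
    alpha = min(np.linalg.norm(G) / (np.linalg.norm(H) + 1e-8) * beta, 1.0)
    return G + alpha * H

rng = np.random.default_rng(1)
dG, dT, dC = 6, 4, 3
params = {
    "W_gt": rng.normal(size=(dG, dG + dT)), "b_t": rng.normal(size=dG),
    "W_gc": rng.normal(size=(dG, dG + dC)), "b_c": rng.normal(size=dG),
    "W_t": rng.normal(size=(dG, dT)), "W_c": rng.normal(size=(dG, dC)),
    "b_H": rng.normal(size=dG),
}
M = adaptive_gate_fusion(rng.normal(size=dG), rng.normal(size=dT),
                         rng.normal(size=dC), params, beta=0.1)
print(M.shape)   # (6,) — same dimension as the image feature G
```

Note the design choice: the image feature G acts as the anchor modality, and the text/pathology information only shifts it, which matches the "displacement vector" idea in the description.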
Further, the classification prediction in step 5 specifically includes:
step 5-1, the benign and malignant categories of the breast cancer are predicted using a Softmax layer, namely:

P̂ = Softmax(Linear(M))

step 5-2, the loss function of the breast cancer benign/malignant binary classification task is calculated using cross entropy:

Loss = −(1/t) · Σ_{n=1}^{t} [ P_n · log P̂_n + (1 − P_n) · log(1 − P̂_n) ]

where t is the total number of patients in the data set, and P_n and P̂_n denote the actual and predicted values for the n-th patient, respectively.
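The classification head and loss of step 5 can be sketched as below; the fully connected layer weights and the toy labels are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(y_true, y_prob):
    """Mean cross-entropy over t patients; y_true is one-hot (t, 2)."""
    return -np.mean(np.sum(y_true * np.log(y_prob + 1e-12), axis=1))

rng = np.random.default_rng(2)
t, d = 4, 6                        # 4 patients, fused feature dimension 6
M = rng.normal(size=(t, d))        # multi-modal fusion features
W, b = rng.normal(size=(d, 2)), np.zeros(2)
probs = softmax(M @ W + b)         # benign / malignant probabilities
labels = np.eye(2)[[0, 1, 1, 0]]   # toy ground-truth one-hot labels
loss = cross_entropy(labels, probs)
print(probs.shape, float(loss) > 0)   # (4, 2) True
```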
The invention provides a multi-modal breast cancer classification training method based on a graph attention network. In the pathological image processing stage, multi-level features are used to represent the images: node-level and graph-level vectors of a breast cancer patient's pathological images are combined to capture fine-grained image features while also modeling the interactions between images. Furthermore, a multi-modal adaptive gate fusion strategy is proposed, whose core idea is to adjust the representation of one modality using a displacement vector derived from the other modalities. The features of the three modalities (image, text and pathology) are fused to classify breast cancer, making the clinical application of automatic breast cancer classification algorithms feasible.
The effects and advantages of the invention are illustrated by the following experiments. The data set used contains data from 185 breast cancer patients, of whom 82 are benign and 103 malignant. Each patient contributed 2 to 97 pathological images. In total there are 3764 pathological images, each labeled as benign or malignant (1332 benign, 2432 malignant), all acquired with a Leica Aperio AT2 slide scanner. Besides the pathology images, each patient record also contains a diagnosis text description and a numerical description of the patient's condition. To systematically verify the validity of the proposed model, four variants were tested: (1) classification with text-based single-modal features only, giving 74.47% accuracy; (2) classification with pathological image node-level features only, giving 82.98% accuracy; (3) classification with image features obtained by average-pooling and concatenating the node-level and graph-level features of the patient's pathological images, giving 87.23% accuracy; (4) classification with only the 29 representative structured features extracted from the EMR data, giving 65.96% accuracy. All of these fall short of the 93.62% classification accuracy achieved by the proposed full model.
Based on the same inventive concept, the embodiment of the invention discloses a multi-modal breast cancer classification system based on a graph attention network, comprising: a preprocessing module for extracting representative pathological features from the patient's EMR, digitizing the features, and performing text description to obtain a medical record text; a pathological image feature generation module for extracting features from each single pathological image of the patient to obtain node-level pathological image features, forming the patient's pathological image set into a fully connected graph, taking the node-level features as initial features, and acquiring high-order pathological image node features using a graph attention network, wherein the initial features and the high-order node features are each average-pooled and then concatenated to obtain the patient's final pathological image features; a text feature generation module for extracting the patient's diagnosis text features from the medical record text formed from the EMR using a pre-trained language model; a multi-modal feature fusion module for fusing the patient's pathological image features, text features and pathological features through a multi-modal adaptive gating unit, wherein the adaptive gating unit fuses the three modal features using an attention gate and performs a weighted summation of the fused features and the pathological image features to obtain the final multi-modal fusion features; and a training module for performing classification prediction of breast cancer on the multi-modal fusion features through a multilayer perceptron and training the model by defining a cross entropy loss function.
And the prediction module is used for inputting the pathological image set, the medical history text and the pathological features of the patient into the trained model to obtain a breast cancer classification prediction result.
For the specific working process of each module described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again. The division of the modules is only one logical functional division, and in actual implementation, there may be another division, for example, a plurality of modules may be combined or may be integrated into another system.
Based on the same inventive concept, the embodiment of the present invention discloses a computer system, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the computer program is loaded into the processor to implement the steps of the graph attention network-based multimodal breast cancer classification training method.
Based on the same inventive concept, the embodiment of the present invention discloses a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the graph attention network-based multimodal breast cancer classification training method.
The foregoing shows and describes the basic principles, principal steps and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (10)
1. A multi-modal breast cancer classification training method based on a graph attention network is characterized by comprising the following steps:
step 1, extracting representative pathological features from an electronic medical record EMR of a patient, digitizing the features, and performing text description to obtain a medical record text;
step 2, extracting features from each single pathological image of the patient to obtain node-level pathological image features, forming the patient's pathological image set into a fully connected graph, taking the node-level features as initial features, and acquiring high-order pathological image node features using a graph attention network; average-pooling the initial features and the high-order node features respectively, and then concatenating them to obtain the patient's final pathological image features;
step 3, extracting the diagnosis text characteristics of the patient from the medical record text formed by EMR by using a pre-training language model;
step 4, fusing pathological image features, text features and pathological features of the patient through a multi-mode self-adaptive gate control unit; the self-adaptive gate control unit fuses three modal characteristics by using an attention gate, and performs weighted summation on the fused characteristics and pathological image characteristics to serve as final multi-modal fusion characteristics;
step 5, performing classification prediction on the multi-modal fusion features through a multilayer perceptron, and training the model by defining a cross entropy loss function.
2. The graph attention network-based multimodal breast cancer classification training method according to claim 1, wherein the representative pathological features extracted from the electronic medical record of the patient in step 1 include age, sex, disease course type, personal tumor history, pectoral adhesion, family tumor history, orange peel appearance, previous treatment, breast deformation, neoadjuvant chemotherapy, dimple symptoms, skin redness, skin ulcer, tumor, axillary lymphadenectasis, nipple changes, nipple discharge, lymphadenectasis, tumor location, tenderness, tumor number, tumor size, tumor texture, tumor boundary, surface smoothness, tumor morphology, activity, envelope, skin adhesion and diagnosis; firstly, numerically expressing each feature; and performing text description on the extracted features according to clinical medical rules to obtain a medical record text of the patient.
3. The multi-modal breast cancer classification training method based on the graph attention network as claimed in claim 1, wherein the specific process of obtaining the pathological image features of the patient in the step 2 comprises:
step 2-1, let a breast cancer patient have k pathological images, with the pathological image set represented as X = {x_i | i=1,2,3,...,k}, x_i ∈ R^P, where P is the dimension of each image; the node-level pathological image features V = {v_i | i=1,2,3,...,k}, v_i ∈ R^F, are obtained through a DenseNet model, where F is the dimension of each image's node-level feature;
step 2-2, forming a fully connected graph of the patient's pathological image set to capture the correlations between pathological images; the vertices of the graph are the pathological images, and the initial feature of each pathological image is the node-level feature obtained in step 2-1;
step 2-3, extracting high-order features of the patient's images using a graph attention network GAT; the node-level pathological image features V = {v_i | i=1,2,3,...,k}, v_i ∈ R^F, are taken as the input of the GAT, and the final high-order pathological image node features V′ = {v′_i | i=1,2,3,...,k}, v′_i ∈ R^(F′), are obtained through a multi-layer GAT model, where F′ is the dimension of the GAT output; the detailed process is as follows:
first, the attention coefficient e_ij of node j's feature with respect to node i is calculated:

e_ij = LeakyReLU(a^T [W·v_i || W·v_j])

where || is the concatenation (splicing) operation, a ∈ R^(2F′) is a parameterized weight vector realized by a fully connected layer, LeakyReLU is a nonlinear activation function, and W denotes a weight matrix;
then, the coefficients e_ij are normalized with a Softmax function to obtain the attention weight of node j to node i:

α_ij = exp(e_ij) / Σ_{l∈N_i} exp(e_il)

where N_i is the neighborhood of node i in the graph;

finally, the normalized attention coefficients α_ij are used to compute a weighted sum of the associated features, giving the final output feature of each node:

v′_i = ELU( Σ_{j∈N_i} α_ij · W_1 · v_j )

where ELU (the exponential linear unit) is a nonlinear activation function and W_1 is a weight matrix; the final graph-level feature v̄′ is output by average-pooling (summing and averaging) the features in the set V′; the node-level features of the patient's pathological images are likewise average-pooled and then concatenated with the graph-level feature v̄′ to obtain the final pathological image feature G of the patient, G ∈ R^(F′+F):

G = AvgPool(V) || v̄′
4. The multi-modal breast cancer classification training method based on the graph attention network as claimed in claim 1, wherein the specific process of fusing the features of the three modalities through the multi-modal adaptive gating unit in the step 4 comprises:
step 4-1, according to the obtained pathological image characteristics G, the diagnosis text characteristics T and the pathological characteristics C of the patient, calculating to obtain two weights:
g t =ReLU(W gt [G||T]+b t )
g c =ReLU(W gc [G||C]+b c )
where W_gt and W_gc are weight matrices, b_t and b_c are bias vectors, || denotes the concatenation operation, and ReLU is a nonlinear activation function;
step 4-2, obtaining a vector H according to the two weights, the diagnosis text characteristic T and the pathological characteristic C:
H=g t ·(W t T)+g c ·(W c C)+b H
where W_t and W_c are weight matrices and b_H is a bias vector;
step 4-3, weighting and summing the pathological image feature G and the vector H to obtain the final multi-modal fusion feature M of the patient:
M = G + α·H,  with  α = min( (‖G‖_2 / ‖H‖_2) · β, 1 )

where β is a hyperparameter randomly initialized by the model, and ‖G‖_2 and ‖H‖_2 denote the L2 norms of G and H, respectively.
5. The multi-modal breast cancer classification training method based on the graph attention network as claimed in claim 1, wherein the classification prediction of breast cancer is performed by using a multi-layered perceptron in step 5, and the specific process comprises:
step 5-1, the benign and malignant categories of the breast cancer are predicted using a Softmax layer, namely:

P̂ = Softmax(Linear(M))

where M is the multi-modal fusion feature and Linear denotes the output of a fully connected layer;

step 5-2, the loss function of the breast cancer benign/malignant binary classification task is calculated using cross entropy:

Loss = −(1/t) · Σ_{n=1}^{t} [ P_n · log P̂_n + (1 − P_n) · log(1 − P̂_n) ]

where t is the total number of patients in the data set, and P_n and P̂_n denote the actual and predicted values for the n-th patient, respectively.
6. A multimodal breast cancer classification system based on a graph attention network, comprising:
a preprocessing module for extracting representative pathological features from the patient's electronic medical record EMR, digitizing the features, and performing text description to obtain a medical record text;
a pathological image feature generation module for extracting features from each single pathological image of the patient to obtain node-level pathological image features, forming the patient's pathological image set into a fully connected graph, taking the node-level features as initial features, and acquiring high-order pathological image node features using a graph attention network; the initial features and the high-order node features are each average-pooled and then concatenated to obtain the patient's final pathological image features;
the text feature generation module is used for extracting diagnosis text features of the patient from a medical record text formed by EMR by using a pre-training language model;
the multi-mode feature fusion module is used for fusing pathological image features, text features and pathological features of the patient through the multi-mode self-adaptive gate control unit; the self-adaptive gate control unit fuses three modal characteristics by using an attention gate, and performs weighted summation on the fused characteristics and pathological image characteristics to serve as final multi-modal fusion characteristics;
and the training module is used for carrying out classification prediction on the breast cancer by the multi-mode fusion characteristics through a multilayer perceptron and training the model by defining a cross entropy loss function.
And the prediction module is used for inputting the pathological image set, the medical history text and the pathological features of the patient into the trained model to obtain a breast cancer classification prediction result.
7. The system of claim 6, wherein the pathological image feature generation module comprises:
a node-level feature generation unit for obtaining node-level pathological image features through a DenseNet model, wherein if a breast cancer patient has k pathological images, the patient's pathological image set is expressed as X = {x_i | i=1,2,3,...,k}, x_i ∈ R^P, where P is the dimension of each image, and the node-level features are V = {v_i | i=1,2,3,...,k}, v_i ∈ R^F, where F is the dimension of each image's node-level feature;
an image high-order feature generation unit for extracting high-order features of the patient's images using a graph attention network GAT; the patient's pathological image set forms a fully connected graph whose vertices are the pathological images, the initial feature of each image being the node-level feature obtained by the node-level feature generation unit; the final high-order pathological image node features V′ = {v′_i | i=1,2,3,...,k}, v′_i ∈ R^(F′), are obtained, where F′ is the dimension of the GAT output; the attention coefficient e_ij of node j's feature with respect to node i is

e_ij = LeakyReLU(a^T [W·v_i || W·v_j])

where || is the concatenation (splicing) operation, a ∈ R^(2F′) is a parameterized weight vector realized by a fully connected layer, LeakyReLU is a nonlinear activation function, and W denotes a weight matrix; the coefficients e_ij are normalized with a Softmax function to obtain the attention weight of node j to node i, α_ij = exp(e_ij) / Σ_{l∈N_i} exp(e_il), where N_i is the neighborhood of node i in the graph; the final output feature of each node is v′_i = ELU( Σ_{j∈N_i} α_ij · W_1 · v_j ), where ELU (the exponential linear unit) is a nonlinear activation function and W_1 is a weight matrix;
an average pooling unit for summing and average-pooling the features in the set V′ to output the graph-level feature v̄′;

a two-stage feature fusion unit for average-pooling the node-level features of the patient's pathological images and concatenating them with the graph-level feature v̄′ to obtain the final pathological image feature G of the patient, G ∈ R^(F′+F): G = AvgPool(V) || v̄′;
8. The system of claim 6, wherein the specific fusion process of the multi-modal adaptive gating unit comprises:
according to the obtained pathological image characteristics G, the diagnosis text characteristics T and the pathological characteristics C of the patient, two weights are obtained through calculation:
g t =ReLU(W gt [G||T]+b t )
g c =ReLU(W gc [G||C]+b c )
where W_gt and W_gc are weight matrices, b_t and b_c are bias vectors, || denotes the concatenation operation, and ReLU is a nonlinear activation function;
obtaining a vector H according to the two weights, the diagnosis text characteristic T and the pathological characteristic C:
H=g t ·(W t T)+g c ·(W c C)+b H
where W_t and W_c are weight matrices and b_H is a bias vector;
and weighting and summing the pathological image features G and the vector H to obtain the final multi-modal fusion features M of the patient:
M = G + α·H,  with  α = min( (‖G‖_2 / ‖H‖_2) · β, 1 )

where β is a hyperparameter randomly initialized by the model, and ‖G‖_2 and ‖H‖_2 denote the L2 norms of G and H, respectively.
9. A computer system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program when loaded into the processor implements the steps of the graph attention network based multimodal breast cancer classification training method according to any of claims 1-5.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the graph attention network based multimodal breast cancer classification training method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210489883.1A CN114864076A (en) | 2022-05-07 | 2022-05-07 | Multi-modal breast cancer classification training method and system based on graph attention network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114864076A true CN114864076A (en) | 2022-08-05 |
Family
ID=82636305
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210489883.1A Withdrawn CN114864076A (en) | 2022-05-07 | 2022-05-07 | Multi-modal breast cancer classification training method and system based on graph attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114864076A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115171889A (en) * | 2022-09-09 | 2022-10-11 | 紫东信息科技(苏州)有限公司 | Small sample gastric tumor diagnosis system |
CN116502158A (en) * | 2023-02-07 | 2023-07-28 | 北京纳通医用机器人科技有限公司 | Method, device, equipment and storage medium for identifying lung cancer stage |
CN116502158B (en) * | 2023-02-07 | 2023-10-27 | 北京纳通医用机器人科技有限公司 | Method, device, equipment and storage medium for identifying lung cancer stage |
CN115830017A (en) * | 2023-02-09 | 2023-03-21 | 智慧眼科技股份有限公司 | Tumor detection system, method, equipment and medium based on image-text multi-mode fusion |
CN116452851A (en) * | 2023-03-17 | 2023-07-18 | 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) | Training method and device for disease classification model, terminal and readable storage medium |
CN116543918A (en) * | 2023-07-04 | 2023-08-04 | 武汉大学人民医院(湖北省人民医院) | Method and device for extracting multi-mode disease features |
CN116543918B (en) * | 2023-07-04 | 2023-09-22 | 武汉大学人民医院(湖北省人民医院) | Method and device for extracting multi-mode disease features |
CN117274185A (en) * | 2023-09-19 | 2023-12-22 | 阿里巴巴达摩院(杭州)科技有限公司 | Detection method, detection model product, electronic device, and computer storage medium |
CN117274185B (en) * | 2023-09-19 | 2024-05-07 | 阿里巴巴达摩院(杭州)科技有限公司 | Detection method, detection model product, electronic device, and computer storage medium |
CN116994069A (en) * | 2023-09-22 | 2023-11-03 | 武汉纺织大学 | Image analysis method and system based on multi-mode information |
CN116994069B (en) * | 2023-09-22 | 2023-12-22 | 武汉纺织大学 | Image analysis method and system based on multi-mode information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114864076A (en) | Multi-modal breast cancer classification training method and system based on graph attention network | |
CN110534192B (en) | Deep learning-based lung nodule benign and malignant recognition method | |
WO2016192612A1 (en) | Method for analysing medical treatment data based on deep learning, and intelligent analyser thereof | |
CN107977361A (en) | The Chinese clinical treatment entity recognition method represented based on deep semantic information | |
CN109544518B (en) | Method and system applied to bone maturity assessment | |
CN109559300A (en) | Image processing method, electronic equipment and computer readable storage medium | |
Bhatt et al. | State-of-the-art machine learning techniques for melanoma skin cancer detection and classification: a comprehensive review | |
CN112270666A (en) | Non-small cell lung cancer pathological section identification method based on deep convolutional neural network | |
CN113344864A (en) | Ultrasonic thyroid nodule benign and malignant prediction method based on deep learning | |
Rajput et al. | An accurate and noninvasive skin cancer screening based on imaging technique | |
CN112801168A (en) | Tumor image focal region prediction analysis method and system and terminal equipment | |
Singhal et al. | Study of deep learning techniques for medical image analysis: A review | |
Chen et al. | A deep-learning based ultrasound text classifier for predicting benign and malignant thyroid nodules | |
Alfifi et al. | Enhanced artificial intelligence system for diagnosing and predicting breast cancer using deep learning | |
Yuan et al. | ResD-Unet research and application for pulmonary artery segmentation | |
Poonguzhali et al. | Automated brain tumor diagnosis using deep residual u-net segmentation model | |
Wang et al. | Multiscale feature fusion for skin lesion classification | |
Patel | An overview and application of deep convolutional neural networks for medical image segmentation | |
Umer et al. | Breast cancer classification and segmentation framework using multiscale CNN and U‐shaped dual decoded attention network | |
Wang et al. | Segment medical image using U-Net combining recurrent residuals and attention | |
CN112712895A (en) | Data analysis method of multi-modal big data for type 2 diabetes complications | |
CN110992309B (en) | Fundus image segmentation method based on deep information transfer network | |
CN117036288A (en) | Tumor subtype diagnosis method for full-slice pathological image | |
CN111798455A (en) | Thyroid nodule real-time segmentation method based on full convolution dense cavity network | |
CN116797817A (en) | Autism disease prediction technology based on self-supervision graph convolution model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20220805 |