CN115761371A - Medical image classification method and device, storage medium and electronic equipment - Google Patents

Medical image classification method and device, storage medium and electronic equipment

Info

Publication number
CN115761371A
CN115761371A
Authority
CN
China
Prior art keywords
node
medical
medical image
knowledge
vector representation
Prior art date
Legal status
Pending
Application number
CN202211517079.6A
Other languages
Chinese (zh)
Inventor
王伟光
廖霄扬
蔡巍
张霞
Current Assignee
Neusoft Corp
Shenyang Neusoft Intelligent Medical Technology Research Institute Co Ltd
Original Assignee
Neusoft Corp
Shenyang Neusoft Intelligent Medical Technology Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Neusoft Corp, Shenyang Neusoft Intelligent Medical Technology Research Institute Co Ltd filed Critical Neusoft Corp
Priority to CN202211517079.6A
Publication of CN115761371A
Legal status: Pending


Abstract

The disclosure relates to a medical image classification method and device, a storage medium, and an electronic device. The method acquires metadata information corresponding to a medical image to be classified; acquires medical text information corresponding to the medical image according to the metadata information, where the medical text information includes clinical diagnosis and treatment information and medical prior knowledge; and classifies the medical image, according to the medical image and the medical text information, through a pre-trained target classification model.

Description

Medical image classification method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of analysis of medical data, and in particular, to a medical image classification method, apparatus, storage medium, and electronic device.
Background
Multimodality refers to the variety of ways in which information is carried and presented. Multimodal data carries rich, heterogeneous, and diverse information and is therefore of great value. Consequently, in the computer field, multimodal research has become a hot topic in both academia and industry.
In the medical field, researchers comprehensively process data of multiple modalities by using machine learning and improve the accuracy of target prediction by fusing the features of each modality. Unlike multimodal fusion in general domains, however, multimodal information in the current medical imaging field usually focuses on fusing and reasoning over information produced by different examination techniques. In a classification model for brain images, for example, an auxiliary medical reference is usually provided by fusing the results of imaging techniques such as computed tomography (CT), positron emission tomography (PET), magnetic resonance imaging (MRI), and functional magnetic resonance imaging (fMRI).
Disclosure of Invention
The purpose of the present disclosure is to provide a medical image classification method, apparatus, storage medium and electronic device.
In a first aspect, the present disclosure provides a medical image classification method, including:
acquiring metadata information corresponding to medical images to be classified;
acquiring medical text information corresponding to the medical image according to the metadata information, wherein the medical text information comprises clinical diagnosis and treatment information and medical prior knowledge;
and classifying the medical image, according to the medical image and the medical text information, through a pre-trained target classification model.
Optionally, the metadata information includes user identification information of a target user corresponding to the medical image, and the obtaining medical text information corresponding to the medical image according to the metadata information includes:
determining the clinical diagnosis and treatment information of the target user according to the user identification information;
determining the medical prior knowledge associated with the target user according to the clinical diagnosis and treatment information.
Optionally, the classifying the medical image, according to the medical image and the medical text information, through the pre-trained target classification model includes:
constructing a multi-modal knowledge graph according to the medical image, the clinical diagnosis and treatment information, and the medical prior knowledge, wherein the multi-modal knowledge graph represents image features of the medical image and text features of the medical text information;
and classifying the medical image through the target classification model according to the multi-modal knowledge graph.
Optionally, the constructing a multi-modal knowledge graph according to the medical image, the clinical diagnosis and treatment information, and the medical priori knowledge includes:
constructing a first knowledge graph corresponding to a target user according to the clinical diagnosis and treatment information, wherein the first knowledge graph comprises a central node and a plurality of first nodes which are respectively connected with the central node, the central node is a node corresponding to the target user, and the first nodes represent the diagnosis and treatment information of the target user;
constructing a second knowledge graph on the basis of the first knowledge graph according to the medical prior knowledge, wherein the second knowledge graph comprises the central node, a plurality of first nodes respectively connected with the central node and a plurality of second nodes respectively connected with the first nodes, and the second nodes represent the medical prior knowledge related to the target user;
and adding a preset virtual node corresponding to the medical image to the second knowledge graph to generate the multi-modal knowledge graph, wherein the preset virtual node is connected with the central node.
Optionally, the target classification model includes a medical image vector characterization model, a knowledge-graph vector characterization model connected to the medical image vector characterization model, and a classifier connected to the knowledge-graph vector characterization model, and the classifying the medical image through the target classification model according to the multi-modal knowledge graph includes:
determining a first vector representation of the medical image through the medical image vector characterization model according to the metadata information, the first vector representation representing semantic features and sequence features characterizing the medical image;
determining, by the knowledge-graph vector characterization model, a second vector representation for each node in the multi-modal knowledge-graph;
determining a target vector representation of the multi-modal knowledge-graph from the first vector representation and the second vector representation;
classifying the medical image by the classifier according to the target vector representation.
Optionally, the medical image vector characterization model comprises a semantic feature extraction model, and a sequence feature extraction model connected to the semantic feature extraction model, and the determining the first vector representation of the medical image by the medical image vector characterization model according to the metadata information comprises:
reading image acquisition sequence information corresponding to the medical image from the metadata information;
dividing the medical image into a plurality of sequence pictures according to the image acquisition sequence information;
extracting semantic features of the sequence pictures through the semantic feature extraction model aiming at each sequence picture;
and determining the first vector representation of the medical image through the sequence feature extraction model according to the semantic features respectively corresponding to each sequence picture.
Optionally, the knowledge-graph vector characterization model comprises a GAT (graph attention network), the GAT comprises a plurality of sequentially connected network layers, and the determining, by the knowledge-graph vector characterization model, the second vector representation of each node in the multi-modal knowledge-graph comprises:
for each network layer of the GAT, calculating a correlation coefficient of two nodes in an associated node pair in the multi-modal knowledge graph through the network layer; the associated node pair is two nodes connected by one edge;
for a third node in each associated node pair, determining a vector representation corresponding to the third node at the network layer according to the correlation coefficient of the associated node pair, the vector representation of the third node obtained at the network layer preceding this one, and the vector representation of a fourth node obtained at that preceding layer, where the third node is one node of the associated node pair and the fourth node is the other node of the associated node pair;
and if the network layer is the last layer of the GAT, taking the vector representation corresponding to the third node in the network layer as the second vector representation of the third node.
Optionally, the method further comprises:
under the condition that the network layer is not the last layer of the GAT, acquiring a correlation coefficient corresponding to each target associated node pair in the multi-modal knowledge graph calculated by the network layer, wherein the target associated node pair comprises a central node and a node connected with the central node;
updating the correlation coefficient of a designated node pair in the target associated node pair according to the correlation coefficient corresponding to each target associated node pair, wherein one node in the designated node pair is the central node, and the other node in the designated node pair is a preset virtual node connected with the central node;
calculating a vector representation of each node through the network layer following this one according to the updated correlation coefficient, and taking the vector representation of each node as the second vector representation in the case that the following network layer is the last network layer of the GAT.
Optionally, the updating the correlation coefficient of the designated node pair in the target associated node pair according to the correlation coefficient respectively corresponding to each target associated node pair includes:
determining a maximum correlation coefficient from the correlation coefficients respectively corresponding to each target associated node pair;
taking a preset multiple of the maximum correlation coefficient as a target correlation coefficient, wherein the preset multiple is greater than or equal to a preset numerical value;
and updating the correlation coefficient of the designated node pair to be the target correlation coefficient.
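The update rule just described can be sketched in a few lines. The pair keys, the helper name, and the default value of the preset multiple `k` are illustrative assumptions, not part of the disclosure:

```python
# Hypothetical sketch of the correlation-coefficient update described above:
# find the maximum coefficient among the target associated node pairs, scale
# it by a preset multiple k, and assign the result to the designated
# (central node, preset virtual node) pair.
def update_virtual_pair_coefficient(coeffs, virtual_pair, k=1.0):
    max_coeff = max(coeffs.values())      # maximum over all target pairs
    updated = dict(coeffs)                # do not mutate the input mapping
    updated[virtual_pair] = k * max_coeff # boost the virtual-node edge
    return updated
```

The effect of the update is to keep the image's virtual node at least as influential as the most relevant text node in the next network layer's aggregation.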
Optionally, the determining a target vector representation of the multi-modal knowledge-graph from the first vector representation and the second vector representation comprises:
replacing the second vector representation corresponding to the preset virtual node with the first vector representation to obtain a second vector representation after the preset virtual node is updated;
and determining the target vector representation according to the preset weight corresponding to the preset virtual node and the second vector representation of each node.
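A minimal sketch of this fusion step, assuming the node vectors are NumPy arrays; the disclosure only specifies a preset weight for the virtual node, so aggregating by weighted average is an assumption made here:

```python
import numpy as np

def target_vector(node_vectors, virtual_id, image_vector, virtual_weight=1.0):
    # Replace the preset virtual node's graph embedding with the image's
    # first vector representation, then combine all node vectors with an
    # extra preset weight on the virtual node (weighted average).
    vecs = dict(node_vectors)
    vecs[virtual_id] = image_vector
    total = sum((virtual_weight if nid == virtual_id else 1.0) * v
                for nid, v in vecs.items())
    return total / (len(vecs) - 1 + virtual_weight)
```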
In a second aspect, the present disclosure provides a medical image classification apparatus, the apparatus comprising:
the first acquisition module is used for acquiring metadata information corresponding to the medical images to be classified;
the second acquisition module is used for acquiring medical text information corresponding to the medical image according to the metadata information, wherein the medical text information includes clinical diagnosis and treatment information and medical prior knowledge;
and the classification module is used for classifying the medical image, according to the medical image and the medical text information, through a pre-trained target classification model.
In a third aspect, the present disclosure provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of the first aspect of the disclosure.
Through the above technical solution, medical images can be classified using data of different modalities: the user's medical imaging results, the clinical diagnosis and treatment information associated with the images, and medical prior knowledge. Because data of different modalities carry different semantic information, classifying medical images based on multi-modal medical data can significantly improve the accuracy of the classification results compared with classification that relies only on the single-dimensional image features of the medical images.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow chart illustrating a medical image classification method according to an exemplary embodiment;
FIG. 2 is a flow chart of a medical image classification method according to the embodiment shown in FIG. 1;
FIG. 3 is a flow diagram illustrating a method of constructing a multimodal knowledge-graph in accordance with the embodiment shown in FIG. 2;
FIG. 4 is a schematic diagram illustrating a first knowledge-graph in accordance with an exemplary embodiment;
FIG. 5 is a schematic illustration of a second knowledge-graph according to the embodiment shown in FIG. 4;
FIG. 6 is a schematic diagram of a multimodal knowledge-graph shown in accordance with the embodiment shown in FIG. 5;
FIG. 7 is a flow chart of a medical image classification method according to the embodiment shown in FIG. 2;
FIG. 8 is an architectural diagram illustrating classification of medical images based on a multi-modal knowledge-graph in accordance with an exemplary embodiment;
fig. 9 is a block diagram illustrating a medical image classification apparatus according to an exemplary embodiment;
FIG. 10 is a block diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
The following detailed description of the embodiments of the disclosure refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
It should be noted that all the actions of acquiring signals, information or data in the present disclosure are performed under the premise of complying with the corresponding data protection regulation policy of the country of the location and obtaining the authorization given by the owner of the corresponding device.
The method is mainly applied to analysis scenarios for medical images. Related technologies mainly fuse the imaging results of different examination techniques (such as X-ray computed tomography, positron emission tomography, magnetic resonance imaging, and functional magnetic resonance imaging) and then use a machine-learning model to classify the medical images based on the fusion result. This kind of classification performs image recognition on the medical images based only on their image features and classifies them according to the recognition result. However, the image features of a medical image can represent only one dimension of its characteristic information, not the information of other dimensions, which limits the classification accuracy.
In order to solve the above problems, the present disclosure provides a medical image classification method, a medical image classification device, a storage medium, and an electronic device, and the following describes in detail a specific embodiment of the present disclosure with reference to the drawings.
Fig. 1 is a flowchart illustrating a medical image classification method according to an exemplary embodiment, as shown in fig. 1, the method including the steps of:
in step S11, metadata information corresponding to the medical image to be classified is acquired.
The medical image may be an image in DICOM (Digital Imaging and Communications in Medicine) format, such as a CT image or a magnetic resonance image. The metadata information records the examination device that produced the DICOM image, the examination method (generally referring to the imaging technique), basic user information (such as a user identifier, gender, and age), and the like.
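As a rough illustration of the metadata fields involved, the sketch below pulls them out of a parsed DICOM header represented as a plain dict (standing in for a dataset parsed by a library such as pydicom). The keywords follow standard DICOM attribute names; the particular field selection is an assumption:

```python
# Illustrative sketch: extracting the metadata fields mentioned above from
# a DICOM header. `header` stands in for a parsed dataset; keys are standard
# DICOM attribute keywords.
def extract_metadata(header):
    return {
        "user_id": header.get("PatientID"),   # user identification information
        "sex": header.get("PatientSex"),
        "age": header.get("PatientAge"),
        "device": header.get("Manufacturer"), # examination device
        "modality": header.get("Modality"),   # examination method, e.g. "CT"
    }
```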
In the classification task of medical images, if the rich information implied in the corresponding metadata information can be fully utilized, the representation capability of the image can be enhanced; in particular, the accuracy of image recognition can be improved for long-tail cases whose features are not obvious.
In step S12, medical text information corresponding to the medical image is obtained according to the metadata information, where the medical text information includes clinical diagnosis and treatment information and medical prior knowledge.
The clinical diagnosis and treatment information may include clinical and biological information of the user, for example, the user's name, gender, age, symptoms, medical auxiliary examinations, physical-examination index parameters, suspected diagnosis results, and the like. The medical prior knowledge is usually medical diagnosis and treatment standard information summarized from medical knowledge and a large amount of clinical data; it defines a series of medical knowledge such as the names of various diseases, the symptoms of each disease, the examination items required for a disease, the procedure of each examination, and the interpretation of the examination results. The medical prior knowledge may be obtained from a preset prior knowledge base, which may include, for example, the SNOMED (Systematized Nomenclature of Medicine) knowledge base, the ICD (International Classification of Diseases) knowledge base, the BMJ (British Medical Journal) knowledge base, and the like.
The metadata information includes user identification information of a target user corresponding to the medical image. In this step, the clinical diagnosis and treatment information of the target user can be determined according to the user identification information; the medical prior knowledge associated with the target user is then determined based on the clinical diagnosis and treatment information. For example, the medical prior knowledge related to the suspected diagnosis result may be associated with the suspected diagnosis result in the clinical diagnosis and treatment information, or the medical prior knowledge related to a symptom may be associated with that symptom.
In step S13, the medical image is classified, according to the medical image and the medical text information, through the pre-trained target classification model.
The target classification model may be a deep learning model obtained by pre-training.
By adopting this method, medical images can be classified using data of different modalities: the user's medical imaging results, the clinical diagnosis and treatment information associated with the images, and medical prior knowledge. Because data of different modalities carry different semantic information, classifying medical images based on multi-modal medical data can significantly improve the accuracy of the classification results compared with classification that relies only on the single-dimensional image features of the medical images.
Fig. 2 is a flowchart of a medical image classification method according to the embodiment shown in fig. 1, and as shown in fig. 2, step S13 includes the following sub-steps:
in step S131, a multi-modal knowledge graph is constructed according to the medical image, the clinical diagnosis and treatment information, and the medical priori knowledge, wherein the multi-modal knowledge graph represents image features of the medical image and text features of the medical text information.
Fig. 3 is a flowchart illustrating a method for constructing a multi-modal knowledge-graph according to the embodiment shown in fig. 2, wherein, as shown in fig. 3, step S131 comprises the following sub-steps:
in step S1311, a first knowledge graph corresponding to a target user is constructed according to the clinical diagnosis and treatment information, where the first knowledge graph includes a central node and a plurality of first nodes respectively connected to the central node, the central node is a node corresponding to the target user, and the first nodes represent diagnosis and treatment information of the target user.
In a possible implementation manner of this step, the obtained user ID of the target user may be used as the central node, then various diagnosis and treatment information (such as information about sex, age, symptoms, medical auxiliary examination, physical examination index parameters, suspected diagnosis results, and the like) corresponding to the target user is used as the first node connected to the central node, and different diagnosis and treatment information corresponds to different first nodes, and after the central node points to each first node, the first knowledge graph is constructed and obtained.
Illustratively, fig. 4 is a schematic diagram of a first knowledge graph according to an exemplary embodiment. As shown in fig. 4, the central node of the first knowledge graph corresponds to the "user ID" and points to a plurality of first nodes, which respectively correspond to clinical diagnosis and treatment information such as "female", "36", "somnolence", "dizziness", "limb weakness", and "185/105". The first knowledge graph shown in fig. 4 is only an exemplary representation, and the disclosure does not limit the specific information corresponding to each node in the first knowledge graph.
In step S1312, a second knowledge-graph is constructed on the basis of the first knowledge-graph according to the medical prior knowledge, the second knowledge-graph including the central node, a plurality of first nodes respectively connected to the central node, and a plurality of second nodes respectively connected to the first nodes, the second nodes representing the medical prior knowledge associated with the target user.
In this step, the second node may continue to be connected to the first node of the first knowledge-graph, and the first node points to the second node. Fig. 5 is a schematic diagram of a second knowledge-graph according to the embodiment shown in fig. 4, for example, for the first node "diabetes" in fig. 4, all medical prior knowledge related to "diabetes" can be obtained from the prior knowledge base, and then all medical prior knowledge related to "diabetes" can be recorded through the second node "diabetes medical prior knowledge" connected to the first node "diabetes" as shown in fig. 5.
In step S1313, a preset virtual node corresponding to the medical image is added to the second knowledge graph to generate the multi-modal knowledge graph, and the preset virtual node is connected to the central node.
In a possible implementation, the node corresponding to the medical image may be added to the second knowledge graph in the form of triple data (user ID, imaging examination, medical image) through the preset virtual node. For example, fig. 6 is a schematic diagram of a multi-modal knowledge graph according to the embodiment shown in fig. 5; as shown in fig. 6, the gray-filled node is the preset virtual node, the central node points to the preset virtual node, and the preset virtual node is the node corresponding to the medical image.
Based on the steps shown in fig. 3, a multi-modal knowledge graph containing the medical image, the clinical diagnosis and treatment information, and the medical prior knowledge can be constructed.
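The three construction steps of fig. 3 can be sketched as follows, using a plain edge list in place of a graph library; all node names and the function signature are illustrative assumptions:

```python
# Minimal sketch of the multi-modal knowledge-graph construction in Fig. 3.
# Directed edges follow the disclosure: central node -> first nodes,
# first nodes -> second nodes, central node -> preset virtual node.
def build_multimodal_graph(user_id, clinical_info, prior_knowledge,
                           image_node="medical_image"):
    edges = []
    # Step S1311: first knowledge graph -- central node points to each
    # piece of clinical diagnosis and treatment information.
    for info in clinical_info:
        edges.append((user_id, info))
    # Step S1312: second knowledge graph -- each first node points to the
    # medical prior knowledge associated with it.
    for info, knowledge_items in prior_knowledge.items():
        for item in knowledge_items:
            edges.append((info, item))
    # Step S1313: attach the preset virtual node for the medical image
    # directly to the central node.
    edges.append((user_id, image_node))
    return edges
```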
In step S132, the medical image is classified by the target classification model according to the multi-modal knowledge graph.
The target classification model comprises a medical image vector representation model, a knowledge map vector representation model connected with the medical image vector representation model and a classifier connected with the knowledge map vector representation model.
Fig. 7 is a flowchart illustrating a medical image classification method according to the embodiment shown in fig. 2, and as shown in fig. 7, step S132 includes the following sub-steps:
in step S1321, a first vector representation of the medical image is determined by the medical image vector characterization model according to the metadata information, the first vector representation representing semantic features and sequence features characterizing the medical image.
The medical image vector characterization model includes a semantic feature extraction model, which may be, for example, a CNN (Convolutional Neural Network) model, and a sequence feature extraction model connected to the semantic feature extraction model, which may be, for example, an LSTM (Long Short-Term Memory) model.
In this step, the first vector representation of the medical image may be determined by:
reading the image acquisition sequence information corresponding to the medical image from the metadata information, where the image acquisition sequence information may include, for example, any sequence information among acquisition time, user posture (such as lying on the side, the back, or the stomach), and acquisition position relative to a common reference point; dividing the medical image into a plurality of sequence pictures according to the image acquisition sequence information; extracting, for each sequence picture, the semantic features of that sequence picture through the semantic feature extraction model; and determining the first vector representation of the medical image through the sequence feature extraction model according to the semantic features respectively corresponding to each sequence picture.
Generally, a medical image in DICOM format has time-sequence characteristics. Taking a CT image as an example, the finally printed CT image is synthesized from a plurality of sequence pictures, and the acquisition time, user posture, or acquisition position of each sequence picture may differ.
For example, fig. 8 is a schematic structural diagram of medical image classification based on a multi-modal knowledge graph according to an exemplary embodiment. As shown in fig. 8, a DICOM image obtained as the result of a CT examination may first be divided into three sequence pictures, image 1, image 2, and image 3, according to the image acquisition sequence information. For each sequence picture, its semantic features may be extracted through a CNN and converted into a corresponding feature vector x, where the specific calculation process is shown in formula (1):
x = flatten(Σ_i maxPooling(X_in * w_i))  (1)
where * is the convolution operation (bit-wise multiplication followed by summation), X_in denotes one input sequence picture, w_i is the convolution kernel, and i is the channel index of the convolution kernel; for example, a convolution kernel of dimension (3, 3) may be selected. maxPooling is the max-pooling operation, which takes the maximum value within the feature-map region covered by the filter as the pooling-layer output. The feature maps output by the 3 channels can then be added bit-wise to obtain the final feature map, and a flatten function splices and flattens the output feature-map matrix along dimension 0 to obtain the vector representation x of one sequence picture.
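A naive NumPy sketch of this per-picture encoding (convolve with each kernel channel, max-pool, sum the channel feature maps bit-wise, then flatten). The kernels would be learned in practice, and plain cross-correlation stands in for the convolution, as in most deep-learning frameworks:

```python
import numpy as np

def conv2d_valid(x, w):
    # Naive 'valid' 2-D cross-correlation of picture x with kernel w.
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * w)
    return out

def max_pool(x, size=2):
    # Non-overlapping max pooling; edges that do not fill a window are dropped.
    oh, ow = x.shape[0] // size, x.shape[1] // size
    x = x[:oh * size, :ow * size]
    return x.reshape(oh, size, ow, size).max(axis=(1, 3))

def encode_sequence_picture(x_in, kernels):
    # Formula (1): x = flatten( sum_i maxPooling(X_in * w_i) )
    fmap = sum(max_pool(conv2d_valid(x_in, w)) for w in kernels)
    return fmap.flatten()
```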
The first vector representation of the medical image is then determined through the sequence feature extraction model according to the semantic features corresponding to each sequence picture. In a possible implementation, the semantic features extracted by the CNN may be arranged into a vector sequence {x_1, x_2, x_3} according to the sequence order; as shown in fig. 8, after the vector sequence is input into an LSTM network and its sequence features are extracted, the corresponding first vector representation h_c of the CT image, carrying the sequence features, is output. The specific calculation process is shown in formula (2):
h_i = tanh(U·x_i + W·h_{i-1})  (2)
where U and W are learnable weight parameters and h_i is the hidden-layer vector of the i-th layer of the LSTM network. Assuming the LSTM network includes 3 hidden layers, i = 1, 2, 3, and h_0 is the initialization vector of the LSTM hidden layer; tanh is a nonlinear activation function. In the present disclosure, the hidden-layer vector h_3 of the last layer of the LSTM network may be taken as the first vector representation h_c of the CT image. The foregoing examples are illustrative only, and the disclosure is not limited thereto.
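The recurrence of formula (2) can be sketched as a simple tanh RNN cell; a real LSTM adds gating, so this is only an illustration of how the last hidden state becomes the sequence-level representation h_c:

```python
import numpy as np

def sequence_feature(xs, U, W, h0=None):
    # Formula (2): h_i = tanh(U @ x_i + W @ h_{i-1}); the hidden state after
    # the last sequence picture serves as the image's first vector
    # representation. A plain tanh recurrence stands in for the LSTM here.
    h = np.zeros(W.shape[0]) if h0 is None else h0
    for x in xs:
        h = np.tanh(U @ x + W @ h)
    return h
```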
It should be noted that the above example shows only the sequence pictures obtained by dividing the medical image according to one kind of image acquisition sequence information. In another possible implementation of the present disclosure, if there are several kinds of image acquisition sequence information, a vector representation of the medical image may be obtained for the sequence pictures produced by each kind, and these vector representations may then be fused (for example, by weighted summation) to obtain the final first vector representation of the medical image.
In step S1322, a second vector representation of each node in the multimodal knowledge-graph is determined by the knowledge-graph vector characterization model.
As shown in fig. 8, the knowledge-graph vector characterization model may be, for example, a GAT (graph attention network) network, which may include a plurality of network layers connected in sequence.
In this step, the constructed multi-modal knowledge graph shown in fig. 6 may be input into the GAT network. Thus, for each network layer of the GAT, the correlation coefficient of the two nodes in each associated node pair in the multi-modal knowledge graph is calculated by that network layer, where an associated node pair is two nodes connected by one edge; as shown in fig. 6, the associated node pairs may include (node "user ID" → node "diabetes"), (node "user ID" → node "girl"), (node "user ID" → node "somnolence"), and so on. For a third node in each associated node pair, the vector representation corresponding to the third node at the network layer is determined according to the correlation coefficient of the associated node pair, the vector representation obtained by the third node at the network layer above, and the vector representation of the fourth node obtained at the network layer above, where the third node is one node in the associated node pair and the fourth node is the other node; taking the associated node pair (node "user ID" → node "somnolence") as an example, the third node may be the node "user ID" and the fourth node may be the node "somnolence". Thus, in the case where the network layer is the last network layer of the GAT, the vector representation corresponding to the third node at that network layer can be used as the second vector representation of the third node.
For example, in encoding the multi-modal knowledge graph, the present disclosure may go through two stages, "original encoding" and "fine-tuning encoding". In the original encoding stage, the multi-modal knowledge graph (in which the preset virtual node is only a simple symbolic node representing "medical image") may be input into the GAT network, and then, through each network layer of the GAT network, the correlation coefficient of the two nodes in each associated node pair in the multi-modal knowledge graph is calculated based on formula (3):
α_ij = exp(LeakyReLU(a^T [W·h_i || W·h_j])) / Σ_{k∈N_i} exp(LeakyReLU(a^T [W·h_i || W·h_k]))  (3)
where α_ij represents the correlation coefficient of the two nodes (i.e., node i and node j) in the associated node pair, W is a learnable parameter matrix, a is a learnable vector, h_i is the vector representation of node i in the graph, N_i is the set of all first-order neighbor nodes of node i (the nodes directly connected to node i by an edge), || is the concatenation operation, which concatenates two vectors in the last dimension, and LeakyReLU represents a nonlinear activation function.
Then, for the third node in each associated node pair, the vector representation of the third node at each network layer can be calculated by the following formula (4):
h_i' = ReLU( (1/K) Σ_{k=1}^{K} Σ_{j∈N_i∪{i}} α_ij^k · W^k · h_j )  (4)
where h_i' represents the vector representation of the third node i calculated by the current network layer; h_i represents the vector representation obtained by the third node i at the network layer above the current network layer, and h_j represents the vector representation obtained by the fourth node j at the network layer above the current network layer; α_ij^k is the correlation coefficient calculated by the k-th head; K represents the number of heads used by the GAT network; W^k is the learnable parameter matrix of the k-th head; and ReLU is a nonlinear activation function.

In this way, when it is determined that the current network layer is the last network layer of the GAT, the vector representation h_i' corresponding to the third node i at the current network layer may be used as the second vector representation of the third node i.
The foregoing examples are illustrative only, and the disclosure is not limited thereto.
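Under the definitions above, one GAT network layer combining formulas (3) and (4) can be sketched in NumPy as follows (a sketch only: the dense adjacency layout, head shapes, and function names are illustrative assumptions):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, adj, W_heads, a_heads):
    """One GAT network layer per formulas (3) and (4).

    H: (n, d) node vectors from the network layer above.
    adj: (n, n) 0/1 adjacency with self-loops, so row i marks N_i ∪ {i}.
    W_heads: K matrices of shape (d_out, d); a_heads: K vectors of shape (2*d_out,).
    """
    n = H.shape[0]
    K = len(W_heads)
    out = 0.0
    for Wk, ak in zip(W_heads, a_heads):
        Hw = H @ Wk.T  # W·h for every node
        # e_ij = LeakyReLU(a^T [W·h_i || W·h_j]) — the scores inside formula (3)
        e = leaky_relu(np.array([[ak @ np.concatenate([Hw[i], Hw[j]])
                                  for j in range(n)] for i in range(n)]))
        e = np.where(adj > 0, e, -np.inf)              # restrict to neighbors N_i
        alpha = np.exp(e - e.max(axis=1, keepdims=True))
        alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax of formula (3)
        out = out + alpha @ Hw                          # Σ_j α_ij^k · W^k · h_j
    return np.maximum(out / K, 0.0)                     # ReLU of the K-head average, formula (4)
```

Stacking several such layers and reading off the last layer's output yields the second vector representation of each node.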
It should be noted that, for each network layer of the GAT, in the case that the network layer is not the last layer of the GAT network, the correlation coefficient corresponding to each target associated node pair in the multi-modal knowledge graph calculated by that network layer may be obtained, where a target associated node pair includes the central node and a node connected to the central node; for example, the "user ID" node in fig. 6 is the central node, and each first node directly connected to the "user ID" node forms a target associated node pair with the "user ID" central node. Then, the correlation coefficient of a designated node pair among the target associated node pairs is updated according to the correlation coefficients respectively corresponding to the target associated node pairs, where one node in the designated node pair is the central node and the other node is the preset virtual node connected to the central node; as shown in fig. 6, the "medical image" node is the preset virtual node, and the designated node pair is ("user ID" → "medical image") in fig. 6. The correlation coefficient of the designated node pair can be updated in the following manner:
determining the maximum correlation coefficient from the correlation coefficients respectively corresponding to the target associated node pairs, and taking a preset multiple of the maximum correlation coefficient as the target correlation coefficient, wherein the preset multiple is greater than or equal to a preset numerical value; for example, the preset numerical value may be 1 and the preset multiple may be 1.5. The correlation coefficient of the designated node pair is then updated to the target correlation coefficient, after which the correlation coefficients corresponding to the target associated node pairs may be normalized. In this way, the contribution rate of the image features (i.e., the semantic features and sequence features) corresponding to the "medical image" node in the multi-modal knowledge graph to the classification task can be significantly increased, further improving the accuracy of classification.
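The coefficient update described above can be sketched as follows (the dict layout, function name, and the renormalization step are illustrative assumptions):

```python
def boost_virtual_node(coeffs, virtual_key, preset_multiple=1.5):
    """Update the correlation coefficient of the designated node pair
    (central node -> preset virtual "medical image" node) to a preset
    multiple (>= the preset numerical value 1) of the maximum coefficient
    among the target associated node pairs, then renormalize so the
    coefficients again sum to 1."""
    coeffs = dict(coeffs)
    coeffs[virtual_key] = preset_multiple * max(coeffs.values())
    total = sum(coeffs.values())
    return {k: v / total for k, v in coeffs.items()}
```

After the update, the "medical image" node always carries the largest coefficient in the central node's neighborhood, which is what raises the contribution of the image features in the next network layer.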
The vector representation of each node may then be calculated by the next network layer of the network layer based on the updated correlation coefficient (e.g., may be calculated by equation 4), and in the case where the next network layer is the last network layer of the GAT, the vector representation of each node is taken as the second vector representation.
In step S1323, a target vector representation of the multimodal knowledge map is determined from the first vector representation and the second vector representation.
Based on the method provided in step S1322, the second vector representation of each node in the multi-modal knowledge graph may be determined. It should be particularly noted that the preset virtual node representing "medical image" in the multi-modal knowledge graph also obtains a corresponding vector representation; however, since this node is only a symbolic node, in this step the second vector representation corresponding to the preset virtual node may be replaced with the first vector representation of the medical image encoded in step S1321, so as to obtain the updated second vector representation of the preset virtual node (i.e., the fine-tuning stage). The target vector representation may then be determined according to the preset weight corresponding to the preset virtual node and the second vector representation of each node.
Illustratively, the target vector representation of the multimodal knowledge-graph can be calculated by the following equation (5):
h_G = β·h_m + (1 − β)·(1/|G|)·Σ_{i∈G} h_i  (5)

where h_G is the target vector representation of the multi-modal knowledge graph, h_m is the updated second vector representation of the "medical image" node, h_i is the second vector representation of each of the other nodes in the graph, and G is the set of the other nodes in the graph. β is a hyperparameter and may be set, for example, to 70%.
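Assuming formula (5) combines the image-node vector (with preset weight β) with the mean of the remaining node vectors — the exact pooling of the other nodes is an assumption here — the readout can be sketched as:

```python
import numpy as np

def graph_readout(h_image, other_nodes, beta=0.7):
    """Target vector representation per formula (5): a convex combination of
    the updated 'medical image' node vector (weight beta, e.g. 70%) and the
    mean of the second vector representations of the remaining nodes."""
    mean_others = np.mean(np.asarray(other_nodes), axis=0)
    return beta * h_image + (1.0 - beta) * mean_others
```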
In step S1324, the medical image is classified by the classifier according to the target vector representation.
The classifier may include, for example, a DNN (Deep Neural Network).
In this step, as shown in fig. 8, the target vector representation of the multi-modal knowledge-map (i.e., the global representation of the multi-modal knowledge-map) may be input into the DNN deep neural network, and then the classification result of the medical image may be output through the DNN deep neural network.
In one possible implementation, the DNN deep neural network may be designed as a 10-layer network, for example, and model training is performed using a cross entropy loss function; the specific calculation process is shown in formula (6) and formula (7):
h_pred = W_9 · ReLU(… ReLU(W_1 · ReLU(W_0 · h_G)) …)  (6)

loss = CrossEntropy(h_pred, y_label)  (7)

where h_pred is the prediction value output by the DNN, ReLU is a nonlinear activation function, W_0 … W_9 are learnable parameter matrices, loss is the loss function value, CrossEntropy is the cross entropy loss function, and y_label is the training label value.
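A minimal sketch of formulas (6) and (7): a stack of linear layers with ReLU activations between them, followed by softmax cross entropy against the label index (layer count, sizes, and function names are illustrative):

```python
import numpy as np

def dnn_forward(h_G, weights):
    """Formula (6): stacked linear layers with ReLU between them; the last
    matrix (W_9 in the 10-layer design) outputs the class scores h_pred."""
    h = h_G
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)   # ReLU(W_l · h)
    return weights[-1] @ h           # h_pred

def cross_entropy(h_pred, y_label):
    """Formula (7): softmax cross entropy of h_pred against the label index."""
    z = h_pred - h_pred.max()        # stabilized log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[y_label]
```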
By adopting the above method, the medical image can be classified according to data of different modalities: the user's medical imaging result, the clinical diagnosis and treatment information related to the medical image, and medical prior knowledge. Because data of different modalities contain different semantic information, classifying the medical image based on multi-modal medical data can significantly improve the accuracy of the classification result compared with classifying only by the one-dimensional image features of the medical image.
Fig. 9 is a block diagram illustrating a medical image classification apparatus according to an exemplary embodiment, as shown in fig. 9, the apparatus including:
a first obtaining module 901, configured to obtain metadata information corresponding to a medical image to be classified;
a second obtaining module 902, configured to obtain, according to the metadata information, medical text information corresponding to the medical image, where the medical text information includes clinical diagnosis and treatment information and medical priori knowledge;
a classification module 903, configured to classify the medical image according to the medical image and the medical text information through a pre-trained target classification model.
Optionally, the metadata information includes user identification information of a target user corresponding to the medical image, and the second obtaining module 902 is configured to determine the clinical diagnosis and treatment information of the target user according to the user identification information; and determine the medical prior knowledge associated with the target user according to the clinical diagnosis and treatment information.
Optionally, the classification module 903 is configured to construct a multi-modal knowledge graph according to the medical image, the clinical diagnosis and treatment information, and the medical priori knowledge, where the multi-modal knowledge graph represents image features of the medical image and text features of the medical text information; and classifying the medical image through the target classification model according to the multi-modal knowledge graph.
Optionally, the classification module 903 is configured to construct a first knowledge graph corresponding to a target user according to the clinical diagnosis and treatment information, where the first knowledge graph includes a central node and a plurality of first nodes respectively connected to the central node, the central node is a node corresponding to the target user, and the first nodes represent the diagnosis and treatment information of the target user;
constructing a second knowledge graph on the basis of the first knowledge graph according to the medical prior knowledge, wherein the second knowledge graph comprises the central node, a plurality of first nodes respectively connected with the central node and a plurality of second nodes respectively connected with the first nodes, and the second nodes represent the medical prior knowledge related to the target user;
and adding a preset virtual node corresponding to the medical image on the second knowledge graph to generate the multi-mode knowledge graph, wherein the preset virtual node is connected with the central node.
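The three construction steps above (first knowledge graph, second knowledge graph, preset virtual node) can be sketched as a simple edge list; the function name and data layout are assumptions for illustration:

```python
def build_multimodal_graph(user_id, clinical_info, prior_knowledge):
    """Build the multi-modal knowledge graph of this embodiment: a central
    node for the target user, first nodes for the clinical diagnosis and
    treatment information, second nodes for the related medical prior
    knowledge hung off the first nodes, plus a preset virtual 'medical
    image' node connected to the central node."""
    # First knowledge graph: central node -> first nodes
    edges = [(user_id, item) for item in clinical_info]
    # Second knowledge graph: first nodes -> second nodes (prior knowledge)
    edges += [(item, fact) for item, facts in prior_knowledge.items()
              for fact in facts]
    # Preset virtual node connected to the central node
    edges.append((user_id, "medical image"))
    return edges
```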
Optionally, the target classification model includes a medical image vector characterization model, a knowledge-graph vector characterization model connected to the medical image vector characterization model, and a classifier connected to the knowledge-graph vector characterization model, the classification module 903 is configured to determine a first vector representation of the medical image through the medical image vector characterization model according to the metadata information, where the first vector representation represents a semantic feature and a sequence feature of the medical image; determining, by the knowledge-graph vector characterization model, a second vector representation for each node in the multi-modal knowledge-graph; determining a target vector representation of the multi-modal knowledge-graph from the first vector representation and the second vector representation; classifying the medical image by the classifier according to the target vector representation.
Optionally, the medical image vector characterization model includes a semantic feature extraction model and a sequence feature extraction model connected to the semantic feature extraction model, and the classification module 903 is configured to read image acquisition sequence information corresponding to the medical image from the metadata information; dividing the medical image into a plurality of sequence pictures according to the image acquisition sequence information; for each sequence picture, extracting the semantic features of the sequence picture through the semantic feature extraction model; and determining the first vector representation of the medical image through the sequence feature extraction model according to the semantic features respectively corresponding to each sequence picture.
Optionally, the knowledge-graph vector characterization model includes a graph attention network GAT, the GAT includes a plurality of sequentially connected network layers, and the classification module 903 is configured to calculate, for each network layer of the GAT, a correlation coefficient of two nodes in an associated node pair in the multi-modal knowledge-graph through the network layer, where the associated node pair is two nodes connected by one edge; for a third node in each associated node pair, determine a vector representation corresponding to the third node in the network layer according to the correlation coefficient of the associated node pair, a vector representation obtained by the third node in the previous network layer, and a vector representation of a fourth node obtained by the previous network layer, where the third node is one node in the associated node pair and the fourth node is the other node in the associated node pair; and if the network layer is the last layer of the GAT, take the vector representation corresponding to the third node in the network layer as the second vector representation of the third node.
Optionally, the classifying module 903 is configured to, when the network layer is not the last layer network of the GAT, obtain a correlation coefficient corresponding to each target associated node pair in the multi-modal knowledge graph calculated by the network layer, where the target associated node pair includes a central node and a node connected to the central node; updating the correlation coefficient of a designated node pair in the target associated node pair according to the correlation coefficient corresponding to each target associated node pair, wherein one node in the designated node pair is the central node, and the other node in the designated node pair is a preset virtual node connected with the central node; and calculating the vector representation of each node through the next network layer of the network layers according to the updated correlation coefficient, and taking the vector representation of each node as the second vector representation under the condition that the next network layer is the last network layer of the GAT.
Optionally, the classification module 903 is configured to determine a maximum correlation coefficient from correlation coefficients respectively corresponding to each target associated node pair; taking a preset multiple of the maximum correlation coefficient as a target correlation coefficient, wherein the preset multiple is greater than or equal to a preset numerical value; and updating the correlation coefficient of the designated node pair to be the target correlation coefficient.
Optionally, the classification module 903 is configured to replace the second vector representation corresponding to the preset virtual node with the first vector representation to obtain an updated second vector representation of the preset virtual node; determining the target vector representation according to a preset weight corresponding to the preset virtual node and the second vector representation of each node.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
Fig. 10 is a block diagram illustrating an electronic device 1000 in accordance with an example embodiment. As shown in fig. 10, the electronic device 1000 may include: a processor 1001 and a memory 1002. The electronic device 1000 may also include one or more of a multimedia component 1003, an input/output (I/O) interface 1004, and a communications component 1005.
The processor 1001 is configured to control the overall operation of the electronic device 1000, so as to complete all or part of the steps in the medical image classification method. The memory 1002 is used to store various types of data to support operation of the electronic device 1000, such as instructions for any application or method operating on the electronic device 1000 and application-related data, such as contact data, messaging, pictures, audio, video, and so forth. The memory 1002 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk. The multimedia components 1003 may include screen and audio components. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving an external audio signal. The received audio signals may further be stored in the memory 1002 or transmitted through the communication component 1005. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 1004 provides an interface between the processor 1001 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The communication component 1005 is used for wired or wireless communication between the electronic device 1000 and other devices. Wireless communication may include, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or combinations thereof, which is not limited herein.
The corresponding communication component 1005 may thus include: a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic device 1000 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-mentioned medical image classification method.
In another exemplary embodiment, a computer readable storage medium including program instructions for implementing the steps of the medical image classification method described above when executed by a processor is also provided. For example, the computer readable storage medium may be the memory 1002 comprising program instructions executable by the processor 1001 of the electronic device 1000 to perform the medical image classification method described above.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the medical image classification method described above when executed by the programmable apparatus.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that, in the above embodiments, the various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations will not be further described in the present disclosure.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (13)

1. A method for classifying medical images, the method comprising:
acquiring metadata information corresponding to medical images to be classified;
acquiring medical text information corresponding to the medical image according to the metadata information, wherein the medical text information comprises clinical diagnosis and treatment information and medical prior knowledge;
and classifying the medical images according to the medical images and the medical text information through a target classification model obtained through pre-training.
2. The method according to claim 1, wherein the metadata information includes user identification information of a target user corresponding to the medical image, and the obtaining medical text information corresponding to the medical image according to the metadata information includes:
determining the clinical diagnosis and treatment information of the target user according to the user identification information;
determining the medical prior knowledge associated with the target user according to the clinical diagnosis and treatment information.
3. The method according to claim 1 or 2, wherein the classifying the medical image according to the medical image and the medical text information through a target classification model obtained through pre-training comprises:
constructing a multi-mode knowledge graph according to the medical image, the clinical diagnosis and treatment information and the medical priori knowledge, wherein the multi-mode knowledge graph represents image features of the medical image and text features of the medical text information;
and classifying the medical image through the target classification model according to the multi-modal knowledge graph.
4. The method of claim 3, wherein constructing a multi-modal knowledge-map from the medical imagery, the clinical diagnosis and treatment information, and the medical prior knowledge comprises:
constructing a first knowledge graph corresponding to a target user according to the clinical diagnosis and treatment information, wherein the first knowledge graph comprises a central node and a plurality of first nodes which are respectively connected with the central node, the central node is a node corresponding to the target user, and the first nodes represent the diagnosis and treatment information of the target user;
constructing a second knowledge graph on the basis of the first knowledge graph according to the medical prior knowledge, wherein the second knowledge graph comprises the central node, a plurality of first nodes respectively connected with the central node and a plurality of second nodes respectively connected with the first nodes, and the second nodes represent the medical prior knowledge related to the target user;
and adding a preset virtual node corresponding to the medical image on the second knowledge graph to generate the multi-mode knowledge graph, wherein the preset virtual node is connected with the central node.
5. The method of claim 4, wherein the target classification model comprises a medical image vector characterization model, a knowledge-graph vector characterization model connected to the medical image vector characterization model, and a classifier connected to the knowledge-graph vector characterization model, and wherein classifying the medical image according to the multi-modal knowledge graph through the target classification model comprises:
determining a first vector representation of the medical image through the medical image vector characterization model according to the metadata information, the first vector representation representing semantic features and sequence features characterizing the medical image;
determining, by the knowledge-graph vector characterization model, a second vector representation for each node in the multi-modal knowledge-graph;
determining a target vector representation of the multi-modal knowledge-graph from the first vector representation and the second vector representation;
classifying the medical image by the classifier according to the target vector representation.
6. The method according to claim 5, wherein the medical image vector characterization model comprises a semantic feature extraction model, and a sequence feature extraction model connected to the semantic feature extraction model, and the determining the first vector representation of the medical image by the medical image vector characterization model according to the metadata information comprises:
reading image acquisition sequence information corresponding to the medical image from the metadata information;
dividing the medical image into a plurality of sequence pictures according to the image acquisition sequence information;
extracting semantic features of the sequence pictures through the semantic feature extraction model aiming at each sequence picture;
and determining the first vector representation of the medical image through the sequence feature extraction model according to the semantic features respectively corresponding to each sequence picture.
7. The method of claim 5, wherein the knowledge-graph vector characterization model comprises a graph attention network (GAT), wherein the GAT comprises a plurality of sequentially connected network layers, and wherein determining the second vector representation for each node in the multi-modal knowledge-graph through the knowledge-graph vector characterization model comprises:
for each network layer of the GAT, calculating a correlation coefficient of two nodes in an associated node pair in the multi-modal knowledge graph through the network layer; the associated node pair is two nodes connected by one edge;
for a third node in each associated node pair, determining a vector representation corresponding to the third node in the network layer according to the correlation coefficient of the associated node pair, a vector representation obtained by the third node in the previous network layer, and a vector representation of a fourth node obtained by the previous network layer, where the third node is one node in the associated node pair, and the fourth node is the other node in the associated node pair;
and if the network layer is the last layer of the GAT, taking the vector representation corresponding to the third node in the network layer as the second vector representation of the third node.
8. The method of claim 7, further comprising:
under the condition that the network layer is not the last layer of the GAT, acquiring a correlation coefficient corresponding to each target associated node pair in the multi-modal knowledge graph calculated by the network layer, wherein the target associated node pair comprises a central node and a node connected with the central node;
updating the correlation coefficient of a designated node pair in the target associated node pair according to the correlation coefficient corresponding to each target associated node pair, wherein one node in the designated node pair is the central node, and the other node in the designated node pair is a preset virtual node connected with the central node;
and calculating the vector representation of each node through the next network layer of the network layers according to the updated correlation coefficient, and taking the vector representation of each node as the second vector representation under the condition that the next network layer is the last network layer of the GAT.
9. The method of claim 8, wherein updating the correlation coefficients for a given node pair in the target associated node pair according to the respective correlation coefficients for each of the target associated node pairs comprises:
determining a maximum correlation coefficient from the correlation coefficients respectively corresponding to each target associated node pair;
taking a preset multiple of the maximum correlation coefficient as a target correlation coefficient, wherein the preset multiple is greater than or equal to a preset numerical value;
and updating the correlation coefficient of the designated node pair to be the target correlation coefficient.
10. The method of claim 5, wherein the determining a target vector representation of the multi-modal knowledge-graph from the first vector representation and the second vector representation comprises:
replacing the second vector representation corresponding to the preset virtual node with the first vector representation to obtain a second vector representation after the preset virtual node is updated;
and determining the target vector representation according to the preset weight corresponding to the preset virtual node and the second vector representation of each node.
11. A medical image classification apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring metadata information corresponding to the medical images to be classified;
the second acquisition module is used for acquiring medical text information corresponding to the medical image according to the metadata information, wherein the medical text information comprises clinical diagnosis and treatment information and medical priori knowledge;
and the classification module is used for classifying the medical images according to the medical images and the medical text information through a target classification model obtained through pre-training.
12. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
13. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 10.
CN202211517079.6A 2022-11-29 2022-11-29 Medical image classification method and device, storage medium and electronic equipment Pending CN115761371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211517079.6A CN115761371A (en) 2022-11-29 2022-11-29 Medical image classification method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211517079.6A CN115761371A (en) 2022-11-29 2022-11-29 Medical image classification method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115761371A true CN115761371A (en) 2023-03-07

Family

ID=85341404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211517079.6A Pending CN115761371A (en) 2022-11-29 2022-11-29 Medical image classification method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115761371A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116741350A (en) * 2023-08-16 2023-09-12 枣庄市山亭区妇幼保健院 File management system for hospital X-ray images
CN116741350B (en) * 2023-08-16 2023-10-31 枣庄市山亭区妇幼保健院 File management system for hospital X-ray images

Similar Documents

Publication Publication Date Title
Majid et al. Classification of stomach infections: A paradigm of convolutional neural network along with classical features fusion and selection
CN110807495B (en) Multi-label classification method, device, electronic equipment and storage medium
CN109919928B (en) Medical image detection method and device and storage medium
Sun et al. Anatomical attention guided deep networks for ROI segmentation of brain MR images
TW202112299A (en) Image processing method, electronic device and computer-readable storage medium
EP3507767A1 (en) Multi-modal medical image processing
CN111104962A (en) Semantic segmentation method and device for image, electronic equipment and readable storage medium
US11403343B2 (en) Retrieval of video and vehicle behavior for a driving scene described in search text
CN110752028A (en) Image processing method, device, equipment and storage medium
CN112767329A (en) Image processing method and device and electronic equipment
CN115953665B (en) Target detection method, device, equipment and storage medium
CN111091010A (en) Similarity determination method, similarity determination device, network training device, network searching device and storage medium
Wang et al. Context-aware spatio-recurrent curvilinear structure segmentation
CN115761371A (en) Medical image classification method and device, storage medium and electronic equipment
Raut et al. Transfer learning based video summarization in wireless capsule endoscopy
CN115170464A (en) Lung image processing method and device, electronic equipment and storage medium
Kharrat et al. Classification of brain tumors using personalized deep belief networks on MRI images: PDBN-MRI
Saha Classification of Parkinson’s disease using MRI data and deep learning convolution neural networks
CN115965785A (en) Image segmentation method, device, equipment, program product and medium
CN115115826A (en) Feature selection and extraction method and device, anomaly detection model and construction method thereof
CN115132359A (en) Prediction model construction method and device, prediction method and device, and electronic device
CN113724184A (en) Cerebral hemorrhage prognosis prediction method and device, electronic equipment and storage medium
Zhuang et al. Pose prediction of textureless objects for robot bin picking with deep learning approach
Xu et al. Feature fusion capsule network for cow face recognition
Bansal et al. A superpixel powered autoencoder technique for detecting dementia

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination