WO2023207743A1 - Image detection method and apparatus, computer device, storage medium, and program product - Google Patents

Image detection method and apparatus, computer device, storage medium, and program product

Info

Publication number
WO2023207743A1
WO2023207743A1 (application PCT/CN2023/089441; CN2023089441W)
Authority
WO
WIPO (PCT)
Prior art keywords
image set
image
missing
training
modality
Prior art date
Application number
PCT/CN2023/089441
Other languages
English (en)
French (fr)
Inventor
刘洪
魏东
卢东焕
王连生
郑冶枫
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Publication of WO2023207743A1 publication Critical patent/WO2023207743A1/zh

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/0002 — Inspection of images, e.g. flaw detection
    • G06T 7/0012 — Biomedical image inspection
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/08 — Learning methods
    • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 — Image acquisition modality
    • G06T 2207/10072 — Tomographic images
    • G06T 2207/10088 — Magnetic resonance imaging [MRI]
    • G06T 2207/20 — Special algorithmic details
    • G06T 2207/20081 — Training; Learning

Definitions

  • the present application relates to the field of artificial intelligence, and in particular to an image detection method, device, computer equipment, storage medium and program product.
  • Magnetic resonance imaging (MRI) images are obtained through magnetic resonance imaging technology, which uses static and radio-frequency magnetic fields to image human tissue. The imaging process uses neither ionizing radiation nor contrast agents, yet yields clear, high-contrast images. MRI can reveal organ abnormalities and early lesions at the molecular level. MRI data generally comprise multiple sequences; for example, an MRI scan may include a fluid-attenuated inversion recovery (FLAIR) sequence and T1, T1c and T2 sequences. The images of these different sequences depict different tissues and highlight different lesion areas.
  • In the related art, lesions or abnormal areas in MRI images are generally identified manually by doctors at their workstations, which leads to a certain rate of false or missed detections.
  • Embodiments of the present application provide an image detection method, device, computer equipment and storage medium, which can intelligently detect multi-modal images and assist in the identification of lesions or abnormal areas.
  • embodiments of the present application disclose an image detection method, which is executed in a computer device.
  • the method includes:
  • the first image set includes images of at least one modality, and the images of each modality are medical images of the corresponding modality;
  • the image missing state means that the first image set satisfies at least one of the following conditions: the modalities corresponding to the first image set number fewer than predetermined N modalities, where N is a positive integer greater than 1; and the image of at least one modality in the first image set is missing a local region;
  • the missing description information is determined, wherein the missing description information is used to indicate at least one of the following: at least one modality that is missing from the first image set relative to the N modalities, and the region of the missing image in the image of at least one modality in the first image set;
  • the first reference image set includes reference images of the N modalities, and the reference image of each modality in the first reference image set is used to represent the specific information corresponding to each modality;
  • an image detection device which includes:
  • An acquisition unit configured to acquire a first image set, where the first image set includes images of at least one modality, and the images of each modality are medical images of the corresponding modality;
  • a determining unit configured to detect whether the first image set is in an image missing state, wherein the image missing state means that the first image set satisfies at least one of the following conditions: the modalities corresponding to the first image set number fewer than the predetermined N modalities, where N is a positive integer greater than 1; and the image of at least one modality in the first image set is missing a partial image;
  • a processing unit configured to:
  • determine the missing description information, wherein the missing description information is used to indicate at least one of the following: at least one modality that is missing from the first image set relative to the N modalities, and a local area of the missing image in at least one modality of the first image set;
  • the first reference image set includes reference images of the N modalities, and the reference image of each modality in the first reference image set is used to represent the specific information corresponding to each modality.
  • embodiments of the present application also disclose a computer device.
  • the computer device includes: a processor, adapted to implement one or more computer programs; and a computer storage medium, where the computer storage medium stores one or more computer programs, the one or more computer programs being adapted to be loaded by the processor to execute the above-mentioned image detection method.
  • embodiments of the present application also disclose a computer-readable storage medium that stores one or more computer programs, and the one or more computer programs are suitable for being loaded and executed by a processor to perform the above-mentioned image detection method.
  • embodiments of the present application also disclose a computer program product or computer program.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the above-mentioned image detection method.
  • Figure 1 is a schematic architectural diagram of an image detection system disclosed in an embodiment of the present application.
  • Figure 2 is a schematic diagram of an application scenario of image detection disclosed in the embodiment of the present application.
  • Figure 3a is a schematic diagram of one of the image collection relationships disclosed in the embodiment of the present application.
  • Figure 3b is a schematic diagram of multiple modal missing situations disclosed in the embodiment of the present application.
  • Figure 4 is a schematic flow chart of an image detection method disclosed in the embodiment of the present application.
  • Figure 5 is a schematic diagram of an image segmentation result disclosed in an embodiment of the present application.
  • Figure 6 is a schematic diagram of another image segmentation result disclosed in the embodiment of the present application.
  • Figure 7 is a schematic interface diagram of an image detection method disclosed in the embodiment of the present application.
  • Figure 8 is a training framework diagram for the image detection method disclosed in the embodiment of the present application.
  • Figure 9 is a schematic flowchart of pre-training for the image detection model disclosed in the embodiment of the present application.
  • Figure 10a shows synthesized full-modality image data disclosed in the embodiment of the present application.
  • Figure 10b is an implementation effect diagram disclosed in the embodiment of the present application.
  • Figure 11 is a schematic flowchart of the fine-tuning of the image detection model disclosed in the embodiment of the present application.
  • Figure 12 is a schematic structural diagram of an image detection device disclosed in an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of a computer device disclosed in an embodiment of the present application.
  • the image detection method provided by the embodiment of the present application is used to identify abnormal areas in some images to be detected.
  • a first reference image set and an image detection model are designed.
  • the first reference image set is used to complete the missing images.
  • the image detection model can also restore (reconstruct) the missing image data and perform object recognition to determine abnormal areas in the image, such as lesion regions.
  • AI: Artificial Intelligence.
  • AI can be used to assist doctors and other users in observing images and identifying lesions, reducing missed or erroneous detection of abnormalities (such as lesion areas) in MRI and other images.
  • AI is a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a manner similar to human intelligence.
  • Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive subject that covers a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
  • Magnetic resonance imaging (MRI): a relatively new medical imaging technology that uses static and radio-frequency magnetic fields to image human tissue. The imaging process uses neither ionizing radiation nor contrast media, yet obtains clear, high-contrast images. It can reveal organ abnormalities and early lesions at the molecular level, and is superior to X-ray CT in many respects.
  • MRI images generally include images of multiple modalities, such as FLAIR, T1, T1c, T2, etc. These different modalities can be used to highlight different lesion areas.
  • Missing modality: in clinical applications, one or more MRI modalities are often missing due to image damage, artifacts, the cost of acquisition protocols, and so on.
  • Masked Autoencoder (MAE): as an image self-supervision framework, the masked autoencoder has achieved great success in the self-supervised field. Its pretext task guides the model to restore the original pixel values of an image from the small subset of patches left visible.
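By way of illustration only (this is not the patent's implementation), the masking half of this pretext task can be sketched as follows; the patch size, mask ratio and seed are arbitrary choices for the example:

```python
import numpy as np

def random_mask_patches(image, patch=4, mask_ratio=0.75, seed=0):
    """Split a square image into non-overlapping patches and zero out a
    random subset; return the masked image and the boolean patch mask."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    ph, pw = h // patch, w // patch
    n_patches = ph * pw
    n_masked = int(n_patches * mask_ratio)
    masked_idx = rng.choice(n_patches, size=n_masked, replace=False)
    mask = np.zeros(n_patches, dtype=bool)
    mask[masked_idx] = True
    out = image.copy()
    for idx in np.flatnonzero(mask):
        r, c = divmod(idx, pw)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return out, mask.reshape(ph, pw)

img = np.ones((16, 16))
masked, mask = random_mask_patches(img)
# 12 of the 16 patches are zeroed; the model would be trained to restore them
```

A real MAE would feed only the visible patches to the encoder and reconstruct the masked ones; this sketch shows just the masking step.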
  • MI: Model Inversion.
  • Model inversion has long been used in the field of deep-learning interpretability. Its goal is to synthesize the images a network considers most representative of a given prediction, such as saliency maps for classification.
  • SD: Self Distillation.
  • Self-distillation uses supervised learning to distill knowledge. Compared with the original knowledge distillation method, the teacher model and student model are one and the same model; that is, one model guides itself in learning and completes the knowledge distillation.
  • Multimodal masked autoencoder (M³AE): the abbreviation for the image detection model proposed in this application. It masks and restores multi-modal image data with an autoencoder to simultaneously learn the correlations between different modalities and the structural relationships within images.
  • GFLOPS: Giga Floating-point Operations Per Second.
  • Floating-point values are numbers with decimals, and floating-point operations are arithmetic on such numbers. The rate of these operations is often used to measure or estimate computer performance, especially in scientific computing, which uses a large number of floating-point operations. It is mainly used here to characterize the training cost of the model of this application.
  • Figure 1 is a schematic architectural diagram of an image detection system disclosed in an embodiment of the present application.
  • the architecture diagram 100 of the image detection system may include a terminal device 101 and a server 102, where the server 102 can be set up in the cloud 103.
  • the terminal device 101 is mainly used to receive the image set to be detected (for example, the first image set) and the segmentation results (such as the lesion area) corresponding to the image set to be detected in the embodiment of the present application.
  • the server 102 is mainly used to deploy the image detection model in the embodiment of the present application, so that the image detection model can detect and segment the image collection to be detected and obtain a segmentation result.
  • the server 102 can also be responsible for training the image detection model.
  • the terminal device 101 obtains a set of images to be detected.
  • the set of images to be detected includes N modalities, each modality includes one or more images, and the image of each modality is a sequence of images.
  • N is an integer greater than or equal to 1. The terminal device 101 then sends the set of images to be detected to the server 102, and the server 102 detects the set of images to be detected; if the set is detected to be in an image missing state, its missing description information is determined.
  • the server 102 uses the reference image set to complete the missing image area in the image set to be detected, and obtains the target detection image set, where the reference image set includes N modalities, and each modality includes One or more reference images.
  • the reference image of each modality is used to represent the specific information corresponding to each modality; the server 102 then calls the image detection model to perform image detection and segmentation on the target detection image set to obtain the segmentation results corresponding to the image set to be detected.
  • Figure 3a is a schematic diagram of one of the image collection relationships disclosed in the embodiment of the present application.
  • the image missing state of the image collection to be detected may be that images of one or more modalities are missing.
  • the image to be detected in the missing state is missing the image of the modality in the upper left corner, that is, the position of the dotted box.
  • the filling operation fills these missing parts from an optimized reference image set (such as the first reference image set). Specifically, each modality missing from the image set is filled with the image at the corresponding position in the reference image set; for example, the sequence in the upper left corner of the image set to be detected in Figure 3a is filled by reference image sequence 1 in the upper left corner of the reference image set to obtain the final target detection image set (for example, the second image set).
  • the image detection model and the reference image set are obtained by the server 102 through training and optimization.
  • the image detection model is trained based on the full-modal training image set and the missing training image set obtained by applying area masking to the full-modal training image set; the reference image set is obtained by optimizing the initial reference image set together with the missing training image set.
  • the initial reference image set includes N modalities, each modality includes one or more reference images, the value of each pixel on these reference images is a value to be optimized, and optimizing these pixel values yields the first reference image set.
  • an image detection scenario is described, as shown in Figure 2.
  • the image set to be detected can be multi-modal image data, in which any zero to multiple modalities may be missing.
  • the trained image detection model can directly output segmentation results of abnormal or lesion areas, such as segmented brain tumor areas.
  • the segmentation results are specifically distinguished by different colors.
  • area 200, area 201 and area 202 shown in Figure 2 are rendered here in grayscale; in fact, the different areas in 200, 201 and 202 have different colors.
  • Area 200 (for example, purple) is a background color unrelated to the lesion, area 201 (for example, blue) represents edema, and area 202 (for example, yellow) represents enhancing tumor; in some cases an additional area (for example, green), not shown in Figure 2, represents necrosis and the non-enhancing tumor core.
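As an editorial sketch (the label values and palette below are hypothetical, merely mirroring the color description above), mapping segmentation labels to display colors can be done as follows:

```python
import numpy as np

# Hypothetical label-to-color mapping echoing the description above:
# 0 background, 1 edema, 2 enhancing tumor, 3 necrosis/non-enhancing core.
PALETTE = {0: (128, 0, 128), 1: (0, 0, 255), 2: (255, 255, 0), 3: (0, 255, 0)}

def colorize(labels):
    """Turn an integer label map into an RGB image using PALETTE."""
    rgb = np.zeros(labels.shape + (3,), dtype=np.uint8)
    for k, color in PALETTE.items():
        rgb[labels == k] = color
    return rgb

seg = np.array([[0, 1], [2, 3]])
rendered = colorize(seg)
# each label pixel becomes its palette color, e.g. label 1 -> blue (0, 0, 255)
```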
  • a pre-trained image detection model is also obtained before the image detection model is finalized. Based on this, in addition to the label information, the segmentation results can also include a full-modality restored image set, restored after the image set to be detected (which is in an image missing state) has been completed and processed.
  • the full-modal restored image set is obtained by the pre-trained image detection model.
  • Figure 3b is a schematic diagram of multiple modality-missing situations disclosed in the embodiment of the present application. It illustrates that when the full modality includes 4 modalities, 14 missing-modality cases can occur; together with the full-modality case (that is, no modality missing), there are 15 situations in total.
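The count of 15 situations follows from enumerating the non-empty subsets of 4 modalities. A small sketch (the modality names are taken from the sequences mentioned earlier; the helper is illustrative only):

```python
from itertools import combinations

MODALITIES = ("FLAIR", "T1", "T1c", "T2")

def missing_cases(modalities=MODALITIES):
    """Every strict, non-empty subset of modalities that could be present,
    i.e. the 14 missing-modality cases; the full set is the 15th situation."""
    n = len(modalities)
    cases = []
    for k in range(1, n):  # at least one present, at least one missing
        cases.extend(combinations(modalities, k))
    return cases

cases = missing_cases()
# 4 + 6 + 4 = 14 missing-modality cases; plus the full-modality case -> 15
```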
  • the image detection model trained by the embodiment of this application can process any of these 15 situations and obtain segmentation results for each, which reflects its versatility in detecting and segmenting both modality-missing images and full-modality images.
  • the terminal device 101 involved in the embodiment of this application includes but is not limited to user equipment, handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices or computing devices.
  • the terminal device may be a mobile phone, a tablet computer, or a computer with wireless transceiver function.
  • the terminal device can also be a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, or a wireless terminal device in industrial control, autonomous driving, telemedicine, smart grids, smart cities, smart homes, and the like.
  • VR: virtual reality; AR: augmented reality.
  • the device used to implement the terminal device may also be a device that can support the terminal device to implement the function, such as a chip system, and the device may be installed in the terminal device.
  • the technical solution provided by the embodiments of the present application is described by taking as an example the case where the device realizing the functions of the terminal device is the terminal device itself.
  • the server 102 mentioned in the embodiment of this application may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing computing services, for example in the cloud 103 shown in Figure 1.
  • the embodiment of the present application is not limited here.
  • the server is taken as an example to describe the technical solutions provided by the embodiments of this application.
  • alternatively, the terminal device may itself implement the image detection method mentioned in the embodiments of this application on the image collection to be detected, and collect and present the segmentation results.
  • Figure 4 is a schematic flowchart of an image detection method disclosed in an embodiment of the present application. The method can be executed in one or more computer devices and may mainly include the following steps.
  • the first image set includes images of at least one modality, the image of each modality being a medical image of the corresponding modality; the first image set is the image set to be detected.
  • the image of each modality is an image sequence.
  • Modality can be used to indicate the type of an image sequence, which depends on the detected object (such as the brain or chest), the detection tool (such as MRI, PET or CT) and the detection parameters (such as FLAIR, T1, T1c or T2).
  • modalities can be thought of as types. Images of different modalities can, for example, highlight different lesion areas.
  • the first image set may include images of N modalities, where N is an integer greater than 1, indicating that the image set includes all modalities (that is, there is no modality missing).
  • the first image set is, for example, a nuclear magnetic resonance image, and may include images of four modalities, that is, a sequence of four images.
  • Denote the first image set as I ∈ ℝ^{N×W×H×D}, where W is the width of a single image in each modality, H is the height of each image, and D is the number of images in each modality, such as the number of slices of the MRI data.
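As a concrete illustration of this layout (toy dimensions; the axis order N×D×H×W chosen here is an assumption for the sketch, and real MRI volumes are far larger):

```python
import numpy as np

# Toy dimensions standing in for real MRI sizes: N modalities, D slices,
# H x W pixels per slice. Axis order is a hypothetical choice.
N, D, H, W = 4, 8, 32, 32
image_set = np.zeros((N, D, H, W), dtype=np.float32)

flair = image_set[0]          # one modality = one image sequence of D slices
middle_slice = flair[D // 2]  # a single H x W image from that sequence
```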
  • the first image set may be obtained from a database for detecting abnormal image areas; or, the first image set may be medical images obtained when examining a patient.
  • the image missing state means that the first image set satisfies at least one of the following conditions: the modalities corresponding to the first image set number fewer than the predetermined N modalities, where N is a positive integer greater than 1; and a partial image is missing from the images of at least one modality in the first image set. When the modalities corresponding to the first image set number fewer than the predetermined N modalities, it may be determined that the first image set is missing a modality.
  • the missing local image in the image of at least one modality is, for example, missing pixel values in the local area of one or more images of one modality.
  • the predetermined N modalities may be a combination of multiple modalities related to medical images.
  • the N modalities include, for example, FLAIR, T1, T1c, and T2 of the same detected object (eg, brain or chest).
  • the N modalities are a combination of PET, CT and MRI.
  • the N modalities are a combination of modalities of different detected objects, such as a combination of MRI of the brain and CT of the chest.
  • the missing description information is used to indicate at least one of the following: at least one modality that is missing from the first image set relative to the N modalities, and the area of the missing image in at least one modality of the first image set. For example, in the case where the full modality includes 4 modalities, if the first image set includes images of 3 modalities, it may be determined that the first image set is missing images of 1 modality. In some cases, the first image set may have nothing missing; in this case, the first image set may be directly regarded as a full-modality image set to be detected. Embodiments of the present application can directly detect the full-modality image set through the image detection model to determine abnormal image areas, without using a reference image set to complete the image set to full modality.
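A minimal sketch of producing such missing description information (illustrative only: the dict-of-arrays representation and the NaN convention for missing pixels are assumptions of this example, not the patent's data format):

```python
import numpy as np

REQUIRED = ("FLAIR", "T1", "T1c", "T2")  # the predetermined N modalities

def missing_description(image_set, required=REQUIRED):
    """Return (missing modalities, modalities with local gaps).
    `image_set` maps modality name -> array; NaNs mark missing pixels
    (a hypothetical convention for this sketch)."""
    missing_mods = [m for m in required if m not in image_set]
    local_gaps = [m for m, img in image_set.items() if np.isnan(img).any()]
    return missing_mods, local_gaps

imgs = {"FLAIR": np.zeros((2, 4, 4)), "T1": np.zeros((2, 4, 4))}
imgs["T1"][0, 0, 0] = np.nan
mods, gaps = missing_description(imgs)
# mods lists the absent modalities; gaps lists modalities with missing regions
```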
  • the first reference image set includes reference images of N modalities.
  • the reference image of each modality in the first reference image set is used to represent specific information corresponding to each modality.
  • the specific information corresponding to a modality refers to the salient features of images of this modality that are different from images of other modalities.
  • the reference images of N modalities can represent the specific information of N modalities.
  • S405 Based on the first reference image set, fill in the missing portion corresponding to the missing description information in the first image set to obtain a second image set.
  • the completion performed in step S405 may include: based on the first reference image set, adding to the first image set the reference images corresponding to the missing at least one modality, or the images corresponding to the area of the missing image, to obtain the second image set; the added reference image is the reference image of the corresponding modality in the first reference image set, and the added image for a missing area is the partial image at the corresponding location in the first reference image set.
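The two completion paths just described (whole-modality fill and local-area fill) can be sketched as follows; as before, the dict representation and NaN gap convention are hypothetical conventions for this example:

```python
import numpy as np

def complete_with_reference(image_set, reference_set, required):
    """Fill whole missing modalities from the reference set, and patch
    NaN-marked local gaps with the co-located reference pixels."""
    completed = {}
    for m in required:
        if m not in image_set:
            # whole modality missing: copy the reference image of that modality
            completed[m] = reference_set[m].copy()
        else:
            # local gap: replace only the missing pixels, keep the rest
            img = image_set[m].copy()
            gaps = np.isnan(img)
            img[gaps] = reference_set[m][gaps]
            completed[m] = img
    return completed

required = ("T1", "T2")
ref = {m: np.full((2, 2), 0.5) for m in required}
imgs = {"T1": np.array([[1.0, np.nan], [1.0, 1.0]])}
done = complete_with_reference(imgs, ref, required)
# done["T1"] keeps its observed pixels; the NaN and the whole T2 image
# are taken from the reference set
```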
  • using a reference image set to complete the missing parts saves time and space and completes the missing modalities at very low cost; it eliminates the need for additional modules and helps improve image detection efficiency.
  • the first reference image set is obtained through training and optimization.
  • the first reference image set may be obtained by optimizing the initial reference image set.
  • the initial reference image set may include reference images of N modalities, that is, N initial reference sequences; each initial reference sequence includes one or more reference images, the value of each pixel on these images is a value to be optimized, and optimizing these pixel values yields the first reference image set.
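To illustrate the idea of treating every reference pixel as a free parameter, here is a toy optimization (the mean-squared-error objective and plain gradient descent are stand-ins for the patent's actual training loss and optimizer):

```python
import numpy as np

def optimize_reference(train_images, steps=100, lr=0.5):
    """Toy per-modality reference optimization: each pixel is a free
    parameter updated by gradient descent on the mean squared error to
    the training images (a stand-in for the real training objective)."""
    ref = np.zeros_like(train_images[0])  # the "values to be optimized"
    for _ in range(steps):
        grad = sum(2 * (ref - img) for img in train_images) / len(train_images)
        ref -= lr * grad
    return ref

train = [np.full((2, 2), 1.0), np.full((2, 2), 3.0)]
ref = optimize_reference(train)
# under MSE this converges to the per-pixel mean of the training images
```

Under this toy loss the optimum is simply the pixel-wise mean; the patent's optimization instead shapes the reference images to carry each modality's specific information.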
  • for details on obtaining the first reference image set, please refer to the training process described in the embodiments corresponding to Figures 9 and 11.
  • S406 Detect image abnormal areas according to the second image set.
  • the abnormal area of the image represents the detected abnormal object, and the abnormal object is, for example, a lesion area.
  • subsequent detection can be performed through the image detection model, and abnormal image areas can be determined to facilitate display to the user.
  • the detection of abnormal image areas in this application can be considered as the detection of lesion areas in some medical images. It can also be used to detect image areas that are inconsistent with the image content of other areas or the image content of normal areas in some images. It can also be used to detect image areas in some other special circumstances.
  • the specific purpose of abnormal area detection can be determined by the training image set and supervision image set in the training data used during training.
  • the first reference image set can be used to fill in the missing images, and the completed multi-modal images are then used to detect abnormal image areas. Since the reference image of each modality in the first reference image set represents the specific information corresponding to that modality, the second image set can effectively fill in the missing parts of the multi-modal images, thereby helping to improve the efficiency and accuracy of image detection.
  • embodiments of the present application help improve the efficiency and accuracy of detecting image anomalies when modalities are missing, thereby assisting doctors and other users in observing abnormalities such as lesion areas in MRI and other medical images, and helping to reduce missed detections or misjudgments of such abnormalities.
  • step S406 may include the following operations:
  • extracting a feature representation of the second image set, the feature representation including a first correlation between different modalities in the second image set and a second correlation between image regions within the same modality;
  • image data corresponding to the missing description information is reconstructed for the first image set to obtain a third image set.
  • the third image set is the result obtained by adding the reconstructed image data to the first image set.
  • An abnormal area in the third image set is identified, and the abnormal area is used as the image abnormal area.
  • step S406 can obtain the first correlation between different modalities and the second correlation between image areas within the same modality, thereby obtaining the correlation between the missing and non-missing parts of the multi-modal images, so that the missing image data can be accurately reconstructed and the accuracy of abnormal area detection improved.
  • since the first reference image set representing specific information is used to complete the first image set before extracting the feature representation, the accuracy (realism) and efficiency of recovering the image through reconstruction (i.e., generating the third image set) can be improved, thereby improving the accuracy and efficiency of abnormal area detection.
  • S406 may include: using the second image set to determine the image abnormal area based on an image detection model for object recognition.
  • S406 may include calling an image detection model to perform image detection on the second image set to obtain the image abnormal area corresponding to the first image set.
  • the second image set can be input into the image detection model.
  • the image detection model is responsible for detecting the second image set and obtaining a segmentation result.
• the segmentation result can specifically be label information about the abnormal area of the image (for example, including the contour line of the abnormal area), or it may be the result obtained by marking the abnormal area on the first image set.
  • the segmentation result may also include a third image set.
  • an image detection model includes a first model and a classifier.
  • the first model includes, for example, an encoder and a decoder.
  • the first model can also be called the backbone network of the image detection model.
  • the first model is, for example, a MAE structure, but is not limited to this.
  • the entire image detection model has a VT-UNet structure, for example, but is not limited to this.
  • the encoder is used to extract a feature representation of the second image set, where the feature representation includes a first correlation between different modalities in the second image set and a second correlation between image areas in the same modality.
  • the decoder is configured to reconstruct image data corresponding to the missing description information for the first image set based on the feature representation to obtain a third image set.
  • the classifier is used to segment the abnormal area in the third image set and use the abnormal area as the image abnormal area.
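The encoder, decoder, and classifier flow described above can be sketched with toy stand-ins. Every function here is a hypothetical placeholder (the real model uses attention-based Transformer blocks); only the data flow from second image set to feature representation to third image set to segmentation is illustrated:

```python
import numpy as np

def encoder(x):
    """Toy encoder: one scalar feature per modality, standing in for the
    attention features that capture the first (inter-modality) and second
    (intra-modality) correlations."""
    return x.reshape(x.shape[0], -1).mean(axis=1)

def decoder(features, shape):
    """Toy decoder: broadcast features back to image space, standing in
    for reconstructing the missing image data (the third image set)."""
    n = features.shape[0]
    return np.tile(features.reshape(n, 1, 1, 1), (1,) + tuple(shape))

def classifier(third_set, thresh=0.0):
    """Toy classifier: threshold the reconstruction to segment an
    'abnormal area'."""
    return (third_set.mean(axis=0) > thresh).astype(np.uint8)

second_set = np.random.default_rng(2).normal(size=(4, 4, 4, 4))  # N=4 modalities
feats = encoder(second_set)
third_set = decoder(feats, second_set.shape[1:])
segmentation = classifier(third_set)
```

The point of the sketch is the single encoder-decoder backbone: one feature representation serves both reconstruction and segmentation.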
  • the image detection model is obtained through training and optimization. Specifically, the image detection model is obtained based on a full-modal training image set, a missing training image set, and a combined training image set.
• the combined training image set is obtained by combining the missing training image set and the initial reference image set.
  • the missing training image set is obtained by performing area masking processing on the full-modal training image set through masking technology.
• the image detection model can specifically be obtained through the training processes described in the embodiments corresponding to Figure 9 and Figure 11.
  • the image detection model can be used to obtain the segmentation results of four individual modal images and one full-modal image.
  • Figure 5 is a schematic diagram of an image segmentation result disclosed in an embodiment of the present application. Specifically, it is the cancer area segmentation results using four individual modality images and one full-modality image on the data of BraTS 2018 (data collection of multi-modal brain tumor segmentation competition).
• the FLAIR modality is a commonly used mode of nuclear magnetic resonance (MR) imaging;
• its full name is fluid-attenuated inversion recovery, also known as water-suppression imaging; in layman's terms, it is a water-suppressed image.
• in it, the cerebrospinal fluid shows low signal (darker), while solid lesions and lesions containing bound water show an obviously high signal (brighter). T1 and T2 are physical quantities used to measure electromagnetic waves and can serve as the basis for imaging; different modalities can highlight different sub-regions of the lesion to assist doctors and users in the final determination of the lesion.
  • the application scenarios of this application are not limited to MRI data or brain tumor data, but also other types of multi-modal medical imaging data combinations (such as PET (Positron Emission Computed Tomography)).
  • PET Positron Emission Computed Tomography
  • CT Computed Tomography
  • other body parts such as lung tumors
• (a) is lung tumor segmentation based on multi-modal PET images;
• (b) is lung tumor segmentation based on multi-modal CT images. That is, the image set to be detected may be of the MRI modality, PET modality, CT modality, etc., or a combination of two or more of the MRI, PET, and CT modalities.
• the first reference image set and the image detection model are obtained by training on data constructed from the corresponding combination of images. For example, if MRI and PET are combined, then in the pre-training stage and fine-tuning stage described later, the training data obtained by combining MRI and PET are used for optimization training to obtain the first reference image set and the image detection model.
  • the image detection method of the present application can be displayed in a visual interface.
• the first image set is displayed in the first display area of the user interface, and at the same time the segmentation result corresponding to the first image set is displayed in the second display area of the user interface.
• likewise, in this application the segmentation result may include at least one of the following: the image abnormal area; marking information about the image abnormal area; the result obtained after marking the image abnormal area in the first image set.
  • FIG 7 it is a schematic interface diagram of an image detection method disclosed in the embodiment of the present application.
  • 701 is the first display area and 702 is the second display area.
• the user can click the import button in the first display area 701 to import the first image set (shown as 703, for example), and then click the start button in the first display area 701 to see the corresponding segmentation result in the second display area 702.
• as shown in Figure 7, 704 is the full-modal image set restored from the first image set, i.e., the third image set, and 705 is the tag information.
• the tag information can also be displayed directly overlaid on the full-modal image set 704.
  • the computer device first obtains a first image set including at least one modality.
• the images of each modality include one or more images; the first image set is then detected. When the first image set is detected to be in an image missing state, the missing description information of the first image set can be determined, and the first reference image set is then used to fill in the missing parts corresponding to the missing description information to obtain the second image set.
  • the computer device then calls the image detection model to perform image detection on the second image set, and obtains the segmentation result corresponding to the first image set.
• the computer device first completes the images with missing modalities based on the first reference image set, which yields a better feature representation of the first image set and helps improve the multi-modal image detection effect in the case of missing modalities.
• the trained image detection model is then used to detect the second image set. Since the image detection model has been continuously optimized, segmentation results can be obtained from the second image set faster, thereby improving the overall image detection efficiency.
• Figure 8 is a training framework diagram for the image detection method disclosed in the embodiment of the present application. It is roughly divided into two parts: pre-training (above the line shown in Figure 8) and fine-tuning (below the line in Figure 8).
• in the pre-training stage, taking a pair consisting of a full-modal training image set and an initial reference image set as an example, the stage mainly includes: performing area masking processing on the full-modal training images to obtain a missing training image set.
• the area masking processing can specifically be covering any one or more of the multiple modalities, or covering one or more modalities and then partially covering the image of at least one of the remaining modalities.
  • the missing training image set and the initial reference image set are combined to obtain a combined training image set. Further, the combined training image set is input to the initial first model for training to obtain a predicted image set.
  • the first model and the initial reference image set are Optimize to obtain the pre-trained reference image set and the pre-trained first model.
  • a pretrained image detection model including a pretrained first model and an untrained classifier can be obtained.
• the difference can be characterized as the loss value between the predicted image set and the full-modal training image set, and the first model and the initial reference image set are adjusted according to the loss value.
• the purpose of pre-training is to learn the feature representation of multi-modal images in the case of missing modalities.
• a pre-training reference image set is also obtained through optimization (that is, by continuously optimizing the initial reference image set), which can be used to fill in image data that may be missing during training and inference.
  • the initial reference image set can be an initially generated image set of N modalities, or it can be an image set obtained after training based on the previous full-modal training image set and a missing training image set that needs further optimization.
• that is, in addition to the finally available first reference image set, any set of reference images of N modalities that still needs to be trained and optimized can be called an initial reference image set.
  • the fine-tuning stage taking a pair of full-modal training image sets and the segmentation supervision information corresponding to the full-modal training image set as an example, it can mainly include: first inputting the full-modal training image set into the pre-training obtained in the pre-training stage Perform segmentation prediction in the image detection model to obtain the first segmentation prediction information and store it in the storage space.
• the combined fine-tuned image set is input into the pre-trained image detection model for segmentation prediction to obtain the second segmentation prediction information; then, based on the difference between the first segmentation prediction information and the second segmentation prediction information, and the difference between the second segmentation prediction information and the segmentation supervision information configured for the full-modal training image set, the pre-trained reference image set and the pre-trained image detection model are optimized to obtain the first reference image set and the image detection model.
  • the segmentation supervision information is, for example, label information related to abnormal areas of the image.
• the difference between the first segmentation prediction information and the second segmentation prediction information, and the difference between the second segmentation prediction information and the segmentation supervision information configured for the full-modality training image set, are both represented by loss values. That is, first calculate the loss value between the first segmentation prediction information and the second segmentation prediction information, then calculate the loss value between the second segmentation prediction information and the segmentation supervision information configured for the full-modal training image set, and take the sum of the two loss values. Based on this sum, the pre-trained image detection model and the pre-trained reference image set are fine-tuned. After repeated adjustments, when the loss value reaches the convergence condition, image detection in the absence of modalities can be realized.
  • the image detection model and reference image set obtained after training in the above two stages are universal and can be used to process MRI image data in any missing modality during testing (use).
  • the backbone network of the network model used in the training process of this application can be VT-UNet.
  • This network is a pure Transformer (a self-attention transformation network) architecture, and the corresponding parameter amount and calculation amount are lower than the commonly used 3DUnet (an image analysis model) or Vnet (an image analysis model).
  • this application uses the Adam (Adaptive momentum, an optimization algorithm) algorithm as the optimizer during network training, and sets the number of training rounds in the first and second phases to 600 and 400 rounds respectively.
  • the initial learning rate of training is 3e-4, and the cosine annealing learning rate scheduling mechanism is used during the training process, which has better convergence.
  • This application trains the model on two 2080Ti NVIDIA graphics cards, with a batch size of 2.
• the pixel values can be clipped to the 1st to 99th percentile of the intensity values, then min-max scaled, and finally randomly cropped to a fixed size of 128×128×128 pixels for training.
  • the side length of the random 3D patch can be set to 16 pixels.
  • the corresponding images in the initial reference image set are initialized by Gaussian noise, and ⁇ can be set to 0.1.
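The preprocessing pipeline described above (percentile clipping, min-max scaling, random fixed-size cropping) can be sketched as follows. A smaller crop size is used here for brevity, and the function name is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(volume, crop=32):
    """Illustrative preprocessing: percentile clipping, min-max scaling,
    then a random fixed-size crop (128^3 in the text; 32^3 here)."""
    lo, hi = np.percentile(volume, [1, 99])  # clip to 1st-99th percentile
    v = np.clip(volume, lo, hi)
    v = (v - v.min()) / (v.max() - v.min() + 1e-8)  # min-max scale to [0, 1]
    starts = [rng.integers(0, s - crop + 1) for s in v.shape]
    return v[tuple(slice(st, st + crop) for st in starts)]

volume = rng.normal(size=(48, 48, 48))
patch = preprocess(volume)
```

Percentile clipping before scaling keeps a few extreme voxels from compressing the useful intensity range into a narrow band.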
  • Figure 9 is a schematic flow chart of pre-training for the image detection model disclosed in the embodiment of the present application.
  • Figure 9 can include S901-S905 ,Specific steps are as follows:
  • the training data may include: multiple full-modal training image sets and initial reference image sets.
  • Each full-modality training image set includes images of N modalities, and the images of each modality do not lack local images.
  • the initial reference image set includes reference images of N modalities.
• the reference image of each modality in the initial reference image set represents the initial specific information corresponding to that modality.
  • the initial specific information refers to the specific information obtained at the beginning, and the initial reference image set (ie, the initial specific information) can be optimized later.
• these training data can be obtained from a database or from relevant institutions. For example, MRI data or brain tumor data can be obtained from hospitals.
  • S902 Mask each full-modal training image set to obtain a corresponding missing training image set.
  • the missing training image set is missing images of at least one modality.
• for example, images of one or more modalities in the full-modal training image set can be masked through masking processing to obtain a missing training image set.
• suppose the full-modal training image set has four modalities; regional masking processing can mask one, two, or three of the modalities to obtain the missing training image set; alternatively, it can mask one or more modalities in the full-modal training image set and perform partial masking processing on the images of the remaining modalities to obtain the missing training image set.
  • a full-modal training image set has four modalities.
  • Masking can be performed by masking one of the modalities, and then covering partial areas of the remaining three modalities, thereby obtaining a missing training image set.
  • the missing training data set may include M missing sequences, where M is an integer greater than or equal to 1 and less than N.
  • a combined training image set can be obtained based on the missing training image set and the initial reference image set.
• specifically, the reference image of the corresponding modality in the initial reference image set (that is, the same modality as the one missing from the missing training image set) can be added to the missing training image set, and the image data of the corresponding area in the initial reference image set (whose location corresponds to the location of the area of the missing image in at least one modality of the missing training image set) can be added to the missing areas of the missing images in the training image set, resulting in a combined training image set.
• for example, if the T1 modality is missing, the completed combined training image set can be obtained by overlaying the reference image corresponding to T1 at the position of the T1 modality in the missing training image set.
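The masking and substitution steps above can be sketched with toy NumPy arrays (the real sets are 3-D MRI volumes; the modality indices dropped here are arbitrary, and `reference` is a stand-in for the learned initial reference image set):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_missing(full_set, drop):
    """Area-mask a full-modal training set: zero out whole modalities."""
    x_missing = full_set.copy()
    mask = np.zeros(full_set.shape[0], dtype=bool)
    mask[list(drop)] = True
    x_missing[mask] = 0
    return x_missing, mask

def combine(x_missing, mask, reference):
    """S(x', x_sub): substitute reference images at the masked modalities."""
    combined = x_missing.copy()
    combined[mask] = reference[mask]
    return combined

full = rng.normal(size=(4, 8, 8, 8))   # N = 4 modalities
reference = np.full_like(full, 0.5)    # toy stand-in for the initial reference set
x_missing, mask = make_missing(full, drop=[1, 3])
combined = combine(x_missing, mask, reference)
```

Keeping the modality mask alongside the combined set lets the training loop know which voxels were substituted.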
  • S904 Input the combined training image set into the first model to obtain a predicted image set output by the first model. After the combined training image set is obtained, the combined training image set is input into the initial first model for processing to obtain a predicted image set.
  • the first model is built based on the mask autoencoder, and the corresponding first model can be based on MAE or VT-UNet. Of course, other commonly used deep neural networks can also be used, which is not limited by this application.
  • the first model and the initial reference image set are optimized based on the difference between the predicted image set and the full-modal training image set.
• that is, the first model (the model being trained) and the initial reference image set are optimized according to the loss value.
• specifically, the loss value between the predicted image set and the full-modality training image set may be calculated. If, across a large number of samples, the loss value between the predicted image sets and the full-modality training image sets is less than or equal to a first threshold, the pre-trained reference image set and the pre-trained first model are determined; alternatively, when, under a certain set of model parameters, the corresponding loss value is smallest for a large number of missing training image sets or full-modal training image sets, the first model at that point is determined to be the pre-trained first model, and the corresponding pre-trained reference image set is obtained.
  • the initial reference image set is optimized through model inversion to obtain a pre-training reference image set.
  • the optimization target expression for optimizing the first model is as follows: Formula (1):
  • x represents the full-modal training image set
  • x′ represents the missing training image set
  • x sub represents the initial reference image set
  • S(x′, x sub ) represents the combined training image set
  • F is the reconstruction function
  • is the weight
  • is the mean square error loss function
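A minimal stand-in for the reconstruction objective of formula (1): a weighted mean-squared error between the reconstruction of the combined set, F(S(x′, x_sub)), and the full-modal target x. The parameter name `lam` is illustrative, since the weight symbol was not preserved in the text:

```python
import numpy as np

def reconstruction_loss(reconstruction, x_full, lam=1.0):
    """Weighted mean-squared error between F(S(x', x_sub)) and the
    full-modal target x; `lam` stands in for the (unnamed) weight."""
    return lam * np.mean((reconstruction - x_full) ** 2)

loss = reconstruction_loss(np.ones((2, 2, 2)), np.zeros((2, 2, 2)))
```

During pre-training, gradients of this loss flow both into the model parameters and, via model inversion, into the reference images themselves.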
• the final model performance obtained using a mask probability of 0.8125 or 0.875 is better than that obtained using 0.75 (the mask probability in MAE-related papers), where the Dice indicator (a set similarity measure, such as the DSC (Dice Similarity Coefficient)) is used to measure the experimental effect.
  • DSC Dice Similarity Coefficient
  • WT whole tumor
  • TC tumor core
  • ET enhancing tumor
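The Dice indicator used for the WT, TC, and ET regions can be computed for binary masks as follows. This is the standard definition, not an implementation specific to this application:

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice similarity coefficient between two binary masks (1 = perfect overlap)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
```

Here `dice(a, b)` is 2·1/(2+1) ≈ 0.667, while a mask compared with itself scores 1.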
  • This application mainly describes the pre-training process in the model training process.
  • the purpose is to obtain the pre-trained first model and the pre-training reference image set.
• This application uses a multi-modal mask autoencoder (i.e., the first model) to learn feature representations under missing-modality conditions.
  • this model is a single encoder-decoder structure, which reduces the difficulty of training the model.
• the pre-training stage trains the first model and the initial reference image set based on the training data and the model-inversion-based modal completion rules, obtaining the pre-trained reference image set and the pre-trained first model.
• the pre-trained reference image set and the pre-trained first model can be used to complete modalities that may be missing during the training and inference processes, thereby improving the efficiency of image detection in this application.
  • Figure 11 can include S1101-S1104. The specific steps are as follows:
  • S1101 Combine the pre-training reference image set and the full-modal training image set to obtain a combined fine-tuning image set.
• the combined fine-tuning image set includes images of N modalities; x of the N modal images come from the pre-training reference image set, and the remaining y modal images come from the full-modality training image set.
  • the pre-training reference image set and the full-modality training image set are combined according to the rule representation information to obtain a combined fine-tuning image set; or the pre-training reference image set and the full-modality training image set are randomly combined Combine to obtain a combined fine-tuned image set.
  • x modal images come from the pre-training reference image set
  • rule representation information is displayed.
  • the full-modal training image set and the pre-training reference image set can be combined to obtain a combined fine-tuning image set.
• the dark parts of the rule representation information indicate that the corresponding positions of the full-modal training image set are covered, and the light-colored parts indicate that the corresponding positions of the full-modal training image set are not covered; the sets are combined according to this rule to obtain a combined fine-tuned image set.
  • the pretrained image detection model includes a pretrained first model and a classifier.
  • the input terminal of the classifier is connected to the output terminal of the first model.
• the full-modal training image set is first input into the pre-trained image detection model for segmentation prediction, and the first segmentation prediction information is obtained and stored in a storage space.
• the storage space can also be the CPU memory, which is also more suitable for hardware that cannot implement joint training due to insufficient GPU memory.
  • the first segmentation prediction information can also be updated in real time.
  • S1103 Input the combined fine-tuned image set into the pre-trained image detection model to obtain the second segmentation prediction information output by the image detection model.
• S1104: Based on the difference between the first segmentation prediction information and the second segmentation prediction information, and the difference between the second segmentation prediction information and the segmentation supervision information configured for the full-modal training image set, the pre-training reference image set and the pre-trained image detection model are optimized to obtain the first reference image set and the trained image detection model.
• optimizing the pre-training reference image set and the pre-trained image detection model based on these differences may include calculating the loss value between the first segmentation prediction information and the second segmentation prediction information, and calculating the loss value between the second segmentation prediction information and the segmentation supervision information; according to these two loss values, the parameter values of the pre-trained reference image set and the pre-trained image detection model are optimized to obtain the first reference image set and the trained image detection model.
  • the optimization target expression for optimizing the pre-trained reference image set and the pre-trained image detection model is formula (3):
  • KL distance Kullback-Leibler Divergence, which measures the difference between two probability distributions in the same event space
  • W is the width of each image in the image collection
  • H is the height of each image in the image collection
  • D is the number of slices of each image in the image collection
• C is the total number of segmentation categories of the image collection.
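A toy sketch of the self-distillation objective behind formula (3): a KL term between the full-modality prediction and the missing-modality prediction, plus a supervised term against the segmentation supervision information. The mean-squared supervised term and the weight `alpha` are stand-ins for whatever loss the application actually uses:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) over the class axis C, averaged over the W*H*D voxels."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.mean(np.sum(p * np.log(p / q), axis=-1))

def fine_tune_loss(pred_full, pred_missing, supervision, alpha=1.0):
    """Self-distillation objective: supervised term plus a KL term pulling
    the missing-modality prediction toward the full-modality prediction."""
    supervised = np.mean((pred_missing - supervision) ** 2)
    return supervised + alpha * kl_divergence(pred_full, pred_missing)

uniform = np.full((2, 3), 1.0 / 3.0)  # 2 voxels, C = 3 classes
```

When the two predictions and the supervision coincide, both terms vanish, which is the fixed point the distillation drives toward.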
• Steps S1102-S1104 constitute a computationally efficient self-distillation method that can migrate task-related knowledge from full-modal data to missing-modal data within the same network, fine-tuning a single multi-modal segmentation model to handle various missing-modality cases simultaneously while also reducing the computational overhead during training and deployment.
  • This application mainly describes the fine-tuning process in the model training process.
  • the purpose is to obtain the final image detection model and reference image set, which can be processed for any situation of modal images to obtain the corresponding segmentation results.
  • the fine-tuning task of this application is a computationally efficient self-distillation method.
  • the information of the full-modal data is distilled to the missing modality to achieve a higher-precision segmentation effect in the case of missing modalities.
  • the generated image detection model and reference image set are highly versatile and can be used to handle MRI in any missing modality when used (i.e., prediction process). data.
  • this application conducted specific experiments, specifically experiments completed on the PyTorch neural network framework, and obtained corresponding experimental results.
  • the technology corresponding to the image detection method of this application was tested in the brain tumor segmentation competition BraTS 2018 and BraTS 2019 to verify its effectiveness.
• the data set of the BraTS series consists of multiple paired MRIs including four modalities, namely T1, T1c, T2 and FLAIR. These data have been organized by the competition and pre-processed, including skull stripping, resampling to a unified resolution (1mm³), and co-registration on the same template.
  • the BraTS 2018 and BraTS 2019 data sets include 285 and 335 cases of data and corresponding tumor area annotations respectively.
  • the two data sets can be randomly divided into a training set and a test set in a ratio of 80:20.
  • Dice coefficient and 95% Hausdorff distance can be used as evaluation indicators.
• an online evaluation system can also be used to verify the performance of the technology of this application on the full-modality validation set stored in the database.
  • the above-mentioned BraTS 2018 and BraTS 2019 are two data sets that currently exist in the database and can be used directly.
• Table 1 shows the comparison, on the BraTS 2018 data set, of the image detection method of this application with three general methods for brain MRI tumor segmentation in the case of missing modalities: HVED, LCRL, and FGMF, of which FGMF is currently the general method with the highest indices. Since these methods all report segmentation results on 20% of the data, their results can be extracted directly from the database.
• the image detection method proposed in this application has the best overall performance on the test set, achieving the best median in the three tumor areas, and achieving the best results in most cases (in the three tumor areas, it achieved the best results in 14, 11, and 10 of the 15 missing-modality cases, respectively). It is worth mentioning that the image detection method proposed in this application uses a basic single encoder-decoder framework, while the three methods above all use multi-encoder or multi-decoder frameworks whose computational cost is greater than that of the image detection method proposed in this application.
• HVED: Hetero-Modal Variational Encoder-Decoder for Joint Modality Completion and Segmentation (hetero-modal variational codec for joint modality completion and segmentation)
• LCRL: Latent correlation representation learning for brain tumor segmentation with missing MRI modalities
  • FGMF Feature-enhanced generation and multimodality fusion based deep neural network for brain tumor segmentation with missing MR modalities, deep neural network based on feature enhancement generation and multimodality fusion, for brain tumor segmentation in the absence of MR modalities).
  • Table 2 shows the comparison between the image detection method of this application on the BraTS 2019 data set and the only LCRL that has done comparative experiments on this data set. The results show that the image detection method of the present application is better than LCRL in all deletions in all tumor areas, indicating that the image detection method of the present application has relatively good generalization.
  • the existing and missing modes are represented by ⁇ and o respectively, and the p value is given by Wilcoxon testing the significance of the corresponding method and the image detection method of this application respectively.
• since the image segmentation method of this application proposes a "universal" model, in order to reflect its effect it was also compared with the currently best dedicated model, ACN (Adversarial Co-training Network, an adversarial co-training segmentation network). The training/test split used by that method differs from that of the image detection method of this application, so the results shown in its paper are directly excerpted as a reference.
  • the existing and missing modes are represented by ⁇ and o respectively, and the p value is given by Wilcoxon testing the significance of the corresponding method and the image detection method of this application respectively.
  • Table 4 compares the online test results of the image detection method of this application and several current methods on two data sets, including LCRL, VT-UNet-T (the backbone network used by the image detection method of this application), and TransBTS (Another Transformer model for brain MRI tumor segmentation).
• LCRL: Latent Correlation Representation Learning
  • VT-UNet-T the backbone network used by the image detection method of this application
  • TransBTS Another Transformer model for brain MRI tumor segmentation
  • the evaluation index of the framework proposed in this application is better in all tumor areas, verifying the effectiveness of self-distillation from full modes to missing modes.
  • the self-distillation framework proposed in this application saves approximately 52 GFLOPS (floating point operations) of calculations compared to the joint training method.
  • embodiments of the present application also provide a schematic structural diagram of an image detection device.
  • Figure 12 is a schematic structural diagram of an image detection device provided by an embodiment of the present invention.
  • the device can be applied to the server mentioned above, or can also be applied to a computer device.
  • the image detection device 1200 shown in Figure 12 can run the following units:
  • the acquisition unit 1201 is used to acquire a first image set, where the first image set includes images of at least one modality, and the images of each modality are medical images of the corresponding modality;
• Determining unit 1202, configured to detect whether the first image set is in an image missing state, where the image missing state means that the first image set satisfies at least one of the following conditions: the modalities corresponding to the first image set are fewer than the predetermined N modalities, N being a positive integer greater than 1; and a partial image is missing from the image of at least one modality in the first image set;
• the missing description information is determined, wherein the missing description information is used to indicate at least one of the following: at least one modality that is missing from the modalities corresponding to the first image set relative to the N modalities, and the region of the missing image in the image of at least one modality in the first image set;
  • the first reference image set includes reference images of the N modalities, and the reference images of each modality in the first reference image set are used to represent the specific characteristics corresponding to each modality. information;
  • the processing unit 1203, when detecting an image abnormal region based on the second image set, may specifically be configured to:
  • determine the image abnormal region using the second image set.
  • The image detection apparatus further includes:
  • a display unit 1204, configured to display the first image set in a first display area of a user interface, and display the segmentation result corresponding to the first image set in a second display area of the user interface;
  • where the segmentation result includes any of the following: the image abnormal region; the result obtained after labeling the image abnormal region in the first image set; or the result obtained after labeling the image abnormal region in the first image set, together with the third image set.
  • The acquisition unit 1201 is also used to acquire training data for training the image detection model.
  • The training data includes multiple full-modality training image sets and an initial reference image set; each full-modality training image set includes images of N modalities, and the images of each modality are not missing local images.
  • The initial reference image set includes reference images of N modalities.
  • The reference image of each modality in the initial reference image set represents the initial specific information corresponding to that modality.
  • the processing unit 1203 is also used to:
  • The initial reference image set is optimized to obtain a pre-trained reference image set.
  • The processing unit 1203 masking each full-modality training image set to obtain the corresponding missing training image set specifically includes:
  • The processing unit 1203 determines, based on the difference between the predicted image set and the full-modality training image set, the optimization objective expression for optimizing the first model:
  • x denotes the full-modality training image set
  • x′ denotes the missing training image set
  • x sub denotes the initial reference image set
  • S(x′, x sub) denotes the combined training image set
  • F is the reconstruction function
  • γ is the regularization weight
  • the reconstruction loss is a mean-squared-error loss function
  • The processing unit 1203 optimizes, based on the difference, the optimization objective expression for the initial reference image set:
  • x denotes the full-modality training image set
  • x′ denotes the missing training image set
  • x sub denotes the initial reference image set
  • S(x′, x sub) denotes the combined training image set
  • F is the reconstruction function
  • γ is the regularization weight
  • processing unit 1203 is also used to:
  • The pre-trained reference image set is combined with the full-modality training image set to obtain a combined fine-tuning image set.
  • Based on the difference between the first segmentation prediction information and the second segmentation prediction information, and the difference between the second segmentation prediction information and the segmentation supervision information configured for the full-modality training image set, the pre-trained reference image set and the pre-trained image detection model are optimized to obtain the first reference image set and the trained image detection model.
  • the optimization target expression used by the processing unit 1203 to optimize the pre-trained reference image set and the pre-trained image detection model is:
  • The corresponding image detection model and reference image set are obtained by training and optimizing the first model and the initial reference image set;
  • training and optimization of the first model and the initial reference image set include pre-training and fine-tuning; the first model is built based on a masked autoencoder;
  • in pre-training, the first model and the initial reference image set are trained according to the training data and the model-inversion-based modality-completion rules, to obtain the pre-trained reference image set and the pre-trained image detection model;
  • in fine-tuning, the pre-trained reference image set and the pre-trained image detection model are trained by self-distillation from the full-modality training image sets to the missing-modality data sets, to obtain the first reference image set and the image detection model.
  • Embodiments of the present application first complete images with missing modalities based on the reference image set, which yields better feature information of the first image set and helps to improve multi-modal image segmentation with missing modalities.
  • Further, the already-trained image detection model is used to detect the target detection image set; since the image detection model has been continuously optimized, it can detect the target detection image set faster and obtain the segmentation result, thereby improving image detection efficiency.
  • FIG. 13 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device 1300 shown in FIG. 13 at least includes a processor 1301, an input interface 1302, an output interface 1303, a computer storage medium 1304, and a memory 1305.
  • the processor 1301, the input interface 1302, the output interface 1303, the computer storage medium 1304 and the memory 1305 can be connected through a bus or other means.
  • the computer storage medium 1304 may be stored in the memory 1305 of the computer device 1300.
  • the computer storage medium 1304 is used to store a computer program, the computer program includes program instructions, and the processor 1301 is used to execute the program instructions stored in the computer storage medium 1304.
  • The processor 1301 (or CPU, Central Processing Unit) is the computing core and control core of the computer device 1300; it is suitable for implementing one or more computer programs, and in particular for loading and executing one or more computer programs so as to implement the corresponding method flow or the corresponding function.
  • Embodiments of the present application also provide a computer storage medium (Memory), which is a memory device in a computer device and is used to store programs and data.
  • the computer storage media here may include built-in storage media in the computer device, and of course may also include extended storage media supported by the computer device.
  • Computer storage media provides storage space that stores the operating system of the computer device.
  • one or more computer programs (including program codes) suitable for being loaded and executed by the processor 1301 are also stored in the storage space.
  • the computer storage medium here can be a high-speed RAM memory or a non-volatile memory, such as at least one magnetic-disk memory; optionally, it can also be at least one computer storage medium located remotely from the aforementioned processor.
  • One or more computer programs stored in the computer storage medium can be loaded by the processor 1301 and executed to implement the corresponding steps of the image detection method shown in Figure 4, Figure 9 and Figure 11 above.
  • One or more instructions in the computer storage medium are loaded by the processor 1301 to execute the image detection method of the embodiments of the present application.
  • An embodiment of the present application provides a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • The computer program includes program instructions; when the program instructions are executed by a processor, the steps performed in all of the above embodiments can be executed.
  • Embodiments of the present application also provide a computer program product or computer program.
  • the computer program product or computer program includes computer instructions.
  • The computer instructions are stored in a computer-readable storage medium; when the computer instructions are executed by the processor of the computer device, the methods in all of the above embodiments are executed.
  • the program can be stored in a computer-readable storage medium.
  • the process may include the processes of the embodiments of each of the above methods.
  • the storage medium can be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Image Analysis (AREA)

Abstract

An image detection method, apparatus, computer device, storage medium and program product, applicable to the field of artificial intelligence, e.g. intelligent medicine. The method includes: acquiring a first image set, the first image set including images of at least one modality, the images of each modality being medical images of the corresponding modality; detecting whether the first image set is in an image-missing state; if the first image set is in the image-missing state, determining missing-description information; acquiring a first reference image set, the first reference image set including reference images of the N modalities; based on the first reference image set, completing, in the first image set, the missing parts corresponding to the missing-description information, to obtain a second image set; and detecting an image abnormal region according to the second image set. With this method, multi-modal images can be detected, assisting in the detection of abnormal regions such as lesions.

Description

Image detection method, apparatus, computer device, storage medium and program product
This application claims priority to Chinese Patent Application No. 202210456475.6, entitled "Image detection method, apparatus, computer device and storage medium", filed with the China National Intellectual Property Administration on April 27, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to an image detection method, apparatus, computer device, storage medium and program product.
Background
Magnetic resonance imaging (MRI) is obtained through magnetic resonance imaging technology, which uses a static magnetic field and a radio-frequency magnetic field to image human tissue. During imaging, clear, high-contrast images can be obtained without ionizing radiation or contrast agents. MRI can reveal organ abnormalities and early lesions of the human body at the molecular level. An MRI study generally contains multiple sequences; for example, it may include a fluid-attenuated inversion recovery (FLAIR) sequence, a T1 sequence, a T1c sequence and a T2 sequence. The image series under these different sequences can present different tissue images and highlight different lesion regions.
In practice, lesions or abnormal regions in MRI are generally identified manually by physicians through their workstations and the MRI, which inevitably leads to some missed or false detections.
Summary
Embodiments of this application provide an image detection method, apparatus, computer device and storage medium, which can intelligently detect multi-modal images and assist in identifying lesions or abnormal regions.
In one aspect, an embodiment of this application discloses an image detection method, executed in a computer device, the method including:
acquiring a first image set, the first image set including images of at least one modality, the images of each modality being medical images of the corresponding modality;
detecting whether the first image set is in an image-missing state, where the image-missing state means that the first image set satisfies at least one of the following conditions: the modalities corresponding to the first image set are fewer than predetermined N modalities, N being a positive integer greater than 1; and a local image is missing from the images of at least one modality in the first image set;
if the first image set is in the image-missing state, determining missing-description information, where the missing-description information indicates at least one of the following: at least one modality that is missing from the modalities corresponding to the first image set relative to the N modalities, and the region of missing image in the images of at least one modality in the first image set;
acquiring a first reference image set, the first reference image set including reference images of the N modalities, the reference image of each modality in the first reference image set being used to represent the specific information corresponding to that modality;
based on the first reference image set, completing, in the first image set, the missing parts corresponding to the missing-description information, to obtain a second image set;
detecting an image abnormal region according to the second image set.
In another aspect, an embodiment of this application discloses an image detection apparatus, including:
an acquisition unit, configured to acquire a first image set, the first image set including images of at least one modality, the images of each modality being medical images of the corresponding modality;
a determining unit, configured to detect whether the first image set is in an image-missing state, where the image-missing state means that the first image set satisfies at least one of the following conditions: the modalities corresponding to the first image set are fewer than predetermined N modalities, N being a positive integer greater than 1; and a local image is missing from the images of at least one modality in the first image set;
a processing unit, configured to:
if the first image set is in the image-missing state, determine missing-description information, where the missing-description information indicates at least one of the following: at least one modality that is missing from the modalities corresponding to the first image set relative to the N modalities, and the local region of missing image in at least one modality of the first image set;
acquire a first reference image set, the first reference image set including reference images of the N modalities, the reference image of each modality in the first reference image set being used to represent the specific information corresponding to that modality;
based on the first reference image set, complete, in the first image set, the missing parts corresponding to the missing-description information, to obtain a second image set;
detect an image abnormal region according to the second image set.
In another aspect, an embodiment of this application further discloses a computer device, including: a processor, adapted to implement one or more computer programs; and a computer storage medium storing one or more computer programs, the one or more computer programs being adapted to be loaded by the processor to execute the above image detection method.
In another aspect, an embodiment of this application further discloses a computer-readable storage medium storing one or more computer programs, the one or more computer programs being adapted to be loaded by a processor to execute the above image detection method.
In another aspect, an embodiment of this application further discloses a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to execute the above image detection method.
Brief Description of the Drawings
To more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is a schematic architectural diagram of an image detection system disclosed in an embodiment of this application;
Figure 2 is a schematic diagram of an application scenario of image detection disclosed in an embodiment of this application;
Figure 3a is a schematic diagram of one image-set relationship disclosed in an embodiment of this application;
Figure 3b is a schematic diagram of various modality-missing cases disclosed in an embodiment of this application;
Figure 4 is a schematic flowchart of an image detection method disclosed in an embodiment of this application;
Figure 5 is a schematic diagram of an image segmentation result disclosed in an embodiment of this application;
Figure 6 is a schematic diagram of another image segmentation result disclosed in an embodiment of this application;
Figure 7 is a schematic interface diagram of an image detection method disclosed in an embodiment of this application;
Figure 8 is a training framework diagram for an image detection method disclosed in an embodiment of this application;
Figure 9 is a schematic flowchart of pre-training of an image detection model disclosed in an embodiment of this application;
Figure 10a shows synthesized full-modality image data disclosed in an embodiment of this application;
Figure 10b is an implementation-effect diagram disclosed in an embodiment of this application;
Figure 11 is a schematic flowchart of fine-tuning of an image detection model disclosed in an embodiment of this application;
Figure 12 is a schematic structural diagram of an image detection apparatus disclosed in an embodiment of this application;
Figure 13 is a schematic structural diagram of a computer device disclosed in an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The image detection method provided by the embodiments of this application is used to identify abnormal regions in images to be detected. Considering that images to be detected may have missing parts, a first reference image set and an image detection model are designed. On the one hand, the first reference image set is used to complete images with missing parts; on the other hand, the image detection model can also restore (reconstruct) the missing image data and perform object recognition to determine abnormal regions in the images, e.g. lesion areas. In this way, anomaly detection can be performed well on multi-modal images such as MRI, and AI (Artificial Intelligence) technology can assist users such as physicians in observing images and identifying lesions, reducing the possibility of missed or false judgments of abnormalities (e.g. lesion regions) in images such as MRI.
AI is the theory, method, technique and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The network models involved in AI can be trained and optimized through machine learning. Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how computers simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications cover all fields of AI. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Meanwhile, to describe this application more clearly, some technical terms involved in this application are briefly described first, including:
(1) Magnetic resonance imaging (MRI): magnetic resonance imaging is a relatively new medical imaging technology that uses a static magnetic field and a radio-frequency magnetic field to image human tissue; during imaging, clear, high-contrast images can be obtained without ionizing radiation or contrast agents. It can reveal organ abnormalities and early lesions at the molecular level and is superior to X-ray CT in many respects. An MRI study generally contains images of multiple modalities, the modality types being, for example, FLAIR, T1, T1c and T2; these different modalities can be used to highlight different lesion regions. (2) Missing modality: in clinical applications, one or more MRI modalities are often missing due to image corruption, artifacts, acquisition-protocol cost and other reasons. (3) Masked Autoencoder (MAE): as an image self-supervised framework, the masked autoencoder has achieved great success in the self-supervised field; its proxy task guides the model to restore the original pixel values of an image from its visible patches. (4) Model Inversion (MI): model inversion has long been used in the interpretability field of deep learning; its goal is to synthesize images most representative of certain network predictions, e.g. saliency maps for classification. (5) Self-Distillation (SD): self-distillation performs knowledge distillation with supervised learning. Compared with the original knowledge distillation method, the teacher model and the student model are one and the same model, i.e. one model guides itself to learn and completes the knowledge distillation. (6) Multimodal masked autoencoder (M3AE): the abbreviation of the image detection model proposed in this application, an autoencoder that masks and restores multi-modal image data, thereby simultaneously learning the correlations between different modalities and the structural relationships within the images. (7) GFLOPS (Giga Floating-point Operations Per Second): one billion floating-point operations per second. Floating-point refers to numbers with a fractional part, and floating-point operations are arithmetic on such numbers, commonly used to measure computer speed or estimate computer performance, especially in scientific computing that uses large amounts of floating-point arithmetic; here it is mainly used for the training process of the model of this application.
Referring to Figure 1, Figure 1 is a schematic architectural diagram of an image detection system disclosed in an embodiment of this application. As shown in Figure 1, the architecture 100 of the image detection system may include a terminal device 101 and a server 102, where the server 102 may be deployed in a cloud 103. The terminal device 101 is mainly used to receive the image set to be detected (e.g. the first image set) in the embodiments of this application and the segmentation result (e.g. a lesion region) corresponding to the image set to be detected. The server 102 is mainly used to deploy the image detection model of the embodiments of this application, so that the image detection model can detect and segment the image set to be detected to obtain the segmentation result; the server 102 may also be responsible for training the image detection model.
In one implementation, the terminal device 101 acquires an image set to be detected, the image set including N modalities, each modality including one or more images, the images of each modality forming an image sequence, N being an integer greater than or equal to 1. The terminal device 101 then sends the image set to be detected to the server 102, which detects it; if the image set is detected to be in an image-missing state, the server 102 determines the missing image regions in the image set. Further, the server 102 uses a reference image set to complete the missing image regions in the image set to be detected, obtaining a target detection image set, where the reference image set includes N modalities, each modality including one or more reference images, the reference image of each modality being used to represent the specific information corresponding to that modality. The server 102 then invokes the image detection model to perform image detection and segmentation on the target detection image set, obtaining the segmentation result corresponding to the image set to be detected.
Taking Figure 3a as an example, Figure 3a is a schematic diagram of one image-set relationship disclosed in an embodiment of this application. An image set to be detected being in an image-missing state may mean that images of one or more modalities are missing; in Figure 3a, the image set to be detected in the missing state lacks the images of the upper-left modality, i.e. the position of the dashed box. The completion operation fills these missing parts with an optimized reference image set (e.g. the first reference image set); specifically, the missing modality in the image set is filled with the images of the modality at the corresponding position in the reference image set. For example, in Figure 3a the upper-left sequence of the image set to be detected is filled with reference image sequence 1 at the upper-left of the reference image set, yielding the final target detection image set (e.g. the second image set).
The image detection model and the reference image set are obtained by the server 102 through training and optimization. The image detection model is trained from full-modality training image sets and the missing training image sets obtained by region-masking the full-modality training image sets; the reference image set is obtained by optimizing the missing training image sets and an initial reference image set. The initial reference image set includes N modalities, each modality including one or more reference images; the pixel values of each reference image are values to be optimized, and value optimization is performed on the to-be-optimized pixel values of each reference image to obtain the first reference image set.
In one application scenario, for MRI data or brain tumor data, taking the server 102 being the cloud 103 as an example, an image detection scenario is described, as shown in Figure 2. When a user uploads an image set to be detected (i.e. the input image) that needs to be segmented, the image set may be multi-modal image data in which any zero to multiple modalities may be missing. Based on this application, the trained image detection model can directly produce the segmentation result (i.e. the output) of abnormal or lesion regions such as segmented brain tumor regions. The segmentation result is distinguished by different colors. Although region 200, region 201 and region 202 shown in Figure 2 are in grayscale, the colors of the different regions are actually different: region 200 (e.g. purple) is the background color unrelated to the lesion, region 201 (e.g. blue) represents edema, and region 202 (e.g. yellow) represents enhancing tumor. In some cases there is also a region (e.g. green) representing necrosis and non-enhancing tumor core, not shown in Figure 2. Alternatively, in one implementation, a pre-trained image detection model is also obtained before the final image detection model is determined; on this basis, in addition to labeling information, the segmentation result may also include the full-modality restored image set recovered after completing the image set to be detected in the image-missing state, the full-modality restored image set being produced by the pre-trained image detection model.
In one embodiment, Figure 3b is a schematic diagram of the various modality-missing cases disclosed in an embodiment of this application. It exemplarily shows that, when the full modality includes 4 modalities, there are 14 possible modality-missing cases plus one full-modality case (i.e. no modality missing), 15 cases in total. The image detection model trained by the embodiments of this application can process any of these 15 cases and obtain a segmentation result, demonstrating the generality of the image detection model of the embodiments of this application for detecting and segmenting both modality-missing images and full-modality images.
The terminal device 101 involved in the embodiments of this application includes, but is not limited to, user equipment, handheld devices with wireless communication capability, vehicle-mounted devices, wearable devices, and computing devices. For example, the terminal device may be a mobile phone, a tablet computer, or a computer with wireless transceiver capability. The terminal device may also be a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in autonomous driving, a wireless terminal in telemedicine, a wireless terminal in a smart grid, a wireless terminal in a smart city, a wireless terminal in a smart home, and so on. In the embodiments of this application, the apparatus for implementing the terminal device may also be an apparatus capable of supporting the terminal device in implementing that function, such as a chip system, which may be installed in the terminal device. In the technical solutions provided by the embodiments of this application, the description takes as an example that the apparatus for implementing the functions of the terminal device is the terminal device itself.
The server 102 mentioned in the embodiments of this application may specifically be a server, which may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services, such as the cloud 103 exemplified in Figure 1; the embodiments of this application impose no limitation here. In the technical solutions provided by the embodiments of this application, the description takes a server as an example.
In one embodiment, there may also be a single computer device that implements the related functions of both the terminal device 101 and the server 102 mentioned above, i.e. it can interact with users such as physicians, acquire the required image set to be detected and present the segmentation result, and can also perform the image detection method mentioned in the embodiments of this application on the image set to be detected.
Referring to Figure 4, Figure 4 is a schematic flowchart of an image detection method disclosed in an embodiment of this application. The method can be executed in one or more computer devices and may mainly include the following steps.
S401: Acquire a first image set, the first image set including images of at least one modality, the images of each modality being medical images of the corresponding modality. Here, the first image set is the image set to be detected. The images of each modality form an image sequence. A modality can be used to indicate the type of an image sequence, which depends on the examined object (e.g. the brain, the chest or another examined part), the type of examination tool (e.g. MRI, PET, CT), and the type of examination parameter (e.g. FLAIR, T1, T1c, T2). In short, a modality can be regarded as a type. Images of different modalities can, for example, highlight different lesion regions. If no modality is missing from the first image set, the first image set may include images of N modalities, N being an integer greater than 1, representing the number of modalities the image set includes in the full-modality case (i.e. no modality missing). The first image set, for example an MRI study, may contain images of 4 modalities, i.e. 4 image sequences. For example, I is a first image set, where W is the width of a single image in each modality's images, H is the height of each image, and D is the number of images per modality, e.g. the number of slices of MRI data. The first image set may be obtained from a database for detecting image abnormal regions; alternatively, the first image set may be medical images acquired when examining a patient.
S402: Detect whether the first image set is in an image-missing state. The image-missing state means that the first image set satisfies at least one of the following conditions: the modalities corresponding to the first image set are fewer than predetermined N modalities, N being a positive integer greater than 1; and a local image is missing from the images of at least one modality in the first image set. When the modalities corresponding to the first image set are fewer than the predetermined N modalities, it can be determined that the first image set has missing modalities. A local image being missing from the images of at least one modality means, for example, that pixel values are missing from local regions of one or more images of a modality. The predetermined N modalities may be a combination of multiple medical-imaging modalities; for example, the N modalities include FLAIR, T1, T1c and T2 of the same examined object (e.g. the brain or the chest). As another example, the N modalities are a combination of PET, CT and MRI. As yet another example, the N modalities are a combination of the modalities of different examined objects, e.g. a combination of brain MRI and chest CT.
S403: If the first image set is in the image-missing state, determine missing-description information. The missing-description information indicates at least one of the following: at least one modality that is missing from the modalities corresponding to the first image set relative to the N modalities, and the region of missing image in at least one modality of the first image set. For example, when the full modality includes 4 modalities, if the first image set includes images of 3 modalities, it can be determined that the first image set is missing the images of 1 modality. In some cases the first image set may have nothing missing; in that case, the first image set can be directly regarded as the full-modality image set to be detected, and the embodiments of this application can detect it directly with the image detection model to determine the image abnormal region, without using the reference image set to complete the full modality of the image set to be detected.
S404: Acquire a first reference image set. The first reference image set includes reference images of the N modalities. The reference image of each modality in the first reference image set is used to represent the specific information corresponding to that modality. In one embodiment, the specific information corresponding to a modality refers to the salient characteristics of images of that modality that distinguish them from images of the other modalities. In other words, the reference images of the N modalities can characterize the specific information of the N modalities.
S405: Based on the first reference image set, complete, in the first image set, the missing parts corresponding to the missing-description information, to obtain a second image set.
In one embodiment, the completion performed in step S405 may include: based on the first reference image set, adding, to the first image set, the reference images corresponding to the missing at least one modality, or the images corresponding to the regions of missing image, to obtain the second image set, where the added reference images are the reference images of the corresponding modalities in the first reference image set, and the added images corresponding to the regions of missing image are the local images in the first reference image set at positions corresponding to the regions of missing image.
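The completion in S405 can be sketched as a simple substitution. The following is a minimal illustration only, assuming volumes are stored as NumPy arrays keyed by modality name and missing voxels are marked with NaN (the modality names, shapes and NaN convention are assumptions, not the patent's exact implementation):

```python
import numpy as np

MODALITIES = ["FLAIR", "T1", "T1c", "T2"]  # the N predetermined modalities

def complete_image_set(first_set, reference_set):
    """Fill missing modalities (and missing local regions inside a
    modality) of `first_set` with data from `reference_set`.

    first_set: dict mapping modality name -> (D, H, W) float array;
               some keys may be absent, and missing voxels are NaN.
    reference_set: dict containing all N modalities.
    Returns the completed second image set.
    """
    second_set = {}
    for m in MODALITIES:
        ref = reference_set[m]
        if m not in first_set:
            # whole modality missing: take the reference image as-is
            second_set[m] = ref.copy()
        else:
            img = first_set[m].copy()
            hole = np.isnan(img)   # region of missing local image
            img[hole] = ref[hole]  # fill from the matching positions
            second_set[m] = img
    return second_set
```

The key point the sketch shows is that completion is pure substitution at matching positions, with no extra network module involved.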
In one implementation, using the reference image set to complete the missing parts is a method that saves both time and space and completes missing modalities at very low cost; it eliminates the need for extra modules and improves image detection efficiency.
In one embodiment, the first reference image set is obtained through training optimization. The first reference image set may be obtained by optimizing an initial reference image set. The initial reference image set may include reference images of N modalities, i.e. N initial reference sequences, each initial reference sequence including one or more reference images. The pixel values of each initial reference sequence image are values to be optimized, and value optimization is performed on the to-be-optimized pixel values of each initial reference sequence image to obtain the reference image set. For how the first reference image set is obtained, see the training processes described in the embodiments corresponding to Figure 9 and Figure 11.
S406: Detect an image abnormal region according to the second image set. The image abnormal region represents a detected abnormal object, e.g. a lesion region. The second image set can subsequently be detected by the image detection model to determine the image abnormal region for display to the user. The detection of image abnormal regions in this application can be regarded as detection of lesion regions in medical images, or as detection of image regions whose content is inconsistent with that of other regions or of normal regions, or as detection of image regions in other special situations; the specific purpose of abnormal-region detection can be determined by the training image sets and supervision image sets in the corresponding training data used during training.
In summary, in the solution according to the embodiments of this application, for the scenario in which a multi-modal image (i.e. the first image set) has missing modalities, the first reference image set can be used to complete the missing images, and the completed multi-modal image (i.e. the second image set) is then used to detect the image abnormal region. Since the reference image of each modality in the first reference image set represents the specific information corresponding to that modality, the second image set can effectively complete the missing parts of the multi-modal image, which helps to improve the efficiency and accuracy of image detection. In particular, in medical-image detection scenarios, the embodiments of this application help to improve the efficiency and accuracy of image-abnormality detection with missing modalities, and can thus assist physician users in observing abnormal regions such as lesion regions in medical images such as MRI, helping to reduce missed or false judgments of abnormalities such as lesion regions.
In one embodiment, step S406 may include the following operations:
extracting a feature representation of the second image set, the feature representation including first correlations between different modalities in the second image set and second correlations between image regions within the same modality;
based on the feature representation, reconstructing, for the first image set, the image data corresponding to the missing-description information, to obtain a third image set, the third image set being the result of adding the reconstructed image data to the first image set;
identifying the abnormal region in the third image set and using the abnormal region as the image abnormal region.
In summary, step S406 can obtain the first correlations between different modalities and the second correlations between image regions within the same modality, and thus the correlations between the missing and non-missing parts of the multi-modal image, enabling accurate reconstruction of the image data of the missing parts and improving the accuracy of abnormal-region detection. In particular, because the first image set is completed with the first reference image set representing specific information before the feature representation is extracted, the accuracy (fidelity) and efficiency of restoring images through reconstruction (i.e. generating the third image set) can be improved, which in turn improves the accuracy and efficiency of abnormal-region detection.
In one implementation, S406 may include: based on an image detection model for object recognition, determining the image abnormal region using the second image set.
In one implementation, S406 may include: invoking the image detection model to perform image detection on the second image set, obtaining the image abnormal region corresponding to the first image set.
Specifically, the second image set can be input into the image detection model, which detects the second image set and produces a segmentation result. The segmentation result may specifically be labeling information about the image abnormal region (e.g. including the contour of the abnormal region), or the result obtained after labeling the abnormal region in the first image set. Meanwhile, in some implementations, the segmentation result may also include the third image set.
In one embodiment, the image detection model includes a first model and a classifier. The first model includes, for example, an encoder and a decoder. The first model may also be called the backbone network of the image detection model. The first model is, for example, but not limited to, an MAE structure. The image detection model as a whole is, for example, but not limited to, a VT-UNet structure.
The encoder is used to extract the feature representation of the second image set, the feature representation including first correlations between different modalities in the second image set and second correlations between image regions within the same modality.
The decoder is used to reconstruct, based on the feature representation, the image data corresponding to the missing-description information for the first image set, to obtain the third image set.
The classifier is used to segment the abnormal region in the third image set and use the abnormal region as the image abnormal region.
In one embodiment, the image detection model is obtained through training optimization. Specifically, the image detection model is obtained from full-modality training image sets, missing training image sets, and combined training image sets; a combined training image set is obtained by combining a missing training image set with the initial reference image set, and a missing training image set is obtained by region-masking a full-modality training image set with a masking technique. For how the image detection model is obtained, see the training processes described in the embodiments corresponding to Figure 9 and Figure 11.
For example, for MRI data or brain tumor data, the image detection model can produce segmentation results for four individual-modality images and one full-modality image. See Figure 5, a schematic diagram of an image segmentation result disclosed in an embodiment of this application, specifically cancer-region segmentation results on the BraTS 2018 data (the data set of the multi-modal brain tumor segmentation challenge) using four individual-modality images and one full-modality image; the segmentation results can be interpreted precisely against the provided ground truth. FLAIR is a commonly used magnetic resonance (MR) modality, whose full name is fluid-attenuated inversion recovery, also known as water-suppressed imaging; colloquially, it is a water-suppressed image. In this modality, cerebrospinal fluid appears as a low signal (darker), while solid lesions and lesions containing bound water appear as a distinctly high signal (brighter). T1 and T2 are physical quantities for measuring electromagnetic waves and can serve as imaging data; different modalities can highlight different sub-regions of the lesion, helping physician users to finally determine the pathological condition.
It is worth pointing out that the application scenario of this application is not limited to MRI data or brain tumor data; it can also be other combinations of multi-modal medical image data (such as various combinations of PET (Positron Emission Computed Tomography), CT (Computed Tomography), MRI, etc.) and other body parts (such as lung tumors). As shown in Figure 6, (a) is lung-tumor segmentation from PET-based multi-modal images, and (b) is lung-tumor segmentation from CT-based multi-modal images. That is, the image set to be detected may be an MRI modality, a PET modality, a CT modality, etc., or a combination of two or more of the MRI, PET and CT modalities. Combining multiple modalities (e.g. MRI, PET, CT) for comprehensive AI recognition assists users such as physicians in detecting pathological conditions more comprehensively. It should be noted that, in the case of a combination of two or more of the MRI, PET and CT modalities, the first reference image set and the image detection model are trained on training data constructed from the corresponding combined images. For example, for an MRI-PET combination, both the pre-training stage and the fine-tuning stage mentioned later use training data derived from the MRI-PET combination for optimization training, yielding the first reference image set and the image detection model.
In one implementation, the image detection method of this application can be displayed in a visual interface. Specifically, the first image set is displayed in a first display area of a user interface, and the segmentation result corresponding to the first image set is displayed in a second display area of the user interface. Likewise, in this application, the segmentation result may include at least one of the following: the image abnormal region; labeling information about the image abnormal region; or the result obtained after labeling the image abnormal region in the first image set.
As shown in Figure 7, a schematic interface diagram of an image detection method disclosed in an embodiment of this application, 701 is the first display area and 702 is the second display area. The user can import the first image set by clicking the import button in the first display area 701, as shown at 703, and then click the start button in the first display area 701 to see the corresponding segmentation result in the second display area 702. As shown in Figure 7, 704 is the restored full-modality image set corresponding to the first image set, i.e. the third image set, and 705 is the labeling information. The labeling information can also be displayed directly overlaid on the full-modality image set 704.
In the embodiments of this application, the computer device first acquires a first image set including at least one modality; specifically, the images of each modality include one or more images. It then detects the first image set; if the first image set is detected to be in the image-missing state, the missing-description information of the first image set can be determined, and the first reference image set is used to complete the missing parts corresponding to the missing-description information, obtaining the second image set. Further, the computer device invokes the image detection model to perform image detection on the second image set, obtaining the segmentation result corresponding to the first image set.
In the embodiments of this application, the computer device first completes images with missing modalities according to the first reference image set, which allows a better feature representation of the first image set to be obtained and thus helps to improve the multi-modal image detection effect with missing modalities. Further, the already-trained image detection model is used to detect the second image set; since the image detection model has been continuously optimized, it can detect the second image set faster and obtain the segmentation result, improving the overall image detection efficiency.
Referring to Figure 8, a training framework diagram for the image detection method disclosed in an embodiment of this application, the framework is roughly divided into two parts: pre-training (the upper half of the line shown in Figure 8) and fine-tuning (the lower half of the line shown in Figure 8).
In the pre-training stage, taking a pair of a full-modality training image set and the initial reference image set as an example, the process mainly includes: performing region masking on the full-modality training images to obtain a missing training image set, where the region masking may specifically mask any one or more of the multiple modalities, or mask one or more modalities and then locally mask the images of at least one of the remaining modalities. The missing training image set and the initial reference image set are then combined to obtain a combined training image set. Further, the combined training image set is input into the initial first model for training to obtain a predicted image set; according to the difference between the predicted image set and the full-modality training image set serving as the supervision images, the first model and the initial reference image set are optimized, so as to obtain the pre-trained reference image set and the pre-trained first model. On this basis, a pre-trained image detection model including the pre-trained first model and an untrained classifier can be obtained. The difference can, for example, be expressed as a loss value between the predicted image set and the full-modality training image set, and the first model and the initial reference image set are adjusted according to the loss value. Through iterative training, when the loss value reaches a convergence condition, the pre-trained image detection model can be obtained; the purpose of pre-training is to learn feature representations of multi-modal images under missing modalities. Meanwhile, while the model is optimized through continuous back-propagation (model inversion) during pre-training, a pre-trained reference image set (i.e. the result of continuously optimizing the initial reference images) is also obtained, which can be used to complete the image data that may be missing during training and inference.
It should be noted that the initial reference image set may be an initially generated image set of N modalities, or a first reference image set of N modalities that requires further optimization after training on a previous full-modality training image set and missing training image set. That is, apart from the finally obtained usable first reference image set, anything that still needs to be optimized through training can be called an initial reference image set.
In the fine-tuning stage, taking a full-modality training image set and its corresponding segmentation supervision information as an example, the process mainly includes: first inputting the full-modality training image set into the pre-trained image detection model obtained in the pre-training stage for segmentation prediction to obtain first segmentation prediction information, which is stored in a storage space; then randomly combining the full-modality training image set and the pre-trained reference image set to obtain a combined fine-tuning image set, and inputting the combined fine-tuning image set into the pre-trained image detection model for segmentation prediction to obtain second segmentation prediction information; and then, according to the difference between the first segmentation prediction information and the second segmentation prediction information and the difference between the second segmentation prediction information and the segmentation supervision information configured for the full-modality training image set, optimizing the pre-trained reference image set and the pre-trained image detection model to obtain the first reference image set and the image detection model. The segmentation supervision information is, for example, labeling information related to image abnormal regions.
In one embodiment, the difference between the first and second segmentation prediction information and the difference between the second segmentation prediction information and the segmentation supervision information configured for the full-modality training image set are both expressed as loss values: the loss between the first and second segmentation prediction information is computed, the loss between the second segmentation prediction information and the segmentation supervision information configured for the full-modality training image set is computed, and the sum of the two losses is then computed; the pre-trained image detection model and the pre-trained reference image set are fine-tuned according to this value. After repeated adjustment, when the loss reaches a convergence condition, the image detection model can achieve higher-precision segmentation of abnormal or lesion regions under missing modalities, yielding the final image detection model and reference image set. The image detection model and reference image set obtained through the above two training stages are universal and, at test (use) time, can be used to process MRI image data with any missing-modality case.
The backbone network of the network model used in the training of this application may be VT-UNet, a pure Transformer (a self-attention transformation network) architecture whose parameter count and computation are lower than the commonly used 3D UNet (an image analysis model) or VNet (an image analysis model). Meanwhile, this application uses the Adam (Adaptive momentum) algorithm as the optimizer for network training, setting the numbers of training epochs of the first and second stages to 600 and 400, respectively. The initial learning rate is 3e-4, and a cosine-annealing learning-rate schedule with good convergence is used during training. The model is trained on two 2080Ti NVIDIA GPUs with a batch size of 2. To standardize all data, during training the pixel values can be clipped to the 1st to 99th percentiles of the intensity values, followed by min-max scaling, and finally randomly cropped to a fixed size of 128×128×128 pixels for training. The side length of the random 3D patches can be set to 16 pixels. The images in the corresponding initial reference image set are initialized with Gaussian noise, and λ can be set to 0.1.
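The intensity normalization described above (clipping to the 1st-99th percentiles, then min-max scaling) can be sketched as follows. This is a minimal NumPy illustration of the stated preprocessing, not the authors' exact code; the epsilon guard is an assumption added to avoid division by zero:

```python
import numpy as np

def normalize_volume(vol):
    """Clip intensities to the 1st-99th percentile range of the volume,
    then min-max scale the clipped values into [0, 1]."""
    lo, hi = np.percentile(vol, [1, 99])
    clipped = np.clip(vol, lo, hi)
    return (clipped - lo) / (hi - lo + 1e-8)  # epsilon guard (assumption)
```

In practice such a function would be applied per modality before the random 128×128×128 crop described in the text.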
It should be noted that: 1. when completing missing modalities, the pre-trained model can also be used directly to generate synthetic data of the missing modalities; 2. besides VT-UNet, other commonly used segmentation networks can also be used as the backbone of the network model in this application; 3. this application can be extended to segmentation tasks of other multi-modal images or other anatomical structures with similar application scenarios, and is not limited to MRI data or brain tumor data.
According to the training framework illustrated in Figure 8, the flowchart of the pre-training stage can be seen in Figure 9, a schematic flowchart of pre-training of the image detection model disclosed in an embodiment of this application. Figure 9 may include S901-S906, with the following specific steps:
S901: Acquire training data for training the image detection model. The training data may include multiple full-modality training image sets and an initial reference image set. Each full-modality training image set includes images of N modalities, and the images of each modality are not missing local images. The initial reference image set includes reference images of N modalities, and the reference image of each modality in the initial reference image set represents the initial specific information corresponding to that modality. The initial specific information is the specific information obtained at the start; the initial reference image set (i.e. the initial specific information) can subsequently be optimized. The training data may be obtained from a database or from relevant institutions; for example, MRI data or brain tumor data can be obtained from hospitals.
S902: Mask each full-modality training image set to obtain a corresponding missing training image set. The missing training image set is missing the images of at least one modality. In one embodiment, images of one or more modalities in the full-modality training image set can be covered by masking or similar processing to obtain the missing training image set. For example, if the full-modality training image set has four modalities, region masking may cover one, two or three of the modalities to obtain the missing training image set; alternatively, one or more modalities of the full-modality training image set are masked and the images of the remaining modalities are locally masked, to obtain the missing training image set.
For example, if the full-modality training image set has four modalities, masking it may cover one of the modalities and then cover partial regions of the remaining three modalities, yielding the missing training image set. The missing training data set may include M missing sequences, M being an integer greater than or equal to 1 and less than N.
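The region masking of S902 can be sketched as follows: first drop one or more whole modalities, then mask random 16-voxel cubes in the surviving modalities, mirroring the 16-pixel patch size stated earlier. This is a hedged illustration; the per-cube masking probability (taken from the 0.875 figure discussed later), the NaN marker, and the uniform choice of how many modalities to drop are assumptions:

```python
import numpy as np

def mask_training_set(full_set, rng, patch=16, patch_prob=0.875):
    """Build a missing training image set from a full-modality set:
    drop a random non-empty, proper subset of modalities, then mask
    random (patch x patch x patch) cubes in the surviving modalities.
    Masked voxels are marked with NaN so they can be completed later."""
    names = list(full_set)
    n_drop = int(rng.integers(1, len(names)))         # drop 1 .. N-1 modalities
    dropped = set(rng.choice(names, size=n_drop, replace=False))
    missing = {}
    for m, vol in full_set.items():
        if m in dropped:
            continue                                   # whole modality removed
        vol = vol.copy()
        D, H, W = vol.shape
        for d in range(0, D, patch):
            for h in range(0, H, patch):
                for w in range(0, W, patch):
                    if rng.random() < patch_prob:      # large mask probability
                        vol[d:d + patch, h:h + patch, w:w + patch] = np.nan
        missing[m] = vol
    return missing
```

The output has between 1 and N-1 modalities left, each with local regions removed, matching the two kinds of missing state defined in S402.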
S903: Use the initial reference image set to complete the missing images in the missing training image set, obtaining a combined training image set. In one embodiment, the combined training image set can be obtained from the missing training image set and the initial reference image set. Specifically, the reference images of the corresponding modalities (i.e. the same modalities as those missing from the missing training image set) in the initial reference image set are added to the missing training image set, and the image data of the corresponding regions in the initial reference image set (regions whose positions correspond to the regions of missing image in at least one modality of the missing training image set) are added to the regions of missing image in the missing training image set, yielding the combined training image set.
For example, for MRI, if the missing training image set is missing the T1 modality, the reference image corresponding to T1 among the reference images can be placed at the position of the T1 modality of the missing training image set, obtaining the completed combined training image set.
S904: Input the combined training image set into the first model to obtain the predicted image set output by the first model. After obtaining the combined training image set, the combined training image set is input into the initial first model for processing to obtain the predicted image set. The first model is built based on a masked autoencoder; the corresponding first model may be based on MAE or VT-UNet, and of course other commonly used deep neural networks may also be used; this application does not limit this.
S905: Optimize the first model according to the difference between the predicted image set and the full-modality training image set, to obtain the pre-trained first model.
S906: Based on the difference, optimize the initial reference image set to obtain the pre-trained reference image set.
In one implementation, optimizing the first model and the initial reference image set according to the difference between the predicted image set and the full-modality training image set may be performed according to the loss value between the predicted image set and the full-modality training image set, optimizing the first model of the initial model (i.e. the image detection model to be trained) and the initial reference image set.
In one embodiment, the loss value between the predicted image set and the full-modality training image set is computed; when, for a large number of predicted image sets and full-modality training image sets, the loss values are less than or equal to a first threshold, the pre-trained reference image set and the pre-trained first model are determined. Alternatively, when, under a certain set of model parameters of the first model, the corresponding loss is minimal over a large number of missing training image sets or full-modality training image sets, the first model corresponding to those parameters is determined to be the pre-trained first model, and the corresponding pre-trained reference image set is obtained.
In one embodiment, based on the difference between the predicted image set and the full-modality training image set, the initial reference image set is optimized through model inversion to obtain the pre-trained reference image set. In another implementation, according to the difference between the predicted image set and the full-modality training image set, the optimization objective expression for optimizing the first model is given by formula (1):
where x denotes the full-modality training image set, x′ denotes the missing training image set, x sub denotes the initial reference image set, S(x′, x sub) denotes the combined training image set, F is the reconstruction function, the regularization term is an L2 regularizer, γ is its weight, and the reconstruction loss is a mean-squared-error loss function. Through this optimization formula, the model corresponding to the minimal loss value can be determined and taken as the pre-trained image detection model; this objective enables the first model to learn the inter-modality relationships in the data and anatomical integrity without any annotation. According to the difference between the predicted image set and the full-modality training image set, the optimization objective expression for optimizing the initial reference image set is given by formula (2):
where x denotes the full-modality training image set, x′ denotes the missing training image set, x sub denotes the initial reference image set whose optimized result is the pre-trained reference image set, S(x′, x sub) denotes the combined training image set, F is the reconstruction function, the regularization term is an L2 regularizer, γ is its weight, and the reconstruction loss is a mean-squared-error loss function. Through this optimization formula, the content masked out of x during pre-training is completed with the corresponding content of the optimized reference image set instead of being masked directly with zeros, which better reconstructs multi-modal data with missing content (modalities or partial patches); the completed content must capture the specific information representative of the particular modality, which also helps to improve multi-modal segmentation with partially missing modalities. Intuitively, the initial reference image set is optimized through back-propagation, which can be called model inversion; in this way, the model introduces no new modules, and the optimization cost of the initial reference image set is extremely low. See Figure 10a, a full-modality restored image set obtained after optimization in this way.
In this application, formulas (1) and (2) can use a very small regularization weight, i.e. γ=0.005. Meanwhile, using the mean-squared-error loss enables the model to better reconstruct the original image, while the regularization term makes the obtained x sub more plausible.
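The rendered formulas (1) and (2) were lost in extraction and are only described in words above. From the stated ingredients alone (an MSE reconstruction loss between F(S(x′, x_sub)) and x, plus a γ-weighted L2 regularizer), a plausible reconstruction is the following; the exact loss symbols, the argmin variables, and the argument of the regularizer are assumptions:

```latex
% Formula (1): optimize the first model F (parameters \theta),
% keeping the reference image set x_{sub} fixed:
\min_{\theta}\; \mathcal{L}_{\mathrm{MSE}}\!\left(F_{\theta}\!\left(S(x', x_{sub})\right),\, x\right) \;+\; \gamma\, \mathcal{R}_{L2}

% Formula (2): model inversion -- optimize the reference image set itself:
\hat{x}_{sub} \;=\; \arg\min_{x_{sub}}\; \mathcal{L}_{\mathrm{MSE}}\!\left(F\!\left(S(x', x_{sub})\right),\, x\right) \;+\; \gamma\, \mathcal{R}_{L2}(x_{sub})
```

Both objectives share the same reconstruction term, consistent with the text's statement that the same difference drives the optimization of the model (S905) and of the reference image set (S906); with γ=0.005 the regularizer acts only as a weak plausibility prior.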
It is worth noting that, for the region masking in step S902, this application conducted corresponding experiments showing that the masking method of this application achieves better results among various methods; see Figure 10b, the experimental results of the method of this application under different mask probabilities. In the MAE method, the model can only restore masked regions from the surrounding content of the reference image, whereas in the training process of this application the model can also restore masked regions by referring to images of other modalities. Therefore, this application chooses a larger mask probability to make the self-supervised task harder, so that the model can learn better features. As shown in Figure 10b, the final models obtained with 0.8125 or 0.875 both perform better than with 0.75 (the mask probability in the MAE papers). The Dice metric (a set-similarity metric, e.g. the DSC, Dice Similarity Coefficient) is used to measure the experimental results, and higher Dice is better, where WT (whole tumor) is the whole tumor, including all tumor regions; TC (tumor core) is the tumor core, composed of enhancing tumor, necrotic regions and non-enhancing tumor core; and ET (enhancing tumor) is the enhancing tumor.
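The Dice metric referred to above is the standard overlap measure DSC = 2|A∩B| / (|A| + |B|). A minimal binary-mask implementation, with an epsilon guard added as an assumption to handle empty masks:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice similarity coefficient between two binary masks:
    DSC = 2 * |pred AND target| / (|pred| + |target|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)
```

For the WT, TC and ET regions, each region would be converted to its own binary mask and scored separately, matching how the figures report three Dice values.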
This part mainly describes the pre-training process of model training, aimed at obtaining the pre-trained first model and the pre-trained reference image set. This application learns rich feature representations of multi-modal MRI with missing modalities through a multimodal masked autoencoder; the model (i.e. the first model) is a single encoder-decoder structure, reducing the training difficulty. Meanwhile, the pre-training stage trains the first model and the initial reference image set according to the training data and model-inversion-based modality-completion rules, obtaining the pre-trained reference image set and the pre-trained first model, which can be used to complete modalities that may be missing during training and inference, thereby improving the efficiency of image detection in this application.
According to the training framework illustrated in Figure 8, the flowchart of the fine-tuning stage can be seen in Figure 11, which may include S1101-S1104, with the following specific steps:
S1101: Combine the pre-trained reference image set with the full-modality training image set to obtain a combined fine-tuning image set.
In one embodiment, the combined fine-tuning image set includes images of N modalities, of which the images of x modalities come from the pre-trained reference image set and the images of y modalities come from the full-modality training image set, where x and y are positive integers and x+y=N.
In one implementation, the pre-trained reference image set is combined with the full-modality training image set according to rule representation information to obtain the combined fine-tuning image set; alternatively, the pre-trained reference image set is combined with the full-modality training image set at random to obtain the combined fine-tuning image set. Among the images of the N modalities, the images of x modalities come from the pre-trained reference image set and the images of y modalities come from the full-modality training image set, where x and y are positive integers and x+y=N.
For example, as shown in Figure 8, rule representation information is displayed, according to which the full-modality training image set and the pre-trained reference image set can be combined to obtain the combined fine-tuning image set. The dark parts of the rule representation information indicate that the corresponding positions of the full-modality training image set are covered, and the light parts indicate that the corresponding positions are not covered; combining according to this rule yields the combined fine-tuning image set.
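The random combination of S1101 can be sketched as follows: a random binary rule decides, per modality, whether the fine-tuning input comes from the pre-trained reference set (covered position) or from the full-modality training set, with both counts kept positive so that x + y = N with x, y ≥ 1. This is a hedged illustration; the rejection loop for sampling a mixed rule is an assumption:

```python
import numpy as np

def combine_for_finetuning(full_set, reference_set, rng):
    """Build a combined fine-tuning image set: for each of the N
    modalities, take the pre-trained reference image (rule bit 1,
    'covered') or keep the full-modality training image (rule bit 0).
    Resample the rule until both sources contribute at least once."""
    names = list(full_set)
    while True:
        rule = rng.integers(0, 2, size=len(names))  # random rule representation
        if 0 < rule.sum() < len(names):             # x >= 1 and y >= 1
            break
    return {m: (reference_set[m] if bit else full_set[m]).copy()
            for m, bit in zip(names, rule)}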
S1102: Input the full-modality training image set into the pre-trained image detection model to obtain the first segmentation prediction information output by the image detection model. The pre-trained image detection model includes the pre-trained first model and a classifier, the input of the classifier being connected to the output of the first model. In one implementation, to better obtain the segmentation result, the full-modality training image set is first input into the pre-trained image detection model for segmentation prediction to obtain the first segmentation prediction information, which is stored in a storage space; the storage space can also be kept in CPU memory, which is also better suited to hardware that cannot perform joint training due to insufficient GPU memory. Meanwhile, the first segmentation prediction information can also be updated in real time while the model is fine-tuned.
S1103: Input the combined fine-tuning image set into the pre-trained image detection model to obtain the second segmentation prediction information output by the image detection model.
S1104: According to the difference between the first segmentation prediction information and the second segmentation prediction information, and the difference between the second segmentation prediction information and the segmentation supervision information configured for the full-modality training image set, optimize the pre-trained reference image set and the pre-trained image detection model, to obtain the first reference image set and the trained image detection model.
In one implementation, optimizing the pre-trained reference image set and the pre-trained image detection model according to these differences may involve computing the loss between the first and second segmentation prediction information and the loss between the second segmentation prediction information and the segmentation supervision information, and then optimizing the parameter values of the pre-trained reference image set and the pre-trained image detection model according to these two losses, obtaining the first reference image set and the trained image detection model.
Specifically, the optimization objective expression for optimizing the pre-trained reference image set and the pre-trained image detection model is formula (3):
where the first segmentation prediction information is the segmentation result under full modality and the second segmentation prediction information is the segmentation result under missing modalities, s gt denotes the segmentation supervision information configured for the full-modality training image set, f is the function representing the first model, f s is the segmentation head representing the classifier, λ is a weight that can be set to 0.1, the supervised loss is the sum of the Dice loss and the cross-entropy loss, and the consistency loss function is given by formula (4):
The consistency loss computes the KL distance (Kullback-Leibler divergence, which measures the difference between two probability distributions over the same event space) between the segmentation result under full modality and the segmentation result under missing modalities. Specifically, W is the width of each image in the image set, H is the height of each image, D is the number of slices of each image in the image set, and C is the total number of classes of the image-set segmentation.
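The consistency term of formula (4), a KL divergence between the full-modality and missing-modality segmentation distributions over the W×H×D voxels and C classes, can be sketched as follows. This is a minimal NumPy illustration; treating the reduction over voxels as a mean and clipping probabilities for numerical stability are assumptions:

```python
import numpy as np

def consistency_kl(p_full, p_missing, eps=1e-8):
    """Mean voxel-wise KL(p_full || p_missing) between two softmax
    segmentation outputs of shape (C, D, H, W)."""
    p = np.clip(p_full, eps, 1.0)
    q = np.clip(p_missing, eps, 1.0)
    kl_per_voxel = np.sum(p * np.log(p / q), axis=0)  # sum over the C classes
    return float(kl_per_voxel.mean())                 # average over D*H*W voxels
```

In the self-distillation setup described above, p_full would come from the cached first segmentation prediction information and p_missing from the combined fine-tuning input, so minimizing this term pulls the missing-modality prediction toward the full-modality one.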
Steps S1102-S1104 constitute a computation-efficient self-distillation method, which can transfer task-relevant knowledge from full-modality data to missing-modality data within the same network; the model can be fine-tuned into a multi-modal segmentation model that simultaneously handles various missing-modality cases, while also reducing computation overhead during training and deployment.
This part mainly describes the fine-tuning process of model training, aimed at obtaining the final image detection model and reference image set, which can process modality images of any case and produce the corresponding segmentation result. The fine-tuning task of this application is a computation-efficient self-distillation method that distills the information of full-modality data into missing modalities during fine-tuning of the segmentation task, achieving higher-precision segmentation with missing modalities.
After the two training stages of Figure 9 and Figure 11, the generated image detection model and reference image set are both highly general and, in use (i.e. during prediction), can be used to process MRI data with any missing-modality case. On this basis, this application conducted concrete experiments, specifically on the PyTorch neural-network framework, and obtained the corresponding experimental results. Specifically, the technique corresponding to the image detection method of this application was evaluated on the brain-tumor segmentation challenges BraTS 2018 and BraTS 2019 to verify its effectiveness. The BraTS data sets consist of multiple cases each including four MRI modalities: T1, T1c, T2 and FLAIR. These data were organized by the challenge organizers with preprocessing including skull stripping, resampling to a uniform resolution (1 mm³), and co-registration on the same template.
In this challenge, four intra-tumoral structures (edema, enhancing tumor, necrosis and non-enhancing tumor core) are grouped into three tumor regions as the segmentation targets: 1. whole tumor (WT), including all tumor regions; 2. tumor core (TC), composed of enhancing tumor, necrotic regions and non-enhancing tumor core; 3. enhancing tumor (ET). The BraTS 2018 and BraTS 2019 data sets include 285 and 335 cases, respectively, with corresponding tumor-region annotations. In the experiments, the two data sets can each be randomly split into training and test sets at a ratio of 80:20. In this application, the Dice coefficient and the 95% Hausdorff distance (HD95) can be used as evaluation metrics; in addition, the online evaluation system can be used to verify the performance of the technique of this application on the validation sets stored in the database under the full-modality condition. The above BraTS 2018 and BraTS 2019 are two data sets already existing in the database and can be used directly.
Table 1 compares the image detection method of this application with three universal methods for brain MRI tumor segmentation with missing modalities on the BraTS 2018 data set: HVED, LCRL and FGMF, where FGMF is the universal method with the highest metrics to date. Since these methods all report segmentation results on 20% of the data, their results can be quoted directly from the database. Table 1 shows that the overall performance of the image detection method proposed in this application on the test set is the best: it achieves the best median in all three tumor regions, and the image detection method proposed in this application achieves the best results in most cases (the best results in 14, 11 and 10 of the cases in the three tumor regions respectively, out of 15 missing cases in total). It is worth mentioning that the technique of the image detection method proposed in this application adopts a basic single encoder-decoder framework, whereas the three compared methods all, without exception, adopt multi-encoder or multi-decoder frameworks whose computation exceeds that of the image detection method proposed in this application.
Table 1
The available and missing modalities are denoted by · and ○, respectively; the p-values are given by Wilcoxon tests of the significance between the corresponding method and the image detection method of this application. The three methods are: HVED (Hetero-Modal Variational Encoder-Decoder for Joint Modality Completion and Segmentation), LCRL (Latent correlation representation learning for brain tumor segmentation with missing MRI modalities), and FGMF (Feature-enhanced generation and multimodality fusion based deep neural network for brain tumor segmentation with missing MR modalities).
Table 2 compares the image detection method of this application on the BraTS 2019 data set with LCRL, the only method that conducted comparison experiments on this data set. The results show that the image detection method of this application outperforms LCRL in all missing cases over all tumor regions, demonstrating the good generalization of the image detection method of this application.
Table 2

The available and missing modalities are denoted by · and ○, respectively; the p-values are given by Wilcoxon tests of the significance between the corresponding method and the image detection method of this application.
In addition, although the image segmentation method of this application proposes a "universal" model, to demonstrate the effect of the image detection method of this application, it was also compared with the best dedicated model to date, ACN (Adversarial Co-training Network). The training/testing split used by that method differs from that of the image detection method of this application, so the results presented in its paper can be quoted directly for reference.
The specific results can be seen in Table 3: the image detection method of this application, which only needs to train one model, performs almost identically overall to ACN, which trains a separate model for each missing case (that method would require training 15 models in the experiments of this application).
Table 3
The available and missing modalities are denoted by · and ○, respectively; the p-values are given by Wilcoxon tests of the significance between the corresponding method and the image detection method of this application.
Further, to objectively present the performance of the image detection method of this application under the full-modality condition, Table 4 compares the online test results of the image detection method of this application and several current methods on the two data sets, including LCRL, VT-UNet-T (the backbone network used by the image detection method of this application) and TransBTS (another Transformer model for brain MRI tumor segmentation). In addition, the results of the top solutions of the corresponding challenges (obtained from existing databases) are also included in Table 4 as references; it is worth pointing out that these top solutions are usually heavily engineered, e.g. with fine hyper-parameter tuning. The results show that, compared with other non-challenge solutions, the image detection method of this application achieves the best results in 9 cases (two data sets × two metrics × three tumor regions, 12 cases in total). Moreover, the results of the image detection method of this application in some cases almost exceed the top solutions of the corresponding challenges, which usually underwent extensive tuning. These results indicate that the multi-modal representations learned by the image detection method of this application are not only robust to missing modalities but also effective for full modality.
Table 4
Comparison of the image detection method of this application with existing state-of-the-art methods under the full-modality condition on the BraTS 2018 (left) and BraTS 2019 (right) data, where "challenge" denotes the top solution of the corresponding challenge and NA denotes unavailable.
Furthermore, to verify the effectiveness of each module in the technique proposed by the image detection method of this application, ablation experiments can be completed by removing modules from the overall solution one by one. The results are shown in Table 5, and the following conclusions can be drawn:
1. In rows 1 and 2 (a, b), the pre-training stage of the training process is removed, and in the latter pre-trained parameters on the ImageNet data set are added; the results of both rows drop significantly, indicating that the pre-trained image detection model plays an indispensable role in the framework proposed by the image detection method of this application.
2. In row 3 (c), replacing the full-modality images learned through model inversion in this application with all-zero images degrades the results, and in row 4 (d) replacing the full-modality images with the average of all data in the corresponding modality worsens the results markedly, showing that the model-inversion-based missing-modality completion solution proposed by the image detection method of this application captures more useful modality-specific feature information and can effectively serve as a complement for missing modalities in brain-tumor segmentation.
3. Finally, compared with row 5 (e), the evaluation metrics of the framework proposed in this application are better in all tumor regions, verifying the effectiveness of full-modality-to-missing-modality self-distillation. In addition, the self-distillation framework proposed in this application saves approximately 52 GFLOPS of computation compared with joint training.
Table 5
Based on the above method embodiments, an embodiment of this application further provides an image detection apparatus. Referring to FIG. 12, which is a schematic structural diagram of an image detection apparatus provided by an embodiment of this application, the apparatus may be applied to the above-mentioned server or to a computer device. The image detection apparatus 1200 shown in FIG. 12 may run the following units:
an acquisition unit 1201, configured to acquire a first image set, the first image set including images of at least one modality, the image of each modality being a medical image of the corresponding modality;
a determination unit 1202, configured to detect whether the first image set is in an image-missing state, where the image-missing state means that the first image set satisfies at least one of the following conditions: the modalities corresponding to the first image set are fewer than N predetermined modalities, N being a positive integer greater than 1; and a partial image is missing from the image of at least one modality in the first image set;
a processing unit 1203, configured to:
if the first image set is in the image-missing state, determine missing description information, where the missing description information indicates at least one of the following: at least one modality missing from the first image set relative to the N modalities, and a region of missing image in the image of at least one modality in the first image set;
acquire a first reference image set, the first reference image set including reference images of the N modalities, the reference image of each modality in the first reference image set being used to represent the specific information corresponding to each modality;
complete, based on the first reference image set, the missing parts corresponding to the missing description information in the first image set, to obtain a second image set; and
detect an image abnormality region according to the second image set.
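The flow carried out by these units (detect the missing state, build the missing description information, then complete the set from the reference images) can be sketched as follows. This is a minimal illustration, assuming images are NumPy arrays keyed by modality name and that a locally missing region is marked with NaN; the modality names and helper function are hypothetical, not part of this application.

```python
import numpy as np

MODALITIES = ["T1", "T1ce", "T2", "FLAIR"]  # the N predetermined modalities

def complete_image_set(first_set, reference_set, modalities=MODALITIES):
    """Fill missing modalities/regions in `first_set` from `reference_set`.

    first_set: dict modality -> image array; a modality may be absent, and
               an image may contain NaN where a local region is missing.
    reference_set: dict modality -> reference image of the same shape.
    Returns (second_set, missing_info), where missing_info records what was filled.
    """
    missing_modalities = [m for m in modalities if m not in first_set]
    second_set, patched = {}, {}
    for m in modalities:
        if m in first_set:
            img = first_set[m].astype(float).copy()
            hole = np.isnan(img)                  # region of missing image
            if hole.any():                        # patch only the hole, in place
                img[hole] = reference_set[m][hole]
                patched[m] = hole
            second_set[m] = img
        else:                                     # whole modality missing
            second_set[m] = reference_set[m].copy()
    return second_set, {"missing_modalities": missing_modalities,
                        "patched_regions": patched}
```

The returned `missing_info` dictionary plays the role of the missing description information: it names the absent modalities and the patched regions, and the second image set always carries all N modalities afterwards.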
In one implementation, when detecting the image abnormality region according to the second image set, the processing unit 1203 is specifically configured to:
determine the image abnormality region from the second image set based on an image detection model used for object recognition.
In one implementation, the image detection apparatus further includes:
a display unit 1204, configured to display the first image set in a first display region of a user interface, and to display a segmentation result corresponding to the first image set in a second display region of the user interface, where the segmentation result includes any one of the following: the image abnormality region; the result obtained after marking the image abnormality region in the first image set; or the result obtained after marking the image abnormality region in the first image set, together with the third image set.
In one implementation, the acquisition unit 1201 is further configured to acquire training data for training the image detection model, the training data including multiple full-modality training image sets and an initial reference image set; each full-modality training image set includes images of N modalities, with no partial image missing from the image of any modality; the initial reference image set includes reference images of the N modalities, and the reference image of each modality in the initial reference image set represents the initial specific information corresponding to that modality.
The processing unit 1203 is further configured to:
mask each full-modality training image set to obtain a corresponding missing training image set, the missing training image set missing the image of at least one modality;
complete the missing images in the missing training image set using the initial reference image set, to obtain a combined training image set;
input the combined training image set into the first model, to obtain a predicted image set output by the first model;
optimize the first model according to the difference between the predicted image set and the full-modality training image set, to obtain a pre-trained first model; and
optimize the initial reference image set based on the difference, to obtain a pre-trained reference image set.
In one implementation, the masking, by the processing unit 1203, of each full-modality training image set to obtain the corresponding missing training image set specifically includes:
masking the images of one or more modalities in the full-modality training image set, to obtain the missing training image set; or masking the images of one or more modalities in the full-modality training image set and additionally applying partial masking to the image of at least one remaining modality, to obtain the missing training image set.
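The two masking options just described can be sketched as follows: dropping whole modalities, optionally combined with partial (local) masking of a remaining modality. A hedged NumPy illustration; marking the masked region with NaN and the rectangular patch coordinates are illustrative choices, not specified by this application.

```python
import numpy as np

def mask_training_set(full_set, drop, partial=None, patch=((0, 1), (0, 1))):
    """Build a missing training image set from a full-modality set.

    drop:    set of modalities whose images are masked out entirely.
    partial: optional remaining modality whose image gets a local mask
             (the `patch` row/column slice is set to NaN to mark the hole).
    """
    missing = {}
    for m, img in full_set.items():
        if m in drop:
            continue                      # this modality's image is fully masked out
        img = img.astype(float).copy()
        if m == partial:                  # local masking of a remaining modality
            (r0, r1), (c0, c1) = patch
            img[r0:r1, c0:c1] = np.nan
        missing[m] = img
    return missing
```

During pre-training, each full-modality training set would be passed through such a masking step before being completed from the reference set and fed to the first model.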
In one implementation, the optimization objective by which the processing unit 1203 optimizes the first model according to the difference between the predicted image set and the full-modality training image set is:

$$\min_{F}\ \mathcal{L}_{mse}\big(F(S(x', x_{sub})),\ x\big) + \gamma\,\mathcal{R}$$

where x denotes the full-modality training image set, x′ denotes the missing training image set, x_sub denotes the initial reference image set, S(x′, x_sub) denotes the combined training image set, F is the reconstruction function, $\mathcal{R}$ is the L2 regularization term, γ is a weight, and $\mathcal{L}_{mse}$ is the mean-squared-error loss function.
In one implementation, the optimization objective by which the processing unit 1203 optimizes the initial reference image set based on the difference is:

$$\hat{x}_{sub} = \arg\min_{x_{sub}}\ \mathcal{L}_{mse}\big(F(S(x', x_{sub})),\ x\big) + \gamma\,\mathcal{R}$$

where x denotes the full-modality training image set, x′ denotes the missing training image set, x_sub denotes the initial reference image set, $\hat{x}_{sub}$ denotes the pre-trained reference image set, S(x′, x_sub) denotes the combined training image set, F is the reconstruction function, $\mathcal{R}$ is the L2 regularization term, γ is a weight, and $\mathcal{L}_{mse}$ is the mean-squared-error loss function.
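The idea of optimizing the reference image set by model inversion, that is, descending the reconstruction loss with respect to the reference images themselves rather than only the model parameters, can be sketched with a toy example. Here the images are flattened vectors, the "model" F is fixed to the identity map, and the gradient is written out by hand; this is an illustrative sketch of the objective above under those simplifying assumptions, not the actual network of this application.

```python
import numpy as np

def invert_reference(x_full, missing_mask, x_sub, gamma=1e-3, lr=0.1, steps=200):
    """Gradient-descend the reference vector x_sub so that the combined
    input S(x', x_sub) reconstructs the full data x under a fixed model F.

    Toy setting: F is the identity, and S substitutes x_sub at the masked
    positions. The loss mirrors L_mse(F(S(x', x_sub)), x) + gamma * ||x_sub||^2.
    """
    x_sub = x_sub.astype(float).copy()
    n = x_full.size
    for _ in range(steps):
        combined = np.where(missing_mask, x_sub, x_full)   # S(x', x_sub)
        residual = combined - x_full                       # F = identity
        # d/dx_sub of mean((combined - x)^2): only masked entries contribute,
        # plus the gradient of the L2 regularizer gamma * ||x_sub||^2
        grad = (2.0 / n) * residual * missing_mask + 2.0 * gamma * x_sub
        x_sub -= lr * grad
    return x_sub

x = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, True])  # positions treated as missing
ref = invert_reference(x, mask, np.zeros(4))
```

In this toy setting the masked entries of `ref` converge toward the corresponding entries of `x` (up to the small L2 shrinkage), which is the sense in which inversion distills modality-specific information into the reference set.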
In one implementation, the processing unit 1203 is further configured to:
combine the pre-trained reference image set with the full-modality training image set to obtain a combined fine-tuning image set, the combined fine-tuning image set including images of N modalities, of which the images of x modalities come from the pre-trained reference image set and the images of y modalities come from the full-modality training image set, x and y being positive integers with x + y = N;
input the full-modality training image set into a pre-trained image detection model, to obtain first segmentation prediction information output by the image detection model;
input the combined fine-tuning image set into the pre-trained image detection model, to obtain second segmentation prediction information output by the image detection model; and
optimize the pre-trained reference image set and the pre-trained image detection model according to the difference between the first segmentation prediction information and the second segmentation prediction information and the difference between the second segmentation prediction information and segmentation supervision information configured for the full-modality training image set, to obtain the first reference image set and a trained image detection model.
In one implementation, the optimization objective by which the processing unit 1203 optimizes the pre-trained reference image set and the pre-trained image detection model is:

$$\min\ \mathcal{L}_{seg}\big(\hat{s}_2,\ s_{gt}\big) + \lambda\,\mathcal{L}_{con}\big(\hat{s}_1,\ \hat{s}_2\big)$$

where $\hat{s}_1$ denotes the first segmentation prediction information, $\hat{s}_2$ denotes the second segmentation prediction information, s_gt denotes the segmentation supervision information configured for the full-modality training image set, λ is a weight, $\mathcal{L}_{seg}$ is the sum of the Dice loss and the cross-entropy loss, and $\mathcal{L}_{con}$ is the consistency loss function. f is the function representing the first model, and f_s is the segmentation head representing the classifier; $\hat{s}_1$ and $\hat{s}_2$ are obtained by applying f_s to the outputs of f on the respective inputs.
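The combined fine-tuning loss just described (a segmentation term on the student prediction plus a consistency term between teacher and student) can be sketched as follows. This is a hedged toy implementation operating on flat probability vectors; taking $\mathcal{L}_{con}$ as a mean-squared penalty and using binary cross-entropy are illustrative choices.

```python
import numpy as np

def dice_ce_loss(pred, target, eps=1e-6):
    """Dice loss plus (binary) cross-entropy: the L_seg term."""
    inter = (pred * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    ce = -np.mean(target * np.log(pred + eps)
                  + (1 - target) * np.log(1 - pred + eps))
    return dice + ce

def finetune_loss(s1_hat, s2_hat, s_gt, lam=0.5):
    """L = L_seg(s2_hat, s_gt) + lam * L_con(s1_hat, s2_hat).

    s1_hat: prediction from the full-modality input (teacher branch).
    s2_hat: prediction from the combined fine-tuning input (student branch).
    L_con is taken here as a mean-squared consistency penalty.
    """
    consistency = np.mean((s1_hat - s2_hat) ** 2)
    return dice_ce_loss(s2_hat, s_gt) + lam * consistency
```

Minimizing this loss pulls the student branch toward both the ground-truth supervision and the full-modality teacher, which is the self-distillation effect described above.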
In one implementation, the image detection model and the reference image set are obtained by training and optimizing the first model and the initial reference image set;
the training and optimization of the first model and the initial reference image set include pre-training and fine-tuning, and the first model is built on a masked autoencoder;
during pre-training, the first model and the initial reference image set are trained according to the training data and a model-inversion-based modality completion rule, yielding the pre-trained reference image set and the pre-trained image detection model;
during fine-tuning, the pre-trained reference image set and the pre-trained image detection model are trained by a self-distillation method from the full-modality training image set to the missing-sequence data set, yielding the first reference image set and the image detection model.
In the embodiments of this application, images with missing modalities are first completed according to the reference image set, so that the feature information of the first image set can be captured more fully, which helps improve the segmentation of multimodal images under missing-modality conditions. Further, a trained image detection model is used to detect the target image set; since this image detection model has been continuously optimized, it can detect the target image set faster and produce the segmentation result, thereby improving image detection efficiency.
Based on the above method and apparatus embodiments, an embodiment of this application provides a computer device. Referring to FIG. 13, which is a schematic structural diagram of a computer device provided by an embodiment of this application, the computer device 1300 shown in FIG. 13 includes at least a processor 1301, an input interface 1302, an output interface 1303, a computer storage medium 1304, and a memory 1305, which may be connected by a bus or in other ways.
The computer storage medium 1304 may be stored in the memory 1305 of the computer device 1300. The computer storage medium 1304 is configured to store a computer program including program instructions, and the processor 1301 is configured to execute the program instructions stored in the computer storage medium 1304. The processor 1301 (or CPU, Central Processing Unit) is the computing and control core of the computer device 1300; it is adapted to implement one or more computer programs, and specifically to load and execute one or more computer programs so as to implement the corresponding method flows or functions.
An embodiment of this application further provides a computer storage medium (memory), which is a storage device in a computer device for storing programs and data. It can be understood that the computer storage medium here may include a built-in storage medium of the computer device, and may of course also include an extended storage medium supported by the computer device. The computer storage medium provides storage space that stores the operating system of the computer device, as well as one or more computer programs (including program code) adapted to be loaded and executed by the processor 1301. It should be noted that the computer storage medium here may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer storage medium located away from the aforementioned processor. The processor 1301 may load and execute one or more computer programs stored in the computer storage medium to implement the corresponding steps of the image detection method shown in FIG. 4, FIG. 9, and FIG. 11. In a specific implementation, one or more instructions in the computer storage medium are loaded by the processor 1301 and executed to perform the image detection method of the embodiments of this application.
An embodiment of this application provides a computer-readable storage medium storing a computer program that includes program instructions; when the program instructions are executed by a processor, the steps performed in all the above embodiments can be performed.
An embodiment of this application further provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium; when the computer instructions are executed by a processor of a computer device, the methods in all the above embodiments are performed.
A person of ordinary skill in the art can understand that all or part of the flows of the above method embodiments can be implemented by a computer program instructing relevant hardware; the program may be stored in a computer-readable storage medium, and when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
What is disclosed above is merely a preferred embodiment of this application and certainly cannot be used to limit the scope of the claims of this application. A person of ordinary skill in the art can understand all or part of the flows for implementing the above embodiments, and equivalent changes made according to the claims of this application still fall within the scope covered by the invention.

Claims (17)

  1. An image detection method, performed in a computer device, the method comprising:
    acquiring a first image set, the first image set comprising images of at least one modality, the image of each modality being a medical image of the corresponding modality;
    detecting whether the first image set is in an image-missing state, wherein the image-missing state means that the first image set satisfies at least one of the following conditions: the modalities corresponding to the first image set are fewer than N predetermined modalities, N being a positive integer greater than 1; and a partial image is missing from the image of at least one modality in the first image set;
    if the first image set is in the image-missing state, determining missing description information, wherein the missing description information indicates at least one of the following: at least one modality missing from the first image set relative to the N modalities, and a region of missing image in the image of at least one modality in the first image set;
    acquiring a first reference image set, the first reference image set comprising reference images of the N modalities, the reference image of each modality in the first reference image set being used to represent the specific information corresponding to each modality;
    completing, based on the first reference image set, the missing parts corresponding to the missing description information in the first image set, to obtain a second image set; and
    detecting an image abnormality region according to the second image set.
  2. The method according to claim 1, wherein the completing, based on the first reference image set, the missing parts corresponding to the missing description information in the first image set to obtain the second image set comprises:
    adding, based on the first reference image set, the reference image corresponding to the missing at least one modality or the image corresponding to the region of missing image to the first image set, to obtain the second image set, wherein the added reference image is the reference image of the corresponding modality in the first reference image set, and the added image corresponding to the region of missing image is a partial image in the first reference image set corresponding in position to the region of missing image.
  3. The method according to claim 1, wherein the detecting an image abnormality region according to the second image set comprises:
    extracting a feature representation of the second image set, the feature representation comprising first correlations between images of different modalities in the second image set and second correlations between image regions within the same modality;
    reconstructing, based on the feature representation, the image data corresponding to the missing description information for the first image set, to obtain a third image set, the third image set being the result obtained after adding the reconstructed image data to the first image set; and
    identifying an abnormal region in the third image set, and taking the abnormal region as the image abnormality region.
  4. The method according to any one of claims 1 to 3, wherein the detecting an image abnormality region according to the second image set comprises:
    determining the image abnormality region from the second image set based on an image detection model used for object recognition.
  5. The method according to claim 4, wherein the image detection model comprises a first model and a classifier, and the first model comprises an encoder and a decoder;
    the encoder is configured to extract a feature representation of the second image set, the feature representation comprising first correlations between images of different modalities in the second image set and second correlations between image regions within the same modality;
    the decoder is configured to reconstruct, based on the feature representation, the image data corresponding to the missing description information for the first image set, to obtain a third image set, the third image set being the result obtained after adding the reconstructed image data to the first image set; and
    the classifier is configured to segment an abnormal region in the third image set, and take the abnormal region as the image abnormality region.
  6. The method according to any one of claims 1 to 5, the method further comprising:
    displaying the first image set in a first display region of a user interface; and
    displaying a segmentation result corresponding to the first image set in a second display region of the user interface;
    wherein the segmentation result comprises at least one of the following:
    the image abnormality region;
    the result obtained after marking the image abnormality region in the first image set; and
    the third image set.
  7. The method according to claim 5, the method further comprising:
    acquiring training data for training the image detection model, the training data comprising multiple full-modality training image sets and an initial reference image set, wherein each full-modality training image set comprises images of N modalities with no partial image missing from the image of any modality, the initial reference image set comprises reference images of the N modalities, and the reference image of each modality in the initial reference image set represents the initial specific information corresponding to that modality;
    masking each full-modality training image set to obtain a corresponding missing training image set, the missing training image set missing the image of at least one modality;
    completing the missing images in the missing training image set using the initial reference image set, to obtain a combined training image set;
    inputting the combined training image set into the first model, to obtain a predicted image set output by the first model;
    optimizing the first model according to the difference between the predicted image set and the full-modality training image set, to obtain a pre-trained first model; and
    optimizing the initial reference image set based on the difference, to obtain a pre-trained reference image set.
  8. The method according to claim 7, wherein the optimizing the initial reference image set based on the difference to obtain a pre-trained reference image set comprises:
    optimizing the initial reference image set based on the difference by way of model inversion, to obtain the pre-trained reference image set.
  9. The method according to claim 7, wherein the masking each full-modality training image set to obtain a corresponding missing training image set comprises:
    masking the images of one or more modalities in the full-modality training image set, to obtain the missing training image set; or
    masking the images of one or more modalities in the full-modality training image set, and applying partial masking to the image of at least one remaining modality, to obtain the missing training image set.
  10. The method according to claim 7, wherein the optimizing the first model according to the difference between the predicted image set and the full-modality training image set is performed in the following manner:

    $$\min_{F}\ \mathcal{L}_{mse}\big(F(S(x', x_{sub})),\ x\big) + \gamma\,\mathcal{R}$$

    where x denotes the full-modality training image set, x′ denotes the missing training image set, x_sub denotes the initial reference image set, S(x′, x_sub) denotes the combined training image set, F is the reconstruction function, $\mathcal{R}$ is the L2 regularization term, γ is a weight, and $\mathcal{L}_{mse}$ is the mean-squared-error loss function.
  11. The method according to claim 7, wherein the optimizing the initial reference image set based on the difference is performed in the following manner:

    $$\hat{x}_{sub} = \arg\min_{x_{sub}}\ \mathcal{L}_{mse}\big(F(S(x', x_{sub})),\ x\big) + \gamma\,\mathcal{R}$$

    where x denotes the full-modality training image set, x′ denotes the missing training image set, x_sub denotes the initial reference image set, $\hat{x}_{sub}$ denotes the pre-trained reference image set, S(x′, x_sub) denotes the combined training image set, F is the reconstruction function, $\mathcal{R}$ is the L2 regularization term, γ is a weight, and $\mathcal{L}_{mse}$ is the mean-squared-error loss function.
  12. The method according to claim 7, the method further comprising:
    combining the pre-trained reference image set with the full-modality training image set to obtain a combined fine-tuning image set, the combined fine-tuning image set comprising images of N modalities, of which the images of x modalities come from the pre-trained reference image set and the images of y modalities come from the full-modality training image set, x and y being positive integers with x + y = N;
    inputting the full-modality training image set into a pre-trained image detection model, to obtain first segmentation prediction information output by the pre-trained image detection model, the pre-trained image detection model comprising the pre-trained first model and the classifier;
    inputting the combined fine-tuning image set into the pre-trained image detection model, to obtain second segmentation prediction information output by the pre-trained image detection model; and
    optimizing the pre-trained reference image set and the pre-trained image detection model according to the difference between the first segmentation prediction information and the second segmentation prediction information and the difference between the second segmentation prediction information and segmentation supervision information configured for the full-modality training image set, to obtain the first reference image set and a trained image detection model.
  13. The method according to claim 12, wherein the pre-trained reference image set and the pre-trained image detection model are optimized in the following manner:

    $$\min\ \mathcal{L}_{seg}\big(\hat{s}_2,\ s_{gt}\big) + \lambda\,\mathcal{L}_{con}\big(\hat{s}_1,\ \hat{s}_2\big)$$

    where $\hat{s}_1$ denotes the first segmentation prediction information, $\hat{s}_2$ denotes the second segmentation prediction information, s_gt denotes the segmentation supervision information configured for the full-modality training image set, λ is a weight, $\mathcal{L}_{seg}$ is the sum of the Dice loss and the cross-entropy loss, $\mathcal{L}_{con}$ is the consistency loss function, f is the function representing the first model, and f_s is the segmentation head representing the classifier; $\hat{s}_1$ and $\hat{s}_2$ are obtained by applying f_s to the outputs of f on the respective inputs.
  14. An image detection apparatus, the apparatus comprising:
    an acquisition unit, configured to acquire a first image set, the first image set comprising images of at least one modality, the image of each modality being a medical image of the corresponding modality;
    a determination unit, configured to detect whether the first image set is in an image-missing state, wherein the image-missing state means that the first image set satisfies at least one of the following conditions: the modalities corresponding to the first image set are fewer than N predetermined modalities, N being a positive integer greater than 1; and a partial image is missing from the image of at least one modality in the first image set; and
    a processing unit, configured to:
    if the first image set is in the image-missing state, determine missing description information, wherein the missing description information indicates at least one of the following: at least one modality missing from the first image set relative to the N modalities, and a region of missing image in the image of at least one modality in the first image set;
    acquire a first reference image set, the first reference image set comprising reference images of the N modalities, the reference image of each modality in the first reference image set being used to represent the specific information corresponding to each modality;
    complete, based on the first reference image set, the missing parts corresponding to the missing description information in the first image set, to obtain a second image set; and
    detect an image abnormality region according to the second image set.
  15. A computer device, the computer device comprising:
    a processor adapted to implement one or more computer programs; and a computer storage medium storing one or more computer programs, the one or more computer programs being adapted to be loaded by the processor to perform the image detection method according to any one of claims 1 to 13.
  16. A non-volatile computer-readable storage medium, storing one or more computer programs, the one or more computer programs being adapted to be loaded by a processor to perform the image detection method according to any one of claims 1 to 13.
  17. A computer program product, storing one or more instructions, the one or more instructions being adapted to be loaded by a processor to perform the method according to any one of claims 1 to 13.
PCT/CN2023/089441 2022-04-27 2023-04-20 Image detection method and apparatus, computer device, storage medium, and program product WO2023207743A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210456475.6A CN115115575A (zh) 2022-04-27 2022-04-27 Image detection method and apparatus, computer device, and storage medium
CN202210456475.6 2022-04-27

Publications (1)

Publication Number Publication Date
WO2023207743A1 true WO2023207743A1 (zh) 2023-11-02

Family

ID=83326600

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089441 WO2023207743A1 (zh) 2023-04-20 2022-04-27 Image detection method and apparatus, computer device, storage medium, and program product

Country Status (2)

Country Link
CN (1) CN115115575A (zh)
WO (1) WO2023207743A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115575A (zh) * 2022-04-27 2022-09-27 Tencent Healthcare (Shenzhen) Co., Ltd. Image detection method and apparatus, computer device, and storage medium
CN115954100B (zh) * 2022-12-15 2023-11-03 Northeast Forestry University Intelligent auxiliary diagnosis system for gastric cancer pathology images

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190332900A1 (en) * 2018-04-30 2019-10-31 Elekta Ab Modality-agnostic method for medical image representation
CN112037171A (zh) * 2020-07-30 2020-12-04 Xidian University Multi-task MRI brain tumor image segmentation method based on multi-modal feature fusion
CN113496495A (zh) * 2021-06-25 2021-10-12 Huazhong University of Science and Technology Method for building a medical image segmentation model that tolerates missing inputs, and segmentation method
CN114119788A (zh) * 2021-12-01 2022-03-01 Nanjing University Multimodal medical image encoding and generation method based on generative adversarial networks
CN114283151A (zh) * 2021-08-16 2022-04-05 Tencent Technology (Shenzhen) Co., Ltd. Image processing method, apparatus, device, and storage medium for medical images
CN115115575A (zh) * 2022-04-27 2022-09-27 Tencent Healthcare (Shenzhen) Co., Ltd. Image detection method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
CN115115575A (zh) 2022-09-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23795176

Country of ref document: EP

Kind code of ref document: A1