WO2023160157A1 - Three-dimensional medical image recognition method, apparatus, device, storage medium, and product - Google Patents

Three-dimensional medical image recognition method, apparatus, device, storage medium, and product

Info

Publication number
WO2023160157A1
Authority
WO
WIPO (PCT)
Prior art keywords
features, image, feature, dimensional, round
Application number
PCT/CN2022/139576
Other languages
English (en)
French (fr)
Inventor
江铖
庞建业
姚建华
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2023160157A1
Priority to US18/377,958 (published as US20240046471A1)

Classifications

    • G06V20/647 Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • G06T7/0012 Biomedical image inspection
    • G06F18/253 Fusion techniques of extracted features
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T3/06 Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G06T2207/10088 Magnetic resonance imaging [MRI]
    • G06T2207/10104 Positron emission tomography [PET]
    • G06V2201/03 Recognition of patterns in medical or anatomical images

Definitions

  • This application is based on and claims priority to Chinese patent application No. 202210191770.3, filed on February 28, 2022, the entire content of which is incorporated herein by reference.
  • The embodiments of the present application relate to the field of artificial intelligence, and in particular to a method, apparatus, device, computer-readable storage medium, and computer program product for recognizing three-dimensional medical images.
  • In the related art, image analysis of a 3D medical image can be performed using a dense prediction method, where dense prediction refers to predicting each pixel in the image.
  • In that approach, image recognition is performed on the 3D medical image as a whole to obtain an image recognition result.
  • However, directly performing image recognition on the whole 3D medical image involves a large amount of computation, has low recognition efficiency, and requires a large amount of data for pre-training, which is relatively complicated.
  • Embodiments of the present application provide a three-dimensional medical image recognition method, apparatus, device, computer-readable storage medium, and computer program product, which can improve the recognition efficiency of three-dimensional medical images and reduce computational complexity. The technical solutions are as follows:
  • An embodiment of the present application provides a method for recognizing a three-dimensional medical image, which is executed by a computer device, and the method includes:
  • in the i-th round of feature extraction, viewing angle rearrangement is performed on the i-1th round of 3D medical image features to obtain two-dimensional image features, where the i-1th round of 3D medical image features are the features obtained by performing the i-1th round of feature extraction on the three-dimensional medical image, and different two-dimensional image features are the features of the i-1th round of 3D medical image features under different viewing angles;
  • semantic feature extraction is performed on each two-dimensional image feature to obtain image semantic features under different viewing angles, and the image semantic features under different viewing angles are fused to obtain the i-th round of 3D medical image features; and
  • image recognition processing is performed based on the I-th round of 3D medical image features to obtain the image recognition result of the three-dimensional medical image, where i is a sequentially increasing positive integer, 1 < i ≤ I, and I is a positive integer.
  • An embodiment of the present application provides a three-dimensional medical image recognition device, the device comprising:
  • the viewing angle rearrangement module is configured to perform viewing angle rearrangement processing on the i-1th round of 3D medical image features during the i-th round of feature extraction to obtain two-dimensional image features, where the i-1th round of 3D medical image features are the features obtained by performing the i-1th round of feature extraction on the three-dimensional medical image, and different two-dimensional image features are the features of the i-1th round of 3D medical image features under different viewing angles;
  • the feature extraction module is configured to perform semantic feature extraction processing on each of the two-dimensional image features to obtain image semantic features under different viewing angles;
  • the feature fusion module is configured to perform feature fusion processing on the image semantic features under different viewing angles to obtain the i-th round of 3D medical image features;
  • the image recognition module is configured to perform image recognition processing based on the I-th round of 3D medical image features obtained by the I-th round of feature extraction to obtain the image recognition result of the three-dimensional medical image, where i is a sequentially increasing positive integer, 1 < i ≤ I, and I is a positive integer.
  • An embodiment of the present application provides a computer device, the computer device including a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the method for recognizing a three-dimensional medical image described in the above aspects.
  • An embodiment of the present application provides a computer-readable storage medium, where at least one instruction, at least one program, a code set, or an instruction set is stored in the readable storage medium, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the method for recognizing a three-dimensional medical image described in the above aspects.
  • An embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the method for recognizing a three-dimensional medical image provided by the above aspect.
  • In the embodiments of the present application, viewing angle rearrangement is first performed on the three-dimensional medical image features, dividing them into two-dimensional image features under different viewing angles; feature extraction is then performed on each of the two-dimensional image features to obtain image semantic features under different viewing angles, and the image semantic features under different viewing angles are fused to obtain the 3D medical image features after this round of feature extraction.
  • The embodiments of the present application use a simplified local computing unit to perform feature extraction under different viewing angles, which can reduce computational complexity and thereby improve the recognition efficiency of 3D medical images.
  • FIG. 1 shows a schematic diagram of the principles of the three-dimensional medical image recognition method provided by the embodiment of the present application
  • Figure 2 shows a schematic diagram of the implementation environment provided by the embodiment of the present application
  • FIG. 3 shows a flowchart of a method for recognizing a three-dimensional medical image provided by an embodiment of the present application
  • FIG. 4 shows a flow chart of a three-dimensional medical image recognition method provided by an embodiment of the present application
  • FIG. 5 shows a schematic structural diagram of the overall image recognition structure provided by the embodiment of the present application.
  • FIG. 6 shows a schematic structural diagram of the spatial feature extraction process shown in the embodiment of the present application.
  • FIG. 7 shows a schematic structural diagram of the semantic feature extraction process shown in the embodiment of the present application.
  • FIG. 8 shows a schematic structural diagram of the feature fusion process shown in the embodiment of the present application.
  • FIG. 9 shows a schematic structural diagram of the TR-MLP network shown in the embodiment of the present application.
  • FIG. 10 shows a schematic structural diagram of a skip connection fusion network shown in an embodiment of the present application.
  • FIG. 11 shows a structural block diagram of a three-dimensional medical image recognition device provided by an embodiment of the present application
  • FIG. 12 shows a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Artificial intelligence is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
  • Artificial intelligence technology is a comprehensive subject that involves a wide range of fields, including both hardware-level technology and software-level technology.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes several major directions such as computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Computer vision technology is a science that studies how to make machines "see"; more specifically, it refers to using cameras and computers in place of human eyes to perform machine vision tasks such as identifying and measuring targets, and to further perform graphics processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection.
  • Computer vision technology usually includes technologies such as image processing, image recognition, image segmentation, image semantic understanding, image retrieval, video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric identification technologies such as face recognition and fingerprint recognition.
  • The 3D medical image recognition method involved in the embodiments of the present application can reduce computational complexity and improve the efficiency of 3D medical image recognition by performing feature extraction on the 2D image features corresponding to the 3D medical image features under different viewing angles.
  • As shown in FIG. 1, viewing angle rearrangement is performed on the i-1th round of 3D medical image features 101 obtained by the i-1th round of feature extraction, to obtain the first two-dimensional image feature 102 under the first viewing angle, the second two-dimensional image feature 103 under the second viewing angle, and the third two-dimensional image feature 104 under the third viewing angle; semantic feature extraction is performed separately on the first two-dimensional image feature 102, the second two-dimensional image feature 103, and the third two-dimensional image feature 104 to obtain the first image semantic feature 105, the second image semantic feature 106, and the third image semantic feature 107, and the three are fused to obtain the i-th round of 3D image semantic features 108.
  • Since the three-dimensional medical image features are decomposed into two-dimensional image features under different viewing angles and feature extraction is performed on the two-dimensional image features, the amount of calculation is reduced, thereby improving the recognition efficiency of three-dimensional medical images.
  • the method provided in the embodiment of the present application can be applied to an image recognition process of any three-dimensional medical image.
  • the category of each part in the three-dimensional medical image can be identified, thereby assisting the analysis of lesions and organs.
  • The computer device used for 3D medical image recognition can be various types of terminal device or server, where the server can be an independent physical server, a server cluster composed of multiple physical servers, a distributed system, or a cloud server providing cloud computing services; the terminal can be a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, or the like, but is not limited thereto.
  • The server can be a server cluster deployed in the cloud that opens artificial intelligence cloud services (AIaaS, AI as a Service) to users.
  • An AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service model is similar to an AI-themed mall: all users can access one or more artificial intelligence services provided by the AIaaS platform through application programming interfaces.
  • One of the artificial intelligence cloud services may be a 3D medical image recognition service, that is, a server in the cloud is packaged with the 3D medical image recognition program provided by the embodiments of the present application.
  • A user calls the 3D medical image recognition service in the cloud service through a terminal (running a client, such as a lesion analysis client), so that the server deployed in the cloud calls the packaged 3D medical image recognition program to decompose the features of the 3D medical image and perform recognition.
  • For example, auxiliary diagnosis is conducted based on edema indicators included in the image recognition results to determine whether the target object may have inflammation, trauma, allergies, or excessive water intake.
  • It should be noted that the recognition method for 3D medical images provided by the embodiments of the present application is not directly aimed at obtaining a disease diagnosis or health status result, and a diagnosis of disease or health status cannot be directly obtained from the image recognition result; that is, the image recognition result is not directly used for disease diagnosis, but serves only as intermediate data to assist patients in disease prediction and to assist doctors and researchers in disease diagnosis, follow-up visits, and research on treatment methods.
  • FIG. 2 shows a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • This implementation environment includes a terminal 210 and a server 220 .
  • data communication is performed between the terminal 210 and the server 220 through a communication network.
  • The communication network may be a wired network or a wireless network, and the communication network may be at least one of a local area network, a metropolitan area network, and a wide area network.
  • the terminal 210 is an electronic device running a three-dimensional medical image recognition program.
  • the electronic device may be a smart phone, a tablet computer, or a personal computer, etc., which is not limited in this embodiment of the present application.
  • The three-dimensional medical image can be input into the program of the terminal 210; the terminal 210 uploads the three-dimensional medical image to the server 220, the server 220 performs image recognition by executing the recognition method for three-dimensional medical images provided by the embodiments of the present application, and feeds back the image recognition result to the terminal 210.
  • The server 220 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery network (CDN, Content Delivery Network) services, and big data and artificial intelligence platforms.
  • the server 220 is used to provide image recognition services for applications installed in the terminal 210 .
  • an image recognition network is set in the server 220 for classifying the three-dimensional medical images sent by the terminal 210 .
  • In other embodiments, the image recognition network can also be deployed on the terminal 210 side, with the terminal 210 locally implementing the recognition method for three-dimensional medical images provided by the embodiments of the present application (that is, running the image recognition network) without using the server 220; correspondingly, the image recognition network can be trained on the terminal 210 side, which is not limited in the embodiments of the present application.
  • the following embodiments are described by taking the method for recognizing a three-dimensional medical image executed by a computer device as an example.
  • FIG. 3 shows a flowchart of a method for recognizing a three-dimensional medical image provided by an embodiment of the present application, and the method includes the following steps.
  • Step 301: in the i-th round of feature extraction, perform viewing angle rearrangement processing on the i-1th round of 3D medical image features to obtain 2D image features, where the i-1th round of 3D medical image features are the features obtained by performing the i-1th round of feature extraction on the three-dimensional medical image, and different 2D image features are the features of the i-1th round of 3D medical image features under different viewing angles.
  • the 3D medical image feature is a feature extracted from the 3D medical image to be recognized.
  • The three-dimensional medical image to be identified may be a computed tomography (CT, Computed Tomography) image, a magnetic resonance imaging (MRI, Magnetic Resonance Imaging) image, or a positron emission tomography (PET, Positron Emission Computed Tomography) image.
  • The first round of 3D medical image features are the features obtained by performing feature extraction on the initial 3D medical image features, and the initial 3D medical image features are the features obtained by performing initial embedding processing on the 3D medical image.
  • The initial embedding process is used to map high-dimensional data such as 3D medical images into a low-dimensional space, so as to obtain low-dimensional initial 3D medical image features.
  • In the embodiments of the present application, three-dimensional medical image recognition is performed through multiple rounds of feature extraction.
  • In each round, the same feature extraction network is used, and the input of the feature extraction network is determined by the output of the previous round; that is, in the i-th round, feature extraction is performed based on the i-1th round of 3D medical image features.
  • Since the 3D medical image features are 3D data, the 3D medical image features are first divided; that is, in the i-th round of feature extraction, viewing angle rearrangement is performed on the features obtained by the i-1th round of feature extraction.
  • Viewing angle rearrangement divides the 3D medical image features into 2D image features under different viewing angles, so that feature extraction can be performed on the 2D image features under each viewing angle, reducing computational complexity.
  • In some embodiments, viewing angle rearrangement is implemented as follows: viewing angle rearrangement processing is performed on multiple dimensions of the i-1th round of 3D medical image features to obtain two-dimensional image features under multiple viewing angles; that is, the multiple dimensions of the i-1th round of 3D medical image features are combined pairwise to obtain multiple viewing angles, and the 2D image features under each viewing angle are extracted separately.
  • For example, the (H, W, D) dimensions of the i-1th round of 3D medical image features are rearranged to obtain two-dimensional image features under the three viewing angles (H, W), (H, D), and (W, D), where each viewing angle corresponds to a two-dimensional direction in the 3D medical image features.
  • Different two-dimensional image features are image features corresponding to different two-dimensional image slices, where a two-dimensional image slice is a two-dimensional image in two-dimensional space obtained after viewing angle rearrangement of the three-dimensional medical image.
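  • For illustration only, the viewing angle rearrangement described above can be sketched as follows; this is a minimal PyTorch sketch in which the tensor layout (B, C, H, W, D) and the function name are assumptions rather than the patent's implementation:

```python
# Hedged sketch: fold each remaining axis into the batch dimension so that a
# 3D feature map becomes three batches of 2D slice features, one per viewing
# angle ((H, W), (H, D) and (W, D)), as described in the text above.
import torch

def view_rearrange(x: torch.Tensor):
    """x: (B, C, H, W, D) -> 2D slice features under the three viewing angles."""
    b, c, h, w, d = x.shape
    hw = x.permute(0, 4, 1, 2, 3).reshape(b * d, c, h, w)  # (H, W) view
    hd = x.permute(0, 3, 1, 2, 4).reshape(b * w, c, h, d)  # (H, D) view
    wd = x.permute(0, 2, 1, 3, 4).reshape(b * h, c, w, d)  # (W, D) view
    return hw, hd, wd

feats = torch.randn(2, 32, 16, 16, 16)  # i-1th round 3D medical image features
hw, hd, wd = view_rearrange(feats)
print(hw.shape)  # torch.Size([32, 32, 16, 16])
```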
  • Step 302: perform semantic feature extraction processing on each two-dimensional image feature to obtain image semantic features under different viewing angles.
  • For each two-dimensional image feature, semantic feature extraction is performed so as to learn the image information in the corresponding two-dimensional image slice.
  • The semantic feature extraction for a two-dimensional image feature includes learning the spatial information of the two-dimensional image slice and learning the image semantics based on the corresponding viewing angle.
  • After the semantic feature extraction, the image semantic features under the different viewing angles are obtained, that is, the image semantic features corresponding to the three viewing angles (H, W), (H, D), and (W, D).
  • Step 303: perform feature fusion processing on the image semantic features under different viewing angles to obtain the i-th round of 3D medical image features.
  • After the image semantic features under different viewing angles are obtained, they are fused to complete this round of feature extraction and obtain the i-th round of 3D medical image features; the i+1th round of feature extraction is then performed based on the i-th round of 3D medical image features.
  • Step 304: perform image recognition processing based on the I-th round of 3D medical image features obtained by the I-th round of feature extraction to obtain the image recognition result of the 3D medical image, where i is a sequentially increasing positive integer, 1 < i ≤ I, and I is a positive integer.
  • When i = I, the feature extraction process ends, and after the I-th round of feature extraction, image recognition is performed based on the I-th round of 3D medical image features.
  • To sum up, in the embodiments of the present application, viewing angle rearrangement is first performed on the three-dimensional medical image features, dividing them into two-dimensional image features under different viewing angles; feature extraction is performed on the two-dimensional image features separately to obtain image semantic features under different viewing angles, and the image semantic features under different viewing angles are then fused to obtain the 3D image semantic features after feature extraction.
  • The embodiments of the present application use a simplified local computing unit to perform feature extraction under different viewing angles, which can reduce computational complexity and thereby improve the recognition efficiency of 3D medical images.
  • In some embodiments, during the feature extraction of the two-dimensional image features under different viewing angles, each two-dimensional image feature is divided into windows so as to learn the features of each local window, and the context features of the slice corresponding to each two-dimensional image feature are learned, so as to obtain the image semantic features under different viewing angles; this is described below with an exemplary embodiment.
  • FIG. 4 shows a flowchart of a three-dimensional medical image recognition method provided by an embodiment of the present application, and the method includes the following steps.
  • Step 401: in the i-th round of feature extraction, perform viewing angle rearrangement processing on the i-1th round of 3D medical image features to obtain 2D image features.
  • In some embodiments, the initial embedding process (Patch Embedding) is first performed on the 3D medical image through a Convolutional Stem (the initial convolutional layer of a convolutional neural network), and the resulting initial 3D medical image features serve as the starting point for multiple rounds of feature extraction. The initial embedding process is used to map high-dimensional data such as 3D medical images into a low-dimensional space, so as to obtain low-dimensional initial 3D medical image features.
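  • As a rough illustration, an initial embedding stem of this kind can be sketched as below; the two stride-2 convolutions (matching the overall 4x reduction to C × H/4 × W/4 × D/4 mentioned later), the channel counts, and the normalization and activation choices are all assumptions, not details taken from the patent:

```python
# Hedged sketch of a Patch Embedding / Convolutional Stem: map the raw volume
# (B, C_i, H, W, D) to low-dimensional features (B, C, H/4, W/4, D/4).
import torch.nn as nn

class PatchEmbedStem(nn.Module):
    def __init__(self, in_ch: int = 1, embed_dim: int = 32):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv3d(in_ch, embed_dim // 2, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(embed_dim // 2),
            nn.GELU(),
            nn.Conv3d(embed_dim // 2, embed_dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x):    # x: (B, C_i, H, W, D)
        return self.stem(x)  # (B, C, H/4, W/4, D/4)
```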
  • In some embodiments, the feature extraction process includes a feature encoding process and a feature decoding process, where the feature encoding process includes downsampling the 3D medical image features (that is, reducing the dimensions of the 3D medical image features), and the feature decoding process includes upsampling the 3D medical image features (that is, increasing the dimensions of the 3D medical image features).
  • In some embodiments, the downsampling process uses a 3D convolution with a kernel size of 3 and a stride of 2, downsampling by a factor of 2 each time, while the upsampling process uses a 3D transposed convolution with a kernel size of 2 and a stride of 2, upsampling by a factor of 2 each time.
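  • The stated kernel sizes and strides correspond directly to standard PyTorch layers; a minimal sketch (channel counts assumed) is:

```python
# Downsampling: 3D convolution, kernel 3, stride 2 (halves H, W and D).
# Upsampling: 3D transposed convolution, kernel 2, stride 2 (doubles H, W, D).
import torch
import torch.nn as nn

down = nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1)  # C -> 2C, size /2
up = nn.ConvTranspose3d(64, 32, kernel_size=2, stride=2)      # 2C -> C, size x2

x = torch.randn(1, 32, 16, 16, 16)
print(down(x).shape)      # torch.Size([1, 64, 8, 8, 8])
print(up(down(x)).shape)  # torch.Size([1, 32, 16, 16, 16])
```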
  • In some embodiments, each round of feature extraction is implemented by the same Transformer-Multilayer Perceptron (TR-MLP) structure.
  • As shown in FIG. 5, for an input three-dimensional medical image of size C_i × H × W × D, the initial embedding process (Patch Embedding) 501 is performed first, where the size of the image block (Patch) is 2 × 2, to obtain 3D medical image features of size C × H/4 × W/4 × D/4; the C × H/4 × W/4 × D/4 features are input into the first TR-MLP block (Block) for the first round of feature extraction.
  • The obtained first round of 3D medical image features are downsampled to obtain 3D medical image features of size 2C × H/8 × W/8 × D/8, which are input into the second TR-MLP Block for the second round of feature extraction to obtain the second round of 3D medical image features; after that, the second round of 3D medical image features are directly input into the third TR-MLP Block for the third round of feature extraction.
  • The obtained third round of 3D medical image features are downsampled again, and so on, until the features are downsampled to 8C × H/32 × W/32 × D/32, after which the upsampling process is performed.
  • The feature extraction performed in TR-MLP Block 502 and in the preceding TR-MLP Blocks constitutes the feature encoding process, which is followed by the feature decoding process.
  • Each round of the feature encoding process or feature decoding process is realized through viewing angle rearrangement processing, semantic feature extraction processing, and feature fusion processing.
  • step 302 in FIG. 3 may be implemented through steps 402 to 403 in FIG. 4 .
  • Step 402: perform spatial feature extraction processing on the two-dimensional image features to obtain two-dimensional image spatial features.
  • In some embodiments, the process may include steps 402a to 402c (not shown in the figure):
  • Step 402a: perform window division processing on the two-dimensional image features to obtain local two-dimensional image features corresponding to N windows, where the N windows do not overlap with each other, and N is a positive integer greater than 1.
  • In the embodiments of the present application, a window-based multi-head self-attention (W-MSA, Window-Multi-head Self-Attention) network structure is mainly used to model long-distance and local spatial semantic information in two-dimensional image slices.
  • During spatial feature extraction, the two-dimensional image feature Z is first divided into windows, yielding local two-dimensional image features Z_i corresponding to N non-overlapping windows. The division process of formula (1) amounts to Z → {Z_i}, i = 1, …, N, with N = HW/M², where M is the window size set by W-MSA and HW is the size of the two-dimensional image feature, that is, the size of the two-dimensional image obtained by slicing under the (H, W) viewing angle.
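  • The window division of formula (1) and its inverse can be sketched as follows (a hedged PyTorch sketch; the channel-last layout and helper names are assumptions):

```python
# Split a 2D feature map into N = (H/M) * (W/M) non-overlapping M x M windows,
# and restore the original layout with the inverse operation.
import torch

def window_partition(z: torch.Tensor, m: int) -> torch.Tensor:
    """z: (B, H, W, C) -> (B * N, M*M, C) local two-dimensional image features."""
    b, h, w, c = z.shape
    z = z.view(b, h // m, m, w // m, m, c)
    return z.permute(0, 1, 3, 2, 4, 5).reshape(-1, m * m, c)

def window_reverse(win: torch.Tensor, m: int, h: int, w: int) -> torch.Tensor:
    """Inverse of window_partition: (B * N, M*M, C) -> (B, H, W, C)."""
    b = win.shape[0] // ((h // m) * (w // m))
    z = win.view(b, h // m, w // m, m, m, -1)
    return z.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, -1)
```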
  • Then, attention calculation is performed on a per-window basis to obtain the output results, that is, the local two-dimensional image spatial features.
  • attention processing is realized through attention mechanism.
  • the attention mechanism (Attention Mechanism) is used to selectively focus on a part of all information while ignoring other information.
  • the attention mechanism can enable the neural network to have the ability to focus on part of the input, that is, to select a specific input.
  • The attention mechanism is a resource allocation scheme and the main means of solving the problem of information overload, allocating computing resources to the more important tasks.
  • the embodiment of the present application is not limited to the form of the attention mechanism, for example, the attention mechanism may be multi-head attention, key-value pair attention, structured attention, and the like.
  • Step 402b: perform feature extraction processing on the N local two-dimensional image features to obtain two-dimensional image window features.
  • In some embodiments, the feature extraction processing includes the following steps:
  • Step 1: perform self-attention processing on the N local two-dimensional image features to obtain the self-attention features of the N local two-dimensional image features.
  • In the embodiments of the present application, each local two-dimensional image feature is subjected to self-attention processing, where the self-attention processing is multi-head self-attention processing and each local two-dimensional image feature corresponds to multiple self-attention heads.
  • In some embodiments, self-attention processing is performed based on the query item Q, the key item K, and the value item V corresponding to the local two-dimensional image features, to obtain the self-attention features of the N local two-dimensional image features.
  • For example, the query item (Q, Query), key item (K, Key), and value item (V, Value) corresponding to the k-th self-attention head are Q_k^i, K_k^i, and V_k^i respectively, where k is a positive integer; the k-th self-attention feature of the local two-dimensional image feature Z_i corresponding to the i-th window is then computed as in formula (2), which amounts to head_k^i = SoftMax(Q_k^i (K_k^i)^T / √d_k + RPE) V_k^i.
  • RPE is the relative position encoding information, that is, the window position encoding, which represents the perceivable spatial position information of the window.
  • The self-attention feature corresponding to the k-th self-attention head contains the features corresponding to all N windows, as in formula (3), that is, head_k = [head_k^1, head_k^2, …, head_k^N].
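  • Formulas (2)-(3) amount to scaled dot-product attention inside each window with a relative position bias; a hedged sketch (shapes and the RPE parameterization are assumptions) is:

```python
# Per-window multi-head self-attention with a relative position encoding (RPE)
# added to the attention logits, as in formula (2).
import torch
import torch.nn.functional as F

def window_attention(q, k, v, rpe):
    """q, k, v: (num_windows, heads, M*M, d_k); rpe: (heads, M*M, M*M)."""
    d_k = q.shape[-1]
    logits = q @ k.transpose(-2, -1) / d_k ** 0.5  # (N, heads, M*M, M*M)
    attn = F.softmax(logits + rpe, dim=-1)         # window position encoding
    return attn @ v                                # (N, heads, M*M, d_k)
```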
  • Step 2: perform feature fusion processing on the self-attention features of the N local two-dimensional image features to obtain the internal features of the first image window.
  • The self-attention features of the multiple heads are merged and projected as Concat(head_1, …, head_K) · W_H, where W_H is the parameter matrix and Concat represents the merge operation.
  • the normalization processing may be performed in a batch normalization (BN, Batch Normalization) manner.
  • the viewing angle v is one of the viewing angles (H, W), (H, D) and (W, D).
  • The normalized local two-dimensional image features are then input into the W-MSA structure for self-attention processing.
  • Step 3: perform convolution processing on the internal features of the first image window to obtain the interaction features of the first image window.
  • The W-MSA structure learns the features within each local two-dimensional image feature; in order to further strengthen the learning of the two-dimensional image features, a depthwise separable convolution block (DWConv2D) with a kernel size of 5 is used to perform convolution processing, thereby increasing the locality learning between spatially adjacent windows. For example, the internal features of the first image window are input into the DWConv2D network for convolution processing to obtain the interaction features of the first image window.
  • DWConv2D can also include a residual structure, that is, the internal features of the first image window after convolution are fused with the original internal features of the first image window to obtain the interaction features of the first image window, as shown in formula (6).
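  • A minimal sketch of this residual depthwise convolution (formula (6)); the kernel-size-5 depthwise 2D convolution follows the text, while the exact module layout is an assumption:

```python
# Kernel-size-5 depthwise convolution (groups == channels) with a residual
# connection: interaction features = DWConv2D(internal features) + internal.
import torch.nn as nn

class DWConv2DResidual(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, kernel_size=5,
                            padding=2, groups=channels)  # depthwise, k = 5

    def forward(self, x):      # x: (B, C, H, W) window-internal features
        return self.dw(x) + x  # first image window interaction features
```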
  • Step 4: perform feature extraction processing on the interaction features of the first image window through the multi-layer perceptron (MLP) to obtain the two-dimensional image window features.
  • BN is used to normalize the interaction features of the first image window after convolution processing, and MLP represents a multi-layer perceptron (MLP, Multilayer Perceptron) structure.
  • Step 402c: perform window rearrangement processing on the N windows, and perform feature extraction processing on the two-dimensional image window features corresponding to the N rearranged windows, to obtain the two-dimensional image spatial features, where the window rearrangement is used to change the spatial positions of the N windows.
  • After the two-dimensional image window features are obtained, window rearrangement is performed on the N windows so as to learn the window features of the two-dimensional image again after rearrangement.
  • In some embodiments, a shuffle operation can be used to rearrange the windows, thereby perturbing the spatial information and enhancing the interaction of cross-window information.
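  • One way to realize such a shuffle is sketched below, under the assumption that a random permutation of window indices is used (the text does not fix the permutation); the permutation is kept so that the later position flipping can restore the original order:

```python
# Shuffle the N window features to mix cross-window information, and invert
# the permutation afterwards (the "position flipping" described below).
import torch

def shuffle_windows(win: torch.Tensor):
    """win: (N, M*M, C) -> shuffled windows plus the permutation used."""
    perm = torch.randperm(win.shape[0])
    return win[perm], perm

def unshuffle_windows(win: torch.Tensor, perm: torch.Tensor):
    """Restore the original window order (position flipping)."""
    inv = torch.empty_like(perm)
    inv[perm] = torch.arange(perm.numel())
    return win[inv]
```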
  • the method may include the following steps:
  • Step 1: perform self-attention processing on the two-dimensional image window features corresponding to the N rearranged windows to obtain the self-attention features corresponding to the N windows.
  • self-attention processing is performed on the two-dimensional image window features corresponding to the N windows after window rearrangement to obtain self-attention features.
  • For the specific processing, reference may be made to the above steps, which will not be repeated here.
  • Step 2: perform feature fusion processing on the N self-attention features to obtain the internal features of the second image window.
  • the process of obtaining the internal features of the second image window through feature fusion may refer to the process of obtaining the internal features of the first image window through fusion, which will not be repeated here.
  • Step 3: perform position flipping processing on the internal features of the second image window, and perform convolution processing on the position-flipped internal features of the second image window, to obtain the interaction features of the second image window.
  • That is, the positions of the windows are scrambled again so that the W-MSA structure can perform another round of window self-attention learning to enhance information learning between windows; the internal features of the second image window are then position-flipped, that is, the position information corresponding to each window is restored to its original position, so as to obtain the interaction features of the second image window.
  • where T represents the window rearrangement operation and R represents the position flipping operation.
  • In some embodiments, DWConv2D is used to perform convolution processing again to obtain the interaction features of the second image window; this process can refer to the process of obtaining the interaction features of the first image window through convolution processing in the above steps, and will not be repeated here.
  • Step 4: perform feature extraction processing on the interaction features of the second image window through the MLP to obtain the two-dimensional image spatial features.
  • The MLP is used again for channel learning to obtain the final two-dimensional image spatial features.
  • In the embodiments of the present application, the spatial feature extraction of the two-dimensional image features is a full-view slice spatial shuffle block (FVSSSB, Full-View Slice Spatial Shuffle Block) process used to obtain the two-dimensional image spatial features; the overall process is shown in FIG. 6. In this way, the two-dimensional image features are fully learned, so that accurate two-dimensional image spatial features are extracted, facilitating subsequent accurate image recognition.
  • Step 403: based on the main viewing angle and the auxiliary viewing angle, perform semantic feature extraction processing on the two-dimensional image spatial features to obtain the image semantic features, where the main viewing angle is the viewing angle corresponding to the two-dimensional image features, and the auxiliary viewing angle is a viewing angle in the three-dimensional viewing angles different from the main viewing angle.
  • In the embodiments of the present application, the process of extracting semantic features from the two-dimensional image spatial features to obtain the image semantic features is a slice-aware volume context mixing (SAVCM, Slice-Aware Volume Context Mixing) process, in which the network parameters of the SAVCM network are shared across the viewing angles, that is, the network parameters are the same.
  • the process may include the following steps:
  • Step 403a: perform feature fusion processing on the two-dimensional image spatial features and the position encoding features to obtain the first image semantic features, where the position encoding features are used to indicate the position information corresponding to the two-dimensional image features.
  • In some embodiments, the position encoding feature is an automatically learnable parameter, through which the position information of the two-dimensional image slice is injected into the two-dimensional image spatial features, achieving slice location-aware learning; APE_S denotes the corresponding spatial position encoding.
  • Step 403b: under the main viewing angle, perform semantic feature extraction on the first image semantic features through the MLP to obtain the main image semantic features.
  • In the embodiments of the present application, semantic feature extraction is performed under the main viewing angle and the auxiliary viewing angle respectively, where the main viewing angle refers to the viewing angle corresponding to the two-dimensional image feature, and the auxiliary viewing angle is a viewing angle in the three-dimensional viewing angles different from the main viewing angle.
  • For example, for the two-dimensional image features under the (H, W) viewing angle, the main viewing angle is (H, W), and the auxiliary viewing angle is the remaining D direction.
  • In some embodiments, the main image semantic features can be obtained as shown in FIG. 7: first, the first image semantic features of shape (B, SP, C, TH) are rearranged to (B, SP, TH, C); then an MLP performs extraction along the channel direction C, first expanding the dimension to 4C and restoring the original channel number C after extraction; finally, the extracted main image semantic features are rearranged back to (B, SP, C, TH), where SP represents the spatial dimensions under the main viewing angle. This extraction is performed by a residual axial multilayer perceptron (axial-MLP).
  • Step 403c: under the auxiliary viewing angle, perform semantic feature extraction on the first image semantic features through the MLP to obtain the auxiliary image semantic features.
  • Similarly, the first image semantic features are extracted using the MLP based on the auxiliary viewing angle to obtain the auxiliary image semantic features; that is, the first image semantic features are extracted along the auxiliary viewing angle, with the dimension first expanded to 4TH and then restored to the original dimension TH after extraction, where TH represents the spatial dimension under the auxiliary viewing angle.
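  • The two axial MLPs of steps 403b-403c can be sketched as follows; the 4x expansion ratio follows the text, while the layer composition and activation are assumptions:

```python
# Main view: MLP mixes along the channel axis C (expanded to 4C and back).
# Auxiliary view: MLP mixes along the slice axis TH (expanded to 4TH and back).
import torch.nn as nn

def axial_mlp(dim: int, ratio: int = 4) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(dim, dim * ratio),
        nn.GELU(),
        nn.Linear(dim * ratio, dim),
    )

class SAVCMBranches(nn.Module):
    def __init__(self, channels: int, num_slices: int):
        super().__init__()
        self.main_mlp = axial_mlp(channels)    # along C (main viewing angle)
        self.aux_mlp = axial_mlp(num_slices)   # along TH (auxiliary viewing angle)

    def forward(self, z):  # z: (B, SP, C, TH) first image semantic features
        main = self.main_mlp(z.transpose(2, 3)).transpose(2, 3)  # mix along C
        aux = self.aux_mlp(z)                                    # mix along TH
        return main, aux
```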
  • Step 403d: perform feature fusion processing on the main image semantic features and the auxiliary image semantic features to obtain the image semantic features.
  • After the main image semantic features and the auxiliary image semantic features are obtained, the two are fused to obtain the image semantic features.
  • In some embodiments, the main image semantic features, the auxiliary image semantic features, and the original features are merged along the channels to obtain merged features, and an MLP is then used to map the merged features back to the original number of channels, obtaining the image semantic features.
  • In this way, the context information of the two-dimensional image slices can be perceived and the accuracy of feature learning improved, as shown in formula (12), where Axial-MLP represents the axial multilayer perceptron operation, Concat represents the merge operation, and MLP_cp represents the feature fusion operation.
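  • The fusion of formula (12) reduces to a channel concatenation followed by a projection MLP; a hedged sketch (channel-last layout assumed) is:

```python
# Concatenate main-view, auxiliary-view and original features along channels,
# then map back to the original channel count with MLP_cp.
import torch
import torch.nn as nn

class ContextFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.mlp_cp = nn.Linear(3 * channels, channels)

    def forward(self, main, aux, orig):                # each: (B, SP, TH, C)
        merged = torch.cat([main, aux, orig], dim=-1)  # (B, SP, TH, 3C)
        return self.mlp_cp(merged)                     # image semantic features
```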
  • step 303 in FIG. 3 may be implemented through steps 404-405 in FIG. 4 .
  • Step 404: perform fusion processing on the image semantic features and the viewing angle features to obtain the viewing angle image semantic features.
  • In some embodiments, APE is added to realize the fusion of the image semantic features and the viewing angle features, obtaining the viewing angle image semantic features. Since the process of aggregating the rich semantics learned across the full views is performed on the channels, APE is added on the channels of the full-view features to achieve view-awareness during aggregation, as shown in formula (13).
  • APE is an encoding corresponding to a channel and is used to indicate the corresponding viewing angle, that is, the viewing angle feature, for example, the (H, W) viewing angle.
  • Step 405: perform feature fusion processing on the image semantic features under the various viewing angles to obtain the i-th round of 3D medical image features.
  • In some embodiments, the full-view features of the three viewing angles are merged along the channels to obtain merged features (the number of channels is tripled); layer normalization (LN) is then used to normalize the merged features; finally, the MLP view aggregator MLP_va maps the normalized features back to the original number of channels, obtaining the medical volume feature output Z_l+1 of the current Transformer-MLP block, which is the i-th round of 3D medical image features; in the corresponding formula, Concat represents the merge operation, LN represents the normalization operation, and MLP_va represents the mapping operation.
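  • The view-aware aggregation can be sketched as below; the per-view APE as a learnable per-channel parameter follows the text, while shapes and initialization are assumptions:

```python
# Add a per-view position encoding (APE) on channels, concatenate the three
# views (3x channels), layer-normalize (LN), and project back with MLP_va.
import torch
import torch.nn as nn

class ViewAggregator(nn.Module):
    def __init__(self, channels: int, num_views: int = 3):
        super().__init__()
        self.ape = nn.Parameter(torch.zeros(num_views, channels))
        self.norm = nn.LayerNorm(num_views * channels)
        self.mlp_va = nn.Linear(num_views * channels, channels)

    def forward(self, views):  # list of three tensors, each (B, N, C)
        views = [v + self.ape[i] for i, v in enumerate(views)]
        merged = torch.cat(views, dim=-1)       # (B, N, 3C)
        return self.mlp_va(self.norm(merged))  # (B, N, C): block output Z_l+1
```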
  • step 304 in FIG. 3 may be implemented through steps 406-407 in FIG. 4 .
  • In some embodiments, the feature extraction process includes a feature encoding process or a feature decoding process, where the feature encoding process includes downsampling the 3D medical image features, that is, reducing their dimensions, and the feature decoding process includes upsampling the 3D medical image features, that is, increasing their dimensions.
  • Step 406: when the upsampling result reaches the original size, determine the extracted 3D medical image features as the I-th round of 3D medical image features obtained by the I-th round of feature extraction.
  • That is, when the upsampling result reaches the original size of the three-dimensional medical image, the current round is determined to be the final (I-th) round of the feature extraction process, the corresponding result is determined as the I-th round of 3D medical image features, and the I-th round of 3D medical image features are used for target prediction 503 (that is, image recognition) to obtain the image recognition result.
  • In some embodiments, the target prediction result is fused with the corresponding features of the initially input 3D medical image, so that image recognition is performed based on the fused features.
  • For example, the input 3D medical image of size C_i × H × W × D is convolved to obtain the initial 3D medical image features of size C'_I × H × W × D, and the prediction output C'_O × H × W × D and C'_I × H × W × D are fused and convolved to obtain the final output result.
  • Step 407: perform image recognition processing based on the I-th round of 3D medical image features to obtain the image recognition result.
  • In some embodiments, image recognition is performed based on the I-th round of 3D medical image features, so that subsequent image registration and classification can be performed on the 3D medical image.
  • In some embodiments, the TR-MLP network structure is shown in FIG. 9. First, viewing angle rearrangement is performed on the (H, W, D) dimensions of the three-dimensional medical image feature Z_l input to the current block, rearranging it into two-dimensional image slices under the three viewing angles (H, W), (H, D), and (W, D), where each viewing angle corresponds to a two-dimensional slice direction in 3D; the full-view slice spatial shuffle block (FVSSSB) then fully learns the 2D slice information of the rearranged full-view 2D image slices to obtain the two-dimensional image features; next, the slice-aware medical volume context mixing block (SAVCM) captures the remaining image semantic information along the third viewing angle; finally, the view-aware aggregator aggregates the rich semantics learned from the full views, producing the 3D medical image feature output Z_l+1 of the Transformer-MLP block, which serves as the input feature of the next Transformer-MLP block.
  • In the above process, the three views are computed in parallel, and the full-view slice spatial shuffle block network and the slice-aware medical volume context mixing block network share parameters across the views; that is, feature extraction networks with the same network parameters are used to perform semantic feature extraction on the two-dimensional image features under each viewing angle, obtaining the image semantic features under the different viewing angles.
  • In this way, the context-aware ability of the 3D medical image features is realized, and the inductive bias is greatly enhanced.
  • Moreover, the accuracy of 3D medical image recognition is improved, and the computationally intensive 3D convolutional neural network (3D CNN) and pure vision Transformer are replaced by the streamlined local vision Transformer-MLP computing unit, thereby reducing computational complexity and improving recognition efficiency.
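  • Putting the pieces together, one TR-MLP block can be summarized by the following high-level sketch, where view_rearrange, fvsssb, savcm, and aggregator stand for the operations sketched earlier; the control flow is an assumption based on the description above:

```python
# One Transformer-MLP block: rearrange Z_l into three views, process each view
# with the parameter-shared FVSSSB and SAVCM, then aggregate to get Z_{l+1}.
def tr_mlp_block(z_l, view_rearrange, fvsssb, savcm, aggregator):
    views = view_rearrange(z_l)   # (H, W), (H, D), (W, D) slice features
    outs = []
    for v in views:               # the same weights are applied to every view
        v = fvsssb(v)             # full-view slice spatial shuffle block
        v = savcm(v)              # slice-aware volume context mixing
        outs.append(v)
    return aggregator(outs)       # i-th round 3D medical image features
```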
  • In some embodiments, the feature extraction process includes a feature encoding process or a feature decoding process, and the extraction process includes self-attention processing, where the self-attention is computed based on Q, K, and V.
  • In order to improve the accuracy of feature extraction, the features of the feature encoding process (implemented by the encoder) and the features of the feature decoding process (implemented by the decoder) are fused to obtain the Q, K, and V values.
  • In some embodiments, the K value in the t-th round of feature decoding is obtained by fusing the K value in the t-1th round of feature decoding with the K value in the corresponding feature encoding process; the V value in the t-th round of feature decoding is obtained by fusing the V value in the t-1th round of feature decoding with the V value in the corresponding feature encoding process; and the Q value in the t-th round of feature decoding is the Q value in the t-1th round of feature decoding.
  • In some embodiments, the resolution of the input features of the t-th round of feature decoding is the same as that of the output features of the corresponding encoding process; that is, image features with the same resolution are skip-connected and fused.
  • For example, if the resolution corresponding to the second round of feature decoding is 4C × H/16 × W/16 × D/16, the resolution corresponding to the skip-connected feature encoding process is also 4C × H/16 × W/16 × D/16; the input features of the second round of feature decoding (that is, the features obtained by upsampling the output features of the first round of feature decoding) are skip-connected and fused with the output features of the corresponding feature encoding round.
  • In some embodiments, the encoder feature E_v and the decoder feature D_v are each convolved with a standard pointwise convolution (PWConv2D) with a kernel size of 1.
  • The decoder feature D_v is divided into 3 parts along its original channel number by PWConv2D to obtain the decoder's Q value, K value, and V value, as shown in formula (15), where CrossMerge represents the skip connection fusion operation.
  • In the embodiments of the present application, a skip-connection fusion network is introduced to perform skip-connection fusion on the corresponding features of the encoder and the decoder, thereby fusing multi-scale information and enriching the semantic learning of the image features.
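  • A hedged sketch of this skip-connection fusion; the 1x1 (pointwise) convolutions follow the text, while treating CrossMerge as channel concatenation plus a 1x1 projection is an assumption:

```python
# Split the decoder feature D_v into Q, K, V with pointwise convolutions
# (formula (15)), derive K, V from the encoder feature E_v, and fuse the
# encoder and decoder K and V values (CrossMerge).
import torch
import torch.nn as nn

class SkipConnectionFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.split_qkv = nn.Conv2d(channels, 3 * channels, kernel_size=1)  # PWConv2D
        self.enc_kv = nn.Conv2d(channels, 2 * channels, kernel_size=1)
        self.merge_k = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.merge_v = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, d_v, e_v):  # decoder feature D_v, encoder feature E_v
        q, k_d, v_d = self.split_qkv(d_v).chunk(3, dim=1)
        k_e, v_e = self.enc_kv(e_v).chunk(2, dim=1)
        k = self.merge_k(torch.cat([k_d, k_e], dim=1))  # fuse K values
        v = self.merge_v(torch.cat([v_d, v_e], dim=1))  # fuse V values
        return q, k, v
```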
  • Fig. 11 is a structural block diagram of a three-dimensional medical image recognition device provided in an embodiment of the present application. As shown in Fig. 11, the device includes:
  • the viewing angle rearrangement module 1101 is configured to perform viewing angle rearrangement processing on the i-1th round of 3D medical image features in the i-th round of feature extraction to obtain two-dimensional image features, where the i-1th round of 3D medical image features are the features obtained by performing the i-1th round of feature extraction on the three-dimensional medical image, and different two-dimensional image features are the features of the i-1th round of 3D medical image features under different viewing angles; the feature extraction module 1102 is configured to perform semantic feature extraction processing on each of the two-dimensional image features to obtain image semantic features under different viewing angles; the feature fusion module 1103 is configured to perform feature fusion processing on the image semantic features under different viewing angles to obtain the i-th round of 3D medical image features; and the image recognition module 1104 is configured to perform image recognition processing based on the I-th round of 3D medical image features obtained by the I-th round of feature extraction to obtain the image recognition result of the three-dimensional medical image, where i is a sequentially increasing positive integer, 1 < i ≤ I, and I is a positive integer.
  • the feature extraction module 1102 includes:
  • the first extraction unit is configured to perform spatial feature extraction processing on the two-dimensional image features to obtain two-dimensional image spatial features
  • the second extraction unit is configured to perform semantic feature extraction processing on the two-dimensional image spatial features based on the main viewing angle and the auxiliary viewing angle to obtain the image semantic features, where the main viewing angle is the viewing angle corresponding to the two-dimensional image features, and the auxiliary viewing angle is a viewing angle in the three-dimensional viewing angles different from the main viewing angle.
  • In some embodiments, the first extraction unit is further configured to: perform window division processing on the two-dimensional image features to obtain local two-dimensional image features corresponding to N windows, where the N windows do not overlap with each other and N is a positive integer greater than 1; perform feature extraction processing on the N local two-dimensional image features to obtain two-dimensional image window features; and perform window rearrangement processing on the N windows, and perform feature extraction processing on the two-dimensional image window features corresponding to the N rearranged windows to obtain the two-dimensional image spatial features, where the window rearrangement is used to change the spatial positions of the N windows.
  • In some embodiments, the first extraction unit is further configured to: perform feature extraction processing on the interaction features of the second image window through the multi-layer perceptron MLP to obtain the two-dimensional image spatial features.
  • the first extraction unit is further configured to:
  • perform self-attention processing based on the query item Q, key item K, and value item V corresponding to the local two-dimensional image features to obtain the self-attention features of the N local two-dimensional image features.
  • the feature extraction process includes a feature encoding process or a feature decoding process;
  • the K value in the t-th round of the feature decoding process is obtained by fusing the K value in the (t-1)-th round of feature decoding with the K value in the corresponding feature encoding process; the V value in the t-th round of the feature decoding process is obtained by fusing the V value in the (t-1)-th round of feature decoding with the V value in the corresponding feature encoding process; and the Q value in the t-th round of the decoding process is the Q value in the (t-1)-th round of feature decoding.
  • the second extraction unit is further configured to: perform feature fusion processing on the two-dimensional image spatial features and position encoding features to obtain first image semantic features, wherein the position encoding features are used to indicate position information corresponding to the two-dimensional image features; perform, under the main viewing angle, semantic feature extraction processing on the first image semantic features through the MLP to obtain main image semantic features; perform, under the auxiliary viewing angle, semantic feature extraction processing on the first image semantic features through the MLP to obtain auxiliary image semantic features; and perform feature fusion processing on the main image semantic features and the auxiliary image semantic features to obtain the image semantic features.
  • the feature fusion module 1103 further includes:
  • the first fusion unit is configured to perform fusion processing on the image semantic features and the viewing-angle features to obtain viewing-angle image semantic features;
  • the second fusion unit is configured to perform feature fusion processing on each of the viewing-angle image semantic features to obtain the i-th round of three-dimensional medical image features.
  • the feature extraction module 1102 is further configured to:
  • the feature extraction networks corresponding to the same network parameters are respectively used to perform semantic feature extraction processing on the two-dimensional image features under each viewing angle, so as to obtain the image semantic features under different viewing angles.
  • the feature extraction process includes a feature encoding process or a feature decoding process
  • the feature encoding process includes a down-sampling process for the three-dimensional medical image features;
  • the feature decoding process includes an up-sampling process for the three-dimensional medical image features.
  • the image recognition module 1104 also includes:
  • a determining unit configured to determine, when the up-sampling result reaches the original size, the extracted three-dimensional medical image features as the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction;
  • a recognition unit configured to perform image recognition processing based on the I-th round of three-dimensional medical image features to obtain the image recognition result.
  • the three-dimensional medical image is a CT image, an MRI image or a PET image.
  • in each feature extraction stage, viewing-angle rearrangement is first performed on the three-dimensional medical image features to divide them into two-dimensional image features under different viewing angles; feature extraction is performed on the two-dimensional image features respectively to obtain image semantic features under different viewing angles; and the image semantic features under the different viewing angles are then fused to obtain the three-dimensional medical image features after feature extraction.
  • in this process, because feature extraction is performed on two-dimensional image features under different viewing angles, compared with the related-art approach of directly extracting three-dimensional image features for image recognition, the embodiment of the present application performs feature extraction for different viewing angles through simplified local computing units, which reduces computational complexity and thereby improves the recognition efficiency of three-dimensional medical images.
  • the device provided in the above embodiment is illustrated only by way of the division into the above functional modules.
  • in practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
  • the device provided in the above embodiment and the method embodiments belong to the same concept; for details of its implementation process, refer to the method embodiments, which are not repeated here.
  • the computer device 1200 includes a central processing unit (CPU) 1201, a system memory 1204 including a random access memory 1202 and a read-only memory 1203, and a system bus 1205 connecting the system memory 1204 and the central processing unit 1201.
  • the computer device 1200 also includes a basic input/output (I/O) system 1206 that facilitates information transfer between the components in the computer, and a mass storage device 1207 configured to store an operating system 1213, an application program 1214, and other program modules 1215.
  • the basic input/output system 1206 includes a display 1208 for displaying information and input devices 1209, such as a mouse and a keyboard, for users to input information. The display 1208 and the input device 1209 are both connected to the central processing unit 1201 through an input/output controller 1210 connected to the system bus 1205.
  • the basic input/output system 1206 may also include the input/output controller 1210 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 1210 also provides output to a display screen, a printer, or another type of output device.
  • the mass storage device 1207 is connected to the central processing unit 1201 through a mass storage controller (not shown) connected to the system bus 1205 .
  • the mass storage device 1207 and its associated computer-readable media provide non-volatile storage for the computer device 1200 . That is, the mass storage device 1207 may include a computer-readable medium (not shown) such as a hard disk or drive.
  • Computer-readable media may comprise computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include random access memory (RAM), read-only memory (ROM), flash memory or other solid-state storage technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices.
  • Of course, those skilled in the art will know that the computer storage media are not limited to the above.
  • the above-mentioned system memory 1204 and mass storage device 1207 may be collectively referred to as memory.
  • the memory stores one or more programs configured to be executed by the one or more central processing units 1201; the one or more programs include instructions for implementing the above method, and the central processing unit 1201 executes the one or more programs to implement the methods provided by the above method embodiments.
  • the computer device 1200 may also run on a remote computer connected to a network such as the Internet. That is, the computer device 1200 may be connected to the network 1212 through the network interface unit 1211 connected to the system bus 1205, or the network interface unit 1211 may be used to connect to other types of networks or remote computer systems (not shown).
  • the memory further includes one or more programs stored in the memory; the one or more programs include the steps to be executed by the computer device in the method provided in the embodiments of the present application.
  • the embodiment of the present application also provides a computer-readable storage medium. The readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the three-dimensional medical image recognition method described in any one of the above embodiments.
  • An embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the method for recognizing a three-dimensional medical image provided by the above aspect.
  • the medium may be a computer-readable storage medium included in the memory in the above embodiments, or a computer-readable storage medium that exists independently and is not assembled into the terminal. The computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the three-dimensional medical image recognition method described in any one of the above method embodiments.
  • the computer-readable storage medium may include: ROM, RAM, solid state drives (SSD, Solid State Drives) or optical discs, etc.
  • the RAM may include resistive random access memory (ReRAM, Resistance Random Access Memory) and dynamic random access memory (DRAM, Dynamic Random Access Memory).
  • Those of ordinary skill in the art may understand that all or part of the steps of the various methods in the above embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A three-dimensional medical image recognition method, apparatus, device, computer-readable storage medium, and computer program product, relating to the field of artificial intelligence. The method includes: during the i-th round of feature extraction, performing viewing-angle rearrangement processing on the (i-1)-th round of three-dimensional medical image features to obtain two-dimensional image features, where the (i-1)-th round of three-dimensional medical image features are features obtained by performing the (i-1)-th round of feature extraction on a three-dimensional medical image, and different two-dimensional image features are features of the (i-1)-th round of three-dimensional medical image features under different viewing angles; performing semantic feature extraction processing on each of the two-dimensional image features to obtain image semantic features under different viewing angles; performing feature fusion processing on the image semantic features under the different viewing angles to obtain the i-th round of three-dimensional medical image features; and performing image recognition processing based on the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction to obtain an image recognition result of the three-dimensional medical image, where i is a successively increasing positive integer, 1<i≤I, and I is a positive integer.

Description

Three-dimensional medical image recognition method, apparatus, device, storage medium, and product
Cross-reference to related application
The embodiments of the present application are based on, and claim priority to, the Chinese patent application No. 202210191770.3 filed on February 28, 2022, the entire contents of which are incorporated into the embodiments of the present application by reference.
Technical field
The embodiments of the present application relate to the field of artificial intelligence, and in particular to a three-dimensional medical image recognition method, apparatus, device, computer-readable storage medium, and computer program product.
Background
In the medical field, using computer vision technology to recognize three-dimensional medical images helps predict the state of a disease.
At present, in the process of recognizing a three-dimensional medical image, dense prediction methods may be used to analyze the image, where a dense prediction method is a method that makes a prediction for each pixel in the image. In the related art, when dense prediction is performed on a three-dimensional medical image, image recognition is performed based on the whole three-dimensional medical image to obtain an image recognition result.
However, performing image recognition directly on the three-dimensional medical image involves a large amount of computation and low recognition efficiency, and also requires a large amount of data for pre-training, which makes the approach relatively complex.
Summary
The embodiments of the present application provide a three-dimensional medical image recognition method, apparatus, device, computer-readable storage medium, and computer program product, which can improve the recognition efficiency of three-dimensional medical images and reduce computational complexity. The technical solutions are as follows:
An embodiment of the present application provides a three-dimensional medical image recognition method, executed by a computer device, the method including:
during the i-th round of feature extraction, performing viewing-angle rearrangement processing on the (i-1)-th round of three-dimensional medical image features to obtain two-dimensional image features, where the (i-1)-th round of three-dimensional medical image features are features obtained by performing the (i-1)-th round of feature extraction on a three-dimensional medical image, and different two-dimensional image features are features of the (i-1)-th round of three-dimensional medical image features under different viewing angles;
performing semantic feature extraction processing on each of the two-dimensional image features to obtain image semantic features under different viewing angles;
performing feature fusion processing on the image semantic features under the different viewing angles to obtain the i-th round of three-dimensional medical image features; and
performing image recognition processing based on the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction to obtain an image recognition result of the three-dimensional medical image, where i is a successively increasing positive integer, 1<i≤I, and I is a positive integer.
An embodiment of the present application provides a three-dimensional medical image recognition apparatus, the apparatus including:
a viewing-angle rearrangement module configured to perform, during the i-th round of feature extraction, viewing-angle rearrangement processing on the (i-1)-th round of three-dimensional medical image features to obtain two-dimensional image features, where the (i-1)-th round of three-dimensional medical image features are features obtained by performing the (i-1)-th round of feature extraction on a three-dimensional medical image, and different two-dimensional image features are features of the (i-1)-th round of three-dimensional medical image features under different viewing angles;
a feature extraction module configured to perform semantic feature extraction processing on each of the two-dimensional image features to obtain image semantic features under different viewing angles;
a feature fusion module configured to perform feature fusion processing on the image semantic features under the different viewing angles to obtain the i-th round of three-dimensional medical image features; and
an image recognition module configured to perform image recognition processing based on the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction to obtain an image recognition result of the three-dimensional medical image, where i is a successively increasing positive integer, 1<i≤I, and I is a positive integer.
An embodiment of the present application provides a computer device including a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the three-dimensional medical image recognition method described in the foregoing aspects.
An embodiment of the present application provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the three-dimensional medical image recognition method described in the foregoing aspects.
An embodiment of the present application provides a computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the three-dimensional medical image recognition method provided in the foregoing aspects.
The beneficial effects of the technical solutions provided in the embodiments of the present application include at least the following:
In the embodiments of the present application, in each feature extraction stage, viewing-angle rearrangement is first performed on the three-dimensional medical image features to divide them into two-dimensional image features under different viewing angles; feature extraction is performed on the two-dimensional image features respectively to obtain image semantic features under different viewing angles; and the image semantic features under the different viewing angles are then fused to obtain the three-dimensional medical image features after feature extraction. In this process, because feature extraction is performed on two-dimensional image features under different viewing angles, compared with the related-art approach of directly extracting three-dimensional image features for image recognition, the embodiments of the present application perform feature extraction for different viewing angles through simplified local computing units, which reduces computational complexity and thereby improves the recognition efficiency of three-dimensional medical images.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
Fig. 1 shows a schematic principle diagram of the three-dimensional medical image recognition method provided in an embodiment of the present application;
Fig. 2 shows a schematic diagram of an implementation environment provided in an embodiment of the present application;
Fig. 3 shows a flowchart of the three-dimensional medical image recognition method provided in an embodiment of the present application;
Fig. 4 shows a flowchart of the three-dimensional medical image recognition method provided in an embodiment of the present application;
Fig. 5 shows a schematic structural diagram of the overall image recognition structure provided in an embodiment of the present application;
Fig. 6 shows a schematic structural diagram of the spatial feature extraction process illustrated in an embodiment of the present application;
Fig. 7 shows a schematic structural diagram of the semantic feature extraction process illustrated in an embodiment of the present application;
Fig. 8 shows a schematic structural diagram of the feature fusion process illustrated in an embodiment of the present application;
Fig. 9 shows a schematic structural diagram of the TR-MLP network illustrated in an embodiment of the present application;
Fig. 10 shows a schematic structural diagram of the skip-connection fusion network illustrated in an embodiment of the present application;
Fig. 11 shows a structural block diagram of the three-dimensional medical image recognition apparatus provided in an embodiment of the present application;
Fig. 12 shows a schematic structural diagram of the computer device provided in an embodiment of the present application.
Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings.
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include several major directions such as computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Computer vision (CV) technology is a science that studies how to make machines "see"; more specifically, it refers to machine vision in which cameras and computers replace human eyes to recognize and measure targets, with further graphics processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems capable of acquiring information from images or multi-dimensional data. Computer vision technology usually includes technologies such as image processing, image recognition, image segmentation, image semantic understanding, image retrieval, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition.
The three-dimensional medical image recognition method involved in the embodiments of the present application is an application of computer vision technology in the field of image recognition. By performing feature extraction separately on the two-dimensional image features corresponding to the three-dimensional medical image features under different viewing angles, computational complexity can be reduced and the efficiency of three-dimensional medical image recognition can be improved.
Schematically, as shown in Fig. 1, during the i-th round of feature extraction, viewing-angle rearrangement is first performed on the (i-1)-th round of three-dimensional medical image features 101 obtained in the (i-1)-th round of feature extraction, yielding first two-dimensional image features 102 under a first viewing angle, second two-dimensional image features 103 under a second viewing angle, and third two-dimensional image features 104 under a third viewing angle. Semantic feature extraction is performed on the first two-dimensional image features 102, the second two-dimensional image features 103, and the third two-dimensional image features 104 under the different viewing angles, yielding first image semantic features 105, second image semantic features 106, and third image semantic features 107, which are then fused to obtain the i-th round of three-dimensional image semantic features 108.
Because the three-dimensional medical image features are decomposed into two-dimensional image features under different viewing angles and feature extraction is then performed on the two-dimensional image features, the amount of computation is reduced, improving the recognition efficiency of three-dimensional medical images.
The method provided in the embodiments of the present application can be applied to the image recognition process of any three-dimensional medical image. Schematically, the category to which each part of a three-dimensional medical image belongs can be recognized, thereby assisting the analysis of lesions and organs.
The computer device for three-dimensional medical image recognition provided in the embodiments of the present application may be various types of terminal devices or servers. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services; the terminal may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, or the like.
Taking a server as an example, it may be a server cluster deployed in the cloud that opens AI as a Service (AIaaS) to users. The AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service mode is similar to an AI-themed shopping mall: all users can access one or more artificial intelligence services provided by the AIaaS platform through application programming interfaces.
For example, one of the artificial intelligence cloud services may be a three-dimensional medical image recognition service; that is, a server in the cloud encapsulates the three-dimensional medical image recognition program provided in the embodiments of the present application. A user invokes the three-dimensional medical image recognition service in the cloud service through a terminal (running a client, such as a lesion analysis client), so that the server deployed in the cloud invokes the encapsulated three-dimensional medical image recognition program, decomposes the three-dimensional medical image features into two-dimensional image features under different viewing angles, performs feature extraction on the two-dimensional image features to recognize the three-dimensional medical image, and obtains an image recognition result. The image recognition result is subsequently used to assist doctors and researchers in diagnosis, follow-up, and research on treatment methods; for example, auxiliary diagnosis is performed based on an edema indicator included in the image recognition result to determine whether the target object may have inflammation, trauma, allergy, or excessive water intake.
It should be noted that the three-dimensional medical image recognition method provided in the embodiments of the present application does not take obtaining a disease diagnosis result or health condition as its direct purpose, and a disease diagnosis result or health condition cannot be obtained directly from the image recognition result. That is, the image recognition result is not directly used for disease diagnosis; it only serves as intermediate data to assist patients in disease prediction and to assist doctors and researchers in diagnosis, follow-up, and research on treatment methods.
Fig. 2 shows a schematic diagram of an implementation environment provided in an embodiment of the present application. The implementation environment includes a terminal 210 and a server 220. Data communication between the terminal 210 and the server 220 is performed through a communication network; in some embodiments, the communication network may be a wired network or a wireless network, and may be at least one of a local area network, a metropolitan area network, and a wide area network.
The terminal 210 is an electronic device running a three-dimensional medical image recognition program. The electronic device may be a smartphone, a tablet computer, a personal computer, or the like, which is not limited in the embodiments of the present application. When a three-dimensional medical image needs to be recognized, the image can be input into the program of the terminal 210, which uploads it to the server 220; the server 220 executes the three-dimensional medical image recognition method provided in the embodiments of the present application to perform image recognition and feeds the image recognition result back to the terminal 210.
The server 220 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN), and big data and artificial intelligence platforms.
In some embodiments, the server 220 is configured to provide an image recognition service for the application installed in the terminal 210. In some embodiments, an image recognition network is provided in the server 220 for classifying the three-dimensional medical images sent by the terminal 210.
Of course, in some embodiments, the image recognition network may also be deployed on the terminal 210 side, and the terminal 210 locally implements the three-dimensional medical image recognition method (that is, the image recognition network) provided in the embodiments of the present application without relying on the server 220; accordingly, the image recognition network may be trained on the terminal 210 side, which is not limited in the embodiments of the present application. For convenience of description, the following embodiments are described with the three-dimensional medical image recognition method being executed by a computer device.
Refer to Fig. 3, which shows a flowchart of the three-dimensional medical image recognition method provided in an embodiment of the present application. The method includes the following steps.
Step 301: during the i-th round of feature extraction, perform viewing-angle rearrangement processing on the (i-1)-th round of three-dimensional medical image features to obtain two-dimensional image features, where the (i-1)-th round of three-dimensional medical image features are features obtained by performing the (i-1)-th round of feature extraction on a three-dimensional medical image, and different two-dimensional image features are features of the (i-1)-th round of three-dimensional medical image features under different viewing angles.
The three-dimensional medical image features are features extracted from the three-dimensional medical image to be recognized. The three-dimensional medical image to be recognized may be a computed tomography (CT) image, magnetic resonance imaging (MRI), positron emission tomography (PET), or another three-dimensional medical image.
The first round of three-dimensional medical image features are obtained by performing feature extraction on initial three-dimensional medical image features, which in turn are obtained by performing initial embedding processing on the three-dimensional medical image. The initial embedding processing maps the high-dimensional three-dimensional medical image into a low-dimensional space, yielding low-dimensional initial three-dimensional medical image features.
In the embodiments of the present application, the three-dimensional medical image is recognized through multiple rounds of feature extraction. Each round uses the same feature extraction network, and in each round the input of the feature extraction network is determined from the output of the previous round; that is, the i-th round of feature extraction is performed based on the (i-1)-th round of three-dimensional medical image features.
Because the three-dimensional medical image features are 3D data, performing feature extraction directly on the whole three-dimensional medical image features involves a large amount of computation and a relatively complex process. Therefore, in the embodiments of the present application, the three-dimensional medical image features are first divided in each round of feature extraction; that is, during the i-th round of feature extraction, viewing-angle rearrangement is performed on the features obtained in the (i-1)-th round. Viewing-angle rearrangement divides the three-dimensional medical image features into two-dimensional image features under different viewing angles, so that feature extraction is performed based on the two-dimensional image features under the different viewing angles, reducing computational complexity.
In some embodiments, the viewing-angle rearrangement is implemented as follows: rearrangement processing is performed on multiple dimensions of the (i-1)-th round of three-dimensional medical image features to obtain two-dimensional image features under multiple viewing angles; that is, the dimensions of the (i-1)-th round of three-dimensional medical image features are permuted and combined to obtain multiple viewing angles, and the two-dimensional image features under each viewing angle are extracted separately.
In one implementation, viewing-angle rearrangement is performed on the (H, W, D) dimensions of the (i-1)-th round of three-dimensional medical image features to obtain two-dimensional image features under the three viewing angles (H, W), (H, D), and (W, D), each viewing angle corresponding to one two-dimensional direction in the three-dimensional medical image features. Different two-dimensional image features are the image features corresponding to different two-dimensional image slices, where a two-dimensional image slice is a two-dimensional image in two-dimensional space obtained after viewing-angle rearrangement of the three-dimensional medical image.
It should be noted that, during the i-th round of feature extraction, when feature extraction is performed based on the (i-1)-th round of three-dimensional medical image features, there may be an up-sampling or down-sampling process on those features; in that case, viewing-angle rearrangement is performed on the up-sampled or down-sampled (i-1)-th round of three-dimensional medical image features to obtain the two-dimensional image features.
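The following is a minimal sketch (not part of the original application) of the viewing-angle rearrangement described above, assuming a PyTorch tensor of shape (B, C, H, W, D); the function name rearrange_views is illustrative:

```python
import torch

def rearrange_views(x: torch.Tensor):
    """Split a 3D feature map into stacks of 2D slices for the
    (H, W), (H, D) and (W, D) viewing angles."""
    b, c, h, w, d = x.shape
    hw = x.permute(0, 4, 1, 2, 3).reshape(b * d, c, h, w)  # D slices of (H, W)
    hd = x.permute(0, 3, 1, 2, 4).reshape(b * w, c, h, d)  # W slices of (H, D)
    wd = x.permute(0, 2, 1, 3, 4).reshape(b * h, c, w, d)  # H slices of (W, D)
    return hw, hd, wd

x = torch.randn(1, 16, 32, 32, 32)
for v in rearrange_views(x):
    print(v.shape)  # torch.Size([32, 16, 32, 32]) for each view
```

Each returned stack treats one axis as the batch of slices, so subsequent 2D operators run over all slices of that viewing angle in parallel.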
Step 302: perform semantic feature extraction processing on each of the two-dimensional image features to obtain image semantic features under different viewing angles.
For example, after the two-dimensional image features are obtained, semantic feature extraction is performed on them to learn the image information in the corresponding two-dimensional image slices. The semantic feature extraction on the two-dimensional image features includes learning the spatial information of the two-dimensional image slices and learning the image semantics based on the corresponding viewing angle.
After semantic feature extraction is performed on each of the two-dimensional image features, the image semantic features corresponding to the different viewing angles can be obtained, that is, the image semantic features corresponding to the three viewing angles (H, W), (H, D), and (W, D).
Step 303: perform feature fusion processing on the image semantic features under the different viewing angles to obtain the i-th round of three-dimensional medical image features.
In one implementation, after the image semantic features under the different viewing angles are obtained, they can be fused to complete this round of feature extraction and obtain the i-th round of three-dimensional medical image features, based on which the (i+1)-th round of feature extraction is then performed.
In the embodiments of the present application, feature fusion of the image semantic features under the different viewing angles aggregates the rich semantics learned from all viewing angles, completing the feature learning process for the three-dimensional medical image features.
Step 304: perform image recognition processing based on the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction to obtain an image recognition result of the three-dimensional medical image, where i is a successively increasing positive integer, 1<i≤I, and I is a positive integer.
After multiple rounds of feature extraction, the feature extraction process ends; after the I-th round of feature extraction, image recognition is performed based on the I-th round of three-dimensional medical image features.
In summary, in the embodiments of the present application, in each feature extraction stage, viewing-angle rearrangement is first performed on the three-dimensional medical image features to divide them into two-dimensional image features under different viewing angles; feature extraction is performed on the two-dimensional image features respectively to obtain image semantic features under different viewing angles; and the image semantic features under the different viewing angles are fused to obtain the three-dimensional image semantic features after feature extraction. In this process, because feature extraction is performed on two-dimensional image features under different viewing angles, compared with the related-art approach of directly extracting three-dimensional image features, the embodiments of the present application perform feature extraction for different viewing angles through simplified local computing units, which reduces computational complexity and improves the recognition efficiency of three-dimensional medical images.
In some embodiments, during feature extraction on the two-dimensional image features under different viewing angles, each two-dimensional image feature is divided so as to learn the features corresponding to local windows, and the context features of the slice corresponding to each two-dimensional image feature are also learned, thereby obtaining the image semantic features under different viewing angles. This is described below with exemplary embodiments.
Refer to Fig. 4, which shows a flowchart of the three-dimensional medical image recognition method provided in an embodiment of the present application. The method includes the following steps.
Step 401: during the i-th round of feature extraction, perform viewing-angle rearrangement processing on the (i-1)-th round of three-dimensional medical image features to obtain two-dimensional image features.
After the three-dimensional medical image is obtained, initial embedding processing (patch embedding) is first performed on it; for example, a convolutional stem structure (the initial convolutional layer of a convolutional neural network) can be used for the initial embedding to obtain the initial three-dimensional medical image features, after which multiple rounds of feature extraction are performed starting from these initial features. The initial embedding maps the high-dimensional three-dimensional medical image into a low-dimensional space, yielding low-dimensional initial three-dimensional medical image features.
In the embodiments of the present application, the feature extraction process includes a feature encoding process and a feature decoding process. The feature encoding process includes down-sampling the three-dimensional medical image features, that is, reducing their dimensions, while the feature decoding process includes up-sampling them, that is, increasing their dimensions. The down-sampling uses a 3D convolution with a kernel size of 3 and a stride of 2, halving the resolution each time, while the up-sampling uses a 3D transposed convolution with a kernel size of 2 and a stride of 2, doubling the resolution each time. After multiple rounds of feature encoding and feature decoding, the resulting three-dimensional medical image features are used for medical image recognition. Each round of feature extraction is implemented with the same Transformer-Multilayer Perceptron (TR-MLP) structure.
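A small sketch of the down- and up-sampling just described, assuming PyTorch modules; the channel counts follow the C to 2C pattern of Fig. 5 and are otherwise illustrative:

```python
import torch
import torch.nn as nn

# Encoding halves each spatial dimension: 3D conv, kernel 3, stride 2.
down = nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1)
# Decoding doubles each spatial dimension: 3D transposed conv, kernel 2, stride 2.
up = nn.ConvTranspose3d(32, 16, kernel_size=2, stride=2)

x = torch.randn(1, 16, 32, 32, 32)
y = down(x)   # -> (1, 32, 16, 16, 16)
z = up(y)     # -> (1, 16, 32, 32, 32)
```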
Schematically, as shown in Fig. 5, a three-dimensional medical image of size C_i×H×W×D is input, and initial embedding processing (patch embedding) 501 is first performed with a patch size of 2×2, yielding three-dimensional medical image features of size C×H/4×W/4×D/4. These are input into the first TR-MLP block for the first round of feature extraction. After the first round, the resulting first-round three-dimensional medical image features are down-sampled to 2C×H/8×W/8×D/8 and input into the second TR-MLP block for the second round of feature extraction, yielding the second-round three-dimensional medical image features, which are then directly input into the third TR-MLP block for the third round of feature extraction. After the third round, the resulting third-round features are down-sampled again, until reaching 8C×H/32×W/32×D/32, after which the up-sampling process begins. The feature extraction performed in TR-MLP block 502 and in the preceding TR-MLP blocks is the feature encoding process, and what follows is the feature decoding process.
It should be noted that each round of the feature encoding or feature decoding process is implemented through viewing-angle rearrangement processing, semantic feature processing, and feature fusion processing.
It should be noted that step 302 in Fig. 3 can be implemented through steps 402-403 in Fig. 4.
Step 402: perform spatial feature extraction processing on the two-dimensional image features to obtain two-dimensional image spatial features.
After the two-dimensional image features corresponding to each viewing angle are obtained, spatial feature extraction is first performed on them; the spatial feature extraction process is the process of feature learning for each corresponding two-dimensional image slice. During spatial feature extraction based on the three viewing angles, the network parameters are shared, that is, the network parameters are the same. The process may include steps 402a-402c (not shown in the figure):
Step 402a: perform window division processing on the two-dimensional image features to obtain local two-dimensional image features corresponding to N windows respectively, where the N windows do not overlap one another and N is a positive integer greater than 1.
In this process, a window-based multi-head self-attention (W-MSA) network structure is mainly used to model long-range and local spatial semantic information in the two-dimensional image slices. When the W-MSA structure is used to process the two-dimensional image features, window division is first performed on the two-dimensional image features Z, dividing them into local two-dimensional image features $Z_i$ corresponding to N non-overlapping windows, as shown in formula (1):

$Z = \{Z_1, Z_2, \dots, Z_N\},\quad N = HW/M^2$   (1)

where M is the window size set by W-MSA and HW is the size of the two-dimensional image features, that is, the size of the two-dimensional image sliced out under the (H, W) viewing angle.
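A sketch of the window division of formula (1), assuming H and W are divisible by M; the helper name window_partition is illustrative:

```python
import torch

def window_partition(z: torch.Tensor, m: int) -> torch.Tensor:
    """Cut a (B, C, H, W) slice feature into N = HW / M^2 non-overlapping
    M x M windows, returned as (B, N, M*M, C) token groups."""
    b, c, h, w = z.shape
    z = z.reshape(b, c, h // m, m, w // m, m)
    z = z.permute(0, 2, 4, 3, 5, 1)                      # (B, H/M, W/M, M, M, C)
    return z.reshape(b, (h // m) * (w // m), m * m, c)

windows = window_partition(torch.randn(2, 96, 28, 28), m=7)
print(windows.shape)  # torch.Size([2, 16, 49, 96]): N = 16 windows of 49 tokens
```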
Then, attention computation is performed on a window basis to obtain the output result, that is, the local two-dimensional image spatial features.
It should be noted that the attention processing is implemented through an attention mechanism. In cognitive science, an attention mechanism is used to selectively focus on part of all the information while ignoring the rest. An attention mechanism enables a neural network to focus on part of its input, that is, to select specific inputs. With limited computing power, the attention mechanism is a resource allocation scheme that serves as the main means of solving the information overload problem, allocating computing resources to more important tasks. The embodiments of the present application are not limited to a particular form of attention mechanism; for example, it may be multi-head attention, key-value pair attention, structured attention, or the like.
Step 402b: perform feature extraction processing on the N local two-dimensional image features to obtain two-dimensional image window features.
After the local two-dimensional image features $Z_i$ corresponding to the N non-overlapping windows are obtained, feature extraction is performed on each local two-dimensional image feature to obtain N two-dimensional image window features. The feature extraction processing includes the following steps:
Step 1: perform self-attention processing on the N local two-dimensional image features to obtain the self-attention features of the N local two-dimensional image features.
It should be noted that self-attention processing is first performed on each local two-dimensional image feature separately, where the self-attention processing is multi-head self-attention processing, and each local two-dimensional image feature corresponds to multiple self-attention heads.
For example, self-attention processing is performed based on the query item Q, key item K, and value item V corresponding to the local two-dimensional image features to obtain the self-attention features of the N local two-dimensional image features.
The query item (Q), key item (K), and value item (V) corresponding to the k-th self-attention head are, respectively,

$Q_k = Z_i W_k^Q,\quad K_k = Z_i W_k^K,\quad V_k = Z_i W_k^V$

where k is a positive integer greater than 1; the k-th self-attention feature of the local two-dimensional image features $Z_i$ corresponding to the i-th window is then computed as shown in formula (2):

$H_i^k = \text{SoftMax}\big(Q_k K_k^{\mathsf T}/\sqrt{d} + \text{RPE}\big)\,V_k$   (2)

where RPE is the relative position encoding information, that is, the window position encoding, representing the spatial position information perceivable by the window, and d is the dimension of each attention head.
The self-attention feature corresponding to the k-th self-attention head then contains the features corresponding to the N windows, as shown in formula (3):

$H^k = \{H_1^k, H_2^k, \dots, H_N^k\}$   (3)
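A sketch of the per-window self-attention of formulas (2) and (3) for a single head, with a learnable bias standing in for RPE; projection sizes and names are assumptions:

```python
import torch
import torch.nn.functional as F

def window_self_attention(zi, wq, wk, wv, rpe):
    """zi: (N, M*M, C) window tokens; wq/wk/wv: (C, d) head projections;
    rpe: (M*M, M*M) relative position bias shared across all windows."""
    q, k, v = zi @ wq, zi @ wk, zi @ wv
    attn = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5) + rpe
    return F.softmax(attn, dim=-1) @ v   # (N, M*M, d): one H_i^k per window

zi = torch.randn(16, 49, 96)
wq, wk, wv = (torch.randn(96, 32) for _ in range(3))
out = window_self_attention(zi, wq, wk, wv, torch.zeros(49, 49))
print(out.shape)  # torch.Size([16, 49, 32])
```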
Step 2: perform feature fusion processing on the self-attention features of the N local two-dimensional image features to obtain the first image window internal features.
After the self-attention features corresponding to the self-attention heads of each window are obtained, the self-attention features corresponding to all self-attention heads are concatenated and linearly mapped through a parameter matrix to achieve the feature fusion processing, yielding the corresponding first image window internal features, as shown in formula (4):

$\text{W-MSA}(Z) = \text{Concat}[H^1, H^2, \dots, H^k]\,W_H$   (4)

where $W_H$ is the parameter matrix and Concat denotes the concatenation operation.
In some embodiments, before the self-attention processing based on the W-MSA structure, the l-th local two-dimensional image feature $Z_v^l$ from viewing angle v first needs to be normalized; for example, the normalization may be performed by batch normalization (BN). Here, viewing angle v is one of the viewing angles (H, W), (H, D), and (W, D). After the normalization, the normalized local two-dimensional image features $\text{BN}(Z_v^l)$ are input into the W-MSA structure for self-attention processing.
Schematically, as shown in Fig. 6, $Z_v^l$ is first processed with BN and then input into W-MSA for self-attention processing; W-MSA contains a residual structure, that is, the W-MSA output result is fused with the original input features $Z_v^l$ to achieve the feature fusion processing, yielding the first image window internal features $\bar{Z}_v^l$ (that is, the features after the fusion processing), as shown in formula (5):

$\bar{Z}_v^l = \text{W-MSA}\big(\text{BN}(Z_v^l)\big) + Z_v^l$   (5)
Step 3: perform convolution processing on the first image window internal features to obtain the first image window interaction features.
The W-MSA structure performs feature learning on each of the divided local two-dimensional image features; to further strengthen the learning of the two-dimensional image features, a depthwise separable convolution block (DWConv2D) with a kernel size of 5 is used for convolution processing, increasing the learning of locality between spatially adjacent windows. For example, the first image window internal features are input into the DWConv2D network for convolution processing to obtain the first image window interaction features.
In some embodiments, DWConv2D may likewise contain a residual structure; that is, the convolved first image window internal features are fused with the first image window internal features to obtain the first image window interaction features $\hat{Z}_v^l$, as shown in formula (6):

$\hat{Z}_v^l = \text{DWConv2D}(\bar{Z}_v^l) + \bar{Z}_v^l$   (6)
Schematically, as shown in Fig. 6, the first image window internal features $\bar{Z}_v^l$ are input into DWConv2D for convolution processing, and the convolved features are fused with $\bar{Z}_v^l$ to obtain the first image window interaction features $\hat{Z}_v^l$.
Step 4: perform feature extraction processing on the first image window interaction features through a multilayer perceptron (MLP) to obtain the two-dimensional image window features.
To further strengthen the learning of the two-dimensional image slices under the corresponding viewing angle, the convolved first image window interaction features are normalized with BN, and a multilayer perceptron (MLP) is used to learn the channel features, that is, the features of the two-dimensional image slices under the corresponding viewing angle, yielding the two-dimensional image window features $\tilde{Z}_v^l$, as shown in formula (7):

$\tilde{Z}_v^l = \text{MLP}\big(\text{BN}(\hat{Z}_v^l)\big) + \hat{Z}_v^l$   (7)

where MLP denotes the multilayer perceptron structure.
Step 402c: perform window rearrangement processing on the N windows, and perform feature extraction processing on the two-dimensional image window features corresponding to the N rearranged windows to obtain the two-dimensional image spatial features, where the window rearrangement is used to change the spatial positions of the N windows.
After window self-attention learning with the W-MSA structure, the image feature information across windows still needs to be learned. Therefore, in one possible implementation, window rearrangement is performed on the N windows, and the two-dimensional image window features after window rearrangement are learned again.
For example, a shuffle operation can be used for the window rearrangement, scrambling the spatial information and enhancing the interaction of information across windows. After the window rearrangement, the two-dimensional image window features corresponding to the N windows are learned to obtain the final two-dimensional image spatial features. This may include the following steps:
Step 1: perform self-attention processing on the two-dimensional image window features corresponding to the N rearranged windows to obtain the self-attention features corresponding to the N windows.
Self-attention processing is first performed on the two-dimensional image window features corresponding to each of the N rearranged windows to obtain the self-attention features; for the method, refer to the foregoing steps, which are not repeated here.
Step 2: perform feature fusion processing on the N self-attention features to obtain the second image window internal features.
For the feature fusion process yielding the second image window internal features, refer to the fusion process yielding the first image window internal features, which is not repeated here.
Step 3: perform position flipping processing on the second image window internal features, and perform convolution processing on the position-flipped second image window internal features to obtain the second image window interaction features.
For example, the window positions are scrambled again so that the W-MSA structure performs another round of window self-attention learning, enhancing the learning of information across windows; the second image window internal features are then position-flipped, that is, the position information corresponding to each window is restored to the original position, and the second image window interaction features are obtained.
Schematically, as shown in Fig. 6, the two-dimensional image window features are first BN-normalized, the window rearrangement operation (transpose) is performed, feature learning (including self-attention processing and feature fusion processing) is performed based on the W-MSA structure on the two-dimensional image window features corresponding to the N rearranged windows, and the N windows are then position-flipped again to restore the position information corresponding to each window, as shown in formula (8):

$\check{Z}_v^l = R\big(\text{W-MSA}\big(T(\text{BN}(\tilde{Z}_v^l))\big)\big) + \tilde{Z}_v^l$   (8)

where $\check{Z}_v^l$ denotes the position-flipped features, that is, the position-flipped second image window internal features, T denotes the window rearrangement operation, R denotes the position flipping operation, and $\tilde{Z}_v^l$ denotes the two-dimensional image window features.
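A sketch of the window rearrangement T and the position flip R of formula (8), realized here as a transpose of the window grid; the square-grid assumption is ours. Transposing is its own inverse, so applying it again restores the original window order:

```python
import torch

def transpose_windows(z: torch.Tensor, grid: int) -> torch.Tensor:
    """z: (B, N, M*M, C) window-partitioned features, N = grid * grid.
    Swaps rows and columns of the window grid (operation T / R)."""
    b, n, t, c = z.shape
    z = z.reshape(b, grid, grid, t, c).transpose(1, 2)
    return z.reshape(b, n, t, c)

z = torch.randn(2, 16, 49, 96)
assert torch.equal(transpose_windows(transpose_windows(z, grid=4), grid=4), z)
```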
After the position flipping, DWConv2D is used again for convolution processing to obtain the second image window interaction features; for this process, refer to the convolution processing yielding the first image window interaction features in the foregoing steps, which is not repeated here.
Schematically, as shown in Fig. 6, $\check{Z}_v^l$ is input into the DWConv2D structure for convolution processing to obtain the second image window interaction features $\dot{Z}_v^l$, as shown in formula (9):

$\dot{Z}_v^l = \text{DWConv2D}(\check{Z}_v^l) + \check{Z}_v^l$   (9)
Step 4: perform feature extraction processing on the second image window interaction features through the MLP to obtain the two-dimensional image spatial features.
For example, after the convolution processing, the MLP is used again for channel learning to obtain the final two-dimensional image spatial features.
Schematically, as shown in Fig. 6, the second image window interaction features $\dot{Z}_v^l$ are first normalized, and the normalized $\text{BN}(\dot{Z}_v^l)$ is input into the MLP for feature extraction, yielding the final two-dimensional image spatial features $S_v^l$, as shown in formula (10):

$S_v^l = \text{MLP}\big(\text{BN}(\dot{Z}_v^l)\big) + \dot{Z}_v^l$   (10)
Performing spatial feature extraction on the two-dimensional image features to obtain the two-dimensional image spatial features constitutes one Full-View Slice Spatial Shuffle Block (FVSSSB) pass, whose overall process is shown in Fig. 6; it fully learns the two-dimensional image features so that accurate two-dimensional image spatial features are extracted, facilitating subsequent accurate image recognition.
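A condensed sketch of one FVSSSB pass chaining formulas (5)-(10); wmsa1 and wmsa2 are assumed window-attention modules operating on (B, C, H, W) maps, with wmsa2 assumed to wrap its attention in the T/R rearrangement of formula (8). All module names are illustrative:

```python
import torch.nn as nn

class FVSSSB(nn.Module):
    def __init__(self, dim, wmsa1, wmsa2):
        super().__init__()
        self.bn = nn.ModuleList(nn.BatchNorm2d(dim) for _ in range(4))
        self.wmsa1, self.wmsa2 = wmsa1, wmsa2
        self.dw1 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)  # DWConv2D, kernel 5
        self.dw2 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.mlp1 = nn.Sequential(nn.Conv2d(dim, 4 * dim, 1), nn.GELU(), nn.Conv2d(4 * dim, dim, 1))
        self.mlp2 = nn.Sequential(nn.Conv2d(dim, 4 * dim, 1), nn.GELU(), nn.Conv2d(4 * dim, dim, 1))

    def forward(self, z):
        z = self.wmsa1(self.bn[0](z)) + z    # formula (5)
        z = self.dw1(z) + z                  # formula (6)
        z = self.mlp1(self.bn[1](z)) + z     # formula (7)
        z = self.wmsa2(self.bn[2](z)) + z    # formula (8): T, attend, R
        z = self.dw2(z) + z                  # formula (9)
        return self.mlp2(self.bn[3](z)) + z  # formula (10)
```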
Step 403: based on a main viewing angle and an auxiliary viewing angle, perform semantic feature extraction processing on the two-dimensional image spatial features to obtain the image semantic features, where the main viewing angle is the viewing angle corresponding to the two-dimensional image features and the auxiliary viewing angle is a viewing angle in the three-dimensional viewing angles different from the main viewing angle.
Because the two-dimensional image spatial features represent only the features of the corresponding two-dimensional viewing angle (that is, the main viewing angle), after spatial feature extraction is performed on each two-dimensional image feature to obtain the two-dimensional image spatial features, the remaining semantic information of the remaining third viewing angle (that is, the auxiliary viewing angle) is captured for complementary learning. The process of extracting semantic features from the two-dimensional image spatial features to obtain the image semantic features is a Slice-Aware Volume Context Mixing (SAVCM) process, where the network parameters of the SAVCM network are shared across viewing angles, that is, the network parameters are the same. The process may include the following steps:
Step 403a: perform feature fusion processing on the two-dimensional image spatial features and position encoding features to obtain the first image semantic features, where the position encoding features are used to indicate the position information corresponding to the two-dimensional image features.
In one possible implementation, an absolute position encoding (APE) feature (that is, the position encoding feature) is first added to each two-dimensional image spatial feature $S_v^l$ to represent the spatial position information of the corresponding two-dimensional image feature, that is, the spatial position information of the corresponding two-dimensional image slice. The position encoding feature is an automatically learnable parameter; it injects the position information of the two-dimensional image slice into the two-dimensional image spatial feature $S_v^l$, achieving slice-position-aware learning.
Schematically, as shown in Fig. 7, the two-dimensional image spatial features are fused with the position encoding features to obtain the first image semantic features $F_v^l$, as shown in formula (11):

$F_v^l = S_v^l + \text{APE}_S$   (11)

where $\text{APE}_S$ denotes the spatial position encoding corresponding to $S_v^l$.
Step 403b: under the main viewing angle, perform semantic feature extraction on the first image semantic features through the MLP to obtain the main image semantic features.
In one possible implementation, semantic feature extraction is performed under the main viewing angle and the auxiliary viewing angle respectively. The main viewing angle is the viewing angle corresponding to the two-dimensional image features, and the auxiliary viewing angle is a viewing angle in the three-dimensional viewing angles different from the main viewing angle. For example, if $S_v^l$ is the two-dimensional image spatial feature extracted from the two-dimensional image features under the (H, W) viewing angle, the main viewing angle is (H, W) and the auxiliary viewing angle is the remaining D viewing angle.
For example, a residual axial multilayer perceptron (axial-MLP) is used to extract semantic features from the first image semantic features under the main viewing angle, yielding the main image semantic features. As shown in Fig. 7, the first image semantic features $F_v^l$ of shape (B, SP, C, TH) are first position-rearranged to (B, SP, TH, C); the MLP then extracts along the channel direction C, first expanding the dimension to 4C and restoring it to the original channel number C after extraction; the extracted main image semantic features are then position-restored to (B, SP, C, TH), where SP denotes the spatial dimensions under the main viewing angle.
Step 403c: under the auxiliary viewing angle, perform semantic feature extraction on the first image semantic features through the MLP to obtain the auxiliary image semantic features.
While semantic feature extraction is performed based on the main viewing angle, the MLP is used to extract semantic features from the first image semantic features based on the auxiliary viewing angle, yielding the auxiliary image semantic features. As shown in Fig. 7, semantic features are extracted from the first image semantic features along the auxiliary viewing angle; that is, the dimension is first expanded to 4TH and, after extraction, restored to the original dimension TH, where TH denotes the spatial dimension under the auxiliary viewing angle.
Step 403d: perform feature fusion processing on the main image semantic features and the auxiliary image semantic features to obtain the image semantic features.
For example, after the main image semantic features and the auxiliary image semantic features are obtained, the two are fused to obtain the image semantic features. In one possible implementation, as shown in Fig. 7, the main image semantic features, the auxiliary image semantic features, and the original features $F_v^l$ are concatenated on the channel dimension to obtain merged features, and the MLP is then used to map the merged features back to the original channel number, yielding the image semantic features $V_v^l$. By fusing the image feature information under the third viewing angle, this process perceives the context information of the two-dimensional image slices and improves the accuracy of feature learning, as shown in formula (12):

$V_v^l = \text{MLP}_{cp}\big(\text{Concat}\big[\text{Axial-MLP}_{SP}(F_v^l),\ \text{Axial-MLP}_{TH}(F_v^l),\ F_v^l\big]\big)$   (12)

where Axial-MLP denotes the axial multilayer perceptron operation, Concat denotes the concatenation operation, and $\text{MLP}_{cp}$ denotes the feature fusion operation.
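A sketch of the SAVCM mixing of formula (12), with tensors in the (B, SP, C, TH) layout described above; the 4x expansion follows the text, and all names are illustrative:

```python
import torch
import torch.nn as nn

class SAVCM(nn.Module):
    def __init__(self, c: int, th: int):
        super().__init__()
        self.axial_c = nn.Sequential(nn.Linear(c, 4 * c), nn.GELU(), nn.Linear(4 * c, c))
        self.axial_th = nn.Sequential(nn.Linear(th, 4 * th), nn.GELU(), nn.Linear(4 * th, th))
        self.mlp_cp = nn.Linear(3 * c, c)      # maps merged channels back to C

    def forward(self, f):                      # f: (B, SP, C, TH)
        main = self.axial_c(f.transpose(2, 3)).transpose(2, 3)  # mix along C (main view)
        aux = self.axial_th(f)                                  # mix along TH (third view)
        merged = torch.cat([main, aux, f], dim=2)               # concat on channels: 3C
        return self.mlp_cp(merged.transpose(2, 3)).transpose(2, 3)
```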
It should be noted that step 303 in Fig. 3 can be implemented through steps 404-405 in Fig. 4.
Step 404: perform fusion processing on the image semantic features and the viewing-angle features to obtain the viewing-angle image semantic features.
During the feature fusion, an APE is first added to the image semantic features $V_v^l$ of each viewing angle, so as to fuse the image semantic features with the viewing-angle features and obtain the viewing-angle image semantic features $G_v^l$. Because the aggregation of the rich semantics learned from all viewing angles is processed on the channels, the APE is added on the channels of the full-view features, making the aggregation viewing-angle-aware, as shown in formula (13). Here, the APE is the encoding corresponding to the channels and indicates the corresponding viewing angle, that is, the viewing-angle feature, for example the (H, W) viewing angle.

$G_v^l = V_v^l + \text{APE}_C$   (13)
Step 405: perform feature fusion processing on each of the viewing-angle image semantic features to obtain the i-th round of three-dimensional medical image features.
The full-view features $G_v^l$ of the three channels (that is, the viewing-angle image semantic features) are then concatenated to obtain merged features whose channel number is tripled; the merged features are normalized with LN; finally, the MLP view aggregator $\text{MLP}_{va}$ maps the normalized features back to the original channel number, yielding the medical volume feature output $Z^{l+1}$ of the current MLP-Transformer block, that is, the i-th round of three-dimensional medical image features. That is:

$Z^{l+1} = \text{MLP}_{va}\big(\text{LN}\big(\text{Concat}[G^l_{(H,W)}, G^l_{(H,D)}, G^l_{(W,D)}]\big)\big)$

where Concat denotes the concatenation operation, LN denotes the normalization operation, and $\text{MLP}_{va}$ denotes the mapping operation.
As shown in Fig. 8, each image semantic feature is first fused with the APE encoding, and the three viewing angles are then concatenated to obtain the final three-dimensional medical image features.
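A sketch of the viewing-angle-aware aggregation of formula (13) and the concatenate-normalize-map step above; the learnable channel encodings stand in for APE, and all names are illustrative:

```python
import torch
import torch.nn as nn

class ViewAggregator(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        # One channel-wise viewing-angle encoding per view (APE_C stand-in).
        self.ape = nn.ParameterList([nn.Parameter(torch.zeros(c)) for _ in range(3)])
        self.norm = nn.LayerNorm(3 * c)    # LN over the merged 3C channels
        self.mlp_va = nn.Linear(3 * c, c)  # view aggregator back to C channels

    def forward(self, views):              # three (B, Tokens, C) view semantics
        views = [v + p for v, p in zip(views, self.ape)]  # formula (13)
        z = torch.cat(views, dim=-1)       # (B, Tokens, 3C)
        return self.mlp_va(self.norm(z))   # Z^{l+1}
```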
It should be noted that step 304 in Fig. 3 can be implemented through steps 406-407 in Fig. 4. The feature extraction process includes a feature encoding process or a feature decoding process, where the feature encoding process includes down-sampling the three-dimensional medical image features, that is, reducing their dimensions, and the feature decoding process includes up-sampling the three-dimensional medical image features, that is, increasing their dimensions.
Step 406: when the up-sampling result reaches the original size, determine the extracted three-dimensional medical image features as the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction.
In one possible implementation, when the up-sampling result reaches the original size of the three-dimensional medical image, it is determined as the I-th round of the feature extraction process. Schematically, as shown in Fig. 5, when C_O'×H×W×D is reached, the corresponding result is determined as the I-th round of three-dimensional medical image features, which are used for target prediction 503 (that is, image recognition) to obtain the image recognition result. To further improve the accuracy of image recognition, the target prediction result is fused with the features corresponding to the initially input three-dimensional medical image, and image recognition is performed based on the fused features. As shown in Fig. 5, the input three-dimensional medical image C_i×H×W×D is first convolved to obtain the initial three-dimensional medical image features C_I'×H×W×D of the initial image, and C_O'×H×W×D is fused with C_I'×H×W×D and convolved to obtain the final output result.
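A sketch of the output head described above; whether the fusion is a concatenation (as shown) or another operator is an assumption, and channel sizes are illustrative:

```python
import torch
import torch.nn as nn

class OutputHead(nn.Module):
    def __init__(self, c_in: int, c_feat: int, n_classes: int):
        super().__init__()
        self.stem = nn.Conv3d(c_in, c_feat, 3, padding=1)  # features of the raw input
        self.head = nn.Conv3d(2 * c_feat, n_classes, 1)    # final convolution

    def forward(self, image, decoded):     # decoded: up-sampled to the original size
        fused = torch.cat([self.stem(image), decoded], dim=1)
        return self.head(fused)
```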
Step 407: perform image recognition processing based on the I-th round of three-dimensional medical image features to obtain the image recognition result.
Image recognition is finally performed based on the I-th round of three-dimensional medical image features, after which image registration, classification, and the like can subsequently be performed on the three-dimensional medical image.
In one possible implementation, the TR-MLP network structure may be as shown in Fig. 9. First, viewing-angle rearrangement is performed on the (H, W, D) dimensions of the three-dimensional medical image features $Z^l$ input to the current block, rearranging them into two-dimensional image slices of the three viewing angles (H, W), (H, D), and (W, D), each corresponding to one two-dimensional slice direction in 3D. The FVSSSB is used to fully learn the 2D slice information of the rearranged full-view 2D image slices, yielding the two-dimensional image features; the slice-aware volume context mixing (SAVCM) then captures the remaining image semantic information along the third viewing angle; finally, the viewing-angle-aware aggregator aggregates the rich semantics learned from all viewing angles, yielding the three-dimensional medical image feature output $Z^{l+1}$ of the Transformer-MLP block, which serves as the input feature of the next Transformer-MLP block. The three viewing angles are computed in parallel, and the full-view slice spatial shuffle block network and the slice-aware volume context mixing block network share parameters across all viewing angles; that is, feature extraction networks with the same network parameters are used respectively to perform semantic feature extraction on the two-dimensional image features under each viewing angle, yielding the image semantic features under different viewing angles.
In the embodiments of the present application, by first learning the full-view 2D spatial information, then learning the remaining image semantics along the third viewing angle, and then fusing the full-view semantics, context awareness for the three-dimensional medical image features is achieved and the inductive bias capability is greatly enhanced, improving the accuracy of three-dimensional medical image recognition; moreover, the computation-heavy three-dimensional convolutional neural networks (3D CNN) and pure vision Transformers are replaced with simplified local vision Transformer-MLP computing units, reducing computational complexity and improving recognition efficiency.
The feature extraction process includes a feature encoding process or a feature decoding process, and the extraction process includes self-attention processing, where the self-attention is computed based on Q, K, and V. In one possible implementation, to fuse multi-scale visual features, the features of the feature encoding process (implemented by the encoder) are fused with the features of the feature decoding process (implemented by the decoder) to obtain the Q, K, and V values in the feature decoding process.
In some embodiments, the K value in the t-th round of the feature decoding process is obtained by fusing the K value in the (t-1)-th round of feature decoding with the K value in the corresponding feature encoding process; the V value in the t-th round of the feature decoding process is obtained by fusing the V value in the (t-1)-th round of feature decoding with the V value in the corresponding feature encoding process; and the Q value in the t-th round of the decoding process is the Q value in the (t-1)-th round of feature decoding.
In one possible implementation, the input features of the t-th round of feature decoding have the same resolution as the output features of the corresponding encoding process; that is, skip-connection fusion is performed on image features of the same resolution. Schematically, as shown in Fig. 5, the resolution corresponding to the second round of feature decoding is 4C×H/16×W/16×D/16, and the feature encoding process for the corresponding skip-connection fusion is the last round of encoding, whose resolution is likewise 4C×H/16×W/16×D/16. During skip-connection fusion, the features input to the second round of feature decoding (that is, the up-sampled output features of the first round of feature decoding) are skip-connection fused with the output features of the last round of the feature encoding process.
The following description takes $E_v$ as the output features of the feature encoding process corresponding to the t-th round of feature decoding, and $D_v$ as the input features of the t-th round of the feature decoding process, where v refers to a viewing angle; that is, skip-connection fusion is performed under each viewing angle separately. First, $E_v$ and $D_v$ are convolved with a standard pointwise convolution (PWConv2D) with a kernel size of 1. In the feature decoding process, the Q value comes only from the previous round of feature decoding, and for the skip-connection fusion of the encoder and decoder, only the K and V values are fused. Therefore, as shown in Fig. 10, PWConv2D is used to split the original channels of the encoder features $E_v$ into two parts, yielding the encoder K value $K_E^v$ and V value $V_E^v$, as shown in formula (14):

$K_E^v,\ V_E^v = \text{Split}\big(\text{PWConv2D}(E_v)\big)$   (14)

As shown in Fig. 10, PWConv2D is used to split the original channels of the decoder features $D_v$ into three parts, yielding the decoder Q value $Q_D^v$, K value $K_D^v$, and V value $V_D^v$, as shown in formula (15):

$Q_D^v,\ K_D^v,\ V_D^v = \text{Split}\big(\text{PWConv2D}(D_v)\big)$   (15)
Then, $K_E^v$ from the encoder is fused with $K_D^v$ from the decoder, and $V_E^v$ from the encoder is fused with $V_D^v$ from the decoder, as shown in formula (16):

$K^v = \text{CrossMerge}(K_E^v, K_D^v),\quad V^v = \text{CrossMerge}(V_E^v, V_D^v)$   (16)
where $K^v$ is the K value corresponding to the t-th round of the feature decoding process, $V^v$ is the corresponding V value, and the corresponding Q value $Q^v$ is $Q_D^v$. The three are used for the W-MSA learning in the t-th round of the feature decoding process, as shown in formula (17):

$\text{W-MSA}(Q^v, K^v, V^v)$   (17)

where CrossMerge denotes the skip-connection fusion operation.
In the embodiments of the present application, a skip-connection fusion network is introduced to perform skip-connection fusion on the features corresponding to the encoder and the decoder, thereby fusing multi-scale information and enriching the semantic learning of image features.
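A sketch of the skip-connection Q/K/V fusion of formulas (14)-(17); CrossMerge is shown as concatenation followed by a 1x1 convolution, which is our assumption rather than the application's definition:

```python
import torch
import torch.nn as nn

class SkipFusion(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.enc_proj = nn.Conv2d(c, 2 * c, 1)  # PWConv2D: encoder -> K_E, V_E (formula 14)
        self.dec_proj = nn.Conv2d(c, 3 * c, 1)  # PWConv2D: decoder -> Q_D, K_D, V_D (formula 15)
        self.merge_k = nn.Conv2d(2 * c, c, 1)   # CrossMerge stand-in for K (formula 16)
        self.merge_v = nn.Conv2d(2 * c, c, 1)   # CrossMerge stand-in for V

    def forward(self, e, d):                    # e: encoder features, d: decoder features
        k_e, v_e = self.enc_proj(e).chunk(2, dim=1)
        q_d, k_d, v_d = self.dec_proj(d).chunk(3, dim=1)
        k = self.merge_k(torch.cat([k_e, k_d], dim=1))
        v = self.merge_v(torch.cat([v_e, v_d], dim=1))
        return q_d, k, v                        # consumed by W-MSA (formula 17)
```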
Fig. 11 is a structural block diagram of the three-dimensional medical image recognition apparatus provided in an embodiment of the present application. As shown in Fig. 11, the apparatus includes:
a viewing-angle rearrangement module 1101 configured to perform, during the i-th round of feature extraction, viewing-angle rearrangement processing on the (i-1)-th round of three-dimensional medical image features to obtain two-dimensional image features, where the (i-1)-th round of three-dimensional medical image features are features obtained by performing the (i-1)-th round of feature extraction on a three-dimensional medical image, and different two-dimensional image features are features of the (i-1)-th round of three-dimensional medical image features under different viewing angles; a feature extraction module 1102 configured to perform semantic feature extraction processing on each of the two-dimensional image features to obtain image semantic features under different viewing angles; a feature fusion module 1103 configured to perform feature fusion processing on the image semantic features under the different viewing angles to obtain the i-th round of three-dimensional medical image features; and an image recognition module 1104 configured to perform image recognition processing based on the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction to obtain an image recognition result of the three-dimensional medical image, where i is a successively increasing positive integer, 1<i≤I, and I is a positive integer.
In some embodiments, the feature extraction module 1102 includes:
a first extraction unit configured to perform spatial feature extraction processing on the two-dimensional image features to obtain two-dimensional image spatial features; and
a second extraction unit configured to perform semantic feature extraction processing on the two-dimensional image spatial features based on a main viewing angle and an auxiliary viewing angle to obtain the image semantic features, where the main viewing angle is the viewing angle corresponding to the two-dimensional image features and the auxiliary viewing angle is a viewing angle in the three-dimensional viewing angles different from the main viewing angle.
In some embodiments, the first extraction unit is further configured to: perform window division processing on the two-dimensional image features to obtain local two-dimensional image features corresponding to N windows respectively, where the N windows do not overlap one another and N is a positive integer greater than 1; perform feature extraction processing on the N local two-dimensional image features to obtain two-dimensional image window features; and perform window rearrangement processing on the N windows and perform feature extraction processing on the two-dimensional image window features corresponding to the N rearranged windows to obtain the two-dimensional image spatial features, where the window rearrangement is used to change the spatial positions of the N windows.
In some embodiments, the first extraction unit is further configured to:
perform self-attention processing on the N local two-dimensional image features to obtain self-attention features corresponding to the N local two-dimensional image features respectively; perform feature fusion processing on the N self-attention features to obtain first image window internal features; perform convolution processing on the first image window internal features to obtain first image window interaction features; and perform feature extraction processing on the first image window interaction features through a multilayer perceptron (MLP) to obtain the two-dimensional image window features.
In some embodiments, the first extraction unit is further configured to:
perform self-attention processing on the two-dimensional image window features corresponding to the N rearranged windows to obtain self-attention features corresponding to the N windows respectively; perform feature fusion processing on the N self-attention features to obtain second image window internal features; perform position flipping processing on the second image window internal features and perform convolution processing on the position-flipped second image window internal features to obtain second image window interaction features; and perform feature extraction processing on the second image window interaction features through the MLP to obtain the two-dimensional image spatial features.
In some embodiments, the first extraction unit is further configured to:
perform self-attention processing based on the query item Q, key item K, and value item V corresponding to the local two-dimensional image features to obtain the self-attention features of the N local two-dimensional image features.
In some embodiments, the feature extraction process includes a feature encoding process or a feature decoding process; the K value in the t-th round of the feature decoding process is obtained by fusing the K value in the (t-1)-th round of feature decoding with the K value in the corresponding feature encoding process; the V value in the t-th round of the feature decoding process is obtained by fusing the V value in the (t-1)-th round of feature decoding with the V value in the corresponding feature encoding process; and the Q value in the t-th round of the decoding process is the Q value in the (t-1)-th round of feature decoding.
In some embodiments, the second extraction unit is further configured to:
perform feature fusion processing on the two-dimensional image spatial features and position encoding features to obtain first image semantic features, where the position encoding features are used to indicate the position information corresponding to the two-dimensional image features; under the main viewing angle, perform semantic feature extraction processing on the first image semantic features through the MLP to obtain main image semantic features; under the auxiliary viewing angle, perform semantic feature extraction processing on the first image semantic features through the MLP to obtain auxiliary image semantic features; and perform feature fusion processing on the main image semantic features and the auxiliary image semantic features to obtain the image semantic features.
In some embodiments, the feature fusion module 1103 further includes:
a first fusion unit configured to perform fusion processing on the image semantic features and viewing-angle features to obtain viewing-angle image semantic features; and
a second fusion unit configured to perform feature fusion processing on each of the viewing-angle image semantic features to obtain the i-th round of three-dimensional medical image features.
In some embodiments, the feature extraction module 1102 is further configured to:
perform semantic feature extraction processing on the two-dimensional image features under each viewing angle using feature extraction networks corresponding to the same network parameters respectively, to obtain the image semantic features under different viewing angles.
In some embodiments, the feature extraction process includes a feature encoding process or a feature decoding process; the feature encoding process includes a down-sampling process for the three-dimensional medical image features, and the feature decoding process includes an up-sampling process for the three-dimensional medical image features.
The image recognition module 1104 further includes:
a determining unit configured to determine, when the up-sampling result reaches the original size, the extracted three-dimensional medical image features as the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction; and
a recognition unit configured to perform image recognition processing based on the I-th round of three-dimensional medical image features to obtain the image recognition result.
In some embodiments, the three-dimensional medical image is a CT image, an MRI image, or a PET image.
In summary, in the embodiments of the present application, in each feature extraction stage, viewing-angle rearrangement is first performed on the three-dimensional medical image features to divide them into two-dimensional image features under different viewing angles; feature extraction is performed on the two-dimensional image features respectively to obtain image semantic features under different viewing angles; and the image semantic features under the different viewing angles are fused to obtain the three-dimensional medical image features after feature extraction. In this process, because feature extraction is performed on two-dimensional image features under different viewing angles, compared with the related-art approach of directly extracting three-dimensional image features for image recognition, the embodiments of the present application perform feature extraction for different viewing angles through simplified local computing units, which reduces computational complexity and improves the recognition efficiency of three-dimensional medical images.
It should be noted that the apparatus provided in the foregoing embodiment is illustrated only by way of the division into the foregoing functional modules. In practical applications, the foregoing functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus provided in the foregoing embodiment and the method embodiments belong to the same concept; for details of its implementation process, refer to the method embodiments, which are not repeated here.
Refer to Fig. 12, which shows a schematic structural diagram of the computer device provided in an embodiment of the present application. Specifically, the computer device 1200 includes a central processing unit (CPU) 1201, a system memory 1204 including a random access memory 1202 and a read-only memory 1203, and a system bus 1205 connecting the system memory 1204 and the central processing unit 1201. The computer device 1200 further includes a basic input/output (I/O) system 1206 that facilitates information transfer between the components in the computer, and a mass storage device 1207 configured to store an operating system 1213, an application program 1214, and other program modules 1215.
The basic input/output system 1206 includes a display 1208 for displaying information and input devices 1209, such as a mouse and a keyboard, for users to input information. The display 1208 and the input devices 1209 are both connected to the central processing unit 1201 through an input/output controller 1210 connected to the system bus 1205. The basic input/output system 1206 may further include the input/output controller 1210 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 1210 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 1207 is connected to the central processing unit 1201 through a mass storage controller (not shown) connected to the system bus 1205. The mass storage device 1207 and its associated computer-readable media provide non-volatile storage for the computer device 1200. That is, the mass storage device 1207 may include a computer-readable medium (not shown) such as a hard disk or a drive.
Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include random access memory (RAM), read-only memory (ROM), flash memory or other solid-state storage technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will know that the computer storage media are not limited to the foregoing. The above system memory 1204 and mass storage device 1207 may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by one or more central processing units 1201; the one or more programs include instructions for implementing the foregoing methods, and the central processing unit 1201 executes the one or more programs to implement the methods provided by the foregoing method embodiments.
According to various embodiments of the present application, the computer device 1200 may also run on a remote computer connected to a network such as the Internet. That is, the computer device 1200 may be connected to the network 1212 through a network interface unit 1211 connected to the system bus 1205, or the network interface unit 1211 may be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs stored in the memory; the one or more programs include the steps to be executed by the computer device in the method provided in the embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the three-dimensional medical image recognition method described in any one of the foregoing embodiments.
An embodiment of the present application provides a computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the three-dimensional medical image recognition method provided in the foregoing aspects.
Those of ordinary skill in the art may understand that all or part of the steps in the various methods of the foregoing embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, which may be the computer-readable storage medium included in the memory in the foregoing embodiments, or a computer-readable storage medium that exists independently and is not assembled into the terminal. The computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the three-dimensional medical image recognition method described in any one of the foregoing method embodiments.
In some embodiments, the computer-readable storage medium may include a ROM, a RAM, a solid state drive (SSD), an optical disc, or the like. The RAM may include a resistive random access memory (ReRAM) and a dynamic random access memory (DRAM). The serial numbers of the foregoing embodiments of the present application are merely for description and do not represent the superiority or inferiority of the embodiments.
Those of ordinary skill in the art may understand that all or part of the steps for implementing the foregoing embodiments may be completed by hardware or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely some embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (16)

  1. A three-dimensional medical image recognition method, executed by a computer device, the method comprising:
    during the i-th round of feature extraction, performing viewing-angle rearrangement processing on the (i-1)-th round of three-dimensional medical image features to obtain two-dimensional image features, wherein the (i-1)-th round of three-dimensional medical image features are features obtained by performing the (i-1)-th round of feature extraction on a three-dimensional medical image, and different two-dimensional image features are features of the (i-1)-th round of three-dimensional medical image features under different viewing angles;
    performing semantic feature extraction processing on each of the two-dimensional image features to obtain image semantic features under different viewing angles;
    performing feature fusion processing on the image semantic features under the different viewing angles to obtain the i-th round of three-dimensional medical image features; and
    performing image recognition processing based on the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction to obtain an image recognition result of the three-dimensional medical image, wherein i is a successively increasing positive integer, 1<i≤I, and I is a positive integer.
  2. The method according to claim 1, wherein the performing semantic feature extraction on each of the two-dimensional image features to obtain image semantic features under different viewing angles comprises:
    performing spatial feature extraction processing on the two-dimensional image features to obtain two-dimensional image spatial features; and
    based on a main viewing angle and an auxiliary viewing angle, performing semantic feature extraction processing on the two-dimensional image spatial features to obtain the image semantic features, wherein the main viewing angle is the viewing angle corresponding to the two-dimensional image features, and the auxiliary viewing angle is a viewing angle in the three-dimensional viewing angles different from the main viewing angle.
  3. The method according to claim 2, wherein the performing spatial feature extraction processing on the two-dimensional image features to obtain two-dimensional image spatial features comprises:
    performing window division processing on the two-dimensional image features to obtain local two-dimensional image features corresponding to N windows respectively, wherein the N windows do not overlap one another, and N is a positive integer greater than 1;
    performing feature extraction processing on the N local two-dimensional image features to obtain two-dimensional image window features; and
    performing window rearrangement processing on the N windows, and performing feature extraction processing on the two-dimensional image window features corresponding to the N rearranged windows to obtain the two-dimensional image spatial features, wherein the window rearrangement is used to change the spatial positions of the N windows.
  4. The method according to claim 3, wherein the performing feature extraction processing on the two-dimensional image window features corresponding to the N rearranged windows to obtain the two-dimensional image spatial features comprises:
    performing self-attention processing on the two-dimensional image window features corresponding to the N rearranged windows to obtain self-attention features corresponding to the N windows respectively;
    performing feature fusion processing on the N self-attention features to obtain second image window internal features;
    performing position flipping processing on the second image window internal features, and performing convolution processing on the position-flipped second image window internal features to obtain second image window interaction features; and
    performing feature extraction processing on the second image window interaction features through a multilayer perceptron (MLP) to obtain the two-dimensional image spatial features.
  5. The method according to claim 3, wherein the performing feature extraction processing on the N local two-dimensional image features to obtain two-dimensional image window features comprises:
    performing self-attention processing on the N local two-dimensional image features to obtain self-attention features corresponding to the N local two-dimensional image features respectively;
    performing feature fusion processing on the N self-attention features to obtain first image window internal features;
    performing convolution processing on the first image window internal features to obtain first image window interaction features; and
    performing feature extraction processing on the first image window interaction features through a multilayer perceptron (MLP) to obtain the two-dimensional image window features.
  6. The method according to claim 5, wherein the performing self-attention processing on the N local two-dimensional image features to obtain the self-attention features corresponding to the N local two-dimensional image features respectively comprises:
    performing self-attention processing based on a query item Q, a key item K, and a value item V corresponding to the local two-dimensional image features to obtain the self-attention features of the N local two-dimensional image features.
  7. The method according to claim 6, wherein the feature extraction process comprises a feature encoding process or a feature decoding process; the K value in the t-th round of the feature decoding process is obtained by fusing the K value in the (t-1)-th round of feature decoding with the K value in the corresponding feature encoding process; the V value in the t-th round of the feature decoding process is obtained by fusing the V value in the (t-1)-th round of feature decoding with the V value in the corresponding feature encoding process; and the Q value in the t-th round of the decoding process is the Q value in the (t-1)-th round of feature decoding.
  8. The method according to claim 2, wherein the performing semantic feature extraction processing on the two-dimensional image spatial features based on the main viewing angle and the auxiliary viewing angle to obtain the image semantic features comprises:
    performing feature fusion processing on the two-dimensional image spatial features and position encoding features to obtain first image semantic features, wherein the position encoding features are used to indicate position information corresponding to the two-dimensional image features;
    under the main viewing angle, performing semantic feature extraction processing on the first image semantic features through an MLP to obtain main image semantic features;
    under the auxiliary viewing angle, performing semantic feature extraction processing on the first image semantic features through the MLP to obtain auxiliary image semantic features; and
    performing feature fusion processing on the main image semantic features and the auxiliary image semantic features to obtain the image semantic features.
  9. The method according to any one of claims 1 to 8, wherein the performing feature fusion processing on the image semantic features under the different viewing angles to obtain the i-th round of three-dimensional medical image features comprises:
    performing fusion processing on the image semantic features and viewing-angle features to obtain viewing-angle image semantic features; and
    performing feature fusion processing on each of the viewing-angle image semantic features to obtain the i-th round of three-dimensional medical image features.
  10. The method according to any one of claims 1 to 8, wherein the performing semantic feature extraction processing on each of the two-dimensional image features to obtain image semantic features under different viewing angles comprises:
    performing semantic feature extraction processing on the two-dimensional image features under each viewing angle using feature extraction networks corresponding to the same network parameters respectively, to obtain the image semantic features under the different viewing angles.
  11. The method according to any one of claims 1 to 8, wherein
    the feature extraction process comprises a feature encoding process or a feature decoding process, the feature encoding process comprises a down-sampling process for the three-dimensional medical image features, and the feature decoding process comprises an up-sampling process for the three-dimensional medical image features; and
    before the performing image recognition processing based on the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction to obtain the image recognition result of the three-dimensional medical image, the method further comprises:
    when the up-sampling result reaches the original size, determining the extracted three-dimensional medical image features as the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction.
  12. The method according to any one of claims 1 to 8, wherein the three-dimensional medical image is a computed tomography (CT) image, magnetic resonance imaging (MRI), or positron emission tomography (PET).
  13. A three-dimensional medical image recognition apparatus, the apparatus comprising:
    a viewing-angle rearrangement module configured to perform, during the i-th round of feature extraction, viewing-angle rearrangement processing on the (i-1)-th round of three-dimensional medical image features to obtain two-dimensional image features, wherein the (i-1)-th round of three-dimensional medical image features are features obtained by performing the (i-1)-th round of feature extraction on a three-dimensional medical image, and different two-dimensional image features are features of the (i-1)-th round of three-dimensional medical image features under different viewing angles;
    a feature extraction module configured to perform semantic feature extraction processing on each of the two-dimensional image features to obtain image semantic features under different viewing angles;
    a feature fusion module configured to perform feature fusion processing on the image semantic features under the different viewing angles to obtain the i-th round of three-dimensional medical image features; and
    an image recognition module configured to perform image recognition processing based on the I-th round of three-dimensional medical image features obtained in the I-th round of feature extraction to obtain an image recognition result of the three-dimensional medical image, wherein i is a successively increasing positive integer, 1<i≤I, and I is a positive integer.
  14. A computer device, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, wherein the at least one instruction, the at least one program, and the code set or instruction set are loaded and executed by the processor to implement the three-dimensional medical image recognition method according to any one of claims 1 to 12.
  15. A computer-readable storage medium, storing at least one instruction, at least one program, a code set, or an instruction set, wherein the at least one instruction, the at least one program, and the code set or instruction set are loaded and executed by a processor to implement the three-dimensional medical image recognition method according to any one of claims 1 to 12.
  16. A computer program product, comprising computer instructions stored in a computer-readable storage medium, wherein a processor of a computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to implement the three-dimensional medical image recognition method according to any one of claims 1 to 12.
PCT/CN2022/139576 WO2023160157A1 (zh) 2022-02-28 2022-12-16 Three-dimensional medical image recognition method, apparatus, device, storage medium and product

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/377,958 US20240046471A1 (en) 2022-02-28 2023-10-09 Three-dimensional medical image recognition method and apparatus, device, storage medium, and product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210191770.3 2022-02-28
CN202210191770.3A CN114581396A (zh) 2022-02-28 2022-02-28 Three-dimensional medical image recognition method, apparatus, device, storage medium and product

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/377,958 Continuation US20240046471A1 (en) 2022-02-28 2023-10-09 Three-dimensional medical image recognition method and apparatus, device, storage medium, and product

Publications (1)

Publication Number Publication Date
WO2023160157A1 true WO2023160157A1 (zh) 2023-08-31

Family

ID=81772468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/139576 WO2023160157A1 (zh) 2022-02-28 2022-12-16 Three-dimensional medical image recognition method, apparatus, device, storage medium and product

Country Status (3)

Country Link
US (1) US20240046471A1 (zh)
CN (1) CN114581396A (zh)
WO (1) WO2023160157A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114581396A (zh) 2022-02-28 2022-06-03 Tencent Technology (Shenzhen) Company Limited Three-dimensional medical image recognition method, apparatus, device, storage medium and product

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111145181A (zh) * 2019-12-25 2020-05-12 Huaqiao University Three-dimensional segmentation method for bone CT images based on multi-view separated convolutional neural networks
WO2021179702A1 (zh) * 2020-10-27 2021-09-16 Ping An Technology (Shenzhen) Co., Ltd. Bone fragment segmentation method and apparatus for three-dimensional images, computer device, and storage medium
CN112949654A (zh) * 2021-02-25 2021-06-11 Shanghai SenseTime Intelligent Technology Co., Ltd. Image detection method and related apparatus and device
CN113920314A (zh) * 2021-09-30 2022-01-11 Beijing Baidu Netcom Science and Technology Co., Ltd. Semantic segmentation and model training method, apparatus, device, and storage medium
CN114581396A (zh) * 2022-02-28 2022-06-03 Tencent Technology (Shenzhen) Company Limited Three-dimensional medical image recognition method, apparatus, device, storage medium and product

Also Published As

Publication number Publication date
CN114581396A (zh) 2022-06-03
US20240046471A1 (en) 2024-02-08

Similar Documents

Publication Publication Date Title
US11861829B2 (en) Deep learning based medical image detection method and related device
US20210365717A1 (en) Method and apparatus for segmenting a medical image, and storage medium
EP3968222B1 (en) Classification task model training method, apparatus and device and storage medium
US20220028031A1 (en) Image processing method and apparatus, device, and storage medium
US20210334942A1 (en) Image processing method and apparatus, device, and storage medium
US11983903B2 (en) Processing images using self-attention based neural networks
CN111369576A (zh) Training method of image segmentation model, image segmentation method, apparatus and device
US20220157041A1 (en) Image classification method and apparatus
US20220076052A1 (en) Similarity determining method and device, network training method and device, search method and device, and electronic device and storage medium
Kim et al. Binocular fusion net: deep learning visual comfort assessment for stereoscopic 3D
WO2021164280A1 (zh) Three-dimensional edge detection method and apparatus, storage medium, and computer device
WO2023207416A1 (zh) Image completion method, apparatus, device and storage medium
CN111932529A (zh) Image segmentation method, apparatus and system
Pan et al. Hierarchical support vector machine for facial micro-expression recognition
US20240046471A1 (en) Three-dimensional medical image recognition method and apparatus, device, storage medium, and product
KR101925603B1 (ko) 병리 영상의 판독을 보조하는 방법 및 이를 이용한 장치
CN116129141A (zh) Medical data processing method, apparatus, device, medium and computer program product
WO2024087858A1 (zh) Training method and apparatus for image processing model, electronic device, computer program product, and computer storage medium
WO2023207531A1 (zh) Image processing method and related device
Xiao et al. A feature extraction method for lung nodules based on a multichannel principal component analysis network (PCANet)
CN115965785A (zh) Image segmentation method, apparatus, device, program product and medium
CN111598904B (zh) Image segmentation method, apparatus, device and storage medium
CN111369564B (zh) Image processing method, and model training method and apparatus
Ren et al. Medical image super-resolution based on semantic perception transfer learning
CN113822846A (zh) Method, apparatus, device and medium for determining a region of interest in a medical image

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22928411

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022928411

Country of ref document: EP

Effective date: 20240312