CN111667459A - Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion - Google Patents

Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion

Info

Publication number
CN111667459A
CN111667459A (application number CN202010360486.5A; granted as CN111667459B)
Authority
CN
China
Prior art keywords
medical
variable
fusion
neural network
time sequence
Prior art date
Legal status
Granted
Application number
CN202010360486.5A
Other languages
Chinese (zh)
Other versions
CN111667459B (en)
Inventor
马杰超
张树
俞益洲
Current Assignee
Beijing Shenrui Bolian Technology Co Ltd
Shenzhen Deepwise Bolian Technology Co Ltd
Original Assignee
Beijing Shenrui Bolian Technology Co Ltd
Shenzhen Deepwise Bolian Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Shenrui Bolian Technology Co Ltd and Shenzhen Deepwise Bolian Technology Co Ltd
Priority to CN202010360486.5A
Publication of CN111667459A
Application granted
Publication of CN111667459B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 Image analysis
            • G06T 7/0002 Inspection of images, e.g. flaw detection
              • G06T 7/0012 Biomedical image inspection
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 Image acquisition modality
              • G06T 2207/10028 Range image; Depth image; 3D point clouds
              • G06T 2207/10072 Tomographic images
                • G06T 2207/10081 Computed x-ray tomography [CT]
                • G06T 2207/10088 Magnetic resonance imaging [MRI]
            • G06T 2207/20 Special algorithmic details
              • G06T 2207/20081 Training; Learning
              • G06T 2207/20084 Artificial neural networks [ANN]
            • G06T 2207/30 Subject of image; Context of image processing
              • G06T 2207/30004 Biomedical image processing
                • G06T 2207/30061 Lung
                • G06T 2207/30096 Tumor; Lesion
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
                • G06N 3/045 Combinations of networks
              • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
          • Y02T 10/00 Road transport of goods or passengers
            • Y02T 10/10 Internal combustion engine [ICE] based vehicles
              • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a medical sign detection method, system, terminal and storage medium based on 3D variable (deformable) convolution and time sequence feature fusion, wherein the method comprises the following steps: acquiring 3D input data or pseudo 3D input data of a medical image; constructing a variable convolution network model, and inputting standard medical sign detection data into the variable convolutional neural network for training to obtain a trained variable convolutional neural network model; inputting the 3D input data or the pseudo 3D input data of the medical image into the trained model; modeling the prediction data output by the model with time sequence fusion to form prediction data based on fused time sequence features of different scales; and carrying out multi-scale pyramid progressive fusion on the prediction data of fused time sequence features at different resolutions to obtain candidate frames of the medical sign. The method and the device address two problems in the prior art: the low computing efficiency of 3D networks, and the loss and under-utilization of multi-layer correlated information in pseudo 3D networks.

Description

Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion
Technical Field
The application relates to the technical field of medical imaging and computer-aided diagnosis, and in particular to a medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion.
Background
The detection of medical signs is an important problem in disease diagnosis. Traditionally, a definitive diagnosis of a suspected malignant area requires a biopsy, in which tissue is excised from the patient's lesion and examined. However, this procedure places high demands on the position and angle of the excised sections, and the trauma to the patient must also be taken into account.
The development of medical imaging technologies and the improvement of medical equipment provide an opportunity to address this problem. Computer vision and artificial intelligence have developed rapidly over the last two decades, and many computer-aided diagnosis systems are now used to assist doctors. Chest CT offers high spatial resolution, fast scanning, clear images, and the ability to reconstruct lesions in three dimensions, and is therefore widely favored by patients and doctors. For CT images, medical image processing has its own special emphases compared with image processing in the general sense. For natural images, general deep learning techniques solve the related problems on the basis of 2D images. For medical images, however, different slices of the same medical sign carry highly correlated and complementary information; if detection is performed on a single slice only, most of the information in the scan is not used effectively, which not only wastes information but can also bias the final diagnosis.
With the development of deep learning, a series of methods for accurate 3D object detection have become available. Existing medical sign detection generally falls into two common approaches. The first directly crops the 3D input into patches and feeds them into a 3D network for judgment, but this approach demands substantial computing resources. The second feeds a pseudo 3D input, formed by stacking consecutive slices, into a 2D network. This method fuses the input slices at an early stage of the network, but it models the spatial x and y axes of each 2D slice together with the time sequence z axis across slices. Because the in-plane (x, y) resolution of a reconstructed CT image differs from its z-axis resolution, the two kinds of dimension should be modeled separately; applying the same weight parameters to the spatial x and y axes and the time sequence z axis makes it difficult for the network to learn the effective differences between resolutions. In addition, although the pseudo 3D structure learns a weight for each slice before input, this weight information exists only at the input layer of the network, so the multi-layer correlated information of the pseudo 3D input (such as characteristic patterns specific to high-dimensional sequence data) may be lost or under-utilized in the later stages of the network.
Therefore, a medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion are needed to solve the low computational efficiency of 3D networks and the loss and under-utilization of multi-layer correlated information in pseudo 3D networks in existing 3D medical sign detection.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides a medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion, solving problems of the prior art such as the low computational efficiency of 3D networks and the loss and under-utilization of multi-layer correlated information in pseudo 3D networks.
In order to solve the above technical problem, in a first aspect, the present application provides a medical sign detection method based on 3D variable convolution and time sequence feature fusion, including:
acquiring 3D input data of a medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers;
constructing a variable convolution network model, inputting standard medical sign detection data into a variable convolution neural network for training to obtain a trained variable convolution neural network model;
inputting the 3D input data of the medical image or the pseudo 3D input data formed by splicing a plurality of continuous 2D layers into a trained variable convolution neural network model;
modeling prediction data output by the variable convolutional neural network model by using time sequence fusion to form prediction data based on fusion time sequence characteristics of different scales;
and carrying out multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence features with different resolutions to obtain candidate frames of the medical sign.
Optionally, the acquiring of the 3D input data of the medical image or the pseudo 3D input data formed by splicing a plurality of continuous 2D layers includes:
preprocessing the medical image to obtain 3D input data;
and slicing the medical scan into 2D slice images, and stacking a plurality of consecutive 2D slice images to obtain the pseudo 3D input data.
Optionally, the inputting the 3D input data of the medical image or the pseudo 3D input data formed by splicing a plurality of continuous 2D layers into the trained variable convolutional neural network model includes:
constructing a variable convolution network model;
inputting standard medical sign detection data into a variable convolutional neural network for training;
and calculating by a parallel standard convolutional neural network to obtain an offset parameter, and learning the variable convolutional neural network end to end by gradient back propagation to obtain a trained variable convolutional neural network model.
Optionally, the modeling the prediction data output by the variable convolutional neural network model by using time sequence fusion to form prediction data based on fusion time sequence features of different scales includes:
acquiring an image frame sequence with time-series prediction data;
extracting the features of the medical signs from the image frame sequence to obtain a medical sign feature map;
and fusing the medical sign feature images according to the time sequence to obtain medical sign prediction data based on fused time sequence features of different scales.
Optionally, the performing multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence features with different resolutions to obtain a candidate frame of the medical sign includes:
and upsampling the higher-level features, unifying channel numbers through 1 × 1 convolution kernels, and adding the features to obtain the candidate frames of the medical sign.
Optionally, the method further includes:
and setting a threshold corresponding to the medical sign candidate frame according to the category of the medical sign candidate frame, and outputting the candidate frame with the detection score exceeding the threshold to obtain a medical sign detection result.
Optionally, the method further includes:
comparing the size of the candidate frame of the medical sign and the output prediction data with the corresponding standard medical sign detection data, and calculating to obtain a difference value between the prediction data and the actual data of the medical sign;
and back-propagating the loss of the variable convolutional neural network model through the difference value, thereby optimizing the variable convolutional neural network model.
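The difference-value computation above can be sketched as follows. The application does not name a specific loss function, so smooth L1, a common choice for bounding-box regression, is assumed here purely for illustration:

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss over box coordinates (an assumed example; the patent
    only says a 'difference value' between prediction and ground truth)."""
    diff = np.abs(pred - target)
    # Quadratic near zero, linear for large errors (robust to outliers).
    return np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta).sum()

pred_box = np.array([10.0, 10.0, 24.0, 26.0])   # predicted (x1, y1, x2, y2)
gt_box = np.array([10.0, 10.0, 24.0, 24.0])     # standard (ground-truth) box
print(smooth_l1(pred_box, gt_box))  # 1.5: only the last coordinate differs by 2
```

In training, this scalar would be back-propagated through the variable convolutional neural network to update its weights and offset parameters.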
In a second aspect, the present application further provides a medical sign detection system based on 3D variable convolution and time sequence feature fusion, comprising:
the data acquisition unit is configured for acquiring 3D input data of the medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers;
the model construction unit is configured for constructing a variable convolution network model, and inputting standard medical sign detection data into a variable convolution neural network for training to obtain a trained variable convolution neural network model;
the data input unit is configured to input 3D input data of the medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers into a trained variable convolution neural network model;
the time sequence feature fusion unit is configured for modeling the prediction data output by the variable convolutional neural network model by using time sequence fusion to form prediction data based on fusion time sequence features of different scales;
and the pyramid progressive fusion unit is configured to perform multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence features with different resolutions to obtain candidate frames of the medical sign.
Optionally, the data obtaining unit is specifically configured to:
preprocessing the medical image to obtain 3D input data;
and slicing the medical scan into 2D slice images, and stacking a plurality of consecutive 2D slice images to obtain the pseudo 3D input data.
Optionally, the model building unit is specifically configured to:
constructing a variable convolution network model;
inputting standard medical sign detection data into a variable convolutional neural network for training;
and calculating by a parallel standard convolutional neural network to obtain an offset parameter, and learning the variable convolutional neural network end to end by gradient back propagation to obtain a trained variable convolutional neural network model.
Optionally, the time sequence feature fusion unit is specifically configured to:
acquiring an image frame sequence with time-series prediction data;
extracting the features of the medical signs from the image frame sequence to obtain a medical sign feature map;
and fusing the medical sign feature images according to the time sequence to obtain medical sign prediction data based on fused time sequence features of different scales.
Optionally, the pyramid progressive fusion unit is specifically configured to:
and upsampling the higher-level features, unifying channel numbers through 1 × 1 convolution kernels, and adding the features to obtain the candidate frames of the medical sign.
Optionally, the system further includes:
and the candidate frame extraction unit is configured to set a threshold corresponding to the medical sign candidate frame according to its category, and to output candidate frames whose detection scores exceed the threshold to obtain the medical sign detection result.
Optionally, the system further includes:
the model loss calculation unit is configured to compare the candidate frames of the medical sign and the output prediction data with the corresponding standard medical sign detection data, calculate the difference between the predicted and actual data of the medical sign, back-propagate the loss of the variable convolutional neural network model through the difference value, and optimize the model.
In a third aspect, the present application provides a terminal, comprising:
a processor, a memory, wherein,
the memory is used for storing a computer program which,
the processor is used for calling and running the computer program from the memory, so that the terminal performs the method described above.
In a fourth aspect, the present application provides a computer storage medium having instructions stored thereon, which when executed on a computer, cause the computer to perform the method of the above aspects.
Compared with the prior art, the method has the following beneficial effects:
1. By combining 3D variable convolution with time sequence feature fusion, the method and the device construct a model framework that handles both 3D input and pseudo 3D input, and the framework can be applied to different body parts, different data types and different task scenarios.
2. The 3D variable convolution can adaptively accommodate the different resolutions of the x, y and z axes in the spatial dimensions, and can also adaptively adjust the parameters the network learns according to the geometric variation of medical signs, thereby learning irregular shape information in the spatial dimensions, learning correlation information between different slices in the time sequence dimension, and adaptively learning the weight relationship between slices.
3. The method and the device introduce time sequence feature fusion and perform feature fusion in the later stages of the network, so that the model can learn high-dimensional feature representations across the image information; the information of different slices is modeled in the later stages according to the time sequence, which makes effective use of the temporal character of the data and further improves the detection performance of the model. This also matches the habit of radiologists, who read the slices in time sequence order, so the constructed model is more reasonable.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a medical sign detection method based on 3D variable convolution and time sequence feature fusion according to an embodiment of the present application;
fig. 2 is a flowchart of another medical sign detection method based on 3D variable convolution and time sequence feature fusion according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a medical sign detection system based on 3D variable convolution and time sequence feature fusion according to another embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal system according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a medical sign detection method based on 3D variable convolution and time sequence feature fusion according to an embodiment of the present application, where the method 100 includes:
S101: acquiring 3D input data of a medical image or pseudo 3D input data formed by stacking a plurality of consecutive 2D slices;
S102: constructing a variable convolution network model, inputting standard medical sign detection data into the variable convolutional neural network for training to obtain a trained variable convolutional neural network model;
S103: inputting the 3D input data of the medical image or the pseudo 3D input data formed by stacking a plurality of consecutive 2D slices into the trained variable convolutional neural network model;
S104: modeling the prediction data output by the variable convolutional neural network model with time sequence fusion to form prediction data based on fused time sequence features of different scales;
S105: carrying out multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence features with different resolutions to obtain candidate frames of the medical sign.
Based on the foregoing embodiment, as an optional embodiment, the S101 acquiring 3D input data of a medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D slices includes:
preprocessing the medical image to obtain 3D input data;
and slicing the medical scan into 2D slice images, and stacking a plurality of consecutive 2D slice images to obtain the pseudo 3D input data.
Specifically, the 3D input data of the medical image has dimensions B × D × H × W, where B denotes the batch size at model input, D denotes the number of slices along the time sequence dimension of the data, and H and W denote the height and width of the image, respectively. The pseudo 3D input data of the medical image has dimensions B × C × H × W, where B again denotes the batch size and C denotes the channels formed by the multiple stacked slices.
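The slice-stacking step above can be sketched as follows. This is a minimal NumPy illustration; the helper name `make_pseudo_3d`, the context width, and the edge padding by border-slice repetition are assumptions, not details given in the application:

```python
import numpy as np

def make_pseudo_3d(volume, num_context=1):
    """Stack each slice with its neighbors into channels.

    volume: (D, H, W) array of consecutive 2D slices.
    Returns (D, C, H, W) with C = 2 * num_context + 1; edge slices are
    padded by repeating the border slice (an assumed convention).
    """
    d = volume.shape[0]
    padded = np.concatenate(
        [volume[:1]] * num_context + [volume] + [volume[-1:]] * num_context, axis=0
    )
    # For slice i, the channels are slices i-k .. i+k of the original volume.
    return np.stack(
        [padded[i : i + 2 * num_context + 1] for i in range(d)], axis=0
    )

vol = np.arange(5 * 4 * 4, dtype=np.float32).reshape(5, 4, 4)  # D=5, H=W=4
pseudo = make_pseudo_3d(vol, num_context=1)
print(pseudo.shape)  # (5, 3, 4, 4): each slice carries its two neighbors as channels
```

A batch dimension B would simply be prepended, giving the B × C × H × W layout described above; the middle channel of each stack is the slice being predicted.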
It should be noted that the present application accommodates both the commonly used 3D input and the pseudo 3D input formed by stacking a plurality of consecutive 2D slices; for the two input structures, the difference is that the final outputs take the form of 3D and 2D images, respectively. For 3D input, the weights of all slices are the same and must be learned, while for pseudo 3D input, typically only the middle slice is predicted and the consecutive slices above and below serve only as auxiliary information for predicting that slice.
Based on the foregoing embodiment, as an optional embodiment, the S102 constructs a variable convolutional network model, and inputs the standard medical sign detection data to the variable convolutional neural network for training, so as to obtain a trained variable convolutional neural network model, including:
constructing a variable convolution network model;
inputting standard medical sign detection data into a variable convolutional neural network for training;
and calculating by a parallel standard convolutional neural network to obtain an offset parameter, and learning the variable convolutional neural network end to end by gradient back propagation to obtain a trained variable convolutional neural network model.
Specifically, a conventional 3D convolution kernel of size 3 × 3 × 3 arranges its sampling points in a regular grid, but many medical signs have irregular shapes. A 3D variable (deformable) convolution kernel can adapt itself to the structure of a medical sign and effectively reduce artifacts caused by respiration and other motion. A learnable offset parameter is added to each sampling point, so the kernel can learn irregular signs, and the size and position of the deformable kernel are adjusted dynamically according to the image content currently being recognized. The visual effect is that the sampling points of kernels at different positions shift adaptively with the image content, accommodating geometric variation in the shape and size of different objects.
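The offset-based sampling described above can be sketched as follows, in 2D for brevity. This is a minimal single-position NumPy illustration; the function names and the border handling in `bilinear` are assumptions, and in the actual model the offsets come from the parallel standard convolution branch and are learned end to end by back-propagation:

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly sample feat (H, W) at fractional coordinates (y, x)."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    y0, x0 = max(y0, 0), max(x0, 0)
    wy, wx = y - np.floor(y), x - np.floor(x)
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deformable_conv_at(feat, weights, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution at (cy, cx).

    offsets: (9, 2) learned (dy, dx) shifts, one per kernel tap, so the
    sampling grid deforms to follow irregular structures.
    """
    out, k = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            oy, ox = offsets[k]
            out += weights[dy + 1, dx + 1] * bilinear(feat, cy + dy + oy, cx + dx + ox)
            k += 1
    return out

feat = np.arange(25, dtype=np.float64).reshape(5, 5)
w = np.full((3, 3), 1.0 / 9.0)   # averaging kernel, for illustration only
zero_off = np.zeros((9, 2))
# With all offsets zero this reduces to an ordinary 3x3 convolution.
print(deformable_conv_at(feat, w, zero_off, 2, 2))  # 12.0 = mean of the 3x3 patch
```

Because the bilinear weights are differentiable in (y, x), gradients flow back into the offsets, which is what allows the parallel branch to be trained end to end as the description states.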
Based on the foregoing embodiment, as an optional embodiment, the S104 modeling the prediction data output by the variable convolutional neural network model by using time series fusion to form prediction data based on fusion time series features of different scales, including:
acquiring an image frame sequence with time-series prediction data;
extracting the features of the medical signs from the image frame sequence to obtain a medical sign feature map;
and fusing the medical sign feature images according to the time sequence to obtain medical sign prediction data based on fused time sequence features of different scales.
Specifically, the feature map at each stage consists of multiple 2D channel slices, and the channel axis is regarded as the time axis. The time sequence is then modeled with an LSTM: the medical sign feature map is fed into the LSTM time sequence model to obtain medical sign prediction data based on fused time sequence features of different scales.
For both 3D and pseudo 3D images, the input is by nature a stack of consecutive slices, and there is temporal correlation between the slices: just as the frames of a video are correlated and continuous, so are the slices of a CT scan. In the time sequence feature fusion stage, the input tensor is (B, T, H, W), so the fusion applies to both input modes. Each input slice is treated as a frame of a video. A corresponding output is produced at each time point, taking into account both the information from the previous time point and the input slice at the current one; the output at each time node is also recurrently connected to the hidden unit of the next node, influencing the output state at the next time point. To better model the temporal relationships, convolutional structures are used in both the input-to-state and state-to-state transitions, and the time sequence feature fusion structure is formed by stacking several such layers. An important concept in this structure is the forgetting mechanism: by modeling it, the network decides, for the information at the current time point, what to forget from the state and what to pass on to the next time point. Recall how a doctor reads a scan: if a slice shows no abnormality, the doctor quickly moves to the next one, but if a suspicious sign appears at some slice, the doctor analyzes it carefully. This mechanism is realized by a gating layer that reads the information of h(t-1) and x(t) and outputs a value between 0 and 1 for each entry of the state C(t-1), where 1 means "retain completely" and 0 means "discard completely".
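The gating layer described above can be sketched as follows. This is a minimal NumPy illustration of the forget gate alone, with toy random weights; the shapes and names are assumptions, and the full model uses convolutional rather than fully connected transitions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, w_f, b_f):
    """f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f), elementwise in (0, 1).

    Values near 1 keep the corresponding entry of the cell state C_{t-1};
    values near 0 discard it, mirroring a reader skipping normal slices.
    """
    concat = np.concatenate([h_prev, x_t])
    return sigmoid(w_f @ concat + b_f)

rng = np.random.default_rng(0)
h_prev, x_t = rng.normal(size=4), rng.normal(size=4)  # toy hidden state / slice feature
w_f, b_f = rng.normal(size=(4, 8)), np.zeros(4)
f_t = forget_gate(h_prev, x_t, w_f, b_f)
c_prev = rng.normal(size=4)
c_kept = f_t * c_prev  # gated cell state carried forward to the next slice
print(f_t.min() > 0 and f_t.max() < 1)  # True: gate values lie strictly in (0, 1)
```

The input and output gates (not shown) follow the same pattern, together deciding what new slice information enters the state and what is emitted at each time point.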
For each output of the variable convolutional neural network model, modeling with time sequence fusion yields fused features at different scales.
Based on the foregoing embodiment, as an optional embodiment, the S105 performing multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence features with different resolutions to obtain candidate frames of the medical sign includes:
and upsampling the higher-level features, unifying channel numbers through 1 × 1 convolution kernels, and adding the features to obtain the candidate frames of the medical sign.
Specifically, the feature pyramid (FPN) fusion can be divided into five stages (F1 to F5, each F layer at a different resolution). After the 3D feature layers in each stage undergo time sequence fusion, the five sets of candidate features at different resolutions are fused across scales to obtain the candidate frames of the medical sign, addressing the multi-scale problem in medical sign detection. The general multi-scale idea is to shrink or enlarge an input picture into images of different sizes, process each with the same model, and finally combine the resulting features into a feature set that reflects multi-scale information. In the pyramid progressive fusion here, the higher-level features are upsampled and the channel numbers of the two feature layers are unified through a 1 × 1 convolution kernel before the features are added, so that richer semantic information is obtained.
During detection of multiple signs in the lung, pulmonary nodules of extremely small size may appear alongside emphysema that spreads throughout the chest. With essentially no increase in the computation of the original model, this simple change of network connectivity can greatly improve the detection of medical signs of different sizes. Specifically, in a convolutional neural network, different depths correspond to different levels of semantic features: the shallow layers have high resolution and learn mostly detail features, while the deep layers have low resolution and learn mostly semantic features. As the network propagates forward in the variable convolution modeling stage, each computation downsamples the features, i.e. each feature map is 1/2 the size of the previous one.
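The progressive top-down fusion described above can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the patent's exact network: nearest-neighbour 2x upsampling and per-level 1 × 1 lateral convolutions are assumed, and the common channel count is arbitrary.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: maps (C_in, H, W) to (C_out, H, W) with w of shape (C_out, C_in)."""
    return np.tensordot(w, x, axes=1)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_topdown(features, lateral_w):
    """Top-down pyramid fusion: upsample the higher-level features, unify
    channel numbers with a 1x1 convolution on the lateral path, then add.

    features: list [F1..Fn], F1 the highest resolution; each level halves H and W.
    lateral_w: per-level 1x1 weights mapping that level's channels to a
    common channel count.
    """
    merged = conv1x1(features[-1], lateral_w[-1])       # start at the coarsest level
    outputs = [merged]
    for feat, w in zip(reversed(features[:-1]), reversed(lateral_w[:-1])):
        merged = upsample2x(merged) + conv1x1(feat, w)  # progressive fusion step
        outputs.append(merged)
    return outputs[::-1]                                # finest to coarsest
```

Because each backbone stage halves the resolution, upsampling by exactly 2x aligns each higher-level map with the lateral map below it before the element-wise addition.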
Based on the foregoing embodiment, as an optional embodiment, the method 100 further includes:
setting a threshold corresponding to the medical sign candidate frame according to the category of the candidate frame, and outputting candidate frames whose detection scores exceed the threshold to obtain the medical sign detection result.
Specifically, suspicious-region candidate frames are extracted from features containing both time series information and semantic information, and a group of candidate frames of different sizes and aspect ratios is generated for each point in the feature map. Each candidate frame has a score that measures how likely the region is to contain the object to be detected. When the score of a candidate frame is greater than the given threshold, it is kept as a medical sign detection result.
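A minimal sketch of this per-category thresholding step might look like the following; the category labels, the box layout, and the default threshold of 0.5 are hypothetical illustrations, not values from the patent.

```python
from dataclasses import dataclass

@dataclass
class CandidateBox:
    category: str   # sign category, e.g. "nodule", "emphysema" (hypothetical labels)
    score: float    # detection confidence in [0, 1]
    box: tuple      # (z, y, x, depth, height, width) in voxel coordinates

def filter_by_class_threshold(candidates, thresholds, default=0.5):
    """Keep candidate boxes whose score exceeds the threshold set for
    their category; categories without an entry use the default."""
    return [c for c in candidates
            if c.score > thresholds.get(c.category, default)]
```

This reflects the idea that different sign categories may warrant different operating points, e.g. a stricter threshold for small nodules than for diffuse emphysema.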
Based on the foregoing embodiment, as an optional embodiment, the method 100 further includes:
comparing the size of the medical sign candidate frame and the output prediction data with the corresponding standard medical sign detection data, and calculating the difference between the predicted data and the actual data of the medical sign;
back-propagating the loss of the variable convolutional neural network model through this difference, so as to optimize the variable convolutional neural network model.
Referring to fig. 2, fig. 2 is a flowchart of another medical sign detection method based on 3D variable convolution and time series feature fusion according to an embodiment of the present disclosure. The method mainly includes six stages: a 3D/pseudo-3D data input stage, a variable convolutional neural network modeling stage, a time series feature fusion stage, a pyramid progressive fusion stage, a candidate frame extraction stage, and a model loss calculation stage. The time series feature fusion module can be applied after the variable convolutional neural network modeling stage or after the pyramid progressive fusion stage (only one of these is shown in the figure).
It should also be noted that the present application is applicable to 3D or pseudo-3D related tasks such as CT images and multi-modality MRI images. In addition, the network model framework can be applied to various signs, body parts, data types, and tasks. It can be used to detect chest-based signs and diseases in the lung, such as pulmonary nodules and masses, ground-glass opacities, the air crescent sign, consolidation, streak shadows, emphysema, bullae, pleural effusion, bronchiectasis, and pleural thickening. It can also be used for instance segmentation and semantic segmentation of lesion areas, such as delineation of brain tumor regions and of lung lobes and segments, and for classification of lesions, such as judging the malignancy of pulmonary nodules or grading the degree of infiltration of squamous carcinoma and adenocarcinoma.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a medical image detection system based on 3D variable convolution and temporal feature fusion according to an embodiment of the present application, where the system 300 includes:
a data acquisition unit 301 configured to acquire 3D input data of a medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers;
the model building unit 302 is configured to build a variable convolutional network model, and input standard medical sign detection data into a variable convolutional neural network for training to obtain a trained variable convolutional neural network model;
a data input unit 303 configured to input 3D input data of the medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers to a trained variable convolutional neural network model;
the time series feature fusion unit 304 is configured to model the prediction data output by the variable convolutional neural network model using time series fusion, to form prediction data based on fused time series features of different scales;
the pyramid progressive fusion unit 305 is configured to perform multi-scale pyramid progressive fusion on the prediction data of the fused time series features at different resolutions to obtain the candidate frame of the medical sign.
Based on the foregoing embodiment, as an optional embodiment, the data obtaining unit 301 is specifically configured to:
preprocessing the medical image to obtain 3D input data;
cutting the scanned medical image into 2D slice images, and splicing a plurality of continuous 2D slice images to obtain pseudo-3D input data.
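The splicing of consecutive 2D slices into pseudo-3D input can be sketched as a sliding window over the slice axis, with edge slices padded by repetition. The window size of 3 and the repetition padding are assumed examples, not values specified by the patent.

```python
import numpy as np

def make_pseudo3d(slices, num_context=3):
    """Splice each 2D slice with its neighbours into pseudo-3D input.

    slices: (N, H, W) stack of consecutive 2D slice images.
    Returns (N, num_context, H, W): each sample is a slice together with
    its neighbouring slices, edge slices padded by repetition.
    """
    half = num_context // 2
    padded = np.concatenate([slices[:1].repeat(half, axis=0),   # repeat first slice
                             slices,
                             slices[-1:].repeat(half, axis=0)],  # repeat last slice
                            axis=0)
    return np.stack([padded[i:i + num_context] for i in range(len(slices))])
```

Each (num_context, H, W) sample can then be fed to the model in place of a true 3D volume, which is the pseudo-3D input mode described above.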
Based on the foregoing embodiment, as an optional embodiment, the model building unit 302 is specifically configured to:
constructing a variable convolution network model;
inputting standard medical sign detection data into a variable convolutional neural network for training;
calculating offset parameters through a parallel standard convolutional neural network, and learning the variable convolutional neural network end to end by gradient back-propagation to obtain the trained variable convolutional neural network model.
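The variable (deformable) convolution samples the input at positions shifted by learned offsets, using bilinear interpolation so that gradients can flow back to the parallel offset branch. The single-location 3 × 3 sketch below is illustrative, assuming one input channel and offsets already predicted by the parallel branch; with zero offsets it reduces to an ordinary convolution.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly sample img (H, W) at fractional coordinates (y, x),
    clamping to the image border."""
    H, W = img.shape
    y = float(np.clip(y, 0, H - 1)); x = float(np.clip(x, 0, W - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])

def deformable_conv_point(img, weights, offsets, cy, cx):
    """Deformable 3x3 convolution at one output location (cy, cx).

    weights: 9 kernel weights in row-major order.
    offsets: (9, 2) fractional (dy, dx) offsets predicted by the parallel
    standard convolution branch; zero offsets give an ordinary 3x3 conv.
    """
    out, k = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            oy, ox = offsets[k]
            out += weights[k] * bilinear_sample(img, cy + dy + oy, cx + dx + ox)
            k += 1
    return out
```

Because bilinear interpolation is differentiable in (y, x), the offset-predicting branch receives gradients and the whole network can be learned end to end, as stated above.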
Based on the foregoing embodiment, as an optional embodiment, the time sequence feature fusion unit 304 is specifically configured to:
acquiring an image frame sequence with time-series prediction data;
extracting the features of the medical signs from the image frame sequence to obtain a medical sign feature map;
fusing the medical sign feature maps in time order to obtain medical sign prediction data based on fused time series features of different scales.
Based on the foregoing embodiment, as an optional embodiment, the pyramid progressive fusion unit 305 is specifically configured to:
upsampling the higher-level features and performing feature addition through 1 × 1 convolution kernels to obtain the candidate frame of the medical sign.
Based on the foregoing embodiment, as an optional embodiment, the system 300 further includes:
the candidate frame extraction unit is configured to set a threshold corresponding to the medical sign candidate frame according to the category of the candidate frame, and to output candidate frames whose detection scores exceed the threshold to obtain the medical sign detection result.
Based on the foregoing embodiment, as an optional embodiment, the system 300 further includes:
the model loss calculation unit is configured to compare the size of the medical sign candidate frame and the output prediction data with the corresponding standard medical sign detection data, to calculate the difference between the predicted data and the actual data of the medical sign, and to back-propagate the loss of the variable convolutional neural network model through this difference so as to optimize the model.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a terminal system 400 according to an embodiment of the present disclosure, where the terminal system 400 can be used to execute the medical sign detection method based on 3D variable convolution and time series feature fusion according to the embodiment of the present disclosure.
The terminal system 400 may include: a processor 401, a memory 402, and a communication unit 403. These components communicate via one or more buses. Those skilled in the art will appreciate that the architecture of the server shown in the figure is not limiting: it may be a bus architecture or a star architecture, and may include more or fewer components than shown, or a different arrangement of components.
The memory 402 may be used to store instructions executed by the processor 401, and may be implemented by any type of volatile or non-volatile storage terminal or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. When executed by the processor 401, the execution instructions in the memory 402 enable the terminal system 400 to perform some or all of the steps of the method embodiments described above.
The processor 401 is a control center of the storage terminal, connects various parts of the entire electronic terminal using various interfaces and lines, and performs various functions of the electronic terminal and/or processes data by operating or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory. The processor may be composed of an Integrated Circuit (IC), for example, a single packaged IC, or a plurality of packaged ICs connected with the same or different functions. For example, the processor 401 may only include a Central Processing Unit (CPU). In the embodiment of the present invention, the CPU may be a single operation core, or may include multiple operation cores.
A communication unit 403, configured to establish a communication channel so that the storage terminal can communicate with other terminals. And receiving user data sent by other terminals or sending the user data to other terminals.
The present application also provides a computer storage medium, wherein the computer storage medium may store a program, and the program may include some or all of the steps in the embodiments provided by the present invention when executed. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a Random Access Memory (RAM).
According to the present application, 3D variable convolution and time series feature fusion are combined to construct a model framework that can handle both 3D input and pseudo-3D input and can be applied to different body parts, different data types, and different task scenarios. The 3D-based variable convolution can adaptively adjust to the different resolutions of the x, y, and z axes in the spatial dimensions, and can also adaptively adjust the parameters learned by the network according to the geometric variation of medical signs, thereby learning irregular shape information in the spatial dimensions, learning the correlation between different slices in the time series dimension, and adaptively learning the weight relationship between slices. By introducing time series feature fusion and performing feature fusion in the later stages of the network, the model can learn high-dimensional feature representations of the image information, model the information of different slices in time order, and effectively exploit the time series characteristics of the data, further improving detection performance. This also matches the habit of radiologists, who read the slices in time order, making the constructed model more reasonable.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system provided by the embodiment, the description is relatively simple because the system corresponds to the method provided by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. The terms "comprises", "comprising", and any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.

Claims (10)

1. A medical sign detection method based on 3D variable convolution and time sequence feature fusion is characterized by comprising the following steps:
acquiring 3D input data of a medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers;
constructing a variable convolution network model, inputting standard medical sign detection data into a variable convolution neural network for training to obtain a trained variable convolution neural network model;
inputting the 3D input data of the medical image or the pseudo 3D input data formed by splicing a plurality of continuous 2D layers into a trained variable convolution neural network model;
modeling prediction data output by the variable convolutional neural network model by using time sequence fusion to form prediction data based on fusion time sequence characteristics of different scales;
and carrying out multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence characteristics with different resolutions to obtain a candidate frame of the medical symptom.
2. The medical sign detection method based on 3D variable convolution and time series feature fusion according to claim 1, wherein the acquiring of 3D input data of a medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers comprises:
preprocessing the medical image to obtain 3D input data;
cutting the scanned medical image into 2D slice images, and splicing a plurality of continuous 2D slice images to obtain pseudo-3D input data.
3. The medical sign detection method based on 3D variable convolution and time series feature fusion according to claim 1, wherein the inputting of the 3D input data of the medical image or the pseudo 3D input data formed by splicing a plurality of continuous 2D slices into the trained variable convolutional neural network model comprises:
constructing a variable convolution network model;
inputting standard medical sign detection data into a variable convolutional neural network for training;
calculating offset parameters through a parallel standard convolutional neural network, and learning the variable convolutional neural network end to end by gradient back-propagation to obtain the trained variable convolutional neural network model.
4. The medical sign detection method based on 3D variable convolution and time-series feature fusion according to claim 1, wherein the modeling of the prediction data output by the variable convolution neural network model by using time-series fusion to form prediction data based on fused time-series features of different scales comprises:
acquiring an image frame sequence with time-series prediction data;
extracting the features of the medical signs from the image frame sequence to obtain a medical sign feature map;
fusing the medical sign feature maps in time order to obtain medical sign prediction data based on fused time series features of different scales.
5. The method for detecting medical signs based on 3D variable convolution and time series feature fusion according to claim 1, wherein the multi-scale pyramid progressive fusion of the prediction data of the fused time series features with different resolutions to obtain the candidate frame of the medical signs comprises:
upsampling the higher-level features and performing feature addition through 1 × 1 convolution kernels to obtain the candidate frame of the medical sign.
6. The medical sign detection method based on 3D variable convolution and time series feature fusion according to claim 1, further comprising:
setting a threshold corresponding to the medical sign candidate frame according to the category of the candidate frame, and outputting candidate frames whose detection scores exceed the threshold to obtain the medical sign detection result.
7. The medical sign detection method based on 3D variable convolution and time series feature fusion according to claim 1, further comprising:
comparing the size of the medical sign candidate frame and the output prediction data with the corresponding standard medical sign detection data, and calculating the difference between the predicted data and the actual data of the medical sign;
back-propagating the loss of the variable convolutional neural network model through this difference, and optimizing the variable convolutional neural network model.
8. A medical sign detection system based on 3D variable convolution and time series feature fusion, comprising:
the data acquisition unit is configured for acquiring 3D input data of the medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers;
the model construction unit is configured for constructing a variable convolution network model, and inputting standard medical sign detection data into a variable convolution neural network for training to obtain a trained variable convolution neural network model;
the data input unit is configured to input 3D input data of the medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers into a trained variable convolution neural network model;
the time sequence feature fusion unit is configured for modeling the prediction data output by the variable convolutional neural network model by using time sequence fusion to form prediction data based on fusion time sequence features of different scales;
and the pyramid progressive fusion unit is configured to perform multi-scale pyramid progressive fusion on the prediction data of the fused time series features at different resolutions to obtain the candidate frame of the medical sign.
9. A terminal, comprising:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method of any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010360486.5A 2020-04-30 2020-04-30 Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion Active CN111667459B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010360486.5A CN111667459B (en) 2020-04-30 2020-04-30 Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion

Publications (2)

Publication Number Publication Date
CN111667459A true CN111667459A (en) 2020-09-15
CN111667459B CN111667459B (en) 2023-08-29

Family

ID=72383052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010360486.5A Active CN111667459B (en) 2020-04-30 2020-04-30 Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion

Country Status (1)

Country Link
CN (1) CN111667459B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08265562A (en) * 1995-03-23 1996-10-11 Ricoh Co Ltd Picture magnification device
WO2017210690A1 (en) * 2016-06-03 2017-12-07 Lu Le Spatial aggregation of holistically-nested convolutional neural networks for automated organ localization and segmentation in 3d medical scans
CN110427807A (en) * 2019-06-21 2019-11-08 诸暨思阔信息科技有限公司 A kind of temporal events motion detection method
US20190392267A1 (en) * 2018-06-20 2019-12-26 International Business Machines Corporation Framework for integrating deformable modeling with 3d deep neural network segmentation
CN110879874A (en) * 2019-11-15 2020-03-13 北京工业大学 Astronomical big data optical variation curve abnormity detection method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DEVINEAU G, ET AL: "Convolutional neural networks for multivariate time series classification using both inter-and intra-channel parallel convolutions" *
WANG K, ET AL: "Multiple convolutional neural networks for multivariate time series prediction" *
YU, Yizhou et al.: "Progress in the Application of Artificial Intelligence in Medical Image Analysis" *
YU, Yizhou et al.: "A Review of Applications of Deep Learning in Medical Image Analysis" *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446862A (en) * 2020-11-25 2021-03-05 北京医准智能科技有限公司 Dynamic breast ultrasound video full-focus real-time detection and segmentation device and system based on artificial intelligence and image processing method
CN113517046A (en) * 2021-04-15 2021-10-19 中南大学 Heterogeneous data feature fusion method in electronic medical record, prediction method and system based on fusion features and readable storage medium
CN113517046B (en) * 2021-04-15 2023-11-07 中南大学 Heterogeneous data feature fusion method in electronic medical record, fusion feature-based prediction method, fusion feature-based prediction system and readable storage medium
CN115187805A (en) * 2022-02-22 2022-10-14 数坤(北京)网络科技股份有限公司 Symptom identification method and device, electronic equipment and storage medium
CN115187805B (en) * 2022-02-22 2023-05-05 数坤(北京)网络科技股份有限公司 Sign recognition method and device, electronic equipment and storage medium
CN115222688A (en) * 2022-07-12 2022-10-21 广东技术师范大学 Medical image classification method based on graph network time sequence
CN115439423A (en) * 2022-08-22 2022-12-06 北京医准智能科技有限公司 CT image-based identification method, device, equipment and storage medium
CN115439423B (en) * 2022-08-22 2023-09-12 北京医准智能科技有限公司 CT image-based identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111667459B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
US11887311B2 (en) Method and apparatus for segmenting a medical image, and storage medium
CN110599448B (en) Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
CN112017189B (en) Image segmentation method and device, computer equipment and storage medium
CN111667459B (en) Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion
TWI715117B (en) Method, device and electronic apparatus for medical image processing and storage mdeium thereof
US20220198230A1 (en) Auxiliary detection method and image recognition method for rib fractures based on deep learning
CN112446302B (en) Human body posture detection method, system, electronic equipment and storage medium
CN112767468A (en) Self-supervision three-dimensional reconstruction method and system based on collaborative segmentation and data enhancement
CN110276408B (en) 3D image classification method, device, equipment and storage medium
CN109035261A (en) Medical imaging processing method and processing device, electronic equipment and storage medium
CN111814768B (en) Image recognition method, device, medium and equipment based on AI composite model
KR20230113386A (en) Deep learning-based capsule endoscopic image identification method, device and media
CN109671055B (en) Pulmonary nodule detection method and device
CN111402217A (en) Image grading method, device, equipment and storage medium
CN114612832A (en) Real-time gesture detection method and device
CN117710760B (en) Method for detecting chest X-ray focus by using residual noted neural network
CN113158970B (en) Action identification method and system based on fast and slow dual-flow graph convolutional neural network
CN112884702A (en) Polyp identification system and method based on endoscope image
CN116434303A (en) Facial expression capturing method, device and medium based on multi-scale feature fusion
CN116258756A (en) Self-supervision monocular depth estimation method and system
CN111626972B (en) CT image reconstruction method, model training method and equipment
CN110570417B (en) Pulmonary nodule classification device and image processing equipment
CN117392137B (en) Intracranial aneurysm image detection method, system, equipment and medium
CN117690128B (en) Embryo cell multi-core target detection system, method and computer readable storage medium
CN115359046B (en) Organ blood vessel segmentation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant