CN116630679A - Osteoporosis identification method based on CT image and domain invariant feature


Info

Publication number
CN116630679A
Authority
CN
China
Prior art keywords
domain
convolution
module
stage
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310321204.4A
Other languages
Chinese (zh)
Other versions
CN116630679B (en)
Inventor
张堃
林鹏程
邵瑞
王林
潘晶
曹蕊
徐沛霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202310321204.4A priority Critical patent/CN116630679B/en
Publication of CN116630679A publication Critical patent/CN116630679A/en
Application granted granted Critical
Publication of CN116630679B publication Critical patent/CN116630679B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/094 Adversarial learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G06V10/16 Image acquisition using multiple overlapping images; Image stitching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of medical image classification, and in particular to an osteoporosis identification method based on CT images and domain invariant features, which comprises the following steps: S1, establishing a MACE-CycleGAN generative adversarial network model to perform domain migration between normal-dose and low-dose images and generate a number of pseudo images; S2, establishing a multi-task relation learning model that captures the correlation among three task modules and emphasizes the relation between vertebrae; S3, establishing a domain adversarial transfer learning module that embeds domain adaptation into the representation learning process, so that the final classification decision is based on features that are both discriminative and invariant to domain changes; S4, establishing a deep learning model for normal-dose and low-dose images based on the localization module, the segmentation module, the classification module and the domain adversarial transfer learning module, and constructing a stepped training loss function to prevent model overfitting. The invention can identify osteoporosis in both normal-dose and low-dose CT images.

Description

Osteoporosis identification method based on CT image and domain invariant feature
Technical Field
The invention relates to the technical field of medical image classification, in particular to an osteoporosis identification method based on CT images and domain invariant features.
Background
The number of computed tomography (CT) studies is growing rapidly, and epidemiological studies have shown that even two to three CT scans can result in a detectable increase in cancer risk, because the radiation dose involved in a CT scan is much higher than in plain radiography. Low-dose chest computed tomography (LDCT) is widely used for early lung cancer screening, involves less ionizing radiation, and has been shown to significantly reduce lung cancer mortality. Bone mineral density (BMD) is directly related to bone strength and is widely used in clinical practice to diagnose and monitor osteoporosis. Quantitative computed tomography (QCT) is increasingly used to measure vertebral BMD from clinical CT scans. While CT must be used for these and other important tasks, minimizing radiation dose has been a trend in CT-related research over the past decades. LDCT and QCT are therefore an attractive combination: a single LDCT scan can screen for both lung cancer and osteoporosis, limiting radiation dose and cost. LDCT typically covers the upper lumbar spine, and asynchronous QCT has been introduced into clinical workflows to facilitate accurate measurement of spinal BMD.
Currently, analysis of QCT images still requires frequent manual operations, including localization of the vertebral body (VB) and placement of the volume of interest (VOI), as well as calculation of the average bone density of the lumbar 1 and lumbar 2 vertebral bodies as the basis for the final judgment; this is clearly unsuitable for large-scale osteoporosis screening. Recently, deep learning, especially convolutional neural networks, has markedly improved performance on vertebra identification, segmentation and classification, which benefits osteoporosis screening. However, existing models approach osteoporosis detection from one of two directions: some are built on normal-dose images and some on low-dose images. If such a model is applied to a dataset different from the one it was created on, it may perform poorly because of differences between image domains, such as different doses or different scanning devices. Building a model that can accommodate different image domains by seeking features common to all of them is a solution to this problem.
Automatic localization and identification of vertebral bodies, their segmentation, their classification, and the fitting of CT values plus clinical indices to obtain QCT values are critical for establishing an osteoporosis computer-aided system (CAS), which mainly comprises four steps: establishing a vertebra localization and identification model; vertebra segmentation; vertebra classification; and fitting CT values to QCT. Performing each task separately is very time consuming, however, and the correlation between the tasks is easily ignored, so it is critical to build a correlation model across the tasks.
In addition, a large amount of medical image data is a precondition for building a deep learning model, but reducing the radiation dose increases noise in clinical images, so current models are mainly built on normal-dose images. To obtain a large number of low-dose images for model training, image style migration can be performed and the migrated images used for training.
Disclosure of Invention
The invention aims to remedy the defects of the prior art by providing an osteoporosis identification method based on CT images and domain invariant features. It addresses the problem that normal-dose and low-dose CT images differ in quality because of the different radiation doses of the equipment, which affects osteoporosis identification and diagnosis results; it improves the accuracy of preliminary osteoporosis screening classification and reduces the radiation delivered to patients while ensuring diagnostic performance.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
An osteoporosis identification method based on CT images and domain invariant features comprises the following specific steps:
S1: establishing a MACE-CycleGAN generative adversarial network model, and performing domain migration between normal-dose and low-dose images to generate pseudo image data;
S2: establishing a multi-task relation learning model that captures the correlation among a key point positioning module, a vertebral body segmentation task module and a vertebral body classification task module, and that emphasizes the relation between vertebrae;
S3: establishing a domain adversarial transfer learning module, embedding domain adaptation into the representation learning process so that the final classification decision is based on features that are both discriminative and invariant to domain changes;
S4: establishing a deep learning model for normal-dose and low-dose images based on the key point positioning module, the vertebral body segmentation task module, the vertebral body classification task module and the domain adversarial transfer learning module, and constructing a stepped training loss function to prevent model overfitting.
Preferably, in step S1, the MACE-CycleGAN model specifically comprises a generator and a discriminator;
the generator has a U-shaped architecture comprising 5 encoding layers and 5 decoding layers, with skip connections between corresponding encoding and decoding layers; a CT image with an original size of 512×512 pixels serves as the network input and is downsampled by five convolution layers, each followed by batch normalization and a ReLU activation function, yielding a 16×16×1024 feature map after the five downsampling layers; six residual layers then form a residual module that deepens the network and increases model nonlinearity, with the feature-map size and channel count unchanged across the residual layers; finally, five transposed convolution layers upsample the feature map, doubling its size at each layer; the generator ultimately produces an RGB image of size 512×512;
the discriminator consists of nine convolution layers, with a self-attention layer added after both the fifth and sixth convolution layers, since the feature maps are smaller at those layers.
Preferably, in step S2:
the key point positioning module calculates the center positions of the L1 and L2 vertebral bodies and crops the 512×512-pixel image around those positions, removing large redundant regions and increasing the usability of the features;
the vertebral body segmentation task module extracts low-level features from the encoding layers and aggregates high-level features from the key point positioning module, suppressing the interference of background noise on segmentation;
the vertebral body classification task module takes the region-of-interest results produced by the key point positioning module and the vertebral body segmentation task module as input and aggregates features from the key point positioning module, thereby improving classification accuracy.
Preferably, the key point positioning module specifically includes:
the key point positioning module downsamples through two convolution layers with a 3×3 kernel and a stride of 2, and then passes through four stages in total: the first stage comprises four residual blocks, which adjust the number of channels without changing the feature-map size; the second stage adds, on top of the first, two parallel 3×3 convolution branches at different scales, increasing model nonlinearity through the two identical parallel convolutions and then fusing the multi-scale information they produce; the third stage adds, on top of the second, a 3×3 convolution with a stride of 2 for downsampling, followed by a further convolution, and then fuses the features of the three scales; the fourth stage adds, on top of the third, another downsampling convolution followed by a further convolution, and then fuses the information from the four scales; through parallel branches at multiple resolutions, information is continuously exchanged between the branches, strengthening semantic information while preserving accurate position information.
Preferably, the vertebral body segmentation task module is specifically as follows:
the vertebral body segmentation task module is a U-shaped structure built on top of the key point positioning module and consisting of an encoding layer and a decoding layer;
the coding layer is formed by the first three stages of the key point positioning module;
the first upsampling stage of the decoding layer adds an upsampling dilated convolution after the third parallel convolution of the third stage of the key point positioning module, aggregates the low-level features from the dilated convolution with the high-level features from the second parallel convolution of that stage, and then increases nonlinearity through one convolution; the second upsampling stage upsamples the result of the first stage and aggregates the resulting low-level features with the high-level features from the first parallel convolution of the third stage of the key point positioning module, followed by one convolution to increase nonlinearity; the third upsampling stage upsamples the result of the second stage, aggregates the resulting low-level features with the high-level features from the second convolution of the key point positioning module, increases nonlinearity through one convolution, and then obtains the segmented image through two further convolutions; by sharing part of the feature-extraction convolutions of the key point positioning module, the vertebral body segmentation task module increases the task relevance between the two modules.
Preferably, the vertebral body classification task module is specifically as follows:
first, the results of the key point positioning module and the vertebral body segmentation task module are integrated to obtain an image of size 256×256 with 6 channels as the input of the vertebral body classification task module; the module begins with a convolution with an 8×8 kernel and a stride of 2, followed by a max pooling layer to reduce the image resolution, and then comprises four stages: the first stage is a stack of three residual blocks whose features are aggregated with those of the 2nd parallel convolution of the fourth stage of the key point positioning module; the second stage is a stack of four residual blocks whose features are aggregated with those of the 3rd parallel convolution of the fourth stage of the key point positioning module; the third stage is a stack of six residual blocks whose features are aggregated with those of the 4th parallel convolution of the fourth stage of the key point positioning module; the fourth stage is a stack of three residual blocks; finally, a one-dimensional feature vector is obtained and classified by a fully connected layer whose output consists of three neurons, the output of each neuron being the probability of the corresponding category.
Preferably, in step S3:
the domain adversarial transfer learning module consists of fully connected layers whose input is the one-dimensional feature vector obtained by the vertebral body classification task module; the final output has two neurons, the output of each neuron being the probability of the corresponding domain.
Preferably, in step S4, constructing the stepped training loss function to prevent model overfitting comprises two steps:
first, applying gradient reversal to the domain adversarial transfer learning module;
second, constructing a stepped, hierarchical loss function.
Preferably, applying gradient reversal to the domain adversarial transfer learning module specifically comprises:
letting the domain classification layer be G_d, where G_d maps the m×1 feature vector from the vertebral body classification task module G_f to the corresponding domain label:
G_d(x) = ReLU(u·x + z)
where (u, z) is a matrix-vector pair representing the weight and bias, respectively.
Given an example (x_i, d_i), the domain loss is calculated using binary cross entropy:
L_d(G_d(G_f(x_i)), d_i) = -d_i·log(G_d(G_f(x_i))) - (1 - d_i)·log(1 - G_d(G_f(x_i)))
where x_i is the input image and d_i ∈ {0, 1} is the domain label, with 0 denoting the normal-dose image domain and 1 the low-dose image domain; L_d denotes the domain loss function.
During back propagation in the neural network, the gradients from the class and domain predictors should be subtracted rather than added; a gradient reversal layer is therefore employed to avoid this error: during back propagation it takes the gradient from the subsequent layer and changes its sign, i.e. multiplies it by -1, before passing it to the previous layer.
Preferably, constructing the stepped, hierarchical loss function specifically comprises:
Since the prediction for each patient is based on two CT images (the lumbar 1 and lumbar 2 vertebral bodies), the two CT images are stitched along the channel dimension at input time; there are therefore two loss functions each for lumbar 1 and lumbar 2 in the segmentation and localization tasks, while in the category classification and domain classification tasks the acquired regions of interest are stitched once more along the channel dimension, so only one loss function is needed there.
In the multi-task setting, a new multi-task loss function is adopted to balance the global relationship and the relationships among the tasks, in which epo denotes the total number of training epochs, n denotes the epoch chosen to balance the segmentation and localization tasks, λ_1, λ_2, λ_3 and λ_4 are learnable adaptation factors that balance the weights among the tasks, L_seg1 denotes the lumbar 1 vertebral body segmentation loss, L_seg2 the lumbar 2 vertebral body segmentation loss, L_loc1 the lumbar 1 vertebral body localization loss, L_loc2 the lumbar 2 vertebral body localization loss, L_class the category classification loss, L_domain1 the normal-dose image domain loss, and L_domain2 the low-dose image domain loss.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention solves the problem that normal-dose and low-dose CT images differ in quality because of the different radiation doses of the equipment, which affects osteoporosis identification and diagnosis results; it improves the accuracy of preliminary osteoporosis screening classification and reduces the radiation delivered to patients while ensuring diagnostic performance.
2. The invention can identify osteoporosis in both normal-dose and low-dose CT images.
Drawings
FIG. 1 is an overall block diagram of the present invention;
FIG. 2 is a diagram of a MACE-CycleGAN model network structure in the present invention;
FIG. 3 is a diagram of a multi-task joint learning network according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings, so that those skilled in the art can better understand the advantages and features of the invention and the scope of protection is defined more clearly. The described embodiments are only some, not all, of the embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of protection of the present invention.
Referring to FIG. 1 to FIG. 3, an osteoporosis identification method based on CT images and domain invariant features comprises the following specific steps:
S1, establishing a MACE-CycleGAN generative adversarial network model to perform domain migration between normal-dose and low-dose images and generate a number of pseudo images;
To address the shortage of low-dose images, the MACE-CycleGAN model is used to generate pseudo images, thereby enhancing network generalization. The MACE-CycleGAN model, shown in FIG. 2, comprises a generator and a discriminator.
The generator uses a U-shaped architecture comprising 5 encoding layers and 5 decoding layers, with skip connections between corresponding encoding and decoding layers. Assume X_A is a normal-dose image of size W×H and X_B is the corresponding low-dose image; their relationship can be expressed as:

X_B = G_AB(X_A), X_A = G_BA(X_B)

where G_AB denotes the style migration from a normal-dose CT image to a low-dose CT image and G_BA denotes the style migration from a low-dose CT image to a normal-dose CT image. Specifically, a CT image with an original size of 512×512 pixels serves as the network input and is downsampled by five convolution layers, each followed by batch normalization and a ReLU activation function, yielding a 16×16×1024 feature map after the five downsampling layers; six residual layers then deepen the network and increase model nonlinearity, with the feature-map size and channel count unchanged across the residual layers. Finally, five transposed convolution layers upsample the feature map, doubling its size at each layer. The generator ultimately produces an RGB image of size 512×512.
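For illustration only, a minimal PyTorch sketch of a generator with the shape described above (five stride-2 convolutions each followed by BatchNorm and ReLU, six residual layers, five transposed convolutions, skip connections) might look as follows; the channel widths other than the stated 1024-channel bottleneck, and all module names, are assumptions rather than the patent's reference implementation:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual layer: feature-map size and channel count stay unchanged."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return torch.relu(x + self.body(x))

class Generator(nn.Module):
    """U-shaped generator: 512x512 input -> five stride-2 convs -> 16x16x1024,
    six residual layers, then five transposed convs (each doubling the size)
    back to a 512x512 RGB output, with encoder-decoder skip connections."""
    def __init__(self, in_ch=3, out_ch=3):
        super().__init__()
        chs = [in_ch, 64, 128, 256, 512, 1024]          # assumed widths
        self.enc = nn.ModuleList(
            nn.Sequential(nn.Conv2d(chs[i], chs[i + 1], 4, 2, 1),
                          nn.BatchNorm2d(chs[i + 1]), nn.ReLU(inplace=True))
            for i in range(5))
        self.res = nn.Sequential(*[ResidualBlock(chs[5]) for _ in range(6)])
        self.dec = nn.ModuleList(
            nn.Sequential(nn.ConvTranspose2d(chs[i + 1] * 2,
                                             chs[i] if i > 0 else out_ch, 4, 2, 1),
                          nn.BatchNorm2d(chs[i]) if i > 0 else nn.Identity(),
                          nn.ReLU(inplace=True) if i > 0 else nn.Tanh())
            for i in reversed(range(5)))

    def forward(self, x):
        skips = []
        for layer in self.enc:                          # five downsampling layers
            x = layer(x)
            skips.append(x)
        x = self.res(x)                                 # six residual layers
        for layer, s in zip(self.dec, reversed(skips)): # skip connections
            x = layer(torch.cat([x, s], dim=1))
        return x                                        # 512x512 output image

# shape check: Generator()(torch.randn(1, 3, 512, 512)).shape == (1, 3, 512, 512)
```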
The discriminator consists of nine convolution layers; because the feature maps are smaller at the fifth and sixth layers, a self-attention layer is added after each of those two layers.
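Similarly, a hedged sketch of a nine-layer convolutional discriminator with self-attention after the fifth and sixth convolution layers; the attention internals follow the common SAGAN construction, which is an assumption, since the patent only states where the attention layers sit:

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over spatial positions (assumed form)."""
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))       # learnable mixing weight
    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)        # B x HW x C//8
        k = self.k(x).flatten(2)                        # B x C//8 x HW
        attn = torch.softmax(q @ k, dim=-1)             # B x HW x HW
        v = self.v(x).flatten(2)                        # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

def make_discriminator(in_ch=3):
    """Nine conv layers; self-attention follows the 5th and 6th convolutions,
    where the feature maps are already small. Widths/strides are assumptions."""
    chs = [in_ch, 64, 128, 256, 512, 512, 512, 512, 512, 1]
    layers = []
    for i in range(9):
        layers.append(nn.Conv2d(chs[i], chs[i + 1], 4,
                                stride=2 if i < 5 else 1, padding=1))
        if i < 8:                                       # no activation on logits
            layers.append(nn.LeakyReLU(0.2, inplace=True))
        if i in (4, 5):                                 # after the 5th and 6th conv
            layers.append(SelfAttention(chs[i + 1]))
    return nn.Sequential(*layers)
```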
S2, establishing a multi-task relation learning model that captures the correlation among three task modules and emphasizes the relation between vertebrae; the three task modules are the key point positioning module, the vertebral body segmentation task module and the vertebral body classification task module.
As shown in FIG. 3, the key point positioning module downsamples through two convolution layers with a 3×3 kernel and a stride of 2, and then passes through four stages in total. The first stage comprises four residual blocks, which only adjust the number of channels and do not change the feature-map size. The second stage adds, on top of the first, two parallel 3×3 convolution branches at different scales, increasing model nonlinearity through the two identical parallel convolutions and then fusing the multi-scale information they produce. The third stage adds, on top of the second, a 3×3 convolution with a stride of 2 for downsampling, followed by a further convolution, and then fuses the features of the three scales. The fourth stage adds, on top of the third, another downsampling convolution followed by a further convolution, then fuses the information from the four scales, increases model nonlinearity through four parallel convolutions, and performs feature fusion at high resolution; the last layer outputs a high-resolution feature map F_L, expressed as a regression heat map H_P, to predict the center of each vertebral body. Notably, the ground-truth heat map H_t is generated by a 2D Gaussian distribution with a standard deviation of 1 pixel centered on the true position of each key point:

H_t(m, n) = exp(-((m - m_k)² + (n - n_k)²) / (2σ²))

where (m, n) is the position of each pixel, (m_k, n_k) are the coordinates of the real label, and σ denotes the radial extent. Through parallel branches at multiple resolutions, information is continuously exchanged between the branches, achieving strong semantic information and accurate position information at the same time.
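A short NumPy sketch of building the ground-truth heat map H_t from this formula (the function name and signature are illustrative):

```python
import numpy as np

def gaussian_heatmap(h, w, m_k, n_k, sigma=1.0):
    """H_t: 2D Gaussian with standard deviation sigma (1 pixel by default)
    centred on the true keypoint position (m_k, n_k)."""
    m, n = np.mgrid[0:h, 0:w]
    return np.exp(-((m - m_k) ** 2 + (n - n_k) ** 2) / (2.0 * sigma ** 2))

# e.g. a 512x512 target map for a key point at row 200, column 310:
# H_t = gaussian_heatmap(512, 512, 200, 310)
```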
For the vertebral body segmentation task module shown in FIG. 3, assume the output features of the three parallel convolutions of the third stage of the key point positioning module are F_1, F_2 and F_3 respectively. The parallel convolution feature F_1 from the lowest layer is first upsampled by a dilated convolution to obtain F_up1; F_up1 is then concatenated with the upper-layer parallel convolution feature F_2, and a convolution layer performs nonlinear modeling on the concatenated features:

F_c1 = Conv(Concat(F_up1, F_2))

F_c1 is then upsampled again by dilated convolution to obtain F_up2; F_up2 is concatenated with the top-layer parallel convolution feature F_3, and the concatenated features F_c2 are again modeled nonlinearly by a convolution layer.

The result is likewise upsampled by dilated convolution to obtain F_up3; F_up3 is concatenated with the high-level feature F_4 from the second convolution of the key point positioning module, the concatenated features F_c3 are modeled nonlinearly by a convolution layer, a further dilated-convolution upsampling maps the feature map back to the size of the input image, and one more convolution layer restores the original number of channels, yielding the region-of-interest mask M of the original image.

After obtaining the mask for each image, we can obtain a region-of-interest map R for each image X:

R = Extract(X, M)

For the vertebral body classification task module shown in FIG. 3, once the vertebral body key point coordinates (m_center, n_center) are obtained, a 256×256 image X_crop that contains only the vertebral body, with redundant features removed, can be obtained by combining them with the region-of-interest map R:

X_crop = Crop(R, (m_center, n_center))

X_crop is then taken as the input of the vertebral body classification module. The module begins with a convolution with an 8×8 kernel and a stride of 2, followed by a max pooling layer to reduce the image resolution, and then comprises four stages. The first stage is a stack of three residual blocks whose features are aggregated with those of the 2nd parallel convolution of the fourth stage of the key point positioning module; the second stage is a stack of four residual blocks whose features are aggregated with those of the 3rd parallel convolution; the third stage is a stack of six residual blocks whose features are aggregated with those of the 4th parallel convolution. This can be written as:

F_n = Aggregate(stage_n(F_(n-1)), parallel_(n+1))

where stage_n denotes the n-th stage of the vertebral body classification module and parallel_(n+1) denotes the (n+1)-th parallel convolution of the fourth stage of the key point positioning module. The fourth stage is a stack of three residual blocks. A 1×1×2048 feature map f_last is finally obtained and reshaped into the feature vector f_vector used for category classification and domain classification:

f_vector = reshape(f_last)
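As an aid to reading, a brief NumPy sketch of the Extract and Crop operations used above; the border clamping in Crop is an assumption, since the patent does not specify boundary handling:

```python
import numpy as np

def extract(image, mask):
    """R = Extract(X, M): keep only the pixels selected by the mask."""
    return image * (mask > 0)

def crop(roi_map, center, size=256):
    """X_crop = Crop(R, (m_center, n_center)): a size x size window around the
    predicted vertebral-body centre, clamped to stay inside the image."""
    m_c, n_c = center
    h, w = roi_map.shape[:2]
    half = size // 2
    top = min(max(m_c - half, 0), h - size)
    left = min(max(n_c - half, 0), w - size)
    return roi_map[top:top + size, left:left + size]
```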
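And a compact PyTorch sketch of the two heads that consume f_vector: three output neurons for the categories and, as used by the domain adversarial transfer learning module in step S3, two output neurons for the domains (the layer names are assumptions):

```python
import torch
import torch.nn as nn

class ClassificationHeads(nn.Module):
    """Flattens the 1x1x2048 f_last into f_vector and applies two fully
    connected heads: 3 class neurons and 2 domain neurons, each output
    being the probability of the corresponding category or domain."""
    def __init__(self, dim=2048):
        super().__init__()
        self.class_head = nn.Linear(dim, 3)
        self.domain_head = nn.Linear(dim, 2)
    def forward(self, f_last):
        f_vector = f_last.reshape(f_last.size(0), -1)   # f_vector = reshape(f_last)
        class_prob = torch.softmax(self.class_head(f_vector), dim=1)
        domain_prob = torch.softmax(self.domain_head(f_vector), dim=1)
        return class_prob, domain_prob
```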
S3, establishing a domain adversarial transfer learning module and embedding domain adaptation into the representation learning process, so that the final classification decision is based on features that are both discriminative and invariant to domain changes.
For the domain adversarial transfer learning module shown in FIG. 3, assume the input is x ∈ X and that Y = {0, 1, ..., L-1} is the set of L possible labels. X×Y can then be split into two different data domains: the normal-dose images form the source domain D_S and the low-dose images form the target domain D_T. A labeled sample set S is drawn from the source domain D_S and an unlabeled sample set T from the target domain D_T:

S = {(x_i, y_i) | i = 1, ..., n} ~ D_S,  T = {x_i | i = n+1, ..., N} ~ D_T

where N = n + n' is the total number of samples used for training, x_i is the i-th input image and y_i is its label. S and T are treated as high-dimensional input vectors, and the implicit convolutional feature extraction network is denoted G_f; G_f maps the input x to an m×1 feature vector:
G_f(x) = ReLU(w·x + b)

where (w, b) is a matrix-vector pair representing the weight and bias, respectively.
Likewise, the prediction layer consisting of several fully connected layers is G y Through G y Mapping m×1 feature vectors to corresponding class labels can be expressed as:
G y (G f (x))=softmax(vG f (x)+c)
here, theWhere (v, c) is a matrix vector pair, representing the weight and bias, respectively.
Here G y (G f (x) Indicating that the neural network is to be inputConditional probability assigned to class Y. Assume that a given source example is (x i ,y i ) The multi-component cross entropy loss is used to calculate the loss between the predicted value and the true value.
L y (G y (G f (x i )),y i )=-G y (G f (x i ))log(y i )
To achieve domain generalization, first assume the feature outputs for the source domain of normal-dose images and the target domain of low-dose images are, respectively,

S(G_f) = {G_f(x) | x ∈ S},  T(G_f) = {G_f(x) | x ∈ T}

The divergence between the feature vectors of the source domain samples S(G_f) and the target domain samples T(G_f) can then be expressed as:

d(S(G_f), T(G_f)) = 2(1 - min_η [(1/n) Σ_{x∈S} I[η(G_f(x)) = 0] + (1/n') Σ_{x∈T} I[η(G_f(x)) = 1]])

where x ∈ S denotes a normal-dose image sample, x ∈ T a low-dose image sample, I[a] the indicator function (1 if condition a holds, 0 otherwise), and η a domain classifier. To estimate the min part accurately, a domain classification layer is constructed by means of logistic regression. Let the domain classification layer be G_d; G_d maps the m×1 feature vector to the corresponding domain label:
G_d(x) = ReLU(u·x + z)

where (u, z) is a matrix-vector pair representing the weight and bias, respectively.
Given an example (x_i, d_i), the domain loss is calculated using binary cross entropy:

L_d(G_d(G_f(x_i)), d_i) = -d_i·log(G_d(G_f(x_i))) - (1 - d_i)·log(1 - G_d(G_f(x_i)))
where d_i ∈ {0, 1} is the domain label, 0 denoting the source domain and 1 the target domain.
During network training, the class labels of the normal-dose images from the source domain are known, while those of the target domain are unknown; at prediction time, however, the class probabilities of the target domain must be predicted accurately. A domain adaptation term can therefore be added, yielding a new optimization objective:

E(w, b, v, c, u, z) = (1/n) Σ_{i=1..n} L_y(G_y(G_f(x_i)), y_i) - λ((1/n) Σ_{i=1..n} L_d(G_d(G_f(x_i)), d_i) + (1/n') Σ_{i=n+1..N} L_d(G_d(G_f(x_i)), d_i))

where λ is an adjustable factor that balances the weights of the two objectives. Finding the optimal parameters of E(w, b, v, c, u, z) through the neural network splits the problem into minimization over part of the parameters and maximization over the rest; during back propagation, however, the gradients from the class and domain predictors should be subtracted rather than added, so a gradient reversal layer is employed to avoid this error: during back propagation it takes the gradient from the subsequent layer and changes its sign, i.e. multiplies it by -1, before passing it to the previous layer.
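A minimal PyTorch sketch of such a gradient reversal layer, following the standard construction for domain adversarial training (the class and function names are ours):

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; in the backward pass the gradient taken
    from the subsequent layer is multiplied by -lambda before being passed
    to the previous layer."""
    @staticmethod
    def forward(ctx, x, lamb=1.0):
        ctx.lamb = lamb
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None            # sign change, no grad for lamb

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

# usage: the domain classifier G_d sees the shared features through the layer,
# so minimizing the domain loss w.r.t. G_d maximizes it w.r.t. G_f:
# domain_logits = G_d(grad_reverse(G_f(x)))
```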
S4, establishing a deep learning model for normal-dose and low-dose images based on the key point positioning module, the vertebral body segmentation task module, the vertebral body classification task module and the domain adversarial transfer learning module, and constructing a stepped training loss function to prevent model overfitting.
Since the prediction for each patient is based on two CT images (the lumbar 1 and lumbar 2 vertebral bodies), the two CT images are stitched along the channel dimension at input time; there are therefore inevitably two loss functions each for lumbar 1 and lumbar 2 in the segmentation and localization tasks, while in the category classification and domain classification tasks the acquired regions of interest are stitched once more along the channel dimension, so only a single loss function is needed there.
The key point positioning module uses the mean square error, a measure that reflects the degree of difference between the estimate and the estimated quantity; the smaller its value, the closer the predicted value is to the target value:

MSE = (1/n) Σ_{i=1..n} (L_i - L̂_i)²

where L_i denotes the actual center coordinates of the vertebral body, L̂_i denotes the predicted center coordinates, and n denotes the number of elements.
The vertebral body segmentation module uses the Dice Loss, named after the Dice coefficient, a metric that evaluates the similarity of two samples; the larger its value, the more similar the two samples are:

L_dice = 1 - 2|S ∩ Ŝ| / (|S| + |Ŝ|)

where |S ∩ Ŝ| denotes the element-wise multiplication and summation between the true segmented image S and the predicted segmented image Ŝ, and |S| and |Ŝ| denote the sums of the pixels in the respective images.
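A minimal PyTorch sketch of this Dice Loss (the smoothing term eps is an assumption added to avoid division by zero):

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """L_dice = 1 - 2|S ∩ S_hat| / (|S| + |S_hat|): the intersection is the
    element-wise product summed over pixels; |S| and |S_hat| are pixel sums."""
    inter = (pred * target).sum()
    return 1.0 - 2.0 * inter / (pred.sum() + target.sum() + eps)
```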
The loss functions used for category classification and domain classification are the multi-class cross entropy and the binary cross entropy, respectively. In the multi-task setting, a new multi-task loss function is designed to balance the global relationship and the relationships among the tasks (one plausible form is sketched below), in which epo denotes the total number of training epochs, n denotes the epoch chosen to balance the segmentation and localization tasks, λ_1, λ_2, λ_3 and λ_4 are learnable adaptation factors that balance the weights among the tasks, L_seg1 denotes the lumbar 1 vertebral body segmentation loss, L_seg2 the lumbar 2 vertebral body segmentation loss, L_loc1 the lumbar 1 vertebral body localization loss, L_loc2 the lumbar 2 vertebral body localization loss, L_class the category classification loss, L_domain1 the normal-dose image domain loss, and L_domain2 the low-dose image domain loss.
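Since the explicit formula is not reproduced here, the following PyTorch sketch shows one plausible form of the stepped multi-task loss, with λ_1 to λ_4 as learnable parameters and the classification and domain terms phased in after epoch n; the schedule itself is an assumption:

```python
import torch
import torch.nn as nn

class SteppedMultiTaskLoss(nn.Module):
    """For the first n of epo epochs only the segmentation and localization
    terms are optimized; afterwards the classification and domain terms are
    added, with learnable adaptation factors balancing the tasks."""
    def __init__(self, n, epo):
        super().__init__()
        self.n, self.epo = n, epo
        self.lambdas = nn.Parameter(torch.ones(4))      # lambda_1 .. lambda_4

    def forward(self, epoch, L_seg1, L_seg2, L_loc1, L_loc2,
                L_class, L_domain1, L_domain2):
        l1, l2, l3, l4 = self.lambdas
        loss = l1 * (L_seg1 + L_seg2) + l2 * (L_loc1 + L_loc2)
        if epoch >= self.n:                             # phase in the later tasks
            loss = loss + l3 * L_class + l4 * (L_domain1 + L_domain2)
        return loss
```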
In summary, the invention solves the problem that normal-dose and low-dose CT images differ in quality because of the different radiation doses of the equipment, which affects osteoporosis identification and diagnosis results; it improves the accuracy of preliminary osteoporosis screening classification and reduces the radiation delivered to patients while ensuring diagnostic performance.
The description and practice of the invention disclosed herein will be readily apparent to those skilled in the art, who may modify and adapt it in several ways without departing from the principles of the invention. Accordingly, modifications or improvements made without departing from the spirit of the invention also fall within its scope of protection.

Claims (10)

1. The osteoporosis identification method based on CT images and domain invariant features is characterized by comprising the following specific steps:
S1: establishing a MACE-CycleGAN generative adversarial network model, and performing domain migration between normal-dose and low-dose images to generate pseudo image data;
S2: establishing a multi-task relation learning model that captures the correlation among a key point positioning module, a vertebral body segmentation task module and a vertebral body classification task module, and that emphasizes the relation between vertebrae;
S3: establishing a domain adversarial transfer learning module, embedding domain adaptation into the representation learning process so that the final classification decision is based on features that are both discriminative and invariant to domain changes;
S4: establishing a deep learning model for normal-dose and low-dose images based on the key point positioning module, the vertebral body segmentation task module, the vertebral body classification task module and the domain adversarial transfer learning module, and constructing a stepped training loss function to prevent model overfitting.
2. The method for identifying osteoporosis based on CT images and domain invariant features of claim 1, wherein in step S1 the MACE-CycleGAN model specifically comprises a generator and a discriminator;
the generator has a U-shaped architecture comprising 5 encoding layers and 5 decoding layers, with skip connections between corresponding encoding and decoding layers; a CT image with an original size of 512×512 pixels serves as the network input and is downsampled by five convolution layers, each followed by batch normalization and a ReLU activation function, yielding a 16×16×1024 feature map after the five downsampling layers; six residual layers then form a residual module that deepens the network and increases model nonlinearity, with the feature-map size and channel count unchanged across the residual layers; finally, five transposed convolution layers upsample the feature map, doubling its size at each layer; the generator ultimately produces an RGB image of size 512×512;
the discriminator consists of nine convolution layers, with a self-attention layer added after both the fifth and sixth convolution layers.
3. The method for identifying osteoporosis based on CT images and domain invariant features of claim 1, wherein in step S2:
the key point positioning module calculates the center positions of the L1 and L2 vertebral bodies and crops the 512×512-pixel image around those positions, removing large redundant regions and increasing the usability of the features;
the vertebral body segmentation task module extracts low-level features from the encoding layers and aggregates high-level features from the key point positioning module, suppressing the interference of background noise on segmentation;
the vertebral body classification task module takes the region-of-interest results produced by the key point positioning module and the vertebral body segmentation task module as input and aggregates features from the key point positioning module, thereby improving classification accuracy.
4. The osteoporosis identification method based on CT images and domain invariant features of claim 3, wherein the key point positioning module is specifically as follows:
the key point positioning module downsamples through two convolution layers with a 3×3 kernel and a stride of 2, and then passes through four stages in total: the first stage comprises four residual blocks, which adjust the number of channels without changing the feature-map size; the second stage adds, on top of the first, two parallel 3×3 convolution branches at different scales, increasing model nonlinearity through the two identical parallel convolutions and then fusing the multi-scale information they produce; the third stage adds, on top of the second, a 3×3 convolution with a stride of 2 for downsampling, followed by a further convolution, and then fuses the features of the three scales; the fourth stage adds, on top of the third, another downsampling convolution followed by a further convolution, and then fuses the information from the four scales; through parallel branches at multiple resolutions, information is continuously exchanged between the branches, strengthening semantic information while preserving accurate position information.
5. The osteoporosis identification method based on CT images and domain invariant features of claim 3, wherein the vertebral body segmentation task module is specifically as follows:
the vertebral body segmentation task module is a U-shaped structure built on top of the key point positioning module and consisting of an encoding layer and a decoding layer;
the encoding layer is formed by the first three stages of the key point positioning module;
the first upsampling stage of the decoding layer adds an upsampling dilated convolution after the third parallel convolution of the third stage of the key point positioning module, aggregates the low-level features from the dilated convolution with the high-level features from the second parallel convolution of that stage, and then increases nonlinearity through one convolution; the second upsampling stage upsamples the result of the first stage and aggregates the resulting low-level features with the high-level features from the first parallel convolution of the third stage of the key point positioning module, followed by one convolution to increase nonlinearity; the third upsampling stage upsamples the result of the second stage, aggregates the resulting low-level features with the high-level features from the second convolution of the key point positioning module, increases nonlinearity through one convolution, and then obtains the segmented image through two further convolutions; by sharing part of the feature-extraction convolutions of the key point positioning module, the vertebral body segmentation task module increases the task relevance between the two modules.
6. The osteoporosis identification method based on CT images and domain invariant features of claim 3, wherein the vertebral body classification task module is specifically as follows:
first, the results of the key point positioning module and the vertebral body segmentation task module are integrated to obtain an image of size 256×256 with 6 channels as the input of the vertebral body classification task module; the module begins with a convolution with an 8×8 kernel and a stride of 2, followed by a max pooling layer to reduce the image resolution, and then comprises four stages: the first stage is a stack of three residual blocks whose features are aggregated with those of the 2nd parallel convolution of the fourth stage of the key point positioning module; the second stage is a stack of four residual blocks whose features are aggregated with those of the 3rd parallel convolution of the fourth stage of the key point positioning module; the third stage is a stack of six residual blocks whose features are aggregated with those of the 4th parallel convolution of the fourth stage of the key point positioning module; the fourth stage is a stack of three residual blocks; finally, a one-dimensional feature vector is obtained and classified by a fully connected layer whose output consists of three neurons, the output of each neuron being the probability of the corresponding category.
7. The method for identifying osteoporosis based on CT images and domain invariant features of claim 1, wherein in step S3:
the domain adversarial transfer learning module consists of fully connected layers whose input is the one-dimensional feature vector obtained by the vertebral body classification task module; the final output has two neurons, the output of each neuron being the probability of the corresponding domain.
8. The method for identifying osteoporosis based on CT images and domain invariant features of claim 1, wherein in step S4 the stepped training loss function constructed to prevent model overfitting comprises two steps:
first, applying gradient reversal to the domain adversarial transfer learning module;
second, constructing a stepped, hierarchical loss function.
9. The method for identifying osteoporosis based on CT images and domain invariant features of claim 8, wherein applying gradient reversal to the domain adversarial transfer learning module specifically comprises:
letting the domain classification layer be G_d, where G_d maps the m×1 feature vector from the vertebral body classification task module G_f to the corresponding domain label:
G_d(x) = ReLU(u·x + z)
where (u, z) is a matrix-vector pair representing the weight and bias, respectively;
given an example (x_i, d_i), the domain loss is calculated using binary cross entropy:
L_d(G_d(G_f(x_i)), d_i) = -d_i·log(G_d(G_f(x_i))) - (1 - d_i)·log(1 - G_d(G_f(x_i)))
where x_i is the input image and d_i ∈ {0, 1} is the domain label, 0 denoting the normal-dose image domain and 1 the low-dose image domain, and L_d denotes the domain loss function.
10. The method for identifying osteoporosis based on CT images and domain invariant features of claim 8, wherein constructing the stepped, hierarchical loss function specifically comprises:
adopting a new multi-task loss function in which epo denotes the total number of training epochs, n denotes the epoch chosen to balance the segmentation and localization tasks, λ_1, λ_2, λ_3 and λ_4 are learnable adaptation factors that balance the weights among the tasks, L_seg1 denotes the lumbar 1 vertebral body segmentation loss, L_seg2 the lumbar 2 vertebral body segmentation loss, L_loc1 the lumbar 1 vertebral body localization loss, L_loc2 the lumbar 2 vertebral body localization loss, L_class the category classification loss, L_domain1 the normal-dose image domain loss, and L_domain2 the low-dose image domain loss.
CN202310321204.4A 2023-03-29 2023-03-29 Osteoporosis identification method based on CT image and domain invariant feature Active CN116630679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310321204.4A CN116630679B (en) 2023-03-29 2023-03-29 Osteoporosis identification method based on CT image and domain invariant feature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310321204.4A CN116630679B (en) 2023-03-29 2023-03-29 Osteoporosis identification method based on CT image and domain invariant feature

Publications (2)

Publication Number Publication Date
CN116630679A 2023-08-22
CN116630679B 2024-06-04

Family

ID=87612284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310321204.4A Active CN116630679B (en) 2023-03-29 2023-03-29 Osteoporosis identification method based on CT image and domain invariant feature

Country Status (1)

Country Link
CN (1) CN116630679B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751207A (en) * 2019-10-18 2020-02-04 四川大学 Fault diagnosis method for anti-migration learning based on deep convolution domain
CN111881910A (en) * 2020-07-31 2020-11-03 杭州依图医疗技术有限公司 Information processing method based on vertebra image and computer readable storage medium
CN112446423A (en) * 2020-11-12 2021-03-05 昆明理工大学 Fast hybrid high-order attention domain confrontation network method based on transfer learning
CN113658142A (en) * 2021-08-19 2021-11-16 江苏金马扬名信息技术股份有限公司 Hip joint femur near-end segmentation method based on improved U-Net neural network
CN114066873A (en) * 2021-11-24 2022-02-18 袁兰 Method and device for detecting osteoporosis by utilizing CT (computed tomography) image
CN114821097A (en) * 2022-04-07 2022-07-29 西南交通大学 Multi-scale feature image classification method based on transfer learning
CN114863165A (en) * 2022-04-12 2022-08-05 南通大学 Vertebral body bone density classification method based on fusion of image omics and deep learning features
CN114937502A (en) * 2022-07-07 2022-08-23 西安交通大学 Method and system for evaluating osteoporotic vertebral compression fracture based on deep learning
CN115719329A (en) * 2022-08-23 2023-02-28 中国医学科学院北京协和医院 Method and system for fusing RA ultrasonic modal synovial membrane scores based on deep learning
CN115374943A (en) * 2022-09-01 2022-11-22 武汉东湖大数据交易中心股份有限公司 Data cognition calculation method and system based on domain confrontation migration network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KUN ZHANG et al.: "End to End Multitask Joint Learning Model for Osteoporosis Classification in CT Images", Computational Intelligence and Neuroscience, 16 March 2023 (2023-03-16) *
R. EBSIM et al.: "Automatic segmentation of hip osteophytes in DXA scans using U-Nets", International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer Nature Switzerland, 31 December 2022 (2022-12-31) *
冉智强: "Research on computer-aided diagnosis of COVID-19 CT images based on deep learning", China Master's Theses Full-text Database, Information Science and Technology, no. 01, 15 January 2023 (2023-01-15) *

Also Published As

Publication number Publication date
CN116630679B (en) 2024-06-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant