CN116563285B

CN116563285B - Lesion feature identification and segmentation method and system based on a full neural network

Info

Publication number: CN116563285B
Application number: CN202310838165.5A
Authority: CN
Inventors: 冯世庆; 伊力扎提·伊力哈木; 杨锦韬; 荣飞豪
Original assignee: Jiangsu Shiyu Intelligent Medical Technology Co ltd; Shandong Shiyu Intelligent Medical Technology Co ltd; Bangshi Technology Nanjing Co ltd
Current assignee: Jiangsu Shiyu Intelligent Medical Technology Co ltd; Shandong Shiyu Intelligent Medical Technology Co ltd; Bangshi Technology Nanjing Co ltd
Priority date: 2023-07-10
Filing date: 2023-07-10
Publication date: 2023-09-19
Anticipated expiration: 2043-07-10
Also published as: CN116563285A

Abstract

The application provides a lesion feature identification and segmentation method and system based on a full neural network, in the field of medical image processing. The method comprises: improving the sharpness and contrast of CT images with a wavelet filter neural network model; identifying each structure type and segmenting each structure in the adjusted CT image with a semantic segmentation network; stacking the lesion data of several consecutive, adjacent lesion images to construct lesion volume data with three-dimensional context information; and extracting features from the volume data with a multi-scale feature pyramid network, post-processing and optimizing the features with a conditional random field method, and outputting the feature information and category information of the identified and segmented features. By combining feature extraction and enhancement, image segmentation, feature-map classification and a final optimization step, the application improves the accuracy of identifying and segmenting each structure in the medical image.

Description

Lesion feature identification and segmentation method and system based on a full neural network

Technical Field

The application relates to the field of medical image processing, and in particular to a lesion feature identification and segmentation method based on a full neural network. It also relates to a lesion feature identification and segmentation system based on the full neural network.

Background

At present, lesions can be identified and their features segmented by constructing and training a deep neural network, which helps raise the diagnosis rate and lower the misdiagnosis and missed-diagnosis rates.

However, CT image features are complex. Performing recognition directly on the global features of a CT image is difficult, trains slowly, increases the workload, and may yield low recognition accuracy. In addition, when an image is recognized only after it has been segmented, features at the contour boundaries are ignored, which reduces the reliability of lesion identification.

Conventional medical image recognition and segmentation methods therefore lack effective means of feature extraction and enhancement, which leads to a high error rate when recognizing and segmenting each structure in the medical image.

Disclosure of Invention

The application aims to overcome the lack of effective feature extraction and enhancement in prior-art medical image recognition and segmentation, and provides a lesion feature identification and segmentation method based on a full neural network. It also provides a lesion feature identification and segmentation system based on the full neural network.

The application provides a focus characteristic identification and segmentation method based on a full neural network, which comprises the following steps:

S1: improving the sharpness and contrast of a CT image with a wavelet filter neural network model;

S2: identifying and segmenting lesions in the enhanced CT image with a semantic segmentation network, and generating lesion images;

S3: stacking the lesion data of several consecutive, adjacent lesion images to construct lesion volume data with three-dimensional context information;

S4: extracting lesion features from the lesion volume data with a multi-scale feature pyramid network, post-processing and optimizing the lesion features with a conditional random field method, and outputting the recognition and segmentation results for the lesion features.

Optionally, the wavelet filtering neural network model adopts a Mallat algorithm:

wherein x denotes the vector of sampling points, y denotes the filtered output vector of the sampling points, J is the scale parameter, and k is the position parameter at the scale denoted by J; the two basis functions of the expression are the wavelet function at a given scale parameter and position parameter k, and the minimum-scale wavelet function at the given scale parameter.

Optionally, the semantic segmentation network includes:

extracting the lesion with dilated (atrous) convolution;

standardizing the extracted lesion data with group normalization in place of batch normalization;

for the standardized data, obtaining the identified and segmented lesion images with an ASPP model comprising 5 parallel dilated convolution branches and an average pooling branch.

Optionally, the lesion images are stacked to obtain a three-dimensional tensor, in which each element corresponds to a specific pixel of the original CT image.

Optionally, the multi-scale feature pyramid network comprises: a low-dimensional feature layer, a high-dimensional feature layer and a parallel connection layer;

the low-dimensional feature layer converts low-level lesion features in the lesion volume data into high-level lesion features;

the high-dimensional feature layer extracts lesion feature representations from lesion features of different scales;

the parallel connection layer fuses the lesion characteristic representations of different levels and scales.

Optionally, the method further comprises:

in the high-dimensional feature layer, the input three-dimensional tensor is converted into a plurality of two-dimensional feature maps of different scales through layer-by-layer downsampling;

fusing the characteristic representations of different scales in the parallel connection layer;

in the global pooling layer, carrying out global pooling on the fused feature representation to obtain a feature vector with a fixed size;

inputting the feature vectors into a full-connection layer for classification, and outputting probability values of each category;

comparing the output probability value with a preset threshold value to obtain object types, positions and confidence degrees;

and performing de-duplication and screening by using non-maximum suppression, and outputting the predicted category, position and confidence.

Optionally, the processing and optimizing the feature according to the conditional random field method includes:

modeling the dependency relationship between the focus characteristic pixels by using a CRF model to eliminate noise, fill holes and smooth segmentation boundaries.

Optionally, the method further comprises:

performance evaluation: performance is evaluated with metrics comprising the Dice coefficient, precision, recall and mAP0.5:0.95.

The application also provides a focus characteristic identification and segmentation system based on the full neural network, which comprises a reading module and a processing module;

the reading module reads a DICOM file comprising a CT image by adopting a pydicom library, and inputs the CT image into the processing module;

the processing module uses a wavelet filter neural network model to improve the sharpness and contrast of the CT image; identifies and segments lesions in the enhanced CT image with a semantic segmentation network to generate lesion images; stacks the lesion images based on the adjacency of the lesions in adjacent lesion images to construct lesion volume data with three-dimensional context information; and extracts lesion features from the lesion volume data with a multi-scale feature pyramid network, post-processes and optimizes the lesion features with a conditional random field method, and outputs the recognition and segmentation results for the lesion features.

The application has the advantages and beneficial effects that:

The application provides a lesion feature identification and segmentation method based on a full neural network, comprising: improving the sharpness and contrast of a CT image with a wavelet filter neural network model, and adjusting each structure type in the CT image as needed; identifying each structure type and segmenting each structure in the adjusted CT image with a semantic segmentation network; stacking CT image slices that are adjacent in different directions, after each structure has been identified and segmented, to construct volume data with three-dimensional context information; and extracting features from the volume data with a multi-scale feature pyramid network, post-processing and optimizing the features with a conditional random field method, and outputting the feature information and category information of the identified and segmented features. By combining feature extraction and enhancement, image segmentation, feature-map classification and a final optimization step, the application improves the accuracy of identifying and segmenting each structure in the medical image.

Drawings

Fig. 1 is a schematic diagram of the lesion feature recognition and segmentation process based on a full neural network in the present application.

Fig. 2 is a schematic diagram of the multi-resolution analysis of the Mallat algorithm in the present application.

Fig. 3 is a schematic diagram of the processing flow of DeepLabV3 in the present application.

Fig. 4 is a comparison of lesion feature recognition and segmentation results in the present application.

Detailed Description

The present application is further described in conjunction with the accompanying drawings and specific embodiments so that those skilled in the art may better understand the present application and practice it.

The following is a detailed description of the embodiments of the present application, but the present application may be implemented in other ways than those described herein, and those skilled in the art can implement the present application by different technical means under the guidance of the inventive concept, so that the present application is not limited by the specific embodiments described below.

The application belongs to the field of medical image processing and addresses the following problem: improving the accuracy of identifying and segmenting each structure in a medical image by enhancing and optimizing the features of CT images.

The application processes CT images with a wavelet filter neural network to improve sharpness and contrast; improves the semantic segmentation network to raise the accuracy of feature recognition and segmentation; adds a multi-scale feature pyramid network to further improve the accuracy of feature recognition; and adds conditional random field post-processing to improve the completeness of the output features.

The following description uses the identification and segmentation of individual structures in a lumbar CT image as an example.

Referring to Fig. 1, S1 improves the sharpness and contrast of the CT image with a wavelet filter neural network model.

The pydicom library is used to read DICOM files containing CT images. The CT images are then cleaned, denoised, scaled and standardized by a preprocessing function to obtain more reliable data.
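The scaling and standardization steps can be illustrated with a minimal sketch. It assumes the pixel data has already been read into a nested list; the function names `min_max_scale` and `standardize` and the toy pixel values are hypothetical and are not taken from the application.

```python
from statistics import mean, pstdev


def min_max_scale(pixels, lo=0.0, hi=1.0):
    """Linearly rescale a 2D list of pixel values into the range [lo, hi]."""
    flat = [v for row in pixels for v in row]
    p_min, p_max = min(flat), max(flat)
    span = (p_max - p_min) or 1.0  # guard against a completely flat image
    return [[lo + (v - p_min) * (hi - lo) / span for v in row] for row in pixels]


def standardize(pixels):
    """Shift and scale a 2D list of pixel values to zero mean and unit variance."""
    flat = [v for row in pixels for v in row]
    mu, sigma = mean(flat), pstdev(flat) or 1.0
    return [[(v - mu) / sigma for v in row] for row in pixels]


# Toy 2x2 "slice"; the values are illustrative only.
ct_slice = [[-1000, 0], [400, 1000]]
scaled = min_max_scale(ct_slice)
normalized = standardize(ct_slice)
```

Scaling bounds the intensity range, while standardization centres it, which is one common way to make slices from different scans comparable before they reach the network.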

Specifically, the application adopts a wavelet filter neural network model to improve the definition and contrast of the image, and can adjust different structure types according to the requirement.

The application adopts a wavelet filter neural network model optimized by a Mallat algorithm.

The Mallat algorithm is used because it has good time-frequency localization and stability and can effectively handle singularities in the wavelet transform. Its time-frequency localization lets it capture the local features of a signal, and it offers a high compression ratio and good reconstruction quality.

The flow of the Mallat algorithm is shown in Fig. 2:

S201: multi-resolution analysis of the Mallat algorithm, expressed as:

；

where x denotes the vector of sampling points, y denotes the filtered output vector of the sampling points, J is the scale parameter and represents the total number of scales, and k is the position parameter at the scale denoted by J. The two basis functions of the expression are the wavelet function at a given scale parameter and position parameter k, and the minimum-scale wavelet function at the given scale parameter.

S202: a wavelet transform is applied to the image to obtain its wavelet coefficients at different scales and positions, expressed as:

；

where the coefficient term denotes the wavelet coefficient at the given scale parameter and position parameter k.

S203: noise reduction and contrast enhancement are applied to the image using the above wavelet coefficients, expressed as:

；

S204: after the wavelet coefficients have been processed, the coefficients of all scales are combined back into an image using the inverse wavelet transform, expressed as:

；

where the two functions are the wavelet functions computed from the processed wavelet coefficients.
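The decompose, process and reconstruct flow of S201 to S204 can be sketched in one dimension. This is an illustration under stated assumptions rather than the application's model: the Haar wavelet, the soft-thresholding rule, and the `threshold` and `gain` parameters are all choices made here, since the application does not name a wavelet or give the coefficient-processing rule in this text.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One Mallat analysis step: approximation and detail at the next coarser scale."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert haar_step, interleaving the reconstructed samples."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def decompose(signal, levels):
    """Multi-resolution analysis: repeatedly split the approximation."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero; small (noise-like) ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def reconstruct(approx, details):
    """Inverse transform: merge the scales from coarsest to finest."""
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx


def denoise(signal, levels=2, threshold=0.5, gain=1.0):
    """Threshold the detail coefficients (noise reduction), scale the
    survivors by `gain` (contrast), then reconstruct.
    The signal length must be divisible by 2 ** levels."""
    approx, details = decompose(signal, levels)
    details = [[gain * c for c in soft_threshold(d, threshold)] for d in details]
    return reconstruct(approx, details)


row = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
restored = denoise(row, levels=3, threshold=0.0)  # no thresholding: exact inverse
cleaned = denoise(row, levels=2, threshold=1.0, gain=1.2)
```

For an image, the same step would be applied along rows and then columns at each level. With `threshold=0.0` and `gain=1.0` the round trip returns the input up to floating-point error, which is a convenient sanity check on any implementation.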

S2, recognizing and segmenting the focus of the CT image with improved definition and contrast by adopting a semantic segmentation network, and generating a focus image.

The application uses a modified DeepLabV3 semantic segmentation network, built on a fully convolutional network, to identify and segment lesions in lumbar CT images.

DeepLabV3 uses dilated (atrous) convolution, an ASPP module and end-to-end training to address blurred object edges and missing context information in semantic segmentation tasks.

The processing flow of DeepLabV3 is as follows:

S301: An image of the specified size is input.

Feature extraction is performed in the first convolution layer using dilated (atrous) convolution, which is chosen because it enlarges the receptive field while capturing multiple scales. The expression is as follows:
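The application's own expression is not reproduced in this text. As a separate illustration of why dilation enlarges the receptive field, the sketch below applies a one-dimensional dilated convolution; the function names and the toy signal are hypothetical.

```python
def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation whose kernel taps are `dilation` samples apart."""
    span = receptive_field(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = list(range(10))
dense = dilated_conv1d(signal, [1, 1, 1], dilation=1)   # each output sees 3 samples
sparse = dilated_conv1d(signal, [1, 1, 1], dilation=3)  # each output spans 7 samples
```

The same three weights cover a wider stretch of the input when the dilation grows, so context increases without adding parameters.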

S302: GroupNormalization (group normalization) is used to normalize the feature map.

GroupNormalization is used instead of batch normalization because, with a small batch size, batch statistics become unreliable and degrade training.

GroupNormalization does not depend on the batch size; it normalizes over groups of channels, which reduces the dependence on batch size. It also avoids the problems BatchNormalization has when the data distribution is uneven. The expression is as follows:

where the input term denotes the value of a given sample on a given channel, the mean and variance terms are those of all samples on the corresponding channel, and ϵ is a constant that avoids division by zero.
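A minimal sketch of group normalization follows. It uses the standard formulation, in which the mean and variance are computed for each sample over each group of channels; this grouping, the `eps` value and the omission of learnable scale and shift parameters are assumptions, since the application's expression is not reproduced in this text.

```python
import math


def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample laid out as [channel][pixel].

    Statistics are computed per group of channels, so the result does not
    depend on how many other samples are in the batch.
    """
    channels = len(sample)
    if channels % num_groups:
        raise ValueError("channel count must be divisible by num_groups")
    size = channels // num_groups
    out = []
    for g in range(num_groups):
        group = sample[g * size:(g + 1) * size]
        values = [v for ch in group for v in ch]
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)  # eps avoids division by zero
        out.extend([[(v - mu) * scale for v in ch] for ch in group])
    return out


# Four channels with two pixels each, split into two groups of two channels.
sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normed = group_norm(sample, num_groups=2)
```

Because the statistics come from a single sample, the output is the same for a batch of one as for a batch of sixty-four, which is the property the text relies on.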

S303: The input and output are added through a residual connection to obtain the mapping feature F1.

The first fully connected layer of the residual structure (the squeeze layer) is a 1x1 convolution that compresses the input feature map to a smaller dimension (the compression ratio is 8, so the input feature map is compressed to 1/8 of the original), followed by a ReLU activation function.

The second fully connected layer of the residual structure (the excitation layer) is a 1x1 convolution that expands the output of the squeeze layer. Because the first layer compresses by a ratio of 8, a mapping matrix is needed to map the compressed feature vector back to the original dimension.

The output is expanded to the same dimension as the number of channels of the input feature map, and a sigmoid function is applied to obtain the attention weight of each channel. The expression is as follows:

S304: The attention module of the Squeeze-and-Excitation Network (SE-Net) is used to improve the expressive power of the features and the performance of the model.

The convolved feature map is globally pooled to obtain a channel description vector, which is mapped through two fully connected layers to a new channel attention weight vector in which each element is the weight of the corresponding channel.

The attention weight vector is then reshaped by adding two new dimensions of length 1, so that the one-dimensional vector of length C becomes a three-dimensional tensor of size 1x1xC whose third dimension holds the attention weight of each channel. In effect, the one-dimensional vector is laid out along the channel dimension so that it can be multiplied element by element with each channel of the feature map.

Reshaping to a 1x1xC tensor allows element-by-element multiplication with the input feature map.

The weighted feature map Y is obtained by multiplying s element by element with the input feature map.
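The squeeze, excitation and reweighting steps can be sketched as follows. The weight matrices `w_squeeze` and `w_excite`, the toy reduction ratio of 2 (the text uses 8) and the absence of bias terms are assumptions made to keep the example small.

```python
import math


def se_attention(feature_map, w_squeeze, w_excite):
    """Squeeze-and-excitation on a feature map laid out as [C][H][W].

    w_squeeze has shape [C // r][C] and w_excite has shape [C][C // r],
    where r is the reduction ratio.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # First layer compresses the descriptor and applies ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(ws, z))) for ws in w_squeeze]
    # Second layer expands back to C channels; sigmoid yields weights in (0, 1).
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(we, hidden))))
         for we in w_excite]
    # Reweight: every pixel of channel c is multiplied by s[c].
    y = [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, feature_map)]
    return s, y


# Two channels of size 1x2 with illustrative weights.
x = [[[1.0, 3.0]], [[2.0, 2.0]]]
weights, weighted = se_attention(x, w_squeeze=[[0.5, -0.5]], w_excite=[[1.0], [-1.0]])
```

Multiplying channel c by the scalar `s[c]` is what the reshape to 1x1xC achieves in a tensor library through broadcasting.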

S305: Y is added to the input feature map X, a ReLU activation function is applied, and convolution and pooling operations are performed to obtain the mapping feature F2. The mapping feature F2 is the lesion image generated by the identification and segmentation of the application.

Specifically, the step of obtaining the mapping feature F2 includes:

An Atrous Spatial Pyramid Pooling (ASPP) module is constructed that includes 5 parallel dilated convolution branches and one average pooling branch.

Each branch uses a different dilation rate in its convolution in order to capture a different range of context information. The structure is as follows:

Branch 1: a 1x1 convolution, GroupNormalization, ReLU.

Branch 2: a 3x3 convolution with padding 12 and dilation 12, GroupNormalization, ReLU.

Branch 3: a 3x3 convolution with padding 24 and dilation 24, GroupNormalization, ReLU.

Branch 4: a 3x3 convolution with padding 36 and dilation 36, GroupNormalization, ReLU.

Branch 5: a pooling layer sized to the high-level features pools them to 1x1, followed by a 1x1 convolution, GroupNormalization and ReLU; bilinear interpolation then restores the input W and H.

The outputs of these 5 branches are concatenated along the channel dimension (Concat), and a final 1x1 convolutional layer further fuses the information.
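Concatenation along the channel dimension requires every branch to return the same spatial size. The short check below uses the standard convolution output-size formula to confirm that setting the padding equal to the dilation rate achieves this for a 3x3 kernel; the input size of 64 and a stride of 1 are assumptions for illustration.

```python
def conv_output_size(size, kernel, padding, dilation, stride=1):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Convolution settings of branches 1 to 4 as listed above.
branches = {
    "branch 1": dict(kernel=1, padding=0, dilation=1),
    "branch 2": dict(kernel=3, padding=12, dilation=12),
    "branch 3": dict(kernel=3, padding=24, dilation=24),
    "branch 4": dict(kernel=3, padding=36, dilation=36),
}

sizes = {name: conv_output_size(64, **cfg) for name, cfg in branches.items()}
```

Every entry of `sizes` equals the input size, whereas the same 3x3 kernel with dilation 12 and no padding would shrink a 64-pixel axis to 40.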

S306: The distance between the model output and the ground truth is computed with a regularized cross-entropy loss function, and the model parameters are updated with the back-propagation algorithm. The expression is as follows:
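The application's loss expression is not reproduced in this text. As an illustration only, the sketch below combines a per-pixel cross-entropy with an L2 penalty on the weights; the choice of L2 as the regularizer and the `weight_decay` value are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_cross_entropy(logits_per_pixel, labels, weights, weight_decay=1e-4):
    """Mean cross-entropy over pixels plus an L2 penalty on the model weights."""
    ce = -sum(
        math.log(softmax(logits)[label])
        for logits, label in zip(logits_per_pixel, labels)
    ) / len(labels)
    l2 = weight_decay * sum(w * w for w in weights)
    return ce + l2


# Two pixels, two classes (0 = background, 1 = lesion); values are illustrative.
loss = regularized_cross_entropy(
    logits_per_pixel=[[2.0, 0.5], [0.1, 1.5]],
    labels=[0, 1],
    weights=[0.3, -0.2, 0.1],
)
```

The penalty term grows with the magnitude of the weights, which discourages overfitting while the cross-entropy term pulls predictions toward the labels.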

S3: Stacking the lesion data of several consecutive, adjacent lesion images to construct lesion volume data with three-dimensional context information.

After ASPP has completed feature extraction, volume data with three-dimensional context information is constructed by stacking adjacent CT slices in different directions (for example, when CT images 1 to 10 are selected, the data of the first and third images are stacked for the second CT image).

Because lumbar lesions usually appear on several sections, using information from multiple sections can improve the accuracy of lesion detection. ASPP is used for feature extraction on this volume data. The stacking process yields a three-dimensional tensor in which each element corresponds to a pixel of the original CT image.
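The stacking of neighbouring slices around a target slice can be sketched as below. The function name `stack_adjacent`, the `radius` parameter and the clamping at the first and last slices are assumptions; the text only gives the example of the first and third images being stacked for the second.

```python
def stack_adjacent(slices, index, radius=1):
    """Return the slices around `index` as a [depth][H][W] volume.

    Indices that would fall outside the series are clamped to the nearest
    valid slice, so the depth is always 2 * radius + 1.
    """
    last = len(slices) - 1
    picks = [min(max(index + offset, 0), last) for offset in range(-radius, radius + 1)]
    return [slices[i] for i in picks]


# Ten toy 2x2 slices; slice i is filled with the value i.
slices = [[[i, i], [i, i]] for i in range(10)]

# Volume for the second image (index 1): its neighbours are the first and third.
volume = stack_adjacent(slices, index=1)
```

Each element `volume[d][y][x]` still refers to pixel (y, x) of one original slice, which matches the pixel correspondence described above.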

S4: Extracting lesion features from the lesion volume data with a multi-scale feature pyramid network, post-processing and optimizing the lesion features with a conditional random field method, and outputting the recognition and segmentation results for the lesion features. The result comprises the feature information and category information of the lesion features.

Features are extracted with a multi-scale feature pyramid network (FPN) to improve the discrimination of lumbar lesions. The FPN is a convolutional neural network used to extract image features. Based on the feature representations of different levels and scales extracted layer by layer, the feature pyramid is compressed into a fixed-size vector by global pooling.

The FPN is designed to exploit the fact that, in object detection tasks, feature representations at different scales and levels each contribute to accuracy.

The FPN network comprises three main modules, namely a low-dimensional feature layer (landalayers), a high-dimensional feature layer (Top-down path) and a parallel connection (featurefile).

The parallel connection layer fuses the lesion feature representations of different levels and scales to improve detection accuracy.

The three-dimensional tensor is further processed by the FPN as follows:

First, in the high-dimensional feature layer, the input three-dimensional tensor is converted into several two-dimensional feature maps of different scales by layer-by-layer downsampling.

Then, in the parallel connection layer, the feature maps of different scales (the feature representations) are fused to obtain a feature representation rich in context information.

Finally, in the global pooling layer, the fused feature maps are globally pooled to obtain a fixed-size vector representation.

This vector representation contains feature information extracted from different levels and scales and can be used for the final lumbar lesion classification task.
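The fusion and pooling steps can be sketched on toy feature maps. The fusion rule used here, nearest-neighbour upsampling of the coarser map followed by element-wise addition, is the usual feature pyramid choice and is an assumption, since the text does not specify how the scales are combined.

```python
def upsample2x(feature_map):
    """Nearest-neighbour upsampling of an [H][W] map by a factor of two."""
    out = []
    for row in feature_map:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(fine, coarse):
    """Add a coarse map, upsampled to the fine resolution, to the fine map."""
    up = upsample2x(coarse)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fine, up)]


def global_average_pool(channels):
    """Collapse each [H][W] map to its mean: one value per channel."""
    return [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in channels]


fine = [[0.0] * 4 for _ in range(4)]   # 4x4 map from a shallow level
coarse = [[1.0, 2.0], [3.0, 4.0]]      # 2x2 map from a deeper level
fused = fuse(fine, coarse)
vector = global_average_pool([fused, coarse])
```

The pooled vector has one entry per channel regardless of the spatial size of each map, which is why maps of different scales can feed a fully connected classifier of fixed input size.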

After feature extraction, the feature vector is input to a fully connected (FC) layer for classification, which outputs a probability value for each category. Predictions are made by comparing the output probability with a threshold (a value between 0 and 1; the higher the probability, the more likely the category), giving the object category, position and confidence. Non-maximum suppression (NMS) is then applied for de-duplication and filtering, and the predicted category is output.
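The thresholding and non-maximum suppression steps can be sketched as follows. The box format `(x1, y1, x2, y2)`, the two threshold values and the per-class suppression are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold=0.5, iou_threshold=0.5):
    """Keep confident detections and drop overlapping duplicates of the same class.

    Each detection is a (box, label, score) tuple.
    """
    candidates = sorted(
        (d for d in detections if d[2] >= score_threshold),
        key=lambda d: d[2],
        reverse=True,
    )
    kept = []
    for box, label, score in candidates:
        # Keep the box unless a higher-scoring box of the same class overlaps it.
        if all(label != k[1] or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, label, score))
    return kept


detections = [
    ((0, 0, 10, 10), "lesion", 0.9),
    ((1, 1, 10, 10), "lesion", 0.8),   # near-duplicate of the first box
    ((20, 20, 30, 30), "lesion", 0.7),
    ((0, 0, 5, 5), "lesion", 0.3),     # below the score threshold
]
kept = filter_detections(detections)
```

Sorting by score first means the most confident box in each cluster survives and its weaker overlaps are removed.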

Further, a conditional random field (CRF) method is used for segmentation post-processing and optimization, in order to remove noise, fill holes and smooth segmentation boundaries.

For image segmentation, CRF post-processing labels the pixels of the image, assigns them to two or more categories, and segments the image according to those categories. Specifically, the image can be represented as a grid graph in which each node is a pixel labelled as belonging to a certain class.

In this process the CRF model models the dependencies between pixels. A CRF model typically learns the interactions between pixels and predicts the label of each pixel from the labels of other pixels in the image.

In the CRF model, feature functions can be constructed to represent the correlation between pixels. They may be defined from information such as the distance between pixels, gray values and texture, and are used to score each candidate pixel labelling during segmentation. The CRF model is then solved with a message-passing algorithm to find the best set of pixel labels, which removes noise, fills holes and smooths the segmentation boundaries.
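The effect of pairwise dependencies on a grid graph can be illustrated with a small sketch. Two simplifications are assumed here and are not the application's method: the pairwise term is a Potts penalty that charges a fixed cost for each disagreeing 4-neighbour, and inference uses iterated conditional modes (ICM) rather than a message-passing solver.

```python
def icm_smooth(unary, pairwise_weight=1.0, sweeps=5):
    """Greedy label smoothing on a 4-connected grid.

    unary[y][x][label] is the cost of assigning `label` to pixel (y, x).
    """
    h, w, n = len(unary), len(unary[0]), len(unary[0][0])
    # Start from the labelling that minimizes the unary cost alone.
    labels = [[min(range(n), key=lambda l: unary[y][x][l]) for x in range(w)]
              for y in range(h)]
    for _ in range(sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                neighbours = [
                    labels[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w
                ]

                def cost(l):
                    # Unary cost plus a penalty for each disagreeing neighbour.
                    return unary[y][x][l] + pairwise_weight * sum(
                        l != nb for nb in neighbours)

                best = min(range(n), key=cost)
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:
            break
    return labels


# 3x3 region that clearly prefers label 1, with a weak "hole" at the centre.
unary = [[[1.0, 0.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.4, 0.6]
smoothed = icm_smooth(unary)
```

With the pairwise term active the isolated centre pixel is relabelled to agree with its neighbours, which is the hole-filling behaviour described above; with `pairwise_weight=0.0` the hole remains.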

Furthermore, common metrics such as the Dice coefficient, precision, recall and mAP0.5:0.95 can be used to evaluate the accuracy, robustness and speed of the segmentation algorithm, and the performance and generalization ability of the model can be further improved through hyperparameter tuning, data augmentation and ensemble learning.

Evaluation index:

Dice: the Dice coefficient of every test sample in the test set is computed and averaged to give mDice. Dice is one of the evaluation metrics for semantic segmentation and measures the accuracy of the segmentation result.

P: the precision of lesion category determination over all test samples in the test set.

R: the recall of lesion category determination over all test samples in the test set.

mAP0.5:0.95: the mean average precision over all test samples in the test set, averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95).
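These metrics can be sketched on binary masks. The flat 0/1 list layout and the helper names are assumptions, and `mean_over_thresholds` only shows the final averaging step of mAP0.5:0.95, taking the per-threshold average precision values as given.

```python
def dice(pred, truth):
    """Dice coefficient of two binary masks given as flat lists of 0 and 1."""
    inter = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * inter / total if total else 1.0


def precision_recall(pred, truth):
    """Precision and recall computed from true/false positives and negatives."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# IoU thresholds 0.5, 0.55, ..., 0.95 used by mAP0.5:0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_over_thresholds(ap_at):
    """Average per-threshold AP values; `ap_at` maps threshold -> AP."""
    return sum(ap_at[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


pred = [1, 1, 0, 0]
truth = [1, 0, 1, 0]
score = dice(pred, truth)
p, r = precision_recall(pred, truth)
```

mDice is then the mean of `dice` over all test samples.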

The experimental results are as follows:

mDice is 0.9068268537521362, P is 0.993, R is 0.9949, and mAP0.5:0.95 is 0.8509.

The experimental results show that the application achieves a high Dice score.

In addition, techniques such as GPU parallel computing and GPU memory management strategies can be used to accelerate model training and inference, improving computational efficiency and response speed.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A lesion feature identification and segmentation method based on a full neural network, characterized by comprising the following steps:

S1: improving the sharpness and contrast of a CT image with a wavelet filter neural network model, which decomposes a signal or image into wavelet coefficients at different scales and positions based on the Mallat algorithm, performs a wavelet transform to obtain the wavelet coefficients at different scales and positions, and uses the wavelet coefficients for noise reduction and contrast enhancement of the image;

2. The lesion feature recognition and segmentation method according to claim 1, wherein the wavelet filter neural network model employs a Mallat algorithm:

;

wherein x denotes the vector of sampling points, y denotes the filtered output vector of the sampling points, J is the scale parameter, and k is the position parameter at the scale denoted by J; the two basis functions of the expression are the wavelet function at a given scale parameter and position parameter k, and the minimum-scale wavelet function at the given scale parameter.

3. The full neural network-based lesion feature recognition and segmentation method according to claim 1, wherein the semantic segmentation network comprises:

extracting the lesion with dilated (atrous) convolution;

4. The lesion feature identification and segmentation method based on the full neural network according to claim 3, wherein the lesion images are stacked to obtain a three-dimensional tensor, and each element of the three-dimensional tensor corresponds to a pixel of the original CT image.

5. The full neural network-based lesion feature recognition and segmentation method according to claim 1, wherein the multi-scale feature pyramid network comprises: a low-dimensional feature layer, a high-dimensional feature layer and a parallel connection layer;

6. The method for identifying and segmenting lesion features based on an all-neural network according to claim 5, further comprising:

7. The method for identifying and segmenting focal features based on the full neural network according to claim 1, wherein the processing and optimizing the focal features by using a conditional random field method comprises:

8. The method for identifying and segmenting focal features based on the full neural network according to any one of claims 1 to 7, further comprising:

performance evaluation: the metrics comprise the Dice coefficient, precision, recall and mAP0.5:0.95.

9. The focus characteristic recognition and segmentation system based on the full neural network is characterized by comprising a reading module and a processing module;

the processing module uses a wavelet filter neural network model to improve the sharpness and contrast of a CT image, wherein the wavelet filter neural network model decomposes a signal or image into wavelet coefficients at different scales and positions based on the Mallat algorithm, performs a wavelet transform to obtain the wavelet coefficients at different scales and positions, and uses the wavelet coefficients for noise reduction and contrast enhancement of the image; identifies and segments lesions in the enhanced CT image with a semantic segmentation network to generate lesion images; stacks the lesion images based on the adjacency of the lesions in adjacent lesion images to construct lesion volume data with three-dimensional context information; and extracts lesion features from the lesion volume data with a multi-scale feature pyramid network, post-processes and optimizes the lesion features with a conditional random field method, and outputs the recognition and segmentation results for the lesion features.