CN117496280B - Craniocerebral CT image quality control method based on 3D convolution and multi-label decoding - Google Patents

Craniocerebral CT image quality control method based on 3D convolution and multi-label decoding

Info

Publication number
CN117496280B
CN117496280B (application CN202410004772.6A)
Authority
CN
China
Prior art keywords
craniocerebral
image
attention
inputting
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410004772.6A
Other languages
Chinese (zh)
Other versions
CN117496280A (en)
Inventor
江波
张鑫
李传富
李淑芳
宣寒宇
汤进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
First Affiliated Hospital of AHUTCM
Original Assignee
Anhui University
First Affiliated Hospital of AHUTCM
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University, First Affiliated Hospital of AHUTCM filed Critical Anhui University
Priority to CN202410004772.6A
Publication of CN117496280A
Application granted
Publication of CN117496280B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/03 - Recognition of patterns in medical or anatomical images
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 - Computing systems specially adapted for manufacturing

Abstract

The invention relates to a craniocerebral CT image quality control method based on 3D convolution and multi-label decoding. The method comprises the following steps: acquiring a craniocerebral CT sequence image to be evaluated; inputting the craniocerebral CT sequence image into the 3D convolutional network of a trained image quality evaluation model to extract its spatiotemporal features; inputting the spatiotemporal features into the Transformer-based multi-label decoder of the image quality evaluation model to obtain query features; and inputting the query features into the linear classifier of the image quality evaluation model to predict the quality problems of the craniocerebral CT sequence image to be evaluated. The invention addresses the multi-label classification problems of low model efficiency and imbalanced data, and offers a new direction for craniocerebral CT image quality control.

Description

Craniocerebral CT image quality control method based on 3D convolution and multi-label decoding
Technical Field
The invention relates to the field of computer image processing, in particular to a craniocerebral CT image quality control method based on 3D convolution and multi-label decoding.
Background
Craniocerebral computed tomography (CT) is a gold-standard medical imaging technique, widely relied upon in medicine and particularly in the diagnosis and study of brain diseases. As the technology has advanced, the quality and resolution of CT images have improved markedly. However, accurately capturing key information from a series of CT sequence images and performing multi-label classification remains a challenge to overcome. Traditional craniocerebral CT interpretation relies largely on the trained eye of the physician; it is laborious, inefficient, and susceptible to differences in physician experience, with attendant risks of misreading or omission. While deep learning has brought revolutionary breakthroughs to 2D medical image analysis, the technical challenges posed by 3D images, and by complex spatiotemporal CT sequences in particular, are not yet fully resolved because of their high-dimensional structure and characteristics.
In recent years, the Transformer architecture has attracted wide attention in natural image processing, but its potential in the medical imaging field is still at a preliminary, exploratory stage. In particular, how to integrate it elegantly with a 3D convolutional network so as to fully mine spatiotemporal features and achieve adaptive feature extraction and classification is a promising research direction.
In addition, data imbalance has long been a troublesome problem in multi-label tasks. Faced with imbalanced labels, conventional methods are easily biased toward the majority classes, thereby ignoring key information in the minority classes. How to introduce category-specific weights so that every label is treated fairly, achieving higher classification accuracy and stability, is therefore a pressing problem. Accordingly, there is a need for a craniocerebral CT image quality control method based on 3D convolution and multi-label decoding.
Disclosure of Invention
The invention provides a craniocerebral CT image quality control method based on 3D convolution and multi-label decoding, which solves the problem that the prior art cannot accurately extract features from CT images and perform multi-label classification on them.
The invention provides a craniocerebral CT image quality control method based on 3D convolution and multi-label decoding, which comprises the following steps: acquiring a craniocerebral CT sequence image to be evaluated; inputting the craniocerebral CT sequence image to be evaluated into the 3D convolutional network of a trained image quality evaluation model, and extracting the spatiotemporal features of the image; inputting the spatiotemporal features into the Transformer-based multi-label decoder of the image quality evaluation model to obtain query features; and inputting the query features into the linear classifier of the image quality evaluation model to predict the quality problems of the craniocerebral CT sequence image to be evaluated.
In an embodiment of the present invention, inputting the spatiotemporal features into the Transformer-based multi-label decoder of the image quality evaluation model to obtain the query features includes: inputting the spatiotemporal features into the multi-head attention module of the multi-label decoder, and processing them with the query vector of the multi-head attention module to obtain an attention matrix; and inputting the attention matrix into the feed-forward layer of the multi-label decoder, and applying a linear transformation to the attention matrix to obtain the query features.
In an embodiment of the present invention, the multi-head attention module includes a plurality of attention heads, and inputting the spatiotemporal features into the multi-head attention module of the multi-label decoder and processing them with its query vector to obtain an attention matrix includes: inputting the spatiotemporal features into the multi-head attention module; for each attention head, processing the spatiotemporal features with the query vector and the corresponding weights to obtain the output of the current attention head; and concatenating the outputs of all attention heads to obtain the attention matrix.
In an embodiment of the present invention, the output of the current attention head is:

$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q,\; EW_i^K,\; EW_i^V)$

where $\mathrm{head}_i$ is the output of the $i$-th attention head, $\mathrm{Attention}(\cdot)$ computes the attention weights and generates the head output, $Q$ is the query vector, $E$ is the spatiotemporal feature, and $W_i^Q$, $W_i^K$, $W_i^V$ are the query, key, and value weight matrices of the $i$-th attention head, respectively.
In an embodiment of the present invention, inputting the query features into the linear classifier of the image quality evaluation model to predict the quality problems of the craniocerebral CT sequence image includes: inputting the query features into the linear classifier and linearly projecting them to obtain a probability value for each quality problem of the craniocerebral CT sequence image; and selecting the quality problems whose probability values exceed a preset threshold as the predicted quality problems of the craniocerebral CT sequence image.
In an embodiment of the present invention, the image quality evaluation model is obtained through training, wherein the training samples are multiple sets of labeled craniocerebral CT sequence images, and the parameters of the image quality evaluation model are adjusted based on the degree of difference between the predicted quality problems and the corresponding labels.
In an embodiment of the present invention, before inputting the craniocerebral CT sequence images serving as training samples into the image quality evaluation model, the method further comprises performing data enhancement on them, namely applying an enhancement operation to each slice image of the craniocerebral CT sequence image serving as a training sample, where the enhancement operation includes rotation by different preset angles, image scaling, image color enhancement, and image contrast adjustment.
In an embodiment of the present invention, before the enhancement operation is applied to each slice image of the craniocerebral CT sequence image serving as a training sample, the method further comprises cropping each slice image of the craniocerebral CT sequence image according to a preset side-length requirement.
In an embodiment of the present invention, the parameter adjustment method of the image quality evaluation model includes: calculating the degree of difference between the predicted quality problems and the corresponding labels according to a preset loss function; and, based on gradient descent, using backpropagation to adjust the parameters of the image quality evaluation model layer by layer according to the degree of difference.
In one embodiment of the present invention, the loss function $\mathcal{L}(x)$ is:

$\mathcal{L}(x) = -\big[\alpha\, y \log f(x) + \beta\,(1 - y) \log(1 - f(x))\big]$

where $\alpha$ is the ratio of the number of negative-class samples to the total number of samples, $\beta$ is the ratio of the number of positive-class samples to the total number of samples, $y$ is the label value, and $f(x)$ is the prediction result.
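As a minimal sketch (not the patent's own code), the class-weighted loss above can be written in NumPy; the function and variable names are illustrative:

```python
import numpy as np

def weighted_bce(y_true, y_pred, eps=1e-7):
    """Class-frequency-weighted binary cross-entropy per label.

    Positive terms are weighted by the negative-sample ratio and
    negative terms by the positive-sample ratio, so rare labels
    contribute comparably to frequent ones."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    pos_ratio = y_true.mean(axis=0)   # share of positive samples per label
    neg_ratio = 1.0 - pos_ratio       # share of negative samples per label
    loss = -(neg_ratio * y_true * np.log(y_pred)
             + pos_ratio * (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(loss.mean())
```

With this weighting, a perfect prediction yields a loss near zero, while an uninformative constant prediction of 0.5 is penalized equally across frequent and rare labels.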
The invention provides a craniocerebral CT image quality control method based on 3D convolution and multi-label decoding that combines deep learning with traditional image processing techniques to effectively extract the spatiotemporal features of craniocerebral CT images. By introducing a 3D convolutional network, the three-dimensional structural information of the image is fully captured while computational complexity and resource consumption are reduced. In addition, the Transformer multi-label decoder is used for query updating, effectively strengthening the identification and processing of key features. The invention offers an efficient, accurate, and robust method for craniocerebral CT image analysis, and effectively solves the inability of the prior art to accurately extract CT image features and perform multi-label classification.
Drawings
Fig. 1 is a flow chart of a craniocerebral CT image quality control method based on 3D convolution and multi-label decoding according to an embodiment of the present invention;
fig. 2 is a diagram showing a structure of an image quality evaluation model in an embodiment of the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. The invention may also be practiced or applied in other, different embodiments, and the details of this description may be modified or varied in various ways without departing from the spirit and scope of the invention. It should be noted that, absent conflict, the following embodiments and the features within them may be combined with one another.
It should be noted that the illustrations provided with the following embodiments merely illustrate the basic concept of the invention schematically: the drawings show only the components related to the invention, not the actual number, shape, and size of components in implementation. In actual implementation, the form, quantity, and proportion of each component may vary arbitrarily, and the component layout may be more complicated.
In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the invention. It will be apparent, however, to one skilled in the art that embodiments of the invention may be practiced without these specific details. In other cases, well-known structures and devices are shown in block-diagram form rather than in detail, to avoid obscuring the embodiments of the invention.
The existing CT multi-label quality control models have the following problems. (1) Despite intensive research on and application of 2D images, owing to the high dimensionality and complexity of CT images it remains a great challenge to effectively augment 3D image data so as to improve model generalization, and to effectively extract CT image features for multi-label classification. (2) Although the Transformer has demonstrated its effectiveness in many fields, in existing CT image processing models its capability for adaptive feature extraction may not be fully exploited. (3) Existing models may struggle with multi-label classification, particularly in the face of data imbalance: without class-specific weighting factors, the recognition rate for certain classes may be low.
To solve these problems, a craniocerebral CT image quality control method based on 3D convolution and multi-label decoding is provided, combining deep learning with traditional image processing techniques to effectively extract the spatiotemporal features of craniocerebral CT images. By introducing a 3D convolutional network, the three-dimensional structural information of the image is fully captured while computational complexity and resource consumption are reduced. In addition, the Transformer multi-label decoder is used for query updating, effectively strengthening the identification and processing of key features. When handling imbalanced datasets, the loss-function strategy ensures that each class contributes in a balanced way to model training, greatly improving performance on rare classes. Overall, the invention provides an efficient, accurate, and robust method for craniocerebral CT image analysis.
Referring to fig. 1 and 2, the craniocerebral CT image quality control method based on 3D convolution and multi-label decoding includes the following steps:
s1, acquiring a craniocerebral CT sequence image to be evaluated;
s2, inputting the craniocerebral CT sequence image to be evaluated into a 3D convolution network of a trained image quality evaluation model, and extracting space-time characteristics of the craniocerebral CT sequence image to be evaluated;
s3, inputting the space-time characteristics to a multi-label decoder of the image quality evaluation model based on a transducer to obtain query characteristics;
s4, inputting the query characteristics into a linear classifier of the image quality evaluation model, and predicting quality problems of the craniocerebral CT sequence images to be evaluated.
The following details the steps:
s1, acquiring a craniocerebral CT sequence image to be evaluated.
After a patient is scanned by a CT scanner, a three-dimensional sequence of contiguous craniocerebral CT images is produced. These contiguous images are processed by uniform sampling to obtain a craniocerebral CT sequence image covering the skull base to the vertex. The sequence comprises T craniocerebral slice images (T is not limited; e.g., T = 40, 41, 42, ...), each representing a cross-section at a different position of the cranium. For example, a craniocerebral CT sequence image X has dimensions T×C×H×W, where T is the sequence length (the number of slice images contained in the sequence), C is the number of channels per slice image, and H and W are the height and width of the slice images.
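A minimal sketch of the uniform sampling step described above (the function name and the use of NumPy are illustrative, not from the patent):

```python
import numpy as np

def uniform_sample_indices(num_slices, t):
    """Pick t slice indices evenly spaced over a scan of num_slices
    contiguous slices, so the sequence covers skull base to vertex."""
    return np.linspace(0, num_slices - 1, num=t).round().astype(int)
```

For instance, `uniform_sample_indices(120, 40)` keeps the first and last slice and spaces the remaining 38 indices evenly in between.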
S2, inputting the craniocerebral CT sequence image to be evaluated into the 3D convolutional network of the trained image quality evaluation model, and extracting the spatiotemporal features of the image.
The craniocerebral CT sequence image is input into the 3D convolutional network, and features are extracted along the spatial and temporal dimensions to obtain the spatiotemporal feature E of the sequence, whose dimensions are h×w×D, where h is the feature height, w the feature width, and D the feature dimension. The specifics of the 3D convolutional network are as follows. The network comprises ten 3D convolutional layers and six 3D pooling layers. Each convolutional layer uses 3×3×3 filters with stride 1×1×1, a configuration suited to extracting complex spatiotemporal features. The first pooling layer uses a 1×2×2 kernel with stride 1×2×2, a design that deliberately retains temporal information in the early stages of processing. The second through fourth pooling layers uniformly use 2×2×2 kernels with stride 2×2×2, effectively reducing the spatial dimensions while preserving key feature representations. The fifth pooling layer differs slightly, using a 3×2×2 kernel with stride 2×2×2 and 0×1×1 padding to satisfy the requirements of the network architecture. Finally, the sixth pooling layer uses a 2×1×1 kernel with stride 2×1×1, optimizing feature-map reduction for subsequent layers.
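The layer sizes can be sanity-checked with a small shape tracer. This is a sketch: the missing middle factors of the kernel/stride/padding triples are assumed readings of the text, and the helper names are illustrative.

```python
def out_len(n, k, s, p=0):
    # output length along one dimension: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

def trace_pooling(t, h, w):
    """Trace (T, H, W) through the six pooling layers; the 3x3x3
    convolutions use stride 1 with padding 1, so they preserve shape."""
    t, h, w = out_len(t, 1, 1), out_len(h, 2, 2), out_len(w, 2, 2)       # pool1: 1x2x2
    for _ in range(3):                                                    # pool2-pool4: 2x2x2
        t, h, w = out_len(t, 2, 2), out_len(h, 2, 2), out_len(w, 2, 2)
    t = out_len(t, 3, 2)                                                  # pool5: 3x2x2, stride 2,
    h, w = out_len(h, 2, 2, 1), out_len(w, 2, 2, 1)                       #   padding 0x1x1 (assumed)
    t = out_len(t, 2, 2)                                                  # pool6: 2x1x1
    return t, h, w
```

Under these assumptions, a 40×512×512 input collapses to a single temporal step with a small spatial feature map, matching the h×w×D feature described in the text.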
S3, inputting the spatiotemporal features into the Transformer-based multi-label decoder of the image quality evaluation model to obtain query features.
The spatiotemporal features are input into the multi-label decoder, and the cross-attention module of the Transformer multi-label decoder adaptively mines object features. Multi-head attention effectively decomposes the representation of an object into multiple parts or views, improving classification accuracy and enhancing model interpretability. The resulting query features therefore reflect the corresponding characteristics of the feature map more accurately.
Specifically, in one embodiment of the present invention, inputting the spatiotemporal features into the Transformer-based multi-label decoder of the image quality evaluation model to obtain the query features includes:

inputting the spatiotemporal features into the multi-head attention module of the multi-label decoder, and processing them with the query vector of the multi-head attention module to obtain an attention matrix;

and inputting the attention matrix into the feed-forward layer of the multi-label decoder, and applying a linear transformation to the attention matrix to obtain the query features.
The spatiotemporal features are input into the multi-head attention module. Because the module contains multiple attention heads, each head processes the spatiotemporal features according to a preset attention mechanism, and the outputs of the heads are concatenated to obtain the attention matrix. Considering that the attention mechanism alone may not fit complex processes well, yielding a weak model, the invention feeds the attention matrix into a feed-forward layer (i.e., a feed-forward neural network), a fully connected network composed of two linear layers; the added capacity of this two-layer network strengthens the robustness and accuracy of the model.
Specifically, in one embodiment of the present invention, the multi-head attention module includes a plurality of attention heads, and inputting the spatiotemporal features into the multi-head attention module of the multi-label decoder and processing them with its query vector to obtain an attention matrix includes:

inputting the spatiotemporal features into the multi-head attention module;

for each attention head: processing the spatiotemporal features with the query vector and the corresponding weights to obtain the output of the current attention head;

and concatenating the outputs of all attention heads to obtain the attention matrix.
After the spatiotemporal features are input into the multi-head attention module, the output of each attention head is given by equation (1):

$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q,\; EW_i^K,\; EW_i^V)$ (1)

where $\mathrm{head}_i$ is the output of the $i$-th attention head, $\mathrm{Attention}(\cdot)$ computes the attention weights and generates the head output, $Q$ is the query vector, $E$ is the spatiotemporal feature, and $W_i^Q$, $W_i^K$, $W_i^V$ are the query, key, and value weight matrices of the $i$-th attention head, respectively. After model training is completed, the parameters of the query vector and of the query, key, and value weight matrices are obtained. Specifically, the $\mathrm{Attention}$ function is computed as in equation (2):

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\dfrac{QK^{T}}{\sqrt{d_k}}\right) V$ (2)

where $K$ is the key vector, $V$ is the value vector, $d_k$ is the dimension of the key vector, $\mathrm{softmax}$ is the normalization function, and $T$ denotes transposition. After each attention head computes its output according to equations (1) and (2), the outputs of all heads are concatenated according to equation (3) to obtain the attention matrix:

$A = \mathrm{MultiHeadAttn}(Q, E)$ (3)

where $A$ is the output of the multi-head attention mechanism, i.e., the attention matrix, and $\mathrm{MultiHeadAttn}$ is the multi-head attention mechanism, composed of several heads, each an independent attention mechanism. Specifically, $\mathrm{MultiHeadAttn}$ is computed as in equation (4):

$\mathrm{MultiHeadAttn}(Q, E) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}$ (4)

where $\mathrm{Concat}$ is the concatenation function, $h$ is the total number of attention heads, and $W^{O}$ is a linear-transformation weight matrix that maps the concatenated head outputs to one fixed-size output. The multi-head attention output $A$ obtained from equation (3) is input into the feed-forward layer and linearly transformed according to equations (5) and (6) to obtain the query features Q:

$\mathrm{FF}(x) = W_2\, \sigma(W_1 x + b_1) + b_2$ (5)

$Q = \mathrm{FF}(A)$ (6)

where $Q$ is the query feature, $\mathrm{FF}(x)$ is the computation of the feed-forward layer, $\sigma$ is the activation between the two linear layers, and $W_1$, $b_1$, $W_2$, $b_2$ are the weights and biases of the two neural-network layers, respectively.
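A runnable NumPy sketch of the multi-head cross-attention and feed-forward computation (a simplification: the shapes are illustrative, and ReLU is assumed as the feed-forward activation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(Q, E, Wq, Wk, Wv, Wo):
    """n label queries Q attend to the flattened spatiotemporal
    feature E; Wq/Wk/Wv hold one projection matrix per head."""
    heads = []
    for Wq_i, Wk_i, Wv_i in zip(Wq, Wk, Wv):
        q, k, v = Q @ Wq_i, E @ Wk_i, E @ Wv_i
        attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # scaled dot-product attention
        heads.append(attn @ v)                          # one head's output
    return np.concatenate(heads, axis=-1) @ Wo          # concatenate and project

def feed_forward(x, W1, b1, W2, b2):
    # two linear layers with a ReLU in between (activation assumed)
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

rng = np.random.default_rng(0)
n, d, m, h, dh = 15, 64, 256, 4, 16   # 15 label queries, 256 feature tokens, 4 heads
Q = rng.normal(size=(n, d))
E = rng.normal(size=(m, d))
Wq = [rng.normal(size=(d, dh)) for _ in range(h)]
Wk = [rng.normal(size=(d, dh)) for _ in range(h)]
Wv = [rng.normal(size=(d, dh)) for _ in range(h)]
Wo = rng.normal(size=(h * dh, d))
A = multi_head_cross_attention(Q, E, Wq, Wk, Wv, Wo)
W1 = rng.normal(size=(d, 4 * d)); b1 = np.zeros(4 * d)
W2 = rng.normal(size=(4 * d, d)); b2 = np.zeros(d)
Q_new = feed_forward(A, W1, b1, W2, b2)   # updated query features
```

Each of the n query rows ends up as a d-dimensional query feature, one per quality-problem category, ready for the linear classifier.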
In the training stage, multiple sets of labeled craniocerebral CT sequence images are used as training samples. During training, the label embeddings are used as the initial query vector $Q \in \mathbb{R}^{n \times d}$, where n is the number of distinct quality-problem categories in the labels and d is the dimension of the initial query vector. The spatiotemporal features output by the 3D convolutional network serve as the image embedding, i.e., as the key and value vectors of the multi-label decoder. The output of the multi-head attention module computed according to equations (1) through (6) is then used as the initial query vector for the next training iteration, and the iteration is repeated until model training is complete.
S4, inputting the query features into the linear classifier of the image quality evaluation model, and predicting the quality problems of the craniocerebral CT sequence image to be evaluated.
The query features are input into the linear classifier, which performs a classification query on them to obtain the quality problems of the craniocerebral CT sequence image. There may be zero, one, or several quality problems, thereby achieving multi-label classification of the craniocerebral CT sequence image.
Specifically, in one embodiment of the present invention, inputting the query features into the linear classifier of the image quality evaluation model to predict the quality problems of the craniocerebral CT sequence image includes:

inputting the query features into the linear classifier and linearly projecting them to obtain a probability value for each quality problem of the craniocerebral CT sequence image;

and selecting the quality problems whose probability values exceed a preset threshold as the predicted quality problems of the craniocerebral CT sequence image.
In this embodiment, the preset threshold is 0.5: the quality problems with probability values greater than 0.5 are selected as the quality problems of the craniocerebral CT sequence image to be evaluated.
In an embodiment of the present invention, the image quality evaluation model is obtained through training, wherein the training samples are multiple sets of labeled craniocerebral CT sequence images, and the parameters of the model are adjusted based on the degree of difference between the predicted quality problems and the corresponding labels. The training samples must first be labeled, as follows: the problems present in each craniocerebral CT sequence image are judged manually and the image is annotated accordingly, yielding a labeled craniocerebral CT sequence image, where the label characterizes the quality problems present in the image. Note that the specific content of the quality problems is not limited; those skilled in the art can adapt it to actual needs, and no limitation is imposed here. In one embodiment of the invention, the quality problems include, but are not limited to: scan position, X-axis, adduction; scan position, X-axis, abduction; scan position, Y-axis, tilted left; scan position, Y-axis, tilted right; scan position, Z-axis, rotated left; scan position, Z-axis, rotated right; image off-center; scan range too large, below the skull base; scan range too large, above the vertex; scan range too small, above the skull base; scan range too small, below the vertex; image artifact, motion artifact; image artifact, foreign-body artifact, earrings; image artifact, foreign-body artifact, dentures; image artifact, extracorporeal foreign-body artifact, other.
Since a craniocerebral CT sequence image may have no quality problem, exactly one, or several, its label is represented as a list in which 1 indicates that a quality problem is present and 0 that it is absent. Illustratively, with five quality-problem categories (image off-center; scan range too large, below the skull base; scan range too large, above the vertex; scan range too small, above the skull base; scan range too small, below the vertex), the label [1,0,0,1,0] indicates that the calibrated quality problems of the craniocerebral CT sequence image are: image off-center, and scan range too small, above the skull base.
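A small sketch of the thresholding step using the five example categories above (category names paraphrased; the function name is illustrative):

```python
LABELS = [
    "image off-center",
    "scan range too large, below the skull base",
    "scan range too large, above the vertex",
    "scan range too small, above the skull base",
    "scan range too small, below the vertex",
]

def predict_problems(probs, threshold=0.5):
    """Return the binary label list and the named quality problems
    whose predicted probability exceeds the threshold."""
    binary = [1 if p > threshold else 0 for p in probs]
    names = [lab for lab, b in zip(LABELS, binary) if b]
    return binary, names
```

For example, probabilities [0.9, 0.1, 0.2, 0.8, 0.3] yield the binary list [1,0,0,1,0], matching the example label above.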
Regarding model training, in an embodiment of the present invention, before the craniocerebral CT sequence images serving as training samples are input into the image quality evaluation model, the method further comprises: performing data enhancement on the craniocerebral CT sequence images serving as training samples, the data enhancement comprising: performing an enhancement operation on each slice image in the craniocerebral CT sequence image serving as a training sample, wherein the enhancement operation includes rotation by different preset angles, image scaling, image color enhancement, and image contrast adjustment. Data enhancement before training increases the data volume and improves the robustness of the model. It should be noted that data enhancement is not limited to the above four methods; those skilled in the art may adapt it based on actual needs, and no limitation is imposed herein.
Further, in an embodiment of the present invention, before the enhancement operation is performed on each slice image in the craniocerebral CT sequence image serving as a training sample, the method further comprises: cropping each slice image in the craniocerebral CT sequence image according to a preset side-length requirement. For example, each slice image may be randomly cropped to a square region whose side length meets the preset requirement. Image cropping, as one of the data enhancement modes, increases the variation range of the images, which improves the robustness of the model and enables it to process images of different categories and sizes; in addition, it significantly reduces the computation of the model and thus improves its efficiency. Through data enhancement, randomness and rich diversity are injected into the training data, so that the model learns different image features better and the risk of overfitting is reduced.
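The random cropping and enhancement operations above can be sketched as below. This is a simplified illustration, not the patent's implementation: rotation is restricted to 90° multiples (arbitrary preset angles would need interpolation), scaling is a nearest-neighbour 2× upscale, and color enhancement is omitted for single-channel CT slices; all function names are assumptions:

```python
import random
import numpy as np

def random_square_crop(slice_img: np.ndarray, side: int) -> np.ndarray:
    """Randomly crop a (H, W) CT slice to a side x side square region."""
    h, w = slice_img.shape
    top = random.randint(0, h - side)
    left = random.randint(0, w - side)
    return slice_img[top:top + side, left:left + side]

def augment(slice_img: np.ndarray) -> np.ndarray:
    """Apply one randomly chosen enhancement: rotation (multiples of 90°),
    scaling (2x nearest-neighbour upscale), or contrast adjustment."""
    op = random.choice(["rotate", "scale", "contrast"])
    if op == "rotate":
        return np.rot90(slice_img, k=random.randint(1, 3))
    if op == "scale":
        return np.repeat(np.repeat(slice_img, 2, axis=0), 2, axis=1)
    return np.clip(slice_img * random.uniform(0.8, 1.2), 0.0, 1.0)
```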
In an embodiment of the present invention, the parameter adjustment method of the image quality evaluation model includes:
calculating the difference between the predicted quality problem and the corresponding label according to a preset loss function;
and adjusting the parameters of the image quality evaluation model layer by layer using back propagation according to the degree of difference, based on a gradient descent method.
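The back-propagation and gradient-descent update described in the two steps above can be sketched numerically. This is a minimal stand-in, not the patent's model: a single linear layer with sigmoid output replaces the full 3D network, and all names, sizes, and the learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W, x, y, lr=0.1):
    """One gradient-descent update for a single linear layer with sigmoid
    output and binary cross-entropy loss; the full model's layers would be
    updated the same way, layer by layer, via back propagation."""
    f = sigmoid(W @ x)                 # forward pass: predicted probabilities
    grad_logit = f - y                 # dL/dz for BCE with a sigmoid output
    grad_W = np.outer(grad_logit, x)   # chain rule back to the weights
    return W - lr * grad_W             # gradient-descent parameter update

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8)) * 0.1      # 5 labels, 8 input features (assumed)
x = rng.normal(size=8)
y = np.array([1, 0, 0, 1, 0], dtype=float)
for _ in range(200):
    W = train_step(W, x, y)
# after training, the predictions approach the multi-hot label
```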
In the model training stage, training is iterated continuously to minimize the loss function, thereby obtaining the trained image quality evaluation model. Specifically, during each training iteration, the image quality evaluation model predicts the quality problems of the training data, the difference between the predicted quality problems and the corresponding labels is calculated through the loss function, and the model parameters are updated backward according to this difference. To deal with data imbalance, category-specific weighting factors are also introduced in the present invention: ω₊ = N₋/N and ω₋ = N₊/N, where N₋ is the number of negative-class samples, N₊ is the number of positive-class samples, and N is the total number of samples; the loss function shown in formula (7) is constructed as:

L(x) = −[ω₊ · y · log f(x) + ω₋ · (1 − y) · log(1 − f(x))]    (7)

wherein ω₊ is the ratio of the number of negative-class samples to the total number of samples, ω₋ is the ratio of the number of positive-class samples to the total number of samples, y is the label value, and f(x) is the prediction result. Through continuous iterative training, the relevant parameters of the 3D convolution network, the multi-label decoder, the linear classifier, and so on are updated; finally, after multiple iterations, the loss function is minimized and the trained image quality evaluation model is obtained.
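The category-specific weighting factors and the loss of formula (7) can be sketched as follows; the function names are illustrative, and the weights are computed per category from a multi-hot label matrix as the text describes:

```python
import numpy as np

def class_weights(labels: np.ndarray):
    """Per-category weighting factors from a training-label matrix
    (rows = samples, columns = quality-problem categories):
    w_pos = N_neg / N weights the positive term of the loss,
    w_neg = N_pos / N weights the negative term."""
    n = labels.shape[0]
    n_pos = labels.sum(axis=0)
    w_pos = (n - n_pos) / n
    w_neg = n_pos / n
    return w_pos, w_neg

def weighted_bce(y, f, w_pos, w_neg, eps=1e-7):
    """Formula (7): L(x) = -[w_pos*y*log f(x) + w_neg*(1-y)*log(1-f(x))],
    averaged over categories; eps avoids log(0)."""
    f = np.clip(f, eps, 1 - eps)
    return -(w_pos * y * np.log(f) + w_neg * (1 - y) * np.log(1 - f)).mean()
```

A category that is rare among the samples thus receives a large weight on its positive term, balancing each category's contribution during training.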
In summary, the invention discloses a craniocerebral CT image quality control method based on 3D convolution and multi-label decoding. The 3D convolution network structure accurately captures the features of craniocerebral CT images in the spatio-temporal dimension, ensuring accurate identification of abnormal regions. In addition, the image quality evaluation model used in the invention skillfully combines the 3D convolution network with a Transformer structure, realizing adaptive extraction of spatio-temporal features. The Transformer-based multi-label decoder is used for query updating, effectively enhancing the identification and processing of key features. Further, by calculating the label-imbalance loss with category-specific weighting factors, a balanced contribution of each category during model training is ensured. This not only solves the problem of imbalanced data distribution but also ensures that the model performs well on minority categories. Therefore, the method provided by the invention can process craniocerebral CT sequence images in a way that addresses the multi-label classification problems of low model efficiency and imbalanced data, providing a new direction for the quality control of craniocerebral CT images. The invention thus effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles of the present invention and its effectiveness, and are not intended to limit the invention. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications and variations completed by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.

Claims (7)

1. A craniocerebral CT image quality control method based on 3D convolution and multi-label decoding, the method comprising:
acquiring a craniocerebral CT sequence image to be evaluated;
inputting the craniocerebral CT sequence image to be evaluated into a 3D convolution network of a trained image quality evaluation model, and extracting space-time characteristics of the craniocerebral CT sequence image to be evaluated;
inputting the space-time features into a Transformer-based multi-label decoder of the image quality evaluation model, and obtaining query features by adaptively mining object features based on cross attention;
inputting the query features into a linear classifier of the image quality evaluation model, and predicting the quality problems of the craniocerebral CT sequence image to be evaluated;
the inputting of the space-time features into the Transformer-based multi-label decoder of the image quality evaluation model and the obtaining of query features by adaptively mining object features based on cross attention comprise the following steps:
inputting the space-time features into a multi-head attention module of the multi-label decoder, and processing the space-time features with a query vector of the multi-head attention module to acquire an attention matrix;
inputting the attention matrix into a feedforward layer of the multi-label decoder, and performing a linear transformation on the attention matrix to obtain the query features;
the multi-head attention module comprises a plurality of attention heads, and the inputting of the space-time features into the multi-head attention module of the multi-label decoder and the processing of the space-time features with the query vector of the multi-head attention module to acquire the attention matrix comprise:
inputting the spatiotemporal features into a multi-head attention module;
for each attention head: processing the space-time characteristics by using the query vector and the corresponding weight to obtain an output result of the current attention head;
splicing the output results of all the attention heads to obtain an attention matrix;
the output result of the current attention head is: headᵢ = Attention(QWᵢ^Q, EWᵢ^K, EWᵢ^V), wherein headᵢ is the output result of the i-th attention head, Attention(·) calculates the attention weights and generates the output of the attention head, Q is the query vector, E is the space-time feature, and Wᵢ^Q, Wᵢ^K and Wᵢ^V are respectively the query weight matrix, the key weight matrix and the value weight matrix corresponding to the i-th attention head;
the specific details of the 3D convolution network are as follows: the 3D convolution network comprises ten 3D convolution layers and six 3D pooling layers; each convolution layer uses 3×3×3 filters with a stride of 1×1×1; the first pooling layer of the network uses a kernel size of 1×2×2 and a stride of 1×2×2; the second through fourth pooling layers uniformly use a kernel size of 2×2×2 and a stride of 2×2×2; the fifth pooling layer uses a 3×2×2 kernel, a 2×2×2 stride, and a padding of 0×1×1; the sixth pooling layer adjusts its kernel size to 2×1×1 and uses a stride of 2×1×1.
2. The craniocerebral CT image quality control method based on 3D convolution and multi-label decoding according to claim 1, wherein the inputting of the query features into the linear classifier of the image quality evaluation model to predict the quality problems of the craniocerebral CT sequence image comprises:
inputting the query features into a linear classifier, and performing linear projection on the query features to obtain probability values of each quality problem in the craniocerebral CT sequence images;
and selecting the quality problem with the probability value larger than a preset threshold value as the predicted quality problem of the craniocerebral CT sequence image.
3. The craniocerebral CT image quality control method based on 3D convolution and multi-label decoding according to claim 1, wherein the image quality evaluation model is obtained through training, wherein the training samples are a plurality of sets of craniocerebral CT sequence images containing labels, and the parameters of the image quality evaluation model are adjusted based on the degree of difference between the predicted quality problems and the corresponding labels.
4. The craniocerebral CT image quality control method based on 3D convolution and multi-label decoding according to claim 3, wherein before the craniocerebral CT sequence images serving as training samples are input into the image quality evaluation model, the method further comprises: performing data enhancement on the craniocerebral CT sequence images serving as training samples, the data enhancement comprising: performing an enhancement operation on each slice image in the craniocerebral CT sequence image serving as a training sample; wherein the enhancement operation includes rotation by different preset angles, image scaling, image color enhancement, and image contrast adjustment.
5. The craniocerebral CT image quality control method based on 3D convolution and multi-label decoding according to claim 4, wherein before the enhancement operation is performed on each slice image in the craniocerebral CT sequence image serving as a training sample, the method further comprises: cropping each slice image in the craniocerebral CT sequence image according to a preset side-length requirement.
6. The method for controlling the quality of a craniocerebral CT image based on 3D convolution and multi-label decoding according to claim 3, wherein the method for adjusting parameters of the image quality evaluation model comprises:
calculating the difference between the predicted quality problem and the corresponding label according to a preset loss function;
and adjusting the parameters of the image quality evaluation model layer by layer using back propagation according to the degree of difference, based on a gradient descent method.
7. The craniocerebral CT image quality control method based on 3D convolution and multi-label decoding according to claim 6, wherein the loss function L(x) is: L(x) = −[ω₊ · y · log f(x) + ω₋ · (1 − y) · log(1 − f(x))], wherein ω₊ is the ratio of the number of negative-class samples to the total number of samples, ω₋ is the ratio of the number of positive-class samples to the total number of samples, y is the label value, and f(x) is the prediction result.
CN202410004772.6A 2024-01-03 2024-01-03 Craniocerebral CT image quality control method based on 3D convolution and multi-label decoding Active CN117496280B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410004772.6A CN117496280B (en) 2024-01-03 2024-01-03 Craniocerebral CT image quality control method based on 3D convolution and multi-label decoding

Publications (2)

Publication Number Publication Date
CN117496280A CN117496280A (en) 2024-02-02
CN117496280B true CN117496280B (en) 2024-04-02

Family

ID=89676851


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783831A (en) * 2020-05-29 2020-10-16 河海大学 Complex image accurate classification method based on multi-source multi-label shared subspace learning
CA3138679A1 (en) * 2019-04-30 2020-11-05 The Trustees Of Dartmouth College System and method for attention-based classification of high-resolution microscopy images
WO2022099325A1 (en) * 2022-01-10 2022-05-12 Innopeak Technology, Inc. Transformer-based scene text detection
CN115409812A (en) * 2022-09-01 2022-11-29 杭州电子科技大学 CT image automatic classification method based on fusion time attention mechanism
CN116091833A (en) * 2023-02-20 2023-05-09 西安交通大学 Attention and transducer hyperspectral image classification method and system
CN116245832A (en) * 2023-01-30 2023-06-09 北京医准智能科技有限公司 Image processing method, device, equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11494616B2 (en) * 2019-05-09 2022-11-08 Shenzhen Malong Technologies Co., Ltd. Decoupling category-wise independence and relevance with self-attention for multi-label image classification
US20230401717A1 (en) * 2022-06-10 2023-12-14 Adobe Inc. Transformer for efficient image segmentation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jun Miao et al., "Transformer-Based Recognition Model for Ground-Glass Nodules from the View of Global 3D Asymmetry Feature Representation," Symmetry, vol. 15, no. 12, Dec. 12, 2023, pp. 3-8. *
Xu Meixiang, Sun Fuming, Li Haojie, "Online multi-label image classification based on active learning," Journal of Image and Graphics, no. 2, Feb. 28, 2015, pp. 85-92. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant