CN113313691A - Thyroid color Doppler ultrasound processing method based on deep learning

Info

Publication number: CN113313691A
Application number: CN202110619349.3A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: image, class, thyroid, images, resolution
Inventors: 俞晔 (Yu Ye), 方圆圆 (Fang Yuanyuan), 姜婷 (Jiang Ting)
Current/Original Assignee: Shanghai First People's Hospital
Filing date: 2021-06-03
Publication date: 2021-08-27
Legal status: Pending

Classifications

    • G06T 7/0012 Image analysis: biomedical image inspection
    • G06N 3/045 Neural networks: combinations of networks
    • G06N 3/08 Neural networks: learning methods
    • G06T 7/11 Segmentation: region-based segmentation
    • G06T 7/12 Segmentation: edge-based segmentation
    • G06T 7/13 Segmentation: edge detection
    • G06V 10/28 Image preprocessing: quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06T 2207/10132 Image acquisition modality: ultrasound image
    • G06T 2207/20081 Special algorithmic details: training; learning
    • G06T 2207/20084 Special algorithmic details: artificial neural networks [ANN]
    • G06T 2207/30004 Subject of image: biomedical image processing

Abstract

The invention relates to the technical field of medical image processing and discloses a thyroid color Doppler ultrasound processing method based on deep learning, comprising the following steps. S1: acquire a real low-resolution image and a real high-resolution image of a thyroid color Doppler ultrasound scan, and define the real high-resolution image as the class-I high-resolution image. S2: transform the class-I high-resolution image so that its thyroid region coincides with the thyroid region of the low-resolution image, and define the transformed class-I high-resolution image as the class-II high-resolution image. S3: construct a convolutional neural network and train it with the low-resolution images and the corresponding class-II high-resolution images. S4: process new low-resolution thyroid color Doppler ultrasound images with the trained convolutional neural network. Because the method uses the low-resolution images and the transformed high-resolution images as the neural network's training set, the reconstructed high-resolution images are closer to real high-resolution images and the reconstruction quality is improved.

Description

Thyroid color Doppler ultrasound processing method based on deep learning
Technical Field
The invention relates to the technical field of medical image processing, in particular to a thyroid color Doppler ultrasound processing method based on deep learning.
Background
Color ultrasound, short for color Doppler ultrasound examination, uses ultrasonic waves emitted by an instrument to display the structure and shape of internal organs and uses color to display blood flow information on and around those organs; it is non-invasive and fast. Thyroid color Doppler ultrasound is a common means of examining the thyroid in clinical practice, but because of limited equipment performance and the small size of the thyroid, thyroid color Doppler ultrasound images are generally not very clear. This hinders a doctor's judgment of the patient's condition and forces additional examinations, consuming time and medical resources. Although a few high-performance color Doppler ultrasound instruments can produce clear images, such instruments are expensive and cannot be widely deployed.
Existing image super-resolution reconstruction techniques generally employ a neural network. A low-resolution image is input and a high-resolution image is output; the output high-resolution image is compared with an actual high-resolution image, the difference between them is fed back to the neural network, the network adjusts its internal parameters, and a high-resolution image is output again. This process is repeated during training until the difference between the network's output and the actual high-resolution image falls below a preset threshold, at which point training is complete. Super-resolution reconstruction of low-resolution images with the trained network can then achieve a satisfactory effect. However, the data set used to train the network in this approach usually consists of real high-resolution images paired with low-resolution images obtained by down-sampling those same high-resolution images. Such pairs are not realistic enough: when the trained network reconstructs a real low-resolution image, the resulting high-resolution image differs substantially from a real high-resolution image, and the reconstruction effect is unsatisfactory.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a thyroid color Doppler ultrasound processing method based on deep learning that makes the reconstructed high-resolution thyroid image closer to the real high-resolution thyroid color Doppler ultrasound image captured by a high-performance instrument.
In order to achieve the above purpose, the invention provides the following technical scheme:
A thyroid color Doppler ultrasound processing method based on deep learning comprises the following steps. S1: photograph the same thyroid lesion region simultaneously with color Doppler ultrasound image acquisition equipment of different resolutions, so as to obtain a real low-resolution image and a real high-resolution image of the thyroid at the same moment, and define the real high-resolution image as the class-I high-resolution image. S2: transform the class-I high-resolution image with the low-resolution image as reference, so that the overlap between the thyroid region of the class-I high-resolution image and the thyroid region of the low-resolution image exceeds a first proportion value, and define the transformed class-I high-resolution image as the class-II high-resolution image. S3: construct a convolutional neural network, and train it with the low-resolution images and the corresponding class-II high-resolution images. S4: process a new low-resolution thyroid color Doppler ultrasound image with the trained convolutional neural network.
In the present invention, preferably, S2 includes: S21: segment the respective thyroid regions from the low-resolution image and the class-I high-resolution image, and assign binary gray values to each to form binary images; S22: with the binary image of the thyroid region of the low-resolution image as reference, transform the binary image of the thyroid region of the class-I high-resolution image so that the overlap of the two binary images exceeds the first proportion value, and record the transformation parameters; S23: transform the class-I high-resolution image according to the transformation parameters recorded in S22, perform gray-scale interpolation on the transformed image by bilinear interpolation, and define the resulting image as the class-II high-resolution image.
In the present invention, preferably, S22 includes: S221: determine an affine transformation model from the binary image of the thyroid region of the class-I high-resolution image to the binary image of the thyroid region of the low-resolution image, the model containing at least two unknown transformation parameters, the translation amount and the rotation angle; S222: solve the unknown transformation parameters of the affine transformation model with a shuffled frog-leaping algorithm.
In the present invention, preferably, S4 includes: S41: extract shallow features from the low-resolution image and process them to generate the class-I feature image; S42: convolve the class-I feature image in two different channels, processing it in each channel to generate class-II feature image A and class-II feature image B; S43: enhance the high-frequency parts of class-II feature images A and B through an attention mechanism, processing each channel to generate class-III feature image A and class-III feature image B; S44: superimpose class-III feature images A and B and process the result to generate the class-IV feature image; S45: perform residual learning on the class-IV feature image several times, processing it to generate the class-V feature image; S46: reconstruct a new thyroid color Doppler ultrasound image from the class-V feature image by sub-pixel convolution.
In the present invention, preferably, S41 includes: S411: convolve the low-resolution thyroid color Doppler ultrasound image with two or more cascaded convolution layers, and record the resulting image as the shallow feature image; S412: convolve the shallow feature image with one convolution layer having a 1 × 1 kernel, and define the resulting image as the class-I feature image.
In the present invention, preferably, S43 includes: S431: perform a first convolution and activation on class-II feature images A and B using a convolution layer and a PReLU activation layer; S432: perform a second convolution and activation on the once-convolved-and-activated class-II feature images A and B using a convolution layer and a Sigmoid activation layer to form channel attention weights; S433: take the Hadamard product of class-II feature images A and B with the channel attention weights to form class-III feature image A and class-III feature image B.
In the present invention, preferably, each round of residual learning in S45 is implemented by a residual learning module comprising a convolution layer, a normalization layer, an activation layer, a convolution layer, and a normalization layer connected in sequence, with a skip connection between the input of the first convolution layer and the output of the second normalization layer.
In the present invention, preferably, the normalization layer adopts a weight normalization algorithm.
In the present invention, preferably, when the convolutional neural network is trained as described in S3, the loss function is a weighted sum of the mean square error and the mean absolute error.
Compared with the prior art, the invention has the beneficial effects that:
the thyroid color Doppler ultrasound processing method based on deep learning firstly collects real low-resolution images and high-resolution images of the thyroid gland shot by color Doppler ultrasound instruments with different performances, then the high-resolution images are converted, so that the converted high-resolution images are overlapped with the low-resolution images, the difference between the two images is only in the height of the resolution, and the converted high-resolution images are very close to the real high-resolution images, so that the low-resolution images and the converted high-resolution images are used as training sets of a neural network, the reality is higher, the reconstructed high-resolution images are closer to the real high-resolution images, and the reconstruction effect of the thyroid color Doppler ultrasound images is improved; the conversion parameters of the high-resolution image are solved by adopting a mixed frog-leaping algorithm, so that the solving speed and precision are ensured.
Drawings
Fig. 1 is a flowchart of a thyroid color Doppler ultrasound processing method based on deep learning.
Fig. 2 is a flowchart of S2 in the deep learning-based thyroid color Doppler ultrasound processing method.
Fig. 3 is a flowchart of S22 in the deep learning-based thyroid color Doppler ultrasound processing method.
Fig. 4 is a flowchart of S4 in the deep learning-based thyroid color Doppler ultrasound processing method.
Fig. 5 is a flowchart of S41 in the deep learning-based thyroid color Doppler ultrasound processing method.
Fig. 6 is a flowchart of S43 in the deep learning-based thyroid color Doppler ultrasound processing method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When a component is referred to as being "connected" to another component, it can be directly connected to the other component or intervening components may also be present. When a component is referred to as being "disposed on" another component, it can be directly on the other component or intervening components may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1, a preferred embodiment of the present invention provides a deep learning-based thyroid color Doppler ultrasound processing method, including:
S1: photograph the same thyroid lesion region simultaneously with color Doppler ultrasound image acquisition equipment of different resolutions, so as to obtain a real low-resolution image and a real high-resolution image of the thyroid at the same moment, and define the real high-resolution image as the class-I high-resolution image.
The low-resolution image and the high-resolution image are real images captured by color Doppler ultrasound instruments, and their resolutions are determined by the performance of the instruments used. The thyroid in the high-resolution image and in the low-resolution image comes from the same patient during the same period, and the interval between the two acquisitions is preferably no longer than one day. The main differences between the low-resolution and high-resolution thyroid images are therefore resolution, position, and angle. For ease of distinction, the real high-resolution image is defined as the class-I high-resolution image.
S2: with the low-resolution image as reference, transform the class-I high-resolution image so that the overlap between its thyroid region and the thyroid region of the low-resolution image exceeds a first proportion value, and define the transformed class-I high-resolution image as the class-II high-resolution image.
The class-I high-resolution image undergoes spatial and angular transformation so that the overlap between its thyroid region and that of the low-resolution image exceeds the first proportion value. After the transformation, the angle and position of the low-resolution image and the transformed high-resolution image are essentially the same; the main remaining difference is resolution, so the two can be regarded as a low-resolution version and a high-resolution version of the same image. The first proportion value is preset according to the required image accuracy. For ease of distinction, the transformed class-I high-resolution image is defined as the class-II high-resolution image.
S3: construct a convolutional neural network, and train it with the low-resolution images and the corresponding class-II high-resolution images.
A convolutional neural network typically contains several convolution layers, activation layers, pooling layers, and the like. The basic training process is as follows: a low-resolution image is input into the network, which extracts and learns its features and reconstructs a new high-resolution image; this image is compared with the class-II high-resolution image and the value of the loss function is computed; the value is fed back to the network, the parameters of each layer are adjusted, the low-resolution image is input again, a new high-resolution image is generated and again compared with the class-II high-resolution image, and the loss is recomputed. This process repeats until the loss reaches a preset threshold.
Training the network on a large number of low-resolution images and their corresponding class-II high-resolution images lets it form a mapping that reconstructs low-resolution images into class-II high-resolution images. Once the loss between the new high-resolution images obtained from the low-resolution inputs and the class-II high-resolution images stabilizes within the preset threshold, training can be considered complete: the difference between the reconstructed images and the class-II high-resolution images is negligible, and the network is considered to have formed the desired mapping.
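For illustration only, a minimal PyTorch-style training loop matching this description might look as follows. The model `net`, the data pairs, and the `loss_fn` signature are assumptions, not part of the patent text; a concrete combined MSE/MAE loss is sketched later in this section.

```python
import torch

def train(net, pairs, loss_fn, total_rounds, threshold=1e-3):
    """pairs is a re-iterable of (low_res, class_II_high_res) tensor pairs."""
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
    for current_round in range(1, total_rounds + 1):
        for low_res, high_res in pairs:
            reconstructed = net(low_res)                 # new high-resolution image
            loss = loss_fn(reconstructed, high_res,
                           current_round, total_rounds)  # compare with class-II image
            optimizer.zero_grad()
            loss.backward()                              # feed the difference back
            optimizer.step()                             # adjust layer parameters
        if loss.item() < threshold:                      # loss within preset threshold
            break
    return net
```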
The convolutional neural network is trained with a loss function that is a weighted sum of the mean square error and the mean absolute error. One of the loss functions commonly used in image processing is the pixel-wise loss, which computes, pixel by pixel, the error between the pixel values at corresponding channels and spatial locations of two images; it can be expressed as the Mean Square Error (MSE) or the Mean Absolute Error (MAE).
The mean square error is the average of the squared Euclidean distances between the pixel values of the original high-resolution image $I^{HR}$ and the generated high-resolution image $I^{SR}$, i.e.:

$$L_{MSE} = \frac{1}{sHWc} \sum_{v=1}^{s} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{k=1}^{c} \big( I^{HR}(v,i,j,k) - I^{SR}(v,i,j,k) \big)^2$$

where $L_{MSE}$ denotes the mean square error loss function, $H \times W$ is the size of the feature map, $c$ is the number of channels of the feature map, $s$ is the number of samples in a mini-batch, and $I(v,i,j,k)$ is the pixel value at position $(i,j)$ in the $k$-th channel of the $v$-th image. Model training and parameter updating continuously reduce the value of $L_{MSE}$, shrinking the difference between the prediction and the real image and increasing their pixel similarity, which improves the performance of the model and the quality of image restoration. However, MSE training increases the penalty only in regions with large pixel differences and decreases it in regions with small differences, so artifacts tend to appear in smoother regions.
The mean absolute error is the average of the absolute values of the pixel-wise differences between the original high-resolution image $I^{HR}$ and the generated high-resolution image $I^{SR}$, i.e.:

$$L_{MAE} = \frac{1}{sHWc} \sum_{v=1}^{s} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{k=1}^{c} \big| I^{HR}(v,i,j,k) - I^{SR}(v,i,j,k) \big|$$

where $L_{MAE}$ denotes the mean absolute error. The MAE loss helps the model converge quickly and the loss value fall fast, but training tends to oscillate in the neighborhood of the global minimum and the model can be unstable. To overcome the respective drawbacks of these two loss functions, this embodiment combines MSE and MAE into the loss used to train the neural network, which can be expressed as:
$$Loss = (1 - \alpha)\, L_{MAE} + \alpha\, L_{MSE}$$
where α equals the current iteration round divided by the total number of rounds, so 1 − α and α weight the MAE and MSE terms respectively before they are summed into the final loss. The MAE term carries more weight early in training, which helps the model parameters quickly find the region containing the global minimum; as the number of training rounds grows, the MSE weight overtakes the MAE weight, which helps the parameters converge better at the optimum.
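As an illustrative sketch (not the patent's own code), this combined loss is straightforward to write with PyTorch's built-in L1 and MSE losses; the function name and signature are assumptions:

```python
import torch.nn.functional as F

def combined_loss(reconstructed, high_res, current_round, total_rounds):
    """Weighted MAE/MSE loss; alpha grows from ~0 to 1 over training."""
    alpha = current_round / total_rounds
    loss_mae = F.l1_loss(reconstructed, high_res)   # dominant early in training
    loss_mse = F.mse_loss(reconstructed, high_res)  # dominant late in training
    return (1 - alpha) * loss_mae + alpha * loss_mse
```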
S4: and processing a new low-resolution image of the thyroid color Doppler ultrasound by using the trained convolutional neural network.
A low-resolution image requiring super-resolution reconstruction is input directly into the trained convolutional neural network, which produces a reconstructed image very close to the real high-resolution image.
In the present embodiment, as shown in fig. 2, S2 preferably includes:
S21: segment the respective thyroid regions from the low-resolution image and the class-I high-resolution image, and assign binary gray values to each to form binary images.
A segmentation method based on graph-cut theory is used to segment the thyroid regions in the low-resolution image and the class-I high-resolution image. Its basic steps are as follows. Give an initial contour line near the boundary of the thyroid region. With this curve as an axis, expand d pixels to each side to obtain an annular fixed-width contour neighborhood (FCN), and build an s-t network with the inner boundary of the neighborhood as the source and the outer boundary as the sink. Compute the edge weights of the s-t network from the gradient information of the image, and convert the multi-source, multi-sink s-t network into a single-source, single-sink one by node identification, which reduces the number of nodes and edges and avoids large-weight edges. Cut the network with a max-flow/min-cut algorithm to obtain the new contour line within the neighborhood that minimizes the energy function. Then, taking the new contour line as the axis, update the neighborhood and cut iteratively. As this process repeats, the contour line gradually converges to the boundary of the thyroid region; when the contour no longer changes, the algorithm terminates and the contour of the thyroid region is obtained, completing the segmentation.
The image obtained after segmentation contains only the background and the thyroid region contour, and it preserves the angle, position, and related information of the thyroid region of the low-resolution or class-I high-resolution image. To make the two segmented images easy to compare and compute with, each is given a binary gray assignment, forming a binary image in which the thyroid region and the background receive different gray values. For example, the thyroid region may be assigned 1 and the background 0, or vice versa; the thyroid region and background are then clearly delimited, yielding binary images of the thyroid region of the low-resolution image and of the class-I high-resolution image.
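For illustration, assuming the segmentation step yields a filled label array (a hypothetical input; the patent does not specify the data layout), the binary gray assignment reduces to a thresholding step:

```python
import numpy as np

def binarize(segmented, thyroid_value=1, background_value=0):
    """Assign one gray value to the thyroid region and another to the background."""
    return np.where(segmented > 0, thyroid_value, background_value).astype(np.uint8)
```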
S22: and taking the binary image of the thyroid region of the low-resolution image as a reference, converting the binary image of the thyroid region of the first-class high-resolution image to enable the coincidence degree of the two binary images to be larger than a first proportional value, and recording conversion parameters.
The binary images of the thyroid regions of the low-resolution and class-I high-resolution images are simple in composition, which makes position and angle transformations convenient. In this embodiment, the binary image of the thyroid region of the low-resolution image is kept fixed as the reference, while the binary image of the thyroid region of the class-I high-resolution image is transformed so that it approaches the reference ever more closely, until the thyroid regions of the two finally coincide.
Specifically, as shown in fig. 3, S22 may include:
s221: determining an affine transformation model from the binary image of the thyroid region of the high-resolution image to the binary image of the thyroid region of the low-resolution image, wherein the affine transformation model at least comprises two unknown transformation parameters of translation amount and rotation angle.
Let the affine transformation model be T, let the image center be (a, b), let the image be translated horizontally by Δx pixels and vertically by Δy pixels, and let it be rotated by an angle θ about the image center, with the upper-left corner of the image as the origin. A pixel (x, y) then maps to (x', y') by

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \Delta x \\ 0 & 1 & \Delta y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & a - a\cos\theta + b\sin\theta \\ \sin\theta & \cos\theta & b - a\sin\theta - b\cos\theta \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

where the translation amounts Δx and Δy and the rotation angle θ are the unknown transformation parameters.
S222: and solving the unknown transformation parameters in the affine transformation model by using a mixed frog-leaping algorithm.
Solving for the unknown transformation parameters means searching the solution space for an optimal solution, which can be done with the Shuffled Frog-Leaping Algorithm (SFLA).
In the shuffled frog-leaping algorithm, the frog population has N frogs in total, divided into m subgroups of n frogs each, so N = m × n. The maximum allowed jump step is $S_{max}$, the number of subgroup iterations is Ne, the total number of evolution rounds is MAXGEN, the i-th frog is $P_i$ with fitness value $f_i$, the globally optimal frog is $P_x$, and the k-th subgroup is $M_k$:

$$M_k = \{\, P_k(l), f_k(l) \mid P_k(l) = P(k + m(l-1)),\; f_k(l) = f(k + m(l-1)),\; l \in (1, n) \,\}, \quad k \in (1, m)$$

where $P_k(l)$ identifies the l-th frog in the subgroup and $f_k(l)$ is that frog's fitness. On this basis, following the update strategy of the shuffled frog-leaping algorithm, the frogs of each subgroup are managed by memetic factors and locally optimized, and the best frog of each subgroup is recorded. After all subgroups have evolved, the subgroup-best frogs are sorted to obtain the best frog $P_x$ of the whole population; all frogs are then remixed and re-sorted, the subgroups are re-partitioned, and the search for the population-best frog $P_x$ is repeated until its fitness $f_x$ shows no further significant improvement. At that point $P_x$ is the optimal solution of the problem being solved by the shuffled frog-leaping algorithm.
Each frog in the population is set to represent a feasible solution of the unknown transformation parameters of the affine transformation model. Each frog is substituted into the model to transform the binary image of the thyroid region of the class-I high-resolution image, and the similarity between the transformed image and the binary image of the thyroid region of the low-resolution image is taken as that frog's fitness value. The population is then evolved until the fitness of the best frog reaches a preset threshold.
Given the affine transformation model above, the solution space is three-dimensional, so each frog can be set as $P_i = (P_i^1, P_i^2, P_i^3)$, with $P_i^1$, $P_i^2$, $P_i^3$ representing Δx, Δy, θ respectively. Substituting $P_i^1$, $P_i^2$, $P_i^3$ into T turns the affine transformation model into a fully determined transformation of the binary image of the thyroid region of the class-I high-resolution image. The similarity between the transformed image and the binary image of the thyroid region of the low-resolution image is then computed as the fitness value $f_i$ of the i-th frog, and the $f_i$ are continuously optimized and improved until the fitness $f_x$ of the best frog reaches the preset threshold.
This optimal solution constitutes the transformation parameters to be recorded.
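For illustration only, a compact SFLA sketch in NumPy follows; the population sizes, bounds, and the caller-supplied `fitness` function (here, the overlap similarity between the two binary images) are assumptions rather than values fixed by the patent.

```python
import numpy as np

def sfla(fitness, bounds, n_frogs=60, m_subgroups=6, subgroup_iters=10,
         generations=30, seed=0):
    """Shuffled frog-leaping search maximizing `fitness` over a box-bounded space."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)        # shape (dim, 2): [low, high]
    dim = len(bounds)
    frogs = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_frogs, dim))
    for _ in range(generations):
        scores = np.array([fitness(f) for f in frogs])
        frogs = frogs[np.argsort(-scores)]          # sort best-first, then shuffle
        best_global = frogs[0].copy()
        for k in range(m_subgroups):                # frog i joins subgroup i % m
            idx = np.arange(k, n_frogs, m_subgroups)
            for _ in range(subgroup_iters):
                sub = np.array([fitness(frogs[i]) for i in idx])
                b, w = idx[np.argmax(sub)], idx[np.argmin(sub)]
                # leap the worst frog toward the subgroup best ...
                cand = frogs[w] + rng.random(dim) * (frogs[b] - frogs[w])
                cand = np.clip(cand, bounds[:, 0], bounds[:, 1])
                if fitness(cand) <= fitness(frogs[w]):
                    # ... else toward the global best, else resample randomly
                    cand = frogs[w] + rng.random(dim) * (best_global - frogs[w])
                    cand = np.clip(cand, bounds[:, 0], bounds[:, 1])
                    if fitness(cand) <= fitness(frogs[w]):
                        cand = rng.uniform(bounds[:, 0], bounds[:, 1])
                frogs[w] = cand
    return max(frogs, key=fitness)                  # best (dx, dy, theta) found
```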
S23: the first-class high-resolution image is transformed according to the transformation parameters recorded in S22, and the transformed image is subjected to gray-scale interpolation by a bilinear transformation interpolation method, and the formed image is defined as a second-class high-resolution image.
The obtained transformation parameters are substituted into the affine transformation model, and the class-I high-resolution image is transformed with it. Bilinear interpolation offers computational accuracy and speed intermediate between the nearest-neighbor and cubic-convolution interpolation methods, which satisfies the gray-scale interpolation requirements of the transformed class-I high-resolution image.
Specifically, suppose the four neighboring pixels of an interpolation point p are (i, j), (i, j+1), (i+1, j), and (i+1, j+1), with gray values $I_{(i,j)}$, $I_{(i,j+1)}$, $I_{(i+1,j)}$, $I_{(i+1,j+1)}$. Because the gray level between adjacent pixels in a thyroid color Doppler ultrasound image varies approximately linearly, the gray value $I_p$ of the interpolation point can be computed by linear interpolation from the gray values of the surrounding pixels, namely:

$$I_p = (1-b)(1-d)\, I_{(i,j)} + (1-b)d\, I_{(i,j+1)} + b(1-d)\, I_{(i+1,j)} + bd\, I_{(i+1,j+1)}$$

where b is the offset of the interpolation point p from the point (i, j) along the row (i) direction and d is its offset along the column (j) direction.
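A direct NumPy transcription of this formula, as an illustrative sketch (the function name and argument conventions are assumptions):

```python
import numpy as np

def bilinear_sample(image, pi, pj):
    """Gray value at fractional position (pi, pj); pi indexes rows, pj columns."""
    i, j = int(np.floor(pi)), int(np.floor(pj))
    b, d = pi - i, pj - j          # offsets along the row (i) and column (j) axes
    return ((1 - b) * (1 - d) * image[i, j] +
            (1 - b) * d       * image[i, j + 1] +
            b       * (1 - d) * image[i + 1, j] +
            b       * d       * image[i + 1, j + 1])
```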
In the present embodiment, as shown in fig. 4, S4 preferably includes:
S41: extract shallow features from the low-resolution image, and process them to generate the class-I feature image.
The shallow features are extracted by convolving the low-resolution image with a few cascaded convolution layers, for example one to three layers. For ease of distinction, the image formed by shallow feature extraction is defined as the class-I feature image. Specifically, as shown in fig. 5, S41 includes:
S411: convolve the low-resolution thyroid color Doppler ultrasound image with two or more cascaded convolution layers, and record the resulting image as the shallow feature image.
Using two or more cascaded convolution layers enlarges the receptive field of the convolution operation. The receptive field can be enlarged in two ways: by using a larger convolution kernel, or by using multiple consecutive convolution layers. Both achieve the goal, but because multiple consecutive convolutions also improve the nonlinear expressive capability of the convolutional neural network, this embodiment uses two or more cascaded convolution layers. The convolution operation extracts the shallow features of the low-resolution image, and the resulting image is defined as the shallow feature image.
S412: convolve the shallow feature image with one convolution layer having a 1 × 1 kernel, and define the resulting image as the class-I feature image.
In this step the convolution layer uses a 1 × 1 kernel, which reduces the dimensionality of the feature channels without adding extra computational parameters. The operation is equivalent to a linear combination, across channels, of the feature values at the same coordinate position in the shallow features; reducing the number of channels reduces inter-channel redundancy and discards redundant feature information, improving the fitting and generalization capability of the convolutional neural network.
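As a PyTorch sketch of S411 and S412 (the channel widths are assumptions; the patent does not fix them):

```python
import torch.nn as nn

class ShallowFeatures(nn.Module):
    """Cascaded 3x3 convolutions widen the receptive field; a 1x1 convolution
    then reduces channel redundancy to form the class-I feature image."""
    def __init__(self, in_channels=3, mid_channels=64, out_channels=32):
        super().__init__()
        self.cascade = nn.Sequential(                 # S411: two cascaded layers
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1),
        )
        self.reduce = nn.Conv2d(mid_channels, out_channels, kernel_size=1)  # S412

    def forward(self, x):
        shallow = self.cascade(x)      # shallow feature image
        return self.reduce(shallow)    # class-I feature image
```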
S42: and respectively convolving the first-class characteristic image by using two different channels, and respectively processing the first-class characteristic image and the second-class characteristic image in the two channels to generate a second-class characteristic image A and a second-class characteristic image B.
In the step, two channels are set, the first layer set in each channel is a convolution layer, the convolution kernels of the two first layer convolution layers are different and are respectively used for performing convolution on one type of characteristic image in the channel, the images obtained after the convolution respectively reflect different characteristics of the original low-resolution image, so that the two channels respectively extract different characteristics of the original low-resolution image, and the images after the convolution of the first layer convolution layers in the two channels are respectively defined as a second type characteristic image A and a second type characteristic image B for convenient distinguishing.
S43: and respectively enhancing high-frequency parts in the second-class characteristic image A and the second-class characteristic image B through an attention mechanism, and respectively processing the two channels to generate a third-class characteristic image A and a third-class characteristic image B.
In the step, attention mechanism modules are arranged in the two channels, and the high-frequency parts of the second-class characteristic images A and the second-class characteristic images B are enhanced through the attention mechanism modules. The attention mechanism module comprises a plurality of convolution layers, an activation layer and the like, high-frequency parts in the two types of characteristic images A and the two types of characteristic images B are screened out in the two channels, and the high-frequency parts are enhanced, so that the images can be reconstructed in a more targeted manner. For the convenience of distinguishing, the images formed after the operation are respectively defined as a three-class characteristic image A and a three-class characteristic image B. Specifically, as shown in fig. 6, S43 may include:
s431: and performing first convolution and activation operation on the second type characteristic image A and the second type characteristic image B by utilizing a convolution layer and a PReLU activation layer.
The convolution layer of the step adopts a convolution kernel of 1 multiplied by 1 to carry out dimension reduction on the second class characteristic image A and the second class characteristic image B, so that the number of channels is reduced according to a certain proportion, and then a layer of PReLU activation is carried out to avoid the problem of gradient disappearance or explosion.
S432: and performing second convolution and activation operation on the first convolution and activated second type characteristic image A and the second type characteristic image B by utilizing a convolution layer and a Sigmoid activation layer to form a channel attention weight.
The convolution layer in the step also adopts a convolution kernel of 1 multiplied by 1 to perform dimension increasing on the first convolution and the second-class characteristic image A and the second-class characteristic image B after activation so as to restore the first-class characteristic image A and the second-class characteristic image B to the original channel number, then performs one-layer Sigmoid activation to capture the correlation among all the channels and generate the channel attention weight.
S433: hadamard products are respectively carried out on the attention weights of the second-class characteristic image A and the second-class characteristic image B and the channel to form a third-class characteristic image A and a third-class characteristic image B.
The step is to carry out weighting processing on the second-class characteristic image A and the second-class characteristic image B in different channels to realize the purposes of characteristic response and recalibration, so that the characteristics with rich information are selectively emphasized, redundant useless information is suppressed, and the formed images are the third-class characteristic image A and the third-class characteristic image B.
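A PyTorch sketch of this S431-S433 attention block (the channel reduction ratio is an assumption):

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """1x1 conv + PReLU reduces channels; 1x1 conv + Sigmoid restores them and
    yields attention weights; a Hadamard product re-weights the input."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.weights = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # S431
            nn.PReLU(),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),  # S432
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.weights(x)   # S433: Hadamard product with the weights
```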
S44: and superposing the three types of characteristic images A and the three types of characteristic images B, and processing to generate four types of characteristic images.
This step combines the different high frequency features obtained by the two channels together for further deep learning. For the convenience of distinguishing, images formed after superposition are defined as four types of characteristic images.
S45: residual error learning is carried out on the four types of characteristic images for a plurality of times, and five types of characteristic images are generated through processing.
The step is realized by a plurality of cascaded residual error learning modules, and the depth characteristics of the low-resolution images are obtained through residual error learning and are used for subsequent super-resolution reconstruction operation. In order to facilitate the distinction, images formed after the residual error learning of the depth are defined as five types of characteristic images. Each residual error learning module comprises a convolution layer, a normalization layer, an activation layer, a convolution layer and a normalization layer which are sequentially connected, jump connection is arranged between the input end of the first convolution layer and the output end of the second normalization layer, namely, when four types of characteristic images pass through the region of the residual error learning module, not only identical mapping but also deep convolution learning is carried out, and then the four types of characteristic images are superposed to form five types of characteristic images.
The normalization layers use a weight normalization algorithm. The idea of weight normalization is to normalize the weights of each network layer: during parameter training, the weight vector of each neuron in the layer is decomposed into a magnitude and a direction, and the two are updated separately. Weight normalization can define the operation at the level of individual neurons while preserving the advantages of normalization, and decoupling the weight magnitude from its direction accelerates the convergence of the neural network. Only slight modifications to the parameters are required, so the method is easy to implement.
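As a sketch of one residual learning module (interpreting the normalization layers as weight normalization applied to the convolution weights, per the paragraph above; the channel width is an assumption):

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

class ResidualBlock(nn.Module):
    """conv(WN) -> PReLU -> conv(WN), with a skip connection from the input of
    the first convolution to the output of the second normalization."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            weight_norm(nn.Conv2d(channels, channels, kernel_size=3, padding=1)),
            nn.PReLU(),
            weight_norm(nn.Conv2d(channels, channels, kernel_size=3, padding=1)),
        )

    def forward(self, x):
        return x + self.body(x)   # identity mapping plus deep convolutional learning
```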
S46: and reconstructing a new thyroid color Doppler ultrasound image by utilizing a sub-pixel convolution method according to the five characteristic images.
Suppose the class-V feature image has size H × W and n channels, and the reconstruction requires a magnification factor of r. The class-V feature image is convolved with kernels of size 3 × 3, with the number of kernels set to r² × n; the convolution leaves the spatial size unchanged while expanding the number of channels to r² × n, i.e. the class-V feature image is expanded to dimension H × W × r²n.
The next network layer rearranges the r² channels corresponding to each pixel of the convolved class-V feature image into an r × r region, which corresponds to an r × r sub-block of the upsampled class-V feature image. The image of dimension H × W × r²n is thereby reordered into an upsampled class-V feature image of dimension rH × rW × n.
Finally, a convolution layer performs the image recovery operation on the upsampled class-V feature image output by the pixel rearrangement. Its kernel size is set to 3 × 3 and its number of channels to c, where c matches the number of channels of the low-resolution thyroid color Doppler ultrasound image originally input to the neural network model, thus recovering a high-resolution thyroid color Doppler ultrasound image.
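This pipeline maps directly onto PixelShuffle in PyTorch; as an illustrative sketch (channel counts and magnification are assumptions):

```python
import torch.nn as nn

class SubPixelReconstruction(nn.Module):
    """3x3 conv expands n channels to r^2 * n, PixelShuffle rearranges each
    pixel's r^2 channels into an r x r block (H x W -> rH x rW), and a final
    3x3 conv restores the c channels of the original ultrasound input."""
    def __init__(self, n=64, r=2, c=3):
        super().__init__()
        self.expand = nn.Conv2d(n, (r ** 2) * n, kernel_size=3, padding=1)
        self.rearrange = nn.PixelShuffle(r)   # (r^2*n, H, W) -> (n, rH, rW)
        self.restore = nn.Conv2d(n, c, kernel_size=3, padding=1)

    def forward(self, x):
        return self.restore(self.rearrange(self.expand(x)))
```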
The above description is intended to describe in detail the preferred embodiments of the present invention, but the embodiments are not intended to limit the scope of the claims of the present invention, and all equivalent changes and modifications made within the technical spirit of the present invention should fall within the scope of the claims of the present invention.

Claims (9)

1. A thyroid color Doppler ultrasound processing method based on deep learning, characterized by comprising the following steps:

S1: photographing the same thyroid lesion region simultaneously with color Doppler ultrasound image acquisition equipment of different resolutions, so as to obtain a real low-resolution image and a real high-resolution image of the thyroid at the same moment, and defining the real high-resolution image as the class-I high-resolution image;

S2: transforming the class-I high-resolution image with the low-resolution image as reference, so that the overlap between the thyroid region of the class-I high-resolution image and the thyroid region of the low-resolution image exceeds a first proportion value, and defining the transformed class-I high-resolution image as the class-II high-resolution image;

S3: constructing a convolutional neural network, and training the convolutional neural network with the low-resolution images and the corresponding class-II high-resolution images;

S4: processing a new low-resolution thyroid color Doppler ultrasound image with the trained convolutional neural network.
2. The deep learning-based thyroid color Doppler ultrasound processing method according to claim 1, wherein S2 comprises:
S21: segmenting the respective thyroid regions from the low-resolution image and the class-I high-resolution image, and assigning binary gray values to each to form binary images;

S22: with the binary image of the thyroid region of the low-resolution image as reference, transforming the binary image of the thyroid region of the class-I high-resolution image so that the overlap of the two binary images exceeds the first proportion value, and recording the transformation parameters;

S23: transforming the class-I high-resolution image according to the transformation parameters recorded in S22, performing gray-scale interpolation on the transformed image by bilinear interpolation, and defining the resulting image as the class-II high-resolution image.
3. The deep learning-based thyroid color Doppler ultrasound processing method according to claim 2, wherein S22 comprises:
S221: determining an affine transformation model from the binary image of the thyroid region of the class-I high-resolution image to the binary image of the thyroid region of the low-resolution image, the model containing at least two unknown transformation parameters, the translation amount and the rotation angle;

S222: solving the unknown transformation parameters of the affine transformation model with a shuffled frog-leaping algorithm.
4. The deep learning-based thyroid color Doppler ultrasound processing method according to claim 1, wherein S4 comprises:
S41: extracting shallow features from the low-resolution image, and processing them to generate the class-I feature image;

S42: convolving the class-I feature image in two different channels, processing it in each channel to generate class-II feature image A and class-II feature image B;

S43: enhancing the high-frequency parts of class-II feature image A and class-II feature image B through an attention mechanism, processing each channel to generate class-III feature image A and class-III feature image B;

S44: superimposing class-III feature image A and class-III feature image B, and processing the result to generate the class-IV feature image;

S45: performing residual learning on the class-IV feature image several times, processing it to generate the class-V feature image;

S46: reconstructing a new thyroid color Doppler ultrasound image from the class-V feature image by sub-pixel convolution.
5. The deep learning-based thyroid color Doppler ultrasound processing method according to claim 4, wherein S41 comprises:
S411: convolving the low-resolution thyroid color Doppler ultrasound image with two or more cascaded convolution layers, and recording the resulting image as the shallow feature image;

S412: convolving the shallow feature image with one convolution layer having a 1 × 1 kernel, and defining the resulting image as the class-I feature image.
6. The deep learning-based thyroid color Doppler ultrasound processing method according to claim 4, wherein S43 comprises:
S431: performing a first convolution and activation on class-II feature images A and B using a convolution layer and a PReLU activation layer;

S432: performing a second convolution and activation on the once-convolved-and-activated class-II feature images A and B using a convolution layer and a Sigmoid activation layer, to form channel attention weights;

S433: taking the Hadamard product of class-II feature images A and B with the channel attention weights, to form class-III feature image A and class-III feature image B.
7. The deep learning-based thyroid color Doppler ultrasound processing method according to claim 4, wherein each round of residual learning in S45 is implemented by a residual learning module comprising a convolution layer, a normalization layer, an activation layer, a convolution layer, and a normalization layer connected in sequence, with a skip connection between the input of the first convolution layer and the output of the second normalization layer.
8. The deep learning-based thyroid color Doppler ultrasound processing method according to claim 7, wherein the normalization layer adopts a weight normalization algorithm.
9. The deep learning-based thyroid color Doppler ultrasound processing method according to claim 1, wherein when the convolutional neural network is trained in S3, the loss function is a weighted sum of the mean square error and the mean absolute error.
Priority Applications (1)

Application number: CN202110619349.3A
Priority date / filing date: 2021-06-03
Title: Thyroid color Doppler ultrasound processing method based on deep learning
Status: Pending

Publications (1)

Publication number: CN113313691A
Publication date: 2021-08-27
Family ID: 77377248
Country: CN (China)

Patent Citations (9)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN109919838A * | 2019-01-17 | 2019-06-21 | 华南理工大学 | Ultrasound image super-resolution reconstruction method promoting contour sharpness based on an attention mechanism
CN110060208A * | 2019-04-22 | 2019-07-26 | 中国科学技术大学 | A method of improving the reconstruction performance of super-resolution algorithms
CN111179172A * | 2019-12-24 | 2020-05-19 | 浙江大学 | Remote sensing satellite super-resolution implementation method and device based on unmanned aerial vehicle aerial data, electronic equipment and storage medium
CN111192200A * | 2020-01-02 | 2020-05-22 | 南京邮电大学 | Image super-resolution reconstruction method based on a residual network with a fused attention mechanism
CN111598804A * | 2020-05-12 | 2020-08-28 | 西安电子科技大学 | Deep learning-based image multi-level denoising method
CN111754406A * | 2020-06-22 | 2020-10-09 | 北京大学深圳研究生院 | Image resolution processing method, device, equipment, and readable storage medium
CN112133282A * | 2020-10-26 | 2020-12-25 | 厦门大学 | Lightweight multi-speaker speech synthesis system and electronic equipment
CN112200720A * | 2020-09-29 | 2021-01-08 | 中科方寸知微(南京)科技有限公司 | Super-resolution image reconstruction method and system based on filter fusion
CN112819910A * | 2021-01-08 | 2021-05-18 | 上海理工大学 | Hyperspectral image reconstruction method based on a double-ghost attention mechanism network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party

朱秀昌等 (Zhu Xiuchang et al.) *
王鑫等 (Wang Xin et al.), "Dual-channel multi-perception convolutional neural network for image super-resolution reconstruction," 《东北大学学报(自然科学版)》 [Journal of Northeastern University (Natural Science Edition)] *
郑伟等 (Zheng Wei et al.), "SPECT and B-mode ultrasound thyroid image registration based on the shuffled frog-leaping algorithm," 《河北大学学报(自然科学版)》 [Journal of Hebei University (Natural Science Edition)] *


Legal Events

PB01: Publication (application publication date: 2021-08-27)
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication