CN110136067B - Real-time image generation method for super-resolution B-mode ultrasound image - Google Patents

Real-time image generation method for super-resolution B-mode ultrasound image

Info

Publication number
CN110136067B
CN110136067B (application CN201910443786.7A)
Authority
CN
China
Prior art keywords
network
loss
resolution
super
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910443786.7A
Other languages
Chinese (zh)
Other versions
CN110136067A (en)
Inventor
陈涛
黄艳峰
刘冠秀
张丽
刘骥宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shangqiu Normal University
Original Assignee
Shangqiu Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shangqiu Normal University filed Critical Shangqiu Normal University
Priority to CN201910443786.7A priority Critical patent/CN110136067B/en
Publication of CN110136067A publication Critical patent/CN110136067A/en
Application granted granted Critical
Publication of CN110136067B publication Critical patent/CN110136067B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10132Ultrasound image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Ultrasonic Diagnosis Equipment (AREA)

Abstract

The invention discloses a real-time image generation method for super-resolution B-mode ultrasound images, addressing the problem that the clarity of real-time B-mode ultrasound images needs to be improved during generation. The method comprises the following steps: establish an MSE-based loss function; prepare two kinds of loss for different stages: use cross-entropy loss for binary classification of the gray levels of the black-and-white image to initialize the internal features and attention parameters of the network, and use MSE as the loss function for later-stage refinement; construct the neural network from the core blocks of the deep convolutional neural network; stack the block structures to obtain the deep convolutional network and generate its static graph; use an Adam optimizer with the cross-entropy loss of step b for initialization, stop that phase when the loss curve flattens, then switch to the MSE loss and refine until the inference results sufficiently improve the PSNR; export the weights and integrate and run them in the medical device using the static graph. The invention takes a single frame as input and outputs a single frame, obtaining results better than manually designed image-enhancement algorithms.

Description

Real-time image generation method for super-resolution B-mode ultrasound image
Technical Field
The invention relates to an image processing method for medical equipment, and in particular to a real-time image generation method for super-resolution B-mode ultrasound images.
Background
A B-mode ultrasound image is an image signal with a low signal-to-noise ratio. To obtain images that are sharper, carry more information, and have a higher signal-to-noise ratio, performance can be improved by image processing in addition to raising the sampling rate of the device. Super-resolution can significantly relax the performance and precision requirements on the device's A/D circuitry, using image processing to make up for hardware limitations. The typical super-resolution approach integrates temporal information: the continuously changing B-Mode video is treated as a video stream, and a clearer image is obtained through multi-frame synthesis. This directly introduces a large delay in the imaging result, because the latest frame needs several preceding frames to provide information. On one hand, the multi-frame method incurs a large amount of computation and cannot balance quality against computational load; on the other hand, multi-frame synthesis requires the image sequence to be almost static so that background and foreground data can be integrated, which effectively raises the frame-rate requirement of the equipment and undermines the practicality of the technique.
The technical contradictions of B-Mode image super-resolution are mainly: multiple still images are difficult to obtain, and real-time performance is positively correlated with the equipment performance required for multi-frame processing; the delay of multi-frame super-resolution is large and unsuitable for observation. The invention provides an implementation of a real-time single-frame super-resolution system, which has strong practical significance for resolving this contradiction.
Disclosure of Invention
The invention overcomes the problem in the prior art that the clarity of real-time B-mode ultrasound images needs to be improved during generation, and provides a real-time single-frame image generation method for super-resolution B-mode ultrasound images.
The technical scheme of the invention is a real-time image generation method for super-resolution B-mode ultrasound images: a deep neural network structure for generating super-resolution B-Mode images from a single frame, together with its training method, comprising the following steps:
step a, use single-frame data from the display of the B-mode ultrasound equipment as the raw input, and at the same time use the corresponding multi-frame super-resolution enhanced image; the two form a pair, and the step is repeated to obtain a data set for training the neural network;
step b, prepare two kinds of loss for different stages: use cross-entropy loss for binary classification of the gray levels of the black-and-white image to initialize the internal features and attention parameters of the network, and use MSE as the loss function for later-stage refinement;
step c, build the block structure of the deep convolutional network, generate the static graph of the block, and build the network as a whole;
step d, stack the block structures to obtain the deep convolutional network and generate the static graph of the whole network;
step e, use an Adam optimizer with the cross-entropy loss of step b for initialization, terminate that phase of training when the loss curve flattens, then switch the loss to MSE and perform subsequent super-resolution refinement until the inference results sufficiently improve the PSNR;
and step f, export the network weights, integrate and run them in the medical equipment using the static graph of the network, performing forward propagation only.
In step a, during operation and sample collection, the display output image of the B-mode ultrasound equipment is used directly as the raw image data, while the enhanced version of the image is obtained either with a high-resolution mode or with image enhancement software based on multi-frame super-resolution; the two together form the data set for the neural network.
In step b, the two-dimensional matrix obtained by forward propagation through the network is evaluated with the established loss function to obtain a loss value for the optimizer's backward propagation. Mean squared error and cross entropy are used as evaluation methods: with the Adam optimizer, the network parameters are first initialized using the cross-entropy loss, which distinguishes brightness and implicit spatial characteristics, and image detail generation is then trained using the MSE loss.
the two loss equations are as follows:
Figure BDA0002072927970000021
MSE loss, where (i, j) is the row and column position of the pixel,
Figure BDA0002072927970000022
for cross entropy loss, a pixel is regarded as a one-dimensional sequence, p is a true value, p is an input super-resolution image, and q is an inference result.
In step c, the internal structure of the Block is selected to configure the layer-by-layer connections of the neural network. The network blocks fall into three types: conventional feature extraction, downsampling, and upsampling. This configuration generates the static graph of each block, and the blocks are then configured in a stacked manner to form the static graph of the whole network.
In step d, the core block of the deep convolutional neural network is a single-input single-output module encapsulation of the end-to-end deep convolutional neural network, i.e. a sub-network of it. Its input and output data are four-dimensional tensors N × C × H × W, where N is the number of input three-dimensional tensors C × H × W, and the network so formed is the static graph defining the network.
In step e, two loss functions are used with an optimizer based on gradient-descent search: first the cross-entropy loss and then the MSE loss, according to the training stage. The optimizer runs for a limited number of iterations, with the termination condition that the loss falls to 0.05-0.04. Steps a-e yield the weights of the deep convolutional neural network that improves the PSNR of the input image.
In step f, the trained deep neural network stores all of its parameters, which are exported and, together with the static graph constructed in step d, run on a computing device. The computing device takes the image output by the display as the network input, following the method of step a; an edge computing unit can further be integrated in the same machine, so that an image with higher PSNR (peak signal-to-noise ratio), i.e. the super-resolution image, is obtained directly from single-frame data.
Compared with the prior art, the real-time image generation method for super-resolution B-mode ultrasound images has the following advantages. The method is based on machine learning and therefore requires samples and labels to be prepared in advance; the invention provides a convenient way to obtain labeled data directly, simplifying data preparation. The proposed deep convolutional neural network quickly generates super-resolution B-Mode images with higher PSNR than the originals.
The strength of the deep convolutional neural network is that it works with single-frame input and single-frame output, achieving results superior to manually designed image-enhancement algorithms. Its computation amount is fixed, and parallelized convolution instructions reach very high efficiency, so the computation can be performed locally, forming a combined software and hardware system.
Drawings
FIG. 1 is a schematic diagram illustrating the difference between the real-time image generation method for super-resolution B-mode ultrasound image and the conventional time-domain super-resolution method according to the present invention;
FIG. 2 is a schematic diagram showing the relationship between the real-time image generation method for super-resolution B-mode ultrasound image and the output image of B-mode ultrasound machine according to the present invention;
FIG. 3 is a schematic diagram of forward propagation computation on input images for the method of real-time image generation for super-resolution B-mode ultrasound images of the present invention;
FIG. 4 is a schematic diagram of an internal implementation process of a conventional feature extraction Block440 in a deep convolutional neural network Block according to a real-time image generation method for super-resolution B-mode ultrasound images;
FIG. 5 is a schematic diagram of an internal implementation process of a downsampling Block441 in a deep convolutional neural network Block according to a real-time image generation method for super-resolution B-mode ultrasound images;
FIG. 6 is a schematic diagram of the internal implementation of the upsampling Block 442 of the deep convolutional neural network in the real-time image generation method for super-resolution B-mode ultrasound images;
FIG. 7 is a schematic diagram illustrating an internal implementation process of the real-time super-resolution deep convolutional neural network 400 in the real-time image generation method for super-resolution B-mode ultrasound images according to the present invention;
FIG. 8 is a schematic flow chart of obtaining a super-resolution image from the original input image in the real-time image generation method for super-resolution B-mode ultrasound images according to the present invention.
Detailed Description
The present invention, a real-time image generation method for super-resolution B-mode ultrasound images, will be further explained with reference to the drawings and the detailed description below: a deep neural network structure for generating super-resolution B-Mode images from a single frame, together with its training method, comprising the following steps:
step a, use single-frame data from the display of the B-mode ultrasound equipment as the raw input, and at the same time use the corresponding multi-frame super-resolution enhanced image; the two form a pair, and the step is repeated to obtain a data set for training the neural network;
step b, prepare two kinds of loss for different stages: use cross-entropy loss for binary classification of the gray levels of the black-and-white image to initialize the internal features and attention parameters of the network, and use MSE as the loss function for later-stage refinement;
step c, build the block structure of the deep convolutional network, generate the static graph of the block, and build the network as a whole;
step d, stack the block structures to obtain the deep convolutional network and generate the static graph of the whole network;
step e, use an Adam optimizer with the cross-entropy loss of step b for initialization, terminate that phase of training when the loss curve flattens, then switch the loss to MSE and perform subsequent super-resolution refinement until the inference results sufficiently improve the PSNR;
and step f, export the network weights, integrate and run them in the medical equipment using the static graph of the network, performing forward propagation only.
In step a, during operation and sample collection, the display output image of the B-mode ultrasound equipment is used directly as the raw image data, while the enhanced version of the image is obtained either with a high-resolution mode or with image enhancement software based on multi-frame super-resolution; the two images form the data set for the neural network.
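A minimal sketch of how such a paired data set can be organized, assuming PyTorch and that the captured display frames and their multi-frame enhanced counterparts are stored as matching grayscale image files; the directory layout, file naming and the BModePairs class are illustrative, not part of the patent.

import glob
import numpy as np
import torch
from torch.utils.data import Dataset
from PIL import Image

class BModePairs(Dataset):
    """Pairs of (raw single-frame capture, multi-frame enhanced label)."""
    def __init__(self, raw_dir: str, enhanced_dir: str):
        self.raw = sorted(glob.glob(f"{raw_dir}/*.png"))
        self.enhanced = sorted(glob.glob(f"{enhanced_dir}/*.png"))
        assert len(self.raw) == len(self.enhanced)

    def __len__(self):
        return len(self.raw)

    def __getitem__(self, i):
        # grayscale images normalised to [0, 1], shaped 1 x H x W
        x = np.asarray(Image.open(self.raw[i]).convert("L"), dtype=np.float32) / 255.0
        y = np.asarray(Image.open(self.enhanced[i]).convert("L"), dtype=np.float32) / 255.0
        return torch.from_numpy(x)[None], torch.from_numpy(y)[None]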
In step b, the two-dimensional matrix obtained by forward propagation through the network is evaluated with the established loss function to obtain a loss value for the optimizer's backward propagation. Mean squared error and cross entropy are used as evaluation methods: with the Adam optimizer, the network parameters are first initialized using the cross-entropy loss, which distinguishes brightness and implicit spatial characteristics, and image detail generation is then trained using the MSE loss.
The two loss functions are as follows:
$$L_{MSE}=\frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(X_{i,j}-Y_{i,j}\right)^{2}$$
is the MSE loss, where (i, j) is the row and column position of a pixel, X is the inference result and Y is the super-resolution label image;
$$L_{CE}=-\sum_{k}\left[p_{k}\log q_{k}+\left(1-p_{k}\right)\log\left(1-q_{k}\right)\right]$$
is the cross-entropy loss, where the pixels are treated as a one-dimensional sequence, p is the true value taken from the input super-resolution image, and q is the inference result.
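The two losses can be written down directly in PyTorch; the sketch below assumes that both the inference result and the label are N × 1 × H × W tensors with values in [0, 1], and the binary form of the cross entropy follows the binary classification of gray levels described above.

import torch
import torch.nn.functional as F

def mse_loss(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # refinement phase: mean squared error over all pixel positions (i, j)
    return F.mse_loss(x, y)

def cross_entropy_loss(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # initialization phase: per-pixel binary cross entropy, pixels flattened to a
    # one-dimensional sequence; p = label y, q = inference result x
    return F.binary_cross_entropy(x.clamp(1e-7, 1.0 - 1e-7), y)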
In step c, the internal structure of the Block is selected to configure the layer-by-layer connections of the neural network. The network blocks fall into three types: conventional feature extraction, downsampling, and upsampling. This configuration generates the static graph of each block, and the blocks are then configured in a stacked manner to form the static graph of the whole network.
In step d, the core block of the deep convolutional neural network is a single-input single-output module encapsulation of the end-to-end deep convolutional neural network, i.e. a sub-network of it. Its input and output data are four-dimensional tensors N × C × H × W, where N is the number of input three-dimensional tensors C × H × W, and the network so formed is the static graph defining the network.
In step e, two loss functions are used with an optimizer based on gradient-descent search: first the cross-entropy loss and then the MSE loss, according to the training stage. The optimizer runs for a limited number of iterations, with the termination condition that the loss falls to 0.05-0.04. Steps a-e yield the weights of the deep convolutional neural network that improves the PSNR of the input image.
In step f, the trained deep neural network stores all of its parameters, which are exported and, together with the static graph constructed in step d, run on a computing device. The computing device takes the image output by the display as the network input, following the method of step a; an edge computing unit can further be integrated in the same machine, so that an image with higher PSNR (peak signal-to-noise ratio), i.e. the super-resolution image, is obtained directly from single-frame data.
The specific implementation of this embodiment is as follows. The deep neural network edge computing system for super-resolution of B-Mode images is a computing module for local real-time processing of B-Mode images; a deep convolutional neural network is used to overcome the disadvantage that conventional super-resolution or modeling methods depend on time-domain information. The difference from the conventional time-domain super-resolution method is shown in Fig. 1.
Based on the above, a real-time image post-processing system and its training method are provided to address the high replacement cost of existing equipment and the complexity of the post-processing flow. A detailed embodiment of the system is as follows. To set it apart from the time-domain super-resolution method, Fig. 1 intuitively shows the difference between the two. The original image 101 passes through n frames F_0 to F_n; after this information is collected, the super-resolution image 102 is obtained by time-domain processing. To obtain the time-domain super-resolution image 102 corresponding to the latest frame F_n, the time consumed is the acquisition time of frames F_0 to F_n, T(F_0~F_n), plus the post-processing time T_p.
The super-resolution in the invention does not depend on previous-frame information, so the time consumed is only the acquisition time of frame F_0, T(F_0), plus the post-processing time T_p. Super-resolution images 102 and 103 differ in whether multiple frames or a single frame are used.
To attach the system more conveniently to an existing B-mode ultrasound machine, the image output by the machine can be post-processed directly. As shown in Fig. 2, there are many equivalent ways to obtain the input image 101; only a preferred embodiment that obtains the image without modifying the existing hardware is shown here.
The system hardware consists of a signal acquisition module 202, a computing unit 301 and an image output unit; the signal acquisition module 202 can directly acquire the image signal 201 output by the display. One preferred embodiment is: an HDMI signal acquisition module is connected to the display interface of the B-mode ultrasound machine to obtain its output picture, and region-of-interest cropping is then performed, as shown by 201-202 in Fig. 2. The obtained image 101 is transmitted to the computing unit 301, and finally the super-resolution image signal 103 is obtained.
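As an illustration of the acquisition step 201-202, the following sketch assumes the HDMI capture module enumerates as an ordinary video device readable with OpenCV; the device index and the region-of-interest coordinates are placeholders, not values from the patent.

import cv2

cap = cv2.VideoCapture(0)              # HDMI capture module exposed as video device 0
ok, frame = cap.read()                 # one display frame of the B-mode machine
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    x, y, w, h = 100, 80, 512, 512     # placeholder region-of-interest crop
    roi = gray[y:y + h, x:x + w]       # image 101 handed to the computing unit 301
cap.release()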
The computing unit is a computer that loads the intermediate representation of the deep neural network, namely the weights 302 and the network structure 303, and performs forward-propagation computation on the input image 101 to finally obtain the super-resolution image 103, as shown in Fig. 3. The computing unit realizes the hardware side of the system, while the super-resolution computation itself is performed by the deep convolutional neural network; the following focuses on the Graph 450, the internal blocks 440-442 of the deep convolutional neural network, and the preparation of their training data. The internal implementation of the deep convolutional neural network is shown in Figs. 4-7.
The deep convolutional neural network implementation within the compute unit is divided into a Block design (440, 441, 442) and a Graph configuration 450.
As shown in Figs. 4-7, a Block is a single-input single-output end-to-end structure: a module encapsulation of an end-to-end deep convolutional neural network, i.e. a sub-network, and a channel-attention bypass is used inside the Block to improve network efficiency. The input and output data are four-dimensional tensors N × C × H × W, where N is the number of input three-dimensional tensors C × H × W; during inference N is strictly 1 at every position in the network, while during training N is a non-zero positive integer strictly equal to the training batch size at every position in the network. C is the number of input/output channels of the Block and is a non-zero positive integer; depending on the properties of the Block, the numbers of input and output channels are not necessarily equal. H and W define the height and width of the two-dimensional tensor, i.e. the numbers of rows and columns.
To describe the internal structure and working principle of the deep convolutional neural network concisely, N is regarded as the batch size of the network's task; since N can be any non-zero natural number, it is not considered in the tensor descriptions below.
The network blocks fall into three types: conventional feature extraction 440, downsampling 441 and upsampling 442. The conventional feature-extraction Block uses a channel-attention mechanism based on global feature strength: the average of each channel is fed into a small fully connected network to identify weight gains. The distribution of channel strengths characterizes differences between input tasks, and the attention mechanism improves the utilization of network resources and the expressive capacity across different tasks. The activation function is the sigmoid:
$$\sigma(x)=\frac{1}{1+e^{-x}}$$
Therefore, when the network cannot converge, the channel strengths do not degrade performance, while under convergence the channel utilization can be greatly improved: within a single layer the expressive capacity is raised through the pyramid-like features, and the optimizer can also be made to explicitly allocate different channel resources to different tasks according to the channel distribution that identifies the strength of the input features.
For convenience, the key operations 401-410 of blocks 440-442 are explained first. To avoid repetition later, note that K is the side length of the convolution kernel; all convolution kernels below are square, and a side length of 1 (a 1 × 1 convolution) is allowed.
The Padding parameters of all convolution operations are chosen so that H and W do not change after the convolution, i.e. each edge is padded by (K − 1)/2, a total of K − 1 over the two opposite edges. Convolution is performed with a sliding window, and Padding ensures that the tensor does not "shrink by one turn" after the convolution.
Padding is usually zero-padding, but the data of the adjacent edge may also be copied, or mirror padding may be performed with the outermost edge as the axis of symmetry; because of this variability, this embodiment places no requirement on the form of Padding.
Tensor 401 is the input tensor of the Block; tensor 406 is obtained by applying the channel-wise dot product 404 to the tensor produced from 401 by operation 405.
Operation 402 is a Global Average Pooling (GAP) operation that averages the C × H × W tensor globally per channel.
For the c-th channel, the GAP summation formula is:
$$\mathrm{GAP}_{c}=\frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W}x_{c,i,j}$$
the output tensors 408 and 412 are different in that 408 is an intermediate variable of the network and 412 may be an intermediate variable of the network or the final output of the network. The shapes of 408 and 412 are set according to the actual size of the tensor in Blcok.
The different Blcok will be explained below: as shown in Block440 of fig. 4, 440 is a set of (Block) convolutional neural networks, which are used for multi-scale depth extraction features, and are sized and configured to: the input and output tensors are of the same size, C × H × W.
The input tensor 401 computes a tensor 406 over two paths. Using operations 402, 403 to obtain a set of scalars of size C, a dot-by-channel C multiplication is performed on 406 by 404. 406 operate through 411, 412, 413, 414 to yield four tensors. The four tensors are combined into an output tensor 408 in channel dimensions by 407.
It should be explained in detail that 405 is a 1 × 1 convolution for obtaining C/4 two-dimensional tensors with the same size by linearly combining the C two-dimensional tensors inside 401, and forming the two-dimensional tensors with the same size
Figure BDA0002072927970000061
The tensor 406.
It has been mentioned above that for a shallower network, in order to enhance the Block expression capability, an explicit expression manner 402, 403, and 404 is used to enable Blcok to dynamically adjust the feature strength in the forward propagation process according to the feature strengths of different channels, that is: the response intensity of each channel of the input feature 401 is identified, and dynamic weight compensation and suppression are performed for 406 by channel. The size of the sub-network 403 is arranged such that the sub-network 403 is a set of three fully connected networks, the first and third layers being of size C, the second being an implicit layer, of size C x 4.
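A minimal sketch of this channel-attention bypass (402-404) in PyTorch: global average pooling per channel, a small fully connected net with layer sizes C, 4C, C and a sigmoid gate, followed by a channel-wise multiplication. The ReLU on the hidden layer and the module name are assumptions; the description only specifies the layer sizes and the sigmoid.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)        # 402: GAP per channel
        self.fc = nn.Sequential(                  # 403: layers of size C, 4C, C
            nn.Linear(channels, channels * 4),
            nn.ReLU(inplace=True),                # assumed hidden activation
            nn.Linear(channels * 4, channels),
            nn.Sigmoid())                         # per-channel gain in (0, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        w = self.fc(self.gap(x).view(n, c)).view(n, c, 1, 1)
        return x * w                              # 404: channel-wise dot product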
The 411, 412, 413 and 414 operations are generalized grouped convolutions with a recommended group (Group) size of C/4, i.e. depthwise separable convolutions; in special cases a smaller group size may be used, giving a typical grouped convolution. The convolution kernel parameters of these operations are trainable, as is every convolution operation in Figs. 4-7, which is not repeated below. The configuration is as follows: the convolution kernel (Kernel) size K of 411 is 7, i.e. a 7 × 7 two-dimensional matrix (this shorthand is used below), the number of output channels equals the number of input channels, and the recommended group (Group) size is C/4.
The convolution kernel (Kernel) size of 412 is 3, the number of output channels equals the number of input channels, and the recommended group size is C/4. The operation is a dilated convolution, also known as an atrous (perforated) convolution, equivalent to a sieve-like sampling structure in which 3 × 3 pixels are sampled. R is the dilation rate; for small-resolution input, R = 1, i.e. no pixel is skipped when sampling with the sliding window. The parameter can be chosen from practical considerations: with R = 2, for example, the convolution kernel of 412 is equivalent to a 5 × 5 window, but only 3 × 3 pixels are sampled, skipping 1 pixel between samples. Writing the pixels involved in sampling as 1 and the skipped pixels as 0, the two-dimensional representation for R = 2 is:
$$\begin{pmatrix}1&0&1&0&1\\0&0&0&0&0\\1&0&1&0&1\\0&0&0&0&0\\1&0&1&0&1\end{pmatrix}$$
The convolution kernel (Kernel) size K of 413 is 5, the number of output channels equals the number of input channels, the operation is a grouped convolution, and the recommended group size is C/4.
The convolution kernel (Kernel) size K of 414 is 3, the number of output channels equals the number of input channels, the operation is a grouped convolution, and the recommended group size is C/4.
Through the 411, 412, 413 and 414 operations, a total of four tensors of size (C/4) × H × W are obtained.
Operation 407 is defined here: merge refers to arranging several tensors in order along a given dimension and treating them as one new tensor; here the dimension is the channel dimension. For ease of understanding, one way to picture it is: taking 411 as an example, its C/4 two-dimensional H × W tensors are stacked, i.e. 411 outputs a tensor of size (C/4) × H × W; 407 then continues by stacking the tensor of 412 on top of that of 411, the stacking direction being the channel dimension. By analogy, 407 merges the tensors of 411, 412, 413 and 414 by channel into the tensor 408 of size C × H × W.
The 407 operations below follow the same logic and the same representation in Figs. 4-7, so they are not described again; to avoid duplicate descriptions, the concepts 401, 407 and 408 below refer to the information above.
Block 440 is a feature-extraction structure interleaved throughout the network that effectively improves the network's expressive capability; this is an important idea of the invention and is also embodied in Block 441.
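A minimal PyTorch sketch of the feature-extraction block 440: the channel-attention bypass (402-404, reusing the ChannelAttention module sketched earlier), the 1 × 1 reduction to C/4 channels (405), four depthwise branches with kernels 7, 3 (dilated), 5 and 3 (411-414), and the channel-wise merge (407). The dilation rate and the exact placement of the attention gate relative to the 1 × 1 reduction are interpretations, not a definitive reading of the patent.

import torch
import torch.nn as nn

class Block440(nn.Module):
    def __init__(self, channels: int, dilation: int = 1):
        super().__init__()
        c4 = channels // 4
        self.att = ChannelAttention(channels)          # 402-404, sketched earlier
        self.reduce = nn.Conv2d(channels, c4, 1)       # 405: C -> C/4
        # 411-414: depthwise branches, padding keeps H and W unchanged
        self.b411 = nn.Conv2d(c4, c4, 7, padding=3, groups=c4)
        self.b412 = nn.Conv2d(c4, c4, 3, padding=dilation, dilation=dilation, groups=c4)
        self.b413 = nn.Conv2d(c4, c4, 5, padding=2, groups=c4)
        self.b414 = nn.Conv2d(c4, c4, 3, padding=1, groups=c4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.reduce(self.att(x))                   # gate, then reduce to C/4 channels
        # 407: merge the four branch outputs along the channel dimension -> C x H x W
        return torch.cat([self.b411(y), self.b412(y), self.b413(y), self.b414(y)], dim=1)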
As shown for Block 441 in Fig. 5: 441 is a set (Block) of convolutional neural networks whose purpose is to downsample and extract features. Its size and configuration are: the input tensor 401 has size C × H × W, and the output tensor 408 has size 4C × (H/2) × (W/2).
401 is computed along a single path through 409, and the resulting tensor is then operated on by 415, 416, 417 and 418 respectively to yield four tensors.
The four tensors are merged along the channel dimension by 407 into the output tensor 408. In operation 409 the convolution kernel (Kernel) size K is 3, the number of output channels equals the number of input channels, the step size (Stride) is 2, the operation is a conventional convolution, and the group size (Group) is 1, i.e. no grouping is performed. 409 extracts features by sliding a 3 × 3 convolution kernel with stride 2; Stride − 1 pixels are passed over at each step because of the stride, and since the sliding window moves in the H and W dimensions, the resulting tensor has size C × (H/2) × (W/2).
The tensor obtained by 409 is turned into four tensors by the following operations: 415, 416, 417 and 418 are generalized grouped convolutions with a recommended group (Group) size of C, i.e. depthwise separable convolutions; in special cases a smaller group size may be used, giving a typical grouped convolution.
415 has a convolution Kernel (Kernel) size K of 7, the number of output channels is the same as the number of input channels, and the recommended Group size is C.
416 has a convolution kernel (Kernel) size K of 3, the number of output channels equals the number of input channels, and the recommended group size is C. The operation is a dilated convolution; see the description of 412 for its definition.
441 performs the downsampling; if the input tensor 401 of 441 has large H and W, R = 2 or R = 3 is suggested, making the convolution kernel of 416 equivalent to a 5 × 5 or 7 × 7 window while still sampling 3 × 3 pixels, skipping 1 or 2 pixels between samples.
Writing the pixels involved in sampling as 1 and the skipped pixels as 0, the two-dimensional representation for R = 2 is:
$$\begin{pmatrix}1&0&1&0&1\\0&0&0&0&0\\1&0&1&0&1\\0&0&0&0&0\\1&0&1&0&1\end{pmatrix}$$
With R = 2 or R = 3, a large window, i.e. a large receptive field, is obtained, which helps capture spatial features efficiently over a wide range. 441 sits at the front of the network, where a large receptive field is important. 441 also contains the small-receptive-field convolutions 415, 417 and 418; combined with the large-receptive-field information from 416, the output tensor 408 can perceive multi-scale features inside 441. These features are important for the later extraction of deep features, and the multi-scale design of 441 and 440 effectively reduces the required depth of the network.
440 and 441 extend the expressive power of the network while using fine-grained grouped or depthwise separable convolutions, which improves computational efficiency while preserving the accuracy and expressive capability of the network.
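A minimal PyTorch sketch of the downsampling block 441: a stride-2 3 × 3 convolution (409) followed by four depthwise branches (415-418) merged by channel (407). The kernel sizes of 417 and 418 (5 and 3, by analogy with 413 and 414) and the dilation rate are assumptions; the output has 4C channels at half the spatial resolution, consistent with the "441 C4" annotation for a single-channel input.

import torch
import torch.nn as nn

class Block441(nn.Module):
    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        c = channels
        # 409: ordinary 3x3 convolution with stride 2, halves H and W, no grouping
        self.down = nn.Conv2d(c, c, 3, stride=2, padding=1)
        # 415-418: depthwise branches (recommended group size C)
        self.b415 = nn.Conv2d(c, c, 7, padding=3, groups=c)
        self.b416 = nn.Conv2d(c, c, 3, padding=dilation, dilation=dilation, groups=c)
        self.b417 = nn.Conv2d(c, c, 5, padding=2, groups=c)   # kernel size assumed
        self.b418 = nn.Conv2d(c, c, 3, padding=1, groups=c)   # kernel size assumed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.down(x)                                      # C x H/2 x W/2
        # 407: merge by channel -> 4C x H/2 x W/2
        return torch.cat([self.b415(y), self.b416(y), self.b417(y), self.b418(y)], dim=1)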
442 is a naive upsampling network. Benefiting from the rich features of the preceding stages and the multi-scale expressive capability of 440 and 441, 442 only needs to upsample at the back of the network to obtain richer information. If the PSNR is to be improved further, or for other purposes, it can be replaced by a deconvolution-based structure; 442 is only one embodiment of a complete network, the upsampling at the later stage can take many forms, and the upsampling method itself is not within the protection scope of the invention.
As shown for Block 442 in Fig. 6: 442 is a set (Block) of convolutional neural networks for upsampling and combining features. Its size and configuration are: the input tensor 401 has size C × H × W, and the output tensor 412 has the upsampled height and width with the number of channels produced by the 1 × 1 convolution. 401 is computed along a single path through 410, a naive bicubic upsampling combined with a 1 × 1 convolution: the 1 × 1 convolution linearly combines the channels of the input tensor 401 before upsampling, yielding the tensor 412.
The number of channels output by 412 at the end of the network is 1; to counteract the upsampling blur of 412, a simple edge-enhancement algorithm can be applied to the network output to further improve the PSNR.
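A minimal PyTorch sketch of the upsampling block 442: a 1 × 1 convolution combining the input channels (410), followed by naive bicubic upsampling. The upsampling factor of 2 (mirroring the stride-2 downsampling) is an assumption; out_channels is 1 at the end of the network.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Block442(nn.Module):
    def __init__(self, in_channels: int, out_channels: int = 1, scale: int = 2):
        super().__init__()
        self.scale = scale
        # 410: 1x1 convolution linearly combining the input channels before upsampling
        self.combine = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.combine(x)
        # naive bicubic upsampling to the output tensor 412
        return F.interpolate(y, scale_factor=self.scale, mode="bicubic", align_corners=False)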
As shown for CNN 400 in Fig. 7, the overall structure of the network is hourglass-shaped, oriented from left to right, with a single-channel input image. The parameters shown on each Block have the following meaning: taking the first "441 C4" as an example, 441 is the structure of Block 441 shown in Fig. 5 and C4 means 4 output channels; taking the middle "440 C256 × 20" as an example, 440 is the structure of Block 440 shown in Fig. 4, C256 means 256 output channels, and × 20 indicates that 20 such blocks are connected in series. By analogy, the network end "442 C1" is the structure of Block 442 shown in Fig. 6, and C1 means a single-channel output image.
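A sketch of how the hourglass network 400 can be assembled from the blocks above, using only the annotations given in the text ("441 C4" at the front, "440 C256" repeated 20 times in the middle, "442 C1" at the end). The intermediate stages that bring the channel count from 4 up to 256 are not enumerated in the text and are represented here by a plain 1 × 1 transition purely as a placeholder.

import torch
import torch.nn as nn
# Block440, Block441 and Block442 as sketched above

class CNN400(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = Block441(1)                         # "441 C4": 1 -> 4 channels, H/2 x W/2
        self.to_mid = nn.Conv2d(4, 256, kernel_size=1)  # placeholder transition (assumption)
        self.body = nn.Sequential(*[Block440(256) for _ in range(20)])  # "440 C256" x 20
        self.tail = Block442(256, out_channels=1)       # "442 C1": single-channel output

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: N x 1 x H x W
        y = self.to_mid(self.head(x))
        y = self.body(y)
        return self.tail(y)                              # N x 1 x H x W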
In order to train the network, details of the method of obtaining training data, the Loss function, and the optimizer need to be provided.
As shown in Fig. 8, the original sample set 501 is the set of acquired original inputs 101, and the super-resolution samples 502 are the set of super-resolution images 102 obtained offline by the time-domain method. Referring to the schematic of the time-domain method in Fig. 1, 101 and 102 are paired; by repeating the time-domain method, multiple 101 sets can be organized into the original sample set 501 and the super-resolution samples 502. Referring to Fig. 8 and Figs. 4-7, CNN in Fig. 8 refers to CNN 400 in Fig. 7, i.e. the entire neural network. As shown in Fig. 8, the CNN loads data from the original sample set 501, and the Loss function 503 is evaluated between the forward-propagation result of the neural network 400 and the corresponding data in the super-resolution samples 502.
To simplify the representation of the Loss function, an image selected from the original sample set 501 is fed into the neural network 400; the image obtained by forward propagation is denoted X, and the image in the super-resolution samples 502 corresponding to the selected 501 image is denoted Y.
The Loss function uses the MSE (mean squared error):
$$L_{MSE}=\frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(X_{i,j}-Y_{i,j}\right)^{2}$$
where (i, j) is the row and column position of the pixel.
A Loss function generally includes a regularization term to prevent overfitting; here no regularization term is used in the Loss, and Dropout is used instead to prevent overfitting. This design takes the particularities of the network into account.
Dropout is applied to all trainable neurons in the neural network 400 with the following rule: the global Dropout probability is p_global = 10%, and the Dropout probability of 403 in Fig. 4 is p_403 ∈ [0%, 10%]; during training it is manually and gradually reduced as training progresses until floor(4C · p_403) = 0, where floor(x) is the rounding-down operation on x.
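A small sketch of this manual schedule; the halving rule per reduction step is an illustrative choice, and only the 10% start value and the floor(4C · p) = 0 stopping condition come from the text above.

import math

def next_dropout_p(p: float, channels: int) -> float:
    """Reduce p step by step; once floor(4*C*p) reaches 0 the Dropout of 403 is effectively off."""
    p = p / 2.0                              # assumed decay rule
    return 0.0 if math.floor(4 * channels * p) == 0 else p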
Thanks to the wide availability of fully automatic optimizers, using the Adam method avoids manual adjustment of the learning rate and alleviates both training failure caused by too high a learning rate and overly long convergence caused by too low a learning rate. The optimizer 504 used in the invention is an Adam optimizer, with one recommended parameter setting: initial learning rate 0.003 and gradient-update parameters (β1, β2) = (0.9, 0.99).
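A sketch of the two-phase training procedure (steps b and e), assuming PyTorch and reusing the CNN400, BModePairs and loss sketches above. The flatness test on the cross-entropy loss curve is a simple illustrative heuristic; 0.05-0.04 is the termination range given in the text, and the epoch cap and batch size are placeholders.

import torch
from torch.utils.data import DataLoader

net = CNN400()
opt = torch.optim.Adam(net.parameters(), lr=0.003, betas=(0.9, 0.99))
loader = DataLoader(BModePairs("raw/", "enhanced/"), batch_size=8, shuffle=True)

def run_phase(loss_fn, stop):
    prev = float("inf")
    for epoch in range(1000):                 # placeholder cap on epochs
        total = 0.0
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(net(x), y)
            loss.backward()
            opt.step()
            total += loss.item()
        mean = total / len(loader)
        if stop(mean, prev):
            break
        prev = mean

# phase 1: cross-entropy initialization, stop when the loss curve flattens
run_phase(cross_entropy_loss, lambda m, p: p - m < 1e-4)
# phase 2: MSE refinement, stop when the loss falls to roughly 0.05-0.04
run_phase(mse_loss, lambda m, p: m <= 0.05)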
The trained CNN weights 302 can be loaded by any deep learning framework capable of parsing and instantiating the CNN structure 303 and run in the computing unit 301 shown in Fig. 3, finally forming an end-to-end real-time super-resolution system through the relationship shown in Fig. 2.
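One possible realization of this deployment step, using TorchScript as the exported static graph; the patent does not name a specific framework, and the file name and frame size are placeholders.

import torch

# on the training machine: export weights 302 together with the structure 303
# (net is the trained CNN400 from the training sketch above)
net.eval()
traced = torch.jit.trace(net, torch.rand(1, 1, 512, 512))
traced.save("cnn400.pt")

# on the computing unit 301: forward propagation only
model = torch.jit.load("cnn400.pt")
model.eval()
with torch.no_grad():
    frame = torch.rand(1, 1, 512, 512)   # placeholder for the captured image 101
    sr = model(frame)                    # super-resolution image 103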
To sum up: the invention provides an overall system architecture, on the basis of which an end-to-end software and hardware system for super-resolution is obtained. The invention provides the deep convolutional neural network 400 for rapidly generating super-resolution B-Mode images, and a convenient way to obtain the labeled data 502 directly, simplifying the data preparation process.
The real-time super-resolution deep convolutional neural network 400 for B-mode ultrasound image super-resolution provided by the invention therefore has strong practical significance. Furthermore, combined with the training method and system architecture provided by the invention, a computing module based on the invention can serve as a component of ultrasonic imaging equipment, suitable for raising the signal-to-noise ratio of portable devices with lower signal quality or for improving the performance of ordinary devices.
Although the embodiments of the present invention have been described in as much detail as possible, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A real-time image generation method for super-resolution B-mode ultrasound images, characterized by comprising a deep neural network structure for generating super-resolution B-Mode images from a single frame and its training method, with the following steps:
step a, use single-frame data from the display of the B-mode ultrasound equipment as the raw input, and at the same time use the corresponding multi-frame super-resolution enhanced image; the two form a pair, and the step is repeated to obtain a data set for training the neural network;
step b, prepare two kinds of loss for different stages: use cross-entropy loss for binary classification of the gray levels of the black-and-white image to initialize the internal features and attention parameters of the network, and use MSE as the loss function for later-stage refinement;
step c, build the block structure of the deep convolutional network, generate the static graph of the block, and build the network as a whole;
step d, stack the block structures to obtain the deep convolutional network and generate the static graph of the whole network;
step e, use an Adam optimizer with the cross-entropy loss of step b for initialization, terminate that phase of training when the loss curve flattens, then switch the loss to MSE and perform subsequent super-resolution refinement until the inference results sufficiently improve the PSNR;
and step f, export the network weights, integrate and run them in the medical equipment using the static graph of the network, performing forward propagation only.
2. The real-time image generation method for super-resolution B-mode ultrasound images according to claim 1, wherein: in step a, during operation and sample collection, the display output image of the B-mode ultrasound equipment is used directly as the raw image data, while the enhanced version of the image is obtained either with a high-resolution mode or with image enhancement software based on multi-frame super-resolution; the two together form the data set for the neural network.
3. The real-time image generation method for super-resolution B-mode ultrasound images according to claim 1, wherein: in step b, the two-dimensional matrix obtained by forward propagation through the network is evaluated with the established loss function to obtain a loss value for the optimizer's backward propagation; mean squared error and cross entropy are used as evaluation methods: with the Adam optimizer, the network parameters are first initialized using the cross-entropy loss, which distinguishes brightness and implicit spatial characteristics, and image detail generation is then trained using the MSE loss;
the formulas of the two losses are as follows:
$$L_{MSE}=\frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(X_{i,j}-Y_{i,j}\right)^{2}$$
is the MSE loss, where (i, j) is the row and column position of a pixel, X is the inference result and Y is the super-resolution label image;
$$L_{CE}=-\sum_{k}\left[p_{k}\log q_{k}+\left(1-p_{k}\right)\log\left(1-q_{k}\right)\right]$$
is the cross-entropy loss, where the pixels are treated as a one-dimensional sequence, p is the true value taken from the input super-resolution image, and q is the inference result.
4. The real-time image generation method for super-resolution B-mode ultrasound images according to claim 1, wherein: in step c, the internal structure of the Block is selected to configure the layer-by-layer connections of the neural network; the network blocks fall into three types: conventional feature extraction, downsampling, and upsampling; this configuration generates the static graph of each block, and the blocks are then configured in a stacked manner to form the static graph of the whole network.
5. The real-time image generation method for super-resolution B-mode ultrasound images according to claim 1, wherein: in step d, the core block of the deep convolutional neural network is a single-input single-output module encapsulation of the end-to-end deep convolutional neural network, i.e. a sub-network of it; its input and output data are four-dimensional tensors N × C × H × W, where N is the number of input three-dimensional tensors C × H × W, and the resulting network is the static graph defining the network.
6. The real-time image generation method for super-resolution B-mode ultrasound images according to claim 1, wherein: in step e, two loss functions are used with an optimizer based on gradient-descent search, first the cross-entropy loss and then the MSE loss according to the training stage; the optimizer runs for a limited number of iterations, with the termination condition that the loss falls to 0.05-0.04, yielding the weights of the deep convolutional neural network that improves the PSNR of the input image.
7. The real-time image generation method for super-resolution B-mode ultrasound images according to claim 1, wherein: in step f, the trained deep neural network stores all of its parameters, which are exported and, together with the static graph constructed in step d, run on a computing device; the computing device takes the image output by the display as the network input, following the method of step a; an edge computing unit can further be integrated in the same machine, so that an image with higher PSNR (peak signal-to-noise ratio), i.e. the super-resolution image, is obtained directly from single-frame data.
CN201910443786.7A 2019-05-27 2019-05-27 Real-time image generation method for super-resolution B-mode ultrasound image Active CN110136067B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910443786.7A CN110136067B (en) 2019-05-27 2019-05-27 Real-time image generation method for super-resolution B-mode ultrasound image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910443786.7A CN110136067B (en) 2019-05-27 2019-05-27 Real-time image generation method for super-resolution B-mode ultrasound image

Publications (2)

Publication Number Publication Date
CN110136067A CN110136067A (en) 2019-08-16
CN110136067B true CN110136067B (en) 2022-09-06

Family

ID=67581737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910443786.7A Active CN110136067B (en) 2019-05-27 2019-05-27 Real-time image generation method for super-resolution B-mode ultrasound image

Country Status (1)

Country Link
CN (1) CN110136067B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260560B (en) * 2020-02-18 2020-12-22 中山大学 Multi-frame video super-resolution method fused with attention mechanism
CN112259119B (en) * 2020-10-19 2021-11-16 深圳市策慧科技有限公司 Music source separation method based on stacked hourglass network
CN112767252B (en) * 2021-01-26 2022-08-02 电子科技大学 Image super-resolution reconstruction method based on convolutional neural network
CN115601242B (en) * 2022-12-13 2023-04-18 电子科技大学 Lightweight image super-resolution reconstruction method suitable for hardware deployment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108603922A (en) * 2015-11-29 2018-09-28 阿特瑞斯公司 Automatic cardiac volume is divided
WO2018053340A1 (en) * 2016-09-15 2018-03-22 Twitter, Inc. Super resolution using a generative adversarial network
CN109447940B (en) * 2018-08-28 2021-09-28 天津医科大学肿瘤医院 Convolutional neural network training method, ultrasonic image identification and positioning method and system

Also Published As

Publication number Publication date
CN110136067A (en) 2019-08-16

Similar Documents

Publication Publication Date Title
CN110136067B (en) Real-time image generation method for super-resolution B-mode ultrasound image
JP6890345B2 (en) Image segmentation methods, equipment and computer programs
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
CN110188239B (en) Double-current video classification method and device based on cross-mode attention mechanism
EP3716198A1 (en) Image reconstruction method and device
Sun et al. Hybrid pixel-unshuffled network for lightweight image super-resolution
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN110717851A (en) Image processing method and device, neural network training method and storage medium
KR20190100320A (en) Neural Network Model Training Method, Apparatus and Storage Media for Image Processing
Li et al. Carn: Convolutional anchored regression network for fast and accurate single image super-resolution
CN108765282B (en) Real-time super-resolution method and system based on FPGA
CN112837224A (en) Super-resolution image reconstruction method based on convolutional neural network
US20230153946A1 (en) System and Method for Image Super-Resolution
CN112508794B (en) Medical image super-resolution reconstruction method and system
CN110363068A (en) A kind of high-resolution pedestrian image generation method based on multiple dimensioned circulation production confrontation network
CN111986085A (en) Image super-resolution method based on depth feedback attention network system
CN110689509B (en) Video super-resolution reconstruction method based on cyclic multi-column 3D convolution network
Wang et al. BAM: A balanced attention mechanism for single image super resolution
CN116681592A (en) Image super-resolution method based on multi-scale self-adaptive non-local attention network
Shi et al. Efficient super-resolution system with block-wise hybridization and quantized winograd on fpga
Zafeirouli et al. Efficient, lightweight, coordinate-based network for image super resolution
CN104574320B (en) A kind of image super-resolution restored method based on sparse coding coefficients match
Wu et al. Lightweight stepless super-resolution of remote sensing images via saliency-aware dynamic routing strategy
Li et al. Edge-guided hierarchically nested network for real-time semantic segmentation
Yin et al. Super-Resolution Reconstruction Algorithm Based on Improved ESRGAN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant