CN116563396A

CN116563396A - Three-dimensional biomedical image compression method, system, equipment and storage medium

Info

Publication number: CN116563396A
Application number: CN202210099132.9A
Authority: CN
Inventors: 李礼; 薛冬梅; 马海川; 刘东; 熊志伟
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2022-01-27
Filing date: 2022-01-27
Publication date: 2023-08-08

Abstract

The invention discloses a three-dimensional biomedical image compression method, system, equipment and storage medium. A training-based three-dimensional affine wavelet transform is designed in which affine maps are introduced. After training is finished, the prediction and update networks are fixed, but the affine maps still change with the input, which is equivalent to adjusting the wavelet basis functions according to the input content and thereby improves the coding performance for three-dimensional biomedical images. On this basis, combining the training-based three-dimensional affine wavelet transform with a three-dimensional entropy coding method based on a deep network further improves the coding performance.

Description

Three-dimensional biomedical image compression method, system, equipment and storage medium

Technical Field

The present invention relates to the field of image compression encoding technology, and in particular, to a method, a system, an apparatus, and a storage medium for compressing a three-dimensional biomedical image.

Background

Three-dimensional biomedical images mainly comprise three-dimensional biological images and three-dimensional medical images. With the development of artificial intelligence, such images have increasingly broad application prospects and demand. Three-dimensional biomedical images are very large, which poses a great challenge for storage and transmission, so three-dimensional image compression is a key technology for their widespread use. The current mainstream compression technology for three-dimensional biomedical images is based on the traditional wavelet transform, the most widely used being the JP3D standard (Extensions for Three-Dimensional Data in JPEG-2000, the three-dimensional extension of the JPEG-2000 standard).

The traditional wavelet transform is a local transform. It inherits and develops the localization idea of the short-time Fourier transform while overcoming shortcomings such as a window size that does not change with frequency; it provides a time-frequency window that varies with frequency and is therefore an ideal tool for time-frequency analysis and processing of signals. Because it supports localized, multi-scale analysis of images, it is often used in image coding. There are two implementations of the conventional wavelet transform, called the first-generation wavelet and the second-generation wavelet. The first-generation wavelet transform decomposes a signal using basis functions; its matrix operations are complex, which is unfavorable for hardware implementation. The second-generation wavelet, known as the wavelet lifting scheme, factors the first-generation wavelet into a lifting structure that is convenient to design by hand and well suited to hardware implementation.

Taking a three-dimensional image as an example, the process by which the second-generation wavelet transform decomposes a signal is shown in fig. 1. First, the three-dimensional image (the input signal) is split along a given direction into odd frames (the odd signal) and even frames (the even signal). The odd frames then pass through a prediction step, which in conventional wavelets is implemented by a linear prediction filter. The filtered result is subtracted from the even frames, and the difference is regarded as containing the detail components of the original signal, i.e. the decomposed high-frequency signal. The high-frequency signal is then filtered by an update filter, and the result is added to the odd frames to obtain the approximation component of the original signal, i.e. the decomposed low-frequency signal. This completes one lifting step. The lifting step is not limited to a single pass: in the next lifting step, the low-frequency signal obtained previously serves as the odd signal and the high-frequency signal as the even signal, and the prediction and update steps are performed as above to obtain new low-frequency and high-frequency signals. Multiple lifting steps yield more accurate high-frequency and low-frequency components. After one or more lifting steps, the decomposition in that direction is complete, giving a decomposed high-frequency signal and a decomposed low-frequency signal. Decomposing these two signals in the other two directions completes one full decomposition; the number N of full decompositions satisfies N ≥ 1.
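The lifting step described above can be sketched in a few lines. This is a minimal illustration, not the filters of any particular standard: each number stands in for a whole frame, the first frame is assumed to count as "odd", and the names `predict`, `update`, `lifting_forward` and `lifting_inverse` as well as the simple averaging filters are assumptions made for the example.

```python
def predict(odd, i):
    """Linear prediction of even frame i from its neighbouring odd frames."""
    nxt = odd[min(i + 1, len(odd) - 1)]  # repeat the last frame at the boundary
    return (odd[i] + nxt) / 2


def update(high, i):
    """Linear update for odd frame i from the neighbouring high-frequency frames."""
    prev = high[max(i - 1, 0)]  # repeat the first frame at the boundary
    return (prev + high[i]) / 4


def lifting_forward(frames):
    """One lifting step: high = even - P(odd), low = odd + U(high)."""
    odd, even = frames[0::2], frames[1::2]
    if len(odd) != len(even):
        raise ValueError("this sketch assumes an even number of frames")
    high = [e - predict(odd, i) for i, e in enumerate(even)]
    low = [o + update(high, i) for i, o in enumerate(odd)]
    return low, high


def lifting_inverse(low, high):
    """Undo the step in reverse order: first the update, then the prediction."""
    odd = [l - update(high, i) for i, l in enumerate(low)]
    even = [h + predict(odd, i) for i, h in enumerate(high)]
    frames = []
    for o, e in zip(odd, even):
        frames += [o, e]
    return frames


frames = [1, 2, 3, 4, 5, 6, 7, 8]  # each number stands in for one frame
low, high = lifting_forward(frames)
restored = lifting_inverse(low, high)
```

The inverse works for any choice of `predict` and `update`, because each lifting step only adds or subtracts a quantity that the decoder can recompute from signals it already has. This structural invertibility is the property that makes it possible to swap in other prediction and update operators.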
One full decomposition of a three-dimensional image yields 8 subbands. Denote the three directions of the image as x, y and z, and suppose the image is first decomposed (once or several times) in the z direction to obtain the low-frequency signal L and the high-frequency signal H. L and H are then each decomposed in the y direction: L into LL and HL, and H into LH and HH. Decomposing these four subbands in the x direction gives the 8 subbands LLL, HLL, LHL, HHL, LLH, HLH, LHH and HHH. LLL is the lowest-frequency subband; it contains the dominant low-frequency content of the original signal and can be used as the input to the next full decomposition. Since each further full decomposition replaces LLL with 8 new subbands, N decompositions produce 7N+1 subbands. The energy of the subbands produced by the wavelet transform is more concentrated than that of the original signal, so the discrete wavelet transform is often used as the transform stage of image coding; the subbands are then quantized and entropy coded to obtain the final bitstream. Common subband entropy coding methods include EZW, SPIHT and EBCOT.
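The subband count 7N+1 can be checked by enumerating the subbands. The sketch below is illustrative only: the function name `subband_names`, the level suffix appended to each name, and the choice to list the coarsest level first are assumptions, not part of the source.

```python
from itertools import product


def subband_names(levels):
    """List the subbands that remain after `levels` full decompositions."""
    if levels < 1:
        raise ValueError("at least one full decomposition is required")
    # Reverse each tuple so that the first letter varies fastest, which
    # reproduces the order LLL, HLL, LHL, HHL, LLH, HLH, LHH, HHH.
    one_level = ["".join(reversed(p)) for p in product("LH", repeat=3)]
    names = []
    for level in range(levels, 0, -1):  # coarsest level first (an assumption)
        for name in one_level:
            if name == "LLL" and level != levels:
                continue  # this LLL was decomposed again, so it no longer exists
            names.append(f"{name}{level}")
    return names


names_two_levels = subband_names(2)  # 8 subbands at level 2 plus 7 at level 1
```

Only the deepest level keeps its LLL subband; every other level contributes its 7 high-frequency subbands, which gives 7N+1 in total.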

Fig. 2 shows the JPEG-2000 encoding scheme, a common wavelet-transform-based coding method and a standard developed by JPEG for image coding. After preprocessing and the discrete wavelet transform (the traditional wavelet transform described above), the original image data yield a series of subband coefficients with more concentrated energy. These coefficients are floating-point numbers, which is unfavorable for saving bit rate. A quantization step converts them to integers, and an entropy coding method then organizes the subband coefficients into a bitstream, giving the compressed image data. At the decoding end, after the transmitted coefficients are received, the original image is reconstructed through entropy decoding, inverse quantization and the inverse discrete wavelet transform; the flow is shown in fig. 3.
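The quantization and inverse quantization steps can be illustrated with a plain uniform scalar quantizer. This is a sketch under assumptions: the function names and the step size are hypothetical, and the quantizer shown is not necessarily the one specified by the JPEG-2000 standard.

```python
def quantize(coeffs, step):
    """Map floating-point subband coefficients to integer indices."""
    # Note: Python's round() rounds exact ties to the nearest even integer.
    return [int(round(c / step)) for c in coeffs]


def dequantize(indices, step):
    """Map integer indices back to approximate coefficient values."""
    return [q * step for q in indices]


step = 0.5                      # hypothetical quantization step
coeffs = [0.9, -2.6, 4.2]       # made-up subband coefficients
indices = quantize(coeffs, step)
approx = dequantize(indices, step)
max_error = max(abs(c - a) for c, a in zip(coeffs, approx))
```

The reconstruction error of each coefficient is bounded by half the step size, which is why quantization makes the scheme lossy while shrinking the set of symbols that the entropy coder has to represent.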

In addition, three-dimensional image coding methods based on a training-based wavelet transform have been proposed. A training-based wavelet transform starts from the texture characteristics of images: a targeted wavelet transform is designed and applied to image coding. The main idea is to replace the prediction and update filters of the traditional wavelet transform with deep networks and to train, on a large number of samples, a wavelet transform adapted to their texture distribution. Fig. 4 shows the flow of the training-based wavelet transform.

During encoding, the deep-network-based wavelet transform decomposes the image into wavelet coefficients, and a subband coding method then produces the compressed bitstream. During decoding, a subband decoding method recovers the wavelet coefficients from the compressed bitstream, and the deep-network-based inverse wavelet transform reconstructs the image.

The image coding method based on the training-based wavelet transform can effectively capture non-directional characteristics, so that the wavelet transform adapts to different textures to a large extent. However, a problem remains: once training is finished, the transform coefficients of the trained wavelet are fixed, and the same set of coefficients is used for images with different textures. The trained wavelet transform therefore cannot adapt to different images, which limits image coding performance.

Disclosure of Invention

The invention aims to provide a three-dimensional biomedical image compression method, a system, equipment and a storage medium, which can improve coding performance.

The aim of the invention is achieved by the following technical scheme:

A three-dimensional biomedical image compression method, comprising:

an encoding stage, in which a training-based three-dimensional affine wavelet transform is used to decompose the input three-dimensional biomedical image, comprising: splitting the input three-dimensional biomedical image along the current direction into odd frames and even frames; inputting the odd frames into a deep-network-based prediction network to obtain a prediction result and a first affine map; scaling the prediction result by the first affine map and subtracting it from the even frames, the difference being the high-frequency signal; inputting the high-frequency signal into a deep-network-based update network to obtain an update result and a second affine map; scaling the update result by the second affine map and adding it to the odd frames, the sum being the low-frequency signal, which completes one decomposition flow; executing the decomposition flow a number of times to complete the decomposition in the current direction; decomposing the low-frequency signal and the high-frequency signal obtained in the current direction in the remaining directions; and performing entropy coding on the decomposition results of all directions;

a decoding stage, in which the three-dimensional biomedical image is reconstructed by a flow inverse to that of the encoding stage.

A three-dimensional biomedical image compression system comprising:

an encoder applied to the encoding stage, in which a training-based three-dimensional affine wavelet transform is used to decompose the input three-dimensional biomedical image, comprising: splitting the input three-dimensional biomedical image along the current direction into odd frames and even frames; inputting the odd frames into a deep-network-based prediction network; scaling the prediction result by an affine map and subtracting it from the even frames; inputting the difference into a deep-network-based update network; scaling the update result by an affine map and adding it to the odd frames, the sum being the low-frequency signal, which completes one decomposition flow; executing the decomposition flow a number of times to complete the decomposition in the current direction; decomposing the low-frequency signal and the high-frequency signal obtained in the current direction in the remaining directions; and performing entropy coding on the decomposition results of all directions;

a decoder applied to the decoding stage, in which the three-dimensional biomedical image is reconstructed by a flow inverse to that of the encoding stage.

A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned methods.

A readable storage medium storing a computer program which, when executed by a processor, implements the method described above.

According to the technical scheme provided by the invention, a training-based three-dimensional affine wavelet transform is designed in which affine maps are introduced. After training is finished, the prediction and update networks are fixed, but the affine maps still change with the input, which is equivalent to adjusting the wavelet basis functions according to the input content and thereby improves the coding performance for three-dimensional biomedical images.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a second generation wavelet transform lifting structure provided in the background of the invention;

FIG. 2 is a flowchart of JPEG-2000 encoding provided in the background of the invention;

FIG. 3 is a flowchart of JPEG-2000 decoding provided in the background of the invention;

FIG. 4 is a flowchart of a training wavelet process provided in the background of the invention;

FIG. 5 is a flow chart of a method for compressing three-dimensional biomedical images according to an embodiment of the present invention;

FIG. 6 is a flowchart of a lifting step of a training-based three-dimensional affine wavelet transform provided by an embodiment of the present invention;

FIG. 7 is a schematic diagram of a prediction network and an update network according to an embodiment of the present invention;

FIG. 8 is a flow chart of context modeling based on a depth network according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a three-dimensional inter-subband context depth network according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a three-dimensional intra-subband context depth network structure according to an embodiment of the present invention;

FIG. 11 is a schematic diagram of a three-dimensional context fusion depth network structure according to an embodiment of the present invention;

FIG. 12 is a schematic diagram of subband coefficient entropy encoding according to an embodiment of the present invention;

FIG. 13 is an inverse transformation flow chart of affine wavelet based on depth network provided by an embodiment of the invention;

FIG. 14 is a diagram showing the beneficial results of a three-dimensional affine wavelet transform provided by an embodiment of the invention;

FIG. 15 is a lossy image encoding flow chart based on a training-based three-dimensional affine wavelet transform provided by an embodiment of the invention;

FIG. 16 is a lossy image decoding flow chart based on a training three-dimensional affine wavelet transform provided by an embodiment of the invention;

FIG. 17 is a flowchart of lossless image encoding based on a training three-dimensional affine wavelet transform provided by an embodiment of the invention;

FIG. 18 is a flowchart of lossless image decoding based on a training three-dimensional affine wavelet transform provided by an embodiment of the invention;

FIG. 19 is a flowchart of an image encoding method combining affine wavelet transformation and parameter sharing strategy according to an embodiment of the present invention;

FIG. 20 is a flowchart of an image decoding method combining affine wavelet transformation and parameter sharing strategy according to an embodiment of the present invention;

FIG. 21 is a flowchart of an image encoding method combining affine wavelet transformation and training three-dimensional entropy encoding according to an embodiment of the present invention;

FIG. 22 is a flowchart of an image decoding method combining affine wavelet transformation and training three-dimensional entropy encoding according to an embodiment of the present invention;

FIG. 23 is a flowchart of an image encoding method combining affine wavelet transformation based on parameter sharing strategy and training three-dimensional entropy encoding according to an embodiment of the present invention;

FIG. 24 is a flowchart of an image decoding method for combining affine wavelet transformation and training three-dimensional codec based on a parameter sharing strategy according to an embodiment of the present invention;

FIG. 25 is a schematic diagram of a three-dimensional biomedical image compression system according to an embodiment of the present invention;

FIG. 26 is a schematic diagram of a processing apparatus according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

The terms that may be used herein will first be described as follows:

the terms "comprises," "comprising," "includes," "including," "has," "having" or other similar referents are to be construed to cover a non-exclusive inclusion. For example: including a particular feature (e.g., a starting material, component, ingredient, carrier, formulation, material, dimension, part, means, mechanism, apparatus, step, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product or article of manufacture, etc.), should be construed as including not only a particular feature but also other features known in the art that are not explicitly recited.

The three-dimensional biomedical image compression scheme provided by the present invention is described in detail below. What is not described in detail in the embodiments of the present invention belongs to the prior art known to those skilled in the art. The specific conditions are not noted in the examples of the present invention and are carried out according to the conditions conventional in the art or suggested by the manufacturer.

Example 1

As shown in fig. 5, a three-dimensional biomedical image compression method mainly comprises an encoding stage and a decoding stage.

For ease of understanding, each part of the encoding and decoding stages is described in detail below.

1. Encoding stage.

1. Training-based three-dimensional affine wavelet transform.

1) Lifting step of the training-based three-dimensional affine wavelet transform.

Fig. 6 illustrates a lifting step of the training-based three-dimensional affine wavelet transform. The deep-network-based affine wavelet transform uses deep networks to implement the prediction and update steps. In addition to learning the prediction and update filters, the prediction and update networks each output an affine map, which is applied to the prediction result or the update result to scale the output magnitude at each spatial position. After training, the prediction and update networks are fixed, but the affine maps still change with the input, which is equivalent to adjusting the wavelet basis functions according to the input content.

In the embodiment of the invention, training follows the objective function of image compression, R + λD, where R denotes the bit rate consumed by encoding, D denotes the L2 loss between the reconstructed image and the input image, and λ is a weighting coefficient. During training, the trainable parameters of the prediction network and the update network are optimized by minimizing this objective function.
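As a small worked example of the objective R + λD, the sketch below evaluates it for a toy signal. The function name `rd_loss` is hypothetical, and treating the L2 loss as a mean squared error over samples is an assumption; the source does not state the normalization.

```python
def rd_loss(rate_bits, original, reconstructed, lam):
    """Rate-distortion objective R + lambda * D for one training sample."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    # Assumption: D is the mean squared error between input and reconstruction.
    distortion = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return rate_bits + lam * distortion


# 10 bits spent, one of two samples is off by 2, lambda = 0.5:
loss = rd_loss(10.0, [1.0, 2.0], [1.0, 4.0], 0.5)  # 10 + 0.5 * (0 + 4) / 2
```

A larger λ penalizes distortion more heavily and so pushes training toward higher quality at a higher bit rate; a smaller λ does the opposite.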

In the embodiment of the invention, the values of an affine map lie between 0 and 1, and the affine map can be implemented in either of the following ways: 1) The first affine map is a tensor of the same size as the prediction result, and the second affine map is a tensor of the same size as the update result. Each is computed from the corresponding prediction result or update result through a sigmoid function; because the outputs of the prediction network and the update network differ, the affine maps applied to them differ as well. The two affine maps each provide a separate scaling value for every spatial position of the output of the prediction network and of the update network, with an effect similar to a spatial attention mechanism. 2) The affine map is a scalar that provides the same scaling value for every spatial position of the prediction result or the update result, with an effect similar to a channel attention mechanism. The scalar is set as a learnable variable during training and is applied to the outputs of the prediction network and the update network, so that every position of the output is scaled identically.
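The affine lifting step with per-position affine maps (the first of the two modes) can be sketched as follows. The functions `toy_predict_net` and `toy_update_net` are hypothetical stand-ins for the trained networks, using fixed averaging filters purely for illustration, and each list element stands in for one spatial position.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # adequate for the small values used here


def toy_predict_net(odd):
    """Hypothetical stand-in for the prediction network: result and affine map."""
    n = len(odd)
    pred = [0.5 * (odd[i] + odd[min(i + 1, n - 1)]) for i in range(n)]
    return pred, [sigmoid(p) for p in pred]  # affine values lie in (0, 1)


def toy_update_net(high):
    """Hypothetical stand-in for the update network: result and affine map."""
    upd = [0.25 * (high[max(i - 1, 0)] + high[i]) for i in range(len(high))]
    return upd, [sigmoid(u) for u in upd]


def affine_lifting_forward(odd, even):
    pred, a1 = toy_predict_net(odd)
    high = [e - a * p for e, a, p in zip(even, a1, pred)]   # scaled prediction
    upd, a2 = toy_update_net(high)
    low = [o + a * u for o, a, u in zip(odd, a2, upd)]      # scaled update
    return low, high


def affine_lifting_inverse(low, high):
    upd, a2 = toy_update_net(high)        # recomputed from the received high band
    odd = [l - a * u for l, a, u in zip(low, a2, upd)]
    pred, a1 = toy_predict_net(odd)       # recomputed from the recovered odd frames
    even = [h + a * p for h, a, p in zip(high, a1, pred)]
    return odd, even


odd, even = [1.0, 3.0, 5.0], [2.0, 4.0, 6.0]
low, high = affine_lifting_forward(odd, even)
odd_rec, even_rec = affine_lifting_inverse(low, high)
```

Because the affine maps are computed from signals the decoder also has (the high-frequency signal, then the recovered odd frames), they need not be transmitted. In floating point the round trip is only accurate up to rounding error.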

In the embodiment of the present invention, the prediction network and the update network adopt the same network structure, shown in fig. 7. It consists mainly of three-dimensional convolution layers arranged in sequence: the output of the first three-dimensional convolution layer is connected to the output of the fifth, the output of the second is connected to the output of the fourth, and in the third and fourth three-dimensional convolution layers a tanh activation function is applied before the convolution operation. In fig. 7, 3×3×3×1 means that the three-dimensional convolution layer uses a 3×3×3 convolution kernel to generate 1 feature map; similarly, 3×3×3×16 means that the layer uses a 3×3×3 convolution kernel to generate 16 feature maps. The kernel size and the number of feature maps given here are only examples and are not limiting; in practice they may be adjusted according to circumstances or experience.

Based on the above principle, the embodiment of the invention supports both lossy coding and lossless coding.

a) In lossy coding, the first affine map is multiplied by the prediction result and the second affine map is multiplied by the update result, scaling the output magnitude at each spatial position of the prediction result and the update result; fig. 6 shows an example of lossy coding.

b) Lossless coding.

Although the forward and inverse affine wavelet transforms are fully invertible in the algorithmic flow, the transform involves multiplication (and, in the decoding stage, division), so exact invertibility cannot be achieved in a practical implementation because of limited numerical precision. To solve this problem, the invention designs an efficient lossless form of the affine wavelet transform.

Shifting left or right by whole bits causes no loss of precision in practical operation. Therefore, the value at each position of the first and second affine maps is quantized so that the multiplication can be implemented as a right-shift operation, and the output magnitude at each spatial position of the prediction result and the update result is scaled by right shifting; in this simple way the loss of precision is avoided. For example, each affine-map value is rounded to its nearest integer power of 2, so that 0.5 is quantized to 2^(-1).
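The shift-based lossless variant can be sketched with integer arithmetic. Several choices here are assumptions made for illustration: "nearest power of 2" is taken in the log domain, the shift is capped at `MAX_SHIFT`, and the toy integer "networks" and their affine rules are hypothetical stand-ins for trained networks whose outputs have been rounded to integers.

```python
import math

MAX_SHIFT = 15  # assumed cap so that tiny affine values give a finite shift


def affine_to_shift(a):
    """Quantize an affine value in (0, 1] to 2**(-s) and return the shift s."""
    if a <= 0.0:
        return MAX_SHIFT
    return max(0, min(MAX_SHIFT, round(-math.log2(a))))


def toy_predict(odd):
    """Hypothetical integer predictor; the affine rule is an arbitrary toy."""
    n = len(odd)
    pred = [odd[i] + odd[min(i + 1, n - 1)] for i in range(n)]
    return pred, [0.5 if p % 2 == 0 else 0.3 for p in pred]


def toy_update(high):
    """Hypothetical integer updater with a constant affine value."""
    upd = [high[max(i - 1, 0)] + high[i] for i in range(len(high))]
    return upd, [0.25] * len(upd)


def lossless_forward(odd, even):
    pred, a1 = toy_predict(odd)
    high = [e - (p >> affine_to_shift(a)) for e, p, a in zip(even, pred, a1)]
    upd, a2 = toy_update(high)
    low = [o + (u >> affine_to_shift(a)) for o, u, a in zip(odd, upd, a2)]
    return low, high


def lossless_inverse(low, high):
    upd, a2 = toy_update(high)
    odd = [l - (u >> affine_to_shift(a)) for l, u, a in zip(low, a2 and upd, a2)]
    pred, a1 = toy_predict(odd)
    even = [h + (p >> affine_to_shift(a)) for h, p, a in zip(high, pred, a1)]
    return odd, even


odd, even = [10, 13, 7, -4], [11, 9, 0, 3]
low, high = lossless_forward(odd, even)
odd_rec, even_rec = lossless_inverse(low, high)
```

Since every operation is an integer addition, subtraction or shift, the encoder and decoder compute bit-identical intermediate values, and the round trip is exact rather than approximate.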

It should be noted that, for each direction, the number of times the decomposition flow is executed may be set by those skilled in the art according to the actual situation or experience. As in the prior art, each subsequent decomposition flow takes the low-frequency and high-frequency signals from the previous flow as its inputs (i.e. no further splitting is needed). After the decomposition in the current direction is finished, all the resulting low-frequency and high-frequency signals are decomposed in the other directions (with the same number of decomposition flows as in the first direction); once all three directions have been decomposed, one complete decomposition is finished. Likewise, the number of complete decompositions may be one or more.

2) A parameter sharing strategy based on a trained three-dimensional wavelet transform.

The three-dimensional image coding method based on the training-based wavelet transform mentioned in the Background shares the parameters of the wavelet transform across the three dimensions of the image. This is not reasonable for three-dimensional images whose properties differ between directions, so sharing the parameters in all directions impairs the performance of three-dimensional image coding based on the training-based wavelet transform.

Limited by the capability of the imaging equipment, three-dimensional biomedical images have different characteristics in the three directions, and can be classified as isotropic or anisotropic according to their axial resolution. An isotropic image has the same axial resolution as its row and column resolution, so its properties are consistent in all three dimensions. An anisotropic image, in contrast, has an axial resolution different from (typically lower than) its row and column resolution, so its properties along the axial direction differ from those in the other two directions.

In view of these characteristics, the invention provides a parameter sharing strategy for the training-based three-dimensional wavelet transform. For an isotropic image, the prediction and update networks of the training-based three-dimensional affine wavelet transform use the same structure (shown in fig. 7) and share all parameters across the three directions, which reduces network complexity and saves parameters. For an anisotropic image, the prediction and update networks likewise use the same structure (shown in fig. 7), but to better adapt to the different properties along the axial direction, a separate set of parameters is used for the axial direction, while the other two directions share one set of parameters.
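The parameter sharing strategy amounts to a small lookup from direction to parameter set. In the sketch below, the function names, the labels "shared" and "axial", and the assumption that z is the axial direction are all hypothetical.

```python
def is_isotropic(row_res, col_res, axial_res):
    """An image is treated as isotropic when all three resolutions are equal."""
    return row_res == col_res == axial_res


def parameter_sets(isotropic, axes=("x", "y", "z"), axial="z"):
    """Map each decomposition direction to the parameter set it uses."""
    if isotropic:
        return {axis: "shared" for axis in axes}
    # Anisotropic: the axial direction gets its own set, the others share one.
    return {axis: ("axial" if axis == axial else "shared") for axis in axes}


iso = parameter_sets(is_isotropic(1.0, 1.0, 1.0))
aniso = parameter_sets(is_isotropic(1.0, 1.0, 5.0))
```

Under this scheme an isotropic image needs one set of prediction and update parameters and an anisotropic image needs two, rather than one set per direction.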

2. Three-dimensional entropy coding.

After the processing of part 1 above, the high-frequency and low-frequency signals obtained by decomposition in all directions are referred to as three-dimensional subbands, and all subbands are entropy coded one after another in a set order. In lossy coding, the subbands are quantized first and then entropy coded; in lossless coding, they are entropy coded directly.

Currently, common subband entropy encoding methods include EZW, EBCOT, and the like.

EZW, embedded zerotree wavelet coding (Embedded Zerotree Wavelets Encoding), is built on a mathematical structure of the image wavelet transform, the zerotree. In the EZW algorithm, the embedded bitstream is realized by combining the zerotree structure with successive approximation quantization. The purpose of the zerotree structure is to represent efficiently the positions of the non-zero values (the significance map) in the wavelet coefficient matrix.
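The idea of successive approximation quantization and significance maps can be illustrated on a flat list of integer coefficients. This sketch shows only the threshold halving and the cumulative significance map at each threshold; it does not build or code zerotrees, and the function name is hypothetical.

```python
def significance_passes(coeffs, num_passes):
    """Return (threshold, significance map) pairs for successive thresholds."""
    max_abs = max(abs(c) for c in coeffs)
    if max_abs == 0:
        return []
    threshold = 1 << (max_abs.bit_length() - 1)  # largest power of 2 <= max_abs
    passes = []
    for _ in range(num_passes):
        if threshold < 1:
            break
        sig_map = [1 if abs(c) >= threshold else 0 for c in coeffs]
        passes.append((threshold, sig_map))
        threshold //= 2  # refine: halve the threshold for the next pass
    return passes


coeffs = [34, -20, 5, 0, -3, 9]  # made-up integer wavelet coefficients
passes = significance_passes(coeffs, 3)
```

Early passes mark only the few large coefficients as significant, so truncating the stream after any pass still yields a coarse approximation; this is what makes the code stream embedded.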

EBCOT, embedded block coding with optimized truncation, is a coding algorithm published in 1999. JPEG2000 uses EBCOT for the quantization coding of wavelet coefficients; it is the core of the JPEG2000 standard and is an embedded bit-plane coding method for wavelet coefficients. EBCOT coding is divided into two parts. Tier 1 divides each subband into independent code blocks, performs embedded coding scans on each code block independently, codes the bit planes of each code block, and finally applies MQ arithmetic coding to the scan results to obtain an embedded bitstream. Tier 2 combines the embedded bitstreams of the code blocks according to the required output bit rate, performing optimized truncation, ordering, packetization and similar processing on the bitstreams of all code blocks to obtain the JPEG2000 bitstream.

Although three-dimensional versions of these common subband entropy coding methods can be used for three-dimensional image coding, they are not data driven, i.e. they cannot adjust to the texture of the three-dimensional image, which limits their coding performance.

Therefore, the embodiment of the invention provides a three-dimensional entropy coding method based on deep networks. For the three-dimensional subband currently to be coded: a three-dimensional context coarse extraction network extracts a context from the already coded three-dimensional subbands; this context is input into a three-dimensional inter-subband context extraction network to obtain the inter-subband context; the current subband is input into a three-dimensional intra-subband context extraction network to obtain the intra-subband context; the inter-subband context and the intra-subband context are input into a three-dimensional context fusion deep network to obtain the entropy coding parameters of the current subband; and the current subband is coded using these entropy coding parameters. The method is described in detail below.
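To make the role of the entropy coding parameters concrete, the sketch below turns a pair of parameters into a probability for an integer coefficient and an ideal code length. The source does not specify the distribution family, so the discretized Gaussian with a mean and a scale is purely an assumption, as are all function names.

```python
import math


def gaussian_cdf(x, mean, scale):
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def coefficient_probability(q, mean, scale):
    """Probability mass of integer coefficient q under a discretized Gaussian."""
    p = gaussian_cdf(q + 0.5, mean, scale) - gaussian_cdf(q - 0.5, mean, scale)
    return max(p, 1e-9)  # floor keeps the code length finite for rare values


def ideal_bits(q, mean, scale):
    """Code length an ideal arithmetic coder would approach for q."""
    return -math.log2(coefficient_probability(q, mean, scale))


# Hypothetical parameters for one coefficient: mean 0, scale 1.
bits_likely = ideal_bits(0, 0.0, 1.0)    # value near the predicted mean
bits_unlikely = ideal_bits(3, 0.0, 1.0)  # value far from the predicted mean
```

The better the context networks predict each coefficient, the more probability mass falls on the value that actually occurs and the fewer bits the arithmetic coder spends on it.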

As described earlier, 7n+1 three-dimensional subbands can be obtained by the training-based three-dimensional affine wavelet transform, taking N equal to 2 as an example, the first complete decomposition yields 8 subbands, where the lowest-frequency subband LLL can be decomposed again to yield 8 subbands, and the two complete decompositions yield 15 subbands. When 15 three-dimensional subbands are encoded, each subband is encoded in turn in the order { LLL, HLL, LHL, HHL, LLH, HLH, LHH, HHH }. The coding is characterized by removing the correlation of two aspects: (1) inter-subband correlation; (2) subband relativity. Each subband coefficient is encoded in turn from left to right, top to bottom. For each coefficient, firstly carrying out context modeling by using a depth network to give probability distribution of the coefficient; then entropy encoding is performed using an arithmetic encoder according to the probability distribution. As shown in fig. 8, the main flow of the context modeling is shown, mainly including:

1) The three-dimensional context coarse extraction network extracts the coarse context C_t.

When the current three-dimensional subband S_t is encoded, the already-encoded three-dimensional subbands contain context for the current subband, and this context is extracted with a deep neural network. First, the encoded three-dimensional subbands are reconstructed by inverse transformation to the same size as the current three-dimensional subband and combined along the channel dimension. They then pass through the three-dimensional context coarse extraction network, which consists of one three-dimensional convolution layer using a 3×3 convolution kernel, and the network outputs the coarsely extracted context C_t.

2) Three-dimensional subband context extraction.

a) The coarse context C_t is input into the three-dimensional inter-subband context extraction network to obtain the inter-subband context C_b_t.

As shown in fig. 9, the three-dimensional inter-subband context extraction network mainly includes a first convolution unit and a second convolution unit arranged in sequence. Each convolution unit comprises two three-dimensional convolution layers arranged in sequence with a ReLU function between them, and the input of each convolution unit is connected to its output.

b) The current three-dimensional subband to be encoded is input into the three-dimensional intra-subband context extraction network to obtain the intra-subband context C_w_t.

As shown in fig. 10, the three-dimensional intra-subband context extraction network mainly includes a masked three-dimensional convolution layer, a third convolution unit, and a fourth convolution unit arranged in sequence. Each convolution unit comprises two masked three-dimensional convolution layers arranged in sequence with a ReLU function between them. The input of the third convolution unit comprises the output of the preceding masked three-dimensional convolution layer and the input of the three-dimensional intra-subband context extraction network, and the input of the third convolution unit is also connected to its output. The input of the fourth convolution unit is connected to its output.

3) Through the above steps, the contexts of the three-dimensional subband S_t to be encoded are obtained. These contexts are fused by the three-dimensional context fusion depth network, which outputs the parameters of the cumulative distribution function (namely, the entropy coding parameters) of the three-dimensional subband coefficients to be encoded.

As shown in fig. 11, the three-dimensional context fusion depth network mainly includes masked three-dimensional convolution layers and three-dimensional convolution layers arranged in sequence, with a ReLU function between adjacent masked three-dimensional convolution layers and between adjacent three-dimensional convolution layers.

Each three-dimensional subband to be encoded is treated as a whole, and its coefficients are encoded sequentially from front to back, left to right, and top to bottom. For each coefficient, context modeling is first performed with the network shown in fig. 10 to obtain the probability distribution of the three-dimensional subband coefficient; entropy encoding is then performed with an arithmetic encoder according to that probability distribution. The context modeling uses a three-dimensional PixelCNN (Pixel Convolutional Neural Network, a pixel-by-pixel convolutional neural network), which takes the current three-dimensional subband as input and outputs the parameters of the cumulative distribution function of the coefficient to be encoded. Taking fig. 12 as an example, the point at the center represents the coefficient to be encoded, the light-colored portion represents coefficients that have already been encoded, and the dark-colored portion represents coefficients that have not yet been encoded. The mask makes encoded coefficients available and unencoded coefficients unavailable, so the decoder can rebuild the same context and the decoding logic is correct.
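
The availability rule illustrated by fig. 12 can be sketched as a causal mask over a convolution kernel. The following is a minimal illustration, not the patent's network: the function name `causal_mask_3d`, the 3×3×3 kernel size, and the exact scan order (depth index slowest, then row, then column) are assumptions made for this example.

```python
def causal_mask_3d(size=3, include_center=False):
    """Return a size^3 nested list: 1 = already coded (usable), 0 = not yet coded.

    Assumed scan order: depth (front to back) is the slowest index,
    then rows, then columns.
    """
    c = size // 2
    center_index = (c * size + c) * size + c  # raster index of the current coefficient
    mask = [[[0] * size for _ in range(size)] for _ in range(size)]
    for d in range(size):
        for h in range(size):
            for w in range(size):
                index = (d * size + h) * size + w
                if index < center_index or (include_center and index == center_index):
                    mask[d][h][w] = 1
    return mask


if __name__ == "__main__":
    for plane in causal_mask_3d(3):
        print(plane)
```

A masked three-dimensional convolution would multiply its kernel weights by such a mask, so that positions after the current coefficient in scan order contribute nothing to the predicted distribution.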

2. Decoding stage.

The decoding stage follows the reverse of the encoding flow.

First, the training-based three-dimensional entropy decoding method is used: the coefficients of each subband are decoded in turn from left to right and top to bottom. For each three-dimensional subband coefficient, context modeling is performed with the depth network to obtain the probability distribution of the coefficient; an arithmetic decoder is then invoked according to that probability distribution to perform entropy decoding and obtain the reconstructed three-dimensional subband coefficient.
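
Why the decoder can reproduce the encoder's probabilities is easiest to see in a toy round trip. The sketch below is an assumption-laden illustration, not the patent's codec: `causal_probs` is a hypothetical stand-in for the context network (it only counts previously coded symbols), the three-symbol alphabet is invented, and exact fractions replace a practical finite-precision arithmetic coder.

```python
from fractions import Fraction

ALPHABET = [-1, 0, 1]  # hypothetical tiny coefficient alphabet


def causal_probs(history):
    """Stand-in for the context model: smoothed counts of already-coded symbols."""
    total = len(history) + len(ALPHABET)
    return [Fraction(history.count(s) + 1, total) for s in ALPHABET]


def encode(symbols):
    """Narrow the interval [low, low + width) once per symbol."""
    low, width = Fraction(0), Fraction(1)
    for i, s in enumerate(symbols):
        probs = causal_probs(symbols[:i])  # uses only symbols coded so far
        k = ALPHABET.index(s)
        low += width * sum(probs[:k], Fraction(0))
        width *= probs[k]
    return low + width / 2  # any number inside the final interval works


def decode(code, count):
    """Rebuild the symbols; the context comes from already-decoded symbols."""
    out = []
    low, width = Fraction(0), Fraction(1)
    for _ in range(count):
        probs = causal_probs(out)
        cum = Fraction(0)
        for k, p in enumerate(probs):
            if low + width * (cum + p) > code:  # code falls in symbol k's slot
                out.append(ALPHABET[k])
                low += width * cum
                width *= p
                break
            cum += p
    return out


if __name__ == "__main__":
    data = [0, 0, 1, -1, 0, 1, 0, 0]
    print(decode(encode(data), len(data)) == data)
```

Because both sides derive each probability table only from symbols that are already known at that point, the decoder's tables match the encoder's step by step, which is the property the causal context is meant to guarantee.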

Then the training-based three-dimensional affine wavelet inverse transformation, which is the inverse process of the training-based three-dimensional affine wavelet transformation, is performed to reconstruct the three-dimensional biomedical image, as shown in fig. 13. The networks of the inverse transformation typically share parameters with those of the forward transformation (i.e., the training-based three-dimensional affine wavelet transformation).

Similarly, corresponding to the lossy and lossless encoding modes described for the encoding stage, the decoding stage adopts matching lossy and lossless decoding modes. Specifically, in lossy decoding, the prediction result is divided by the first affine map and the update result is divided by the second affine map, which scales the output at each spatial position of the prediction result and the update result; fig. 13 shows the lossy decoding scheme of the training-based three-dimensional affine wavelet inverse transformation. In lossless decoding, the value at each position of the first affine map and the second affine map is quantized so that the division is realized by a left shift operation, which scales the output at each spatial position of the prediction result and the update result. In the lossy decoding mode, inverse quantization is performed first, followed by the training-based three-dimensional affine wavelet inverse transformation; in the lossless decoding mode, the training-based three-dimensional affine wavelet inverse transformation is applied directly to the entropy decoding result.
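
The shift-based scaling can be illustrated with integer arithmetic. This is only a sketch under a stated assumption: the text says the affine map values are quantized so that multiplication and division become shifts, and the example assumes this means rounding each value to a power of two, 2^(-k). The function names are hypothetical.

```python
import math


def quantize_to_shift(scale):
    """Map a scale in (0, 1] to the shift k whose 2**-k is closest in the log domain.

    Assumption: affine map values (for example sigmoid outputs) lie in (0, 1].
    """
    return max(0, round(-math.log2(scale)))


def scale_by_right_shift(value, k):
    """Encoder-side scaling: multiply an integer by 2**-k (rounds toward -infinity)."""
    return value >> k


def scale_by_left_shift(value, k):
    """Decoder-side scaling: divide an integer by 2**-k, i.e. multiply by 2**k."""
    return value << k


if __name__ == "__main__":
    k = quantize_to_shift(0.25)
    print(k, scale_by_right_shift(40, k), scale_by_left_shift(10, k))
```

A right shift discards low-order bits, so the left shift restores the original value only when those bits were zero; how the full lifting structure keeps the transform lossless is outside the scope of this sketch.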

The scheme of the invention mainly comprises three improvements:

1. A training-based three-dimensional affine wavelet transformation (inverse transformation) is proposed, covering lossy encoding and lossless encoding (lossy decoding and lossless decoding).

2. A parameter sharing strategy for the training-based three-dimensional affine wavelet transformation (inverse transformation), designed for the characteristics of three-dimensional biomedical images.

3. A depth network-based three-dimensional entropy encoding (entropy decoding) method designed for three-dimensional biomedical images.

The basic scheme of the 1st aspect improves coding performance over existing schemes; on this basis, coding performance can be further improved by combining it with the 2nd aspect, the 3rd aspect, or both the 2nd and 3rd aspects.

Fig. 14 shows the benefit obtained by the training-based three-dimensional affine wavelet transform alone (i.e., using only the improvement of the 1st aspect). The training-based three-dimensional affine wavelet transform is applicable to various data sets; here, on a three-dimensional image data set, namely the electron microscope image FAFB data set, the gain of the affine wavelet transform (labelled affine wavelet in the figure) is shown relative to the conventional 9/7 wavelet transform (labelled 9/7 wavelet in the figure) and to the trained wavelet transform that only replaces the update and prediction networks (labelled additive wavelet in the figure). The three curves from top to bottom in fig. 14 correspond to the affine wavelet, the additive wavelet, and the 9/7 wavelet, in that order.

Example schemes based on the above three improvements are given below.

Example one, a training-based three-dimensional affine wavelet transform.

This section introduces a scheme that performs three-dimensional biomedical image compression with the training-based three-dimensional affine wavelet transform alone; that is, the subsequent entropy coding uses an existing method.

1. Three-dimensional affine wavelet transform for lossy image coding.

1) Encoding stage.

As shown in fig. 15, the input is a three-dimensional biomedical image to be encoded, the output is a compressed code stream, and the compressed code stream can be sent to a decoding end for decoding operation. The main steps are described as follows:

Step 1: Apply the training-based three-dimensional affine wavelet transform (in lossy form) to the input three-dimensional biomedical image to be encoded. The affine wavelet transform is performed N times to obtain the wavelet coefficients, which consist of 7N+1 subbands. The value of N is set in advance; in this example, N is set to 4.

Step 2: Quantize and entropy-code the wavelet coefficients using a subband coding method to obtain the compressed code stream.

The subband coding method for wavelet coefficients comprises two steps of quantization and entropy coding. Common subband coding methods include three-dimensional EZW, SPIHT, EBCOT, etc., which can be selected at this step in conjunction with specific requirements.
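
The subband count produced by Step 1 can be checked with a short enumeration. This is a minimal sketch: the function name `subband_names`, the level suffix on each label, and the coarsest-first listing are assumptions for illustration; only the 7N+1 count and the label order { LLL, HLL, LHL, HHL, LLH, HLH, LHH, HHH } come from the description.

```python
from itertools import product


def subband_names(levels):
    """List subband labels after `levels` complete 3D decompositions."""
    # Labels in the order LLL, HLL, LHL, HHL, LLH, HLH, LHH, HHH.
    labels = ["".join(reversed(p)) for p in product("LH", repeat=3)]
    high = labels[1:]  # the seven subbands that are not decomposed further
    names = ["LLL%d" % levels]  # only the deepest LLL survives as a subband
    for level in range(levels, 0, -1):
        names.extend("%s%d" % (h, level) for h in high)
    return names


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, len(subband_names(n)))
```

Each decomposition replaces one LLL subband with eight new subbands, a net gain of seven, which is where the 7N+1 total comes from (15 for N = 2, 29 for N = 4).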

2) Decoding stage.

Corresponding to the encoding stage scheme architecture and encoding operations, as shown in fig. 16, the input is a received compressed code stream and the output is a reconstructed three-dimensional biomedical image. The main steps are described as follows:

Step 1: Perform entropy decoding and inverse quantization on the compressed code stream using a subband decoding method to obtain the reconstructed wavelet coefficients. The subband decoding method matches the subband encoding method.

Step 2: Apply the training-based three-dimensional affine wavelet inverse transformation (in lossy form) to the reconstructed wavelet coefficients to obtain the reconstructed three-dimensional biomedical image.
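
The quantization and inverse quantization steps of the lossy path can be sketched as follows. The description does not specify the quantizer, so a uniform scalar quantizer with a hypothetical step size is assumed here purely for illustration.

```python
def quantize(coeffs, step):
    """Map each wavelet coefficient to the index of its nearest multiple of `step`."""
    return [int(round(c / step)) for c in coeffs]


def dequantize(indices, step):
    """Inverse quantization: map each index back to a reconstructed coefficient."""
    return [q * step for q in indices]


if __name__ == "__main__":
    coeffs = [0.3, -1.26, 4.9, 0.0]  # made-up coefficient values
    step = 0.5                       # assumed step size
    indices = quantize(coeffs, step)
    print(indices, dequantize(indices, step))
```

With this quantizer the reconstruction error of each coefficient is at most half the step size, which is the loss that the lossless path avoids by skipping quantization altogether.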

2. Three-dimensional affine wavelet transform for lossless image coding.

1) Encoding stage.

As shown in fig. 17, the input is a three-dimensional biomedical image to be encoded, the output is a compressed code stream, and the compressed code stream can be sent to a decoding end for decoding operation. The main steps are described as follows:

Step 1: Apply the training-based three-dimensional affine wavelet transform (in lossless form) to the input three-dimensional biomedical image to be encoded. The affine wavelet transform is performed N times to obtain the wavelet coefficients, which consist of 7N+1 subbands. The value of N is set in advance; in this example, N is set to 4.

Step 2: Entropy-code the wavelet coefficients using a subband coding method to obtain the compressed code stream.

Entropy encoding is performed for subbands of the wavelet coefficients. Common subband coding methods include three-dimensional EZW, SPIHT, EBCOT, etc., which can be selected at this step in conjunction with specific requirements.

2) Decoding stage.

Corresponding to the encoding stage scheme architecture and encoding operations, as shown in fig. 18, the input is a received compressed code stream and the output is a reconstructed three-dimensional biomedical image. The main steps are described as follows:

Step 1: Perform entropy decoding on the compressed code stream using a subband decoding method to obtain the reconstructed wavelet coefficients. The subband decoding method matches the subband encoding method.

Step 2: Apply the training-based three-dimensional affine wavelet inverse transformation (in lossless form) to the reconstructed wavelet coefficients to obtain the reconstructed three-dimensional biomedical image.
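
Why a lossless path needs no quantization can be seen from a classical integer lifting step. The sketch below deliberately swaps in a fixed one-dimensional Haar-style predict and update pair for the patent's trained prediction and update networks and omits the affine maps; it only illustrates that integer lifting is exactly invertible.

```python
def forward(signal):
    """One integer lifting step on an even-length list of integers."""
    even, odd = signal[0::2], signal[1::2]
    high = [o - e for e, o in zip(even, odd)]         # predict odd samples from even ones
    low = [e + (h >> 1) for e, h in zip(even, high)]  # update even samples with floor(h / 2)
    return low, high


def inverse(low, high):
    """Undo the lifting step by replaying the same operations in reverse order."""
    even = [l - (h >> 1) for l, h in zip(low, high)]
    odd = [h + e for e, h in zip(even, high)]
    signal = [0] * (2 * len(even))
    signal[0::2], signal[1::2] = even, odd
    return signal


if __name__ == "__main__":
    x = [12, 15, 7, 7, 200, 198, 0, 255]
    print(inverse(*forward(x)) == x)
```

The inverse subtracts exactly the integer that the forward step added, so rounding inside the predict and update functions never causes a reconstruction error.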

Example two, an image coding method combining affine wavelet transformation and the parameter sharing strategy.

This section introduces a scheme that performs three-dimensional biomedical image compression by combining the training-based three-dimensional affine wavelet transformation with the parameter sharing strategy, that is, a compression scheme combining the foregoing 1st and 2nd aspects. The combination is hereinafter referred to as the affine wavelet transformation (inverse transformation) based on the parameter sharing strategy, and the subsequent entropy coding uses an existing method.

1. Encoding stage.

As shown in fig. 19, the input is a three-dimensional biomedical image to be encoded, the output is a compressed code stream, and the compressed code stream can be sent to a decoding end for decoding operation. The main steps are described as follows:

Step 1: Apply the affine wavelet transformation based on the parameter sharing strategy to the input image to be encoded. The affine wavelet transform is performed N times to obtain the wavelet coefficients, which consist of 7N+1 subbands. The value of N is set in advance; in this example, N is set to 4.

Step 2: Quantize and entropy-code the wavelet coefficients using a subband coding method to obtain the compressed code stream.

2. Decoding stage.

Corresponding to the encoding stage scheme architecture and encoding operations, as shown in fig. 20, the input is a received compressed code stream and the output is a reconstructed three-dimensional biomedical image. The main steps are described as follows:

Step 1: Perform entropy decoding and inverse quantization on the compressed code stream using a subband decoding method to obtain the reconstructed wavelet coefficients.

Step 2: Apply the affine wavelet inverse transformation based on the parameter sharing strategy to the reconstructed wavelet coefficients to obtain the reconstructed three-dimensional biomedical image.

In this section, for lossless coding, the quantization in the encoding stage and the inverse quantization in the decoding stage are omitted; the other processes are the same and are not described again.
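
The parameter sharing strategy (all directions share one parameter set for isotropic images, while the axial direction gets its own set for anisotropic images) can be expressed as a small lookup. This is a hedged sketch: the function name, the direction labels "x", "y", "z", the choice of "z" as the axial direction, and the parameter-set tags are all hypothetical.

```python
def assign_parameter_sets(isotropic, axial="z", directions=("x", "y", "z")):
    """Return which parameter set each transform direction uses.

    "shared" and "axial" are placeholder tags for sets of network weights.
    """
    if isotropic:
        # Isotropic images: every direction reuses the same parameters.
        return {d: "shared" for d in directions}
    # Anisotropic images: the axial direction has its own parameters.
    return {d: ("axial" if d == axial else "shared") for d in directions}


if __name__ == "__main__":
    print(assign_parameter_sets(isotropic=True))
    print(assign_parameter_sets(isotropic=False))
```

In either case the prediction and update networks keep the same structure in every direction; only the number of distinct weight sets changes.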

Example three, an image coding method combining affine wavelet transformation and depth network-based three-dimensional entropy coding.

This section introduces a scheme that performs three-dimensional biomedical image compression by combining the training-based three-dimensional affine wavelet transform with the three-dimensional entropy coding method based on the depth network, that is, a compression scheme combining the foregoing 1st and 3rd aspects.

1. Encoding stage.

As shown in fig. 21, the input is a three-dimensional biomedical image to be encoded, the output is a compressed code stream, and the compressed code stream can be sent to a decoding end for decoding operation. The main steps are described as follows:

Step 1: Apply the training-based affine wavelet transformation to the input image to be encoded. The affine wavelet transform is performed N times to obtain the wavelet coefficients, which consist of 7N+1 subbands. The value of N is set in advance; in this example, N is set to 4.

Step 2: After the wavelet coefficients are quantized, perform entropy coding with the three-dimensional entropy coding method based on the depth network to obtain the compressed code stream.

2. Decoding stage.

Corresponding to the encoding stage scheme architecture and encoding operations, as shown in fig. 22, the input is a received compressed code stream and the output is a reconstructed three-dimensional biomedical image. The main steps are described as follows:

Step 1: Perform entropy decoding with the three-dimensional entropy decoding method based on the depth network, followed by inverse quantization, to obtain the reconstructed wavelet coefficients.

Step 2: Apply the training-based affine wavelet inverse transformation to the reconstructed wavelet coefficients to obtain the reconstructed three-dimensional biomedical image.

Example four, an image coding method combining affine wavelet transformation based on parameter sharing strategy and three-dimensional entropy coding based on depth network.

This section introduces a scheme that performs three-dimensional biomedical image compression by combining the training-based three-dimensional affine wavelet transformation, the parameter sharing strategy, and the three-dimensional entropy coding method based on the depth network, that is, a compression scheme combining the foregoing 1st, 2nd, and 3rd aspects. The combination of the training-based three-dimensional affine wavelet transformation (inverse transformation) with the parameter sharing strategy is hereinafter referred to as the affine wavelet transformation (inverse transformation) based on the parameter sharing strategy.

1. Encoding stage.

As shown in fig. 23, the input is a three-dimensional biomedical image to be encoded, the output is a compressed code stream, and the compressed code stream can be sent to a decoding end for decoding operation. The main steps are described as follows:

Step 1: Apply the affine wavelet transformation based on the parameter sharing strategy to the input image to be encoded. The affine wavelet transform is performed N times to obtain the wavelet coefficients, which consist of 7N+1 subbands.

Step 2: After the wavelet coefficients are quantized, perform entropy coding with the three-dimensional entropy coding method based on the depth network to obtain the compressed code stream.

2. Decoding stage.

Corresponding to the encoding stage scheme architecture and encoding operations, as shown in fig. 24, the input is a received compressed code stream and the output is a reconstructed three-dimensional biomedical image. The main steps are described as follows:

Step 1: Perform entropy decoding with the three-dimensional entropy decoding method based on the depth network, followed by inverse quantization, to obtain the reconstructed wavelet coefficients.

Step 2: Apply the affine wavelet inverse transformation based on the parameter sharing strategy to the reconstructed wavelet coefficients to obtain the reconstructed three-dimensional biomedical image.

The calculation flows involved in the above four examples have been described in detail above and are therefore not repeated. The superior performance of the present invention is demonstrated by the comparative experiments below.

Embodiment two

The present invention also provides a three-dimensional biomedical image compression system, which is implemented mainly based on the method provided in the first embodiment, as shown in fig. 25, and the system mainly includes:

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division into functional modules is given as an illustration; in practical applications, the above functions may be allocated to different functional modules as needed, i.e., the internal structure of the system may be divided into different functional modules to perform all or part of the functions described above.

It should be noted that the main technical details of the above encoding and decoding stages, and the combinations of specific improvements involved, may refer to Embodiment one and are not repeated here.

Embodiment three

The present invention also provides a processing apparatus, as shown in fig. 26, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.

Further, the processing device further comprises at least one input device and at least one output device; in the processing device, the processor, the memory, the input device and the output device are connected through buses.

In the embodiment of the invention, the specific types of the memory, the input device and the output device are not limited; for example:

the input device can be a touch screen, an image acquisition device, a physical key or a mouse and the like;

the output device may be a display terminal;

the memory may be random access memory (Random Access Memory, RAM) or non-volatile memory, such as disk memory.

Embodiment four

The invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.

The readable storage medium according to the embodiment of the present invention may be provided as a computer readable storage medium in the aforementioned processing apparatus, for example, as the memory in the processing apparatus. The readable storage medium may be any of various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, or an optical disk.

The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims

1. A method of three-dimensional biomedical image compression comprising:

2. A method of three-dimensional biomedical image compression according to claim 1, wherein,

the first affine map is a tensor with the same size as the prediction result, and the second affine map is a tensor with the same size as the update result; the corresponding affine maps are calculated for the prediction result and the update result respectively through sigmoid functions, and each provides an independent scaling value for each spatial position of the prediction result and the update result;

alternatively, the affine map is a scalar, which is set as a variable that can be learned, and the same scaling value is provided for each spatial position of the prediction result and the update result by the affine map.

3. A method of three-dimensional biomedical image compression according to claim 1 or 2, wherein the way in which the prediction result is scaled by the first affine map and the update result is scaled by the second affine map comprises:

in the encoding stage, the first affine map is multiplied by the prediction result and the second affine map is multiplied by the update result, scaling the output at each spatial position of the prediction result and the update result; or, the value at each position of the first affine map and the second affine map is quantized, the multiplication is realized by a right shift operation, and the output at each spatial position of the prediction result and the update result is scaled by the right shift operation;

in the decoding stage, the prediction result is divided by the first affine map and the update result is divided by the second affine map, scaling the output at each spatial position of the prediction result and the update result; or, the value at each position of the first affine map and the second affine map is quantized, the division is realized by a left shift operation, and the output at each spatial position of the prediction result and the update result is scaled by the left shift operation.

4. A method of three-dimensional biomedical image compression according to claim 1, wherein the training-based three-dimensional affine wavelet transform uses a parameter sharing strategy;

the three-dimensional biomedical image is divided into an isotropic image and an anisotropic image according to the difference of axial resolutions; for isotropic images, the prediction network and the update network in the training-based three-dimensional affine wavelet transform use the same structure and share all parameters; for anisotropic images, the same structure is used for the prediction network and the update network in the trained three-dimensional affine wavelet transformation, a set of parameters is used alone in the axial direction, and all parameters are shared in the other directions.

5. The method of claim 1 or 4, wherein the structure of the prediction network and the update network comprises: first through fifth three-dimensional convolution layers, wherein the output of the first three-dimensional convolution layer is connected with the output of the fifth three-dimensional convolution layer, the output of the second three-dimensional convolution layer is connected with the output of the fourth three-dimensional convolution layer, and the third three-dimensional convolution layer and the fourth three-dimensional convolution layer first apply the tanh activation function and then perform the convolution operation.

6. The method according to claim 1, wherein the encoding stage uses a depth network-based three-dimensional entropy encoding method, and the high-frequency signal and the low-frequency signal obtained by decomposition in all directions are referred to as three-dimensional subbands; entropy coding is sequentially carried out on all the sub-bands according to a set sequence;

for the current three-dimensional subband to be encoded, a coarse context is extracted from the already-encoded three-dimensional subbands by a three-dimensional context coarse extraction network; the coarse context is input into a three-dimensional inter-subband context extraction network to obtain the inter-subband context; the current three-dimensional subband to be encoded is input into a three-dimensional intra-subband context extraction network to obtain the intra-subband context; the inter-subband context and the intra-subband context are input into a three-dimensional context fusion depth network to obtain the entropy coding parameters of the current three-dimensional subband to be encoded; and the current three-dimensional subband to be encoded is entropy-coded using the entropy coding parameters.

7. A method of three-dimensional biomedical image compression according to claim 6, wherein,

the three-dimensional inter-subband context extraction network comprises: a first convolution unit and a second convolution unit arranged in sequence, wherein each convolution unit comprises two three-dimensional convolution layers arranged in sequence with a ReLU function between them; the input of each convolution unit is connected to its output;

the three-dimensional intra-subband context extraction network comprises: a masked three-dimensional convolution layer, a third convolution unit, and a fourth convolution unit arranged in sequence; each convolution unit comprises two masked three-dimensional convolution layers arranged in sequence with a ReLU function between them; the input of the third convolution unit comprises the output of the preceding masked three-dimensional convolution layer and the input of the three-dimensional intra-subband context extraction network, and the input of the third convolution unit is also connected to its output; the input of the fourth convolution unit is connected to its output;

the three-dimensional context fusion depth network comprises: masked three-dimensional convolution layers and three-dimensional convolution layers arranged in sequence, with a ReLU function between adjacent masked three-dimensional convolution layers and between adjacent three-dimensional convolution layers.

8. A three-dimensional biomedical image compression system realized based on the method of any one of claims 1 to 7, comprising:

9. A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.

10. A readable storage medium storing a computer program, characterized in that the method according to any one of claims 1-7 is implemented when the computer program is executed by a processor.