CN114170076A - Method for extracting target object information from video based on super-resolution and application - Google Patents
- Publication number: CN114170076A
- Application number: CN202111272433.9A
- Authority: CN (China)
- Prior art keywords: target object, super-resolution, layer, video, resolution
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/4053—Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06F18/2135—Feature extraction based on approximation criteria, e.g. principal component analysis
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
- G06T5/70—Denoising; Smoothing
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
- G06T7/90—Determination of colour characteristics
- G06T2207/10016—Video; Image sequence
- G06T2207/10116—X-ray image
- G06T2207/20048—Transform domain processing
- G06T2207/20081—Training; Learning
- G06T2207/30101—Blood vessel; Artery; Vein; Vascular
Abstract
The invention relates to a super-resolution-based method for extracting target object information from a video, and to an application thereof. The method comprises: acquiring a video sequence containing a target object; dividing the video sequence into sub-blocks, inputting the sub-blocks into a trained deep unfolding network model for solving, and stitching the outputs to obtain a prediction result for the target object. The deep unfolding network model is a convolutional robust principal component analysis (RPCA) deep unfolding network, constructed by combining a super-resolution module with a deep unfolding algorithm based on robust principal component analysis. Compared with the prior art, the method offers high real-time performance, interference suppression, and accurate detection; when applied to X-ray angiography video, it effectively reduces the influence of background vessel-like structures and complex mixed noise, and markedly improves the extraction of small vessels.
Description
Technical Field
The invention relates to the technical field of information extraction, in particular to video processing, and specifically to a super-resolution-based method for extracting target object information from a video and an application thereof.
Background
In the information field, it is often necessary to extract target object information from a video sequence. An X-ray angiography video sequence is one example: accurate vessel information is what the technician needs to acquire from it. Owing to the mechanism of X-ray projection imaging, such video frames contain numerous structures besides the contrast agent flowing through the vessels, such as bones, lungs, the diaphragm, and other human tissues and organs. In addition, various mixed noises inevitably arise during imaging. These background structures and mixed noises interfere with the identification of vessel information, and thereby hinder its further analysis and accurate clinical diagnosis. The background layer of the video sequence therefore needs to be separated from the vessel layer, yielding a vessel-layer video sequence from which vessel information is easier to obtain.
At present, robust principal component analysis (RPCA) is the algorithm that performs best at extracting the vessel layer of an X-ray angiography video sequence [Jin, M., Li, R., Jiang, J. and Qin, B., 2017. Extracting contrast-filled vessels in X-ray angiography by graduated RPCA with motion coherency constraint. Pattern Recognition, 63, pp. 653-666]. From the viewpoint of motion analysis, the algorithm decomposes a video sequence into a low-rank matrix and a sparse matrix: the low-rank matrix represents the background layer, which is highly self-similar and changes little between frames, while the sparse matrix represents the target object layer, which is sparsely distributed and moves substantially.
The traditional RPCA algorithm has limitations for vessel extraction from X-ray angiography video sequences. First, it requires a large number of iterative computations, so its time and space efficiency are low, which restricts clinical application. Second, the human tissues and organs in the background layer of an X-ray image are not completely static, and even slight movements of these structures strongly affect the algorithm's results. Meanwhile, X-ray images contain a great deal of complex mixed noise, which corrupts the vessel information, especially that of small vessel branches. Consequently, interference from background tissues and organs, together with complex mixed noise, prevents the traditional RPCA algorithm from accurately separating the vessel layer from the background layer.
In addition, some image segmentation techniques are used in vessel segmentation work to obtain the vessel region of an image. Common methods include image enhancement techniques, deformable models, and vessel tracking. These methods are usually based on vessel morphology or image grey values. With them, vessel-like structures in the image background and the complex Gaussian-Poisson mixed noise in the image greatly disturb the segmentation result, making foreground and background hard to distinguish in some regions. Moreover, their segmentation results focus on extracting the contour features of the vessel structure while ignoring the grey-value information of the vessels in the original image.
Therefore, existing methods cannot extract vessel information from an X-ray angiography video sequence both quickly and accurately, which impedes further diagnostic work such as quantification and functional analysis based on restoring the shape and grey values of the contrast-filled vessels. In summary, the drawbacks of existing vessel extraction algorithms are as follows:
1. low time and space efficiency of vessel extraction;
2. the extracted contrast-vessel image contains background tissue and organ structures as well as noise;
3. small vessel-branch information is not retained in the extracted contrast-vessel image.
Disclosure of Invention
The object of the invention is to overcome the above defects of the prior art and to provide a super-resolution-based method, with high real-time performance and accurate detection, for extracting target object information from a video, together with an application thereof.
The purpose of the invention can be realized by the following technical scheme:
a super-resolution-based method for extracting target object information from a video is applied to a transmission imaging video, and comprises the following steps:
acquiring a video sequence containing a target object;
dividing the video sequence into sub-blocks, inputting the sub-blocks into a trained deep unfolding network model for solving, and stitching the outputs to obtain a prediction result for the target object;
the deep unfolding network model is a convolutional robust principal component analysis deep unfolding network, constructed by combining a super-resolution module with a deep unfolding algorithm based on robust principal component analysis.
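The split-solve-stitch procedure described above can be sketched as follows. This is a minimal numpy sketch with non-overlapping square sub-blocks; `model_fn` is a placeholder stand-in for the trained deep unfolding network, not the patent's implementation:

```python
import numpy as np

def split_into_subblocks(frames, block):
    """Split each frame of a (T, H, W) sequence into non-overlapping
    block x block sub-blocks (H and W assumed divisible by `block`)."""
    T, H, W = frames.shape
    blocks = (frames
              .reshape(T, H // block, block, W // block, block)
              .transpose(1, 3, 0, 2, 4))         # (nH, nW, T, block, block)
    return blocks.reshape(-1, T, block, block)    # one temporal stack per sub-block

def stitch_subblocks(blocks, H, W):
    """Inverse of split_into_subblocks: reassemble the full frames."""
    n, T, b, _ = blocks.shape
    nH, nW = H // b, W // b
    return (blocks.reshape(nH, nW, T, b, b)
                  .transpose(2, 0, 3, 1, 4)
                  .reshape(T, H, W))

def extract_target(frames, model_fn, block=32):
    """Run a per-sub-block solver and stitch its outputs back together."""
    T, H, W = frames.shape
    outputs = np.stack([model_fn(sb) for sb in split_into_subblocks(frames, block)])
    return stitch_subblocks(outputs, H, W)
```

With `model_fn` set to the identity, the pipeline reproduces the input exactly, which confirms that splitting and stitching are lossless.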
Further, the deep unfolding network model is constructed as follows:
constructing a robust principal component analysis model based on the characteristics of the video, and converting it into its Lagrangian form;
solving the Lagrangian-form model iteratively to obtain the update formulas of each motion layer of the video;
deep-unfolding the update formulas of the motion layers of the video into a number of iteration layers;
combining the iteration layers with a super-resolution module to form the deep unfolding network model.
Further, the constructed robust principal component analysis model is as follows:
$$\min_{L,S}\;\|L\|_* + \lambda\|S\|_1 \quad \text{s.t.}\quad D = L + S$$

wherein the matrix D is the data matrix of the original video sequence, each column vector of which is one vectorized frame of the original video; L is the low-rank matrix, i.e. the data matrix of the background layer to be solved; S is the sparse matrix, i.e. the data matrix of the foreground layer to be solved; $\|L\|_*$ is the nuclear norm of L; $\|S\|_1$ is the $\ell_1$ norm of S; and $\lambda$ is a regularization parameter that adjusts the proportion of the foreground-layer component obtained by the decomposition.
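As a concrete illustration of the objective being minimized, the sketch below (plain numpy, with toy data chosen for illustration rather than taken from the patent) evaluates $\|L\|_* + \lambda\|S\|_1$ for a rank-1 "background" plus a sparse "foreground":

```python
import numpy as np

def rpca_objective(L, S, lam):
    """Objective ||L||_* + lam * ||S||_1 of the RPCA model (D = L + S)."""
    nuclear = np.linalg.svd(L, compute_uv=False).sum()  # nuclear norm: sum of singular values
    l1 = np.abs(S).sum()                                # entrywise l1 norm
    return nuclear + lam * l1

# Toy decomposition: a rank-1 background plus a sparse foreground.
rng = np.random.default_rng(0)
u, v = rng.normal(size=(50, 1)), rng.normal(size=(1, 40))
L = u @ v                                  # low-rank background layer
S = np.zeros((50, 40))
S[3, 5], S[10, 7] = 2.0, -1.5              # sparse foreground layer
D = L + S                                  # observed data matrix
```

For a rank-1 matrix the nuclear norm equals the Frobenius norm (only one non-zero singular value), which gives a quick sanity check of the implementation.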
Further, the Lagrangian-form model is:

$$\min_{L,S}\;\frac{1}{2}\|D - H_1 L - H_2 S\|_F^2 + \lambda_1\|L\|_* + \lambda_2\|S\|_{1,2}$$

wherein $H_1$ and $H_2$ are the measurement matrices of L and S respectively, here taken as $H_1 = H_2 = I$; $\|S\|_{1,2}$ is the mixed $\ell_{1,2}$ norm of the matrix S; and $\lambda_1$ and $\lambda_2$ are the regularization parameters for L and S respectively.
Further, the iterative solution is carried out with a linear inverse-problem algorithm.
Specifically, linear inverse-problem algorithms include the iterative soft-thresholding algorithm (ISTA), the fast iterative soft-thresholding algorithm (FISTA), the alternating direction method of multipliers (ADMM), and the like.
Further, each motion layer of the video comprises an approximately static background layer and a moving object layer.
Specifically, when the Lagrangian-form model is solved with the iterative soft-thresholding algorithm, the low-rank matrix L and the sparse matrix S are updated iteratively until convergence. In the (k+1)-th iteration, $L^{k+1}$ and $S^{k+1}$ are updated as:

$$L^{k+1} = \mathrm{SVT}_{\lambda_1/L_f}\left(\left(I - \tfrac{1}{L_f}H_1^H H_1\right)L^k - \tfrac{1}{L_f}H_1^H H_2 S^k + \tfrac{1}{L_f}H_1^H D\right)$$

$$S^{k+1} = \psi_{\lambda_2/L_f}\left(\left(I - \tfrac{1}{L_f}H_2^H H_2\right)S^k - \tfrac{1}{L_f}H_2^H H_1 L^k + \tfrac{1}{L_f}H_2^H D\right)$$

wherein $\mathrm{SVT}$ is the singular-value thresholding operator, $\psi$ is the soft-thresholding operator, $L_f$ is the Lipschitz constant, and $H^H$ denotes the conjugate transpose.
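The two proximal operators and one ISTA step can be sketched in numpy as below, for the special case $H_1 = H_2 = I$ (the simplified update and variable names are illustrative, not the patent's code):

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise soft-thresholding operator psi_tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular-value thresholding: soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def ista_step(D, L, S, lam1, lam2, Lf=2.0):
    """One ISTA update of (L, S); with H1 = H2 = I the gradient step
    reduces to adding the scaled residual (D - L - S) / Lf."""
    resid = (D - L - S) / Lf
    L_next = svt(L + resid, lam1 / Lf)
    S_next = soft_threshold(S + resid, lam2 / Lf)
    return L_next, S_next
```

A fixed point of the iteration with zero thresholds is any exact decomposition D = L + S, which the test below uses as a sanity check.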
Further, the deep unfolding specifically comprises: replacing the coefficient terms in the update formulas of the motion layers with convolutional layers, and replacing the matrix multiplications with convolutions.
Specifically, the coefficient-matrix terms formed from $H_1$ and $H_2$ may be replaced by convolutional layers, and the matrix multiplications by convolutions. The k-th layer of the unfolded network is then computed as:

$$L^{k+1} = \mathrm{SVT}_{\lambda_1^k}\left(P_1^k * L^k + P_2^k * S^k + P_3^k * D\right)$$

$$S^{k+1} = \psi_{\lambda_2^k}\left(P_4^k * S^k + P_5^k * L^k + P_6^k * D\right)$$

wherein $*$ denotes the convolution operator, the $P_i^k$ are convolutional layers, and $\lambda_1^k, \lambda_2^k$ are regularization parameters. Both the convolutional-layer parameters and the regularization parameters are learned during training.
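The replacement of coefficient matrices by learned convolution kernels can be sketched as below. This is a single-channel numpy illustration with hypothetical kernels `P[1..6]`; a real implementation would use a deep-learning framework with trainable multi-channel layers:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same' 2-D cross-correlation, standing in for the
    coefficient-matrix multiplications of the iterative algorithm."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

def unfolded_layer(D, L, S, P, lam1, lam2):
    """One layer of the unfolded network: learned kernels P[1..6] replace
    the coefficient matrices; thresholds lam1, lam2 are per-layer learned."""
    L_in = conv2d_same(L, P[1]) + conv2d_same(S, P[2]) + conv2d_same(D, P[3])
    S_in = conv2d_same(S, P[4]) + conv2d_same(L, P[5]) + conv2d_same(D, P[6])
    U, s, Vt = np.linalg.svd(L_in, full_matrices=False)
    L_out = U @ np.diag(np.maximum(s - lam1, 0.0)) @ Vt     # SVT
    S_out = np.sign(S_in) * np.maximum(np.abs(S_in) - lam2, 0.0)  # soft threshold
    return L_out, S_out
```

Setting the kernels to centred delta functions and the thresholds to zero makes the layer an identity map on (L, S), which is a convenient correctness check.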
Further, the super-resolution module comprises a sampling layer and a sub-block sparse-feature-selection network layer, and combining the iteration layers with the super-resolution module specifically comprises:
embedding the sampling layer at the start of the iteration layers, and embedding the sub-block sparse-feature-selection network layer at the end of the iteration layers.
The sampling layer is any network layer commonly used in neural networks for feature selection; it removes redundant information and retains effective information. Specifically, sampling layers include, but are not limited to, average pooling, max pooling, overlapping pooling, atrous spatial pyramid pooling, upsampling layers, and the like.
Further, the sub-block sparse-feature-selection network layer comprises a residual network layer and a recurrent neural network layer.
A recurrent neural network layer takes a sequence as input and has a memory function. Specifically, recurrent neural network layers include, but are not limited to, conventional recurrent neural networks, bidirectional recurrent neural networks, gated recurrent networks, long short-term memory (LSTM) networks, convolutional LSTM networks, and the like.
Further, the conventional target object extraction algorithms include an extraction method based on background completion.
Further, the background-completion extraction method comprises the following steps:
segmenting the original image to obtain a background-layer image with the target object region removed;
estimating the background grey values of the target object region by background completion, yielding an estimated background-layer image;
subtracting the estimated background-layer image from the original image to obtain the target object grey-information image.
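The three steps above can be sketched as follows. The mean-fill completion used here is a deliberate simplification standing in for whatever completion scheme is actually intended:

```python
import numpy as np

def extract_by_background_completion(image, target_mask):
    """Background-completion extraction sketch:
    1) remove the target region from the image,
    2) estimate its background grey values from the remaining background
       (here: global mean fill, a crude stand-in for real completion),
    3) subtract the completed background from the original image."""
    background = image.astype(float).copy()
    background[target_mask] = np.nan          # step 1: cut out the target region
    fill = np.nanmean(background)             # step 2: estimate the missing background
    background[target_mask] = fill
    return image - background                 # step 3: target grey-information image
```

On a flat background this recovers exactly the grey-value deficit of the target region, i.e. the attenuation signature the vessel leaves on the image.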
Further, the deep unfolding network model is trained with label samples, wherein the label samples are weakly supervised labels obtained with a conventional target object extraction algorithm or by manual annotation.
The invention also provides an application of the above super-resolution-based method for extracting target object information from a video to X-ray angiography video.
Compared with the prior art, the invention has the following beneficial effects:
First, the traditional robust principal component analysis algorithm relies on iterative solution; the number of iterations is large and consumes much time, which limits practical application. The invention constructs a convolutional RPCA deep unfolding network, and is the first to combine robust principal component analysis with deep unfolding, each layer of the network representing one iteration of the iterative algorithm. In general, a deep unfolding network achieves better results with far fewer layers than the iteration count of the traditional algorithm, so its time efficiency is greatly improved over that of the original iterative algorithm. The RPCA deep unfolding network therefore gives the method high real-time performance, making it suitable for clinical applications such as vessel extraction from X-ray sequences.
Second, the convolutional RPCA deep unfolding network constructed by the method is the first to be combined with a super-resolution module. The background of a transmission-imaging image such as an X-ray sequence frame contains overlapping, complex anatomical structures: bones, lungs, vertebrae, the diaphragm, and other human tissues and organs. Owing to respiratory motion, patient movement, and other factors, parts of the background structure move with a certain amplitude. Meanwhile, some background structures have morphological features and grey levels similar to those of vessels. These factors strongly degrade the extraction results of the traditional robust principal component analysis method and of deep unfolding networks based on RPCA alone, making the vessel component in some regions, especially the small vessels, hard to distinguish from the background. The invention embeds a super-resolution module in the network layer, comprising a sampling layer and a sub-block sparse-feature-selection network layer. The sampling layer, placed before the robust principal component analysis, screens the input features, retaining effective features, removing useless ones, and suppressing part of the interference from background vessel-like structures. The sub-block sparse-feature-selection network layer reduces the influence of complex mixed noise, extracts and enhances image detail features, and improves the detection rate of small vessels.
Third, the sub-block sparse-feature-selection network layer comprises a residual network layer and a recurrent neural network layer, and the invention is the first to propose using a recurrent neural network to select vessel features. A recurrent neural network has memory: it passes feature information between preceding and following frames of the video sequence within the network, screens the features, and improves the detection rate for the current frame.
Fourth, the method divides the video sequence into sub-blocks, solves each sub-block, and then stitches the results to obtain the prediction of the target object. Transmission-imaging images such as X-ray sequence frames suffer from complex mixed Gaussian-Poisson noise caused by quantum noise and by the thermal noise of electronic devices. Poisson noise is signal-dependent: its strength is strongly tied to the strength of the local signal, so a generic global denoising algorithm is unsuitable for it. By processing the X-ray sequence images sub-block by sub-block, the method denoises according to the mixed-noise pattern within each sub-block region and effectively removes the mixed Gaussian-Poisson noise.
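The signal dependence of Poisson noise, which motivates the per-sub-block processing, can be checked numerically; the sketch below only illustrates that the noise variance tracks the local signal strength:

```python
import numpy as np

rng = np.random.default_rng(42)

# Poisson noise is signal-dependent: its variance equals its mean, so a
# dark sub-block and a bright sub-block carry very different noise levels.
dark = rng.poisson(lam=10.0, size=100_000)      # low-signal sub-block
bright = rng.poisson(lam=1000.0, size=100_000)  # high-signal sub-block

# The noise standard deviation grows like sqrt(signal), so one global
# denoising strength cannot fit both regions; per-sub-block processing can.
```

Empirically, the sample variances land near 10 and 1000, two orders of magnitude apart, even though both are "the same" Poisson noise process.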
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of the convolutional robust principal component analysis deep unfolding network constructed by the invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
The invention provides a super-resolution-based method for extracting target object information from a video, applied to transmission-imaging video, in which the target object attenuates the imaging rays during video formation. Referring to FIG. 1, the method comprises: acquiring a video sequence containing a target object; dividing the video sequence into sub-blocks, inputting the sub-blocks into a trained deep unfolding network model for solving, and stitching the outputs to obtain the prediction result for the target object. The deep unfolding network model is a convolutional robust principal component analysis (RPCA) deep unfolding network, constructed by combining a super-resolution module with a deep unfolding algorithm based on RPCA, and is trained with weakly supervised label samples acquired with a conventional target object extraction algorithm.
Specifically, the deep unfolding network model is constructed as follows: constructing a robust principal component analysis model based on the characteristics of the video, and converting it into its Lagrangian form; solving the Lagrangian-form model, with solution methods including, but not limited to, the iterative soft-thresholding algorithm (ISTA), the fast iterative soft-thresholding algorithm (FISTA), and the alternating direction method of multipliers (ADMM), to obtain the update formulas of each motion layer of the video; deep-unfolding the update formulas of the motion layers into a number of iteration layers, where deep unfolding specifically means replacing the coefficient terms in the update formulas with convolutional layers and the matrix multiplications with convolutions; and combining the iteration layers with a super-resolution module into the deep unfolding network model.
Specifically, the super-resolution module comprises a sampling layer and a sub-block sparse-feature-selection network layer, and combining the iteration layers with the super-resolution module specifically comprises: embedding the sampling layer at the start of the iteration layers and the sub-block sparse-feature-selection network layer at the end of the iteration layers. The sub-block sparse-feature-selection network layer comprises a residual network layer and a recurrent neural network layer. The sampling layer includes, but is not limited to, average pooling, max pooling, overlapping pooling, atrous spatial pyramid pooling, and upsampling layers; the recurrent neural network layer includes, but is not limited to, conventional recurrent neural networks, bidirectional recurrent neural networks, gated recurrent networks, long short-term memory (LSTM) networks, and convolutional LSTM networks.
As shown in FIG. 2, the convolutional robust principal component analysis deep unfolding network constructed by the above method comprises a pooling layer, an RPCA unfolding layer, and a super-resolution layer (SR module), where the SR module comprises a convolutional layer, a residual module, a CLSTM module (convolutional long short-term memory network), and a pixel-shuffle layer.
The conventional target object extraction algorithm is a background-completion method, specifically: segmenting the original image to obtain a background-layer image with the target object region removed; estimating the background grey values of the target object region by background completion, yielding an estimated background-layer image; and subtracting the estimated background-layer image from the original image to obtain the target object grey-information image.
With the constructed convolutional RPCA deep unfolding network, the method effectively improves both the real-time performance and the accuracy of target object information extraction.
The above method, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Example 1
The embodiment of the present invention is based on the above method to realize accurate extraction of blood vessels of an X-ray angiography image sequence, and includes:
101) constructing a robust principal component analysis model for the X-ray angiography image sequence, namely:
min ||L||_* + λ||S||_1   s.t.   D = L + S
wherein the matrix D represents the data matrix of the original video sequence, each column of which is one vectorized frame of the original video; the matrix L represents the low-rank matrix, i.e. the data matrix of the background layer to be solved; the matrix S represents the sparse matrix, i.e. the data matrix of the foreground layer to be solved; ||L||_* denotes the nuclear norm of the matrix L, ||S||_1 denotes the l_1 norm of the matrix S, and λ is a regularization parameter used to adjust the proportion of the foreground layer component obtained by the decomposition.
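The value of this objective can be computed directly; a minimal numpy sketch (hypothetical function name), where the nuclear norm is the sum of singular values:

```python
import numpy as np

def rpca_objective(L, S, lam):
    """Evaluate ||L||_* + lam * ||S||_1 for a candidate decomposition D = L + S."""
    nuclear = np.linalg.svd(L, compute_uv=False).sum()  # nuclear norm of L
    l1 = np.abs(S).sum()                                # elementwise l1 norm of S
    return nuclear + lam * l1
```

A nearly static background gives columns of D that are close to linearly dependent (low rank), while the moving target object contributes only a few nonzero entries per frame (sparse), which is why minimizing this objective separates the two layers.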
102) Writing the robust principal component analysis model (the RPCA algorithm mathematical model) into Lagrange form:

min_{L,S} (1/2)||D − H_1 L − H_2 S||_F^2 + λ_1 ||L||_* + λ_2 ||S||_{1,2}

wherein H_1 and H_2 are the metric matrices of L and S respectively, here taken as H_1 = H_2 = I; ||S||_{1,2} denotes the l_{1,2} norm of the matrix S; λ_1 and λ_2 are the regularization parameters for L and S respectively.
103) Solving the Lagrange-form model by a soft-threshold iterative algorithm. In the iterative process, the low-rank matrix L and the sparse matrix S are updated until convergence. In the (k+1)-th iteration, L^{k+1} and S^{k+1} are updated according to:

L^{k+1} = SVT_{λ_1/L_f}( L^k + (1/L_f) H_1^H (D − H_1 L^k − H_2 S^k) )
S^{k+1} = T_{λ_2/L_f}( S^k + (1/L_f) H_2^H (D − H_1 L^k − H_2 S^k) )

wherein SVT_t(·) is the singular value thresholding operator, T_t(·) is the soft-threshold operator, L_f is a Lipschitz constant, and H_1^H and H_2^H are the conjugate transposes of H_1 and H_2.
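The soft-threshold iterative scheme can be sketched in numpy; this sketch assumes H_1 = H_2 = I as in step 102 (so the Lipschitz constant of the data-fit gradient can be taken as L_f = 2), and all function names are hypothetical:

```python
import numpy as np

def soft(X, t):
    """Elementwise soft-threshold operator T_t."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def svt(X, t):
    """Singular value thresholding operator SVT_t."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(soft(s, t)) @ Vt

def ista_rpca(D, lam1=0.4, lam2=1.8, Lf=2.0, iters=100):
    """Soft-threshold iterative solution of the Lagrange-form model with H1 = H2 = I."""
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(iters):
        G = D - L - S                    # residual: negative gradient of the data-fit term
        L = svt(L + G / Lf, lam1 / Lf)   # low-rank (background layer) update
        S = soft(S + G / Lf, lam2 / Lf)  # sparse (foreground layer) update
    return L, S
```

Each iteration is a gradient step on the quadratic data-fit term followed by the proximal operators of the nuclear norm (SVT) and the sparsity norm (soft threshold), matching the update equations above.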
104) Carrying out deep expansion of the robust principal component analysis solution. The coefficient matrix terms constructed from H_1 and H_2 are replaced by convolutional layers, and the matrix multiplications are replaced by convolution operations. Thus, the k-th layer of the expanded network is calculated as follows:
L^{k+1} = SVT_{λ_1^k}( P_1^k * D + P_2^k * L^k + P_3^k * S^k )
S^{k+1} = T_{λ_2^k}( P_4^k * D + P_5^k * L^k + P_6^k * S^k )

wherein * represents the convolution operator, P_i^k are the convolutional layers of the k-th iteration layer, and λ_1^k and λ_2^k are learnable regularization parameters.
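The replacement of matrix products by convolutions can be illustrated with the sparse-component update of one unfolded layer; the following is a single-channel numpy sketch with hypothetical kernel names (a trained network would use learned multi-channel convolutions):

```python
import numpy as np

def conv2d_same(x, k):
    """Minimal single-channel 'same' convolution with zero padding."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    return np.array([[(xp[i:i + kh, j:j + kw] * k).sum()
                      for j in range(x.shape[1])]
                     for i in range(x.shape[0])])

def soft(x, t):
    """Elementwise soft-threshold operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def unfolded_sparse_update(D, L, S, P_d, P_l, P_s, lam):
    """S-update of one unfolded layer: the fixed coefficient-matrix terms are
    replaced by (here hand-set, in practice learned) convolution kernels."""
    pre = conv2d_same(D, P_d) + conv2d_same(L, P_l) + conv2d_same(S, P_s)
    return soft(pre, lam)
```

With the center-delta kernel each convolution reduces to the identity, which recovers the matrix-form update of step 103 as a special case.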
105) Connecting the expanded iteration layers to construct the deep expansion network. The number of iteration layers constructed in this example is 4; the convolution kernels of the first two layers have size 5 and those of the last two layers have size 3. The regularization parameter of the low-rank component, λ_1, is 0.4, and that of the sparse component, λ_2, is 1.8.
106) A mean pooling layer is embedded at the start of the iteration layers to down-sample the input.
107) A residual module and a recurrent neural network module are embedded after the robust principal component analysis module. In this embodiment, the recurrent neural network module adopts a convolutional long short-term memory network.
108) Segmenting the original blood vessel image sequence with SVS-net to obtain a blood-vessel-region segmentation map sequence and a background-region segmentation map sequence.
109) Solving the background-region segmentation map sequence with a t-TNN tensor completion model to obtain the background layer data tensor image.
110) Dividing the original image sequence data element-wise by the background layer data tensor to obtain the blood vessel data tensor image.
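In transmission imaging the detected intensity is multiplicative (Beer-Lambert law), which is why element-wise division by the background, rather than subtraction, isolates the vessel attenuation; a minimal sketch (hypothetical name):

```python
import numpy as np

def vessel_tensor(original, background, eps=1e-6):
    """Element-wise ratio of the original sequence to the estimated background.
    Values below 1 correspond to extra attenuation from the vessel."""
    return original / np.maximum(background, eps)
```

The `eps` floor only guards against division by zero in empty background estimates; it is an implementation assumption, not part of the described method.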
111) Training the robust principal component analysis deep expansion network with the blood vessel data tensor image to obtain the trained model.
112) Segmenting the image sequence of the blood vessels to be extracted into sub-blocks, inputting the sub-blocks into the trained model, and splicing the outputs to obtain the complete output image. In this embodiment, the sub-blocks have a size of 64 × 64 × 20 (length and width of 64, and 20 frames), and adjacent sub-blocks overlap by 50%.
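The sub-block splitting and splicing of step 112 can be sketched as follows, with 64 × 64 × 20 blocks, 50% spatial overlap, and averaging of overlapping outputs during splicing (function names are hypothetical):

```python
import numpy as np

def split_blocks(vol, size=64, frames=20, overlap=0.5):
    """Cut an (H, W, T) sequence into size x size x frames sub-blocks
    with the given spatial overlap between adjacent blocks."""
    step = int(size * (1 - overlap))
    H, W, T = vol.shape
    blocks, coords = [], []
    for y in range(0, H - size + 1, step):
        for x in range(0, W - size + 1, step):
            for t in range(0, T - frames + 1, frames):
                blocks.append(vol[y:y + size, x:x + size, t:t + frames])
                coords.append((y, x, t))
    return blocks, coords

def stitch_blocks(blocks, coords, shape):
    """Splice (model-output) sub-blocks back into a full sequence,
    averaging wherever blocks overlap."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for b, (y, x, t) in zip(blocks, coords):
        sy, sx, st = b.shape
        acc[y:y + sy, x:x + sx, t:t + st] += b
        cnt[y:y + sy, x:x + sx, t:t + st] += 1
    return acc / np.maximum(cnt, 1)
```

Splitting a sequence and splicing the unmodified blocks back together reproduces the covered region exactly, which is a convenient sanity check before inserting the model between the two steps.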
The overall implementation of the above-described accurate extraction of vessels from an X-ray angiographic image sequence is shown in fig. 1. The structure of the deep network iteration layer in the embodiment is shown in fig. 2.
The present embodiment further illustrates the above method with 43 clinical angiography image sequences; each sequence contains 30-140 frames, each frame has a resolution of 512 × 512, each pixel represents an actual size of 0.3 mm × 0.3 mm, and the pixel bit depth is 8.
The blood vessel extraction process for the X-ray angiography image sequence effectively reduces the influence of background vessel structures and complex mixed noise, and remarkably improves the extraction of small vessels.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.
Claims (10)
1. A method for extracting target object information from a video based on super-resolution is applied to a transmission imaging video, and is characterized by comprising the following steps:
acquiring a video sequence containing a target object;
dividing the video sequence into sub-blocks, inputting the sub-blocks into a trained deep expansion network model for solving, and splicing the output to obtain a prediction result of a target object;
the deep expansion network model is a convolution robust principal component analysis deep expansion network and is constructed and obtained by combining a super-resolution module according to a deep expansion algorithm based on robust principal component analysis.
2. The super-resolution-based method for extracting target object information from a video according to claim 1, wherein the specific construction process of the deep-expansion network model is as follows:
constructing a robust principal component analysis model based on video characteristics, and converting the robust principal component analysis model into a Lagrange form model;
performing iterative solution on the Lagrange formal model to obtain a calculation formula of each motion layer of the video;
carrying out depth expansion on the calculation formulas of the motion layers of the video to obtain a plurality of iteration layers;
and combining a plurality of iteration layers and a super-resolution module into the deep expansion network model.
3. The super-resolution-based method for extracting target object information from a video according to claim 2, wherein the iterative solution is implemented by using a linear inverse problem solution algorithm.
4. The super-resolution-based method for extracting target object information from a video according to claim 2, wherein each motion layer of the video comprises an approximately static background layer and a moving target object layer.
5. The super-resolution-based method for extracting target object information from a video according to claim 2, wherein the performing depth expansion specifically comprises: and replacing the coefficient items in the calculation formulas of all the motion layers with convolution layers, and replacing the multiplication operation with the convolution operation.
6. The super-resolution-based method for extracting target object information from a video according to claim 2, wherein the super-resolution module comprises a sampling layer and a sub-block sparse feature selection network layer, and the combination of the plurality of iteration layers and the super-resolution module is specifically:
and embedding the sampling layer at the start position of the iteration layer, and embedding the sub-block sparse feature selection network layer at the end position of the iteration layer.
7. The super-resolution-based method for extracting target object information from a video according to claim 6, wherein the sub-block sparse feature selection network layer comprises a residual network layer and a recurrent neural network layer.
8. The super-resolution-based method for extracting target object information from a video according to claim 1, wherein the conventional target object extraction algorithm comprises a background-completion-based extraction method.
9. The super-resolution-based method for extracting target object information from a video according to claim 1, wherein the deep-expanded network model is trained with label samples, the label samples are weakly supervised label samples, and are obtained by using a traditional target object extraction algorithm or manual labeling.
10. Use of the super resolution based method of extracting target object information from video according to any of claims 1-9 in X-ray angiography video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111272433.9A CN114170076A (en) | 2021-10-29 | 2021-10-29 | Method for extracting target object information from video based on super-resolution and application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114170076A true CN114170076A (en) | 2022-03-11 |
Family
ID=80477516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111272433.9A Pending CN114170076A (en) | 2021-10-29 | 2021-10-29 | Method for extracting target object information from video based on super-resolution and application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114170076A (en) |
- 2021-10-29: CN202111272433.9A patent application filed in China (CN114170076A, status: Pending)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||