CN114897783A - Endoscopic exposure frame repair method based on adversarial neural network RTGAN - Google Patents
Endoscopic exposure frame repair method based on adversarial neural network RTGAN
- Publication number
- CN114897783A CN114897783A CN202210387487.8A CN202210387487A CN114897783A CN 114897783 A CN114897783 A CN 114897783A CN 202210387487 A CN202210387487 A CN 202210387487A CN 114897783 A CN114897783 A CN 114897783A
- Authority
- CN
- China
- Prior art keywords
- frame
- rtgan
- layer
- exposure
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/005—General purpose rendering architectures
-
- G06T5/90—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10068—Endoscopic image
Abstract
The invention discloses an endoscopic exposure frame repair method based on an adversarial neural network RTGAN, which solves the prior-art problem of severe overexposure of the surgical video image caused by the closed environment and the high power of the visible-light laser scalpel beam during endoscopic surgery. The invention comprises the following steps: (1) acquiring an endoscope video stream under black-and-white light and splitting it into frames; (2) preprocessing each frame, including deleting exposure frames and repairing highlight points; (3) training the adversarial neural network RTGAN on color video frames; (4) coloring the preprocessed frames in real time with the trained RTGAN; (5) synthesizing the color video frames restored by the RTGAN into a normal surgical video stream.
Description
Technical Field
The invention belongs to the technical field of endoscope imaging, and relates to an endoscopic exposure frame repair method based on an adversarial neural network RTGAN.
Background
With the continuous development of endoscope imaging systems, novel soft endoscopes have evolved from the traditional hard endoscope, further enlarging the application scenarios of endoscopy and making its use more flexible. Meanwhile, laser technology has been applied to endoscopic therapy and is becoming one of its important means. During laser surgery, the laser scalpel emits high-energy visible light while working, so the endoscope image suffers severe overexposure; and unlike a hard endoscope, a soft endoscope cannot be fitted with an optical filter. The doctor is therefore inevitably blind for short periods while applying the laser to the patient's lesion, which is an important hidden danger leading to surgical accidents.
In the prior art there are also methods for detecting and repairing exposed images, mainly of two types: physics-based methods and algorithm-based methods. A physics-based method adjusts the object distance and/or focal length of the endoscope's front lens through a feedback mechanism to stabilize the field of view obtained by the lens. This looks reasonable, but in practice the feedback mechanism often incurs a large delay, so real-time performance is poor, and continuously adjusting the lens introduces many unstable factors into laser surgery. Because the exposure conditions occurring in laser surgery are complex, algorithm-based methods relying on hand-designed features have difficulty accurately reflecting the actual exposure of the image; moreover, detected exposure frames are usually discarded or only locally repaired, which is a severe limitation: the repair result is unsatisfactory under large-area overexposure, and dropping a large number of overexposed frames makes the video stutter.
Disclosure of Invention
The invention aims to provide an endoscopic exposure frame repair method based on an adversarial neural network RTGAN, which solves the prior-art problem of severe overexposure of the surgical video image caused by the closed environment and the high power of the visible-light laser scalpel beam during endoscopic surgery.
To achieve this aim, the invention adopts the following technical scheme:
An endoscopic exposure frame repair method based on an adversarial neural network RTGAN, characterized by comprising the following steps:
Step one: switch the endoscope imaging mode to black-and-white, acquire the endoscope video stream under black-and-white light, delete exposure frames using the inter-frame histogram difference, and repair highlight points with the fast marching algorithm;
Step two: train the RTGAN coloring network on a video stream acquired while the endoscope imaging mode is color:
first, split the collected color video stream into frames and apply the preprocessing operations of step one, including exposure frame deletion and highlight repair; then use the preprocessed color frames together with their corresponding black-and-white frames as input to train the RTGAN model until training reaches the stop condition;
Step three: the coloring network RTGAN consists of a generator and a discriminator; the generator takes a cascade of residual modules as its backbone and introduces spatial up-sampling learned by strided convolution;
Step four: color the black-and-white image frames preprocessed in step one in real time with the generator of the trained RTGAN, and synthesize the colored frames into a normal surgical video stream.
The image preprocessing in step one comprises deleting exposure frames and repairing highlight points; the specific method is as follows:
(1) the steps of deleting exposure frames by inter-frame histogram difference are:
in the black-and-white image sequence, take the frame immediately preceding the start of exposure as a normal frame and compute its brightness mean T, where T is the exposure threshold; then compute the brightness means X1, X2, X3, …, Xn of the remaining frames in the sequence; if the mean Xi of the i-th frame is greater than T, the algorithm judges that frame to be overexposed and discards it, filling its slot with a copy of the previous frame; the start of exposure can be determined either from a physical signal or by an algorithm;
(2) the steps of repairing highlight points with the fast marching algorithm are:
set the highlight threshold to S and initialize a mask zero matrix M of the same size as the video frame; using S as the threshold, classify each pixel in the image by brightness, setting the value at the corresponding position in M to 1 for pixels greater than S and to 0 otherwise; the mask M obtained by this threshold segmentation marks, at positions of value 1, the highlight regions that need repair;
let p be the pixel to be repaired, u(i) the gray value of pixel i, and B_ε(p) the neighborhood of size ε around p. The contribution u_q(p) of a pixel q ∈ B_ε(p) to p is defined as:

u_q(p) = u(q) + ∇u(q) · (p − q)

where ∇u(q) is the gray-value gradient at q. The gray value of the pixel p to be repaired is then computed as:

u(p) = Σ_{q ∈ B_ε(p)} w(p, q) u_q(p) / Σ_{q ∈ B_ε(p)} w(p, q)

where B_ε(p) is the 8-neighborhood of p, and the weight w(p, q) is obtained from:

w(p, q) = dir(p, q) · dst(p, q) · lev(p, q)

where w(·,·) is a weight function limiting the contribution of each pixel in the neighborhood; dir(·,·) is a direction factor, dst(·,·) a geometric distance factor, and lev(·,·) a level-set distance factor.
The design of the generator in the RTGAN network in step three is as follows:
the generator adopts an encoder-decoder structure composed of three down-sampling layers, five middle layers and three up-sampling layers, with strided convolution used for both up-sampling and down-sampling;
(1) the design of the five middle layers of the generator includes:
the network backbone is composed of 5 residual blocks, each defined as:

Y_out = C(C(Y_in)) + Y_in

where Y_in is the input of the current residual block (the convolution features output by the input layer for the first residual block, and the output of the previous residual block for the remaining four), C(·) is a convolution operation, and Y_out is the output of the residual block;
each residual block contains two 3×3 convolutional layers; a batch normalization (BN) function and a ReLU activation are added after the first convolutional layer, while the BN of the second convolutional layer is placed before the addition, which avoids the BN layer's transformation of the data distribution changing the original input; a ReLU activation follows the addition; the residual blocks are then cascaded to further enhance the color detail of the image, the residual cascade being defined as:

R_out = res_5(C_in)

where res_5(·) denotes the 5-layer residual block cascade operation, C_in is the output of the input layer, and R_out is the output of the residual block cascade;
(2) the design of the up-sampling and down-sampling layers includes:
at the start of the network the image is padded by boundary reflection so that the input and output of the network have the same size; batch normalization (BN) and ReLU nonlinear activation are added after every convolutional layer of the down-sampling and up-sampling layers; the first down-sampling layer uses a 9×9 convolution kernel and the remaining two use 3×3 kernels, the output features of each convolutional layer being:

x_i = H_i(x_{i−1}),  i = 1, 2, 3

where H_i(·) denotes the Conv-ReLU operation, i indexes the convolutional layer, and x_i denotes the output features of the i-th convolutional layer;
in the up-sampling stage, the network uses two deconvolution layers, and the output layer uses a tanh activation to keep the output pixels within the range [0, 255];
(3) the discriminator of the RTGAN network adopts a resnet18 network, and the loss function adopts BCELoss.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides an endoscopic exposure frame repairing method based on an anti-neural network RTGAN. Convert the exposure frame detection and repair problem in endoscopy into its surrogate problem: the problem of rendering video frames in black and white mode. The black-and-white video stream can store more image detail information than the color video stream during exposure, so that after the black-and-white video stream is acquired, each frame of image is colored to achieve the purpose of repairing the exposed frame. The invention provides a real-time coloring method which can perform real-time coloring treatment on an endoscope video stream under black and white imaging. The method comprises the steps of firstly deleting fluency of an exposure frame operation video by using inter-frame histogram difference, introducing a fast marching algorithm to perform highlight restoration to improve coloring reality, and simultaneously providing a real-time coloring network RTGAN. Referring to fig. 5, the result of a comparison experiment between the method of the present invention and 5 coloring networks shows that the RTGAN only needs 0.026s for coloring a single frame image, the coloring speed is improved by 67 times to the maximum, the peak signal-to-noise ratio (PSNR) is 34.04, and the blue laser scalpel video can be colored in real time to obtain satisfactory visual quality.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a generator of the RTGAN of the present invention;
FIG. 3 is a diagram of a residual block RB structure employed in the present invention;
FIG. 4 is a graph of the test effect of the present invention;
FIG. 5 is a graph comparing results of various methods according to the embodiments of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the invention is an endoscopic exposure frame repair method based on an adversarial neural network RTGAN; the image processing method comprises the following steps:
Step one: switch the endoscope imaging mode to black-and-white, acquire the endoscope video stream under black-and-white light, delete exposure frames using the inter-frame histogram difference, and repair highlight points with the fast marching algorithm.
(1) The steps of deleting exposure frames by inter-frame histogram difference are:
in the black-and-white image sequence, take the frame immediately preceding the start of exposure as a normal frame and compute its brightness mean T, where T is the exposure threshold; then compute the brightness means X1, X2, X3, …, Xn of the remaining frames in the sequence. If the mean Xi of the i-th frame is greater than T, the algorithm judges that frame to be overexposed and discards it, filling its slot with a copy of the previous frame. The start of exposure can be determined either from a physical signal or by an algorithm.
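The deletion-and-fill step above can be sketched as follows in pure NumPy. This is a minimal illustration of the mean-brightness comparison the text describes; the function name and the simple list-based fill policy are assumptions, not the patent's exact implementation:

```python
import numpy as np

def remove_overexposed(frames, normal_idx=0):
    """Drop frames whose mean brightness exceeds the threshold T
    (the mean brightness of a known-normal frame) and fill each
    dropped slot with a copy of the previous kept frame."""
    T = frames[normal_idx].mean()          # exposure threshold from the normal frame
    repaired = []
    for frame in frames:
        if frame.mean() > T and repaired:  # Xi > T: judged overexposed
            repaired.append(repaired[-1].copy())
        else:                              # normal frame: keep as-is
            repaired.append(frame)
    return repaired

# toy example: a saturated frame between two dark frames
dark = np.full((4, 4), 50, np.uint8)
bright = np.full((4, 4), 255, np.uint8)
out = remove_overexposed([dark, bright, dark])
assert np.array_equal(out[1], dark)        # exposed frame replaced by previous one
```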
(2) The steps of repairing highlight points with the fast marching algorithm are:
set the highlight threshold to S and initialize a mask zero matrix M of the same size as the video frame; using S as the threshold, classify each pixel in the image by brightness, setting the value at the corresponding position in M to 1 for pixels greater than S and to 0 otherwise; the mask M obtained by this threshold segmentation marks, at positions of value 1, the highlight regions that need repair;
let p be the pixel to be repaired, u(i) the gray value of pixel i, and B_ε(p) the neighborhood of size ε around p. The contribution u_q(p) of a pixel q ∈ B_ε(p) to p is defined as:

u_q(p) = u(q) + ∇u(q) · (p − q)

where ∇u(q) is the gray-value gradient at q. The gray value of the pixel p to be repaired is then computed as:

u(p) = Σ_{q ∈ B_ε(p)} w(p, q) u_q(p) / Σ_{q ∈ B_ε(p)} w(p, q)

where B_ε(p) is the 8-neighborhood of p, and the weight w(p, q) is obtained from:

w(p, q) = dir(p, q) · dst(p, q) · lev(p, q)

where w(·,·) is a weight function limiting the contribution of each pixel in the neighborhood; dir(·,·) is a direction factor, dst(·,·) a geometric distance factor, and lev(·,·) a level-set distance factor.
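The mask construction follows the description directly; the fill shown below is a deliberately simplified stand-in for the fast marching update (uniform weights w(p, q), no narrow-band processing order). In practice the full algorithm is available as OpenCV's `cv2.inpaint` with the `cv2.INPAINT_TELEA` flag:

```python
import numpy as np

def highlight_mask(gray, S):
    """Zero-initialized mask M, set to 1 wherever brightness exceeds S."""
    M = np.zeros_like(gray, dtype=np.uint8)
    M[gray > S] = 1
    return M

def fill_highlights(gray, M):
    """Simplified fill: replace each masked pixel by the mean of its
    unmasked 8-neighbours. The real method additionally weights each
    neighbour by direction, distance and level-set factors."""
    out = gray.astype(np.float64)
    h, w = gray.shape
    for y, x in zip(*np.nonzero(M)):
        vals = [out[j, i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
                if M[j, i] == 0]
        if vals:
            out[y, x] = np.mean(vals)
    return out.astype(np.uint8)

img = np.full((5, 5), 100, np.uint8)
img[2, 2] = 255                        # one saturated highlight pixel
M = highlight_mask(img, 200)
assert M.sum() == 1
assert fill_highlights(img, M)[2, 2] == 100
```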
Step two: train the RTGAN coloring network on a video stream acquired while the endoscope imaging mode is color.
First, split the collected color video stream into frames and apply the preprocessing operations of step one, including exposure frame deletion and highlight repair. Then use the preprocessed color frames together with their corresponding black-and-white frames as input to train the RTGAN model until training reaches the stop condition.
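Building the (black-and-white input, color target) training pairs described above can be sketched as follows. The Rec. 601 luma weights used for the grayscale conversion are an assumption; the patent does not specify how the black-and-white counterpart of each color frame is obtained:

```python
import numpy as np

def make_training_pairs(color_frames):
    """Pair each preprocessed color frame with a grayscale version of
    itself, forming (input, target) samples for RTGAN training."""
    pairs = []
    for rgb in color_frames:
        # assumed Rec. 601 luma conversion (R, G, B channel order)
        gray = (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1]
                + 0.114 * rgb[..., 2]).astype(np.uint8)
        pairs.append((gray, rgb))
    return pairs

frames = [np.random.randint(0, 256, (8, 8, 3), np.uint8)]
gray, rgb = make_training_pairs(frames)[0]
assert gray.shape == (8, 8) and rgb.shape == (8, 8, 3)
```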
Step three: the coloring network RTGAN consists of a generator and a discriminator; the generator takes a cascade of residual modules as its backbone and introduces spatial up-sampling learned by strided convolution, as shown in fig. 2.
The generator adopts an encoder-decoder structure of the applicant's own design, composed of three down-sampling layers, five middle layers and three up-sampling layers; up-sampling and down-sampling use strided convolution in place of the traditional pooling operation to improve efficiency and learning capability;
(1) the design of five middle layers in the generator includes:
the network backbone is composed of 5 residual blocks, see fig. 3, each defined as:

Y_out = C(C(Y_in)) + Y_in

where Y_in is the input of the current residual block (the convolution features output by the input layer for the first residual block, and the output of the previous residual block for the remaining four), C(·) is a convolution operation, and Y_out is the output of the residual block;
each residual block contains two 3×3 convolutional layers; a batch normalization (BN) function and a ReLU activation are added after the first convolutional layer, while the BN of the second convolutional layer is placed before the addition, which avoids the BN layer's transformation of the data distribution changing the original input; a ReLU activation follows the addition; the residual blocks are then cascaded to further enhance the color detail of the image, the residual cascade being defined as:

R_out = res_5(C_in)

where res_5(·) denotes the 5-layer residual block cascade operation, C_in is the output of the input layer, and R_out is the output of the residual block cascade.
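The residual block Y_out = C(C(Y_in)) + Y_in and its five-block cascade res_5(·) can be sketched in PyTorch as below. The channel width of 64 is an assumption; the patent does not state the feature dimensions:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions: BN + ReLU after the first, BN placed
    before the addition for the second, ReLU after the addition."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, y_in):
        y = self.relu(self.bn1(self.conv1(y_in)))
        y = self.bn2(self.conv2(y))      # BN before the addition
        return self.relu(y + y_in)       # Y_out = C(C(Y_in)) + Y_in

# five-block cascade: R_out = res_5(C_in)
res5 = nn.Sequential(*[ResidualBlock() for _ in range(5)])
x = torch.randn(2, 64, 16, 16)
assert res5(x).shape == x.shape          # the cascade preserves feature shape
```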
(2) The design of the up-sampling and down-sampling layers includes:
the input of the RTGAN network is 256 × 1 and the output is 256 × 2; at the start of the network the image is padded by boundary reflection so that the input and output of the network have the same size; batch normalization (BN) and ReLU nonlinear activation are added after every convolutional layer of the down-sampling and up-sampling layers; the first down-sampling layer uses a 9 × 9 convolution kernel and the remaining two use 3 × 3 kernels; referring to fig. 2, the output features of each convolutional layer are:

x_i = H_i(x_{i−1}),  i = 1, 2, 3

where H_i(·) denotes the Conv-ReLU operation, i indexes the convolutional layer, and x_i denotes the output features of the i-th convolutional layer.
In the up-sampling stage, the network of the invention uses two deconvolution layers, and the output layer uses a tanh activation to keep the output pixels within the range [0, 255].
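The sampling path above can be sketched as a PyTorch shell: reflection padding, a 9×9 first layer followed by two 3×3 strided down-sampling layers, two deconvolution layers, and a tanh output. The strides, channel widths and the 2-channel output (read here as chrominance channels, to be rescaled to [0, 255]) are assumptions, and the five residual middle layers are elided:

```python
import torch
import torch.nn as nn

def cbr(cin, cout, k, s, pad):
    """Conv + batch norm + ReLU, as used in every sampling layer."""
    return nn.Sequential(nn.Conv2d(cin, cout, k, s, pad),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

generator_shell = nn.Sequential(
    nn.ReflectionPad2d(4),                       # boundary-reflection padding
    cbr(1, 32, 9, 1, 0),                         # 9x9 first down-sampling layer
    cbr(32, 64, 3, 2, 1),                        # 3x3 strided down-sampling
    cbr(64, 128, 3, 2, 1),
    # ... the five residual middle layers would sit here ...
    nn.ConvTranspose2d(128, 64, 3, 2, 1, output_padding=1),
    nn.BatchNorm2d(64), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 32, 3, 2, 1, output_padding=1),
    nn.BatchNorm2d(32), nn.ReLU(inplace=True),
    nn.Conv2d(32, 2, 1),                         # assumed 2-channel output
    nn.Tanh())                                   # bounded output layer

x = torch.randn(1, 1, 64, 64)
assert generator_shell(x).shape == (1, 2, 64, 64)  # same spatial size as input
```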
(3) The discriminator in the RTGAN network adopts a resnet18 network, and the loss function adopts BCELoss.
The discriminator loss uses BCELoss, i.e. the two-class cross entropy, and the generator is updated with the returned difference; the discriminator loss is the mean of the BCE on generated samples and the BCE against the ground truth, with the specific formula:

l_n = -w_n [ y_n · log x_n + (1 − y_n) · log(1 − x_n) ]

where x = (x_1, …, x_n) denotes the model output; the vector x represents a batch, and n is the batch_size; l_n is the loss of each sample in the batch; w_n assigns different weights to different samples. For one batch, the following loss vector is obtained:

l(x, y) = L = (l_1, …, l_N)^T

and finally the mean of the loss vector is computed to obtain the final loss value.
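The per-sample term l_n and the final batch mean can be checked numerically as below; uniform weights w_n = 1 are assumed when none are given:

```python
import numpy as np

def bce_loss(x, y, w=None):
    """l_n = -w_n * [y_n*log(x_n) + (1-y_n)*log(1-x_n)], then the
    mean over the batch gives the final loss value."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if w is None else np.asarray(w, float)
    l = -w * (y * np.log(x) + (1 - y) * np.log(1 - x))
    return l.mean()

# a prediction of 0.5 against either label costs exactly log(2)
assert abs(bce_loss([0.5], [1.0]) - np.log(2)) < 1e-12
```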
Step four: color the black-and-white image frames preprocessed in step one in real time with the generator of the trained RTGAN, and synthesize the colored frames into a normal surgical video stream; the result is shown in fig. 4.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (3)
1. An endoscopic exposure frame repair method based on an adversarial neural network RTGAN, characterized by comprising the following steps:
Step one: switch the endoscope imaging mode to black-and-white, acquire the endoscope video stream under black-and-white light, delete exposure frames using the inter-frame histogram difference, and repair highlight points with the fast marching algorithm;
Step two: train the RTGAN coloring network on a video stream acquired while the endoscope imaging mode is color:
first, split the collected color video stream into frames and apply the preprocessing operations of step one, including exposure frame deletion and highlight repair; then use the preprocessed color frames together with their corresponding black-and-white frames as input to train the RTGAN model until training reaches the stop condition;
Step three: the coloring network RTGAN consists of a generator and a discriminator; the generator takes a cascade of residual modules as its backbone and introduces spatial up-sampling learned by strided convolution;
Step four: color the black-and-white image frames preprocessed in step one in real time with the generator of the trained RTGAN, and synthesize the colored frames into a normal surgical video stream.
2. The endoscopic exposure frame repair method based on the adversarial neural network RTGAN according to claim 1, characterized in that: the image preprocessing in step one comprises deleting exposure frames and repairing highlight points; the specific method is as follows:
(1) the steps of deleting exposure frames by inter-frame histogram difference are:
in the black-and-white image sequence, take the frame immediately preceding the start of exposure as a normal frame and compute its brightness mean T, where T is the exposure threshold; then compute the brightness means X1, X2, X3, …, Xn of the remaining frames in the sequence; if the mean Xi of the i-th frame is greater than T, the algorithm judges that frame to be overexposed and discards it, filling its slot with a copy of the previous frame; the start of exposure can be determined either from a physical signal or by an algorithm;
(2) the steps of repairing highlight points with the fast marching algorithm are:
set the highlight threshold to S and initialize a mask zero matrix M of the same size as the video frame; using S as the threshold, classify each pixel in the image by brightness, setting the value at the corresponding position in M to 1 for pixels greater than S and to 0 otherwise; the mask M obtained by this threshold segmentation marks, at positions of value 1, the highlight regions that need repair;
let p be the pixel to be repaired, u(i) the gray value of pixel i, and B_ε(p) the neighborhood of size ε around p. The contribution u_q(p) of a pixel q ∈ B_ε(p) to p is defined as:

u_q(p) = u(q) + ∇u(q) · (p − q)

where ∇u(q) is the gray-value gradient at q. The gray value of the pixel p to be repaired is then computed as:

u(p) = Σ_{q ∈ B_ε(p)} w(p, q) u_q(p) / Σ_{q ∈ B_ε(p)} w(p, q)

where B_ε(p) is the 8-neighborhood of p, and the weight w(p, q) is obtained from:

w(p, q) = dir(p, q) · dst(p, q) · lev(p, q)

where w(·,·) is a weight function limiting the contribution of each pixel in the neighborhood; dir(·,·) is a direction factor, dst(·,·) a geometric distance factor, and lev(·,·) a level-set distance factor.
3. The endoscopic exposure frame repair method based on the adversarial neural network RTGAN according to claim 1, characterized in that: the design of the generator in the RTGAN network in step three is as follows:
the generator adopts an encoder-decoder structure composed of three down-sampling layers, five middle layers and three up-sampling layers, with strided convolution used for both up-sampling and down-sampling;
(1) the design of five middle layers in the generator includes:
the network backbone is composed of 5 residual blocks, each defined as:

Y_out = C(C(Y_in)) + Y_in

where Y_in is the input of the current residual block (the convolution features output by the input layer for the first residual block, and the output of the previous residual block for the remaining four), C(·) is a convolution operation, and Y_out is the output of the residual block;
each residual block contains two 3×3 convolutional layers; a batch normalization (BN) function and a ReLU activation are added after the first convolutional layer, while the BN of the second convolutional layer is placed before the addition, which avoids the BN layer's transformation of the data distribution changing the original input; a ReLU activation follows the addition; the residual blocks are then cascaded to further enhance the color detail of the image, the residual cascade being defined as:

R_out = res_5(C_in)

where res_5(·) denotes the 5-layer residual block cascade operation, C_in is the output of the input layer, and R_out is the output of the residual block cascade;
(2) the design of the up-sampling and down-sampling layers includes:
tensor filling is carried out on the image by utilizing boundary reflection at the initial position of the network, so that the input and the output of the network have the same size; adding batch processing normalization BN and ReLu nonlinear activation functions after all convolution layers of a lower sampling layer and an upper sampling layer; the first layer of the downsampling layer uses a convolution kernel of 9 × 9, the other two layers of downsampling use convolution kernels of 3 × 3, and the output characteristic of each layer of the convolution layer is expressed as:
x i =H i (x i-1 )i=1,2,3
wherein H i (. cno.) represents the Conv-ReLu operation; i represents the ith convolutional layer; x is a radical of a fluorine atom i Representing an output characteristic of the ith convolutional layer;
in the up-sampling layer, the network uses two deconvolution layers, and the output layer uses a tanh activation function to ensure that the output pixels are in the range of [0, 255];
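The sampling path above (reflection padding at the network entry, a 9×9 first conv followed by two 3×3 convs with Conv-BN-ReLU, two deconvolutions, and a tanh output) can be sketched as below; the channel widths, strides, and the final rescaling of tanh's [-1, 1] output to [0, 255] are assumptions filled in for illustration:

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin: int, cout: int, k: int, s: int, p: int) -> nn.Sequential:
    """Conv-BN-ReLU unit: BN and ReLU follow every sampling-layer conv."""
    return nn.Sequential(nn.Conv2d(cin, cout, k, s, p),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class SamplingLayers(nn.Module):
    """Sketch of the claimed down/up-sampling path. Boundary-reflection
    padding at the start absorbs the 9x9 kernel so input and output
    keep the same spatial size."""
    def __init__(self):
        super().__init__()
        self.pad = nn.ReflectionPad2d(4)              # tensor filling by boundary reflection
        self.down = nn.Sequential(
            conv_bn_relu(3, 32, 9, 1, 0),             # H1: 9x9 kernel
            conv_bn_relu(32, 64, 3, 2, 1),            # H2: 3x3 kernel, stride 2
            conv_bn_relu(64, 128, 3, 2, 1))           # H3: 3x3 kernel, stride 2
        self.up = nn.Sequential(                      # two deconvolution layers
            nn.ConvTranspose2d(128, 64, 3, 2, 1, output_padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 3, 2, 1, output_padding=1),
            nn.Tanh())                                # bounded output, rescaled below

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.up(self.down(self.pad(x)))           # x_i = H_i(x_{i-1})
        return (y + 1.0) * 127.5                      # map tanh's [-1,1] to [0,255]
```

With a 64×64 input, the reflection pad (+8) exactly cancels the 9×9 valid convolution (-8), and the two stride-2 convs are undone by the two deconvolutions, so the output matches the input size.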
(3) the discriminator in the RTGAN network adopts a ResNet-18 network, and the loss function adopts BCELoss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210387487.8A CN114897783A (en) | 2022-04-14 | 2022-04-14 | Endoscopic exposure frame repair method based on anti-neural network RTGAN |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210387487.8A CN114897783A (en) | 2022-04-14 | 2022-04-14 | Endoscopic exposure frame repair method based on anti-neural network RTGAN |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114897783A true CN114897783A (en) | 2022-08-12 |
Family
ID=82716548
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210387487.8A Pending CN114897783A (en) | 2022-04-14 | 2022-04-14 | Endoscopic exposure frame repair method based on anti-neural network RTGAN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114897783A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116823973A (en) * | 2023-08-25 | 2023-09-29 | 湖南快乐阳光互动娱乐传媒有限公司 | Black-white video coloring method, black-white video coloring device and computer readable medium |
CN116823973B (en) * | 2023-08-25 | 2023-11-21 | 湖南快乐阳光互动娱乐传媒有限公司 | Black-white video coloring method, black-white video coloring device and computer readable medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110728633B (en) | Multi-exposure high-dynamic-range inverse tone mapping model construction method and device | |
CN111292264A (en) | Image high dynamic range reconstruction method based on deep learning | |
WO2022000397A1 (en) | Low-illumination image enhancement method and apparatus, and computer device | |
CN105894484A (en) | HDR reconstructing algorithm based on histogram normalization and superpixel segmentation | |
CN115223004A (en) | Method for generating confrontation network image enhancement based on improved multi-scale fusion | |
CN113096029A (en) | High dynamic range image generation method based on multi-branch codec neural network | |
CN113344773B (en) | Single picture reconstruction HDR method based on multi-level dual feedback | |
CN114463218B (en) | Video deblurring method based on event data driving | |
CN110717868A (en) | Video high dynamic range inverse tone mapping model construction and mapping method and device | |
CN113284061B (en) | Underwater image enhancement method based on gradient network | |
CN115393227B (en) | Low-light full-color video image self-adaptive enhancement method and system based on deep learning | |
WO2023066173A1 (en) | Image processing method and apparatus, and storage medium and electronic device | |
CN114187203A (en) | Attention-optimized deep codec defogging generation countermeasure network | |
CN116456183B (en) | High dynamic range video generation method and system under guidance of event camera | |
CN110910319A (en) | Operation video real-time defogging enhancement method based on atmospheric scattering model | |
CN114004766A (en) | Underwater image enhancement method, system and equipment | |
CN114897783A (en) | Endoscopic exposure frame repair method based on anti-neural network RTGAN | |
CN115775376A (en) | Crowd counting method based on low-light image enhancement | |
KS et al. | Deep multi-stage learning for hdr with large object motions | |
US11783454B2 (en) | Saliency map generation method and image processing system using the same | |
CN117197627B (en) | Multi-mode image fusion method based on high-order degradation model | |
CN111553856B (en) | Image defogging method based on depth estimation assistance | |
CN116883565A (en) | Digital twin scene implicit and explicit model fusion rendering method and application | |
Liu et al. | Non-homogeneous haze data synthesis based real-world image dehazing with enhancement-and-restoration fused CNNs | |
Hong et al. | MARS-GAN: Multilevel-Feature-Learning Attention-Aware Based Generative Adversarial Network for Removing Surgical Smoke |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||