WO2023025245A1 - Video image processing method, network training method, electronic device, and computer-readable storage medium - Google Patents

Video image processing method, network training method, electronic device, and computer-readable storage medium

Info

Publication number
WO2023025245A1
Authority
WO
WIPO (PCT)
Prior art keywords: image, network, loss, current image, calculating
Prior art date
Application number
PCT/CN2022/114827
Other languages
English (en)
French (fr)
Inventor
宋剑军
徐科
孔德辉
易自尧
杨维
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023025245A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/207 - Analysis of motion for motion estimation over a hierarchy of resolutions

Definitions

  • The embodiments of the present application relate to the field of image processing, and in particular to a video image processing method, a network training method, an electronic device, and a computer-readable storage medium.
  • Video resolutions have progressed from SD and HD through UHD to 4K/8K, and frame rates from 30 frames per second to 60, 90 and 120 frames per second, so the amount of information contained in video keeps growing. This inevitably puts great pressure on network bandwidth, and improving the quality of video images is becoming more and more important.
  • One way to improve the quality of video images is to continuously increase the transmission bit rate.
  • Another method is to perform super-resolution (SR, Super Resolution) processing before the video image is displayed. Obviously, the transmission bit rate cannot be increased indefinitely, whereas SR processing can be adjusted continually according to the scene.
  • SR processing refers to improving the resolution of original video images by means of hardware or software, that is, the process of obtaining high-resolution video images by processing a series of low-resolution video images.
  • The core idea of SR processing is to trade temporal bandwidth (that is, one or more frame sequences acquired of the same scene) for spatial resolution, realizing a conversion from temporal resolution to spatial resolution.
  • However, current SR processing may get stuck in a local-feature optimum and thus ignore correlations at the overall feature level.
  • In a first aspect, an embodiment of the present application provides a video image processing method, including: using a first capsule network to perform feature extraction on the current image and on N frames of reference images adjacent to the current image, to obtain the feature vector of the current image and the feature vector of each frame of reference image, N being an integer greater than or equal to 1; for each frame of reference image, using a first attention network to perform correlation processing on the feature vector of the current image and the feature vector of the reference image to obtain a first correlation vector between them; using a first motion estimation network to perform motion estimation processing on the first correlation vector to obtain first inter-frame motion information; performing image transformation on the reference image according to the first inter-frame motion information to obtain an image-transformed reference image; using a first motion compensation network to fuse the current image with all image-transformed reference images to obtain a first fused image; and using a super-resolution network to perform super-resolution processing on the first fused image to obtain a super-resolution target image.
  • In a second aspect, an embodiment of the present application provides a video image processing method, including: for each frame of reference image adjacent to the current image, using a third motion estimation network to perform motion estimation processing on the current image and the reference image to obtain second inter-frame motion information; performing image transformation on the reference image according to the second inter-frame motion information to obtain an image-transformed reference image; using a second capsule network to perform feature extraction on the current image and all image-transformed reference images to obtain the feature vector of the current image and the feature vector of each frame of image-transformed reference image; for each frame of image-transformed reference image, using a second attention network to perform correlation processing on the feature vector of the current image and the feature vector of the image-transformed reference image to obtain a fifth correlation vector between them; using a second motion compensation network to fuse the current image with all image-transformed reference images according to all the fifth correlation vectors to obtain a second fused image; and using a super-resolution network to perform super-resolution processing on the second fused image to obtain a super-resolution target image.
  • An embodiment of the present application provides a network training method, including: using the video image processing method of the first aspect above to process the current image and N frames of reference images adjacent to the current image to obtain a super-resolution target image, N being an integer greater than or equal to 1; calculating the L2 loss according to the target image and the corresponding real image, calculating the first information entropy loss, calculating the first reconstruction loss of the first capsule network, and calculating a first total loss according to the L2 loss, the first information entropy loss and the first reconstruction loss; and updating, according to the first total loss, all parameters that need to be trained in the first capsule network, the first attention network, the first motion estimation network, the first motion compensation network and the super-resolution network, and continuing to execute the step of using the video image processing method of the first aspect above to obtain a super-resolution target image, until the first total loss is less than or equal to a first preset threshold.
  • An embodiment of the present application provides a network training method, including: based on the trained first capsule network, using the video image processing method of the first aspect above to process the current image and N frames of reference images adjacent to the current image to obtain N frames of image-transformed reference images, N being an integer greater than or equal to 1; calculating the first reconstruction loss of the first capsule network, calculating the second information entropy loss, and calculating a second total loss according to the first reconstruction loss and the second information entropy loss; updating, according to the second total loss, all parameters that need to be trained in the first capsule network, the first attention network and the first motion estimation network, and continuing to execute the step of obtaining N frames of image-transformed reference images until the second total loss is less than or equal to a second preset threshold; then using the video image processing method of the first aspect above to process the current image and N frames of reference images adjacent to the current image to obtain a super-resolution target image; calculating the L2 loss according to the target image and the corresponding real image, calculating the first information entropy loss, calculating the first reconstruction loss of the first capsule network, and calculating a first total loss; and updating, according to the first total loss, all parameters that need to be trained in the first capsule network, the first attention network, the first motion estimation network, the first motion compensation network and the super-resolution network, continuing to execute the step of obtaining the super-resolution target image until the first total loss is less than or equal to a first preset threshold.
  • An embodiment of the present application provides a network training method, including: using the video image processing method of the second aspect above to process the current image and N frames of reference images adjacent to the current image to obtain a super-resolution target image, N being an integer greater than or equal to 1; calculating the L2 loss according to the target image and the corresponding real image, calculating the first information entropy loss, calculating the second reconstruction loss of the second capsule network, and calculating a third total loss according to the L2 loss, the first information entropy loss and the second reconstruction loss; and updating, according to the third total loss, all parameters that need to be trained in the second capsule network, the second attention network, the third motion estimation network, the second motion compensation network and the super-resolution network, and continuing to execute the step of using the video image processing method of the second aspect above to obtain a super-resolution target image, until the third total loss is less than or equal to a third preset threshold.
  • An embodiment of the present application provides a network training method, including: using the video image processing method of the second aspect above to process the current image and N frames of reference images adjacent to the current image to obtain N frames of image-transformed reference images, N being an integer greater than or equal to 1; calculating the second information entropy loss, updating, according to the second information entropy loss, all parameters that need to be trained in the third motion estimation network, and continuing to execute the step of obtaining the N frames of image-transformed reference images until the second information entropy loss is less than or equal to a fourth preset threshold; and calculating the second information entropy loss, calculating the second reconstruction loss of the second capsule network, and calculating a fourth total loss according to the second information entropy loss and the second reconstruction loss.
  • An embodiment of the present application provides an electronic device, including: at least one processor; and a memory on which at least one computer program is stored, where the at least one computer program, when executed by the at least one processor, implements any one of the above video image processing methods or any one of the above network training methods.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the above video image processing methods or any one of the above network training methods.
  • FIG. 1 is a flowchart of a video image processing method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of image changes during video image super-resolution processing according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of image changes during video image super-resolution processing according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of image changes during video image super-resolution processing according to an embodiment of the present application.
  • FIG. 5 is a flowchart of a video image processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of image changes during video image super-resolution processing according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of image changes during video image super-resolution processing according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of image changes during video image super-resolution processing according to an embodiment of the present application.
  • FIG. 9 is a flowchart of a network training method provided by an embodiment of the present application.
  • FIG. 10 is a flowchart of a network training method provided by an embodiment of the present application.
  • FIG. 11 is a flowchart of a network training method provided by an embodiment of the present application.
  • FIG. 12 is a flowchart of a network training method provided by an embodiment of the present application.
  • FIG. 13 is a block diagram of an electronic device provided by an embodiment of the present application.
  • Video image super-resolution (SR, Super Resolution) processing is divided into two categories, namely video image restoration and video image interpolation.
  • Video image interpolation also includes video image resolution changes and video image frame number changes.
  • Video image resolution changes are, for example, infinite enlargement or reduction
  • video image frame number changes are, for example, frame insertion or frame extraction.
  • video super-resolution processing technology is derived from image super-resolution processing technology, and its purpose is to recover a high-resolution target image (Target Image) from one or more low-resolution reference images (Reference Image).
  • The difference between video super-resolution processing technology and image super-resolution processing technology is also obvious: since video is composed of multiple frames, video super-resolution processing technology usually uses both inter-frame and intra-frame information to repair images.
  • inter-frame information has a great impact on the performance of video super-resolution processing techniques. Proper and full use of inter-frame information can improve the final result of video super-resolution processing techniques.
  • Motion Estimation and Motion Compensation (MEMC, Motion Estimate and Motion Compensation) is a very mainstream method in video super-resolution processing technology.
  • the purpose of motion estimation (ME, Motion Estimate) is to extract inter-frame motion information
  • motion compensation (MC, Motion Compensation) is used to perform inter-frame warping operations based on inter-frame motion information to align them.
  • Motion estimation techniques are implemented by optical flow methods.
  • Optical flow methods calculate the motion between adjacent frames through their correlation and changes in the temporal domain.
  • Motion estimation methods are divided into traditional methods (such as LucasKanade, Druleas, etc.) and deep learning methods (such as FlowNet, FlowNet 2.0, and SpyNet, etc.).
  • Optical flow is the instantaneous speed of the pixel movement of the space moving object on the observation imaging plane.
  • the optical flow method uses the changes of pixels in the image sequence in the time domain and the correlation between adjacent frames to find the corresponding relationship between the previous frame and the current frame, thereby calculating the motion of objects between adjacent frames.
  • Usually, the instantaneous rate of change of gray level at a specific coordinate point on the two-dimensional image plane is defined as the optical flow vector.
  • The optical flow method takes two consecutive frames as input: the image j corresponding to the target frame J, and the adjacent frame of image j, that is, image i.
  • F_{i→j} = ME(i, j) is the optical flow from image i to image j, where ME(·) is the function that calculates the optical flow; its horizontal component is h_{i→j} and its vertical component is v_{i→j}, i.e. F_{i→j} = (h_{i→j}, v_{i→j}).
  • I_{i→j} = MC(i, F_{i→j}) is image i after image transformation, where MC(·) is the motion compensation function.
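  • The warping that MC(·) performs can be illustrated with a short sketch. The snippet below is a minimal example assuming a PyTorch environment (the embodiments do not name a framework); the function name backward_warp and the choice of bilinear sampling are illustrative assumptions rather than the patented implementation.

```python
import torch
import torch.nn.functional as F

def backward_warp(image_i, flow_i_to_j):
    """Warp image i toward image j using the optical flow F_{i->j}.

    image_i:      (B, C, H, W) tensor, the adjacent frame.
    flow_i_to_j:  (B, 2, H, W) tensor; channel 0 is the horizontal
                  component h_{i->j}, channel 1 the vertical component v_{i->j}.
    """
    b, _, h, w = image_i.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(image_i.device)   # (2, H, W)
    coords = base.unsqueeze(0) + flow_i_to_j                         # displaced positions
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)                  # (B, H, W, 2)
    # Bilinear sampling realises the inter-frame warp I_{i->j} = MC(i, F_{i->j}).
    return F.grid_sample(image_i, grid, mode="bilinear", align_corners=True)
```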
  • the optical flow method can improve the single-frame quality of the video.
  • the generated pixels can maintain continuity in time, so that the temporally matched pixels can be played coherently.
  • Convolutional neural networks (CNN, Convolutional Neural Networks) can learn relatively global context information and use this context to make predictions. However, because a CNN is locally connected and parameter-shared and does not consider the correlation and mutual positional relationship between features, it lacks hierarchical structure information about each feature.
  • Since a CNN does not take the correlation and structure between these features into account, it may fall into a local-feature optimum and ignore correlations at the overall feature level.
  • FIG. 1 is a flowchart of a video image processing method provided by an embodiment of the present application.
  • an embodiment of the present application provides a video image processing method, including steps 100 to 102 .
  • Step 100: use the first capsule network cap-net1 to perform feature extraction on the current image and on the N frames of reference images adjacent to the current image, to obtain the feature vector of the current image and the feature vector of each frame of reference image; N is an integer greater than or equal to 1.
  • The N adjacent frames of reference images are the N frames that neighbor the current image in time: for example, the N frames preceding the current image, the N frames following it, or the M frames preceding it together with the (N-M) frames following it, where M is an integer greater than or equal to 1 and less than or equal to N; each of these reference images has a corresponding feature vector.
  • Fig. 2 shows a schematic diagram of image changes in the process of video image super-resolution processing by taking N as 1 as an example.
  • The first capsule network cap-net1 used for feature extraction of the current image and the first capsule network cap-net1 used for feature extraction of the reference images may belong to the same capsule network or to different capsule networks; likewise, the first capsule networks cap-net1 used for feature extraction of different reference images may belong to the same capsule network or to different capsule networks.
  • In some embodiments, the first capsule network cap-net1 used for feature extraction of the current image and the one used for feature extraction of the reference images belong to the same capsule network, and the first capsule networks cap-net1 used for feature extraction of different reference images also belong to the same capsule network.
  • Performing feature extraction on the current image and the N adjacent frames of reference images to obtain the feature vector of the current image and the feature vector of each frame of reference image means that the current image and the N adjacent frames of reference images are input to the first capsule network cap-net1 in sequence to obtain the corresponding feature vectors; that is, the first capsule network cap-net1 processes only one frame of image at a time.
  • The first capsule network cap-net1 includes at least one of the following: a convolutional layer, a primary capsule layer, or a digit capsule layer. The primary capsule layer is also called the bottom capsule layer, and the digit capsule layer is also called the high-level capsule layer.
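  • As a rough illustration of the convolutional layer / primary capsule layer structure mentioned above, the sketch below builds a minimal capsule feature extractor in PyTorch. The layer sizes, the squash nonlinearity and the class name CapsuleFeatureNet are illustrative assumptions, and the digit capsule layer with dynamic routing is omitted for brevity; the embodiments do not fix these details.

```python
import torch
import torch.nn as nn

def squash(s, dim=2, eps=1e-8):
    # Standard capsule squashing: keeps the direction, bounds the length to [0, 1).
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

class CapsuleFeatureNet(nn.Module):
    """Minimal cap-net sketch: convolutional layer followed by a primary capsule layer."""

    def __init__(self, in_ch=3, caps_dim=8, num_caps=32):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 64, kernel_size=3, padding=1)
        self.primary = nn.Conv2d(64, num_caps * caps_dim, kernel_size=3, padding=1)
        self.caps_dim = caps_dim

    def forward(self, x):
        x = torch.relu(self.conv(x))
        p = self.primary(x)                       # (B, num_caps*caps_dim, H, W)
        b, _, h, w = p.shape
        p = p.view(b, -1, self.caps_dim, h, w)    # split channels into capsule vectors
        return squash(p)                          # per-location capsule feature vectors
```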
  • Step 101: for each frame of reference image, use the first attention network att-net1 to perform correlation processing on the feature vector of the current image and the feature vector of the reference image, to obtain a first correlation vector between the feature vector of the current image and the feature vector of the reference image; use the first motion estimation network ME-net1 to perform motion estimation processing on the first correlation vector to obtain first inter-frame motion information; and perform image transformation (warp) on the reference image according to the first inter-frame motion information to obtain an image-transformed reference image.
  • the first attention network att-net1 is constructed using at least one of a channel attention mechanism, a spatial attention mechanism, and the like.
  • the first attention network att-net1 implements correlation processing using dot product calculation.
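  • A dot-product correlation of the kind att-net1 is described as performing can be sketched as follows. This is a minimal example under the assumption that the two feature maps share the same spatial size; the element-wise product with an optional channel sum is only one plausible reading of the dot product calculation.

```python
import torch

def dot_product_correlation(feat_cur, feat_ref):
    """Per-location correlation of two feature maps of shape (B, C, H, W)."""
    # Element-wise product keeps a per-channel correlation volume;
    # summing over channels gives a per-pixel similarity map.
    corr_volume = feat_cur * feat_ref                   # (B, C, H, W)
    corr_map = corr_volume.sum(dim=1, keepdim=True)     # (B, 1, H, W)
    return corr_volume, corr_map
```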
  • In this way, each frame of reference image has a corresponding first correlation vector between its feature vector and the feature vector of the current image.
  • The first motion estimation network ME-net1 is implemented using at least one of the following: general-purpose networks such as Res-net, traditional methods such as LucasKanade or Druleas, or optical flow methods such as FlowNet, FlowNet 2.0 and SpyNet.
  • In some embodiments, the first motion estimation network ME-net1 directly performs motion estimation processing on the first correlation vector to obtain the first inter-frame motion information, as shown in FIG. 2. Alternatively, a second motion estimation network ME-net2 performs feature extraction and correlation processing on the current image and the reference image to obtain a second correlation vector between them; a dot product of the first correlation vector and the second correlation vector yields a third correlation vector; and the first motion estimation network ME-net1 performs motion estimation processing on the third correlation vector to obtain the first inter-frame motion information, as shown in FIG. 3. Alternatively, a dot product of the first correlation vector with the current image yields a new current image, and a dot product of the first correlation vector with the reference image yields a new reference image; the second motion estimation network ME-net2 performs feature extraction and correlation processing on the new current image and the new reference image to obtain a fourth correlation vector between them; and the first motion estimation network ME-net1 performs motion estimation processing on the fourth correlation vector to obtain the first inter-frame motion information, as shown in FIG. 4.
  • In each of these cases, every frame of reference image has corresponding first inter-frame motion information, a corresponding image-transformed reference image and, where the second motion estimation network ME-net2 is used, a corresponding second correlation vector.
  • Step 102: use the first motion compensation network MC-net1 to fuse the current image with all image-transformed reference images to obtain a first fused image, and use the super-resolution network P-net to perform super-resolution processing on the first fused image to obtain a super-resolution target image.
  • the first motion compensation network MC-net1 is at least one of a convolutional neural network Cnn-net or a recurrent neural network (Recurrent Neural Network, Rnn-net).
  • super-resolution processing includes at least one of resolution scaling, frame interpolation, or enhancement.
  • Resolution scaling refers to upsampling or downsampling an image, frame interpolation refers to inserting interpolated frames, and enhancement refers to image restoration.
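  • As a concrete illustration of resolution scaling, the fragment below shows a minimal bilinear upsampling step in PyTorch as one possible realisation of the output stage of the super-resolution network; the scale factor of 2 is an arbitrary example and not prescribed by the embodiments.

```python
import torch.nn.functional as F

def upscale(fused_image, scale=2):
    """Bilinear resolution scaling of a (B, C, H, W) fused image."""
    return F.interpolate(fused_image, scale_factor=scale,
                         mode="bilinear", align_corners=False)
```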
  • In some embodiments, using the first motion compensation network MC-net1 to fuse the current image with all image-transformed reference images to obtain the first fused image includes: for each frame of reference image, determining the weight of the reference image according to its first correlation vector; and using the first motion compensation network MC-net1 to fuse the current image with all image-transformed reference images according to the weights of all reference images, to obtain the first fused image.
  • In some embodiments, the weight of the reference image is the average value of the first correlation vector. For example, if the current image and the reference image are both 64 × 64 × 3 tensors, the first correlation vector is a 64 × 64 × C tensor, and the weight of the reference image is 64 × 64 × 1, that is, the first correlation vector averaged over the dimension corresponding to C.
  • Multiple methods can be used to fuse the current image with all image-transformed reference images according to the weights of all reference images to obtain the first fused image. For example, with the weight of each reference image as the coefficient of its image-transformed reference image, the current image and all image-transformed reference images are weighted-averaged to obtain the first fused image; or, corresponding features are first extracted from the current image and from all image-transformed reference images, the weight of each reference image is taken as the coefficient of the features of its image-transformed reference image, and the features of the current image and the features of all image-transformed reference images are weighted-averaged to obtain the first fused image.
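  • The weighting scheme above (weight = channel-wise mean of the correlation vector, fusion = weighted average) can be sketched as below. Normalising the weights with a softmax over frames is an assumption made only so that the example is well defined; the embodiments state only that a weighted average is taken.

```python
import torch

def fuse_frames(current, warped_refs, corr_vectors):
    """Fuse the current image with image-transformed reference frames.

    current:      (B, 3, H, W) current image
    warped_refs:  list of (B, 3, H, W) image-transformed reference images
    corr_vectors: list of (B, C, H, W) first correlation vectors
    """
    # Weight of each reference image: mean of its correlation vector over C,
    # giving a (B, 1, H, W) map (the 64 x 64 x 1 example in the text).
    weights = [cv.mean(dim=1, keepdim=True) for cv in corr_vectors]
    # The current image participates with a unit weight in this sketch.
    stack_imgs = torch.stack([current] + warped_refs, dim=0)
    stack_w = torch.stack([torch.ones_like(weights[0])] + weights, dim=0)
    stack_w = torch.softmax(stack_w, dim=0)       # normalise across frames
    return (stack_imgs * stack_w).sum(dim=0)      # weighted average
```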
  • The video image processing method provided by the embodiments of the present application uses the capsule network to obtain the feature vectors of the corresponding images and combines them with the attention network to improve the alignment of features in the motion estimation network, thereby avoiding getting stuck in a local-feature optimum and taking correlations at the overall feature level into account.
  • FIG. 5 is a flowchart of a video image processing method provided by an embodiment of the present application.
  • the embodiment of the present application provides a video image processing method, including steps 500 to 503 .
  • Step 500: for each frame of reference image adjacent to the current image, use the third motion estimation network ME-net3 to perform motion estimation processing on the current image and the reference image to obtain second inter-frame motion information, and perform image transformation (warp) on the reference image according to the second inter-frame motion information to obtain an image-transformed reference image.
  • The third motion estimation network ME-net3 is equivalent to the superposition of the first motion estimation network ME-net1 and the second motion estimation network ME-net2 in the video image processing method described above. That is, the third motion estimation network ME-net3 first performs feature extraction and correlation processing on the current image and the reference image, and then performs motion estimation processing to obtain the second inter-frame motion information.
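  • ME-net3, as described, takes the current image and a reference image, implicitly extracts and correlates their features, and outputs inter-frame motion. A minimal two-frame flow estimator of that shape is sketched below; the three-layer architecture and the class name TinyFlowNet are arbitrary assumptions, not the patented design.

```python
import torch
import torch.nn as nn

class TinyFlowNet(nn.Module):
    """Minimal ME-net3-style sketch: two frames in, a 2-channel flow field out."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),   # feature extraction
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),  # implicit correlation
            nn.Conv2d(32, 2, 3, padding=1),              # horizontal + vertical motion
        )

    def forward(self, current, reference):
        # Concatenate the two frames along the channel dimension and predict flow.
        return self.net(torch.cat([current, reference], dim=1))
```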
  • There are N frames of reference images adjacent to the current image, namely the N frames that neighbor the current image in time: for example, the N frames preceding the current image, the N frames following it, or the M frames preceding it together with the (N-M) frames following it, where M is an integer greater than or equal to 1 and less than or equal to N. Each frame of reference image has corresponding second inter-frame motion information and a corresponding image-transformed reference image.
  • Fig. 6 shows a schematic diagram of image changes during video image super-resolution processing by taking N as 1 as an example.
  • Step 501: use the second capsule network cap-net2 to perform feature extraction on the current image and on all image-transformed reference images, to obtain the feature vector of the current image and the feature vector of each frame of image-transformed reference image.
  • The second capsule network cap-net2 used for feature extraction of the current image and the second capsule network cap-net2 used for feature extraction of the image-transformed reference images may belong to the same capsule network or to different capsule networks; likewise, the second capsule networks cap-net2 used for feature extraction of different image-transformed reference images may belong to the same capsule network or to different capsule networks.
  • In some embodiments, the second capsule network cap-net2 used for feature extraction of the current image and the one used for feature extraction of the image-transformed reference images belong to the same capsule network, and the second capsule networks cap-net2 used for feature extraction of different image-transformed reference images also belong to the same capsule network.
  • Obtaining the feature vector of the current image and the feature vector of each frame of image-transformed reference image means that the current image and all image-transformed reference images are input to the second capsule network cap-net2 in sequence to obtain the corresponding feature vectors; that is, the second capsule network cap-net2 processes only one frame of image at a time.
  • The second capsule network cap-net2 includes at least one of the following: a convolutional layer, a primary capsule layer, or a digit capsule layer. The primary capsule layer is also called the bottom capsule layer, and the digit capsule layer is also called the high-level capsule layer.
  • In this way, each frame of image-transformed reference image has a corresponding feature vector.
  • Step 502: for each frame of image-transformed reference image, use the second attention network att-net2 to perform correlation processing on the feature vector of the current image and the feature vector of the image-transformed reference image, to obtain a fifth correlation vector between the feature vector of the current image and the feature vector of the image-transformed reference image.
  • the second attention network att-net2 is constructed by using at least one of a channel attention mechanism, a spatial attention mechanism, and the like.
  • In this way, each frame of image-transformed reference image has a corresponding fifth correlation vector.
  • Step 503: use the second motion compensation network MC-net2 to fuse the current image with all image-transformed reference images according to all the fifth correlation vectors to obtain a second fused image, and use the super-resolution network P-net to perform super-resolution processing on the second fused image to obtain a super-resolution target image.
  • In some embodiments, the second motion compensation network MC-net2 directly fuses the current image with all image-transformed reference images according to all the fifth correlation vectors to obtain the second fused image, as shown in FIG. 6. Alternatively, a third motion compensation network MC-net3 performs feature extraction and correlation processing on the current image and the image-transformed reference image to obtain a sixth correlation vector between them; a dot product of the fifth correlation vector and the sixth correlation vector yields a seventh correlation vector; and the second motion compensation network MC-net2 fuses the current image with all image-transformed reference images according to all the seventh correlation vectors to obtain the second fused image, as shown in FIG. 7. Alternatively, a dot product of the fifth correlation vector with the current image yields a new current image, and a dot product of the fifth correlation vector with the image-transformed reference image yields a new image-transformed reference image; the third motion compensation network MC-net3 performs feature extraction and correlation processing on the new current image and the new image-transformed reference image to obtain an eighth correlation vector between them; and the second motion compensation network MC-net2 fuses the current image with all image-transformed reference images according to all the eighth correlation vectors to obtain the second fused image, as shown in FIG. 8.
  • In some embodiments, fusing the current image with all image-transformed reference images according to all the fifth correlation vectors to obtain the second fused image includes: for each frame of image-transformed reference image, determining the weight of the image-transformed reference image according to its fifth correlation vector; and using the second motion compensation network MC-net2 to fuse the current image with all image-transformed reference images according to the weights of all image-transformed reference images, to obtain the second fused image.
  • In some embodiments, the weight of the image-transformed reference image is the average value of the fifth correlation vector. For example, if the current image and the image-transformed reference image are both 64 × 64 × 3 tensors, the fifth correlation vector is a 64 × 64 × C tensor, and the weight of the image-transformed reference image is 64 × 64 × 1, that is, the fifth correlation vector averaged over the dimension corresponding to C.
  • Multiple methods can be used to fuse the current image with all image-transformed reference images according to the weights of all image-transformed reference images to obtain the second fused image. For example, with the weight of each image-transformed reference image as its coefficient, the current image and all image-transformed reference images are weighted-averaged to obtain the second fused image; or, corresponding features are first extracted from the current image and from all image-transformed reference images, the weights of the image-transformed reference images are taken as the coefficients of their corresponding features, and the features of the current image and the features of all image-transformed reference images are weighted-averaged to obtain the second fused image.
  • Fusing the current image with all image-transformed reference images according to all the seventh correlation vectors to obtain the second fused image, and fusing the current image with all image-transformed reference images according to all the eighth correlation vectors to obtain the second fused image, are implemented in a similar way to the fusion performed according to all the fifth correlation vectors, and are not repeated here.
  • The video image processing method provided by the embodiments of the present application uses the capsule network to obtain the feature vectors of the corresponding images and combines them with the attention network to improve the alignment of features in the motion estimation network, thereby avoiding getting stuck in a local-feature optimum and taking correlations at the overall feature level into account.
  • FIG. 9 is a flowchart of a network training method provided by an embodiment of the present application.
  • the embodiment of the present application provides a network training method, including steps 900 to 902 .
  • Step 900 using the video image processing method in steps 100 to 102 above to process the current image and N frames of reference images adjacent to the current image to obtain a super-resolution target image; N is an integer greater than or equal to 1.
  • Step 901: calculate the L2 loss according to the target image and the corresponding real image, calculate the first information entropy loss, calculate the first reconstruction loss of the first capsule network, and calculate the first total loss according to the L2 loss, the first information entropy loss and the first reconstruction loss.
  • In some embodiments, calculating the L2 loss according to the target image and the corresponding real image includes: calculating the L2 loss by summing, over every row i and column j, the square of the difference between the pixel value at row i and column j of the target image and the pixel value at row i and column j of the real image; Loss_SR is the L2 loss, H is the height of the target image, and W is the width of the target image.
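  • The L2-loss formula itself is not reproduced in this text; from the definitions just given it can be reconstructed, up to an optional normalisation by the number of pixels, roughly as follows, where the superscripts SR and HR are labels introduced here only for readability (target image and real high-resolution image respectively):

```latex
\mathrm{Loss}_{SR} = \sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(I^{SR}_{i,j}-I^{HR}_{i,j}\bigr)^{2}
```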
  • In some embodiments, calculating the first information entropy loss includes any one of the following: calculating the first information entropy loss according to the target image and the real image; calculating it according to the target image and the current image; or calculating it according to the reference image and the image-transformed reference image.
  • Calculating the first information entropy loss according to the target image and the real image includes: calculating the first information entropy loss Loss_in from the information entropy of the target image and the information entropy of the real image.
  • Calculating the first information entropy loss according to the target image and the current image includes: calculating Loss_in from the information entropy of the target image and the information entropy of the current image.
  • Calculating the first information entropy loss according to the reference image and the image-transformed reference image includes: calculating Loss_in from the information entropy of the kth reference image and the information entropy of the kth image-transformed reference image.
  • In the information entropy calculation, the pixel value x_i is distributed in the range 0 to N, P(x_i) is the probability of the pixel value x_i, and the logarithm generally takes 2 as the base.
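  • The entropy formula and the entropy loss built from it are likewise not reproduced here. Under the usual definition implied by the pixel-probability description above, and assuming the loss is the absolute difference of the two entropies (an assumption; the exact combination is not stated in this text), they read approximately as follows, where I_1 and I_2 stand for the two images being compared (target and real image, target and current image, or the kth reference image and its image-transformed version):

```latex
H(X) = -\sum_{x_i=0}^{N} P(x_i)\,\log_{2} P(x_i),
\qquad
\mathrm{Loss}_{in} \approx \bigl|\,H(I_{1}) - H(I_{2})\,\bigr|
```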
  • In some embodiments, calculating the first reconstruction loss of the first capsule network includes any one of the following: calculating the first reconstruction loss according to the current image and the feature vector of the current image; or calculating it according to the kth reference image and the feature vector of the kth reference image.
  • Calculating the first reconstruction loss according to the current image and the feature vector of the current image includes: summing, over every row i and column j, the square of the difference between the pixel value at row i and column j of the current image and the pixel value at row i and column j of the feature vector of the current image; Loss_recon is the first reconstruction loss, H is the height of the current image, and W is the width of the current image.
  • Calculating the first reconstruction loss according to the kth reference image and the feature vector of the kth reference image is done in the same way, with H and W the height and width of the kth reference image.
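  • From the definitions above, the reconstruction loss has the same squared-difference form as the L2 loss; a reconstruction with illustrative symbols (I for the input image and \hat{I} for the image decoded from its capsule feature vector) is roughly:

```latex
\mathrm{Loss}_{recon} = \sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(I_{i,j}-\hat{I}_{i,j}\bigr)^{2}
```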
  • calculating the first total loss according to the L2 loss, the first information entropy loss and the first reconstruction loss includes: adding the L2 loss, the first information entropy loss and the first reconstruction loss to obtain the first total loss .
  • Step 902: update all the parameters that need to be trained in the first capsule network, the first attention network, the first motion estimation network, the first motion compensation network and the super-resolution network according to the first total loss, and continue to execute the above step 900 until the first total loss is less than or equal to a first preset threshold.
  • In some embodiments, all parameters to be trained in the first capsule network, the first attention network, the first motion estimation network, the first motion compensation network and the super-resolution network are updated according to the first total loss. In some other embodiments, all parameters to be trained in the first capsule network, the first attention network, the first motion estimation network, the second motion estimation network, the first motion compensation network and the super-resolution network are updated according to the first total loss.
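  • Steps 900 to 902 amount to a standard loss-driven optimisation loop over all trainable parameters of the listed networks. The sketch below shows such a loop under the assumption of a PyTorch optimiser and placeholder callables (run_sr_pipeline, l2_loss, entropy_loss, recon_loss) that stand in for the components described above; none of these names come from the embodiments.

```python
import itertools
import torch

def train_stage_one(networks, data_loader, run_sr_pipeline,
                    l2_loss, entropy_loss, recon_loss, threshold=1e-3):
    """Joint training sketch for steps 900 to 902 (first total loss)."""
    params = itertools.chain(*(net.parameters() for net in networks))
    optimizer = torch.optim.Adam(params, lr=1e-4)
    total = float("inf")
    while total > threshold:                        # step 902 stopping criterion
        for current, refs, real in data_loader:     # step 900: run the SR pipeline
            target, cap_outputs = run_sr_pipeline(current, refs)
            # Step 901: first total loss = L2 + information entropy + reconstruction.
            loss = (l2_loss(target, real)
                    + entropy_loss(target, real)
                    + recon_loss(current, cap_outputs))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total = loss.item()
    return networks
```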
  • the network training method provided by the embodiment of the present application calculates the total loss used to update the training parameters based on L2 loss, information entropy loss, and capsule network reconstruction loss.
  • Because of the capsule network reconstruction loss, the image reconstructed from the capsule network output is kept consistent with the image input to the capsule network, which ensures the accuracy of feature extraction by the capsule network.
  • The information entropy loss not only keeps the output target image consistent with the basic features of the current image, i.e. keeps the fluctuation of spatial-domain information small, but also keeps the output target image consistent with the basic features of the reference images, i.e. keeps the fluctuation of temporal-domain information small. As a result, when video image super-resolution is performed with the trained networks, the target image obtained has small fluctuations in spatial-domain and temporal-domain information, which improves the effect of video image processing.
  • FIG. 10 is a flowchart of a network training method provided by an embodiment of the present application.
  • the embodiment of the present application provides a network training method, including steps 1000 to 1005 .
  • Step 1000 based on the trained first capsule network, use the video image processing method in the above steps 100 to 102 to process the current image and N frames of reference images adjacent to the current image to obtain N frames of reference images after transformation; N is an integer greater than or equal to 1.
  • Step 1001. Calculate the first reconstruction loss of the first capsule network, calculate the second information entropy loss, and calculate the second total loss according to the first reconstruction loss and the second information entropy loss.
  • In some embodiments, calculating the second information entropy loss includes: calculating the second information entropy loss according to the reference image and the image-transformed reference image.
  • Calculating the second information entropy loss according to the reference image and the image-transformed reference image includes: calculating the second information entropy loss Loss_in from the information entropy of the kth reference image and the information entropy of the kth image-transformed reference image.
  • In the information entropy calculation, the pixel value x_i is distributed in the range 0 to N, P(x_i) is the probability of the pixel value x_i, and the logarithm generally takes 2 as the base.
  • In some embodiments, calculating the first reconstruction loss of the first capsule network includes any one of the following: calculating the first reconstruction loss according to the current image and the feature vector of the current image; or calculating it according to the kth reference image and the feature vector of the kth reference image.
  • Calculating the first reconstruction loss according to the current image and the feature vector of the current image includes: summing, over every row i and column j, the square of the difference between the pixel value at row i and column j of the current image and the pixel value at row i and column j of the feature vector of the current image; Loss_recon is the first reconstruction loss, H is the height of the current image, and W is the width of the current image.
  • Calculating the first reconstruction loss according to the kth reference image and the feature vector of the kth reference image is done in the same way, with H and W the height and width of the kth reference image.
  • calculating the second total loss according to the first reconstruction loss and the second information entropy loss includes: adding the first reconstruction loss and the second information entropy loss to obtain the second total loss.
  • Step 1002 Update all the parameters that need to be trained in the first capsule network, the first attention network and the first motion estimation network according to the second total loss, and continue to execute the above step 1000 until the second total loss is less than or equal to the second preset threshold.
  • all parameters to be trained in the first capsule network, the first attention network and the first motion estimation network are updated according to the second total loss. In some other embodiments, all parameters to be trained in the first capsule network, the first attention network, the first motion estimation network and the second motion estimation network are updated according to the second total loss.
  • Step 1003 Process the current image and N frames of reference images adjacent to the current image by using the video image processing method in steps 100 to 102 above to obtain a super-resolution target image.
  • Step 1004: calculate the L2 loss according to the target image and the corresponding real image, calculate the first information entropy loss, calculate the first reconstruction loss of the first capsule network, and calculate the first total loss according to the L2 loss, the first information entropy loss and the first reconstruction loss.
  • In some embodiments, calculating the L2 loss according to the target image and the corresponding real image includes: calculating the L2 loss by summing, over every row i and column j, the square of the difference between the pixel value at row i and column j of the target image and the pixel value at row i and column j of the real image; Loss_SR is the L2 loss, H is the height of the target image, and W is the width of the target image.
  • In some embodiments, calculating the first information entropy loss includes any one of the following: calculating the first information entropy loss according to the target image and the real image; calculating it according to the target image and the current image; or calculating it according to the reference image and the image-transformed reference image.
  • Calculating the first information entropy loss according to the target image and the real image includes: calculating the first information entropy loss Loss_in from the information entropy of the target image and the information entropy of the real image.
  • Calculating the first information entropy loss according to the target image and the current image includes: calculating Loss_in from the information entropy of the target image and the information entropy of the current image.
  • Calculating the first information entropy loss according to the reference image and the image-transformed reference image includes: calculating Loss_in from the information entropy of the kth reference image and the information entropy of the kth image-transformed reference image.
  • calculating the first total loss according to the L2 loss, the first information entropy loss and the first reconstruction loss includes: adding the L2 loss, the first information entropy loss and the first reconstruction loss to obtain the first total loss .
  • Step 1005: update all the parameters that need to be trained in the first capsule network, the first attention network, the first motion estimation network, the first motion compensation network and the super-resolution network according to the first total loss, and continue to execute the above step 1003 until the first total loss is less than or equal to a first preset threshold.
  • In some embodiments, all parameters to be trained in the first capsule network, the first attention network, the first motion estimation network, the first motion compensation network and the super-resolution network are updated according to the first total loss. In some other embodiments, all parameters to be trained in the first capsule network, the first attention network, the first motion estimation network, the second motion estimation network, the first motion compensation network and the super-resolution network are updated according to the first total loss.
  • the network training method provided in the embodiment of the present application performs stage-by-stage training on the network in the video image processing method, which further improves the training effect.
  • FIG. 11 is a flow chart of a network training method provided by an embodiment of the present application.
  • the embodiment of the present application provides a network training method, including steps 1100 to 1102 .
  • Step 1100 using the video image processing method in steps 500 to 503 above to process the current image and N frames of reference images adjacent to the current image to obtain a super-resolution target image; N is an integer greater than or equal to 1.
  • Step 1101: calculate the L2 loss according to the target image and the corresponding real image, calculate the first information entropy loss, calculate the second reconstruction loss of the second capsule network, and calculate the third total loss according to the L2 loss, the first information entropy loss and the second reconstruction loss.
  • In some embodiments, calculating the L2 loss according to the target image and the corresponding real image includes: calculating the L2 loss by summing, over every row i and column j, the square of the difference between the pixel value at row i and column j of the target image and the pixel value at row i and column j of the real image; Loss_SR is the L2 loss, H is the height of the target image, and W is the width of the target image.
  • In some embodiments, calculating the first information entropy loss includes any one of the following: calculating the first information entropy loss according to the target image and the real image; calculating it according to the target image and the current image; or calculating it according to the reference image and the image-transformed reference image.
  • Calculating the first information entropy loss according to the target image and the real image includes: calculating the first information entropy loss Loss_in from the information entropy of the target image and the information entropy of the real image.
  • Calculating the first information entropy loss according to the target image and the current image includes: calculating Loss_in from the information entropy of the target image and the information entropy of the current image.
  • Calculating the first information entropy loss according to the reference image and the image-transformed reference image includes: calculating Loss_in from the information entropy of the kth reference image and the information entropy of the kth image-transformed reference image.
  • In the information entropy calculation, the pixel value x_i is distributed in the range 0 to N, P(x_i) is the probability of the pixel value x_i, and the logarithm generally takes 2 as the base.
  • In some embodiments, calculating the second reconstruction loss of the second capsule network includes any one of the following: calculating the second reconstruction loss according to the current image and the feature vector of the current image; or calculating it according to the kth image-transformed reference image and the feature vector of the kth image-transformed reference image.
  • Calculating the second reconstruction loss according to the current image and the feature vector of the current image includes: summing, over every row i and column j, the square of the difference between the pixel value at row i and column j of the current image and the pixel value at row i and column j of the feature vector of the current image; Loss_recon is the second reconstruction loss, H is the height of the current image, and W is the width of the current image.
  • Calculating the second reconstruction loss according to the kth image-transformed reference image and the feature vector of the kth image-transformed reference image is done in the same way, with H and W the height and width of the kth image-transformed reference image.
  • calculating the third total loss according to the L2 loss, the first information entropy loss and the second reconstruction loss includes: adding the L2 loss, the first information entropy loss and the second reconstruction loss to obtain the third total loss .
  • Step 1102: update all the parameters that need to be trained in the second capsule network, the second attention network, the third motion estimation network, the second motion compensation network and the super-resolution network according to the third total loss, and continue to execute the above step 1100 until the third total loss is less than or equal to a third preset threshold.
  • In some embodiments, all parameters to be trained in the second capsule network, the second attention network, the third motion estimation network, the second motion compensation network and the super-resolution network are updated according to the third total loss. In some other embodiments, all parameters to be trained in the second capsule network, the second attention network, the third motion estimation network, the second motion compensation network, the third motion compensation network and the super-resolution network are updated according to the third total loss.
  • The network training method provided by the embodiments of the present application calculates the total loss used to update the training parameters based on the L2 loss, the information entropy loss, and the reconstruction loss of the capsule network. The capsule network reconstruction loss guarantees consistency between the image reconstructed from the capsule network's output and the image input to the capsule network, which ensures the accuracy of the feature extraction performed by the capsule network. The information entropy loss ensures consistency between the basic features of the output target image and those of the current image (i.e., the spatial-domain information fluctuates little) and also ensures consistency between the basic features of the output target image and those of the reference images (i.e., the temporal-domain information fluctuates little). Therefore, when video image super-resolution is performed with the trained networks, a target image with small fluctuations in spatial-domain and temporal-domain information is obtained, which improves the effect of video image processing.
  • FIG. 12 is a flow chart of a network training method provided by an embodiment of the present application.
  • The embodiment of the present application provides a network training method, including steps 1200 to 1207.
  • Step 1200: use the video image processing method in steps 500 to 503 above to process the current image and the N frames of reference images adjacent to the current image to obtain N frames of image-transformed reference images; N is an integer greater than or equal to 1.
  • Step 1201: calculate the second information entropy loss, update all parameters that need to be trained in the third motion estimation network according to the second information entropy loss, and continue to execute the above step 1200 until the second information entropy loss is less than or equal to the fourth preset threshold.
  • calculating the second information entropy loss includes: calculating the second information entropy loss according to the reference image and the image-transformed reference image.
  • calculating the second information entropy loss according to the reference image and the image-transformed reference image includes: calculating the second information entropy loss according to the formula Loss_in = ||H_ref,k − H_warp,k||, where Loss_in is the second information entropy loss, H_ref,k is the information entropy of the k-th reference image, and H_warp,k is the information entropy of the k-th image-transformed reference image.
  • the information entropy of an image x is calculated as H(x) = −Σ_i P_{x_i} · log₂ P_{x_i}, where the pixel values x_i are distributed in the range from 0 to N, P_{x_i} is the probability of the pixel value x_i (only non-zero probabilities are counted), and the logarithm in the formula generally takes 2 as the base.
  • Step 1202: use the video image processing method in steps 500 to 503 above to process the current image and the N frames of reference images adjacent to the current image to obtain the feature vector of the current image and the feature vector of each frame of image-transformed reference image.
  • Step 1203: calculate the second information entropy loss, calculate the second reconstruction loss of the second capsule network, and calculate the fourth total loss according to the second information entropy loss and the second reconstruction loss.
  • calculating the second reconstruction loss of the second capsule network includes any one of the following: calculating the second reconstruction loss according to the current image and the feature vector of the current image; or calculating the second reconstruction loss according to the k-th image-transformed reference image and the feature vector of the k-th image-transformed reference image.
  • calculating the second reconstruction loss according to the current image and the feature vector of the current image includes: calculating the second reconstruction loss according to the formula Loss_recon = (1/(H·W)) · Σ_{i=1..H} Σ_{j=1..W} ||x_ij − x̂_ij||, where Loss_recon is the second reconstruction loss, H is the height of the current image, W is the width of the current image, x_ij is the pixel value in row i and column j of the current image, and x̂_ij is the pixel value in row i and column j of the feature vector of the current image.
  • calculating the second reconstruction loss according to the k-th image-transformed reference image and its feature vector uses the same formula, with H and W the height and width of the k-th image-transformed reference image, x_ij its pixel value in row i and column j, and x̂_ij the pixel value in row i and column j of its feature vector.
  • calculating the fourth total loss according to the second information entropy loss and the second reconstruction loss includes: adding the second information entropy loss and the second reconstruction loss to obtain the fourth total loss.
  • Step 1204 update the parameters that need to be trained in the third motion estimation network and the second capsule network according to the fourth total loss, and continue to execute the above step 1200 until the fourth total loss is less than or equal to the fifth preset threshold.
  • Step 1205 using the video image processing method in steps 500 to 503 above to process the current image and N frames of reference images adjacent to the current image to obtain a super-resolution target image.
  • Step 1206: calculate the L2 loss according to the target image and the corresponding real image, calculate the first information entropy loss, calculate the second reconstruction loss of the second capsule network, and calculate the third total loss according to the L2 loss, the first information entropy loss and the second reconstruction loss.
  • calculating the L2 loss according to the target image and the corresponding real image includes: calculating the L2 loss according to the formula Loss_SR = (1/(H·W)) · Σ_{i=1..H} Σ_{j=1..W} ||x^SR_ij − x^GT_ij||, where Loss_SR is the L2 loss, H is the height of the target image, W is the width of the target image, x^SR_ij is the pixel value corresponding to row i and column j of the target image, x^GT_ij is the pixel value corresponding to row i and column j of the real image, and || || is the square function.
  • calculating the first information entropy loss includes any one of the following: calculating the first information entropy loss according to the target image and the real image; calculating the first information entropy loss according to the target image and the current image; or calculating the first information entropy loss according to the reference image and the image-transformed reference image.
  • calculating the first information entropy loss according to the target image and the real image includes: calculating the first information entropy loss according to the formula Loss_in = ||H_SR − H_GT||, where Loss_in is the first information entropy loss, H_SR is the information entropy of the target image, and H_GT is the information entropy of the real image.
  • calculating the first information entropy loss according to the target image and the current image includes: calculating the first information entropy loss according to the formula Loss_in = ||H_SR − H_cur||, where H_SR is the information entropy of the target image and H_cur is the information entropy of the current image.
  • calculating the first information entropy loss according to the reference image and the image-transformed reference image includes: calculating the first information entropy loss according to the formula Loss_in = ||H_ref,k − H_warp,k||, where H_ref,k is the information entropy of the k-th reference image and H_warp,k is the information entropy of the k-th image-transformed reference image.
  • the information entropy of an image x is calculated as H(x) = −Σ_i P_{x_i} · log₂ P_{x_i}, where the pixel values x_i are distributed in the range from 0 to N, P_{x_i} is the probability of the pixel value x_i (only non-zero probabilities are counted), and the logarithm in the formula generally takes 2 as the base.
  • calculating the third total loss according to the L2 loss, the first information entropy loss and the second reconstruction loss includes: adding the L2 loss, the first information entropy loss and the second reconstruction loss to obtain the third total loss.
  • Step 1207: update all parameters that need to be trained in the second capsule network, the second attention network, the third motion estimation network, the second motion compensation network and the super-resolution network according to the third total loss, and continue to execute steps 1200 to 1206 until the third total loss is less than or equal to the third preset threshold.
  • In some embodiments, all parameters that need to be trained in the second capsule network, the second attention network, the third motion estimation network, the second motion compensation network and the super-resolution network are updated according to the third total loss. In other embodiments, all parameters that need to be trained in the second capsule network, the second attention network, the third motion estimation network, the second motion compensation network, the third motion compensation network and the super-resolution network are updated according to the third total loss.
  • The network training method provided in the embodiments of the present application performs stage-by-stage training on the networks used in the video image processing method, which further improves the training effect.
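A hedged outline of the staged schedule in steps 1200 to 1207, again with assumed names; the loss callables are built from the formulas above, and each stage unfreezes a different subset of networks:

```python
import torch

def train_until(loss_fn, params, threshold, data_iter, lr=1e-4):
    """Generic helper: update `params` with Adam until loss_fn(batch) <= threshold."""
    opt = torch.optim.Adam(list(params), lr=lr)
    loss_val = float("inf")
    while loss_val > threshold:
        loss = loss_fn(next(data_iter))
        opt.zero_grad()
        loss.backward()
        opt.step()
        loss_val = loss.item()

def train_staged(nets, stage_losses, data_iter, thr4, thr5, thr3):
    """nets: dict with keys 'me3', 'cap2', 'att2', 'mc2', 'sr' (assumed names).
    Stage 1 (step 1201): second entropy loss, ME-net3 only.
    Stage 2 (steps 1203-1204): fourth total loss, ME-net3 and cap-net2.
    Stage 3 (steps 1206-1207): third total loss, all networks."""
    loss1, loss2, loss3 = stage_losses
    train_until(loss1, nets["me3"].parameters(), thr4, data_iter)
    train_until(loss2,
                list(nets["me3"].parameters()) + list(nets["cap2"].parameters()),
                thr5, data_iter)
    all_params = [p for net in nets.values() for p in net.parameters()]
    train_until(loss3, all_params, thr3, data_iter)
```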
  • The embodiment of the present application provides an electronic device 1300.
  • The electronic device 1300 includes: at least one processor 1301; and a memory 1302 storing at least one computer program which, when executed by the at least one processor 1301, implements any one of the above-mentioned video image processing methods or any one of the above-mentioned network training methods.
  • The processor is a device with data processing capability, including but not limited to a central processing unit (CPU); the memory is a device with data storage capability, including but not limited to random access memory (RAM, such as SDRAM or DDR), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH).
  • the processor 1301 and the memory 1302 are connected to each other through a bus 1303 , and further connected to other components of the computing device.
  • The embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, any one of the above-mentioned video image processing methods or any one of the above-mentioned network training methods is implemented.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.


Abstract

The present application provides a video image processing method, a network training method, an electronic device, and a computer-readable storage medium. The video image processing method includes: using a first capsule network to perform feature extraction on a current image and N frames of reference images adjacent to the current image to obtain a feature vector of the current image and a feature vector of each frame of reference image, N being an integer greater than or equal to 1; using a first attention network to perform correlation processing on the feature vector of the current image and the feature vector of a reference image to obtain a first correlation vector; using a first motion estimation network to perform motion estimation processing on the first correlation vector to obtain first inter-frame motion information; performing image transformation on the reference image according to the first inter-frame motion information to obtain an image-transformed reference image; using a first motion compensation network to perform fusion processing on the current image and all the image-transformed reference images to obtain a first fused image; and performing super-resolution processing on the first fused image to obtain a target image.

Description

视频图像处理方法、网络训练方法、电子设备、和计算机可读存储介质
相关申请的交叉引用
本申请要求于2021年8月25日提交的中国专利申请NO.202110985417.8的优先权,该中国专利申请的内容通过引用的方式整体合并于此。
技术领域
本申请实施例涉及图像处理领域,特别涉及视频图像处理方法、网络训练方法、电子设备、以及计算机可读存储介质。
背景技术
随着视频图像行业的快速发展,视频的分辨率从标清、高清、超清、超高清到4K/8K,帧率从30帧、60帧、90帧到120帧,视频中包含的信息量也在不断扩大,这势必会给网络带宽带来极大的压力,如何提高视频图像质量变得越来越重要。提高视频图像质量的一种方法是不断提高传输码率,另一种方法是在视频图像显示前进行超分辨率(SR,Super Resolution)处理,显然传输码率不能够无限增加,而SR处理能够根据场景进行不断调整。
SR处理是指通过硬件或软件的方法来提高原有视频图像的分辨率,即通过对一系列低分辨率的视频图像进行处理来得到高分辨率的视频图像的过程。SR处理的核心思想就是用时间带宽(即获取同一场景中的一帧或多帧图像序列)换取空间分辨率,实现时间分辨率向空间分辨率的转换。
目前的SR处理有可能会陷入局部特征相对较优的情况,从而忽略了整体特征层次的相关性。
公开内容
第一方面,本申请实施例提供一种视频图像处理方法,包括:采用第一胶囊网络对当前图像和与所述当前图像相邻的N帧参考图像进行特征提取,得到所述当前图像的特征向量以及每一帧所述参考图像的特征向量;N为大于或等于1的整数;针对每一帧所述参考图像,采用第一注意力网络对所述当前图像的特征向量以及所述参考图像的特征向量进行相关性处理,得到所述当前图像的特征向量与所述参考图像的特征向量之间的第一相关性向量;采用第一运动估计网络对所述第一相关性向量进行运动估计处理得到第一帧间运动信息;根据所述第一帧间运动信息对所述参考图像进行图像变换得到图像变换后的参考图像;以及采用第一运动补偿网络对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到第一融合图像;采用超分辨率网络对所述第一融合图像进行超分辨率处理得到超分辨率的目标图像。
第二方面,本申请实施例提供一种视频图像处理方法,包括:针对与当前图像相邻的每一帧参考图像,采用第三运动估计网络对当前图像和所述参考图像进行运动估计处理得到第二帧间运动信息;根据所述第二帧间运动信息对所述参考图像进行图像变换得到图像变换后的参考图像;采用第二胶囊网络对所述当前图像和所有所述图像变换后的参考图像进行特征提取,得到所述当前图像的特征向量以及每一帧所述图像变换后的参考图像的特征向量;针对每一帧所述图像变换后的参考图像,采用第二注意力网络对所述当前图像的特征向量以及所述图像变换后的参考图像的特征向量进行相关性处理,得到所述当前图像的特征向量与所述图像变换后的参考图像的特征向量之间的第五相关性向量;以及采用第二运动补偿网络,根据所有所述第五相关性向量对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到第二融合图像;采用超分辨率网络对所述第二融合图像进行超分辨率处理得到超分辨率的目标图像。
第三方面,本申请实施例提供一种网络训练方法,包括:采用上述第一方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得 到超分辨率的目标图像;N为大于或等于1的整数;根据所述目标图像和对应的真实图像计算L2损失,计算第一信息熵损失,计算第一胶囊网络的第一重构损失,以及根据所述L2损失、所述第一信息熵损失和所述第一重构损失计算第一总损失;以及根据所述第一总损失更新第一胶囊网络、第一注意力网络、第一运动估计网络、第一运动补偿网络和超分辨率网络中需要训练的所有参数,继续执行所述采用上述第一方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像的步骤,直到所述第一总损失小于或等于第一预设阈值。
第四方面,本申请实施例提供一种网络训练方法,包括:基于训练好的第一胶囊网络,采用上述第一方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到N帧图像变换后的参考图像;N为大于或等于1的整数;计算第一胶囊网络的第一重构损失,计算第二信息熵损失,以及根据所述第一重构损失和所述第二信息熵损失计算第二总损失;根据所述第二总损失更新所述第一胶囊网络、所述第一注意力网络和所述第一运动估计网络中需要训练的所有参数,继续执行所述采用上述第一方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到N帧图像变换后的参考图像的步骤,直到所述第二总损失小于或等于第二预设阈值;采用上述第一方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像;根据所述目标图像和对应的真实图像计算L2损失,计算第一信息熵损失,计算所述第一胶囊网络的第一重构损失,以及根据所述L2损失、所述第一信息熵损失和所述第一重构损失计算第一总损失;以及根据所述第一总损失更新第一胶囊网络、第一注意力网络、第一运动估计网络、第一运动补偿网络和超分辨率网络中需要训练的所有参数,继续执行所述采用上述第一方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像的步骤,直到所述第一总损失小于或等于第一预设阈值。
第五方面,本申请实施例提供一种网络训练方法,包括:采用上述第二方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像;N为大于或等于1的整数;根据所述目标图像和对应的真实图像计算L2损失,计算第一信息熵损失,计算第二胶囊网络的第二重构损失,以及根据所述L2损失、所述第一信息熵损失和所述第二重构损失计算第三总损失;以及根据所述第三总损失更新第二胶囊网络、第二注意力网络、第三运动估计网络、第二运动补偿网络和超分辨率网络中需要训练的所有参数,继续执行所述采用上述第二方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像的步骤,直到所述第三总损失小于或等于第三预设阈值。
第六方面,本申请实施例提供一种网络训练方法,包括:采用上述第二方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到N帧图像变换后的参考图像;N为大于或等于1的整数;计算第二信息熵损失,根据所述第二信息熵损失更新所述第三运动估计网络中需要训练的所有参数,继续执行所述采用上述第二方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到N帧图像变换后的参考图像的步骤,直到所述第二信息熵损失小于或等于第四预设阈值;采用上述第二方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到所述当前图像的特征向量以及每一帧所述图像变换后的参考图像的特征向量;计算第二信息熵损失,计算第二胶囊网络的第二重构损失,以及根据所述第二信息熵损失和所述第二重构损失计算第四总损失;根据所述第四总损失更新所述第三运动估计网络以及所述第二胶囊网络中需要训练的参数,继续执行所述采用上述第二方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到所述当前图像的特征向量以及每一帧所述图像变换后的参考图像的特征向量的步骤,直到所述第四总损失小于或等于第五预设阈值;采用上述第二方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像;根据所述目标图像和对应的真实图像计算L2损失,计算第一信息熵损失,计算所述第二胶囊网络的第二重构损失,以及根据所述L2损失、所述第一信息熵损失和所述第二 重构损失计算第三总损失;以及根据所述第三总损失更新第二胶囊网络、第二注意力网络、第三运动估计网络、第二运动补偿网络和超分辨率网络中需要训练的所有参数,继续执行所述采用上述第二方面的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像的步骤,直到所述第三总损失小于或等于第三预设阈值。
第七方面,本申请实施例提供一种电子设备,包括:至少一个处理器;以及存储器,所述存储器上存储有至少一个计算机程序,当所述至少一个计算机程序被所述至少一个处理器执行时,实现上述任意一种视频图像处理方法、或上述任意一种网络训练方法。
第八方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现上述任意一种视频图像处理方法、或上述任意一种网络训练方法。
附图说明
图1为本申请实施例提供的一种视频图像处理方法的流程图;
图2为本申请实施例的视频图像超分辨率处理过程中图像的变化示意图;
图3为本申请实施例的视频图像超分辨率处理过程中图像的变化示意图;
图4为本申请实施例的视频图像超分辨率处理过程中图像的变化示意图;
图5为本申请实施例提供的一种视频图像处理方法的流程图;
图6为本申请实施例的视频图像超分辨率处理过程中图像的变化示意图;
图7为本申请实施例的视频图像超分辨率处理过程中图像的变化示意图;
图8为本申请实施例的视频图像超分辨率处理过程中图像的变化示意图;
图9为本申请实施例提供的一种网络训练方法的流程图;
图10为本申请实施例提供的一种网络训练方法的流程图;
图11为本申请实施例提供的一种网络训练方法的流程图;
图12为本申请实施例提供的一种网络训练方法的流程图;以及
图13为本申请实施例提供的电子设备的组成框图。
具体实施方式
为使本领域的技术人员更好地理解本申请的技术方案,下面结合附图对本申请提供的视频图像处理方法、网络训练方法、电子设备、以及计算机可读存储介质进行详细描述。
在下文中将参考附图更充分地描述示例实施例,但是所述示例实施例能够以不同形式来体现,且本申请不应当被解释为限于本文阐述的实施例。提供这些实施例的目的在于使本申请更加透彻和完整,并使本领域技术人员充分理解本申请的范围。
在不冲突的情况下,本申请各实施例及实施例中的各特征可相互组合。
如本文所使用的,术语“和/或”包括至少一个相关列举条目的任何和所有组合。
本文所使用的术语仅用于描述特定实施例,且不意欲限制本申请。如本文所使用的,单数形式“一个”和“该”也意欲包括复数形式,除非上下文另外清楚指出。还将理解的是,当本说明书中使用术语“包括”和/或“由……制成”时,指定存在特定特征、整体、步骤、操作、元件和/或组件,但不排除存在或可添加至少一个其它特征、整体、步骤、操作、元件、组件和/或其群组。
除非另外限定,否则本文所用的所有术语(包括技术和科学术语)的含义与本领域普通技术人员通常理解的含义相同。还将理解,诸如在常用字典中限定的那些术语应当被解释为具有与其在相关技术以及本申请的背景下的含义一致的含义,且将不解释为具有理想化或过度形式上的含义,除非本文明确如此限定。
Video image super-resolution (SR) processing falls into two categories: video image restoration and video image interpolation. Video image interpolation in turn includes resolution change (e.g., stepless up- or down-scaling) and frame-count change (e.g., frame interpolation or frame dropping). In general, video super-resolution technology derives from image super-resolution technology; its purpose is to recover a high-resolution target image from one or more low-resolution reference images. The difference between the two is also clear: since a video consists of multiple frames, video super-resolution usually uses both inter-frame and intra-frame information to restore video images.
The use of inter-frame information has a large influence on the performance of video super-resolution. Using inter-frame information correctly and sufficiently can improve the final result. Motion estimation and motion compensation (MEMC, Motion Estimate and Motion Compensation) is a very mainstream approach in video super-resolution: the purpose of motion estimation (ME) is to extract inter-frame motion information, and motion compensation (MC) performs an inter-frame warping operation according to the inter-frame motion information to align the frames.
Most motion estimation techniques are implemented with optical flow. Optical flow methods compute the motion between adjacent frames from their correlation and changes in the time domain. Motion estimation methods are divided into traditional methods (such as Lucas-Kanade and Druleas) and deep learning methods (such as FlowNet, FlowNet 2.0 and SpyNet).
Optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane. The optical flow method uses the changes of pixels of an image sequence in the time domain and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby computes the motion of objects between adjacent frames. The instantaneous rate of change of the gray level at a specific coordinate point on the two-dimensional image plane is usually defined as the optical flow vector.
The optical flow method takes two consecutive frames as input: one is the image j corresponding to the target frame J, and the other is an adjacent frame of image j, namely image i. The optical flow from image i to image j is computed as F_{i→j} = (h_{i→j}, v_{i→j}) = ME(I_i, I_j), where F_{i→j} is the optical flow from image i to image j, h_{i→j} is the horizontal component of the displacement, v_{i→j} is the vertical component of the displacement, and ME(·) is the function that computes the optical flow.
MC performs an image transformation on image i according to the inter-frame motion information, so that the adjacent frame (image i) is spatially aligned with the target frame J, and then fuses the image-transformed image i with image j to obtain the target frame J, i.e., J = MC(I_{i→j}, F_{i→j}), where I_{i→j} is the image-transformed image i and MC(·) is the motion compensation function.
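As an illustrative sketch of the warp operation used here (assuming PyTorch, a backward-warping convention in which the flow gives, for each target-frame pixel, the displacement to the corresponding source-frame location, and illustrative function names):

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (N, C, H, W) by flow (N, 2, H, W) = (h, v) displacements."""
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid_x = xs.unsqueeze(0).float() + flow[:, 0]      # horizontal component h
    grid_y = ys.unsqueeze(0).float() + flow[:, 1]      # vertical component v
    # normalize sampling coordinates to [-1, 1] as expected by grid_sample
    grid = torch.stack((2 * grid_x / (w - 1) - 1, 2 * grid_y / (h - 1) - 1), dim=-1)
    return F.grid_sample(img, grid, align_corners=True)
```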
On the one hand, the optical flow method can improve the single-frame quality of a video; on the other hand, because temporal correlation is taken into account, the generated pixels keep continuity in time, so that temporally matched pixels play back coherently.
Current deep learning optical flow methods use convolutional neural networks (CNN, Convolution Neural Networks). Although a CNN can learn more global context information as the number of convolutional layers increases and then use this context for prediction, a CNN is locally connected and shares parameters: it does not consider the mutual association and relative positions between features, and it lacks hierarchical structure information of the individual features. For example, once the convolution kernels of a CNN detect features resembling eyes, a nose and a mouth, the values convolved from these features by the relevant kernels become large, the face-related neurons become quite prominent, and the optical flow features are finally aligned to the face class; but the CNN does not consider the correlation and structure among these features, may fall into a situation where local features are relatively optimal, and ignores the correlation at the overall feature level.
FIG. 1 is a flowchart of a video image processing method provided by an embodiment of the present application.
In a first aspect, referring to FIG. 1, an embodiment of the present application provides a video image processing method including steps 100 to 102.
Step 100: use a first capsule network cap-net1 to perform feature extraction on the current image and the N frames of reference images adjacent to the current image, obtaining the feature vector of the current image and the feature vector of each frame of reference image; N is an integer greater than or equal to 1.
In the embodiments of the present application, the N frames of reference images adjacent to the current image are the N frames that are adjacent to the current image in time. For example, they may be the N reference frames temporally preceding the current image, the N reference frames temporally following the current image, or M reference frames temporally preceding the current image together with (N−M) reference frames temporally following it, each reference frame having its corresponding feature vector; M is an integer greater than or equal to 1 and less than or equal to N. FIG. 2 shows a schematic diagram of the image changes during video image super-resolution processing, taking N = 1 as an example.
In some implementations, the first capsule network cap-net1 used to extract features from the current image and the first capsule network cap-net1 used to extract features from the reference images belong to the same capsule network or to different capsule networks; the first capsule networks cap-net1 used to extract features from different reference images belong to the same capsule network or to different capsule networks.
In the embodiments of the present application, when the first capsule network used for the current image and the one used for the reference images belong to the same capsule network, and the first capsule networks used for different reference images belong to the same capsule network, using the first capsule network cap-net1 to extract features from the current image and the N adjacent reference frames means inputting the current image and the N adjacent reference frames into the first capsule network cap-net1 one after another to obtain the corresponding feature vectors; that is, the first capsule network cap-net1 processes one frame at a time to obtain the corresponding feature vector.
In some implementations, the first capsule network cap-net1 includes at least one of the following: a convolutional layer, a primary capsule layer, or a digit capsule layer.
In the embodiments of the present application, the primary capsule layer is also called the low-level capsule layer, and the digit capsule layer is also called the high-level capsule layer.
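As an illustrative sketch of a capsule-style feature extractor of the kind described above (a stand-in for cap-net1 with assumed, purely illustrative layer sizes, not the actual network of this application):

```python
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    """Capsule squashing non-linearity: keeps the vector direction, maps length into [0, 1)."""
    sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

class TinyCapsuleExtractor(nn.Module):
    """Convolutional layer followed by a primary-capsule layer; returns capsule vectors
    per spatial location as a feature map for one input frame."""
    def __init__(self, in_ch=3, caps_dim=8, n_caps=16):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 64, kernel_size=3, padding=1)
        self.primary = nn.Conv2d(64, n_caps * caps_dim, kernel_size=3, padding=1)
        self.caps_dim = caps_dim

    def forward(self, x):                      # x: (N, C, H, W)
        h = torch.relu(self.conv(x))
        p = self.primary(h)                    # (N, n_caps * caps_dim, H, W)
        n, _, hh, ww = p.shape
        p = p.view(n, -1, self.caps_dim, hh, ww)
        return squash(p, dim=2)
```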
Step 101: for each frame of reference image, use a first attention network att-net1 to perform correlation processing on the feature vector of the current image and the feature vector of the reference image, obtaining a first correlation vector between the feature vector of the current image and the feature vector of the reference image; use a first motion estimation network ME-net1 to perform motion estimation processing on the first correlation vector to obtain first inter-frame motion information; and perform an image transformation (warp) on the reference image according to the first inter-frame motion information to obtain an image-transformed reference image.
In some implementations, the first attention network att-net1 is built using at least one of a channel attention mechanism or a spatial attention mechanism.
In some implementations, the first attention network att-net1 implements the correlation processing by dot-product computation.
In the embodiments of the present application, each reference frame, whether it precedes or follows the current image in time, has its own corresponding first correlation vector.
In some implementations, the first motion estimation network ME-net1 is implemented by at least one of the following: a general method, a traditional method, or an optical flow method. General methods include Res-net; traditional methods include Lucas-Kanade and Druleas; optical flow methods include FlowNet, FlowNet 2.0 and SpyNet.
In some implementations, the first motion estimation network ME-net1 performs motion estimation directly on the first correlation vector to obtain the first inter-frame motion information, as shown in FIG. 2. Alternatively, a second motion estimation network ME-net2 performs feature extraction and correlation processing on the current image and the reference image to obtain a second correlation vector between the current image and the reference image, the first correlation vector and the second correlation vector are combined by dot-product computation to obtain a third correlation vector, and the first motion estimation network ME-net1 performs motion estimation on the third correlation vector to obtain the first inter-frame motion information, as shown in FIG. 3. Alternatively, the first correlation vector and the current image are combined by dot-product computation to obtain a new current image, the first correlation vector and the reference image are combined by dot-product computation to obtain a new reference image, the second motion estimation network ME-net2 performs feature extraction and correlation processing on the new current image and the new reference image to obtain a fourth correlation vector between them, and the first motion estimation network ME-net1 performs motion estimation on the fourth correlation vector to obtain the first inter-frame motion information, as shown in FIG. 4.
In the embodiments of the present application, each reference frame has its corresponding first inter-frame motion information, its corresponding image-transformed reference image, and its corresponding second correlation vector.
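As an illustrative sketch of the dot-product correlation processing and of deriving a reference-frame weight from the first correlation vector (element-wise multiplication is one reading of the dot-product computation; names and shapes are assumptions):

```python
import torch

def first_correlation(feat_cur, feat_ref):
    """Correlation processing by element-wise (dot-product style) multiplication of
    the two capsule feature maps; shapes (N, C, H, W) -> (N, C, H, W)."""
    return feat_cur * feat_ref

def reference_weight(corr):
    """Weight of one reference frame: the correlation vector averaged over the
    channel dimension C, e.g. 64x64xC -> 64x64x1 as in the example above."""
    return corr.mean(dim=1, keepdim=True)
```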
Step 102: use a first motion compensation network MC-net1 to perform fusion processing on the current image and all the image-transformed reference images to obtain a first fused image; use a super-resolution network P-net to perform super-resolution processing on the first fused image to obtain a super-resolution target image.
In some implementations, the first motion compensation network MC-net1 is at least one of a convolutional neural network (Cnn-net) or a recurrent neural network (Recurrent Neural Network, Rnn-net).
In some implementations, the super-resolution processing includes at least one of resolution scaling, frame interpolation, or enhancement. Resolution scaling means adding an up-sampling part (upsample) or a down-sampling part (downsample) to the image, frame interpolation means adding an interpolated part (interpolated) to the image, and enhancement means adding an image restoration part.
In some implementations, using the first motion compensation network MC-net1 to fuse the current image and all the image-transformed reference images to obtain the first fused image includes: for each frame of reference image, determining the weight of the reference image according to the first correlation vector; and using the first motion compensation network MC-net1 to perform fusion processing on the current image and all the image-transformed reference images according to the weights of all the reference images to obtain the first fused image.
In some implementations, the weight of a reference image is the average of the first correlation vector. For example, if the current image and the reference image are both 64×64×3 vectors, the first correlation vector is a 64×64×C vector, and the weight of the reference image should then be 64×64×1, i.e., the first correlation vector is averaged over the dimension corresponding to C.
In the embodiments of the present application, the fusion of the current image and all the image-transformed reference images according to the weights of all the reference images can be implemented in various ways. For example, with the weight of a reference image used as the coefficient of the corresponding image-transformed reference image, a weighted average of the current image and all the image-transformed reference images is computed to obtain the first fused image; or, corresponding features are extracted from the current image and from all the image-transformed reference images, the weight of a reference image is used as the coefficient of the features of the corresponding image-transformed reference image, and a weighted average of the features of the current image and the features of all the image-transformed reference images is computed to obtain the first fused image.
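As an illustrative sketch of the weighted-average fusion described above (the weight assigned to the current image is not specified in the text and is assumed to be 1 here; names are illustrative):

```python
import torch

def fuse_weighted(current, warped_refs, weights):
    """Weighted average of the current image and the image-transformed reference
    images, each reference weighted by its weight map.
    current: (N, C, H, W); warped_refs: list of (N, C, H, W); weights: list of (N, 1, H, W)."""
    num = current.clone()                      # current image assumed to carry weight 1
    den = torch.ones_like(current[:, :1])
    for ref, w in zip(warped_refs, weights):
        num = num + w * ref
        den = den + w
    return num / den
```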
In the video image processing method provided by the embodiments of the present application, a capsule network is used to obtain the feature vector of the corresponding image, and combining it with an attention network improves the alignment of features in the motion estimation network, thereby avoiding falling into a locally optimal feature solution and taking the correlation at the overall feature level into account.
FIG. 5 is a flowchart of a video image processing method provided by an embodiment of the present application.
In a second aspect, referring to FIG. 5, an embodiment of the present application provides a video image processing method including steps 500 to 503.
Step 500: for each frame of reference image adjacent to the current image, use a third motion estimation network ME-net3 to perform motion estimation processing on the current image and the reference image to obtain second inter-frame motion information; and perform an image transformation (warp) on the reference image according to the second inter-frame motion information to obtain an image-transformed reference image.
In the embodiments of the present application, the third motion estimation network ME-net3 is equivalent to the superposition of the first motion estimation network ME-net1 and the second motion estimation network ME-net2 in the video image processing method described above. That is, the third motion estimation network ME-net3 actually first performs feature extraction and correlation processing on the current image and the reference image, and then performs motion estimation processing to obtain the second inter-frame motion information.
In the embodiments of the present application, assuming there are N frames of reference images adjacent to the current image, these are the N frames adjacent to the current image in time: for example, the N reference frames temporally preceding the current image, the N reference frames temporally following the current image, or M reference frames temporally preceding the current image together with (N−M) reference frames temporally following it, each reference frame having its corresponding second inter-frame motion information and its corresponding image-transformed reference image; M is an integer greater than or equal to 1 and less than or equal to N. FIG. 6 shows a schematic diagram of the image changes during video image super-resolution processing, taking N = 1 as an example.
Step 501: use a second capsule network cap-net2 to perform feature extraction on the current image and all the image-transformed reference images, obtaining the feature vector of the current image and the feature vector of each frame of image-transformed reference image.
In some implementations, the second capsule network cap-net2 used to extract features from the current image and the second capsule network used to extract features from the image-transformed reference images belong to the same capsule network or to different capsule networks; the second capsule networks cap-net2 used to extract features from different image-transformed reference images belong to the same capsule network or to different capsule networks.
In the embodiments of the present application, when the second capsule network used for the current image and the one used for the image-transformed reference images belong to the same capsule network, and the second capsule networks used for different image-transformed reference images belong to the same capsule network, using the second capsule network cap-net2 to extract features from the current image and all the image-transformed reference images means inputting the current image and all the image-transformed reference images into the second capsule network cap-net2 one after another to obtain the corresponding feature vectors; that is, the second capsule network cap-net2 processes one frame at a time to obtain the corresponding feature vector.
In some implementations, the second capsule network cap-net2 includes at least one of the following: a convolutional layer, a primary capsule layer, or a digit capsule layer. In the embodiments of the present application, the primary capsule layer is also called the low-level capsule layer, and the digit capsule layer is also called the high-level capsule layer.
In the embodiments of the present application, each image-transformed reference image has its corresponding feature vector.
Step 502: for each frame of image-transformed reference image, use a second attention network att-net2 to perform correlation processing on the feature vector of the current image and the feature vector of the image-transformed reference image, obtaining a fifth correlation vector between the feature vector of the current image and the feature vector of the image-transformed reference image.
In some implementations, the second attention network att-net2 is built using at least one of a channel attention mechanism or a spatial attention mechanism.
In the embodiments of the present application, each image-transformed reference image has its corresponding fifth correlation vector.
Step 503: use a second motion compensation network MC-net2 to perform fusion processing on the current image and all the image-transformed reference images according to all the fifth correlation vectors to obtain a second fused image; use the super-resolution network P-net to perform super-resolution processing on the second fused image to obtain a super-resolution target image.
In some implementations, the second motion compensation network MC-net2 performs the fusion directly according to all the fifth correlation vectors, as shown in FIG. 6. Alternatively, a third motion compensation network MC-net3 performs feature extraction and correlation processing on the current image and the image-transformed reference image to obtain a sixth correlation vector between them, the fifth correlation vector and the sixth correlation vector are combined by dot-product computation to obtain a seventh correlation vector, and the second motion compensation network MC-net2 performs fusion processing on the current image and all the image-transformed reference images according to all the seventh correlation vectors to obtain the second fused image, as shown in FIG. 7. Alternatively, the fifth correlation vector and the current image are combined by dot-product computation to obtain a new current image, the fifth correlation vector and the image-transformed reference image are combined by dot-product processing to obtain a new image-transformed reference image, a third motion compensation network MC-net3 performs feature extraction and correlation processing on the new current image and the new image-transformed reference image to obtain an eighth correlation vector between them, and the second motion compensation network MC-net2 performs fusion processing on the current image and all the image-transformed reference images according to all the eighth correlation vectors to obtain the second fused image, as shown in FIG. 8.
In some implementations, using the second motion compensation network MC-net2 to fuse the current image and all the image-transformed reference images according to all the fifth correlation vectors includes: for each frame of image-transformed reference image, determining the weight of the image-transformed reference image according to the fifth correlation vector; and using the second motion compensation network MC-net2 to perform fusion processing on the current image and all the image-transformed reference images according to the weights of all the image-transformed reference images to obtain the second fused image.
In some implementations, the weight of an image-transformed reference image is the average of the fifth correlation vector. For example, if the current image and the image-transformed reference image are both 64×64×3 vectors, the fifth correlation vector is a 64×64×C vector, and the weight should then be 64×64×1, i.e., the fifth correlation vector is averaged over the dimension corresponding to C.
In the embodiments of the present application, the fusion according to the weights of all the image-transformed reference images can be implemented in various ways, for example by taking the weights as coefficients of the image-transformed reference images and computing a weighted average of the current image and all the image-transformed reference images, or by extracting corresponding features from the current image and from all the image-transformed reference images, taking the weights as coefficients of the features of the image-transformed reference images, and computing a weighted average of the features of the current image and the features of all the image-transformed reference images.
In the embodiments of the present application, the fusion according to all the seventh correlation vectors and the fusion according to all the eighth correlation vectors are implemented in a manner similar to the fusion according to all the fifth correlation vectors, and details are not repeated here.
In the video image processing method provided by the embodiments of the present application, a capsule network is used to obtain the feature vector of the corresponding image, and combining it with an attention network improves the alignment of features in the motion estimation network, thereby avoiding falling into a locally optimal feature solution and taking the correlation at the overall feature level into account.
FIG. 9 is a flowchart of a network training method provided by an embodiment of the present application.
In a third aspect, referring to FIG. 9, an embodiment of the present application provides a network training method including steps 900 to 902.
Step 900: use the video image processing method in steps 100 to 102 above to process the current image and the N frames of reference images adjacent to the current image to obtain a super-resolution target image; N is an integer greater than or equal to 1.
Step 901: calculate the L2 loss according to the target image and the corresponding real image, calculate the first information entropy loss, calculate the first reconstruction loss of the first capsule network, and calculate the first total loss according to the L2 loss, the first information entropy loss and the first reconstruction loss.
In some implementations, the L2 loss is calculated according to the formula Loss_SR = (1/(H·W))·Σ_{i=1..H} Σ_{j=1..W} ||x^SR_ij − x^GT_ij||, where Loss_SR is the L2 loss, H is the height of the target image, W is the width of the target image, x^SR_ij is the pixel value corresponding to row i and column j of the target image, x^GT_ij is the pixel value corresponding to row i and column j of the real image, and || || is the square function.
In some implementations, calculating the first information entropy loss includes any one of the following: calculating it according to the target image and the real image, according to the target image and the current image, or according to the reference image and the image-transformed reference image; in each case the loss is calculated as Loss_in = ||H_a − H_b||, where H_a and H_b are the information entropies of the two images being compared (the target image and the real image, the target image and the current image, or the k-th reference image and the k-th image-transformed reference image, respectively).
The information entropy of an image x is calculated as H(x) = −Σ_i P_{x_i}·log₂ P_{x_i}, where the pixel values x_i are distributed in the range from 0 to N, P_{x_i} is the probability of the pixel value x_i (only non-zero probabilities are counted), and the logarithm is generally taken to base 2.
In some implementations, calculating the first reconstruction loss of the first capsule network includes any one of the following: calculating it according to the reference image and the feature vector of the reference image, or according to the current image and the feature vector of the current image; the loss is calculated as Loss_recon = (1/(H·W))·Σ_{i=1..H} Σ_{j=1..W} ||x_ij − x̂_ij||, where H and W are the height and width of the corresponding image (the current image or the k-th reference image), x_ij is its pixel value in row i and column j, and x̂_ij is the pixel value in row i and column j of its feature vector.
In some implementations, calculating the first total loss includes: adding the L2 loss, the first information entropy loss and the first reconstruction loss to obtain the first total loss.
Step 902: update all parameters that need to be trained in the first capsule network, the first attention network, the first motion estimation network, the first motion compensation network and the super-resolution network according to the first total loss, and continue to execute the above step 900 until the first total loss is less than or equal to the first preset threshold.
In some implementations, all parameters that need to be trained in the first capsule network, the first attention network, the first motion estimation network, the first motion compensation network and the super-resolution network are updated according to the first total loss; in other implementations, the second motion estimation network is also updated.
In the network training method provided by the embodiments of the present application, the total loss used to update the training parameters is calculated based on the L2 loss, the information entropy loss and the reconstruction loss of the capsule network. The capsule network reconstruction loss guarantees consistency between the image reconstructed from the capsule network's output and the image input to the capsule network, which ensures the accuracy of the feature extraction performed by the capsule network; the information entropy loss ensures consistency between the basic features of the output target image and those of the current image (small fluctuation of the spatial-domain information) as well as consistency between the basic features of the output target image and those of the reference images (small fluctuation of the temporal-domain information). Therefore, when video image super-resolution is performed with the trained networks, a target image with small fluctuations in spatial-domain and temporal-domain information is obtained, which improves the effect of video image processing.
FIG. 10 is a flowchart of a network training method provided by an embodiment of the present application.
In a fourth aspect, referring to FIG. 10, an embodiment of the present application provides a network training method including steps 1000 to 1005.
Step 1000: based on the trained first capsule network, use the video image processing method in steps 100 to 102 above to process the current image and the N frames of reference images adjacent to the current image to obtain N frames of image-transformed reference images; N is an integer greater than or equal to 1.
Step 1001: calculate the first reconstruction loss of the first capsule network, calculate the second information entropy loss, and calculate the second total loss according to the first reconstruction loss and the second information entropy loss.
In some implementations, the second information entropy loss is calculated according to the reference image and the image-transformed reference image as Loss_in = ||H_ref,k − H_warp,k||, where H_ref,k is the information entropy of the k-th reference image and H_warp,k is the information entropy of the k-th image-transformed reference image; the information entropy is calculated as H(x) = −Σ_i P_{x_i}·log₂ P_{x_i}, where the pixel values x_i are distributed in the range from 0 to N, P_{x_i} is the probability of the pixel value x_i (only non-zero probabilities are counted), and the logarithm is generally taken to base 2.
In some implementations, the first reconstruction loss of the first capsule network is calculated either from the reference image and the feature vector of the reference image or from the current image and the feature vector of the current image, according to the formula Loss_recon = (1/(H·W))·Σ_{i=1..H} Σ_{j=1..W} ||x_ij − x̂_ij||, where H and W are the height and width of the corresponding image, x_ij is its pixel value in row i and column j, and x̂_ij is the pixel value in row i and column j of its feature vector.
In some implementations, the second total loss is obtained by adding the first reconstruction loss and the second information entropy loss.
Step 1002: update all parameters that need to be trained in the first capsule network, the first attention network and the first motion estimation network according to the second total loss, and continue to execute the above step 1000 until the second total loss is less than or equal to the second preset threshold. In some implementations, all parameters that need to be trained in the first capsule network, the first attention network and the first motion estimation network are updated according to the second total loss; in other implementations, the second motion estimation network is also updated.
Step 1003: use the video image processing method in steps 100 to 102 above to process the current image and the N frames of reference images adjacent to the current image to obtain a super-resolution target image.
Step 1004: calculate the L2 loss according to the target image and the corresponding real image, calculate the first information entropy loss, calculate the first reconstruction loss of the first capsule network, and calculate the first total loss according to the L2 loss, the first information entropy loss and the first reconstruction loss. The L2 loss, the first information entropy loss (calculated from the target image and the real image, from the target image and the current image, or from the reference image and the image-transformed reference image) and the first total loss (the sum of the three losses) are calculated with the same formulas as in step 901 above.
Step 1005: update all parameters that need to be trained in the first capsule network, the first attention network, the first motion estimation network, the first motion compensation network and the super-resolution network according to the first total loss, and continue to execute the above step 1000 until the first total loss is less than or equal to the first preset threshold. In some implementations, all parameters that need to be trained in the first capsule network, the first attention network, the first motion estimation network, the first motion compensation network and the super-resolution network are updated according to the first total loss; in other implementations, the second motion estimation network is also updated.
In the network training method provided by the embodiments of the present application, the networks used in the video image processing method are trained stage by stage, which further improves the training effect.
FIG. 11 is a flowchart of a network training method provided by an embodiment of the present application.
In a fifth aspect, referring to FIG. 11, an embodiment of the present application provides a network training method including steps 1100 to 1102.
Step 1100: use the video image processing method in steps 500 to 503 above to process the current image and the N frames of reference images adjacent to the current image to obtain a super-resolution target image; N is an integer greater than or equal to 1.
Step 1101: calculate the L2 loss according to the target image and the corresponding real image, calculate the first information entropy loss, calculate the second reconstruction loss of the second capsule network, and calculate the third total loss according to the L2 loss, the first information entropy loss and the second reconstruction loss.
In some implementations, the L2 loss is calculated according to the formula Loss_SR = (1/(H·W))·Σ_{i=1..H} Σ_{j=1..W} ||x^SR_ij − x^GT_ij||, where H and W are the height and width of the target image, x^SR_ij and x^GT_ij are the pixel values corresponding to row i and column j of the target image and of the real image respectively, and || || is the square function.
In some implementations, the first information entropy loss is calculated from the target image and the real image, from the target image and the current image, or from the reference image and the image-transformed reference image, as Loss_in = ||H_a − H_b||, where H_a and H_b are the information entropies of the two images being compared. The information entropy of an image x is calculated as H(x) = −Σ_i P_{x_i}·log₂ P_{x_i}, where the pixel values x_i are distributed in the range from 0 to N, P_{x_i} is the probability of the pixel value x_i (only non-zero probabilities are counted), and the logarithm is generally taken to base 2.
In some implementations, the second reconstruction loss of the second capsule network is calculated either from the current image and the feature vector of the current image or from the k-th image-transformed reference image and the feature vector of the k-th image-transformed reference image, according to the formula Loss_recon = (1/(H·W))·Σ_{i=1..H} Σ_{j=1..W} ||x_ij − x̂_ij||, where H and W are the height and width of the corresponding image, x_ij is its pixel value in row i and column j, and x̂_ij is the pixel value in row i and column j of its feature vector.
In some implementations, the third total loss is obtained by adding the L2 loss, the first information entropy loss and the second reconstruction loss.
Step 1102: update all parameters that need to be trained in the second capsule network, the second attention network, the third motion estimation network, the second motion compensation network and the super-resolution network according to the third total loss, and continue to execute the above step 1100 until the third total loss is less than or equal to the third preset threshold.
In some implementations, all parameters that need to be trained in the second capsule network, the second attention network, the third motion estimation network, the second motion compensation network and the super-resolution network are updated according to the third total loss; in other implementations, the third motion compensation network is also updated.
In the network training method provided by the embodiments of the present application, the total loss used to update the training parameters is calculated based on the L2 loss, the information entropy loss and the reconstruction loss of the capsule network. The capsule network reconstruction loss guarantees consistency between the image reconstructed from the capsule network's output and the image input to the capsule network, which ensures the accuracy of the feature extraction performed by the capsule network; the information entropy loss ensures consistency between the basic features of the output target image and those of the current image (small fluctuation of the spatial-domain information) as well as consistency between the basic features of the output target image and those of the reference images (small fluctuation of the temporal-domain information). Therefore, when video image super-resolution is performed with the trained networks, a target image with small fluctuations in spatial-domain and temporal-domain information is obtained, which improves the effect of video image processing.
FIG. 12 is a flowchart of a network training method provided by an embodiment of the present application.
In a sixth aspect, referring to FIG. 12, an embodiment of the present application provides a network training method including steps 1200 to 1207.
Step 1200: use the video image processing method in steps 500 to 503 above to process the current image and the N frames of reference images adjacent to the current image to obtain N frames of image-transformed reference images; N is an integer greater than or equal to 1.
Step 1201: calculate the second information entropy loss, update all parameters that need to be trained in the third motion estimation network according to the second information entropy loss, and continue to execute the above step 1200 until the second information entropy loss is less than or equal to the fourth preset threshold. In some implementations, the second information entropy loss is calculated according to the reference image and the image-transformed reference image as Loss_in = ||H_ref,k − H_warp,k||, where H_ref,k is the information entropy of the k-th reference image and H_warp,k is the information entropy of the k-th image-transformed reference image; the information entropy is calculated as H(x) = −Σ_i P_{x_i}·log₂ P_{x_i}, where the pixel values x_i are distributed in the range from 0 to N, P_{x_i} is the probability of the pixel value x_i (only non-zero probabilities are counted), and the logarithm is generally taken to base 2.
Step 1202: use the video image processing method in steps 500 to 503 above to process the current image and the N frames of reference images adjacent to the current image to obtain the feature vector of the current image and the feature vector of each frame of image-transformed reference image.
Step 1203: calculate the second information entropy loss, calculate the second reconstruction loss of the second capsule network, and calculate the fourth total loss according to the second information entropy loss and the second reconstruction loss. In some implementations, the second reconstruction loss is calculated either from the current image and the feature vector of the current image or from the k-th image-transformed reference image and the feature vector of the k-th image-transformed reference image, according to the formula Loss_recon = (1/(H·W))·Σ_{i=1..H} Σ_{j=1..W} ||x_ij − x̂_ij||, where H and W are the height and width of the corresponding image, x_ij is its pixel value in row i and column j, and x̂_ij is the pixel value in row i and column j of its feature vector; the fourth total loss is obtained by adding the second information entropy loss and the second reconstruction loss.
Step 1204: update the parameters that need to be trained in the third motion estimation network and the second capsule network according to the fourth total loss, and continue to execute the above step 1200 until the fourth total loss is less than or equal to the fifth preset threshold.
Step 1205: use the video image processing method in steps 500 to 503 above to process the current image and the N frames of reference images adjacent to the current image to obtain a super-resolution target image.
Step 1206: calculate the L2 loss according to the target image and the corresponding real image, calculate the first information entropy loss, calculate the second reconstruction loss of the second capsule network, and calculate the third total loss according to the L2 loss, the first information entropy loss and the second reconstruction loss. In some implementations, the L2 loss is Loss_SR = (1/(H·W))·Σ_{i=1..H} Σ_{j=1..W} ||x^SR_ij − x^GT_ij|| with || || the square function; the first information entropy loss is calculated from the target image and the real image, from the target image and the current image, or from the reference image and the image-transformed reference image, as Loss_in = ||H_a − H_b||, with the information entropy defined as above; and the third total loss is obtained by adding the L2 loss, the first information entropy loss and the second reconstruction loss.
Step 1207: update all parameters that need to be trained in the second capsule network, the second attention network, the third motion estimation network, the second motion compensation network and the super-resolution network according to the third total loss, and continue to execute steps 1200 to 1206 until the third total loss is less than or equal to the third preset threshold. In some implementations, all parameters that need to be trained in the second capsule network, the second attention network, the third motion estimation network, the second motion compensation network and the super-resolution network are updated according to the third total loss; in other implementations, the third motion compensation network is also updated.
In the network training method provided by the embodiments of the present application, the networks used in the video image processing method are trained stage by stage, which further improves the training effect.
第七方面,本申请实施例提供一种电子设备1300,如图13所示,所述电子设备1300包括:至少一个处理器1301;以及存储器1302,存储器1302上存储有至少一个计算机程序,当至少一个计算机程序被所述至少一个处理器1301执行时,实现上述任意一种视频图像处理方法、或上述任意一种网络训练方法。
处理器为具有数据处理能力的器件,包括但不限于中央处理器(CPU)等;以 及,存储器为具有数据存储能力的器件,其包括但不限于随机存取存储器(RAM,如SDRAM、DDR等)、只读存储器(ROM)、带电可擦可编程只读存储器(EEPROM)、以及闪存(FLASH)。
在一些实施方式中,处理器1301、存储器1302通过总线1303相互连接,进而与计算设备的其它组件连接。
第八方面,本申请实施例提供一种计算机可读存储介质,计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现上述任意一种视频图像处理方法,或上述任意一种网络训练方法。
本领域普通技术人员应当理解,上文中所公开方法中的全部或某些步骤、及装置中的功能模块/单元能够被实施为软件、固件、硬件及其适当的组合。在硬件实施方式中,在以上描述中提及的功能模块/单元之间的划分不一定对应于物理组件的划分;例如,一个物理组件可以具有多个功能,或者一个功能或步骤可以由若干物理组件合作执行。某些物理组件或所有物理组件可以被实施为由处理器(如中央处理器、数字信号处理器或微处理器)执行的软件,或者被实施为硬件,或者被实施为集成电路,如专用集成电路。这样的软件可以分布在计算机可读介质上,计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的,术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据结构、程序模块或其它数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其它存储器技术、CD-ROM、数字多功能盘(DVD)或其它光盘存储、磁盒、磁带、磁盘存储或其它磁存储器、或者可以用于存储期望的信息并且可以被计算机访问的任何其它的介质。此外,本领域普通技术人员公知的是,通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其它传输机制之类的调制数据信号中的其它数据,并且可包括任何信息递送介质。
本文已经公开了示例实施例,并且虽然采用了具体术语,但它们仅用于并仅应当被解释为一般说明性含义,并且不用于限制的目的。在一些实例中,对本领域技术人员显而易见的是,除非另外明确指出,否则与特定实施例相结合描述的特征、特性和/或元素可单独使用,或可与结合其它实施例描述的特征、特性和/或元件组合使用。因此,本领域技术人员将理解,在不脱离由所附的权利要求阐明的本申请的范围的情况下,可进行各种形式和细节上的改变。

Claims (26)

  1. 一种视频图像处理方法,包括:
    采用第一胶囊网络对当前图像和与所述当前图像相邻的N帧参考图像进行特征提取,得到所述当前图像的特征向量以及每一帧所述参考图像的特征向量;其中,N为大于或等于1的整数;
    针对每一帧所述参考图像,采用第一注意力网络对所述当前图像的特征向量以及所述参考图像的特征向量进行相关性处理,得到所述当前图像的特征向量与所述参考图像的特征向量之间的第一相关性向量;采用第一运动估计网络对所述第一相关性向量进行运动估计处理得到第一帧间运动信息;根据所述第一帧间运动信息对所述参考图像进行图像变换得到图像变换后的参考图像;以及
    采用第一运动补偿网络对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到第一融合图像;采用超分辨率网络对所述第一融合图像进行超分辨率处理得到超分辨率的目标图像。
  2. 根据权利要求1所述的视频图像处理方法,其中,对所述当前图像进行特征提取所采用的第一胶囊网络与对所述参考图像进行特征提取所采用的第一胶囊网络属于同一个胶囊网络,或属于不同的胶囊网络;以及
    对不同所述参考图像进行特征提取所采用的第一胶囊网络属于同一个胶囊网络,或属于不同的胶囊网络。
  3. 根据权利要求1所述的视频图像处理方法,其中,所述采用第一运动补偿网络对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到第一融合图像包括:
    针对每一帧所述参考图像,根据所述第一相关性向量确定所述参考图像的权重;以及
    采用所述第一运动补偿网络,根据所有所述参考图像的权重对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到所述第一融合图像。
  4. 根据权利要求1所述的视频图像处理方法,还包括:
    所述采用第一运动估计网络对所述第一相关性向量进行运动估计处理得到第一帧间运动信息之前,采用第二运动估计网络对所述当前图像和所述参考图像进行特征提取和相关性处理,得到所述当前图像和所述参考图像之间的第二相关性向量;将所述第一相关性向量和所述第二相关性向量进行点乘计算得到第三相关性向量;
    所述采用第一运动估计网络对所述第一相关性向量进行运动估计处理得到第一帧间运动信息包括:采用所述第一运动估计网络对所述第三相关性向量进行运动估计处理得到所述第一帧间运动信息。
  5. 根据权利要求1所述的视频图像处理方法,还包括:
    所述采用第一运动估计网络对所述第一相关性向量进行运动估计处理得到第一帧间运动信息之前,将所述第一相关性向量和所述当前图像进行点乘计算得到新的当前图像;将所述第一相关性向量和所述参考图像进行点乘计算得到新的参考图像;采用第二运动估计网络对所述新的当前图像和所述新的参考图像进行特征提取和相关性处理得到所述新的当前图像和所述新的参考图像之间的第四相关性向量;
    所述采用第一运动估计网络对所述第一相关性向量进行运动估计处理得到第一帧间运动信息包括:采用所述第一运动估计网络对所述第四相关性向量进行运动估计处理得到所述第一帧间运动信息。
  6. 一种视频图像处理方法,包括:
    针对与当前图像相邻的每一帧参考图像,采用第三运动估计网络对当前图像和所述参考图像进行运动估计处理得到第二帧间运动信息;根据所述第二帧间运动信息对所述参考图像进行图像变换得到图像变换后的参考图像;
    采用第二胶囊网络对所述当前图像和所有所述图像变换后的参考图像进行特征提取,得到所述当前图像的特征向量以及每一帧所述图像变换后的参考图像的特征向量;
    针对每一帧所述图像变换后的参考图像,采用第二注意力网络对所述当前图像的特征向量以及所述图像变换后的参考图像的特征向量进行相关性处理,得到所述当前图像的特征向量与所述图像变换后的参考图像的特征向量之间的第五相关性向量;以及
    采用第二运动补偿网络,根据所有所述第五相关性向量对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到第二融合图像;采用超分辨率网络对所述第二融合图像进行超分辨率处理得到超分辨率的目标图像。
  7. 根据权利要求6所述的视频图像处理方法,其中,对所述当前图像进行特征提取所采用的第二胶囊网络与对所述图像变换后的参考图像进行特征提取所采用的第二胶囊网络属于同一个胶囊网络,或属于不同的胶囊网络;以及
    对不同所述图像变换后的参考图像进行特征提取所采用的第二胶囊网络属于同一个胶囊网络,或属于不同的胶囊网络。
  8. 根据权利要求6所述的视频图像处理方法,其中,所述采用第二运动补偿网络,根据所有所述第五相关性向量对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到第二融合图像包括:
    针对每一帧所述图像变换后的参考图像,根据所述第五相关性向量确定所述图像变换后的参考图像的权重;以及
    采用所述第二运动补偿网络,根据所有所述图像变换后的参考图像的权重对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到所述第二融合图像。
  9. 根据权利要求6所述的视频图像处理方法,还包括:
    所述采用第二运动补偿网络根据所有所述第五相关性向量对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到第二融合图像之前,采用第三运动补偿网络对所述当前图像和所述图像变换后的参考图像进行特征提取和相关性处理,得到所述当前图像和所述图像变换后的参考图像之间的第六相关性向量;将所述第五相关性向量和所述第六相关性向量进行点乘计算得到第七相关性向量;以及
    所述采用第二运动补偿网络,根据所有所述第五相关性向量对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到第二融合图像包括:采用所述第二运动补偿网络,根据所有所述第七相关性向量对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到所述第二融合图像。
  10. 根据权利要求6所述的视频图像处理方法,还包括:
    所述采用第二运动补偿网络根据所有所述第五相关性向量对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到第二融合图像之前,将所述第五相关性向量和所述当前图像进行点乘计算得到新的当前图像;将所述第五相关性向量和所述图像变换后的参考图像进行点乘处理得到新的图像变换后的参考图像;采用第三运动补偿网络对所述新的当前图像和所述新的图像变换后的参考图像进行特征提取和相关性处理,得到所述新的当前图像和所述新的图像变换后的参考图像之间的第八相关性向量;
    所述采用第二运动补偿网络,根据所有所述第五相关性向量对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到第二融合图像包括:采用所述第二运动补偿网络,根据所有所述第八相关性向量对所述当前图像和所有所述图像变换后的参考图像进行融合处理得到所述第二融合图像。
  11. 一种网络训练方法,包括:
    采用权利要求1至5中任意一项所述的视频图像处理方法对当前图像和与所 述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像;其中,N为大于或等于1的整数;
    根据所述目标图像和对应的真实图像计算L2损失,计算第一信息熵损失,计算第一胶囊网络的第一重构损失,以及根据所述L2损失、所述第一信息熵损失和所述第一重构损失计算第一总损失;以及
    根据所述第一总损失更新所述第一胶囊网络、第一注意力网络、第一运动估计网络、第一运动补偿网络和超分辨率网络中需要训练的所有参数,继续执行所述采用权利要求1至5中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像的步骤,直到所述第一总损失小于或等于第一预设阈值。
  12. 根据权利要求11所述的网络训练方法,其中,所述计算第一信息熵损失包括以下任意一个:
    根据所述目标图像和所述真实图像计算所述第一信息熵损失;
    根据所述目标图像和所述当前图像计算所述第一信息熵损失;或者
    根据所述参考图像和图像变换后的参考图像计算所述第一信息熵损失。
  13. 根据权利要求11所述的网络训练方法,其中,所述计算第一胶囊网络的第一重构损失包括以下任意一个:
    根据所述参考图像和所述参考图像的特征向量计算所述第一重构损失;或者
    根据所述当前图像和所述当前图像的特征向量计算所述第一重构损失。
  14. 一种网络训练方法,包括:
    基于训练好的第一胶囊网络,采用权利要求1至5中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到N帧图像变换后的参考图像;其中,N为大于或等于1的整数;
    计算第一胶囊网络的第一重构损失,计算第二信息熵损失,以及根据所述第一重构损失和所述第二信息熵损失计算第二总损失;
    根据所述第二总损失更新所述第一胶囊网络、所述第一注意力网络和所述第一运动估计网络中需要训练的所有参数,继续执行所述采用权利要求1至5中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到N帧图像变换后的参考图像的步骤,直到所述第二总损失小于或等于第二预设阈值;
    采用权利要求1至5中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像;
    根据所述目标图像和对应的真实图像计算L2损失,计算第一信息熵损失,计算所述第一胶囊网络的第一重构损失,以及根据所述L2损失、所述第一信息熵损失和所述第一重构损失计算第一总损失;以及
    根据所述第一总损失更新第一胶囊网络、第一注意力网络、第一运动估计网络、第一运动补偿网络和超分辨率网络中需要训练的所有参数,继续执行所述采用权利要求1至5中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像的步骤,直到所述第一总损失小于或等于第一预设阈值。
  15. 根据权利要求14所述的网络训练方法,其中,所述计算第二信息熵损失包括:
    根据所述参考图像和所述图像变换后的参考图像计算所述第二信息熵损失。
  16. 根据权利要求14所述的网络训练方法,其中,所述计算第一信息熵损失包括以下任意一个:
    根据所述目标图像和所述真实图像计算所述第一信息熵损失;
    根据所述目标图像和所述当前图像计算所述第一信息熵损失;或者
    根据所述参考图像和所述图像变换后的参考图像计算所述第一信息熵损失。
  17. 根据权利要求14所述的网络训练方法,其中,所述计算第一胶囊网络的第一重构损失包括以下任意一个:
    根据所述参考图像和所述参考图像的特征向量计算所述第一重构损失;或者
    根据所述当前图像和所述当前图像的特征向量计算所述第一重构损失。
  18. 一种网络训练方法,包括:
    采用权利要求6至10中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像;其中,N为大于或等于1的整数;
    根据所述目标图像和对应的真实图像计算L2损失,计算第一信息熵损失,计算第二胶囊网络的第二重构损失,以及根据所述L2损失、所述第一信息熵损失和所述第二重构损失计算第三总损失;以及
    根据所述第三总损失更新第二胶囊网络、第二注意力网络、第三运动估计网络、第二运动补偿网络和超分辨率网络中需要训练的所有参数,继续执行所述采用权利要求6至10中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像的步骤,直到所述第三总损失小于或等于第三预设阈值。
  19. 根据权利要求18所述的网络训练方法,其中,所述计算第一信息熵损失包括以下任意一个:
    根据所述目标图像和所述真实图像计算所述第一信息熵损失;
    根据所述目标图像和所述当前图像计算所述第一信息熵损失;或者
    根据所述参考图像和图像变换后的参考图像计算所述第一信息熵损失。
  20. 根据权利要求18所述的网络训练方法,其中,所述计算第二胶囊网络的第二重构损失包括以下任意一个:
    根据所述图像变换后的参考图像和所述图像变换后的参考图像的特征向量计算所述第二重构损失;或者
    根据所述当前图像和所述当前图像的特征向量计算所述第二重构损失。
  21. 一种网络训练方法,包括:
    采用权利要求6至10中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到N帧图像变换后的参考图像;其中,N为大于或等于1的整数;
    计算第二信息熵损失,根据所述第二信息熵损失更新所述第三运动估计网络中需要训练的所有参数,继续执行所述采用权利要求6至10中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到N帧图像变换后的参考图像的步骤,直到所述第二信息熵损失小于或等于第四预设阈值;
    采用权利要求6至10中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到所述当前图像的特征向量以及每一帧所述图像变换后的参考图像的特征向量;
    计算第二信息熵损失,计算第二胶囊网络的第二重构损失,以及根据所述第二信息熵损失和所述第二重构损失计算第四总损失;
    根据所述第四总损失更新所述第三运动估计网络和所述第二胶囊网络中需要训练的参数,继续执行所述采用权利要求6至10中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到所述当前图像的特征向量以及每一帧所述图像变换后的参考图像的特征向量的步骤,直到所述第四总损失小于或等于第五预设阈值;
    采用权利要求6至10中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像;
    根据所述目标图像和对应的真实图像计算L2损失,计算第一信息熵损失,计 算所述第二胶囊网络的第二重构损失,以及根据所述L2损失、所述第一信息熵损失和所述第二重构损失计算第三总损失;以及
    根据所述第三总损失更新第二胶囊网络、第二注意力网络、第三运动估计网络、第二运动补偿网络和超分辨率网络中需要训练的所有参数,继续执行所述采用权利要求6至10中任意一项所述的视频图像处理方法对当前图像和与所述当前图像相邻的N帧参考图像进行处理得到超分辨率的目标图像的步骤,直到所述第三总损失小于或等于第三预设阈值。
  22. 根据权利要求21所述的网络训练方法,其中,所述计算第二信息熵损失包括:
    根据所述参考图像和所述图像变换后的参考图像计算所述第二信息熵损失。
  23. 根据权利要求21所述的网络训练方法,其中,所述计算第一信息熵损失包括以下任意一个:
    根据所述目标图像和所述真实图像计算所述第一信息熵损失;
    根据所述目标图像和所述当前图像计算所述第一信息熵损失;或者
    根据所述参考图像和所述图像变换后的参考图像计算所述第一信息熵损失。
  24. 根据权利要求21所述的网络训练方法,其中,所述计算第二胶囊网络的第二重构损失包括以下任意一个:
    根据所述图像变换后的参考图像和所述图像变换后的参考图像的特征向量计算所述第二重构损失;或者
    根据所述当前图像和所述当前图像的特征向量计算所述第二重构损失。
  25. 一种电子设备,包括:
    至少一个处理器;以及
    存储器,所述存储器上存储有至少一个计算机程序,当所述至少一个计算机程序被所述至少一个处理器执行时,实现权利要求1至10中任意一项所述的视频图像处理方法、或权利要求11至24中任意一项所述的网络训练方法。
  26. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现权利要求1至10中任意一项所述的视频图像处理方法、或权利要求11至24中任意一项所述的网络训练方法。
PCT/CN2022/114827 2021-08-25 2022-08-25 视频图像处理方法、网络训练方法、电子设备、和计算机可读存储介质 WO2023025245A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110985417.8A CN115731098A (zh) 2021-08-25 2021-08-25 视频图像处理方法、网络训练方法、电子设备、介质
CN202110985417.8 2021-08-25

Publications (1)

Publication Number Publication Date
WO2023025245A1 true WO2023025245A1 (zh) 2023-03-02

Family

ID=85291053

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114827 WO2023025245A1 (zh) 2021-08-25 2022-08-25 视频图像处理方法、网络训练方法、电子设备、和计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN115731098A (zh)
WO (1) WO2023025245A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160358314A1 (en) * 2015-06-03 2016-12-08 Zhengping Ji Method and apparatus of multi-frame super resolution robust to local and global motion
CN111696035A (zh) * 2020-05-21 2020-09-22 电子科技大学 一种基于光流运动估计算法的多帧图像超分辨率重建方法
CN112734644A (zh) * 2021-01-19 2021-04-30 安徽工业大学 一种多个注意力结合光流的视频超分辨模型及方法

Also Published As

Publication number Publication date
CN115731098A (zh) 2023-03-03

Similar Documents

Publication Publication Date Title
CN109118431B (zh) 一种基于多记忆及混合损失的视频超分辨率重建方法
CN110324664B (zh) 一种基于神经网络的视频补帧方法及其模型的训练方法
WO2021208122A1 (zh) 基于深度学习的视频盲去噪方法及装置
CN106600536B (zh) 一种视频图像超分辨率重建方法及装置
CN111028150B (zh) 一种快速时空残差注意力视频超分辨率重建方法
CN107274347A (zh) 一种基于深度残差网络的视频超分辨率重建方法
Li et al. Face hallucination based on sparse local-pixel structure
CN106210449B (zh) 一种多信息融合的帧率上变换运动估计方法及系统
Wen et al. VIDOSAT: High-dimensional sparsifying transform learning for online video denoising
Kaviani et al. Frame rate upconversion using optical flow and patch-based reconstruction
CN104202603B (zh) 一种应用于视频帧速率上转换的运动向量场生成方法
Shimano et al. Video temporal super-resolution based on self-similarity
CN105513033A (zh) 一种非局部联合稀疏表示的超分辨率重建方法
CN115578255A (zh) 一种基于帧间亚像素块匹配的超分辨率重建方法
Li et al. Space–time super-resolution with patch group cuts prior
CN104376544B (zh) 一种基于多区域尺度放缩补偿的非局部超分辨率重建方法
WO2022111208A1 (zh) 一种视频帧频提升方法、装置、设备及介质
CN112801876B (zh) 信息处理方法、装置及电子设备和存储介质
CN112396554A (zh) 一种基于生成对抗网络的图像超分辨率算法
WO2023025245A1 (zh) 视频图像处理方法、网络训练方法、电子设备、和计算机可读存储介质
Ye et al. SNR-Prior Guided Trajectory-Aware Transformer for Low-Light Video Enhancement
CN107155096B (zh) 一种基于半误差反向投影的超分辨率重建方法及装置
Dong et al. Pyramid convolutional network for colorization in monochrome-color multi-lens camera system
CN109788297B (zh) 一种基于元胞自动机的视频帧率上转换方法
CN108681988B (zh) 一种基于多幅图像的鲁棒的图像分辨率增强方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22860591

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22860591

Country of ref document: EP

Kind code of ref document: A1