WO2023133888A1 - Image processing method, apparatus, remote control device, system, and storage medium - Google Patents

Image processing method, apparatus, remote control device, system, and storage medium

Info

Publication number
WO2023133888A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
processed
video
feature
reference frame
Application number
PCT/CN2022/072348
Other languages
English (en)
French (fr)
Inventor
汪海 (Wang Hai)
郭靖宇 (Guo Jingyu)
杨文明 (Yang Wenming)
张李亮 (Zhang Liliang)
赵亮 (Zhao Liang)
郑萧桢 (Zheng Xiaozhen)
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
清华大学深圳国际研究生院 (Tsinghua Shenzhen International Graduate School)
Application filed by 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.) and 清华大学深圳国际研究生院 (Tsinghua Shenzhen International Graduate School)
Priority to PCT/CN2022/072348
Publication of WO2023133888A1


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Definitions

  • The present application relates to the technical field of image processing, and in particular to an image processing method, an apparatus, a remote control device, a system, and a storage medium.
  • In the related art, one approach is to reconstruct the image (or video frame), for example by inputting the image (or video frame) into a deep neural network with strong nonlinear modeling capability for reconstruction processing to obtain the reconstructed image.
  • However, a single image (or video frame) provides only limited information, and it is difficult to reconstruct a high-quality image (or video frame) from it even with a deep neural network.
  • one of the objectives of the present application is to provide an image processing method, device, remote control device, system and storage medium.
  • In a first aspect, an embodiment of the present application provides an image processing method, including the steps described below.
  • In a second aspect, an embodiment of the present application provides an image processing apparatus, comprising: a memory storing executable instructions; and one or more processors. When the one or more processors execute the executable instructions, they are individually or collectively configured to execute the method described in the first aspect.
  • In a third aspect, an embodiment of the present application provides a remote control device, including the image processing apparatus described in the second aspect.
  • In a fourth aspect, an embodiment of the present application provides an image processing system, including a movable platform and the remote control device described in the third aspect;
  • the movable platform is equipped with a photographing device, and the photographing device is used to collect video frame sequences during the movement of the movable platform;
  • the movable platform is used to compress the sequence of video frames to obtain a compressed video stream, and transmit the compressed video stream to the image processing device.
  • In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium that stores executable instructions; when the executable instructions are executed by a processor, the method described in the first aspect is implemented.
  • With the image processing method, apparatus, remote control device, system, and storage medium provided in the embodiments of the present application, after the compressed video stream is decompressed, the decompressed frame to be processed, at least one reference frame of the frame to be processed, and the motion vectors between the reference frames and the frame to be processed generated during the compression process can be obtained; at least one motion vector and at least one reference frame can then be used to reconstruct the frame to be processed. Guided by the motion vectors, the reference frames are fully utilized to supplement the reconstruction process of the frame to be processed with additional information, so that a target frame whose image quality is higher than that of the frame to be processed can be obtained.
  • FIG. 1 is a product schematic diagram of an unmanned aerial system provided by an embodiment of the present application.
  • FIG. 2 is a schematic flow diagram of video encoding provided by an embodiment of the present application.
  • FIG. 3 is a schematic flow diagram of an image processing method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a first data fusion network and a video frame reconstruction network provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a second data fusion network and a video frame reconstruction network provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a third data fusion network and a video frame reconstruction network provided by an embodiment of the present application.
  • FIG. 7A and FIG. 7B are schematic structural diagrams of two different video frame restoration networks provided by an embodiment of the present application.
  • FIG. 8A and FIG. 8B are schematic structural diagrams of a data fusion network, a video frame reconstruction network, and a video frame restoration network provided by an embodiment of the present application, where the video frames processed by the video frame restoration network differ between FIG. 8A and FIG. 8B.
  • FIG. 9 is a schematic structural diagram of another data fusion network, video frame reconstruction network, and video frame restoration network provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • the embodiment provides an image processing method.
  • After decompressing a compressed video stream, the decompressed frame to be processed, at least one reference frame of the frame to be processed, and the motion vectors between the reference frames and the frame to be processed generated during the compression process can be obtained; at least one motion vector and at least one reference frame can then be used to perform reconstruction processing on the frame to be processed. Under the guidance of the motion vectors, the reference frames are fully utilized to provide more information for the reconstruction process of the frame to be processed, so that a target frame with higher image quality than the frame to be processed can be obtained.
  • the image processing method provided by the embodiments of the present application may be applied to an image processing device.
  • The image processing device can be an electronic device with data processing capability; it can also be a computer chip or an integrated circuit with data processing capability, such as a central processing unit (Central Processing Unit, CPU), a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or a field-programmable gate array (Field-Programmable Gate Array, FPGA), etc.
  • Examples of electronic devices include, but are not limited to: smartphones/cell phones, tablet computers, personal digital assistants (PDAs), laptop computers, desktop computers, media content players, video game stations/systems, virtual reality systems, augmented reality systems, wearable devices (e.g., watches, glasses, gloves, headgear such as hats, helmets, virtual reality headsets, augmented reality headsets, head-mounted devices (HMDs) and headbands, pendants, armbands, leg rings, shoes, or vests), remote control devices (such as remote controllers), or any other type of device.
  • the image processing device when the image processing device is a computer chip or an integrated circuit with data processing capability, the image processing device may be installed in an electronic device.
  • The compressed video stream obtained by the image processing device may originate from a movable platform that captures a video frame sequence with its on-board photographing device while moving or stationary, after which the movable platform compresses the video frame sequence and transmits it.
  • Examples of the movable platform include, but are not limited to, unmanned aerial vehicles, unmanned vehicles, gimbals, unmanned ships, mobile robots (such as sweeping robots), and the like.
  • the movable platform is an unmanned aerial vehicle (UAV)
  • the image processing device is a remote control device for an unmanned aerial vehicle.
  • FIG. 1 shows an unmanned aerial system.
  • the UAV 110 is communicatively connected to the remote control device 120 .
  • the UAV 110 can be operated by the remote control device 120 and its own program control device, and can fly under automatic or semi-automatic control.
  • The unmanned aerial vehicle 110 includes a flight controller, which can control the unmanned aerial vehicle according to pre-programmed instructions and can also control the unmanned aerial vehicle in response to one or more remote control signals from the remote control device 120.
  • the UAV 110 is provided with a photographing device 111.
  • the photographing device 111 can be, for example, a camera or video camera, etc., for capturing images.
  • the photographing device 111 can communicate with the UAV 110 and take pictures under the control of the UAV 110.
  • The photographing device 111 of this embodiment includes at least a photosensitive element, such as a complementary metal-oxide-semiconductor (Complementary Metal Oxide Semiconductor, CMOS) sensor or a charge-coupled device (Charge-coupled Device, CCD) sensor. It can be understood that the photographing device 111 may be directly fixed to the UAV 110, or may be mounted on the UAV 110 through a gimbal (pan/tilt).
  • the remote control device 120 can control the UAV 110 to fly, and control the camera 111 in the UAV 110 to collect video frames.
  • For example, the photographing device 111 can collect a video frame sequence during the flight of the unmanned aerial vehicle 110, and the unmanned aerial vehicle 110 then sends the video frame sequence collected by the photographing device 111 to the remote control device 120. The remote control device 120 can be provided with a display 121, and the video frame sequence captured by the photographing device 111 can be displayed on the display 121.
  • In practical applications, to reduce the amount of transmitted data, the unmanned aerial vehicle 110 compresses the video frame sequence collected by the photographing device 111 and then sends the resulting compressed video stream, which carries less data, to the remote control device 120.
  • After the remote control device 120 receives the compressed video stream, a decoder can be used to decode it, and the decoded video frame sequence is displayed on the display 121 of the remote control device 120.
  • The image processing method provided by the embodiments of the present application can be used to obtain the decompressed frame to be processed, at least one reference frame of the frame to be processed, and the motion vectors between the reference frames and the frame to be processed generated during the compression process; the frame to be processed is then reconstructed according to at least one motion vector and at least one reference frame to obtain a target frame whose image quality is higher than that of the frame to be processed, and the target frame is displayed on the display, thereby improving the user's visual experience.
  • FIG. 2 shows a coding flow chart.
  • the prediction includes intra-frame prediction and inter-frame prediction, and its purpose is to use prediction block information to remove redundant information of the current image to be encoded.
  • For intra-frame prediction: each frame of a video can be regarded as an independent image, and an image contains a certain amount of spatial redundancy; for example, in the sky regions that often appear in image or video backgrounds, the interior pixels are very similar to one another, and such regions provide a large compression opportunity for image or video encoding.
  • Intra-frame prediction is used to remove spatial redundancy within each frame.
  • Intra-frame prediction uses information from the current frame image to obtain the prediction block data. The process includes dividing the image to be encoded into several image blocks to be encoded; then, for each image block to be encoded, the adjacent encoded image blocks are used to generate the prediction block of the current image block to be encoded.
  • For inter-frame prediction: the process includes dividing the image to be encoded into several image blocks to be encoded; then, for each image block to be encoded, the reference frame is searched for the best match with the current image block to be encoded (i.e., the most similar image block), which is used as the prediction block, and the relative displacement between the prediction block and the current image block to be encoded is the motion vector.
  • the reference frame may be an encoded image adjacent to the image to be encoded.
  • An image frame that only uses intra-frame prediction mode in encoding is called an I frame, and an image frame that uses both intra-frame prediction and inter-frame prediction is called a P or B frame.
  • The corresponding pixel values of the prediction block are subtracted from the image block to be coded to obtain a residual block.
  • The transformation converts the residual block from the time domain to the frequency domain, so that the residual block can be further analyzed in the frequency domain; the residual block can be transformed using a transformation matrix.
  • The transformation of the residual block usually adopts a two-dimensional transform: at the encoding end, the residual values in the residual block are multiplied by an N×N transformation matrix and its transpose, and the transform coefficients are obtained after the multiplication.
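  • As a concrete illustration of this two-dimensional transform, the following is a minimal sketch that applies an orthonormal DCT-II matrix to a residual block; real codecs use integer approximations of such matrices, so this is illustrative only (NumPy is assumed).

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    # Orthonormal N x N DCT-II basis; codecs use integer approximations of this.
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    t = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    t[0, :] /= np.sqrt(2.0)
    return t

def transform_residual(residual: np.ndarray) -> np.ndarray:
    # Two-dimensional transform: multiply by the N x N matrix and its transpose.
    t = dct_matrix(residual.shape[0])
    return t @ residual @ t.T
```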
  • The video content comes from the real world, and it cannot be guaranteed that all the information it contains can be perceived by the human eye; therefore, the video can be appropriately simplified according to the characteristics of how the human eye perceives light signals, in order to remove visual redundancy.
  • Quantization is used to remove this visual redundancy based on the human eye: the transform coefficients obtained after transformation can be quantized with quantization parameters to obtain quantized coefficients, and the quantization process further improves coding efficiency.
  • the quantization parameter includes but is not limited to a quantization parameter (Quantization Parameter, QP) or a quantization matrix (Quantization Matrix, QM).
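  • A minimal sketch of scalar quantization with a single quantization step follows (the mapping from a QP or QM to a step size is codec-specific and omitted here); the rounding is what discards the visual redundancy, and the loss is irreversible.

```python
import numpy as np

def quantize(coeffs: np.ndarray, qstep: float) -> np.ndarray:
    # Larger steps discard more detail but produce smaller quantized coefficients.
    return np.round(coeffs / qstep).astype(np.int32)

def dequantize(levels: np.ndarray, qstep: float) -> np.ndarray:
    # The decoder recovers only an approximation of the original coefficients.
    return levels.astype(np.float64) * qstep
```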
  • Entropy coding is used to remove statistical redundancy: the quantized coefficients are entropy-coded, with shorter codewords assigned to values of higher probability and longer codewords assigned to values of lower probability, which improves compression efficiency.
  • The code stream obtained by entropy encoding, together with the encoded mode information such as the intra prediction mode, motion vector information, and quantization parameters, is sent to the decoding end (such as the above-mentioned image processing device).
  • The quantized coefficients undergo inverse quantization and inverse transformation to obtain the reconstructed residual block, which is then added to the corresponding prediction block to obtain a reconstructed frame. After loop filtering, the reconstructed frame is used as a reference frame for other images to be encoded, i.e., for the inter-frame prediction of other images to be encoded.
  • FIG. 3 is a schematic flowchart of an image processing method provided in an embodiment of the present application. The method is applied to an image processing device, and the method includes:
  • Step S101: after decompressing the compressed video stream, obtain the decompressed frame to be processed, at least one adjacent frame of the frame to be processed, and the motion vectors between the adjacent frames and the frame to be processed generated during the compression process.
  • Step S102: reconstruct the frame to be processed according to at least one motion vector and at least one adjacent frame to obtain a target frame, where the image quality of the target frame is higher than the image quality of the frame to be processed.
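  • The two steps can be outlined as follows; the decoder interface and the fuse_and_reconstruct function are hypothetical stand-ins for the components described in the remainder of this section, so this is a sketch rather than a definitive implementation.

```python
def reconstruct_stream(compressed_stream, decoder, fuse_and_reconstruct):
    # Step S101: decoding yields the frames to be processed, their adjacent
    # (reference) frames, and the motion vectors generated during compression.
    frames, motion_vectors = decoder.decode(compressed_stream)
    targets = []
    for t in range(1, len(frames) - 1):
        refs = [frames[t - 1], frames[t + 1]]
        mvs = [motion_vectors[(t - 1, t)], motion_vectors[(t, t + 1)]]
        # Step S102: motion-vector-guided reconstruction of the frame to be
        # processed; the target frame has higher image quality than the input.
        targets.append(fuse_and_reconstruct(frames[t], refs, mvs))
    return targets
```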
  • Under the guidance of the motion vectors, the reference frames are used to provide more supplementary information for the reconstruction process of the frame to be processed, so as to improve the image quality of the reconstructed target frame.
  • In some embodiments, the image processing device is a remote control device of a movable platform, or is installed in the remote control device as a processing chip; the movable platform includes, but is not limited to, an unmanned aerial vehicle (UAV), an unmanned vehicle, an unmanned ship, a mobile robot, a sweeping robot, or the like.
  • the movable platform communicates with the remote control device
  • the movable platform is equipped with a shooting device.
  • During the movement of the movable platform, the shooting device on the movable platform collects a video frame sequence; the movable platform then compresses the video frame sequence collected by the shooting device to obtain a compressed video stream and transmits it to the remote control device, and the remote control device thereby obtains the compressed video stream.
  • the compressed video stream may also be obtained by the image processing apparatus from other media such as a server.
  • After the image processing device obtains the compressed video stream, it can use a decoder to decompress it. For example, referring to the above video encoding process, after obtaining the compressed video stream the decoder performs entropy decoding, inverse quantization, and inverse transformation to obtain the corresponding residual block; obtains the corresponding prediction block according to the decoded information such as the motion vector or the intra-frame prediction mode; obtains the reconstructed value of each pixel in the current image block from the prediction block and the residual block; and outputs the decompressed video frame sequence. For each video frame in the decompressed video frame sequence, the image processing method provided by the embodiment of the present application may be used to reconstruct that video frame, so as to obtain a video frame sequence with better image quality.
  • It should be noted that the motion vectors between the reference frames and the frame to be processed needed in the embodiment of the present application can be output by the decoder during the process of decompressing the compressed video stream, so obtaining the motion vectors requires no additional calculation, which helps improve reconstruction efficiency. Alternatively, in some other possible implementations, the motion vectors between the reference frames and the frame to be processed may be obtained by further processing the motion vectors output by the decoder during the decompression of the compressed video stream.
  • In this embodiment, the image processing device obtains the decompressed frame to be processed, at least one decompressed reference frame of the frame to be processed, and the motion vectors between the reference frames and the frame to be processed; it then uses the correlation between the reference frames and the frame to be processed in the time dimension to provide more information for the reconstruction process, performing reconstruction processing on the frame to be processed according to at least one motion vector and at least one reference frame to obtain a target frame whose image quality is higher than that of the frame to be processed, thereby improving the user's visual experience.
  • The image quality can be measured by parameters such as image resolution, image information, image texture, or image color, where the image information includes, but is not limited to, signal-to-noise ratio, image gradient, local variance, or mean square error (Mean Square Error, MSE).
  • That the image quality of the target frame is higher than that of the frame to be processed may mean that the resolution of the target frame is higher than the resolution of the frame to be processed, that the image texture, color information, and the like of the target frame are richer than those of the frame to be processed, or that the target frame carries more image information than the frame to be processed.
  • the embodiment of the present application does not impose any restrictions on the specific way of obtaining image information, and can be specifically selected according to the actual application scenario.
  • For example, when the image information is image gradient information, the Brenner gradient function, the Tenengrad gradient function, the Laplacian gradient function, or the energy gradient function can be used to obtain the image gradient information of the target frame or the frame to be processed.
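  • For instance, two of the gradient functions named above can be computed as follows; this is a minimal sketch assuming a grayscale image stored as a NumPy array, where larger values indicate a sharper frame carrying more gradient information.

```python
import numpy as np

def brenner_gradient(img: np.ndarray) -> float:
    # Brenner gradient: squared difference between pixels two columns apart.
    d = img[:, 2:].astype(np.float64) - img[:, :-2].astype(np.float64)
    return float(np.sum(d ** 2))

def energy_gradient(img: np.ndarray) -> float:
    # Energy gradient: sum of squared horizontal and vertical first differences.
    f = img.astype(np.float64)
    dx = f[:-1, 1:] - f[:-1, :-1]
    dy = f[1:, :-1] - f[:-1, :-1]
    return float(np.sum(dx ** 2 + dy ** 2))
```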
  • the reference frame may be an adjacent frame of the frame to be processed.
  • For example, the adjacent frames may include M video frames collected before the frame to be processed and/or N video frames collected after the frame to be processed, where M and N are integers greater than 0. It can be understood that the embodiment of the present application does not limit the numbers M and N of adjacent frames to be acquired, which can be set according to the actual application scenario; for example, one or more video frames collected before the frame to be processed may be acquired, and one or more video frames collected after the frame to be processed may also be acquired.
  • Alternatively, the adjacent frames include the M-th video frame collected before the frame to be processed and/or the N-th video frame collected after the frame to be processed, where M and N are integers greater than 0. For example, taking the frame to be processed as the 0th frame, the adjacent frame may be the first image frame collected before the frame to be processed, or the second image frame collected before it, selected according to the actual application scenario.
  • the reference frame may be a video frame having the same target object as the frame to be processed, so as to facilitate obtaining a target frame with a better display effect of the target object.
  • the target object includes but is not limited to a person, a building, or a designated object and the like.
  • the image processing device may fuse at least one reference frame and the frame to be processed according to at least one motion vector, and perform reconstruction processing according to the fusion result.
  • the fusion of the frame to be processed and the reference frame is implemented under the guidance of the motion vector, so blurred reconstruction results can be avoided, and it is beneficial to obtain a target frame with better image quality.
  • For example, the image processing apparatus may perform an affine transformation on the reference frame according to the motion vector, and perform fusion processing on the transformed reference frame and the frame to be processed. It can be understood that this embodiment does not limit the specific implementation of the fusion processing, which can be set according to the actual application scenario; for example, the pixel values of pixels at the same positions in the transformed reference frame and the frame to be processed can be added and then averaged to obtain the fused result.
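  • A sketch of this idea in PyTorch: the motion vectors are treated as a dense displacement field used to warp the reference frame onto the frame to be processed, and the two are then averaged; the use of grid_sample for the warp is an implementation choice assumed here, not one mandated by the text.

```python
import torch
import torch.nn.functional as F

def warp(ref: torch.Tensor, mv: torch.Tensor) -> torch.Tensor:
    # ref: (1, C, H, W); mv: (1, 2, H, W) displacement in pixels (x, y order).
    _, _, h, w = ref.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    gx = (xs + mv[:, 0]) / (w - 1) * 2 - 1   # normalize to [-1, 1] for grid_sample
    gy = (ys + mv[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((gx, gy), dim=-1)      # (1, H, W, 2)
    return F.grid_sample(ref, grid, mode="bilinear", align_corners=True)

def fuse(frame: torch.Tensor, ref: torch.Tensor, mv: torch.Tensor) -> torch.Tensor:
    # Average pixel values at the same positions, as described above.
    return 0.5 * (frame + warp(ref, mv))
```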
  • In this way, under the guidance of the motion vector, the reference frame is used to provide more supplementary information for the frame to be processed, supplying rich information for the subsequent reconstruction process.
  • In some embodiments, the image processing device performs feature extraction on the reference frame at least according to the motion vector between the reference frame and the frame to be processed, to obtain a first feature; the image processing device also performs feature extraction on the frame to be processed to obtain a second feature; at least one first feature and the second feature are then fused.
  • In this way, the first feature and the second feature are extracted separately, and under the guidance of the motion vector, the effective information (the first feature and the second feature) in the frame to be processed and the reference frame is fused, rather than all of the information; on the basis of providing rich features for the subsequent reconstruction process to improve the image quality of the target frame, this also reduces the amount of data in the subsequent reconstruction process, which helps improve reconstruction efficiency.
  • The information extracted by the feature extraction process includes, but is not limited to, edge features, shape (contour) features, color features, or texture features. It can be understood that the embodiment of the present application does not limit the method used for feature extraction, which can be set according to the actual application scenario, such as a convolution operation, HOG (Histogram of Oriented Gradients), SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), or DoG (Difference of Gaussians).
  • In a possible implementation, the image processing device may perform fusion processing on the frame to be processed, the reference frame, and the motion vector between the reference frame and the frame to be processed to obtain fusion data, and then perform feature extraction on the fusion data to obtain the first feature.
  • the extracted first features include feature information of the motion vector, feature information of the reference frame, and feature information of the frame to be processed.
  • the image processing apparatus may perform affine transformation on the reference frame according to the motion vector, and perform feature extraction on the transformed reference frame to obtain the first feature.
  • the extracted first feature includes feature information of the reference frame transformed by the motion vector.
  • the image processing device may use the fused result to perform reconstruction processing.
  • The image processing device can also perform dimensionality reduction on the fused result and use the dimension-reduced result for reconstruction processing, which helps reduce the amount of computed data and improve reconstruction processing efficiency.
  • For example, the fused result includes a third feature obtained by fusing at least one first feature with the second feature; the image processing device may perform dimensionality reduction on the third feature and use the dimension-reduced third feature for reconstruction processing.
  • the embodiment of the present application does not impose any limitation on the specific method of dimensionality reduction processing, which can be specifically set according to actual application scenarios, for example, the dimensionality reduction processing of the third feature can be performed by pooling method or convolution operation.
  • In some embodiments, the image processing device can obtain the quantization parameter related to the frame to be processed generated during the compression process, and then reconstruct the frame to be processed according to the motion vector, the reference frame, and the quantization parameter. The quantization parameter reflects the degree of degradation of the decompressed video frame during compression, so it can well guide the reconstruction process of the frame to be processed and further improve the image quality of the reconstructed target frame.
  • For example, the image processing device may fuse at least one of the reference frames and the frame to be processed according to at least one of the motion vectors to obtain a first fusion result, further fuse the first fusion result with the quantization parameter, and perform reconstruction processing according to the fusion result.
  • For another example, the image processing device may perform feature extraction on the reference frame at least according to the motion vector between the reference frame and the frame to be processed to obtain the first feature, and perform feature extraction on the frame to be processed to obtain the second feature; the quantization parameter related to the frame to be processed, at least one first feature, and the second feature are then fused, and reconstruction processing is performed according to the fusion result.
  • In some embodiments, the fused result can be input into a pre-established video frame reconstruction network, and reconstruction processing is performed through the video frame reconstruction network to obtain a target frame whose image quality is higher than that of the frame to be processed. It can be understood that the embodiment of the present application does not limit the specific structure of the video frame reconstruction network, which can be set according to the actual application scenario.
  • the video frame reconstruction network is used to restore the frame to be processed, so that the acquired target frame can be close to the video frame captured by the shooting device.
  • the video frame reconstruction network is used to perform super-resolution reconstruction processing on the frame to be processed, so that the resolution of the obtained target frame is higher than the resolution of the frame to be processed.
  • In an example, the training samples may be fusion data obtained by fusing a decompressed video frame with at least one reference frame of the video frame under the guidance of the relevant motion vectors, or fusion data obtained from the decompressed video frame, at least one reference frame of the video frame, the motion vectors between the video frame and the reference frames, and the quantization parameter of the decompressed video frame; the label is the restored video frame or the super-resolution video frame. A number of fusion data samples are input into the video frame reconstruction network, which performs reconstruction processing on them to obtain predicted video frames. If the purpose is restoration, the parameters of the video frame reconstruction network can be adjusted according to the difference between the restored video frame and the predicted video frame, to obtain a video frame reconstruction network for restoring video frames; if the purpose is super-resolution reconstruction, the parameters can be adjusted according to the difference between the super-resolution video frame and the predicted video frame, to obtain a video frame reconstruction network for performing super-resolution reconstruction on video frames.
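  • A minimal training step consistent with this description might look as follows; the L1 loss and the optimizer choice are assumptions, since the text only says the network parameters are adjusted according to the difference between the label and the predicted frame.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(recon_net: nn.Module, fused: torch.Tensor, label: torch.Tensor,
               optimizer: torch.optim.Optimizer) -> float:
    # `fused` is one fusion data sample; `label` is the restored or
    # super-resolution video frame, depending on the training goal.
    pred = recon_net(fused)
    loss = F.l1_loss(pred, label)   # difference between label and prediction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```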
  • FIG. 4 shows a schematic diagram of a data fusion network 100 and a video frame reconstruction network 200 .
  • The data fusion network 100 is used to fuse the information of video frame t-1, video frame t, and video frame t+1, using the motion vector V_{t-1→t} from video frame t-1 to frame t and the motion vector V_{t→t+1} from frame t to frame t+1 generated by the compression process as guidance, to assist the restoration or super-resolution of video frame t.
  • The video frame reconstruction network 200 is used to restore or super-resolve video frame t using the information obtained after fusing video frame t-1, video frame t, and video frame t+1.
  • The sizes of video frame t-1, video frame t, and video frame t+1 are all C_1 × H × W, where C_1 denotes the number of channels, H the height of the video frame, and W its width; the specific values can be set according to the actual application scenario. The size of the motion vectors V_{t-1→t} and V_{t→t+1} is 2 × H × W, and their specific values can likewise be set according to the actual application scenario.
  • the data fusion network 100 includes one or more first fusion layers 10 , one or more first convolutional layers 20 , a second fusion layer 30 and a second convolutional layer 40 .
  • the number of first fusion layers 10 is determined according to the number of reference frames
  • the number of first convolutional layers 20 is determined according to the total number of reference frames and frames to be processed.
  • For video frame t-1: video frame t-1, video frame t, and the motion vector V_{t-1→t} are fused through the first fusion layer 10 to obtain fusion data.
  • For example, the first fusion layer 10 can concatenate video frame t, video frame t-1, and the motion vector V_{t-1→t} along the channel dimension to obtain the fusion data (a tensor of size C_2 × H × W); the first convolutional layer 20 then performs feature extraction on the fusion data to obtain the first feature (a tensor of size C × H × W).
  • For video frame t: the first convolutional layer 20 is used to extract the features of video frame t to obtain the second feature (a tensor of size C × H × W).
  • For video frame t+1: video frame t, video frame t+1, and the motion vector V_{t→t+1} are fused through the first fusion layer 10 to obtain fusion data; for example, the first fusion layer 10 can concatenate video frame t, video frame t+1, and the motion vector V_{t→t+1} along the channel dimension, and the first convolutional layer 20 then performs feature extraction on the fused data to obtain the first feature (a tensor of size C × H × W).
  • Afterwards, the two first features and the second feature can be fused through the second fusion layer 30 to obtain the third feature; for example, the second fusion layer 30 concatenates the two first features and the second feature along the channel dimension to obtain the third feature (a tensor of size 3C × H × W).
  • Optionally, the second convolutional layer 40 can be used to reduce the channel dimension, reducing the third feature from a 3C × H × W tensor to a C × H × W tensor; the dimension-reduced third feature is then input into the pre-established video frame reconstruction network 200, and the target frame is obtained through the reconstruction processing of the video frame reconstruction network 200.
  • the third feature obtained by the second fusion layer 30 may also be directly input into the video frame reconstruction network 200, which is not limited in this embodiment.
  • the final output target frame can be the restoration result or the super-resolution result corresponding to the video frame t, which depends on the specific structure of the video frame reconstruction network.
  • The size of the output target frame is C × mH × mW, where m denotes the magnification factor and can be set according to the actual application scenario; for example, for the restoration task the value of m is 1, while for the super-resolution task, which usually performs 4× super-resolution, the value of m is 4.
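  • The FIG. 4 pipeline can be sketched as follows; the concatenation pattern and the 3C → C reduction follow the text, while the 3×3 and 1×1 kernel sizes and the default channel counts C_1 = 3, C = 64 are assumptions.

```python
import torch
import torch.nn as nn

class DataFusionNet(nn.Module):
    # Sketch of data fusion network 100: first fusion layers concatenate along
    # the channel dimension, first convolutional layers extract features, the
    # second fusion layer concatenates the features, and the second
    # convolutional layer reduces 3C channels back to C.
    def __init__(self, c1: int = 3, c: int = 64):
        super().__init__()
        self.conv_prev = nn.Conv2d(2 * c1 + 2, c, 3, padding=1)  # t, t-1, MV
        self.conv_cur = nn.Conv2d(c1, c, 3, padding=1)           # t alone
        self.conv_next = nn.Conv2d(2 * c1 + 2, c, 3, padding=1)  # t, t+1, MV
        self.reduce = nn.Conv2d(3 * c, c, 1)                     # 3C -> C

    def forward(self, prev, cur, nxt, mv_prev, mv_next):
        f_prev = self.conv_prev(torch.cat([cur, prev, mv_prev], dim=1))
        f_cur = self.conv_cur(cur)
        f_next = self.conv_next(torch.cat([cur, nxt, mv_next], dim=1))
        third = torch.cat([f_prev, f_cur, f_next], dim=1)   # 3C x H x W
        return self.reduce(third)  # fed to the video frame reconstruction network
```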
  • In the above manner, the supplementary information of the frames before and after video frame t is effectively fused, providing rich features for the subsequent video frame reconstruction network and enhancing the quality of the output target frame; moreover, the inter-frame information fusion is realized under the guidance of motion vectors, which avoids blurred reconstruction results.
  • the data fusion network 100 includes an affine transformation module 50 , a first convolutional layer 20 , a second fusion layer 30 and a second convolutional layer 40 .
  • the number of affine transformation modules 50 is determined according to the number of reference frames.
  • For video frame t-1: in the affine transformation module 50, the motion vector V_{t-1→t} is used to apply an affine transformation to video frame t-1; the first convolutional layer 20 then performs feature extraction on the transformed video frame t-1 to obtain the first feature (a tensor of size C × H × W). For video frame t, the first convolutional layer 20 is used to extract its features to obtain the second feature (a tensor of size C × H × W).
  • For video frame t+1: in the affine transformation module 50, the motion vector V_{t→t+1} is used to apply an affine transformation to video frame t+1; the first convolutional layer 20 then performs feature extraction on the transformed video frame t+1 to obtain the first feature (a tensor of size C × H × W). The subsequent operations are similar to those in FIG. 4: the two first features and the second feature can be fused through the second fusion layer 30 to obtain the third feature, the second convolutional layer 40 is used to reduce the channel dimension of the third feature from a 3C × H × W tensor to a C × H × W tensor, and the dimension-reduced third feature is then input into the pre-established video frame reconstruction network 200, which performs reconstruction processing to obtain the target frame.
  • In the above manner, the supplementary information of the frames before and after video frame t is effectively fused, providing rich features for the subsequent video frame reconstruction network and enhancing the quality of the output target frame; moreover, the inter-frame information fusion is realized under the guidance of motion vectors, which avoids blurred reconstruction results.
  • Please refer to FIG. 6.
  • On the basis of FIG. 4, FIG. 6 further fuses the quantization parameter q of video frame t generated in the compression process.
  • The second fusion layer 30 is used to fuse the two first features, the second feature, and the quantization parameter q of video frame t from the compression process, to obtain a third feature that incorporates the quantization parameter. Optionally, this third feature can then be passed through the second convolutional layer 40 to reduce the channel dimension from a 3C × H × W tensor to a C × H × W tensor, after which the dimension-reduced third feature is input into the pre-established video frame reconstruction network 200 and the target frame is obtained through its reconstruction processing.
  • the third feature fused with quantization parameters can also be directly input into the video frame reconstruction network 200 for reconstruction without going through the dimensionality reduction process of the second convolutional layer 40 .
  • In the above manner, the quantization parameter generated in the compression process is introduced; since the quantization parameter reflects the degree of degradation of the decompressed video frame during compression, it can be used to guide the reconstruction process of the decompressed video frame well, enhancing the reconstruction effect and improving image quality.
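  • One plausible way to fuse a scalar quantization parameter with image-shaped features, under the assumption that q is broadcast to a constant plane and concatenated along the channel dimension like any other feature map:

```python
import torch

def fuse_with_qp(features: torch.Tensor, q: float) -> torch.Tensor:
    # features: (N, C, H, W). Broadcast q to an (N, 1, H, W) plane so the
    # second fusion layer can concatenate it along the channel dimension.
    n, _, h, w = features.shape
    q_plane = features.new_full((n, 1, h, w), float(q))
    return torch.cat([features, q_plane], dim=1)
```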
  • In some embodiments, the image processing device may first perform restoration processing on the frame to be processed: the image processing device obtains the quantization parameter related to the frame to be processed generated during the compression process, and then restores the frame to be processed according to that quantization parameter, where the image quality of the restored frame to be processed is higher than that of the frame to be processed before restoration; the image processing device may then use the motion vector, the reference frame, and the restored frame to be processed to perform reconstruction processing to obtain the target frame.
  • In this way, the quantization parameter generated in the compression process is introduced into the restoration process; since the quantization parameter reflects the degree of degradation of the decompressed video frame during compression, it can be used to guide the restoration process of the decompressed video frame well, enhancing the restoration effect and improving image quality.
  • In some embodiments, restoration processing can also be performed on the reference frame. That is, the image processing device obtains the quantization parameter related to the frame to be processed and the quantization parameter related to the reference frame generated during the compression process; it then performs restoration processing on the frame to be processed according to the quantization parameter related to the frame to be processed, and performs restoration processing on the reference frame according to the quantization parameter related to the reference frame, where the image quality of the restored frame to be processed is higher than that of the frame to be processed before restoration, and the image quality of the restored reference frame is higher than that of the reference frame before restoration. The image processing device may then use the motion vector, the restored reference frame, and the restored frame to be processed to perform reconstruction processing to obtain the target frame.
  • restoration processing is performed on both the adjacent frame and the frame to be processed, which is beneficial to improve image quality.
  • In some embodiments, the image processing device may acquire the quantization parameter related to the frame to be processed generated during the compression process, and then perform restoration processing on the fused result according to the quantization parameter related to the frame to be processed; the image processing device can then perform reconstruction processing using the result of the restoration processing.
  • In this way, the quantization parameter generated during the compression process is introduced into the restoration process, so it can be used to guide the restoration process of the decompressed video frame well and enhance the restoration effect; moreover, only the fused result needs to be restored, which helps improve restoration efficiency.
  • After the image processing device obtains the compressed video stream, it can use the decoder to decompress it. Since the compressed video stream also carries the quantization parameter information related to the frame to be processed, the quantization parameter needed in the embodiment of the present application can be output by the decoder during the process of decompressing the compressed video stream, and obtaining the quantization parameter requires no additional calculation.
  • the quantization parameter includes but not limited to a quantization parameter (Quantization Parameter, QP) or a quantization matrix (Quantization Matrix, QM).
  • the quantization parameter is at least determined according to the channel quality of the channel used to transmit the frame to be processed.
  • As mentioned above, the video frame sequence is collected by the movable platform using its on-board shooting device during movement, and the video frame sequence is then compressed and transmitted by the movable platform.
  • During the compression process, the movable platform detects the channel quality of the channel between the movable platform and the image processing device, where the channel quality can be determined by at least one of the following channel parameters: signal strength, noise strength, signal-to-noise ratio, or channel capacity. The movable platform then determines the degree of quantization of the video frames in the video frame sequence according to the channel quality, so as to achieve good transmission of the compressed video stream.
  • In some embodiments, the quantization degree of the frame to be processed indicated by the quantization parameter is negatively correlated with the channel quality. If the channel quality of the channel between the movable platform and the image processing device is good (for example, higher than a preset value), the current channel can carry more data, and the movable platform can set a lower quantization degree for the corresponding video frame; if the channel quality is poor, the movable platform can set a higher quantization degree for the corresponding video frame. For example, the larger the quantization parameter, the greater the quantization loss; in other words, the greater the degradation of the frame to be processed, the smaller the quantized data volume of the frame to be processed.
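  • For illustration only, a mapping with invented thresholds that respects this negative correlation (a higher SNR, i.e. a better channel, gives a smaller quantization parameter):

```python
def pick_quantization_parameter(snr_db: float) -> int:
    # Hypothetical thresholds and QP values; a real system would derive these
    # from rate control, channel capacity estimates, and codec-specific tables.
    if snr_db > 30.0:
        return 22   # good channel: light quantization, more data transmitted
    if snr_db > 20.0:
        return 28
    return 36       # poor channel: heavy quantization, smaller data volume
```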
  • the video frame sequence corresponding to the compressed video stream is affected by the actual channel environment during the compression process.
  • Accordingly, the quantization parameters determined during the compression of the video frames in the video frame sequence also change with the actual channel environment, so that after the image processing device decompresses the compressed video stream, different frames to be processed have different degrees of degradation, with the degree of degradation determined by the size of the quantization parameter determined during compression. Therefore, in the embodiment of the present application, the quantization parameter generated during the compression process is introduced into the process of restoring the frame to be processed.
  • The quantization parameter reflects the degree of degradation of the frame to be processed during the compression process, so it can well guide the restoration process of the frame to be processed, enhancing the restoration effect and helping improve image quality.
  • In a possible implementation, the image processing device may perform fusion processing on the quantization parameter and the frame to be processed to obtain fusion data, then perform feature extraction on the fusion data to obtain fusion features, and perform restoration processing based on the fusion features extracted from the fusion data to obtain the restored frame to be processed.
  • the related features of the quantization parameters generated in the compression process are introduced into the fusion feature, which can enhance the restoration effect.
  • In another possible implementation, the image processing device may perform feature extraction on the quantization parameter to obtain a fourth feature, and perform feature extraction on the frame to be processed to obtain a fifth feature; the fourth feature and the fifth feature are then fused to obtain fusion features, and restoration processing is performed according to the fusion features to obtain the restored frame to be processed.
  • the related features of the quantization parameters generated in the compression process are introduced into the fusion feature, which can enhance the restoration effect.
  • In an example, the frame to be processed can be restored through a pre-trained video frame restoration network: the quantization parameter and the frame to be processed can be input into the pre-trained video frame restoration network; the video frame restoration network performs restoration processing on the frame to be processed according to the degree of quantization, indicated by the quantization parameter, that the frame to be processed underwent during the compression process; the image processing device can then obtain the restored frame to be processed output by the video frame restoration network.
  • the image quality of the frame to be processed after restoration is higher than the image quality of the frame to be processed before restoration. Wherein, image quality can be measured by parameters such as resolution, image information, image color, or image texture.
  • The quantization parameters input into the video frame restoration network correspond one-to-one to the frames to be processed, and the video frame restoration network may perform restoration processing frame by frame based on one or more frames to be processed and their one-to-one corresponding quantization parameters, to obtain one or more restored frames to be processed.
  • the video frame restoration network 300 includes a fusion layer 301 , a convolutional layer 302 and a restoration network 303 .
  • The fusion layer 301 is used to fuse the quantization parameter and the frame to be processed to obtain fusion data; the convolutional layer 302 is used to convolve the fusion data to extract fusion features; and the restoration network 303 performs restoration processing according to the fusion features to obtain the first target frame.
  • the video frame restoration network 300 includes a third convolutional layer 3021 , a fourth convolutional layer 3022 , a fusion layer 301 and a restoration network 303 .
  • The third convolutional layer 3021 is used to perform feature extraction on the quantization parameter to obtain the first feature; the fourth convolutional layer 3022 is used to perform feature extraction on the frame to be processed to obtain the second feature; and the fusion layer 301 is used to fuse the first feature and the second feature to obtain the fusion feature.
  • the first feature and the second feature can be connected in series along the channel dimension
  • the restoration network 303 is used to perform restoration processing according to the fusion feature to obtain the first target frame.
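  • A sketch of the FIG. 7B layout; the separate convolutions for the quantization parameter map and the frame, followed by channel-wise concatenation, follow the text, while the residual connection and the two-layer restoration network 303 are assumptions.

```python
import torch
import torch.nn as nn

class VideoFrameRestorationNet(nn.Module):
    # Sketch of video frame restoration network 300 (FIG. 7B variant).
    def __init__(self, c1: int = 3, c: int = 64):
        super().__init__()
        self.qp_conv = nn.Conv2d(1, c, 3, padding=1)      # third conv layer 3021
        self.frame_conv = nn.Conv2d(c1, c, 3, padding=1)  # fourth conv layer 3022
        self.restore = nn.Sequential(                     # restoration network 303
            nn.Conv2d(2 * c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c1, 3, padding=1),
        )

    def forward(self, frame: torch.Tensor, qp_map: torch.Tensor) -> torch.Tensor:
        # Fusion layer 301: concatenate the two feature maps along the channels.
        fused = torch.cat([self.qp_conv(qp_map), self.frame_conv(frame)], dim=1)
        return frame + self.restore(fused)   # residual output (assumption)
```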
  • the training process of the video frame restoration network is exemplified.
  • The training samples of the video frame restoration network may include an original video frame sequence, the degraded video frame sequence obtained by compressing the original video frame sequence, and the quantization parameter sequence corresponding to the degraded video frame sequence. During training, the degraded video frame sequence and the quantization parameter sequence can be input into the video frame restoration network, which performs restoration processing frame by frame based on them to obtain prediction results; the loss function of the video frame restoration network is then calculated from the difference between the original video frame sequence and the prediction results, and the parameters of the network are adjusted according to the loss function to obtain the trained video frame restoration network.
  • the video frames may be divided into multiple image blocks, and each image block adopts a corresponding quantization method according to the channel quality of the current channel.
  • That is to say, the quantization degrees of different image blocks in a video frame may differ, and the quantization parameter can be used to indicate the different quantization degrees of different image blocks in the frame to be processed during the compression process; according to the different quantization degrees of the different image blocks in the frame to be processed, different restoration processing can be performed on the different image blocks.
  • Different quantization degrees mean that different image blocks in the frame to be processed have different degrees of degradation, so different restoration processing methods may be used, thereby effectively improving the restoration effect.
  • For example, the quantization parameter includes multiple sub-quantization parameters, and different regions of the frame to be processed have their corresponding sub-quantization parameters.
  • For example, the quantization parameter of the frame to be processed includes 4 sub-quantization parameters, which correspond one-to-one to four different image blocks in the frame to be processed. Then, for each image block, the above two possible implementations can be adopted to introduce the sub-quantization parameter corresponding to the image block into the restoration process of that image block.
  • In a possible implementation, the image processing device may perform fusion processing on a sub-quantization parameter and the corresponding image block to obtain fusion data, then perform feature extraction on the fusion data to obtain fusion features, and perform restoration processing based on the fusion features extracted from the fusion data to obtain the restored image block; after restoration processing has been performed on the different image blocks of the frame to be processed, the first target frame can be obtained.
  • In another possible implementation, the image processing device may perform feature extraction on the sub-quantization parameter to obtain a fourth feature, and perform feature extraction on the image block of the frame to be processed to obtain a fifth feature; the fourth feature and the fifth feature are then fused to obtain fusion features, and restoration processing is performed according to the fusion features to obtain the restored image block. After restoration processing has been performed on the different image blocks of the frame to be processed, the first target frame can be obtained.
  • In an example, the pre-trained video frame restoration network can be used to restore the different image blocks in the frame to be processed: the quantization parameter and the frame to be processed can be input into the pre-trained video frame restoration network; the video frame restoration network performs different restoration processing on the different image block regions according to the different quantization degrees, indicated by the quantization parameter, of the image blocks in different regions of the frame to be processed; the image processing device can then obtain the restored frame to be processed output by the video frame restoration network.
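  • A sketch of block-wise restoration with per-block sub-quantization parameters; the 2×2 block layout matches the four-block example above but is otherwise an assumption, and restore_block stands in for either of the two implementations just described.

```python
import torch

def restore_blockwise(frame: torch.Tensor, sub_qps, restore_block):
    # frame: (1, C, H, W); sub_qps: four sub-quantization parameters, one per
    # block in an assumed 2x2 layout; restore_block(block, q) restores one
    # image block according to its own quantization degree.
    _, _, h, w = frame.shape
    out = frame.clone()
    corners = [(0, 0), (0, w // 2), (h // 2, 0), (h // 2, w // 2)]
    for (y, x), q in zip(corners, sub_qps):
        block = frame[:, :, y:y + h // 2, x:x + w // 2]
        out[:, :, y:y + h // 2, x:x + w // 2] = restore_block(block, q)
    return out
```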
  • In an exemplary embodiment, FIG. 8A adds a restoration process on the basis of the embodiment of FIG. 4: the compressed video stream is decompressed to obtain the decompressed video frame t-1, video frame t, and video frame t+1, together with the quantization parameter q-1, quantization parameter q, and quantization parameter q+1 respectively associated with video frame t-1, video frame t, and video frame t+1. The video frame restoration network 300 then performs restoration frame by frame according to the quantization parameters and the video frames, obtaining the restored video frame t-1, the restored video frame t, and the restored video frame t+1.
  • In this way, the quantization parameter generated during the compression process is introduced into the video frame restoration process; since it reflects the degree of degradation of the video frame during compression, it can be used to guide the restoration process well, enhancing the restoration effect and helping improve image quality.
  • In another exemplary embodiment, please refer to FIG. 8B. Considering the large amount of data involved in the restoration process, it may be considered to restore only video frame t: the compressed video stream is decompressed to obtain the decompressed video frame t-1, video frame t, and video frame t+1, together with the quantization parameter q related to video frame t; video frame t and the quantization parameter q are then input into the video frame restoration network 300, which performs restoration processing according to the quantization parameter and the video frame to obtain the restored video frame t. The data fusion network then fuses the information of video frame t-1, the restored video frame t, and video frame t+1, using the motion vector V_{t-1→t} from video frame t-1 to frame t and the motion vector V_{t→t+1} from frame t to frame t+1 generated by the compression process as guidance, to assist the reconstruction of video frame t.
In another exemplary embodiment, referring to FIG. 9, a restoration stage is added on the basis of FIG. 4; the data fusion process of the data fusion network 100 is similar to that in FIG. 4 and is not repeated here. After the second fusion layer 30 produces the third feature and the second convolutional layer 40 reduces its dimensionality, the image processing device obtains the quantization parameter, generated during compression, related to the frame to be processed, and inputs that quantization parameter together with the dimension-reduced third feature into the video frame restoration network 300. The network 300 restores the dimension-reduced third feature according to the quantization parameter, and the restored result is then input into the video frame reconstruction network 200 to obtain a target frame of better image quality.

The dimensionality-reduction step of the second convolutional layer 40 is optional; in other implementations, the third feature output by the second fusion layer 30 and the quantization parameter related to the frame to be processed may be input directly into the video frame reconstruction network 200. Since only the fused data (the dimension-reduced third feature) needs restoration, the amount of data to be restored is reduced and the efficiency of the restoration processing improves. The ordering of these stages is sketched below.
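A compact sketch of that ordering, under the assumption that each stage is a callable module; the names and signatures (fusion, reduce_1x1, restoration, reconstruction) are illustrative, not taken from this application.

```python
def figure9_forward(frame_t, neighbors, mvs, q, fusion, reduce_1x1,
                    restoration, reconstruction, use_reduction=True):
    third_feature = fusion(frame_t, neighbors, mvs)  # second fusion layer 30 output
    if use_reduction:
        third_feature = reduce_1x1(third_feature)    # optional second conv layer 40
    restored = restoration(third_feature, q)         # network 300, feature domain
    return reconstruction(restored)                  # network 200 -> target frame
```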
Correspondingly, referring to FIG. 10, an embodiment of the present application further provides an image processing apparatus 400, including:

a memory 41 for storing executable instructions;

one or more processors 42;

wherein, when executing the executable instructions, the one or more processors 42 are individually or collectively configured to perform any one of the methods described above.
The processor 42 executes the executable instructions stored in the memory 41. The processor 42 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory 41 stores the executable instructions of the above method and may include at least one type of storage medium, such as flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. The device may also cooperate with a network storage apparatus that performs the storage function of the memory over a network connection. The memory 41 may be an internal storage unit, such as a hard disk or internal memory; it may also be an external storage device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card; or it may include both an internal storage unit and an external storage device. The memory 41 may further be used to temporarily store data that has been output or is to be output.
In some embodiments, when the processor 42 executes the executable instructions, it is individually or collectively configured to: after a compressed video stream has been decompressed, obtain the decompressed frame to be processed, at least one reference frame of the frame to be processed, and the motion vectors, generated during compression, between the reference frames and the frame to be processed; and reconstruct the frame to be processed according to at least one of the motion vectors and at least one of the reference frames to obtain a target frame whose image quality is higher than that of the frame to be processed.

In some embodiments, the reference frame includes adjacent frames of the frame to be processed. The adjacent frames include M video frames captured before the frame to be processed and/or N video frames captured after it; or the adjacent frames include the M-th video frame captured before the frame to be processed and/or the N-th video frame captured after it, where M and N are integers greater than 0. The two conventions are illustrated below.
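A minimal illustration of the two adjacent-frame conventions; `seq` (the decoded frame list) and `t` (the index of the frame to be processed) are assumed inputs for the example.

```python
def refs_window(seq, t, m, n):
    """M frames captured before t and N frames captured after t."""
    return seq[max(0, t - m):t] + seq[t + 1:t + 1 + n]

def refs_single(seq, t, m, n):
    """Only the M-th frame before t and the N-th frame after t."""
    refs = []
    if t - m >= 0:
        refs.append(seq[t - m])
    if t + n < len(seq):
        refs.append(seq[t + n])
    return refs
```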
In some embodiments, the motion vectors are output by a decoder during decompression of the compressed video stream. In some embodiments, the processor is further configured to fuse at least one reference frame and the frame to be processed according to at least one motion vector, and to perform reconstruction according to the fused result.
In some embodiments, the processor is further configured, for each reference frame, to perform feature extraction on the reference frame at least according to the motion vector between that reference frame and the frame to be processed, obtaining a first feature; to perform feature extraction on the frame to be processed, obtaining a second feature; and to fuse at least one first feature and the second feature.

In some embodiments, the processor is further configured to fuse the frame to be processed, the reference frame, and the motion vector between them to obtain fused data, and to perform feature extraction on the fused data to obtain the first feature. In some embodiments, the processor is further configured to obtain a quantization parameter, generated during compression, related to the frame to be processed, and to fuse the quantization parameter, at least one first feature, and the second feature.
In some embodiments, the processor is further configured to perform an affine transformation on the reference frame according to the motion vector, and to perform feature extraction on the transformed reference frame to obtain the first feature. A rough warping sketch follows.
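A rough sketch of warping a reference frame toward the frame to be processed with a dense motion-vector field before feature extraction. Using `grid_sample` for the warp is an implementation assumption; this application specifies only an affine transformation guided by the motion vector.

```python
import torch
import torch.nn.functional as F

def warp(ref, mv):
    """ref: (B, C, H, W) reference frame; mv: (B, 2, H, W) displacement in pixels."""
    b, _, h, w = ref.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(ref.device)  # (2, H, W) pixel grid
    coords = base[None] + mv                                    # shifted sampling points
    x = 2 * coords[:, 0] / (w - 1) - 1                          # normalize to [-1, 1]
    y = 2 * coords[:, 1] / (h - 1) - 1
    grid = torch.stack((x, y), dim=-1)                          # (B, H, W, 2)
    return F.grid_sample(ref, grid, align_corners=True)
```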
In some embodiments, the processor is further configured to perform dimensionality reduction on the fused result and to use the dimension-reduced result for reconstruction. In some embodiments, the target frame is obtained by inputting the fused result into a pre-established video frame reconstruction network and performing reconstruction through that network; the network is used either to restore the frame to be processed or to perform super-resolution reconstruction on it.
In some embodiments, before the frame to be processed is reconstructed according to the motion vector and the reference frame to obtain the target frame, the processor is further configured to: obtain the quantization parameters, generated during compression, related to the frame to be processed and related to the reference frame; restore the frame to be processed according to its quantization parameter; and restore the reference frame according to its quantization parameter. The target frame is then reconstructed from the motion vector, the restored reference frame, and the restored frame to be processed.

In other embodiments, before the reconstruction, the processor is further configured only to obtain the quantization parameter related to the frame to be processed and to restore the frame to be processed accordingly; the target frame is then reconstructed from the motion vector, the reference frame, and the restored frame to be processed.

In some embodiments, before performing reconstruction according to the fused result, the processor is further configured to obtain the quantization parameter, generated during compression, related to the frame to be processed, to restore the fused result according to that quantization parameter, and to perform reconstruction according to the restored result.
In some embodiments, the quantization parameter includes a quantization parameter (QP) or a quantization matrix (QM). The quantization parameter is determined at least according to the channel quality of the channel used to transmit the frame to be processed, and the degree of degradation of the frame to be processed indicated by the quantization parameter is negatively correlated with that channel quality, as the illustrative mapping below suggests.
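A purely illustrative mapping from measured channel quality (here SNR in dB, an assumed metric) to a quantization parameter; the thresholds and QP range are invented for the example and only encode the negative correlation described above.

```python
def qp_for_channel(snr_db, qp_min=22, qp_max=46):
    """Worse channel -> larger QP (coarser quantization, more degradation)."""
    snr_lo, snr_hi = 5.0, 30.0                    # assumed operating range in dB
    x = min(max((snr_db - snr_lo) / (snr_hi - snr_lo), 0.0), 1.0)
    return round(qp_max - x * (qp_max - qp_min))  # high SNR -> small QP
```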
In some embodiments, the quantization parameter is output by the decoder during decoding of the compressed video stream. In some embodiments, the compressed video stream is obtained by a movable platform that captures a video frame sequence with its onboard shooting device during motion and then compresses and transmits the sequence.
The various implementations described herein may be implemented using a computer-readable medium such as computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, or electronic units designed to perform the functions described herein. For a software implementation, an embodiment such as a procedure or a function may be implemented with a separate software module that performs at least one function or operation. The software code may be implemented by a software application (or program) written in any suitable programming language, stored in a memory, and executed by a controller.
In some embodiments, a remote control device is further provided, including the image processing apparatus described above.

In some embodiments, an image processing system is further provided, including a movable platform and the remote control device. The movable platform is equipped with a shooting device configured to capture a video frame sequence while the movable platform moves; the movable platform compresses the video frame sequence into a compressed video stream and transmits it to the image processing apparatus. Illustratively, the movable platform includes one or more of the following: an unmanned aerial vehicle, an unmanned vehicle, a gimbal, an unmanned ship, or a mobile robot; see, for example, FIG. 1, which shows a schematic diagram of a remote control device and an unmanned aerial vehicle.

In exemplary embodiments, a non-transitory computer-readable storage medium including instructions is further provided, such as a memory including instructions executable by a processor of an apparatus to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like. When the instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to execute the above method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An image processing method, apparatus, remote control device, system, and storage medium. The image processing method includes: after decompressing a compressed video stream, obtaining the decompressed frame to be processed, at least one reference frame of the frame to be processed, and the motion vectors, generated during compression, between the reference frames and the frame to be processed; and reconstructing the frame to be processed according to at least one of the motion vectors and at least one of the reference frames to obtain a target frame, where the image quality of the target frame is higher than that of the frame to be processed. Guided by the motion vectors, the reference frames are fully exploited to supply additional information to the reconstruction of the frame to be processed, so that a target frame with higher image quality than the frame to be processed can be obtained.


Claims (25)

  1. An image processing method, comprising:
    after decompressing a compressed video stream, obtaining a decompressed frame to be processed, at least one reference frame of the frame to be processed, and a motion vector, generated during compression, between the reference frame and the frame to be processed;
    reconstructing the frame to be processed according to at least one of the motion vectors and at least one of the reference frames to obtain a target frame, wherein an image quality of the target frame is higher than an image quality of the frame to be processed.
  2. The method according to claim 1, wherein the reference frame comprises an adjacent frame of the frame to be processed.
  3. The method according to claim 2, wherein the adjacent frame comprises M video frames captured before the frame to be processed and/or N video frames captured after the frame to be processed; or
    the adjacent frame comprises the M-th video frame captured before the frame to be processed and/or the N-th video frame captured after the frame to be processed; wherein M and N are integers greater than 0.
  4. The method according to claim 1, wherein the motion vector is output by a decoder during decompression of the compressed video stream.
  5. The method according to claim 1, wherein reconstructing the frame to be processed according to at least one of the motion vectors and at least one of the reference frames comprises:
    fusing at least one of the reference frames and the frame to be processed according to at least one of the motion vectors, and performing reconstruction according to the fused result.
  6. The method according to claim 5, wherein fusing at least one of the reference frames and the frame to be processed according to at least one of the motion vectors comprises:
    for the reference frame, performing feature extraction on the reference frame at least according to the motion vector between the reference frame and the frame to be processed, to obtain a first feature; and performing feature extraction on the frame to be processed to obtain a second feature;
    fusing at least one of the first features and the second feature.
  7. The method according to claim 6, wherein performing feature extraction on the reference frame at least according to the motion vector between the reference frame and the frame to be processed comprises:
    fusing the frame to be processed, the reference frame, and the motion vector between the reference frame and the frame to be processed to obtain fused data;
    performing feature extraction on the fused data to obtain the first feature.
  8. The method according to claim 6, wherein performing feature extraction on the reference frame at least according to the motion vector between the reference frame and the frame to be processed comprises:
    performing an affine transformation on the reference frame according to the motion vector, and performing feature extraction on the transformed reference frame to obtain the first feature.
  9. The method according to claim 6, further comprising:
    obtaining a quantization parameter, generated during compression, related to the frame to be processed;
    wherein fusing at least one of the first features and the second feature comprises:
    fusing the quantization parameter, at least one of the first features, and the second feature.
  10. The method according to claim 5, wherein performing reconstruction according to the fused result comprises:
    performing dimensionality reduction on the fused result, and performing reconstruction using the dimension-reduced result.
  11. The method according to claim 5, wherein the target frame is obtained by inputting the fused result into a pre-established video frame reconstruction network and performing reconstruction through the video frame reconstruction network.
  12. The method according to claim 11, wherein the video frame reconstruction network is used to restore the frame to be processed; or
    the video frame reconstruction network is used to perform super-resolution reconstruction on the frame to be processed.
  13. The method according to claim 1, further comprising, before reconstructing the frame to be processed according to the motion vector and the reference frame to obtain the target frame:
    obtaining a quantization parameter, generated during compression, related to the frame to be processed, and a quantization parameter related to the reference frame;
    restoring the frame to be processed according to the quantization parameter related to the frame to be processed; and restoring the reference frame according to the quantization parameter related to the reference frame;
    wherein the target frame is obtained by reconstruction according to the motion vector, the restored reference frame, and the restored frame to be processed.
  14. The method according to claim 1, further comprising, before reconstructing the frame to be processed according to the motion vector and the reference frame to obtain the target frame:
    obtaining a quantization parameter, generated during compression, related to the frame to be processed;
    restoring the frame to be processed according to the quantization parameter related to the frame to be processed;
    wherein the target frame is obtained by reconstruction according to the motion vector, the reference frame, and the restored frame to be processed.
  15. The method according to claim 5, further comprising, before performing reconstruction according to the fused result:
    obtaining a quantization parameter, generated during compression, related to the frame to be processed;
    restoring the fused result according to the quantization parameter related to the frame to be processed;
    wherein performing reconstruction according to the fused result comprises: performing reconstruction according to the restored result.
  16. The method according to any one of claims 10 and 13 to 15, wherein the quantization parameter comprises a quantization parameter (QP) or a quantization matrix (QM).
  17. The method according to any one of claims 10 and 13 to 15, wherein the quantization parameter is determined at least according to a channel quality of a channel used to transmit the frame to be processed.
  18. The method according to claim 17, wherein a degree of degradation of the frame to be processed indicated by the quantization parameter is negatively correlated with the channel quality.
  19. The method according to any one of claims 10 and 13 to 18, wherein the quantization parameter is output by a decoder during decoding of the compressed video stream.
  20. The method according to any one of claims 1 to 19, wherein the compressed video stream is obtained by a movable platform capturing a video frame sequence with an onboard shooting device during motion, and then compressing and transmitting the video frame sequence.
  21. An image processing apparatus, comprising:
    a memory for storing executable instructions;
    one or more processors;
    wherein, when executing the executable instructions, the one or more processors are individually or collectively configured to perform the method according to any one of claims 1 to 20.
  22. A remote control device, comprising the image processing apparatus according to claim 21.
  23. An image processing system, comprising a movable platform and the remote control device according to claim 22;
    wherein the movable platform is equipped with a shooting device configured to capture a video frame sequence while the movable platform moves;
    and the movable platform is configured to compress the video frame sequence to obtain a compressed video stream and to transmit the compressed video stream to the image processing apparatus.
  24. The system according to claim 23, wherein the movable platform comprises one or more of the following: an unmanned aerial vehicle, an unmanned vehicle, a gimbal, an unmanned ship, or a mobile robot.
  25. A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 20.
PCT/CN2022/072348 2022-01-17 2022-01-17 Image processing method and apparatus, remote control device, system, and storage medium WO2023133888A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/072348 WO2023133888A1 (zh) Image processing method and apparatus, remote control device, system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/072348 WO2023133888A1 (zh) Image processing method and apparatus, remote control device, system, and storage medium

Publications (1)

Publication Number Publication Date
WO2023133888A1 2023-07-20

Family

ID=87279945

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072348 WO2023133888A1 (zh) 2022-01-17 2022-01-17 图像处理方法、装置、遥控设备、系统及存储介质

Country Status (1)

Country Link
WO (1) WO2023133888A1 (zh)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1568011A (zh) * 2003-06-20 2005-01-19 Panasonic Corporation Inter-frame image enhancement method based on Motion JPEG
CN101551902A (zh) * 2009-05-15 2009-10-07 Wuhan University Learning-based feature matching method for super-resolution of compressed video
CN103475876A (zh) * 2013-08-27 2013-12-25 Beijing University of Technology Learning-based super-resolution reconstruction method for low-bit-rate compressed images
CN107945108A (zh) * 2016-10-13 2018-04-20 Huawei Technologies Co., Ltd. Video processing method and apparatus
CN111226441A (zh) * 2017-10-16 2020-06-02 Huawei Technologies Co., Ltd. Spatially varying transform for video coding
CN109255822A (zh) * 2018-07-13 2019-01-22 Space Engineering University, PLA Strategic Support Force Super-temporal-resolution compressed sensing reconstruction method with multi-scale coding and multiple constraints

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117896552A (zh) * 2024-03-14 2024-04-16 浙江华创视讯科技有限公司 Video conference processing method, video conference system, and related apparatus

Similar Documents

Publication Publication Date Title
Wang et al. Towards analysis-friendly face representation with scalable feature and texture compression
CN110798690B (zh) 视频解码方法、环路滤波模型的训练方法、装置和设备
TWI834087B (zh) 用於從位元流重建圖像及用於將圖像編碼到位元流中的方法及裝置、電腦程式產品
CN113766249B (zh) 视频编解码中的环路滤波方法、装置、设备及存储介质
CN114079779B (zh) 图像处理方法、智能终端及存储介质
Zhang et al. Davd-net: Deep audio-aided video decompression of talking heads
CN116803079A (zh) 视频和相关特征的可分级译码
TWI826160B (zh) 圖像編解碼方法和裝置
CN115409716B (zh) 视频处理方法、装置、存储介质及设备
Hu et al. An adaptive two-layer light field compression scheme using GNN-based reconstruction
CN115442609A (zh) 特征数据编解码方法和装置
CN117501696A (zh) 使用在分块之间共享的信息进行并行上下文建模
CN115604485A (zh) 视频图像的解码方法及装置
Zhao et al. CBREN: Convolutional neural networks for constant bit rate video quality enhancement
WO2023050720A1 (zh) Image processing method, image processing apparatus, and model training method
TW202337211A (zh) 條件圖像壓縮
WO2023133888A1 (zh) Image processing method and apparatus, remote control device, system, and storage medium
WO2023133889A1 (zh) Image processing method and apparatus, remote control device, system, and storage medium
CN116847087A (zh) 视频处理方法、装置、存储介质及电子设备
WO2023193629A1 (zh) Encoding and decoding method and apparatus for a region enhancement layer
TW202420815A (zh) 使用神經網路進行圖像區域的並行處理-解碼、後濾波和rdoq
WO2023225808A1 (en) Learned image compress ion and decompression using long and short attention module
CN115643406A (zh) 视频解码方法、视频编码方法、装置、存储介质及设备
CN117321989A (zh) 基于神经网络的图像处理中的辅助信息的独立定位
NO20200708A1 (en) Method, computer program and system for detecting changes and moving objects in a video view

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22919534

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE