WO2023133889A1 - Image processing method, apparatus, remote control device, system and storage medium - Google Patents

Image processing method, apparatus, remote control device, system and storage medium

Info

Publication number
WO2023133889A1
Authority
WO
WIPO (PCT)
Prior art keywords: frame, video frame, processed, video, restoration
Application number
PCT/CN2022/072349
Other languages
English (en)
French (fr)
Inventor
郭靖宇
汪海
杨文明
张李亮
赵亮
郑萧桢
Original Assignee
深圳市大疆创新科技有限公司
清华大学深圳国际研究生院
Application filed by 深圳市大疆创新科技有限公司 and 清华大学深圳国际研究生院
Priority to PCT/CN2022/072349
Publication of WO2023133889A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation

Definitions

  • The present application relates to the technical field of image processing, and in particular to an image processing method, device, remote control device, system, and storage medium.
  • one of the objectives of the present application is to provide an image processing method, device, remote control device, system and storage medium.
  • the embodiment of the present application provides an image processing method, including:
  • an image processing device comprising:
  • one or more processors;
  • when the one or more processors execute the executable instructions, they are individually or collectively configured to execute the method described in the first aspect.
  • the embodiment of the present application provides a remote control device, including the image processing device described in the second aspect.
  • an embodiment of the present application provides an image processing system, including a movable platform and the remote control device described in the third aspect;
  • the movable platform is equipped with a photographing device, and the photographing device is used to collect video frame sequences during the movement of the movable platform;
  • the movable platform is used to compress the sequence of video frames to obtain a compressed video stream, and transmit the compressed video stream to the image processing device.
  • The embodiment of the present application provides a computer-readable storage medium; the computer-readable storage medium stores executable instructions, and when the executable instructions are executed by a processor, the method as described in the first aspect is implemented.
  • In the image processing method, device, remote control device, system, and storage medium provided in the embodiments of the present application, after the compressed video stream is decompressed, the decompressed frame to be processed and the quantization parameter related to the frame to be processed generated during the compression process are obtained, and the quantization parameter generated during compression is then introduced into the restoration of the frame to be processed.
  • The quantization parameter can reflect the degree of degradation of the frame to be processed during compression, so it can be used to better guide the restoration of the frame to be processed, enhance the restoration effect, and improve image quality; restoring the frame to be processed according to the quantization parameter yields a first target frame whose image quality is higher than that of the frame to be processed.
  • Fig. 1 is a product schematic diagram of an unmanned aerial system provided by the embodiment of the present application.
  • FIG. 2 is a schematic flow diagram of a video encoding provided by an embodiment of the present application.
  • FIG. 3 is a schematic flow diagram of an image processing method provided in an embodiment of the present application.
  • FIG. 4 and FIG. 5 are schematic structural diagrams of two different video frame restoration networks provided by embodiments of the present application.
  • Fig. 6 is a schematic diagram of the generation process of the compressed video stream provided by the embodiment of the present application.
  • FIG. 7 is a schematic diagram of the acquisition process of the first target frame and the second target frame provided by the embodiment of the present application.
  • Fig. 8 is a schematic flowchart of another image processing method provided by the embodiment of the present application.
  • FIG. 9A and FIG. 9B are schematic structural diagrams of a video frame restoration network, data fusion network, and video frame reconstruction network provided by the embodiment of the present application; the video frames processed by the video frame restoration network differ between FIG. 9A and FIG. 9B.
  • FIG. 10 is a second schematic structural diagram of the video frame restoration network, data fusion network, and video frame reconstruction network provided by the embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • The embodiment of the present application provides an image processing method: after the compressed video stream is decompressed, the decompressed frame to be processed and the quantization parameter related to the frame to be processed generated during the compression process are obtained, and the quantization parameter generated during compression is then introduced into the restoration of the frame to be processed. The quantization parameter can reflect the degree of degradation of the frame to be processed during compression; it can therefore guide the restoration of the frame to be processed well, enhance the restoration effect, and improve image quality, thereby obtaining a first target frame whose image quality is higher than that of the frame to be processed.
  • the image processing method provided by the embodiments of the present application may be applied to an image processing device.
  • The image processing device can be an electronic device with data processing capability; it can also be a computer chip or an integrated circuit with data processing capability, such as a central processing unit (Central Processing Unit, CPU), a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or a field-programmable gate array (Field-Programmable Gate Array, FPGA), etc.
  • Examples of electronic devices include, but are not limited to: smart phones/cell phones, tablet computers, personal digital assistants (PDAs), laptop computers, desktop computers, media content players, video game stations/systems, virtual reality systems, augmented reality systems, wearable devices (e.g., watches, glasses, gloves, headgear (e.g., hats, helmets, virtual reality headsets, augmented reality headsets, head-mounted devices (HMDs), headbands), pendants, armbands, leg rings, shoes, vests), remote control devices (such as remote controls), or any other type of device.
  • When the image processing device is a computer chip or an integrated circuit with data processing capability, the image processing device may be installed in an electronic device (such as a remote control device).
  • The compressed video stream obtained by the image processing device may originate from a movable platform that uses its on-board photographing device to capture a video frame sequence while moving or stationary; the movable platform then compresses the video frame sequence and transmits it.
  • Examples of the movable platform include but are not limited to unmanned aerial vehicles, unmanned vehicles, gimbals, unmanned ships, or mobile robots (such as sweeping robots) and the like.
  • For example, the movable platform is an unmanned aerial vehicle (UAV), and the image processing device is a remote control device for the UAV.
  • FIG. 1 shows an unmanned aerial system.
  • the UAV 110 is communicatively connected to the remote control device 120 .
  • the UAV 110 can be operated by the remote control device 120 and its own program control device, and can fly under automatic or semi-automatic control.
  • The unmanned aerial vehicle 110 includes a flight controller, which can control the UAV according to pre-programmed instructions and can also control the UAV in response to one or more remote control signals from the remote control device 120.
  • the UAV 110 is provided with a photographing device 111.
  • the photographing device 111 can be, for example, a camera or video camera, etc., for capturing images.
  • the photographing device 111 can communicate with the UAV 110 and take pictures under the control of the UAV 110.
  • The photographing device 111 of this embodiment includes at least a photosensitive element, such as a complementary metal oxide semiconductor (Complementary Metal Oxide Semiconductor, CMOS) sensor or a charge-coupled device (Charge-coupled Device, CCD) sensor. It can be understood that the photographing device 111 may be directly fixed to the UAV 110, or may be mounted on the UAV 110 through a gimbal.
  • the remote control device 120 can control the UAV 110 to fly, and control the camera 111 in the UAV 110 to collect video frames.
  • The shooting device 111 can collect a video frame sequence during the flight of the unmanned aerial vehicle 110, and the unmanned aerial vehicle 110 then sends the video frame sequence collected by the shooting device 111 to the remote control device 120. The remote control device 120 can be provided with a display 121, and the video frame sequence captured by the shooting device 111 can be displayed on the display 121.
  • The unmanned aerial vehicle 110 compresses the video frame sequence collected by the shooting device 111, and then sends the compressed video stream, which has a smaller data volume, to the remote control device 120.
  • After the remote control device 120 receives the compressed video stream, a decoder can be used to decode it, and the decoded video frame sequence is displayed on the display 121 of the remote control device 120.
  • After the compressed video stream is decompressed, the image processing method provided by the embodiment of the present application can be used to obtain the decompressed frame to be processed and the quantization parameter related to the frame to be processed generated during the compression process; then, according to the quantization parameter, the frame to be processed is restored to obtain a first target frame whose image quality is higher than that of the frame to be processed, and the first target frame is displayed on the display, which helps improve the user's visual experience.
  • FIG. 2 shows a coding flow chart.
  • the prediction includes intra-frame prediction and inter-frame prediction, and its purpose is to use prediction block information to remove redundant information of the current image to be encoded.
  • For intra-frame prediction: each frame in the video can be regarded as an independent image, and there will be certain spatial redundancy within the image. For example, in the sky regions that often appear in image or video backgrounds, the interior pixels are very similar to one another; such regions provide a large compression space for image or video encoding.
  • Intra-frame prediction is used to remove spatial redundancy within each frame.
  • Intra-frame prediction uses the information of the current frame image to obtain prediction block data. The process includes dividing the image to be encoded into several image blocks to be encoded; then, for each image block to be encoded, using the adjacent encoded image block to generate a prediction block of the current image block to be encoded.
  • For inter-frame prediction, the process includes dividing the image to be encoded into several image blocks to be encoded; then, for each image block to be encoded, searching the reference frame for the best-matching (most similar) image block to serve as the prediction block; the relative displacement between the prediction block and the current image block to be encoded is the motion vector.
  • the reference frame may be an encoded image adjacent to the image to be encoded.
  • An image frame that only uses intra-frame prediction mode in encoding is called an I frame, and an image frame that uses both intra-frame prediction and inter-frame prediction is called a P or B frame.
  • The corresponding pixel values of the prediction block are subtracted from the image block to be encoded to obtain a residual block.
  • the transformation is to transform the residual block from the time domain to the frequency domain, so that the residual block can be further analyzed in the frequency domain, and the residual block can be transformed using a transformation matrix.
  • The transformation of the residual block usually adopts a two-dimensional transform; that is, at the encoding end, the residual block is multiplied by an N×N transformation matrix and its transpose, and the transform coefficients are obtained after the multiplication.
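  • As a minimal numpy sketch of this two-dimensional transform (assuming an orthonormal DCT-II basis of the kind used in codecs such as H.264/HEVC; the 4×4 block size is illustrative):

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Build the N x N orthonormal DCT-II transformation matrix T."""
    t = np.zeros((n, n))
    for k in range(n):
        scale = np.sqrt(1.0 / n) if k == 0 else np.sqrt(2.0 / n)
        for i in range(n):
            t[k, i] = scale * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    return t

# Transform a 4x4 residual block: coefficients = T @ residual @ T^T
T = dct_matrix(4)
residual = np.random.randint(-16, 16, size=(4, 4)).astype(float)
coefficients = T @ residual @ T.T
# The inverse transform recovers the residual exactly (no loss yet):
assert np.allclose(T.T @ coefficients @ T, residual)
```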
  • Video content comes from the real world, and it cannot be guaranteed that all the information it contains can be perceived by the human eye; therefore, the video can be appropriately streamlined according to the human eye's perception of light signals to remove visual redundancy.
  • Quantization is used to remove visual redundancy based on the characteristics of the human eye: the transform coefficients obtained after transformation are quantized using quantization parameters to obtain quantized coefficients, and the quantization process further improves coding efficiency.
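  • A hedged sketch of scalar quantization driven by a quantization parameter, assuming the H.264-style convention that the quantization step roughly doubles for every increase of 6 in QP (the exact mapping is codec-specific):

```python
import numpy as np

def qp_to_step(qp: int) -> float:
    """H.264-style convention: step 0.625 at QP 0, doubling every 6 QP units."""
    return 0.625 * 2.0 ** (qp / 6.0)

def quantize(coefficients: np.ndarray, qp: int) -> np.ndarray:
    """Quantize transform coefficients; a larger QP means coarser quantization."""
    return np.round(coefficients / qp_to_step(qp)).astype(np.int32)

def dequantize(levels: np.ndarray, qp: int) -> np.ndarray:
    """Inverse quantization at the decoder; the rounding loss is irreversible."""
    return levels * qp_to_step(qp)
```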
  • the quantization parameter includes but not limited to quantization parameter (Quantization Parameter, QP) or quantization matrix (Quantization Matrix, QM).
  • Entropy coding is used to remove statistical redundancy: the quantized coefficients are entropy-coded, with shorter codewords assigned to values of higher probability and longer codewords to values of lower probability, which improves compression efficiency.
  • The code stream obtained by entropy encoding, together with the encoded coding mode information such as the intra prediction mode, motion vector information, and quantization parameters, is stored or sent to the decoding end (such as the above-mentioned image processing device).
  • At the encoding end, the quantized coefficients also undergo inverse quantization and inverse transformation to obtain a reconstructed residual block; the reconstructed residual block is then added to the corresponding prediction block to obtain a reconstructed frame. After loop filtering, the reconstructed frame is used as a reference frame for inter-frame prediction of other images to be encoded.
  • FIG. 3 is a schematic flowchart of an image processing method provided in an embodiment of the present application. The method is applied to an image processing device, and the method includes:
  • step S101 after the compressed video stream is decompressed, a decompressed frame to be processed and a quantization parameter related to the frame to be processed generated during the compression process are obtained.
  • step S102 the frame to be processed is restored according to the quantization parameter to obtain a first target frame; wherein, the image quality of the first target frame is higher than the image quality of the frame to be processed.
  • the quantization parameter generated in the compression process is introduced in the restoration process of the frame to be processed.
  • The quantization parameter can reflect the degree of degradation of the frame to be processed during compression, so it can be used to guide the restoration of the frame to be processed well, enhance the restoration effect, and improve image quality.
  • The image quality may include parameters such as image resolution, image information, image texture, and image color; the image information includes but is not limited to signal-to-noise ratio, image gradient, local variance, or mean square error (Mean Square Error, MSE), etc.
  • That the image quality of the first target frame is higher than that of the frame to be processed may mean that the resolution of the first target frame is higher than the resolution of the frame to be processed, that the image texture and color information of the first target frame are richer than those of the frame to be processed, or that the first target frame contains more image information than the frame to be processed.
  • the embodiment of the present application does not impose any restrictions on the specific way of obtaining image information, and can be specifically selected according to the actual application scenario.
  • For example, when the image information is image gradient information, the Brenner gradient function, Tenengrad gradient function, Laplacian gradient function, or energy gradient function can be used to obtain the image gradient information of the target frame or the frame to be processed.
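  • For illustration, minimal sketches of the Brenner and Tenengrad gradient functions (standard definitions; the single-channel float input is an assumption):

```python
import numpy as np
from scipy.ndimage import sobel

def brenner_gradient(gray: np.ndarray) -> float:
    """Brenner: sum of squared differences between pixels two rows apart."""
    g = gray.astype(float)
    diff = g[2:, :] - g[:-2, :]
    return float(np.sum(diff ** 2))

def tenengrad_gradient(gray: np.ndarray) -> float:
    """Tenengrad: mean squared Sobel gradient magnitude."""
    g = gray.astype(float)
    gx, gy = sobel(g, axis=0), sobel(g, axis=1)
    return float(np.mean(gx ** 2 + gy ** 2))
```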
  • The image processing device is a remote control device of a movable platform, or is installed in the remote control device as a processing chip.
  • The movable platform includes but is not limited to an unmanned aerial vehicle (UAV), unmanned vehicle, unmanned ship, mobile robot, or sweeping robot, etc.
  • the movable platform communicates with the remote control device
  • the movable platform is equipped with a shooting device.
  • During the movement of the movable platform, the shooting device on the movable platform collects a video frame sequence; the movable platform then compresses the video frame sequence collected by the shooting device to obtain a compressed video stream and transmits it to the remote control device, and the remote control device thereby obtains the compressed video stream.
  • the compressed video stream may also be obtained by the image processing apparatus from other media such as a server.
  • After the image processing device obtains the compressed video stream, it can use a decoder to decompress it. For example, referring to the above video encoding process, after obtaining the compressed video stream the decoder performs entropy decoding, inverse quantization, and inverse transformation to obtain the corresponding residual block; obtains the corresponding prediction block according to the decoded motion vector or intra-frame prediction information; obtains the reconstructed value of each pixel in the current image block from the prediction block and the residual block; and outputs the decompressed video frame sequence. For each video frame in the decompressed video frame sequence, the image processing method provided by the embodiment of the present application may be used for restoration, so as to obtain a video frame sequence with better image quality.
  • Since the compressed video stream also carries the quantization parameters, the quantization parameter related to the frame to be processed can be output by the decoder during decompression of the compressed video stream, and obtaining the quantization parameter requires no additional computation.
  • the quantization parameter includes but not limited to a quantization parameter (Quantization Parameter, QP) or a quantization matrix (Quantization Matrix, QM).
  • the quantization parameter is at least determined according to the channel quality of the channel used to transmit the frame to be processed.
  • the video frame sequence is collected by the mobile platform using the shooting device carried by it during the movement, and then the video frame sequence is compressed and transmitted by the mobile platform.
  • the movable platform detects the channel quality of the channel between the movable platform and the image processing device, wherein the channel quality can be determined by at least one of the following channel parameters in the channel: signal strength, noise strength, signal-to-noise ratio or channel capacity. Then the movable platform determines the degree of quantization of the video frames in the video frame sequence according to the quality of the channel, so as to realize the good transmission of the compressed video stream.
  • The quantization degree of the frame to be processed indicated by the quantization parameter has a negative correlation with the channel quality. If the channel quality of the channel between the movable platform and the image processing device is good (for example, higher than a preset value), the current channel can transmit more data, and the movable platform can set a lower quantization degree for the frame to be processed; conversely, if the channel quality is poor, the movable platform sets a higher quantization degree for the corresponding video frame. For example, the greater the quantization parameter, the greater the quantization loss; in other words, the greater the degree of degradation of the frame to be processed, the smaller its quantized data amount.
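  • A purely illustrative sketch of this mapping on the movable platform side (the SNR thresholds and QP range are hypothetical; a real encoder would use a full rate-control loop):

```python
def select_qp(snr_db: float, qp_min: int = 20, qp_max: int = 44) -> int:
    """Map channel quality (here, SNR in dB) to a quantization parameter.

    Better channel -> more data can be transmitted -> lower QP (finer
    quantization); worse channel -> higher QP (coarser quantization,
    smaller compressed data volume).
    """
    good_snr, bad_snr = 30.0, 5.0                        # illustrative range
    snr_db = max(bad_snr, min(good_snr, snr_db))
    ratio = (good_snr - snr_db) / (good_snr - bad_snr)   # 0 = good, 1 = bad
    return round(qp_min + ratio * (qp_max - qp_min))
```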
  • the video frame sequence corresponding to the compressed video stream is affected by the actual channel environment during the compression process.
  • The quantization parameters determined during compression for the video frames in the sequence also change with the actual channel environment, so that after the image processing device decompresses the compressed video stream, different frames to be processed exhibit different degrees of degradation, and the degree of degradation is determined by the size of the quantization parameter determined during compression. Therefore, in the embodiment of the present application, the quantization parameter generated during compression is introduced into the restoration of the frame to be processed.
  • The quantization parameter can reflect the degree of degradation of the frame to be processed during compression, so it can well guide the restoration of the frame to be processed, enhance the restoration effect, and help improve image quality.
  • the image processing device may perform fusion processing on the quantization parameter and the frame to be processed to obtain fusion data, and then perform feature extraction on the fusion data to obtain fusion features.
  • Restoration processing is then performed according to the fusion features extracted from the fusion data to obtain the first target frame.
  • the related features of the quantization parameters generated in the compression process are introduced into the fusion feature, which can enhance the restoration effect.
  • The image processing device may also perform feature extraction on the quantization parameter to obtain a first feature, and perform feature extraction on the frame to be processed to obtain a second feature; the first feature and the second feature are then fused to obtain a fusion feature, and restoration processing is performed according to the fusion feature to obtain the first target frame.
  • the related features of the quantization parameters generated in the compression process are introduced into the fusion feature, which can enhance the restoration effect.
  • The frame to be processed can be restored through a pre-trained video frame restoration network: for example, the quantization parameter and the frame to be processed can be input into the pre-trained video frame restoration network; the video frame restoration network restores the frame to be processed according to the quantization degree, indicated by the quantization parameter, that the frame to be processed underwent during compression; the image processing device can then obtain the first target frame output by the video frame restoration network.
  • One or more frames to be processed can be input into the video frame restoration network; correspondingly, the quantization parameters input into the network correspond one-to-one with the frames to be processed, and the video frame restoration network performs restoration frame by frame according to the one or more frames to be processed and their corresponding quantization parameters to obtain one or more first target frames.
  • the video frame restoration network 100 includes a fusion layer 10 , a convolutional layer 20 and a restoration network 30 .
  • the fusion layer 10 is used to fuse the quantized parameters and the frame to be processed to obtain fusion data;
  • The convolutional layer 20 is used to convolve the fusion data to extract fusion features; and the restoration network 30 is used to perform restoration processing according to the fusion features to obtain the first target frame.
  • the video frame restoration network includes a first convolutional layer 21 , a second convolutional layer 22 , a fusion layer 10 and a restoration network 30 .
  • the first convolutional layer 21 is used to perform feature extraction on the quantized parameters to obtain a first feature
  • the second convolutional layer 22 is used to perform feature extraction on the frame to be processed to obtain a second feature
  • The fusion layer 10 is used to fuse the first feature and the second feature to obtain a fusion feature.
  • For example, the first feature and the second feature can be concatenated along the channel dimension.
  • the restoration network 30 is used to perform restoration processing according to the fusion feature to obtain the first target frame.
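  • A compact PyTorch sketch of the FIG. 5 structure (the channel widths, kernel sizes, and residual connection are assumptions that the text does not fix):

```python
import torch
import torch.nn as nn

class VideoFrameRestorationNet(nn.Module):
    """Sketch of FIG. 5: two feature branches, channel-wise fusion, restoration."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.qp_conv = nn.Conv2d(1, channels, 3, padding=1)     # first conv layer 21
        self.frame_conv = nn.Conv2d(3, channels, 3, padding=1)  # second conv layer 22
        self.restore = nn.Sequential(                           # restoration network 30
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, frame: torch.Tensor, qp_map: torch.Tensor) -> torch.Tensor:
        first = self.qp_conv(qp_map)                 # first feature (from the QP)
        second = self.frame_conv(frame)              # second feature (from the frame)
        fused = torch.cat([first, second], dim=1)    # fusion layer 10: channel concat
        return frame + self.restore(fused)           # restored first target frame

# frame: N x 3 x H x W; qp_map: N x 1 x H x W (the QP broadcast over the frame)
out = VideoFrameRestorationNet()(torch.rand(1, 3, 64, 64),
                                 torch.full((1, 1, 64, 64), 32.0))
```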
  • The training process of the video frame restoration network is exemplified below.
  • The training samples of the video frame restoration network may include an original video frame sequence, a degraded video frame sequence obtained by compressing the original video frame sequence, and the quantization parameter sequence corresponding to the degraded video frame sequence. In the training process, the degraded video frame sequence and the quantization parameter sequence can be input into the video frame restoration network, which performs restoration frame by frame according to the degraded video frames and the quantization parameters to obtain prediction results; the loss function of the video frame restoration network is then calculated according to the difference between the original video frame sequence and the prediction results, and the parameters of the video frame restoration network are adjusted according to the loss function to obtain the trained video frame restoration network.
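  • A corresponding training-loop sketch, reusing the `VideoFrameRestorationNet` above (the L1 loss, Adam optimizer, and the synthetic stand-in for the dataset are all assumptions):

```python
import torch
import torch.nn.functional as F

net = VideoFrameRestorationNet()                 # sketched above
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)

# Hypothetical stand-in for a dataset of (original, degraded, qp_map) triples.
dataloader = [(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64),
               torch.full((2, 1, 64, 64), 32.0)) for _ in range(4)]

for original, degraded, qp_map in dataloader:
    prediction = net(degraded, qp_map)           # frame-by-frame restoration
    loss = F.l1_loss(prediction, original)       # difference from the original frame
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```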
  • the video frames may be divided into multiple image blocks, and each image block adopts a corresponding quantization method according to the channel quality of the current channel.
  • The quantization degree of different image blocks in the video frame may be different. That is to say, the quantization parameter can be used to indicate the different quantization degrees of different image blocks in the frame to be processed during compression; the image processing device can then, according to the different quantization degrees of the different image blocks in the frame to be processed, perform different restoration processing on those image blocks.
  • different quantization degrees mean that different image blocks in the frame to be processed have different degradation degrees, and different restoration processing methods may be used to perform restoration processing, thereby effectively improving the restoration effect.
  • the quantization parameter includes multiple sub-quantization parameters, and the frame to be processed has corresponding sub-quantization parameters for different regions.
  • For example, the quantization parameter of the frame to be processed includes 4 sub-quantization parameters, which correspond one-to-one with 4 different image blocks in the frame. Then, for each image block, either of the two possible implementations above can be adopted to introduce the sub-quantization parameter corresponding to the image block into the restoration of that image block.
  • In a possible implementation, the image processing device may perform fusion processing on a sub-quantization parameter and its corresponding image block to obtain fusion data, perform feature extraction on the fusion data to obtain fusion features, and perform restoration processing based on the fusion features extracted from the fusion data to obtain a restored image block; after restoration processing is performed on the different image blocks of the frame to be processed, the first target frame can be obtained.
  • In another possible implementation, the image processing device may perform feature extraction on a sub-quantization parameter to obtain the first feature, and perform feature extraction on the corresponding image block of the frame to be processed to obtain the second feature; the first feature and the second feature are then fused to obtain a fusion feature, and restoration processing is performed according to the fusion feature to obtain a restored image block; after restoration processing is performed on the different image blocks of the frame to be processed, the first target frame can be obtained.
  • The pre-trained video frame restoration network can also be used to restore the different image blocks in the frame to be processed: for example, the quantization parameter and the frame to be processed can be input into the pre-trained video frame restoration network; the video frame restoration network performs different restoration processing on the different image block regions according to the different quantization degrees, indicated by the quantization parameter, of the image blocks in different regions of the frame to be processed; the image processing device can then obtain the first target frame output by the video frame restoration network.
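  • One way to feed such block-wise sub-quantization parameters into the restoration network is to expand them into a per-pixel map aligned with the frame, as in this hypothetical helper (the 2×2 block layout is illustrative):

```python
import torch
import torch.nn.functional as F

def qp_map_from_blocks(sub_qps: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """Expand a grid of sub-quantization parameters (one per image block) into
    a 1 x 1 x H x W map that can be fused with the frame to be processed."""
    grid = sub_qps.float()[None, None]  # 1 x 1 x rows x cols
    return F.interpolate(grid, size=(height, width), mode="nearest")

# Four sub-QPs, one per quadrant of a 64 x 64 frame to be processed:
qp_map = qp_map_from_blocks(torch.tensor([[28.0, 36.0], [24.0, 40.0]]), 64, 64)
```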
  • Please refer to FIG. 7. The movable platform may downsample the captured video frame sequence and compress the downsampled video frame sequence to obtain the compressed video stream, and transmit it to the image processing device. Then, after the image processing device restores the decompressed frame to be processed to obtain the first target frame, it may further perform super-resolution reconstruction processing on the first target frame to obtain the second target frame.
  • The super-resolution reconstruction process can be performed through a pre-trained super-resolution reconstruction network 200. For example, the first target frame can be obtained by the pre-trained video frame restoration network 100 performing restoration processing according to the quantization parameter and the frame to be processed; the second target frame can then be obtained by the pre-trained super-resolution reconstruction network 200 performing super-resolution reconstruction processing on the first target frame.
  • the super-resolution reconstruction network and the video frame restoration network can be jointly trained through multi-task learning.
  • The training samples of the super-resolution reconstruction network and the video frame restoration network include: an original video frame sequence, the downsampled original video frames, a degraded video frame sequence obtained by compressing the downsampled original video frame sequence, and a quantization parameter sequence corresponding to the degraded video frame sequence.
  • During training, the degraded video frame sequence and the quantization parameter sequence corresponding to the degraded video frame sequence are input into the video frame restoration network, which performs restoration frame by frame according to the degraded video frames and their corresponding quantization parameters to obtain a second prediction result; the second prediction result is then input into the super-resolution reconstruction network, which performs reconstruction processing according to the second prediction result to obtain a first prediction result.
  • The loss function of the super-resolution reconstruction network and the video frame restoration network can be determined according to the difference between the original video frame sequence and the first prediction result; or it can be determined according to both the difference between the original video frame sequence and the first prediction result and the difference between the downsampled original video frames and the second prediction result. The parameters of the super-resolution reconstruction network and the video frame restoration network are adjusted accordingly to obtain the trained super-resolution reconstruction network and video frame restoration network.
  • the joint training process of the super-resolution reconstruction network and the video frame restoration network is beneficial to improve training efficiency and training accuracy.
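  • A sketch of the corresponding joint multi-task loss (the L1 form and the weighting `alpha` are assumptions):

```python
import torch
import torch.nn.functional as F

def joint_loss(original: torch.Tensor, downsampled_original: torch.Tensor,
               first_pred: torch.Tensor, second_pred: torch.Tensor,
               alpha: float = 0.5) -> torch.Tensor:
    """Joint loss for the restoration and super-resolution networks.

    first_pred:  output of the super-resolution reconstruction network
    second_pred: output of the video frame restoration network
    """
    sr_loss = F.l1_loss(first_pred, original)                     # vs. originals
    restore_loss = F.l1_loss(second_pred, downsampled_original)   # vs. downsampled
    return sr_loss + alpha * restore_loss
```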
  • FIG. 8 shows a schematic flowchart of another image processing method, which can be performed by an image processing device, and the method includes:
  • step S201 after the compressed video stream is decompressed, a decompressed frame to be processed and quantization parameters related to the frame to be processed generated during the compression process are obtained. It is similar to step S101 and will not be repeated here.
  • step S202 the frame to be processed is restored according to the quantization parameter to obtain a first target frame; wherein, the image quality of the first target frame is higher than the image quality of the frame to be processed. It is similar to step S102 and will not be repeated here.
  • step S203 at least one reference frame of the frame to be processed is obtained; and the at least one reference frame and a motion vector between the reference frame and the frame to be processed generated during the compression process are obtained.
  • step S204, the first target frame is reconstructed according to the at least one reference frame and at least one motion vector to obtain a third target frame; wherein, the image quality of the third target frame is higher than the image quality of the first target frame.
  • the reference frame is a decompressed reference frame or a result of restoring the decompressed reference frame, which can be specifically selected according to an actual application scenario.
  • For example, if the computing resources of the image processing device are limited, the reference frame may be a decompressed reference frame; if the computing resources of the image processing device are sufficient, the reference frame may also be the result of performing restoration processing on the decompressed reference frame, which can provide more supplementary information and is conducive to further improving the reconstruction effect.
  • the restoration process of the reference frame is similar to the restoration process of the frame to be processed, and the quantization parameters related to the reference frame are also used to restore the reference frame, which will not be repeated here.
  • In this way, supplementary information from the reference frame is provided for the first target frame, so as to improve the image quality of the reconstructed third target frame.
  • After the image processing device obtains the compressed video stream, it can use the decoder to decompress it. Since the compressed video stream also carries motion vector information, the motion vector between the adjacent frame and the frame to be processed needed in the embodiment of the present application can be output by the decoder during decompression of the compressed video stream; obtaining the motion vector therefore requires no additional computation, which helps improve reconstruction efficiency.
  • the motion vector between the reference frame and the frame to be processed may be obtained by further processing according to the motion vector output by the decoder during the process of decompressing the compressed video stream.
  • the reference frame may be an adjacent frame of the frame to be processed.
  • The adjacent frames may include M video frames collected before the frame to be processed and/or N video frames collected after the frame to be processed, where M and N are integers greater than 0. It can be understood that the embodiment of the present application does not impose any limitation on the number of adjacent frames to be acquired, which can be set according to actual application scenarios; for example, one or more video frames captured before the frame to be processed can be acquired, and one or more video frames collected after the frame to be processed can also be acquired.
  • Alternatively, the adjacent frames include the Mth video frame collected before the frame to be processed and/or the Nth video frame collected after the frame to be processed, where M and N are integers greater than 0. For example, taking the frame to be processed as the 0th frame, the adjacent frame may be the first image frame collected before the frame to be processed, or the second image frame collected before the frame to be processed, and can be selected according to the actual application scenario.
  • the reference frame may be a video frame having the same target object as the frame to be processed, so as to facilitate obtaining a third target frame with a better display effect of the target object.
  • the target objects include but are not limited to people, buildings, animals or other specified objects.
  • the image processing device may fuse at least one reference frame and the first target frame according to at least one motion vector, and perform reconstruction processing according to the fusion result to obtain a third target frame.
  • the fusion of the first target frame and the reference frame is implemented under the guidance of the motion vector, so blurred reconstruction results can be avoided, and it is beneficial to obtain a third target frame with better image quality.
  • the image processing apparatus may perform affine transformation on at least one reference frame according to the motion vector, and perform fusion processing on the transformed reference frame and the first target frame.
  • This embodiment does not impose any restrictions on the specific implementation of the fusion process, which can be set according to actual application scenarios; for example, the pixel values at the same positions in the transformed reference frame and the first target frame can be added and averaged to obtain the fused result.
  • the supplementary information of the reference frame is effectively fused into the first target frame, thereby providing rich information for the subsequent reconstruction process.
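  • A PyTorch sketch of this motion-guided fusion, treating the motion vectors as a dense 2×H×W displacement field (consistent with the sizes given later) and averaging the first target frame with the warped reference; the bilinear warp via `grid_sample` is an implementation choice, not mandated by the text:

```python
import torch
import torch.nn.functional as F

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a reference frame (N x C x H x W) toward the target frame using a
    dense motion field (N x 2 x H x W, displacements in pixels)."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame.device)  # 2 x H x W
    coords = base[None] + flow                                    # shifted sampling grid
    # grid_sample expects an N x H x W x 2 grid with (x, y) in [-1, 1].
    coords[:, 0] = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords[:, 1] = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(frame, coords.permute(0, 2, 3, 1), align_corners=True)

def fuse(target: torch.Tensor, reference: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Average the first target frame with the motion-compensated reference frame."""
    return 0.5 * (target + warp(reference, flow))
```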
  • In some embodiments, the image processing device performs feature extraction on the reference frame at least according to the motion vector between the reference frame and the first target frame to obtain a third feature; the image processing device also performs feature extraction on the first target frame to obtain a fourth feature; and then fuses the at least one third feature and the fourth feature.
  • In this way, the third feature and the fourth feature are extracted separately, and the effective information (the third and fourth features) in the first target frame and the reference frame is fused under the guidance of the motion vector, rather than fusing all of the information; this provides rich features for the subsequent reconstruction process to improve the image quality of the target frame while also reducing the amount of data in that process, which helps improve reconstruction efficiency.
  • The information extracted by the feature extraction process includes but is not limited to edge features, shape (contour) features, color features, or texture features, and so on. It can be understood that the embodiment of the present application does not impose any restrictions on the method used for feature extraction, which can be set according to the actual application scenario; for example, feature extraction can be performed by convolution operations, HOG (Histogram of Oriented Gradients), SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), or DoG (Difference of Gaussians).
  • In a possible implementation, the image processing device may perform fusion processing on the first target frame, a reference frame, and the motion vector between the reference frame and the first target frame to obtain fusion data; feature extraction is then performed on the fusion data to obtain the third feature.
  • the extracted third feature includes feature information of the motion vector, feature information of the reference frame, and feature information of the first target frame.
  • the image processing apparatus may perform affine transformation on the reference frame according to the motion vector, and perform feature extraction on the transformed reference frame to obtain the third feature.
  • the extracted third feature includes feature information of the reference frame transformed by the motion vector.
  • the image processing device may use the fused result to perform reconstruction processing.
  • For example, the image processing device can perform dimensionality reduction on the fused result and use the dimensionality-reduced result for reconstruction processing, which helps reduce the amount of computation and improve reconstruction efficiency.
  • For example, the fused result includes a fifth feature obtained by fusing the at least one third feature and the fourth feature; the image processing device may perform dimensionality reduction on the fifth feature and use the dimensionality-reduced fifth feature for reconstruction processing. It can be understood that the embodiment of the present application does not impose any limitation on the specific method of dimensionality reduction, which can be set according to actual application scenarios.
  • For example, the dimensionality reduction of the fifth feature can be performed by a pooling method or a convolution operation.
  • In some embodiments, the fused result can be input into a pre-established video frame reconstruction network, and the video frame reconstruction network performs reconstruction processing to obtain a third target frame whose image quality is higher than that of the first target frame. It can be understood that the embodiment of the present application does not impose any limitation on the specific structure of the video frame reconstruction network, which can be set according to actual application scenarios.
  • the video frame reconstruction network is used to restore the first target frame, so that the acquired third target frame can be close to the video frame captured by the shooting device.
  • In other embodiments, the video frame reconstruction network is configured to perform super-resolution reconstruction processing on the first target frame, so that the obtained third target frame has a higher resolution than the first target frame.
  • In the training process of the video frame reconstruction network, the training samples can be fusion data obtained by fusing a decompressed video frame with at least one of its reference frames under the guidance of the relevant motion vectors; the labels include the restored video frame or the super-resolution video frame. During training, the fusion data are input into the video frame reconstruction network, which reconstructs them to obtain a predicted video frame. If the goal is image restoration, the parameters of the video frame reconstruction network are adjusted according to the difference between the restored video frame and the predicted video frame, yielding a video frame reconstruction network for restoring video frames; if the goal is super-resolution reconstruction, the parameters are adjusted according to the difference between the super-resolution video frame and the predicted video frame, yielding a video frame reconstruction network for super-resolution reconstruction of video frames.
  • FIG. 9A shows a video frame restoration network 100 , a data fusion network 300 and a video frame reconstruction network 400 .
  • The video frame restoration network 100 performs restoration frame by frame according to the corresponding quantization parameters and video frames, obtaining the restored video frame t-1, the restored video frame t, and the restored video frame t+1.
  • the quantization parameter generated in the compression process is introduced in the process of restoring the video frame.
  • The quantization parameter can reflect the degradation degree of the video frame during compression, so it can be used to guide the restoration process of the video frame well, enhance the restoration effect, and also help improve the image quality.
  • The data fusion network 300 uses the motion vector V_{t-1→t} from video frame t-1 to video frame t and the motion vector V_{t→t+1} from video frame t to video frame t+1, both generated during compression, as guidance to fuse the information of the restored video frame t-1, the restored video frame t, and the restored video frame t+1, so as to assist the reconstruction of video frame t.
  • The video frame reconstruction network 400 is used to restore or super-resolve the video frame t using the information obtained by fusing the restored video frame t-1, the restored video frame t, and the restored video frame t+1.
  • The sizes of video frame t-1, video frame t, video frame t+1, restored video frame t-1, restored video frame t, and restored video frame t+1 are all C1×H×W, where C1 represents the number of channels, H represents the height of the restored video frame, and W represents its width, with specific values set according to the actual application scene; the sizes of the motion vectors V_{t-1→t} and V_{t→t+1} are 2×H×W, with specific values set according to the actual application scenario.
  • the data fusion network 300 includes one or more first fusion layers 301 , one or more third convolution layers 302 , second fusion layers 303 and fourth convolution layers 304 .
  • The number of first fusion layers 301 is determined according to the number of adjacent frames, and the number of third convolutional layers 302 is determined according to the total number of adjacent frames and frames to be processed.
  • The first fusion layer 301 can concatenate the restored video frame t, the restored video frame t-1, and the motion vector V_{t-1→t} along the channel dimension to obtain fusion data (a tensor of size C2×H×W); feature extraction is then performed on the fusion data through the third convolutional layer 302 to obtain a third feature (a tensor of size C×H×W).
  • For the restored video frame t, the third convolutional layer 302 is used to extract features of the restored video frame t to obtain a fourth feature (a tensor of size C×H×W).
  • Similarly, the restored video frame t, the restored video frame t+1, and the motion vector V_{t→t+1} are fused through the first fusion layer 301 to obtain fusion data; for example, the first fusion layer 301 can concatenate the restored video frame t, the restored video frame t+1, and the motion vector V_{t→t+1} along the channel dimension, and the third convolutional layer 302 is then used to perform feature extraction on the fused data to obtain another third feature (a tensor of size C×H×W).
  • Then the two third features and the one fourth feature can be fused through the second fusion layer 303 to obtain the fifth feature; for example, the second fusion layer 303 concatenates the two third features and the one fourth feature along the channel dimension to obtain the fifth feature (a tensor of size 3C×H×W). Finally, in order to improve image reconstruction efficiency, the fourth convolutional layer 304 can optionally be used to reduce the fifth feature from a 3C×H×W tensor to a C×H×W tensor; the dimensionality-reduced fifth feature is then input into the pre-established video frame reconstruction network 400, and a third target frame with better image quality is obtained through reconstruction processing by the video frame reconstruction network 400.
  • the final output third target frame may be a restoration result or a super-resolution result corresponding to the restored video frame t, which depends on the specific structure of the video frame reconstruction network.
  • The size of the output target frame is C×mH×mW, where m represents the magnification factor, which can be set according to the actual application scenario.
  • For the restoration task, the value of m is 1; for the super-resolution task, 4× super-resolution is typical, so the value of m is 4.
  • the supplementary information of the two frames before and after the restored video frame t is effectively fused, thereby providing rich features for the subsequent video frame reconstruction network and enhancing the quality of the output target frame.
  • the fusion of inter-frame information is realized under the guidance of motion vectors, which can avoid blurred reconstruction results.
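  • Putting these pieces together, a PyTorch sketch of the concatenation-based data fusion network of FIG. 9A (channel counts follow the C×H×W and 3C×H×W description; the layer depths and the weight sharing between the two neighbor branches are assumptions):

```python
import torch
import torch.nn as nn

class DataFusionNetwork(nn.Module):
    """Sketch of FIG. 9A: motion-vector-guided fusion of three restored frames."""

    def __init__(self, in_channels: int = 3, feat: int = 64):
        super().__init__()
        # Third convolutional layers 302: one branch takes (frame t, neighbor,
        # motion vector) concatenated by a first fusion layer 301, one takes frame t.
        self.neighbor_conv = nn.Conv2d(2 * in_channels + 2, feat, 3, padding=1)
        self.center_conv = nn.Conv2d(in_channels, feat, 3, padding=1)
        # Fourth convolutional layer 304: reduce 3C x H x W back to C x H x W.
        self.reduce = nn.Conv2d(3 * feat, feat, 1)

    def forward(self, prev, cur, nxt, flow_prev, flow_next):
        third_a = self.neighbor_conv(torch.cat([cur, prev, flow_prev], dim=1))
        fourth = self.center_conv(cur)
        third_b = self.neighbor_conv(torch.cat([cur, nxt, flow_next], dim=1))
        fifth = torch.cat([third_a, fourth, third_b], dim=1)  # second fusion layer 303
        return self.reduce(fifth)   # dimensionality-reduced input for network 400

# Restored frames: N x 3 x H x W; motion vectors: N x 2 x H x W.
net = DataFusionNetwork()
out = net(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
          torch.rand(1, 2, 64, 64), torch.rand(1, 2, 64, 64))
```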
  • In FIG. 9B, the compressed video stream is decompressed to obtain decompressed video frame t-1, video frame t, and video frame t+1. It may be decided to restore only the decompressed video frame t and leave the decompressed video frame t-1 and video frame t+1 unrestored; in that case, only the quantization parameter q related to video frame t needs to be obtained.
  • the video frame t and the quantization parameter q are input into the video frame restoration network 100, and the video frame restoration network 100 performs restoration processing according to the quantization parameter and the video frame to obtain the restored video frame t.
  • The data fusion network 300 uses the motion vector V_{t-1→t} from video frame t-1 to video frame t and the motion vector V_{t→t+1} from video frame t to video frame t+1, both generated during compression, as guidance to fuse the information of the decompressed video frame t-1, the restored video frame t, and the decompressed video frame t+1, so as to assist the reconstruction of video frame t.
  • The video frame reconstruction network 400 is used to restore or super-resolve the video frame t using the information obtained by fusing the decompressed video frame t-1, the restored video frame t, and the decompressed video frame t+1.
  • Please refer to FIG. 10, whose difference from the embodiment described above lies in the data fusion network. The video frame t-1, video frame t, and video frame t+1, together with the quantization parameter q-1, quantization parameter q, and quantization parameter q+1, are input into the video frame restoration network 100, and the video frame restoration network 100 performs restoration frame by frame according to the corresponding quantization parameters and video frames to obtain the restored video frame t-1, the restored video frame t, and the restored video frame t+1.
  • For the restored video frame t-1, the affine transformation module 305 uses the motion vector V_{t-1→t} to apply an affine transformation to the restored video frame t-1; the third convolutional layer 302 then performs feature extraction on the transformed restored video frame t-1 to obtain a third feature (a tensor of size C×H×W). For the restored video frame t, the third convolutional layer 302 is used to extract features of the restored video frame t to obtain a fourth feature (a tensor of size C×H×W).
  • For the restored video frame t+1, the motion vector V_{t→t+1} is used to apply an affine transformation to the restored video frame t+1; the third convolutional layer 302 then performs feature extraction on the transformed restored video frame t+1 to obtain another third feature (a tensor of size C×H×W).
  • the subsequent operation process is similar to that in Figure 4.
  • For example, the two third features and the fourth feature can be fused through the second fusion layer 303 to obtain the fifth feature; the fourth convolutional layer 304 is then used to reduce the channel dimension, shrinking the fifth feature from a 3C×H×W tensor to a C×H×W tensor; the dimensionality-reduced fifth feature is then input into the pre-established video frame reconstruction network 400, and the target frame is obtained through reconstruction processing by the video frame reconstruction network 400.
  • the supplementary information of the two frames before and after the restored video frame t is effectively fused, thereby providing rich features for the subsequent video frame reconstruction network and enhancing the quality of the output target frame.
  • the fusion of inter-frame information is realized under the guidance of motion vectors, which can avoid blurred reconstruction results.
  • Please refer to FIG. 11. The embodiment of the present application also provides an image processing device 40, including:
  • a memory 41 for storing executable instructions;
  • one or more processors 42;
  • when the one or more processors 42 execute the executable instructions, they are individually or collectively configured to perform any one of the methods described above.
  • The processor 42 executes the executable instructions stored in the memory 41. The processor 42 can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 41 stores the executable instructions of the method for returning to the voyage of the unmanned aerial vehicle, and the memory 41 can include at least one type of storage medium, and the storage medium includes a flash memory, a hard disk, a multimedia card, a card type memory (for example, SD or DX memory etc.), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory , Disk, CD, etc. Also, the device may cooperate with a web storage which performs a storage function of the memory through a network connection.
  • the storage 41 may be an internal storage unit, such as a hard disk or a memory.
  • the memory 41 can also be an external storage device, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash memory card (Flash Card) and the like. Further, the memory 41 may also include both an internal storage unit and an external storage device. The memory 41 can also be used to temporarily store data that has been output or will be output.
  • when the processor 42 executes the executable instructions, it is individually or collectively configured to: after decompressing a compressed video stream, obtain a decompressed frame to be processed and a quantization parameter generated during compression and related to the frame to be processed; and perform restoration processing on the frame to be processed according to the quantization parameter to obtain a first target frame whose image quality is higher than that of the frame to be processed.
  • the quantization parameter includes a quantization parameter (QP) or a quantization matrix (QM).
  • the quantization parameter is determined at least according to channel quality of a channel used to transmit the frame to be processed.
  • the quantization degree corresponding to the frame to be processed indicated by the quantization parameter has a negative correlation with the channel quality.
  • the quantization parameter is output by a decoder during decoding of the compressed video stream.
  • the processor 42 is further configured to perform fusion processing on the quantization parameter and the frame to be processed to obtain fused data, and to perform restoration processing according to fused features extracted from the fused data to obtain the first target frame.
  • the processor 42 is further configured to perform feature extraction on the quantization parameter to obtain a first feature, to perform feature extraction on the frame to be processed to obtain a second feature, and to perform restoration processing according to the fused feature obtained by fusing the first feature and the second feature to obtain the first target frame.
  • the quantization parameter is used to indicate different quantization degrees of different image blocks in the frame to be processed.
  • the processor 42 is further configured to perform different restoration processes on different image blocks in the frame to be processed according to different quantization degrees indicated by the quantization parameter.
  • the processor 42 is further configured to input the quantization parameter and the frame to be processed into a pre-trained video frame restoration network, which performs restoration processing on the frame to be processed according to the degree of quantization of the frame during compression indicated by the quantization parameter, and to obtain the first target frame output by the video frame restoration network.
  • the training samples of the video frame restoration network include an original video frame sequence, a degraded video frame sequence obtained by compressing the original video frame sequence, and a quantization parameter sequence corresponding to the degraded video frame sequence.
  • the loss function of the video frame restoration network is used to adjust the parameters of the video frame restoration network according to the difference between the original video frame sequence and the prediction result, where the prediction result is obtained by the video frame restoration network performing restoration processing according to the degraded video frame sequence and the quantization parameter sequence.
  • the compressed video stream is obtained by compressing the down-sampled video frame sequence by an encoder.
  • the processor 42 is further configured to perform super-resolution reconstruction processing on the first target frame to obtain a second target frame.
  • the first target video frame is obtained by a pre-trained video frame restoration network according to the quantization parameter and the frame to be processed; the second target frame is obtained by a pre-trained super-resolution reconstruction network performing super-resolution reconstruction processing on the first target video frame.
  • the super-resolution reconstruction network and the video frame restoration network are jointly trained through multi-task learning;
  • the training samples of the super-resolution reconstruction network and the video frame restoration network include: an original video frame sequence, downsampled original video frames, a degraded video frame sequence obtained by compressing the downsampled original video frame sequence, and a quantization parameter sequence corresponding to the degraded video frame sequence.
  • the loss functions of the super-resolution reconstruction network and the video frame restoration network are used to adjust the parameters of the super-resolution reconstruction network and the video frame restoration network according to the difference between the original video frame sequence and a first prediction result; or according to the difference between the original video frame sequence and the first prediction result and the difference between the downsampled original video frames and a second prediction result.
  • the second prediction result is obtained by the video frame restoration network performing restoration processing according to the degraded video frame sequence and the quantization parameters respectively corresponding to it; the first prediction result is obtained by the super-resolution reconstruction network performing reconstruction processing according to the second prediction result.
  • the processor 42 is further configured to obtain at least one restored reference frame of the frame to be processed; to obtain the at least one reference frame and the motion vector, generated during compression, between the reference frame and the frame to be processed; and to perform reconstruction processing on the first target frame according to the at least one reference frame and at least one motion vector to obtain a third target frame, where the image quality of the third target frame is higher than the image quality of the first target frame.
  • the reference frame is a decompressed reference frame or a result of restoring the decompressed reference frame.
  • the reference frame includes adjacent frames of the frame to be processed.
  • the adjacent frames include M video frames collected before the frame to be processed and/or N video frames collected after the frame to be processed; or the adjacent frames include the M-th video frame collected before the frame to be processed and/or the N-th video frame collected after the frame to be processed, where M and N are integers greater than 0.
  • the motion vectors are output by a decoder during decompression of the compressed video stream.
  • the compressed video stream is obtained by a movable platform collecting a video frame sequence with its onboard photographing device while moving, the movable platform then compressing and transmitting the video frame sequence.
  • Various implementations described herein can be implemented using a computer readable medium such as computer software, hardware, or any combination thereof.
  • the embodiments described herein can be implemented using at least one of Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and electronic units designed to perform the functions described herein.
  • an embodiment such as a procedure or a function may be implemented with a separate software module that allows at least one function or operation to be performed.
  • the software code can be implemented by a software application (or program) written in any suitable programming language, and may be stored in memory and executed by a controller.
  • a remote control device including the above image processing device.
  • an image processing system including a movable platform and a remote control device.
  • the movable platform is equipped with a photographing device, and the photographing device is used for capturing video frame sequences during the movement of the movable platform.
  • the movable platform is used to compress the sequence of video frames to obtain a compressed video stream, and transmit the compressed video stream to the image processing device.
  • the movable platform includes one or more of the following: unmanned aerial vehicles, unmanned vehicles, gimbals, unmanned ships or mobile robots.
  • FIG. 1 shows a schematic diagram of a remote control device and an unmanned aerial vehicle.
  • non-transitory computer-readable storage medium including instructions, such as a memory including instructions, which are executable by a processor of an apparatus to perform the above method.
  • the non-transitory computer readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, and optical data storage device, among others.
  • a non-transitory computer-readable storage medium enabling the terminal to execute the above method when instructions in the storage medium are executed by a processor of the terminal.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An image processing method, device, remote control device, system and storage medium. The image processing method includes: after decompressing a compressed video stream, obtaining a decompressed frame to be processed and a quantization parameter that is generated during compression and related to the frame to be processed; and performing restoration processing on the frame to be processed according to the quantization parameter to obtain a first target frame, where the image quality of the first target frame is higher than the image quality of the frame to be processed. The quantization parameter generated during compression is thus introduced into the process of restoring the frame to be processed; since the quantization parameter reflects the degree of degradation of the frame to be processed during compression, it can effectively guide the restoration process, enhancing the restoration effect and improving image quality.

Description

Image processing method, device, remote control device, system and storage medium. Technical field
The present application relates to the technical field of image processing, and specifically relates to an image processing method, device, remote control device, system, and storage medium.
Background
With the continuous development of artificial intelligence, computer vision is being applied more and more widely. For a better visual experience, users expect to see images or videos of higher image quality.
In some image processing procedures, for example when images or video frames are compressed, some image information may be lost, so that the image quality of the decompressed image or video frame is lower than the image quality of the image or video frame before compression. It is therefore necessary to perform restoration processing on such images or video frames.
Summary of the invention
In view of this, one of the objectives of the present application is to provide an image processing method, device, remote control device, system and storage medium.
In a first aspect, an embodiment of the present application provides an image processing method, including:
after decompressing a compressed video stream, obtaining a decompressed frame to be processed and a quantization parameter that is generated during compression and related to the frame to be processed;
performing restoration processing on the frame to be processed according to the quantization parameter to obtain a first target frame, where the image quality of the first target frame is higher than the image quality of the frame to be processed.
In a second aspect, an embodiment of the present application provides an image processing device, the device including:
a memory for storing executable instructions;
one or more processors;
where the one or more processors, when executing the executable instructions, are individually or collectively configured to perform the method described in the first aspect.
In a third aspect, an embodiment of the present application provides a remote control device, including the image processing device described in the second aspect.
In a fourth aspect, an embodiment of the present application provides an image processing system, including a movable platform and the remote control device described in the third aspect;
the movable platform is equipped with a photographing device, and the photographing device is used to collect a video frame sequence during movement of the movable platform;
the movable platform is used to compress the video frame sequence to obtain a compressed video stream, and to transmit the compressed video stream to the image processing device.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium that stores executable instructions which, when executed by a processor, implement the method described in the first aspect.
According to the image processing method, device, remote control device, system and storage medium provided by the embodiments of the present application, after a compressed video stream is decompressed, the decompressed frame to be processed and the quantization parameter generated during compression and related to the frame to be processed are obtained, and the quantization parameter generated during compression is then introduced into the process of restoring the frame to be processed. The quantization parameter reflects the degree of degradation of the frame to be processed during compression, so it can effectively guide the restoration process, enhance the restoration effect and improve image quality; restoration processing is performed on the frame to be processed according to the quantization parameter to obtain a first target frame whose image quality is higher than that of the frame to be processed.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a product schematic diagram of an unmanned aerial system provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of video encoding provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of an image processing method provided by an embodiment of the present application;
FIG. 4 and FIG. 5 are schematic structural diagrams of two different video frame restoration networks provided by embodiments of the present application;
FIG. 6 is a schematic diagram of the generation process of a compressed video stream provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of the process of obtaining a first target frame and a second target frame provided by an embodiment of the present application;
FIG. 8 is a schematic flowchart of another image processing method provided by an embodiment of the present application;
FIG. 9A and FIG. 9B are schematic structural diagrams of a video frame restoration network, a data fusion network and a video frame reconstruction network provided by an embodiment of the present application, where the video frames processed by the video frame restoration network differ between FIG. 9A and FIG. 9B;
FIG. 10 is a schematic structural diagram of a second video frame restoration network, data fusion network and video frame reconstruction network provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
To address the problem that part of the image information is lost when images or video frames are compressed, so that the image quality of the decompressed images or video frames is degraded, an embodiment of the present application provides an image processing method: after a compressed video stream is decompressed, the decompressed frame to be processed and a quantization parameter generated during compression and related to the frame to be processed are obtained, and the quantization parameter generated during compression is then introduced into the process of restoring the frame to be processed. The quantization parameter reflects the degree of degradation of the frame to be processed during compression, so it can effectively guide the restoration process, enhance the restoration effect and improve image quality, thereby obtaining a first target frame whose image quality is higher than that of the frame to be processed.
In some embodiments, the image processing method provided by the embodiments of the present application can be applied to an image processing device. The image processing device may be an electronic device with data processing capability, or a computer chip or integrated circuit with data processing capability, such as a central processing unit (CPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
Examples of the electronic device include, but are not limited to: smartphones/mobile phones, tablet computers, personal digital assistants (PDAs), laptop computers, desktop computers, media content players, video game stations/systems, virtual reality systems, augmented reality systems, wearable devices (for example, watches, glasses, gloves, headgear (for example, hats, helmets, virtual reality headsets, augmented reality headsets, head-mounted devices (HMD), headbands), pendants, armbands, leg rings, shoes, vests), remote control devices (such as remote controllers), or any other type of device.
Exemplarily, when the image processing device is a computer chip or integrated circuit with data processing capability, the image processing device may be installed in an electronic device (such as a remote control device).
In an exemplary embodiment, the compressed video stream obtained by the image processing device may be produced by a movable platform collecting a video frame sequence with its onboard photographing device while moving or stationary, the movable platform then compressing the video frame sequence and transmitting it. Examples of the movable platform include, but are not limited to, unmanned aerial vehicles, unmanned vehicles, gimbals, unmanned ships, or mobile robots (such as sweeping robots).
In an exemplary application scenario, taking the movable platform being an unmanned aerial vehicle (UAV) and the image processing device being the remote control device of the UAV as an example, please refer to FIG. 1, which shows a product schematic diagram of an unmanned aerial system. The unmanned aerial system includes a UAV 110 and a remote control device 120, which are communicatively connected.
The UAV 110 can be operated using the remote control device 120 and its own program control device, and can fly under automatic or semi-automatic control. Exemplarily, the UAV 110 includes a flight controller, which can control the UAV according to pre-programmed instructions, or control the UAV in response to one or more remote control signals from the remote control device 120.
The UAV 110 is provided with a photographing device 111, which may be, for example, a camera or video camera for capturing images. The photographing device 111 can communicate with the UAV 110 and take pictures under the control of the UAV 110. The photographing device 111 of this embodiment includes at least a photosensitive element, such as a complementary metal oxide semiconductor (CMOS) sensor or a charge-coupled device (CCD) sensor. It can be understood that the photographing device 111 may be fixed directly to the UAV 110, or may be mounted on the UAV 110 via a gimbal.
The remote control device 120 can control the flight of the UAV 110 and control the photographing device 111 in the UAV 110 to collect video frames. The photographing device 111 can collect a video frame sequence while the UAV 110 is flying, and the UAV 110 then sends the video frame sequence collected by the photographing device 111 to the remote control device 120. The remote control device 120 may be provided with a display 121, and the video frame sequence collected by the photographing device 111 can be displayed on the display 121.
In order to improve data transmission efficiency, the UAV 110 usually compresses the video frame sequence collected by the photographing device 111 and sends the compressed video stream, which has a smaller data volume, to the remote control device 120. After receiving the compressed video stream, the remote control device 120 can decode it with a decoder and display the decoded video frame sequence on the display 121 of the remote control device 120.
Further, in order to improve the image quality of the decoded video frame sequence, the image processing method provided by the embodiments of the present application can be used: the decompressed frame to be processed and the quantization parameter generated during compression and related to the frame to be processed are obtained; restoration processing is then performed on the frame to be processed according to the quantization parameter to obtain a first target frame whose image quality is higher than that of the frame to be processed, and the first target frame is displayed on the display, which helps improve the user's visual experience.
To better understand the quantization parameters mentioned in the embodiments of the present application, video encoding is briefly described below. Generally speaking, the video encoding process includes steps such as prediction, transform, quantization and entropy coding; FIG. 2 shows an encoding flowchart. Prediction includes two types, intra prediction and inter prediction, whose purpose is to use prediction block information to remove redundant information from the current image to be encoded.
For intra prediction: each frame in a video can be regarded as an independent image, and there is a certain amount of spatial redundancy within an image; for example, in the sky regions that often appear in image or video backgrounds, the pixels are extremely similar to one another, and such regions provide a lot of room for compression in image or video coding. Intra prediction is used to remove the spatial redundancy within each frame. Intra prediction uses the information of the current frame to obtain prediction block data; the process includes dividing the image to be encoded into several image blocks to be encoded, and then, for each image block to be encoded, generating a prediction block for the current image block to be encoded using the already encoded image blocks adjacent to it.
For inter prediction: in order to keep video playback smooth, so that the human eye does not perceive pauses between frames, videos generally use a frame rate of 25 frames per second or more; that is, the time interval between two consecutive frames is less than 1/25 = 0.04 s. When moving objects in a video are not too fast, the correlation between two adjacent frames is very high, which creates temporal redundancy between frames. Inter prediction is used to remove this temporal redundancy between frames, obtaining motion vectors by motion estimation. It uses the information of a reference frame to obtain prediction block data; the process includes dividing the image to be encoded into several image blocks to be encoded, and then, for each image block to be encoded, searching the reference frame for the image block that best matches (that is, is most similar to) the current image block to be encoded as the prediction block. The relative displacement between the prediction block and the current image block to be encoded is the motion vector. The reference frame may be an encoded image adjacent to the image to be encoded.
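By way of illustration of the motion estimation just described, the following is a minimal full-search block-matching sketch in Python/NumPy. It is only a sketch under assumptions: the block size, the search range and the sum-of-absolute-differences (SAD) criterion are illustrative choices, not details specified by the present application.

```python
import numpy as np

def estimate_motion_vector(ref, cur, bx, by, block=16, search=8):
    """Full-search block matching: find the displacement within `ref` that best
    matches the block x block patch of `cur` at (by, bx), by minimum SAD."""
    h, w = cur.shape
    patch = cur[by:by + block, bx:bx + block].astype(np.int32)
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue
            cand = ref[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(patch - cand).sum()  # sum of absolute differences
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv  # relative displacement of the prediction block = motion vector
```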
An image frame that uses only the intra prediction mode in encoding is called an I frame, and an image frame that uses both intra prediction and inter prediction is called a P or B frame. After a prediction block is obtained by intra or inter prediction, the corresponding pixel values of the prediction block are subtracted from the image block to be encoded to obtain a residual block.
The transform converts the residual block from the time domain to the frequency domain, so that the residual block can be further analyzed in the frequency domain; a transform matrix can be used to transform the residual block. The transform of the residual block usually uses a two-dimensional transform: at the encoding end, the residual values in the residual block are multiplied by an N×N transform matrix and its transpose, and the result of the multiplication is the transform coefficients.
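For concreteness, a minimal sketch of this two-dimensional transform, assuming the orthonormal N×N DCT-II matrix as the transform matrix (real codecs use integer approximations, which are not specified here):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal n x n DCT-II transform matrix A."""
    a = np.array([[np.cos(np.pi * (2 * j + 1) * i / (2 * n)) for j in range(n)]
                  for i in range(n)])
    a[0, :] *= np.sqrt(1.0 / n)
    a[1:, :] *= np.sqrt(2.0 / n)
    return a

# Transform of an N x N residual block X: coefficients Y = A @ X @ A.T
A = dct_matrix(8)
residual = np.random.randint(-32, 32, (8, 8)).astype(float)
coeffs = A @ residual @ A.T
# Because A is orthonormal, the inverse transform recovers the residual exactly
assert np.allclose(A.T @ coeffs @ A, residual)
```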
For quantization: video content comes from the real world, and it cannot be guaranteed that all the information it contains can be perceived by the human eye, so the video can be appropriately simplified according to the characteristics of human perception of light signals in order to remove visual redundancy. Quantization is used to remove this visual redundancy based on the human eye: the transform coefficients obtained after the transform are quantized by the quantization parameter to obtain quantized coefficients, and the quantization process can further improve coding efficiency. The quantization parameter includes, but is not limited to, a quantization parameter (QP) or a quantization matrix (QM).
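The quantization round trip can be pictured with the following minimal sketch. The uniform quantizer and the assumed QP-to-step-size mapping (step size doubling every 6 QP, in the style of H.264-like codecs) are illustrative; the point is that the reconstruction error grows as the quantization parameter grows:

```python
import numpy as np

def quantize(coeffs, qp):
    step = 2.0 ** (qp / 6.0)   # assumed mapping: step size doubles every 6 QP
    return np.round(coeffs / step)

def dequantize(levels, qp):
    step = 2.0 ** (qp / 6.0)
    return levels * step

coeffs = np.random.randn(8, 8) * 50
for qp in (10, 30, 50):
    rec = dequantize(quantize(coeffs, qp), qp)
    print(qp, np.abs(coeffs - rec).mean())  # quantization loss grows with QP
```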
For entropy coding: generally speaking, signals in the real world, especially the various parameter signals in video, do not follow a single uniform distribution over their corresponding signal space, but usually take maximum or minimum values at one or a few special points; the redundancy in this process is statistical redundancy. Entropy coding is used to remove this statistical redundancy: the quantized coefficients are entropy-coded, and compression efficiency is improved by assigning shorter codewords to values with higher probability and longer codewords to values with lower probability, thereby removing statistical redundancy.
Finally, the bitstream obtained by entropy coding and the encoding mode information, such as the intra prediction mode, motion vector information and quantization parameters, are stored or sent to the decoding end (such as the above-mentioned image processing device). In addition, the quantized coefficients go through inverse quantization and inverse transform to obtain a reconstructed residual block; the reconstructed residual block is added to the corresponding prediction block to obtain a reconstructed frame, and after loop filtering the reconstructed frame is used as a reference frame for other images to be encoded, so that those images can perform inter prediction.
In some embodiments, please refer to FIG. 3, which is a schematic flowchart of an image processing method provided by an embodiment of the present application. The method is applied to an image processing device and includes:
In step S101, after a compressed video stream is decompressed, a decompressed frame to be processed and a quantization parameter generated during compression and related to the frame to be processed are obtained.
In step S102, restoration processing is performed on the frame to be processed according to the quantization parameter to obtain a first target frame, where the image quality of the first target frame is higher than the image quality of the frame to be processed.
In this embodiment, the quantization parameter generated during compression is introduced into the process of restoring the frame to be processed. The quantization parameter reflects the degree of degradation of the frame to be processed during compression, so it can effectively guide the restoration process, enhance the restoration effect and improve image quality.
The image quality may include parameters such as image resolution, image information, image texture and image color, where the image information includes, but is not limited to, signal-to-noise ratio, image gradient, local variance or mean square error (MSE). Exemplarily, the image quality of the first target frame being higher than that of the frame to be processed may mean that the resolution of the first target frame is higher than that of the frame to be processed, that the image texture, color information, etc. of the first target frame are richer than those of the frame to be processed, or that the first target frame contains more image information than the frame to be processed.
It can be understood that the embodiments of the present application impose no restriction on the specific way of obtaining the image information, which can be selected according to the actual application scenario. For example, when the image information is image gradient information, the image gradient information of the target frame or the frame to be processed can be obtained by methods such as the Brenner gradient function, the Tenengrad gradient function, the Laplacian gradient function or the energy gradient function.
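For reference, minimal NumPy versions of two of the measures mentioned above, MSE/PSNR and the Brenner gradient; these are the standard textbook definitions rather than code from the present application:

```python
import numpy as np

def mse(a, b):
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def psnr(a, b, peak=255.0):
    m = mse(a, b)
    return float('inf') if m == 0 else 10.0 * np.log10(peak ** 2 / m)

def brenner(img):
    """Brenner gradient: sum of squared differences between pixels two apart."""
    d = img.astype(np.float64)
    return np.sum((d[:, 2:] - d[:, :-2]) ** 2)
```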
It can be understood that the embodiments of the present application impose no restriction on the source of the compressed video stream, which can be set according to the actual application scenario. Exemplarily, the image processing device is the remote control device of a movable platform or is installed in the remote control device as a processing chip; the movable platform includes, but is not limited to, an unmanned aerial vehicle (UAV), an unmanned vehicle, a mobile robot or a sweeping robot, etc. The movable platform is communicatively connected with the remote control device and is provided with a photographing device. While the user controls the movement of the movable platform via the remote control device, the photographing device on the movable platform collects a video frame sequence; the movable platform then compresses the collected video frame sequence into a compressed video stream and transmits it to the remote control device, which thereby obtains the compressed video stream. Exemplarily, the compressed video stream may also be obtained by the image processing device from another medium, for example from a server.
After obtaining the compressed video stream, the image processing device can decompress it with a decoder. Exemplarily, with reference to the above video encoding process, after obtaining the compressed video stream the decoder performs entropy decoding, inverse quantization and inverse transform to obtain the corresponding residual block, obtains the corresponding prediction block according to decoded information such as motion vectors or the intra prediction mode, obtains the reconstructed values of the pixels in the current image block from the prediction block and the residual block, and outputs the decompressed video frame sequence. For each video frame in the decompressed video frame sequence, the image processing method provided by the embodiments of the present application can be used to restore it, thereby obtaining a video frame sequence with better image quality.
Since the compressed video stream also carries the quantization parameter information related to the frame to be processed, the quantization parameter related to the frame to be processed needed by the embodiments of the present application can be output by the decoder during decompression of the compressed video stream, and obtaining the quantization parameter requires no extra computation. Exemplarily, the quantization parameter includes, but is not limited to, a quantization parameter (QP) or a quantization matrix (QM).
In some embodiments, in order to achieve good transmission of the compressed video stream, the quantization parameter is determined at least according to the channel quality of the channel used to transmit the frame to be processed. Taking as an example the compressed video stream being produced by a movable platform collecting a video frame sequence with its onboard photographing device while moving, the movable platform then compressing the video frame sequence and transmitting it: during compression of the video frame sequence, the movable platform detects the channel quality of the channel between the movable platform and the image processing device, where the channel quality can be determined by at least one of the following channel parameters: signal strength, noise strength, signal-to-noise ratio or channel capacity. The movable platform then decides the degree of quantization of the video frames in the video frame sequence according to the channel quality, so as to achieve good transmission of the compressed video stream.
Exemplarily, the degree of quantization corresponding to the frame to be processed indicated by the quantization parameter has a negative correlation with the channel quality. The better the channel quality of the channel between the movable platform and the image processing device (for example, above a preset value), the more data the current channel can transmit, so the movable platform can set a lower degree of quantization for the corresponding video frame; for example, the smaller the quantization parameter, the smaller the quantization loss. In other words, the smaller the degradation of the frame to be processed, the larger the amount of data of the frame after quantization. The worse the channel quality of the channel between the movable platform and the image processing device (for example, below a preset value), the less data the current channel can transmit, so the movable platform can set a higher degree of quantization for the corresponding video frame; for example, the larger the quantization parameter, the larger the quantization loss. In other words, the greater the degradation of the frame to be processed, the smaller the amount of data of the frame after quantization.
It can be said that during compression of the video frame sequence corresponding to the compressed video stream, under the influence of the actual channel environment, the quantization parameters determined for the video frames change as the channel quality of the actual channel changes, so that after the image processing device decompresses the compressed video stream, the degrees of degradation of the different frames to be processed also differ, the degree of degradation being determined by the magnitude of the quantization parameter determined during compression. Therefore, the embodiments of the present application introduce the quantization parameter generated during compression into the process of restoring the frame to be processed; the quantization parameter reflects the degree of degradation of the frame to be processed during compression, so it can effectively guide the restoration process, enhance the restoration effect and help improve image quality.
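The negative correlation described above can be sketched as a toy encoder-side rule that maps channel quality to a quantization parameter; the SNR thresholds, the QP range and the linear mapping below are illustrative assumptions only:

```python
def qp_for_channel(snr_db, qp_min=18, qp_max=46, snr_lo=5.0, snr_hi=30.0):
    """Map channel SNR to a quantization parameter: the better the channel,
    the lower the QP (finer quantization, less degradation)."""
    snr_db = max(snr_lo, min(snr_hi, snr_db))
    frac = (snr_db - snr_lo) / (snr_hi - snr_lo)   # 0 = worst, 1 = best channel
    return round(qp_max - frac * (qp_max - qp_min))

print(qp_for_channel(28.0))  # good channel -> low QP, small quantization loss
print(qp_for_channel(8.0))   # poor channel -> high QP, large quantization loss
```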
Exemplarily, two possible implementations of introducing the quantization parameter into the restoration processing of the frame to be processed are provided here:
In some possible implementations, the image processing device can fuse the quantization parameter and the frame to be processed to obtain fused data, then perform feature extraction on the fused data to obtain fused features, and perform restoration processing according to the fused features extracted from the fused data to obtain the first target frame. In this embodiment, features related to the quantization parameter generated during compression are introduced into the fused features, which can enhance the restoration effect.
In another possible implementation, the image processing device can perform feature extraction on the quantization parameter to obtain a first feature, and perform feature extraction on the frame to be processed to obtain a second feature; the first feature and the second feature are then fused to obtain a fused feature, and restoration processing is performed according to the fused feature to obtain the first target frame. In this embodiment, features related to the quantization parameter generated during compression are introduced into the fused features, which can enhance the restoration effect.
In some embodiments, the frame to be processed can be restored by a pre-trained video frame restoration network. For example, the quantization parameter and the frame to be processed can be input into a pre-trained video frame restoration network; the video frame restoration network performs restoration processing on the frame to be processed according to the degree of quantization of the frame during compression indicated by the quantization parameter; the image processing device can then obtain the first target frame output by the video frame restoration network. There may be one or more frames to be processed input into the video frame restoration network; correspondingly, the quantization parameters input into the network correspond one-to-one with the frames to be processed, and the network can perform restoration frame by frame according to the one or more frames to be processed and their corresponding quantization parameters, obtaining one or more first target frames.
In one example, please refer to FIG. 4. The video frame restoration network 100 includes a fusion layer 10, a convolutional layer 20 and a restoration network 30. The fusion layer 10 is used to fuse the quantization parameter and the frame to be processed to obtain fused data; the convolutional layer 20 is used to perform convolution operations on the fused data to extract fused features; the restoration network 30 is used to perform restoration processing according to the fused features to obtain the first target frame.
In another example, please refer to FIG. 5. The video frame restoration network includes a first convolutional layer 21, a second convolutional layer 22, a fusion layer 10 and a restoration network 30. The first convolutional layer 21 is used to perform feature extraction on the quantization parameter to obtain the first feature; the second convolutional layer 22 is used to perform feature extraction on the frame to be processed to obtain the second feature; the fusion layer 10 is used to fuse the first feature and the second feature to obtain the fused feature, for example by concatenating the first feature and the second feature along the channel dimension; the restoration network 30 is used to perform restoration processing according to the fused feature to obtain the first target frame.
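A minimal PyTorch sketch of a FIG. 5-style restoration network follows: one convolutional branch for the quantization-parameter plane, one for the frame, channel-wise concatenation as the fusion layer, and a small residual restoration trunk. The layer widths, depths and the residual formulation are illustrative assumptions, not the structure actually claimed:

```python
import torch
import torch.nn as nn

class QPGuidedRestoration(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.qp_conv = nn.Conv2d(1, ch, 3, padding=1)    # first conv: QP plane -> first feature
        self.img_conv = nn.Conv2d(3, ch, 3, padding=1)   # second conv: frame -> second feature
        self.restore = nn.Sequential(                    # restoration trunk on the fused feature
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, frame, qp):
        # broadcast the scalar QP to a 1 x H x W plane so it can be fused spatially
        qp_plane = qp.view(-1, 1, 1, 1).expand(-1, 1, *frame.shape[-2:])
        fused = torch.cat([self.img_conv(frame), self.qp_conv(qp_plane)], dim=1)
        return frame + self.restore(fused)               # residual restoration

net = QPGuidedRestoration()
out = net(torch.rand(2, 3, 64, 64), torch.tensor([32.0, 40.0]))
print(out.shape)  # torch.Size([2, 3, 64, 64])
```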
The training process of the video frame restoration network is described here by way of example. The training samples of the video frame restoration network may include an original video frame sequence, a degraded video frame sequence obtained by compressing the original video frame sequence, and a quantization parameter sequence corresponding to the degraded video frame sequence. During training, the degraded video frame sequence and the quantization parameter sequence can be input into the video frame restoration network, which performs restoration frame by frame according to them to obtain a prediction result; the loss function of the video frame restoration network is then computed from the difference between the original video frame sequence and the prediction result, and the parameters of the network are adjusted according to the loss function, yielding the trained video frame restoration network.
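A corresponding training-loop sketch (L1 loss between the frame-by-frame predictions and the original frames; the optimizer, the loss choice and the synthetic stand-in data are assumptions, and QPGuidedRestoration is the illustrative network sketched above):

```python
import torch
import torch.nn as nn

net = QPGuidedRestoration()                 # illustrative network from the sketch above
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
l1 = nn.L1Loss()

# stand-in training data: (degraded frame, its QP, original frame) triples
batches = [(torch.rand(2, 3, 64, 64), torch.tensor([30.0, 42.0]),
            torch.rand(2, 3, 64, 64)) for _ in range(4)]

for degraded, qp, original in batches:
    pred = net(degraded, qp)                # frame-by-frame restoration
    loss = l1(pred, original)               # difference to the original frames
    opt.zero_grad()
    loss.backward()
    opt.step()                              # adjust the restoration network's parameters
```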
In some embodiments, during quantization of the video frames in the video frame sequence collected by the photographing device, a video frame may be divided into multiple image blocks, with each image block using a corresponding quantization parameter according to the channel quality of the current channel; thus, as the actual channel environment changes, the degrees of quantization of different image blocks in a video frame may differ. That is, the quantization parameter can be used to indicate the different degrees of quantization of different image blocks in the frame to be processed during compression; during restoration, the image processing device can then perform restoration processing differently on the different image blocks according to the different degrees of quantization indicated by the quantization parameter. In this embodiment, different degrees of quantization mean different degrees of degradation of different image blocks in the frame to be processed, so different restoration approaches can be used, effectively improving the restoration effect.
Exemplarily, the quantization parameter includes multiple sub-quantization parameters, and different regions of the frame to be processed have corresponding sub-quantization parameters; for example, the quantization parameter of the frame to be processed includes 4 sub-quantization parameters, corresponding one-to-one to 4 different image blocks in the frame. For each image block, the two possible implementations described above can be used to introduce the sub-quantization parameter corresponding to that block into its restoration.
In one example, the image processing device can fuse the sub-quantization parameter with the corresponding image block to obtain fused data, then perform feature extraction on the fused data to obtain fused features, and perform restoration processing according to the fused features extracted from the fused data to obtain the restored image block; after the different image blocks of the frame to be processed are each restored, the first target frame can be obtained.
In another possible implementation, the image processing device can perform feature extraction on the sub-quantization parameter to obtain a first feature, and perform feature extraction on the image block of the frame to be processed to obtain a second feature; the first feature and the second feature are then fused to obtain a fused feature, and restoration processing is performed according to the fused feature to obtain the restored image block; after the different image blocks of the frame to be processed are each restored, the first target frame can be obtained.
In some embodiments, the different image blocks in the frame to be processed can be restored by a pre-trained video frame restoration network. For example, the quantization parameter and the frame to be processed can be input into a pre-trained video frame restoration network, which performs restoration processing differently on the different image block regions according to the different degrees of quantization of those regions indicated by the quantization parameter; the image processing device can then obtain the first target frame output by the video frame restoration network.
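One simple way to hand such block-wise quantization information to a network is to expand the sub-quantization parameters into a pixel-aligned QP plane, so that each image block carries its own degradation cue; the 2×2 block layout below mirrors the four-block example above and is otherwise an assumption:

```python
import numpy as np

def qp_map(sub_qps, frame_h, frame_w):
    """Expand block-wise sub-QPs (grid of shape [gy, gx]) into an H x W plane
    aligned with the frame, so each image block sees its own QP value."""
    grid = np.asarray(sub_qps, dtype=np.float32)
    gy, gx = grid.shape
    plane = np.repeat(np.repeat(grid, frame_h // gy, axis=0), frame_w // gx, axis=1)
    return plane  # can be concatenated with the frame along the channel dimension

# 4 sub-quantization parameters, one per image block of a 64 x 64 frame
print(qp_map([[22, 30], [38, 46]], 64, 64).shape)  # (64, 64)
```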
In some embodiments, in order to further reduce the amount of data during transmission and improve transmission efficiency, please refer to FIG. 6: the movable platform may downsample the video frame sequence collected by the photographing device, then compress the downsampled video frame sequence to obtain the compressed video stream, and transmit it to the image processing device. After the image processing device restores the decompressed frame to be processed to obtain the first target frame, it can further perform super-resolution reconstruction on the first target frame to obtain a second target frame.
In some possible implementations, please refer to FIG. 7, the super-resolution reconstruction can be performed by a pre-trained super-resolution reconstruction network 200. For example, the first target video frame can be obtained by the pre-trained video frame restoration network 100 performing restoration processing according to the quantization parameter and the frame to be processed; the second target frame can be obtained by the pre-trained super-resolution reconstruction network 200 performing super-resolution reconstruction on the first target video frame.
In order to improve training efficiency and accuracy, the super-resolution reconstruction network and the video frame restoration network can be jointly trained through multi-task learning. The training samples of the super-resolution reconstruction network and the video frame restoration network include: an original video frame sequence, downsampled original video frames, a degraded video frame sequence obtained by compressing the downsampled original video frame sequence, and a quantization parameter sequence corresponding to the degraded video frame sequence.
During training, the degraded video frame sequence and its corresponding quantization parameter sequence are input into the video frame restoration network, which performs frame-by-frame restoration according to the degraded video frame sequence and the quantization parameters respectively corresponding to it, obtaining a second prediction result; the second prediction result is then input into the super-resolution reconstruction network, which performs reconstruction according to the second prediction result to obtain a first prediction result.
The loss functions of the super-resolution reconstruction network and the video frame restoration network can adjust the parameters of the two networks according to the difference between the original video frame sequence and the first prediction result; or the parameters of the two networks can be adjusted according to the difference between the original video frame sequence and the first prediction result as well as the difference between the downsampled original video frames and the second prediction result, thereby yielding the trained super-resolution reconstruction network and video frame restoration network. In this embodiment, the joint training of the super-resolution reconstruction network and the video frame restoration network helps improve training efficiency and accuracy.
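Written as code, the joint objective is a weighted sum of the two task losses; the weight lam, the use of L1 and the names restore_net and sr_net are illustrative assumptions:

```python
import torch.nn as nn

l1 = nn.L1Loss()

def joint_loss(degraded, qp, original, original_ds, restore_net, sr_net, lam=1.0):
    """Multi-task objective: super-resolution loss against the original frames,
    optionally plus a restoration loss against the downsampled originals."""
    second_pred = restore_net(degraded, qp)            # second prediction result
    first_pred = sr_net(second_pred)                   # first prediction result
    loss = l1(first_pred, original)                    # SR output vs. original sequence
    loss = loss + lam * l1(second_pred, original_ds)   # restoration vs. downsampled original
    return loss
```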
In some embodiments, please refer to FIG. 8. In order to further improve image quality, FIG. 8 shows a schematic flowchart of another image processing method. The method can be performed by an image processing device and includes:
In step S201, after a compressed video stream is decompressed, a decompressed frame to be processed and a quantization parameter generated during compression and related to the frame to be processed are obtained. This is similar to step S101 and is not repeated here.
In step S202, restoration processing is performed on the frame to be processed according to the quantization parameter to obtain a first target frame, where the image quality of the first target frame is higher than the image quality of the frame to be processed. This is similar to step S102 and is not repeated here.
In step S203, at least one reference frame of the frame to be processed is obtained; and the at least one reference frame and the motion vector, generated during compression, between the reference frame and the frame to be processed are obtained.
In step S204, reconstruction processing is performed on the first target frame according to the at least one reference frame and at least one motion vector to obtain a third target frame, where the image quality of the third target frame is higher than the image quality of the first target frame.
The reference frame is a decompressed reference frame or the result of restoring the decompressed reference frame, which can be selected according to the actual application scenario. Exemplarily, when the computing resources of the image processing device are insufficient, the reference frame may be the decompressed reference frame; when the computing resources of the image processing device are sufficient, the reference frame may be the result of restoring the decompressed reference frame, which can provide more supplementary information and thus help further improve the reconstruction effect.
It can be understood that the restoration of a reference frame is similar to that of the frame to be processed, i.e., the quantization parameter related to the reference frame is used to restore the reference frame, which is not repeated here.
In this embodiment, under the guidance of the motion vectors, supplementary information from the reference frames is provided for the first target frame, which helps improve the image quality of the reconstructed third target frame.
After obtaining the compressed video stream, the image processing device can decompress it with a decoder. Since the compressed video stream also carries motion vector information, the motion vectors between the adjacent frames and the frame to be processed needed by the embodiments of the present application can be output by the decoder during decompression of the compressed video stream; obtaining the motion vectors requires no extra computation, which helps improve reconstruction efficiency. Alternatively, in other possible implementations, the motion vectors between the reference frame and the frame to be processed can be obtained by further processing the motion vectors output by the decoder during decompression of the compressed video stream.
Exemplarily, the reference frame may be an adjacent frame of the frame to be processed. The adjacent frames may include M video frames collected before the frame to be processed and/or N video frames collected after the frame to be processed, where M and N are integers greater than 0. It can be understood that the embodiments of the present application impose no restriction on the number of adjacent frames obtained, which can be set according to the actual application scenario; for example, one or more video frames collected before the frame to be processed can be obtained, and one or more video frames collected after it can also be obtained. Alternatively, the adjacent frames include the M-th video frame collected before the frame to be processed and/or the N-th video frame collected after it, where M and N are integers greater than 0; for example, taking the frame to be processed as frame 0, the adjacent frame may be the 1st image frame collected before the frame to be processed, or the 2nd image frame collected before it, which can be selected according to the actual application scenario.
Exemplarily, the reference frame may be a video frame that contains the same target object as the frame to be processed, which helps obtain a third target frame in which the target object is displayed better. The target object includes, but is not limited to, people, buildings, animals or other specified objects.
In some embodiments, the image processing device can fuse at least one reference frame and the first target frame according to at least one motion vector, and perform reconstruction processing according to the fused result to obtain a third target frame. In this embodiment, the fusion of the first target frame and the reference frame is realized under the guidance of motion vectors, so blurred reconstruction results can be avoided, which helps obtain a third target frame with better image quality.
For the fusion of the at least one reference frame and the first target frame, two possible implementations are shown here by way of example:
In one possible implementation, the image processing device can apply an affine transformation to at least one reference frame according to the motion vector, and fuse the transformed reference frame with the first target frame. It can be understood that this embodiment imposes no restriction on the specific implementation of the fusion, which can be set according to the actual application scenario; for example, the pixel values of pixels at the same position in the transformed reference frame and the first target frame can be added and averaged to obtain the fused result. In this implementation, under the guidance of the motion vector, the first target frame effectively fuses the supplementary information of the reference frame, providing rich information for the subsequent reconstruction process.
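A minimal PyTorch sketch of this warp-and-average fusion follows. Grid sampling is used as a simple stand-in for the affine transformation, and dense per-pixel motion vectors are assumed; block-wise vectors from a real decoder would first have to be expanded to a dense field:

```python
import torch
import torch.nn.functional as F

def warp(ref, mv):
    """Warp `ref` (N,C,H,W) with motion vectors `mv` (N,2,H,W), in pixels."""
    n, _, h, w = ref.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # (1,2,H,W)
    coords = base + mv                                         # displaced sampling positions
    # normalize coordinates to [-1, 1] for grid_sample
    coords[:, 0] = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords[:, 1] = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = coords.permute(0, 2, 3, 1)                          # (N,H,W,2)
    return F.grid_sample(ref, grid, align_corners=True)

def fuse_average(target, ref, mv):
    return 0.5 * (target + warp(ref, mv))   # same-position pixel average

fused = fuse_average(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
                     torch.zeros(1, 2, 64, 64))
```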
In another possible implementation, for at least one reference frame, the image processing device performs feature extraction on the reference frame at least according to the motion vector between the reference frame and the first target frame to obtain a third feature; the image processing device also performs feature extraction on the first target frame to obtain a fourth feature; the at least one third feature and the fourth feature are then fused. In this implementation, the third feature and the fourth feature are extracted separately, and the fusion of the effective information (the third and fourth features) of the first target frame and the reference frame, rather than all of their information, is realized under the guidance of the motion vector. Besides providing rich features for the subsequent reconstruction process to improve the image quality of the target frame, this also reduces the amount of data in the subsequent reconstruction process, which helps improve reconstruction efficiency.
The information extracted in the feature extraction process includes, but is not limited to, edge features, shape (contour) features, color features or texture features. It can be understood that the embodiments of the present application impose no restriction on the methods used for feature extraction, which can be set according to the actual application scenario; for example, feature extraction can be performed by convolution operations, HOG (Histogram of Oriented Gradients), SIFT (Scale-Invariant Feature Transform), SURF (Speeded Up Robust Features) or DoG (Difference of Gaussians).
For the process of obtaining the third feature, two possible implementations are shown here by way of example:
In one possible implementation, the image processing device can fuse the first target frame, the reference frame and the motion vector between the reference frame and the first target frame to obtain fused data, and then perform feature extraction on the fused data to obtain the third feature. In this implementation, the extracted third feature includes feature information of the motion vector, feature information of the reference frame and feature information of the first target frame.
In another possible implementation, the image processing device can apply an affine transformation to the reference frame according to the motion vector, and perform feature extraction on the transformed reference frame to obtain the third feature. In this implementation, the extracted third feature includes feature information of the reference frame transformed using the motion vector.
In some embodiments, after obtaining the fused result, the image processing device can use the fused result to perform reconstruction processing. In some application scenarios, considering that the operating resources of the image processing device are limited, or that there are certain requirements on reconstruction efficiency, the image processing device can reduce the dimensionality of the fused result and then use the dimension-reduced result for reconstruction, which helps reduce the amount of computation and improve reconstruction efficiency.
Exemplarily, the fused result includes a fifth feature obtained by fusing at least one third feature and the fourth feature; the image processing device can reduce the dimensionality of the fifth feature and use the dimension-reduced fifth feature for reconstruction. It can be understood that the embodiments of the present application impose no restriction on the specific method of dimensionality reduction, which can be set according to the actual application scenario; for example, the fifth feature can be dimension-reduced by a pooling method or a convolution operation.
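Both options mentioned, pooling and a convolution operation, can be written in a line each; the 3C-to-C reduction below matches the fifth-feature example used later, and the specific layer choices are assumptions:

```python
import torch
import torch.nn as nn

c = 64
fifth = torch.rand(1, 3 * c, 64, 64)               # fused fifth feature, 3C x H x W

reduce_conv = nn.Conv2d(3 * c, c, kernel_size=1)   # learned 1x1-conv channel reduction
pooled = fifth.view(1, 3, c, 64, 64).mean(dim=1)   # parameter-free pooling across the 3 groups

print(reduce_conv(fifth).shape, pooled.shape)      # both: (1, 64, 64, 64)
```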
In some embodiments, after the fused result is obtained, it can be input into a pre-established video frame reconstruction network, and reconstruction processing by the video frame reconstruction network yields a third target frame whose image quality is higher than the image quality of the first target frame. It can be understood that the embodiments of the present application impose no restriction on the specific structure of the video frame reconstruction network, which can be set according to the actual application scenario.
Exemplarily, the video frame reconstruction network is used to restore the first target frame, so that the obtained third target frame approaches the video frame collected by the photographing device. Alternatively, the video frame reconstruction network is used to perform super-resolution reconstruction processing on the first target frame, so that the resolution of the obtained third target frame is higher than the resolution of the first target frame.
Exemplarily, in the training process of the video frame reconstruction network, the training samples may be fused data obtained by fusing a decompressed video frame and at least one reference frame of that video frame using the relevant motion vectors, and the labels include restored video frames or super-resolution video frames. During training, several pieces of fused data are input into the video frame reconstruction network, which reconstructs the fused data to obtain predicted video frames. For image restoration purposes, the parameters of the network can be adjusted according to the difference between the restored video frames and the predicted video frames, yielding a video frame reconstruction network for restoring video frames; for super-resolution purposes, the parameters can be adjusted according to the difference between the super-resolution video frames and the predicted video frames, yielding a video frame reconstruction network for super-resolution reconstruction of video frames.
In an exemplary embodiment, please refer to FIG. 9A, which shows a video frame restoration network 100, a data fusion network 300 and a video frame reconstruction network 400.
The compressed video stream is decompressed to obtain the decompressed video frame t-1, video frame t and video frame t+1, as well as the quantization parameter q-1, quantization parameter q and quantization parameter q+1 respectively related to video frame t-1, video frame t and video frame t+1. Video frame t-1, video frame t and video frame t+1 and the quantization parameters q-1, q and q+1 are input into the video frame restoration network 100, which performs restoration frame by frame according to the corresponding quantization parameters and video frames, obtaining the restored video frame t-1, the restored video frame t and the restored video frame t+1. In this embodiment, the quantization parameter generated during compression is introduced into the restoration of the video frames; the quantization parameter reflects the degree of degradation of a video frame during compression, so it can effectively guide the restoration process, enhance the restoration effect and help improve image quality.
Further, the data fusion network 300 is used to fuse, under the guidance of the motion vector V_{t-1→t} from video frame t-1 to frame t and the motion vector V_{t→t+1} from frame t to frame t+1 generated during compression, the information of the restored video frame t-1, the restored video frame t and the restored video frame t+1 to assist the reconstruction of video frame t. The video frame reconstruction network 400 is used to restore or super-resolve video frame t using the information obtained by fusing the restored video frames t-1, t and t+1.
In FIG. 9A, the sizes of video frame t-1, video frame t, video frame t+1 and the restored video frames t-1, t and t+1 are all C1×H×W, where C1 denotes the number of channels, H the height of the restored video frame and W its width; their specific values can be set according to the actual application scenario. The sizes of the motion vectors V_{t-1→t} and V_{t→t+1} are 2×H×W; their specific values can be set according to the actual application scenario.
For the restoration or super-resolution of the restored video frame t (of size C1×H×W), the restored video frame t-1 and the restored video frame t+1 are first combined with it respectively, and fusion is then performed under the guidance of the motion vectors. The data fusion network 300 includes one or more first fusion layers 301, one or more third convolutional layers 302, a second fusion layer 303 and a fourth convolutional layer 304, where the number of first fusion layers 301 is determined by the number of adjacent frames and the number of third convolutional layers 302 is determined by the total number of adjacent frames and frames to be processed.
Specifically, for the restored video frame t-1, the restored video frame t-1, the restored video frame t and the motion vector V_{t-1→t} are fused through the first fusion layer 301 to obtain fused data; for example, the first fusion layer 301 can concatenate the restored video frame t, the restored video frame t-1 and the motion vector V_{t-1→t} along the channel dimension to obtain the fused data (a tensor of size C2×H×W); feature extraction is then performed on the fused data through the third convolutional layer 302 to obtain a third feature (a tensor of size C×H×W). For the restored video frame t, the third convolutional layer 302 is used to extract its features, obtaining a fourth feature (a tensor of size C×H×W). For the restored video frame t+1, the restored video frame t, the restored video frame t+1 and the motion vector V_{t→t+1} are fused through the first fusion layer 301 to obtain fused data, for example by concatenating them along the channel dimension; the third convolutional layer 302 then performs feature extraction on the fused data to obtain a third feature (a tensor of size C×H×W).
Next, for the two third features and one fourth feature produced, the two third features and the fourth feature can be fused through the second fusion layer 303 to obtain a fifth feature; for example, the second fusion layer 303 concatenates the two third features and the fourth feature along the channel dimension to obtain the fifth feature (a tensor of size 3C×H×W). Finally, in order to improve the efficiency of image reconstruction, the channel dimension can optionally be reduced through the fourth convolutional layer 304, reducing the fifth feature from a 3C×H×W tensor to a C×H×W tensor; the dimension-reduced fifth feature is then input into the pre-established video frame reconstruction network 400, and reconstruction processing by the video frame reconstruction network 400 yields a third target frame with better image quality.
The final output third target frame can be the restoration result or the super-resolution result corresponding to the restored video frame t, depending on the specific structure of the video frame reconstruction network. The size of the output target frame is C×mH×mW, where m denotes the magnification factor and can be set according to the actual application scenario; for example, for a restoration task m is 1, while for a super-resolution task 4× super-resolution is usually performed, so m is 4. In this embodiment, under the guidance of the motion vectors, the supplementary information of the two frames before and after the restored video frame t is effectively fused, providing rich features for the subsequent video frame reconstruction network and enhancing the quality of the output target frame; moreover, the fusion of inter-frame information is realized under the guidance of motion vectors, which can avoid blurred reconstruction results.
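Putting the FIG. 9A pieces together, the following is a minimal forward-pass sketch of the data fusion network: concatenation-based first fusion layers, a shared third convolutional layer, channel-wise second fusion, and a 1×1 fourth convolutional layer for the 3C-to-C reduction. The channel widths are illustrative and the video frame reconstruction network 400 is left abstract:

```python
import torch
import torch.nn as nn

class DataFusionNet(nn.Module):
    def __init__(self, c1=3, c=64):
        super().__init__()
        # first fusion is channel concat: frame t + one neighbor + 2-channel MV
        self.conv_pair = nn.Conv2d(2 * c1 + 2, c, 3, padding=1)  # third conv, fused pairs
        self.conv_center = nn.Conv2d(c1, c, 3, padding=1)        # third conv, frame t alone
        self.reduce = nn.Conv2d(3 * c, c, 1)                     # fourth conv: 3C -> C

    def forward(self, f_prev, f_t, f_next, mv_prev, mv_next):
        third_a = self.conv_pair(torch.cat([f_t, f_prev, mv_prev], dim=1))
        fourth = self.conv_center(f_t)
        third_b = self.conv_pair(torch.cat([f_t, f_next, mv_next], dim=1))
        fifth = torch.cat([third_a, fourth, third_b], dim=1)     # second fusion: 3C x H x W
        return self.reduce(fifth)                                # input to reconstruction net 400

net = DataFusionNet()
x = torch.rand(1, 3, 64, 64)
mv = torch.zeros(1, 2, 64, 64)
print(net(x, x, x, mv, mv).shape)  # (1, 64, 64, 64)
```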
In another exemplary embodiment, please refer to FIG. 9B. The compressed video stream is decompressed to obtain the decompressed video frames t-1, t and t+1. It may be considered to restore only the decompressed video frame t, without restoring the decompressed video frames t-1 and t+1; in this case the quantization parameter q related to video frame t is obtained. Video frame t and the quantization parameter q are input into the video frame restoration network 100, which performs restoration according to the quantization parameter and the video frame, obtaining the restored video frame t.
Further, the data fusion network 300 is used to fuse, under the guidance of the motion vector V_{t-1→t} from video frame t-1 to frame t and the motion vector V_{t→t+1} from frame t to frame t+1 generated during compression, the information of the decompressed video frame t-1, the restored video frame t and the decompressed video frame t+1 to assist the reconstruction of video frame t. The video frame reconstruction network 400 is used to restore or super-resolve video frame t using the information obtained by fusing the decompressed video frame t-1, the restored video frame t and the decompressed video frame t+1.
In yet another exemplary embodiment, please refer to FIG. 10. FIG. 10 differs from the embodiment described in FIG. 9A in how the third feature is obtained. Video frame t-1, video frame t and video frame t+1 and the quantization parameters q-1, q and q+1 are input into the video frame restoration network 100, which performs restoration frame by frame according to the corresponding quantization parameters and video frames, obtaining the restored video frame t-1, the restored video frame t and the restored video frame t+1.
In the data fusion network 300, for the restored video frame t-1, the affine transformation module 305 uses the motion vector V_{t-1→t} to apply an affine transformation to the restored video frame t-1; the third convolutional layer 302 then performs feature extraction on the transformed restored video frame t-1 to obtain a third feature (a tensor of size C×H×W). For the restored video frame t, the third convolutional layer 302 extracts its features, obtaining a fourth feature (a tensor of size C×H×W). For the restored video frame t+1, the affine transformation module 305 uses the motion vector V_{t→t+1} to apply an affine transformation to the restored video frame t+1; the third convolutional layer 302 then performs feature extraction on the transformed restored video frame t+1 to obtain a third feature (a tensor of size C×H×W). The subsequent operations are similar to those in FIG. 9A: for the two third features and one fourth feature produced, the two third features and the fourth feature can be fused through the second fusion layer 303 to obtain a fifth feature; the fourth convolutional layer 304 then reduces the channel dimension, reducing the fifth feature from a 3C×H×W tensor to a C×H×W tensor, and the dimension-reduced fifth feature is input into the pre-established video frame reconstruction network 400, and reconstruction processing by the video frame reconstruction network 400 yields the target frame. In this embodiment, under the guidance of the motion vectors, the supplementary information of the two frames before and after the restored video frame t is effectively fused, providing rich features for the subsequent video frame reconstruction network and enhancing the quality of the output target frame; moreover, the fusion of inter-frame information is realized under the guidance of motion vectors, which can avoid blurred reconstruction results.
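The FIG. 10 path differs only in that each restored neighbor is first warped by its motion vector (the affine transformation module 305) and then passed on its own through the third convolutional layer; a short sketch of that path, reusing the warp helper from the earlier fusion sketch (an assumption carried over):

```python
import torch
import torch.nn as nn

conv3 = nn.Conv2d(3, 64, 3, padding=1)   # third convolutional layer

def third_feature_fig10(neighbor, mv):
    """FIG. 10 path: warp the restored neighbor first, then extract features.
    `warp` is the grid-sample helper sketched in the fusion example above."""
    return conv3(warp(neighbor, mv))     # C x H x W third feature

feat = third_feature_fig10(torch.rand(1, 3, 64, 64), torch.zeros(1, 2, 64, 64))
print(feat.shape)  # (1, 64, 64, 64)
```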
The various technical features in the above embodiments can be combined arbitrarily, as long as there is no conflict or contradiction between the combinations of features; any combination of the various technical features in the above embodiments therefore also falls within the scope disclosed in this specification.
Correspondingly, please refer to FIG. 11. An embodiment of the present application also provides an image processing device 40, including:
a memory 41 for storing executable instructions;
one or more processors 42;
where the one or more processors 42, when executing the executable instructions, are individually or collectively configured to perform any one of the methods described above.
The processor 42 executes the executable instructions included in the memory 41. The processor 42 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
The memory 41 stores the executable instructions of the image processing method. The memory 41 may include at least one type of storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, etc. Moreover, the device may cooperate with a network storage device that performs the storage function of the memory through a network connection. The memory 41 may be an internal storage unit, such as a hard disk or memory. The memory 41 may also be an external storage device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, etc. Further, the memory 41 may include both an internal storage unit and an external storage device. The memory 41 may also be used to temporarily store data that has been output or is to be output.
In some embodiments, when the processor 42 executes the executable instructions, it is individually or collectively configured to:
after decompressing a compressed video stream, obtain a decompressed frame to be processed and a quantization parameter that is generated during compression and related to the frame to be processed;
perform restoration processing on the frame to be processed according to the quantization parameter to obtain a first target frame, where the image quality of the first target frame is higher than the image quality of the frame to be processed.
In some embodiments, the quantization parameter includes a quantization parameter (QP) or a quantization matrix (QM).
In some embodiments, the quantization parameter is determined at least according to the channel quality of the channel used to transmit the frame to be processed.
In some embodiments, the degree of quantization corresponding to the frame to be processed indicated by the quantization parameter has a negative correlation with the channel quality.
In some embodiments, the quantization parameter is output by the decoder during decoding of the compressed video stream.
In some embodiments, the processor 42 is further configured to perform fusion processing on the quantization parameter and the frame to be processed to obtain fused data, and to perform restoration processing according to fused features extracted from the fused data to obtain the first target frame.
In some embodiments, the processor 42 is further configured to perform feature extraction on the quantization parameter to obtain a first feature, to perform feature extraction on the frame to be processed to obtain a second feature, and to perform restoration processing according to the fused feature obtained by fusing the first feature and the second feature to obtain the first target frame.
In some embodiments, the quantization parameter is used to indicate different degrees of quantization of different image blocks in the frame to be processed. The processor 42 is further configured to perform restoration processing differently on the different image blocks according to the different degrees of quantization of the different image blocks in the frame to be processed indicated by the quantization parameter.
In some embodiments, the processor 42 is further configured to input the quantization parameter and the frame to be processed into a pre-trained video frame restoration network; the video frame restoration network performs restoration processing on the frame to be processed according to the degree of quantization of the frame to be processed during compression indicated by the quantization parameter; and the first target frame output by the video frame restoration network is obtained.
In some embodiments, the training samples of the video frame restoration network include an original video frame sequence, a degraded video frame sequence obtained by compressing the original video frame sequence, and a quantization parameter sequence corresponding to the degraded video frame sequence. The loss function of the video frame restoration network is used to adjust the parameters of the video frame restoration network according to the difference between the original video frame sequence and the prediction result, where the prediction result is obtained by the video frame restoration network performing restoration processing according to the degraded video frame sequence and the quantization parameter sequence.
In some embodiments, the compressed video stream is obtained by an encoder compressing a downsampled video frame sequence. The processor 42 is further configured to perform super-resolution reconstruction processing on the first target frame to obtain a second target frame.
In some embodiments, the first target video frame is obtained by a pre-trained video frame restoration network performing restoration processing according to the quantization parameter and the frame to be processed; the second target frame is obtained by a pre-trained super-resolution reconstruction network performing super-resolution reconstruction processing on the first target video frame.
In some embodiments, the super-resolution reconstruction network and the video frame restoration network are obtained by joint training through multi-task learning; their training samples include: an original video frame sequence, downsampled original video frames, a degraded video frame sequence obtained by compressing the downsampled original video frame sequence, and a quantization parameter sequence corresponding to the degraded video frame sequence.
In some embodiments, the loss functions of the super-resolution reconstruction network and the video frame restoration network are used to adjust the parameters of the super-resolution reconstruction network and the video frame restoration network according to the difference between the original video frame sequence and a first prediction result; or to adjust the parameters of the two networks according to the difference between the original video frame sequence and the first prediction result and the difference between the downsampled original video frames and a second prediction result, where the second prediction result is obtained by the video frame restoration network performing restoration processing according to the degraded video frame sequence and the quantization parameters respectively corresponding to it, and the first prediction result is obtained by the super-resolution reconstruction network performing reconstruction processing according to the second prediction result.
In some embodiments, the processor 42 is further configured to obtain at least one restored reference frame of the frame to be processed; to obtain the at least one reference frame and the motion vector, generated during compression, between the reference frame and the frame to be processed; and to perform reconstruction processing on the first target frame according to the at least one reference frame and at least one motion vector to obtain a third target frame, where the image quality of the third target frame is higher than the image quality of the first target frame.
In some embodiments, the reference frame is a decompressed reference frame or the result of performing restoration processing on the decompressed reference frame.
In some embodiments, the reference frame includes an adjacent frame of the frame to be processed. The adjacent frames include M video frames collected before the frame to be processed and/or N video frames collected after it; or the adjacent frames include the M-th video frame collected before the frame to be processed and/or the N-th video frame collected after it, where M and N are integers greater than 0.
In some embodiments, the motion vector is output by the decoder during decompression of the compressed video stream.
In some embodiments, the compressed video stream is obtained by a movable platform collecting a video frame sequence with its onboard photographing device while moving, the movable platform then compressing the video frame sequence and transmitting it.
The various implementations described here can be implemented using a computer-readable medium such as computer software, hardware, or any combination thereof. For hardware implementation, the implementations described here can be implemented using at least one of application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and electronic units designed to perform the functions described here. For software implementation, implementations such as procedures or functions can be implemented with separate software modules that allow at least one function or operation to be performed. The software code can be implemented by a software application (or program) written in any suitable programming language, and the software code can be stored in memory and executed by a controller.
For the implementation process of the functions and roles of each unit in the above device, refer to the implementation process of the corresponding steps in the above method, which is not repeated here.
In some embodiments, a remote control device is also provided, including the above image processing device.
In some embodiments, an image processing system is also provided, including a movable platform and a remote control device.
The movable platform is equipped with a photographing device, and the photographing device is used to collect a video frame sequence during movement of the movable platform.
The movable platform is used to compress the video frame sequence to obtain a compressed video stream, and to transmit the compressed video stream to the image processing device.
Exemplarily, the movable platform includes one or more of the following: unmanned aerial vehicles, unmanned vehicles, gimbals, unmanned ships or mobile robots. For example, please refer to FIG. 1, which shows a schematic diagram of a remote control device and an unmanned aerial vehicle.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory including instructions executable by the processor of a device to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
A non-transitory computer-readable storage medium which, when the instructions in the storage medium are executed by the processor of a terminal, enables the terminal to execute the above method.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The method and device provided by the embodiments of the present application have been introduced in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method and core idea of the present application. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (25)

  1. An image processing method, characterized by comprising:
    after decompressing a compressed video stream, obtaining a decompressed frame to be processed and a quantization parameter that is generated during compression and related to the frame to be processed;
    performing restoration processing on the frame to be processed according to the quantization parameter to obtain a first target frame, wherein the image quality of the first target frame is higher than the image quality of the frame to be processed.
  2. The method according to claim 1, characterized in that the quantization parameter comprises a quantization parameter (QP) or a quantization matrix (QM).
  3. The method according to claim 1, characterized in that the quantization parameter is determined at least according to the channel quality of the channel used to transmit the frame to be processed.
  4. The method according to claim 3, characterized in that the degree of quantization corresponding to the frame to be processed indicated by the quantization parameter has a negative correlation with the channel quality.
  5. The method according to claim 1, characterized in that the quantization parameter is output by a decoder during decoding of the compressed video stream.
  6. The method according to claim 1, characterized in that performing restoration processing on the frame to be processed according to the quantization parameter comprises:
    fusing the quantization parameter and the frame to be processed to obtain fused data;
    performing restoration processing according to fused features extracted from the fused data to obtain the first target frame.
  7. The method according to claim 1, characterized in that performing restoration processing on the frame to be processed according to the quantization parameter comprises:
    performing feature extraction on the quantization parameter to obtain a first feature; and performing feature extraction on the frame to be processed to obtain a second feature;
    performing restoration processing according to a fused feature obtained by fusing the first feature and the second feature to obtain the first target frame.
  8. The method according to claim 1, characterized in that the quantization parameter is used to indicate different degrees of quantization of different image blocks in the frame to be processed;
    performing restoration processing on the frame to be processed according to the quantization parameter comprises:
    performing restoration processing differently on the different image blocks according to the different degrees of quantization of the different image blocks in the frame to be processed indicated by the quantization parameter.
  9. The method according to claim 1, characterized in that performing restoration processing on the frame to be processed according to the quantization parameter comprises:
    inputting the quantization parameter and the frame to be processed into a pre-trained video frame restoration network;
    performing, by the video frame restoration network, restoration processing on the frame to be processed according to the degree of quantization of the frame to be processed during compression indicated by the quantization parameter;
    obtaining the first target frame output by the video frame restoration network.
  10. The method according to claim 9, characterized in that the training samples of the video frame restoration network comprise an original video frame sequence, a degraded video frame sequence obtained by compressing the original video frame sequence, and a quantization parameter sequence corresponding to the degraded video frame sequence;
    the loss function of the video frame restoration network is used to adjust the parameters of the video frame restoration network according to the difference between the original video frame sequence and a prediction result, wherein the prediction result is obtained by the video frame restoration network performing restoration processing according to the degraded video frame sequence and the quantization parameter sequence.
  11. The method according to claim 1, characterized in that the compressed video stream is obtained by an encoder compressing a downsampled video frame sequence;
    the method further comprises:
    performing super-resolution reconstruction processing on the first target frame to obtain a second target frame.
  12. The method according to claim 11, characterized in that the first target video frame is obtained by a pre-trained video frame restoration network performing restoration processing according to the quantization parameter and the frame to be processed;
    the second target frame is obtained by a pre-trained super-resolution reconstruction network performing super-resolution reconstruction processing on the first target video frame.
  13. The method according to claim 12, characterized in that the super-resolution reconstruction network and the video frame restoration network are obtained by joint training through multi-task learning;
    the training samples of the super-resolution reconstruction network and the video frame restoration network comprise: an original video frame sequence, downsampled original video frames, a degraded video frame sequence obtained by compressing the downsampled original video frame sequence, and a quantization parameter sequence corresponding to the degraded video frame sequence.
  14. The method according to claim 13, characterized in that the loss functions of the super-resolution reconstruction network and the video frame restoration network are used to adjust the parameters of the super-resolution reconstruction network and the video frame restoration network according to the difference between the original video frame sequence and a first prediction result; or to adjust the parameters of the super-resolution reconstruction network and the video frame restoration network according to the difference between the original video frame sequence and the first prediction result and the difference between the downsampled original video frames and a second prediction result;
    wherein the second prediction result is obtained by the video frame restoration network performing restoration processing according to the degraded video frame sequence and the quantization parameters respectively corresponding to the degraded video frame sequence; and the first prediction result is obtained by the super-resolution reconstruction network performing reconstruction processing according to the second prediction result.
  15. The method according to claim 1, characterized by further comprising:
    obtaining at least one reference frame of the frame to be processed;
    obtaining the at least one reference frame and the motion vector, generated during compression, between the reference frame and the frame to be processed;
    performing reconstruction processing on the first target frame according to the at least one reference frame and at least one motion vector to obtain a third target frame, wherein the image quality of the third target frame is higher than the image quality of the first target frame.
  16. The method according to claim 15, characterized in that the reference frame is a decompressed reference frame or a result of performing restoration processing on the decompressed reference frame.
  17. The method according to claim 15, characterized in that the reference frame comprises an adjacent frame of the frame to be processed.
  18. The method according to claim 17, characterized in that the adjacent frames comprise M video frames collected before the frame to be processed and/or N video frames collected after the frame to be processed; or
    the adjacent frames comprise the M-th video frame collected before the frame to be processed and/or the N-th video frame collected after the frame to be processed;
    wherein M and N are integers greater than 0.
  19. The method according to claim 15, characterized in that the motion vector is output by a decoder during decompression of the compressed video stream.
  20. The method according to any one of claims 1 to 19, characterized in that the compressed video stream is obtained by a movable platform collecting a video frame sequence with its onboard photographing device while moving, the movable platform then compressing the video frame sequence and transmitting it.
  21. An image processing device, characterized by comprising:
    a memory for storing executable instructions;
    one or more processors;
    wherein the one or more processors, when executing the executable instructions, are individually or collectively configured to perform the method according to any one of claims 1 to 20.
  22. A remote control device, characterized by comprising the image processing device according to claim 21.
  23. An image processing system, characterized by comprising a movable platform and the remote control device according to claim 22;
    the movable platform is equipped with a photographing device, and the photographing device is used to collect a video frame sequence during movement of the movable platform;
    the movable platform is used to compress the video frame sequence to obtain a compressed video stream, and to transmit the compressed video stream to the remote control device.
  24. The system according to claim 23, characterized in that the movable platform comprises one or more of the following: an unmanned aerial vehicle, an unmanned vehicle, a gimbal, an unmanned ship, or a mobile robot.
  25. A computer-readable storage medium, characterized in that the computer-readable storage medium stores executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 20.
PCT/CN2022/072349 2022-01-17 2022-01-17 Image processing method, device, remote control device, system and storage medium WO2023133889A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/072349 WO2023133889A1 (zh) 2022-01-17 2022-01-17 Image processing method, device, remote control device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/072349 WO2023133889A1 (zh) 2022-01-17 2022-01-17 Image processing method, device, remote control device, system and storage medium

Publications (1)

Publication Number Publication Date
WO2023133889A1 true WO2023133889A1 (zh) 2023-07-20

Family

ID=87279855

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072349 WO2023133889A1 (zh) 2022-01-17 2022-01-17 Image processing method, device, remote control device, system and storage medium

Country Status (1)

Country Link
WO (1) WO2023133889A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150023410A1 (en) * 2013-07-16 2015-01-22 Arcsoft Hangzhou Co., Ltd. Method for simultaneously coding quantized transform coefficients of subgroups of frame
  • CN107925762A (zh) * 2015-09-03 2018-04-17 MediaTek Inc. Video codec processing method and apparatus based on a neural network
  • CN109151475A (zh) * 2017-06-27 2019-01-04 Hangzhou Hikvision Digital Technology Co., Ltd. Video encoding method, decoding method, apparatus and electronic device
  • CN110099280A (zh) * 2019-05-24 2019-08-06 Zhejiang University Video service quality enhancement method for bandwidth-constrained wireless ad hoc networks
  • CN113920010A (zh) * 2020-07-10 2022-01-11 Huawei Technologies Co., Ltd. Super-resolution implementation method and apparatus for image frames


Similar Documents

Publication Publication Date Title
US11057646B2 (en) Image processor and image processing method
  • TWI759668B (zh) Video image processing method, electronic device and computer-readable storage medium
  • EP3583777A1 A method and technical equipment for video processing
  • CN114079779B (zh) Image processing method, intelligent terminal and storage medium
  • CN113766249B (zh) Loop filtering method, apparatus, device and storage medium in video coding and decoding
  • WO2023005740A1 (zh) Image encoding, decoding, reconstruction and analysis methods, system, and electronic device
  • CN115409716B (zh) Video processing method, apparatus, storage medium and device
  • KR20200050284A (ko) Apparatus and method for encoding an image using an image-adaptive quantization table
  • WO2023050720A1 (zh) Image processing method, image processing apparatus, and model training method
  • EP3646286A1 Apparatus and method for decoding and coding panoramic video
  • CN114979672A (zh) Video encoding method, decoding method, electronic device and storage medium
  • CN115442609A (zh) Feature data encoding and decoding method and apparatus
  • WO2024078066A1 (zh) Video decoding method, video encoding method, apparatus, storage medium and device
  • CN115604485A (zh) Video image decoding method and apparatus
  • CN116847087A (zh) Video processing method, apparatus, storage medium and electronic device
  • TWI826160B (zh) Image encoding and decoding method and apparatus
  • WO2023193629A1 (zh) Encoding and decoding method and apparatus for a regional enhancement layer
  • WO2023133889A1 (zh) Image processing method, apparatus, remote control device, system and storage medium
  • WO2023133888A1 (zh) Image processing method, apparatus, remote control device, system and storage medium
  • US11538169B2 Method, computer program and system for detecting changes and moving objects in a video view
  • CN112822497B (zh) Video compression coding processing method based on edge computing, and related components
  • CN117321989A (zh) Independent positioning of auxiliary information in neural network-based picture processing
  • CN111988621A (zh) Video processor training method and apparatus, video processing apparatus, and video processing method
  • WO2024078403A1 (zh) Image processing method, apparatus and device
  • TWI834087B (zh) Method and apparatus for reconstructing an image from a bitstream and for encoding an image into a bitstream, and computer program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22919535

Country of ref document: EP

Kind code of ref document: A1