WO2020238439A1 - Method for enhancing video service quality under limited bandwidth in a wireless ad hoc network - Google Patents

Info

Publication number
WO2020238439A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
video
neural network
reconstructed
sampling
Prior art date
Application number
PCT/CN2020/084255
Other languages
English (en)
French (fr)
Inventor
丁丹丹
雷鸣
王婵
徐莹莹
鞠阳
盛华联
刘派
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学
Publication of WO2020238439A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution

Definitions

  • the present invention relates to the technical field of wireless ad hoc network transmission, in particular to a method for enhancing the quality of video services under the limited bandwidth of the radio ad hoc network.
  • H.264/AVC is a video coding standard jointly proposed by the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO/IEC), which was released in 2003.
  • H.265/HEVC is a new-generation high-efficiency video coding standard jointly proposed by the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO/IEC). Compared with the previous-generation H.264/AVC, H.265/HEVC can save 50% of the bit rate at the same coding quality.
  • existing video transmission and video compression technologies do not modify the video image resolution, which makes real-time encoding and transmission difficult when bandwidth is limited.
  • the present invention proposes to down-sample the high-resolution video image before compression and transmission to obtain the corresponding low-resolution video image, greatly reducing the transmission burden on the wireless network and improving transmission efficiency.
  • the neural network is used to enhance the image to obtain a high-resolution image again. This method can greatly reduce the transmission bandwidth requirement and meet the real-time transmission under the condition of limited bandwidth.
  • the present invention provides a method for enhancing video service quality under limited bandwidth of a wireless ad hoc network.
  • the present invention adopts a method of down-sampling high-resolution video before transmission, and then enhancing the video quality during reception.
  • the process includes:
  • θ is the weight coefficient of the network.
  • the video frame at a lower level in the coding reference structure refers to a frame with a relatively large quantization parameter in a coding group.
  • the first frame before and the first frame after the current frame to be enhanced form a pair of reconstructed frames from which a virtual frame of the current frame can be obtained, marked as virtual frame 1;
  • the second frame before and the second frame after the current frame to be enhanced form a pair of reconstructed frames from which a virtual frame of the current frame can be obtained, marked as virtual frame 2;
  • the third frame before and the third frame after the current frame to be enhanced form a pair of reconstructed frames from which a virtual frame of the current frame can be obtained, marked as virtual frame 3;
  • the prediction method that uses neighboring decoded reconstructed frames to predict the frame is a neural network-based method.
  • the prediction neural network takes as input a pair of frames, one before and one after the current frame, and outputs the virtual frame.
  • the current frame is X
  • the predicted frame obtained by passing the pair of frames before and after it through the network is X'.
  • the prediction neural network is trained by minimizing the following objective function:
  • ω is the weight coefficient of the neural network.
  • the quality enhancement refers to using an enhancement neural network that takes the virtual frame(s) of the current frame to be enhanced and the current frame as input, and outputs the enhanced current frame.
  • the resolution of the enhanced video frame is enlarged by a factor of 2^P, where P is a non-negative integer.
  • which virtual frames are input depends on the reference level of the current frame to be enhanced in the coding group; they may be any one, any two, and so on, or all of the corresponding virtual frame 1, virtual frame 2, ..., virtual frame N (N is a positive integer).
  • if the coding group size is 8, the frames in the last layer and the second-to-last layer of the coding reference structure are selected for multi-frame enhancement; if the coding group size is 16, the frames in the last layer, the second-to-last layer and the third-to-last layer of the coding reference structure are selected for multi-frame enhancement;
  • the down-sampling technology is used to down-sample the high-resolution video before transmission, and then enhancement during reception can effectively reduce bandwidth requirements.
  • the present invention has the following beneficial effects: it uses a neural network for down-sampling before transmission, which ensures that the sampling favors quality restoration of the coded reconstructed image and reduces the transmission bandwidth requirement by a multiple, so that more video content can be transmitted under the same bandwidth conditions.
  • the quality of images with high quantization parameters is particularly badly damaged.
  • the present invention uses multi-frame quality enhancement to improve these images. Specifically, a neural network predicts the virtual frame of the current frame from multiple preceding and following frames, thereby assisting the quality enhancement of the current frame and improving enhancement performance.
  • the present invention ensures the video quality as much as possible, and achieves beneficial effects.
  • Figure 1 is a schematic diagram of a method for enhancing video service quality in a wireless ad hoc network environment
  • Figure 2 is a schematic diagram of the down-sampling neural network structure in the embodiment
  • Fig. 3 is a schematic diagram of the structure of an up-sampling neural network in an embodiment
  • Fig. 4 is a flowchart of an enhancement method applied to the video enhancement unit of Fig. 1 provided by an embodiment
  • Figure 5 is a commonly used layered coding structure of H.265/HEVC video compression in the embodiment
  • Figure 6 is a commonly used reference structure of layered coding for H.265/HEVC video compression in the embodiment
  • Figure 7 is a schematic diagram of the predictive neural network structure in the embodiment.
  • Fig. 8 is a schematic diagram of the neural network structure used in the embodiment.
  • the method for enhancing the quality of video services in a wireless ad hoc network environment in an embodiment of the present invention includes a video down-sampling unit, a video compression unit, a video transmission unit, a video decompression unit, and a video enhancement unit, wherein:
  • the video down-sampling unit 100 down-samples the high-resolution video to obtain the corresponding low-resolution video
  • the video compression unit 200 uses the existing video compression standard to compress low-resolution videos and obtain the corresponding video stream;
  • the video transmission unit 300 transmits the above-mentioned video code stream
  • the video decompression unit 400 parses the transmitted code stream to obtain a reconstructed low-resolution video image
  • the video enhancement unit 500 enhances the reconstructed video image to obtain an enhanced video image.
  • the above components are directly or indirectly electrically connected to each other to realize data transmission or interaction.
  • these components can be electrically connected to each other through one or more communication buses or signal lines.
  • the down-sampling neural network applied to the video down-sampling unit of FIG. 1 in this embodiment performs 2× down-sampling on the input high-resolution image, so the resolution of the resulting image is a quarter of the original resolution.
  • the original high-resolution video image is H
  • H is passed through the down-sampling neural network to obtain the low-resolution video L
  • the reconstructed video image obtained from L at the receiving end is L'
  • L' is passed through the up-sampling neural network (see Figure 3) to obtain the restored high-resolution video H'
  • the network model is trained by minimizing the following objective function:
  • θ is the weight coefficient of the network.
  • the value of λ is 0.8.
  • the specific parameter configuration of the down-sampling neural network shown in Figure 2 is: a total of 10 convolutional layers. After the first convolutional layer, a 2× down-sampling operation is performed, which takes the average of every 2×2 pixel block. The convolution kernel size of each layer is 3×3. Except for the last layer, which has 1 feature map, every convolutional layer has 64 feature maps.
  • the up-sampling neural network in Fig. 3 is structurally symmetric to that of Fig. 2 and is not described again.
  • this embodiment uses the H.265/HEVC encoder for compression to obtain the compressed video bit stream.
  • the encoder is configured in random access mode, and the coding group size is set to 8.
  • Fig. 4 is a flowchart of the enhancement method applied to the video enhancement unit of Fig. 1 provided by this embodiment. This embodiment includes the following steps:
  • Step S510 generate a virtual frame
  • a low-resolution frame to be enhanced is selected.
  • FIG. 5 is a commonly used layered coding structure of H.265/HEVC video compression in this embodiment.
  • This embodiment is described using a commonly used layered coding structure with a coding group size of 8.
  • Figure 6 illustrates the coding reference structure between frames in a coding group: frames 0 and 8 are in the first layer of the reference structure, frame 4 is in the second layer, frames 2 and 6 are in the third layer, and frames 1, 3, 5 and 7 are in the last layer. Multi-frame enhancement is applied to the second-to-last layer (frames 2 and 6) and to the last layer (frames 1, 3, 5 and 7).
  • a virtual frame is obtained through the predictive neural network according to the existing reconstructed frames before and after.
  • the predictive neural network structure is shown in Figure 7.
  • the specific parameter configuration of the predictive neural network is: a total of 10 convolutional layers, the first convolutional layer includes two groups, respectively input forward and backward frames, and the output of the two is cascaded and then undergoes 9 layers of convolution in sequence.
  • each of the first 4 convolutional layers is followed by a 2× down-sampling operation, which takes the average of every 2×2 pixel block; each of the last 4 convolutional layers is followed by a 2× up-sampling operation, which expands each pixel into a 2×2 pixel block; the final convolutional layer generates the virtual frame.
  • the convolution kernel size of each layer is 3×3. Except for the last layer, which has 1 feature map, every convolutional layer has 64 feature maps.
  • the prediction neural network is trained by minimizing the following objective function:
  • ω is the weight coefficient of the prediction neural network.
  • Step S520 training an enhanced neural network to obtain an image enhancement model
  • sub-step S521 a reconstructed low-resolution image is obtained at the decoding end;
  • sub-step S522 the reconstructed low-resolution image and its virtual frames are used as samples, and the corresponding original high-resolution images are used as labels to form a training set;
  • the neural network is trained using the above-mentioned samples and their corresponding labels to obtain a video image quality enhancement model.
  • the training rules are:
  • the reconstructed low-resolution image and the corresponding virtual frames pass through the enhancement neural network to obtain Y', where Y is the corresponding original high-resolution image and Θ is the neural network parameter.
  • an H.265/HEVC encoder is used to compress and decompress the low-resolution video obtained by downsampling to obtain a reconstructed low-resolution video image to be enhanced.
  • different training sets can be constructed to enhance the neural network according to different coding configurations, different coding parameters, and different video contents.
  • different training sets are constructed according to quantization parameters, different zoom sizes, and different frame positions, and the training set is used to train the enhanced neural network to obtain the corresponding image enhancement model.
  • the enhancement neural network used has 11 layers, and the convolution kernel size of each layer is 3×3.
  • the reconstructed low-resolution video frame and its virtual frames are fed to the first layer, with one input per virtual frame. The feature maps output by the first convolutional layer are concatenated and then fed to the up-sampling convolutional neural network structure shown in Figure 3.
  • Figure 8 shows the enhancement neural network structure whose inputs are virtual frame 1, virtual frame 2 and the current frame.
  • Step S530 making a decision to select an enhanced model
  • each video frame corresponds to a basic quantization parameter, zoom size and frame position.
  • the corresponding enhancement model is selected according to its base quantization parameter, corresponding zoom size, and frame position.
  • Step S540 using the virtual frame of the frame and the reconstructed frame of the frame to enhance the quality of the frame to obtain an enhanced reconstructed image;
  • the neural network shown in Figure 8 is used for enhancement, but the number of input convolution branches should match the number of input frames (virtual frames plus the current frame). That is, in addition to itself,
  • the first frame uses its corresponding virtual frame 1 for enhancement
  • the second frame uses its corresponding virtual frame 2 for enhancement
  • the third frame uses its corresponding virtual frame 1 and virtual frame 3 for enhancement;
  • the fifth frame uses its corresponding virtual frame 1 and virtual frame 3 for enhancement;
  • Frame 6 uses its corresponding virtual frame 2 for enhancement
  • the seventh frame uses its corresponding virtual frame 1 for enhancement.
  • an enhanced strategy based on a single frame is adopted, that is, the structure corresponding to the two inputs of virtual frame 1 and virtual frame 2 is removed in FIG. 8.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

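The invention discloses a method for enhancing video service quality in a wireless ad hoc network environment, belonging to the field of wireless ad hoc network transmission and video image processing. In a bandwidth-limited environment, the high-resolution video to be transmitted is down-sampled by a down-sampling neural network to obtain a low-resolution video, and the low-resolution video is transmitted to reduce the bandwidth requirement. At the receiving end, video enhancement is applied; in particular, multi-frame quality enhancement is applied to low-quality frames with high quantization parameters: a prediction neural network generates virtual frames from the images before and after the current frame, and an enhancement neural network then enhances the quality of the current frame to obtain a high-resolution video image, improving subjective performance. The invention reduces the bandwidth required by video transmission services while preserving video quality as far as possible.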

Description

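Method for enhancing video service quality under limited bandwidth in a wireless ad hoc network
Technical Field
The present invention relates to the technical field of wireless ad hoc network transmission, and in particular to a method for enhancing video service quality under the limited bandwidth of a wireless ad hoc network.
Background
With the development of sensor, network and artificial intelligence technologies, large-scale unmanned systems are being used ever more widely in industry, the military and other fields. For example, industrial sensor networks can be used to probe and monitor complex environments, and military unmanned aerial vehicles and unmanned ships can replace personnel in reconnaissance and task handling. For such cooperative unmanned systems to interact and work together smoothly and accurately in the field, where no network has been deployed in advance, the support of a wireless ad hoc network is required; and functions such as detection, surveillance, monitoring and situational awareness all need good video service quality within the limited bandwidth of the wireless ad hoc network.
Because of bandwidth and storage constraints, video is generally compressed to greatly reduce the number of coded bits. However, compressed video inevitably contains noise, which seriously degrades subjective quality. This noise may also reduce the accuracy of classification and recognition tasks. As artificial intelligence technology and the related industry mature, video transmission and compression combined with deep neural networks is gradually becoming a new direction of development.
In video compression, a number of compression standards are already widely used. H.264/AVC is a video coding standard jointly proposed by the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO/IEC) and published in 2003. H.265/HEVC is a new-generation high-efficiency video coding standard jointly proposed by ITU and ISO/IEC; compared with its predecessor H.264/AVC, H.265/HEVC can save 50% of the bit rate at the same coding quality.
However, existing video transmission and compression techniques do not modify the resolution of the video image, which makes real-time encoding and transmission difficult when bandwidth is limited. To solve this problem, the present invention proposes to down-sample the high-resolution video image before compression and transmission to obtain the corresponding low-resolution video image, which greatly reduces the transmission load on the wireless network and improves transmission efficiency. After the reconstructed low-resolution video image is obtained at the receiving end, a neural network is used to enhance the image and recover a high-resolution image. This approach can substantially reduce the transmission bandwidth requirement and support real-time transmission under limited bandwidth.
Summary of the Invention
The present invention provides a method for enhancing video service quality under limited bandwidth in a wireless ad hoc network. To achieve the above object, the invention down-samples the high-resolution video before transmission and enhances the video quality on reception, in particular enhancing some video frames with a multi-frame approach. The process comprises:
(1) Down-sampling the high-resolution video image, reducing the video image by a factor of 2^M, where M is a non-negative integer, to obtain the corresponding low-resolution video;
(1.1) The down-sampling operation is performed by a down-sampling neural network, and a different network model is trained for each sampling factor;
(1.2) When the down-sampling neural network is trained, a corresponding up-sampling neural network is provided, and the two networks share the same parameters. Let the original high-resolution video image be H; H passes through the down-sampling neural network to give the low-resolution video L; the reconstructed video image obtained from L at the receiving end is L'; L' passes through the up-sampling neural network to give the restored high-resolution video H'. The network model is trained by minimizing the following objective function: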
Figure PCTCN2020084255-appb-000001
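where θ denotes the weight coefficients of the network.
(2) Source-coding the resulting low-resolution video and transmitting the resulting bit stream in the wireless ad hoc network environment;
(3) At the receiving end, source-decoding the received bit stream to obtain the reconstructed low-resolution video;
(4) For the reconstructed low-resolution video, taking the coding group as the unit, selecting the video frames at the lower levels of the coding reference structure as the video frames to be enhanced with the multi-frame method;
The video frames at the lower levels of the coding reference structure are the frames with relatively large quantization parameters within a coding group.
(5) For each selected video frame, predicting the frame from its neighboring, already decoded reconstructed frames to obtain a virtual frame of that frame;
(5.1) The neighboring decoded reconstructed frames are positioned symmetrically and occur in pairs, and their base quantization parameter values are higher than that of the current frame. Specifically:
the first frame before and the first frame after the current frame to be enhanced form a pair of reconstructed frames from which a virtual frame of the current frame can be obtained, denoted virtual frame 1;
the second frame before and the second frame after the current frame to be enhanced form a pair of reconstructed frames from which a virtual frame of the current frame can be obtained, denoted virtual frame 2;
the third frame before and the third frame after the current frame to be enhanced form a pair of reconstructed frames from which a virtual frame of the current frame can be obtained, denoted virtual frame 3;
and so on.
(5.2) The method of predicting the frame from its neighboring decoded reconstructed frames is neural-network based: the prediction neural network takes as input a pair of frames, one before and one after the current frame, and outputs a virtual frame of the current frame. Let the current frame be X and let X' be the predicted frame obtained by passing the pair of frames through the network. The prediction neural network is trained by minimizing the following objective function: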
Figure PCTCN2020084255-appb-000002
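where ω denotes the weight coefficients of the neural network.
(6) Using the virtual frame of the frame and the reconstructed frame of the frame to enhance the quality of the frame, obtaining an enhanced reconstructed image.
(6.1) The quality enhancement uses an enhancement neural network that takes as input the virtual frame(s) of the current frame to be enhanced together with the current frame, and outputs the enhanced current frame; the resolution of the enhanced video frame is enlarged by a factor of 2^P, where P is a non-negative integer.
(6.2) Which virtual frames are input depends on the reference level of the current frame to be enhanced within the coding group; they may be any one, any two, and so on, or all of the corresponding virtual frame 1, virtual frame 2, ..., virtual frame N (N is a positive integer).
Further, preferably, when the coding group size is 8, the frames in the last layer and the second-to-last layer of the coding reference structure are selected for multi-frame enhancement; when the coding group size is 16, the frames in the last, second-to-last and third-to-last layers of the coding reference structure are selected for multi-frame enhancement.
Further, preferably, for frames in the last layer of the coding group reference structure, the corresponding virtual frame 1 is used.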
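The layer-based selection rule can be illustrated with a minimal Python sketch. It assumes a dyadic hierarchy in which the coding group size is a power of two and the two boundary frames (0 and the group size) form layer 1, as in the size-8 example of the embodiment; the function names are hypothetical and not taken from the patent.

```python
from typing import List


def hierarchy_layer(frame_idx: int, gop_size: int) -> int:
    """Return the 1-based layer of a frame in a dyadic hierarchical coding group.

    Assumption: gop_size is a power of two and the group boundary frames
    (index 0 and index gop_size) form layer 1.
    """
    pos = frame_idx % gop_size
    if pos == 0:
        return 1
    layer = 2
    step = gop_size // 2
    # Halve the step until it divides the frame position.
    while pos % step != 0:
        step //= 2
        layer += 1
    return layer


def num_layers(gop_size: int) -> int:
    """Total number of layers: one boundary layer plus log2(gop_size) levels."""
    return gop_size.bit_length()


def frames_for_multiframe_enhancement(gop_size: int) -> List[int]:
    """Frames in the deepest layers: last two for size 8, last three for size 16."""
    deepest = {8: 2, 16: 3}[gop_size]
    threshold = num_layers(gop_size) - deepest
    return [i for i in range(gop_size) if hierarchy_layer(i, gop_size) > threshold]


print(frames_for_multiframe_enhancement(8))  # [1, 2, 3, 5, 6, 7]
```

For a coding group of size 8 this reproduces the frames the embodiment selects (1, 2, 3, 5, 6 and 7); for size 16 it selects every frame except 0 and 8.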
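Down-sampling the high-resolution video before transmission and enhancing it on reception effectively reduces the bandwidth requirement. Compared with the prior art, the beneficial effects of the invention are as follows. The invention uses a neural network for down-sampling before transmission, which ensures that the sampling favors quality restoration of the coded and reconstructed image and reduces the transmission bandwidth requirement by a multiple, so that more video content can be transmitted under the same bandwidth. At the receiving end, images with high quantization parameters suffer particularly severe quality loss; the invention improves them with multi-frame quality enhancement: a neural network predicts a virtual frame of the current frame from multiple preceding and following frames, which then assists the quality enhancement of the current frame and improves enhancement performance. In summary, the invention reduces the bandwidth required by video transmission services while preserving video quality as far as possible.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the method for enhancing video service quality in a wireless ad hoc network environment;
Fig. 2 is a schematic diagram of the down-sampling neural network structure in the embodiment;
Fig. 3 is a schematic diagram of the up-sampling neural network structure in the embodiment;
Fig. 4 is a flowchart of the enhancement method applied to the video enhancement unit of Fig. 1 in the embodiment;
Fig. 5 shows a commonly used hierarchical coding structure of H.265/HEVC video compression in the embodiment;
Fig. 6 shows a commonly used hierarchical coding reference structure of H.265/HEVC video compression in the embodiment;
Fig. 7 is a schematic diagram of the prediction neural network structure in the embodiment;
Fig. 8 is a schematic diagram of the neural network structure used in the embodiment.
Detailed Description
To make the technical solution and advantages of the invention clearer, specific embodiments are described in more detail below with reference to the drawings:
Referring to Fig. 1, the method for enhancing video service quality in a wireless ad hoc network environment according to the embodiment comprises a video down-sampling unit, a video compression unit, a video transmission unit, a video decompression unit and a video enhancement unit, wherein:
the video down-sampling unit 100 down-samples the high-resolution video to obtain the corresponding low-resolution video;
the video compression unit 200 compresses the low-resolution video with an existing video compression standard to obtain the corresponding video bit stream;
the video transmission unit 300 transmits the video bit stream;
the video decompression unit 400 parses the transmitted bit stream to obtain the reconstructed low-resolution video images;
the video enhancement unit 500 enhances the reconstructed video images to obtain the enhanced video images.
The above components are electrically connected to each other, directly or indirectly, to transmit or exchange data. For example, they may be electrically connected through one or more communication buses or signal lines.
The embodiment is implemented as follows:
(1) The down-sampling neural network applied to the video down-sampling unit of Fig. 1, whose structure is shown in Fig. 2, performs 2× down-sampling on the input high-resolution image, so the resolution of the resulting image is a quarter of the original resolution. Let the original high-resolution video image be H; H passes through the down-sampling neural network to give the low-resolution video L; the reconstructed video image obtained from L at the receiving end is L'; L' passes through the up-sampling neural network (see Fig. 3) to give the restored high-resolution video H'. The network model is trained by minimizing the following objective function: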
Figure PCTCN2020084255-appb-000003
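where θ denotes the weight coefficients of the network. In this embodiment, λ is preferably set to 0.8.
The down-sampling neural network shown in Fig. 2 is configured as follows: it has 10 convolutional layers in total; the first convolutional layer is followed by a 2× down-sampling operation that replaces every 2×2 pixel block with its average; every layer uses 3×3 convolution kernels; and every convolutional layer has 64 feature maps except the last, which has 1.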
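The 2× down-sampling step described above (one average per 2×2 pixel block) can be sketched in plain Python. This is only an illustration of the pooling operation on a single-channel image stored as nested lists, not of the full 10-layer network; the function name and the sample image are made up for the example.

```python
def average_pool_2x2(image):
    """Down-sample a 2-D image by 2 in each dimension by averaging 2x2 blocks.

    `image` is a list of equal-length rows; height and width are assumed even.
    """
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


image = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [4, 4, 8, 8],
]
pooled = average_pool_2x2(image)
# 4x4 input -> 2x2 output: a quarter of the original pixel count.
print(pooled)  # [[2.0, 6.0], [3.0, 8.0]]
```

Halving both the height and the width is what makes the output resolution a quarter of the original, as stated in step (1) of the embodiment.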
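The up-sampling neural network of Fig. 3 is structurally symmetric to that of Fig. 2 and is not described again.
(2) The resulting low-resolution video is source-coded, and the resulting bit stream is transmitted in the wireless ad hoc network environment. This embodiment uses an H.265/HEVC encoder for compression to obtain the compressed video bit stream. The encoder is configured in random access mode and the coding group size is set to 8.
(3) An H.265/HEVC decoder parses the transmitted bit stream to obtain the reconstructed low-resolution video images.
(4) For the reconstructed low-resolution video, taking the coding group as the unit, the video frames at the lower levels of the coding reference structure are selected as the frames to be enhanced with the multi-frame method. In this embodiment the coding group size is 8, and the corresponding coding reference structure is shown in Figs. 5 and 6. Within a coding group, frames 1, 2, 3, 5, 6 and 7 have relatively large quantization parameters and are selected for multi-frame enhancement.
(5) For each selected video frame, the frame is predicted from its neighboring decoded reconstructed frames to obtain its virtual frame. During enhancement the image is up-sampled by a factor of 2^P (P is a non-negative integer); in this embodiment P is 2. Fig. 4 is a flowchart of the enhancement method applied to the video enhancement unit of Fig. 1 in this embodiment, which comprises the following steps:
Step S510: generate virtual frames;
Sub-step S511: select the low-resolution frames to be enhanced. Fig. 5 shows a commonly used hierarchical coding structure of H.265/HEVC video compression. This embodiment is described using a common hierarchical coding structure with a coding group size of 8. Fig. 6 illustrates the coding reference structure between the frames of a coding group: frames 0 and 8 are in the first layer of the reference structure, frame 4 is in the second layer, frames 2 and 6 are in the third layer, and frames 1, 3, 5 and 7 are in the last layer. Multi-frame enhancement is applied to the second-to-last layer (frames 2 and 6) and to the last layer (frames 1, 3, 5 and 7).
Sub-step S512: obtain virtual frames from the existing preceding and following reconstructed frames through the prediction neural network. Specifically:
virtual frame 2 of frame 2 is obtained from frames 0 and 4;
virtual frame 2 of frame 6 is obtained from frames 4 and 8;
virtual frame 1 of frame 1 is obtained from frames 0 and 2;
virtual frame 1 of frame 3 is obtained from frames 2 and 4;
virtual frame 3 of frame 3 is obtained from frames 0 and 6;
virtual frame 1 of frame 5 is obtained from frames 4 and 6;
virtual frame 3 of frame 5 is obtained from frames 2 and 8;
virtual frame 1 of frame 7 is obtained from frames 6 and 8.
The structure of the prediction neural network is shown in Fig. 7. It is configured as follows: it has 10 convolutional layers in total; the first convolutional layer consists of two groups that take the forward and backward frames as input respectively, and their outputs are concatenated and then pass through 9 further convolutional layers in sequence. Each of the first 4 of these convolutional layers is followed by a 2× down-sampling operation that replaces every 2×2 pixel block with its average; each of the last 4 is followed by a 2× up-sampling operation that expands each pixel into a 2×2 pixel block; the final convolutional layer generates the virtual frame. Every layer uses 3×3 convolution kernels, and every convolutional layer has 64 feature maps except the last, which has 1.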
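As a rough illustration of the resampling steps in the prediction network, the sketch below implements the pixel-replication up-sampling (each pixel becomes a 2×2 block) and traces the feature-map size through four 2× down-samplings followed by four 2× up-samplings. The helper names are hypothetical, convolutions are assumed to preserve spatial size, and input dimensions are assumed to be divisible by 16 so that the output matches the input size.

```python
def upsample_2x_replicate(image):
    """Expand every pixel into a 2x2 block (nearest-neighbor up-sampling)."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def spatial_sizes(height, width, n_down=4, n_up=4):
    """Trace feature-map sizes through n_down 2x poolings then n_up 2x expansions."""
    sizes = [(height, width)]
    for _ in range(n_down):
        height, width = height // 2, width // 2
        sizes.append((height, width))
    for _ in range(n_up):
        height, width = height * 2, width * 2
        sizes.append((height, width))
    return sizes


print(upsample_2x_replicate([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(spatial_sizes(64, 96)[4], spatial_sizes(64, 96)[-1])
# (4, 6) (64, 96)
```

The trace shows why the virtual frame comes out at the same size as the input frames: the four up-samplings undo the four down-samplings.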
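Let the current frame be X and let X' be the predicted frame obtained by passing the pair of frames before and after it through the network. The prediction neural network is trained by minimizing the following objective function: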
Figure PCTCN2020084255-appb-000004
where ω denotes the weight coefficients of the prediction neural network.
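The reference pairs listed in sub-step S512 follow a regular pattern, which the following Python sketch tries to capture. The rule it encodes is an assumption inferred from that list, not a statement from the patent: virtual frame i of a frame t is built from the pair (t - i, t + i) when both frames lie inside the coding group and both sit in a shallower layer of the hierarchy than t. Function names are hypothetical.

```python
def layer(frame_idx, gop_size):
    """1-based layer in a dyadic hierarchy; frames 0 and gop_size are layer 1."""
    pos = frame_idx % gop_size
    if pos == 0:
        return 1
    level, step = 2, gop_size // 2
    while pos % step != 0:
        step //= 2
        level += 1
    return level


def virtual_frame_pairs(frame_idx, gop_size=8):
    """Map virtual-frame index i to the reference pair (frame_idx - i, frame_idx + i).

    Assumed rule: both references lie inside the coding group (0..gop_size)
    and sit in a shallower layer than the frame being enhanced.
    """
    own_layer = layer(frame_idx, gop_size)
    pairs = {}
    for i in range(1, gop_size):
        before, after = frame_idx - i, frame_idx + i
        if before < 0 or after > gop_size:
            break  # larger distances fall outside the group as well
        if layer(before, gop_size) < own_layer and layer(after, gop_size) < own_layer:
            pairs[i] = (before, after)
    return pairs


for t in (1, 2, 3, 5, 6, 7):
    print(t, virtual_frame_pairs(t))
# 1 {1: (0, 2)}
# 2 {2: (0, 4)}
# 3 {1: (2, 4), 3: (0, 6)}
# 5 {1: (4, 6), 3: (2, 8)}
# 6 {2: (4, 8)}
# 7 {1: (6, 8)}
```

Under this assumed rule the output matches the eight pairs enumerated in sub-step S512 for a coding group of size 8.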
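Step S520: train the enhancement neural network to obtain the image enhancement models;
Sub-step S521: obtain the reconstructed low-resolution images at the decoding end;
Sub-step S522: form a training set with the reconstructed low-resolution images and their virtual frames as samples and the corresponding original high-resolution images as labels;
Sub-step S523: train the neural network on these samples and their labels to obtain the video image quality enhancement model. The training rule is: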
Figure PCTCN2020084255-appb-000005
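where Y' is obtained by passing the reconstructed low-resolution image and its corresponding virtual frames through the enhancement neural network, Y is the corresponding original high-resolution image, and Θ denotes the neural network parameters.
This embodiment uses an H.265/HEVC encoder to compress and decompress the down-sampled low-resolution video, giving the reconstructed low-resolution video images to be enhanced. Different training sets can be built for the enhancement neural network according to different coding configurations, coding parameters and video content. In this embodiment, separate training sets are built per quantization parameter, scaling size and frame position, and each training set is used to train the enhancement neural network to obtain the corresponding image enhancement model. The enhancement neural network used here has 11 layers, each with 3×3 convolution kernels. The reconstructed low-resolution video frame and its virtual frames are fed to the first layer, with one input per virtual frame; the feature maps output by the first convolutional layer are concatenated and fed to the up-sampling convolutional neural network structure shown in Fig. 3. Fig. 8 shows the enhancement neural network structure whose inputs are virtual frame 1, virtual frame 2 and the current frame.
Step S530: decide which enhancement model to use;
At decoding time, each video frame has a base quantization parameter, a scaling size and a frame position; for each reconstructed low-resolution video frame, the corresponding enhancement model is selected according to its base quantization parameter, scaling size and frame position.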
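Model selection in step S530 amounts to a lookup keyed by the three frame properties. The sketch below shows one possible way to organize it; the `ModelKey` type, the registry contents, the file names and the QP values are all placeholders invented for illustration and do not come from the patent.

```python
from typing import Dict, NamedTuple


class ModelKey(NamedTuple):
    base_qp: int          # base quantization parameter of the frame
    scale: int            # up-scaling factor the model was trained for
    frame_position: int   # position of the frame inside the coding group


def select_model(models: Dict[ModelKey, str], base_qp: int, scale: int, frame_position: int) -> str:
    """Return the enhancement model registered for this frame's properties."""
    key = ModelKey(base_qp, scale, frame_position)
    if key not in models:
        raise KeyError(f"no enhancement model trained for {key}")
    return models[key]


# Hypothetical registry: file names and QP values are placeholders.
models = {
    ModelKey(32, 2, 1): "enhance_qp32_x2_pos1.bin",
    ModelKey(32, 2, 2): "enhance_qp32_x2_pos2.bin",
    ModelKey(37, 2, 1): "enhance_qp37_x2_pos1.bin",
}
print(select_model(models, base_qp=32, scale=2, frame_position=2))
# enhance_qp32_x2_pos2.bin
```

Raising an error for a missing key is a design choice of this sketch; a deployment might instead fall back to the model trained for the nearest quantization parameter.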
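Step S540: enhance the quality of the frame using its virtual frame(s) and its reconstructed frame, obtaining the enhanced reconstructed image;
In this embodiment, the video frames in the last and second-to-last layers of the coding reference structure are enhanced with the neural network shown in Fig. 8, except that the number of input convolution branches must match the number of input frames (virtual frames plus the current frame). That is, in addition to the frame itself:
frame 1 uses its virtual frame 1 for enhancement;
frame 2 uses its virtual frame 2 for enhancement;
frame 3 uses its virtual frame 1 and virtual frame 3 for enhancement;
frame 5 uses its virtual frame 1 and virtual frame 3 for enhancement;
frame 6 uses its virtual frame 2 for enhancement;
frame 7 uses its virtual frame 1 for enhancement.
For the other frames in the coding reference structure, a single-frame enhancement strategy is used, i.e. the structure of Fig. 8 with the branches for the two inputs virtual frame 1 and virtual frame 2 removed.
The basic principles, main features and advantages of the invention have been shown and described above. Those skilled in the art should understand that the invention is not limited by the above embodiments; the embodiments and the description only illustrate the principles of the invention, and various changes and improvements may be made without departing from its spirit and scope, all of which fall within the scope of the claimed invention. The scope of protection is defined by the appended claims and their equivalents.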

Claims (2)

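  1. A method for enhancing video service quality under limited bandwidth in a wireless ad hoc network, characterized in that, when high-resolution video services are transmitted in a wireless ad hoc network environment, the method comprises the following steps:
    (1) down-sampling the high-resolution video image, reducing the video image by a factor of 2^M, where M is a non-negative integer, to obtain the corresponding low-resolution video;
    the down-sampling operation is performed by a down-sampling neural network, and a different network model is trained for each sampling factor;
    when the down-sampling neural network is trained, an up-sampling neural network structurally symmetric to the down-sampling neural network is provided, and the two networks share the same parameters; let the original high-resolution video image be H; H passes through the down-sampling neural network to give the low-resolution video L; the reconstructed video image obtained from L at the receiving end is L'; L' passes through the up-sampling neural network to give the restored high-resolution video H'; the network model is trained by minimizing the following objective function: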
    Figure PCTCN2020084255-appb-100001
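    where θ denotes the weight coefficients of the network and n is the number of training samples;
    (2) source-coding the resulting low-resolution video and transmitting the resulting bit stream in the wireless ad hoc network environment;
    (3) at the receiving end, source-decoding the received bit stream to obtain the reconstructed low-resolution video;
    (4) for the reconstructed low-resolution video, taking the coding group as the unit, selecting the video frames at the lower levels of the coding reference structure as the video frames to be enhanced with the multi-frame method;
    the video frames at the lower levels of the coding reference structure are the frames with relatively large quantization parameters within a coding group;
    (5) for each selected video frame, predicting the frame from its neighboring, already decoded reconstructed frames to obtain a virtual frame of that frame;
    (5.1) the neighboring decoded reconstructed frames are positioned symmetrically and occur in pairs, and their base quantization parameter values are higher than that of the current frame; specifically:
    let the i-th frame before and the i-th frame after the current frame to be enhanced be neighboring decoded reconstructed frames; a virtual frame of the current frame can be obtained from this pair of reconstructed frames and is denoted virtual frame i;
    (5.2) the method of predicting the frame from its neighboring decoded reconstructed frames is neural-network based: the prediction neural network takes as input a pair of frames, one before and one after the current frame, and outputs a virtual frame of the current frame; let the current frame be X and let X' be the predicted frame obtained by passing the pair of frames through the prediction neural network; the prediction neural network is trained by minimizing the following objective function: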
    Figure PCTCN2020084255-appb-100002
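    where ω denotes the weight coefficients of the neural network and m is the number of training samples;
    (6) the current frame is decoded to give its reconstructed frame, and the virtual frame of the frame and the reconstructed frame of the frame are used to enhance the quality of the frame, obtaining an enhanced reconstructed image;
    the quality enhancement means that the virtual frame(s) and the reconstructed frame of the current frame to be enhanced are convolved, the resulting feature maps are concatenated, and an up-sampling neural network structurally symmetric to the down-sampling neural network is used to output the enhanced current frame; the resolution of the enhanced video frame is enlarged by a factor of 2^P, where P is a non-negative integer.
  2. The method for enhancing video service quality under limited bandwidth in a wireless ad hoc network according to claim 1, characterized in that the position of the frames at the lower levels of the coding reference structure depends on the coding group size: when the coding group size is 8, the frames in the last layer and the second-to-last layer of the coding reference structure are selected; when the coding group size is 16, the frames in the last layer, the second-to-last layer and the third-to-last layer of the coding reference structure are selected.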
PCT/CN2020/084255 2019-05-24 2020-04-10 Method for enhancing video service quality under limited bandwidth in a wireless ad hoc network WO2020238439A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910437633.1 2019-05-24
CN201910437633.1A CN110099280B (zh) 2019-05-24 2019-05-24 Method for enhancing video service quality under limited bandwidth in a wireless ad hoc network

Publications (1)

Publication Number Publication Date
WO2020238439A1 WO2020238439A1 (zh) 2020-12-03

Family

ID=67449075

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/084255 WO2020238439A1 (zh) 2019-05-24 2020-04-10 Method for enhancing video service quality under limited bandwidth in a wireless ad hoc network

Country Status (2)

Country Link
CN (1) CN110099280B (zh)
WO (1) WO2020238439A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
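CN106162180A (zh) * 2016-06-30 2016-11-23 北京奇艺世纪科技有限公司 Image encoding and decoding method and apparatus
CN106960416A (zh) * 2017-03-20 2017-07-18 武汉大学 Content-complexity-adaptive super-resolution method for compressed video satellite images
WO2019009490A1 (ko) * 2017-07-06 2019-01-10 삼성전자 주식회사 Method for encoding/decoding an image and apparatus therefor
CN110099280A (zh) * 2019-05-24 2019-08-06 浙江大学 Method for enhancing video service quality under limited bandwidth in a wireless ad hoc network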

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, YUE ET AL.: "Learning a Convolutional Neural Network for Image Compact-Resolution", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 28, no. 3, 28 September 2018 (2018-09-28), XP011703593, DOI: 20200608152517A *

Also Published As

Publication number Publication date
CN110099280B (zh) 2020-05-08
CN110099280A (zh) 2019-08-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20815284

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20815284

Country of ref document: EP

Kind code of ref document: A1
