CN114666620A - Adaptive streaming media method based on visual sensitivity - Google Patents

Adaptive streaming media method based on visual sensitivity

Info

Publication number
CN114666620A
CN114666620A (application CN202210272937.9A)
Authority
CN
China
Prior art keywords
video
visual sensitivity
video block
pixel
bit rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210272937.9A
Other languages
English (en)
Other versions
CN114666620B (zh)
Inventor
Ye Jin
Dan Meng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University
Original Assignee
Guangxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University filed Critical Guangxi University
Priority to CN202210272937.9A priority Critical patent/CN114666620B/zh
Publication of CN114666620A publication Critical patent/CN114666620A/zh
Application granted granted Critical
Publication of CN114666620B publication Critical patent/CN114666620B/zh
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H04N21/23418: processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • G06N3/045: combinations of networks
    • G06N3/08: learning methods
    • H04N21/23424: splicing one content stream with another, e.g. for inserting or substituting an advertisement
    • H04N21/234309: reformatting by transcoding between formats or standards
    • H04N21/23439: reformatting for generating different versions
    • H04N21/2662: controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
    • H04N21/44008: client-side analysis of video streams, e.g. detecting features or characteristics
    • H04N21/44016: client-side splicing of one content stream with another, e.g. for substituting a video clip
    • H04N21/440218: client-side reformatting by transcoding between formats or standards
    • H04N21/44029: client-side reformatting for generating different versions
    • H04N21/4621: controlling the complexity of the content stream, e.g. lowering the resolution or bit-rate of the video stream for a mobile client with a small screen
    • Y02T10/40: engine management systems


Abstract

The invention is an adaptive streaming media method based on visual sensitivity. On top of traditional adaptive methods, which consider only network conditions and player state, it accounts for the effect on quality of experience (QoE) of the human visual system's sensitivity to quality distortion in different video content. From four extracted feature maps, the method builds an overall masking-effect model with a deep convolutional neural network and derives a visual sensitivity model from it. By modelling the optimization target QoE within a reinforcement learning framework, it makes bitrate decisions that jointly consider the available information to maximize user QoE. The invention achieves visual-sensitivity-aware bitrate allocation, further improving streaming resource utilization and quality of experience.

Description

Adaptive streaming media method based on visual sensitivity
Technical field
The invention relates to the field of streaming media transmission, and in particular to an adaptive streaming media method based on visual sensitivity.
Background
In recent years, user demand for high-quality video services has grown rapidly, and traditional content providers offer users a choice of several video bitrates. However, owing to unstable network bandwidth and diverse user requirements, a fixed bitrate cannot deliver satisfactory video streaming. To address this challenge, the international standards body MPEG proposed adaptive streaming, in which the client player runs an adaptive bitrate (ABR) algorithm that dynamically selects the bitrate of upcoming video chunks according to network conditions, aiming to maximize quality of experience (QoE). Current ABR algorithms typically select chunk bitrates based only on the predicted network bandwidth and the current player state, ignoring how the video content and the intrinsic properties of human vision affect QoE. Because the human visual system (HVS) is differently sensitive to quality distortion in different video content, content with high visual sensitivity carries higher visual importance and should be allocated more bitrate to improve perceived quality. Existing ABR algorithms therefore remain limited in resource allocation and QoE maximization and cannot meet the deployment and development requirements of today's high-quality streaming services, so a more principled and efficient adaptive streaming method is urgently needed.
Summary of the invention
The problem the invention addresses is that current ABR algorithms select chunk bitrates based only on the predicted network bandwidth and the current player state, and are therefore limited in maximizing quality of experience and improving resource utilization; the invention provides an adaptive streaming media method based on visual sensitivity.
To solve the above problem, the invention is realized through the following technical solution:
An adaptive streaming media method based on visual sensitivity comprises the following steps:
Step 1: cut the source video file into chunks of equal length and transcode each chunk into several bitrate levels;
Step 2: sample K+1 video frames from the highest-bitrate version of each chunk, and take the first K sampled frames as that chunk's sampled frames, where K is a preset value;
Step 3: compute each sampled frame's spatial randomness map, luminance map, temporal map, and saliency map;
Step 4: build an overall masking-effect model; cut each sampled frame's spatial randomness, luminance, temporal, and saliency maps into regions on a grid of preset size, randomly select a number of regions from each as region samples, and feed these region samples to the overall masking-effect model to obtain the frame's predicted quantization parameter at the first just noticeable difference point;
Step 5: take the mean of the first-just-noticeable-difference QP predictions over all of a chunk's sampled frames as the chunk's prediction, and use it to compute the chunk's visual sensitivity;
[equation rendered as an image in the source: VSt is a monotonically decreasing function of QPt bounded by QPmax]
where VSt is the visual sensitivity of the t-th chunk, QPt is the t-th chunk's predicted QP at the first just noticeable difference point, QPmax is the maximum QP threshold provided by the video provider, and t = 1, 2, ..., T, where T is the number of chunks in the source video file;
Step 6: use a linear quality-of-experience model, which jointly considers the chunk's visual-sensitivity- and bitrate-dependent video quality, quality smoothness, and rebuffering time, as the optimization target of the adaptive bitrate algorithm; model the bitrate decision as a reinforcement-learning optimization problem and, according to the observed current network environment, continually learn to optimize the current bitrate decision by maximizing the reward function, i.e. the defined linear quality-of-experience model.
The spatial randomness map SMRk(i,j) of the k-th sampled frame is:

SMRk(i,j) = |Lk(i,j) - Ck·Rk^-1·x̄k(i,j)|

The temporal map TMk(i,j) of the k-th sampled frame is:

TMk(i,j) = |Lk+1(i,j) - Lk(i,j)|

The luminance map LMk(i,j) of the k-th sampled frame is:

LMk(i,j) = Lk(i,j)

The saliency map SMk(i,j) of the k-th sampled frame is:

[equation rendered as an image in the source: SMk(i,j) combines the four feature maps Fk^CBY(i,j), Fk^CRG(i,j), Fk^L(i,j), and Fk^OT(i,j)]

where x̄k(i,j) = [Lk(i,j+1), Lk(i+1,j), Lk(i,j-1), Lk(i-1,j)]ᵀ is the four-neighbourhood luminance vector of the k-th sampled frame at pixel (i,j); Lk(i,j+1), Lk(i+1,j), Lk(i,j-1), and Lk(i-1,j) are the luminance values of the k-th sampled frame at pixels (i,j+1), (i+1,j), (i,j-1), and (i-1,j); Lk(i,j) is the luminance of the k-th sampled frame at pixel (i,j); Rk is the autocorrelation matrix of x̄k(i,j); Ck is the covariance matrix of Lk(i,j) and x̄k(i,j); | | denotes absolute value; Lk+1(i,j) is the luminance of the (k+1)-th sampled frame at pixel (i,j); Fk^CBY(i,j) is the k-th sampled frame's CBY colour value at pixel (i,j); Fk^CRG(i,j) is its CRG colour value at pixel (i,j); Fk^OT(i,j) is its orientation value at pixel (i,j); k = 1, 2, ..., K, where K is the number of sampled frames per chunk.
In step 4 above, the overall masking-effect model consists of four identical sub-channel modules, a concatenation layer, a weighting module, a regression module, and a weighted pooling layer. Each sub-channel module is, in series, two convolutional layers, one max-pooling layer, and a VGG convolutional neural network; the weighting module is, in series, a fully connected layer, an activation layer, a regularization layer, a fully connected layer, and an activation layer; the regression module is, in series, a fully connected layer, an activation layer, a regularization layer, and a fully connected layer. The four sub-channel modules respectively take the region samples of the spatial randomness, luminance, temporal, and saliency maps as input; all four outputs feed the concatenation layer, whose output feeds both the weighting and regression modules; their outputs feed the weighted pooling layer, which outputs the predicted quantization parameter of the first just noticeable difference point.
In step 6 above, the quality-of-experience model QoE(Rt) is:

QoE(Rt) = Q(Rt) + S(Rt) - B(Rt)

where:

Q(Rt) = (μ·(VSt - min(VSt))/(max(VSt) - min(VSt)) + ξ)·VMAF(Rt)

S(Rt) = γ·(VMAF(Rt) - VMAF(Rt-1)) if VMAF(Rt) ≥ VMAF(Rt-1); otherwise S(Rt) = -δ·(VMAF(Rt-1) - VMAF(Rt))

B(Rt) = β·max(C·Rt/vt - Lt-1, 0)

In the above, Rt is the bitrate of chunk t; Q(Rt) is the video-quality reward function; S(Rt) is the quality-smoothness function; B(Rt) is the rebuffering-time penalty function; VSt is the t-th chunk's visual sensitivity, and max(VSt) and min(VSt) are the maximum and minimum visual sensitivity over all chunks of the source video file; VMAF(Rt) is the t-th chunk's VMAF metric; μ is a preset normalization weight and ξ a preset normalization offset; Rt-1 is the bitrate of chunk t-1 and VMAF(Rt-1) the (t-1)-th chunk's VMAF metric; γ and δ are preset weight parameters for positive and negative quality smoothness; max(a,b) takes the larger of a and b; β is the rebuffering penalty weight; Lt-1 is the video player's buffer occupancy before chunk t is downloaded; C is the chunk duration; and vt is the t-th chunk's average download speed.
Compared with the prior art, the invention builds on traditional adaptive methods, which consider only network conditions and player state, by accounting for how the human visual system's (HVS) sensitivity to quality distortion in different video content affects quality of experience (QoE). The invention uses multiple video content features to build an overall masking-effect model and computes a visual sensitivity value for each chunk. By modelling the optimization target QoE, it builds an adaptive bitrate decision model on a deep reinforcement learning framework. The invention achieves visual-sensitivity-aware bitrate allocation, using bitrate resources more effectively and further improving perceived quality.
Brief description of the drawings
Fig. 1 is an application-scenario diagram of the invention.
Fig. 2 is the overall flowchart of the adaptive streaming media method based on visual sensitivity.
Fig. 3 is the flowchart of chunk visual-sensitivity modelling.
Fig. 4 is the structure of the overall masking-effect model.
Fig. 5 is the flowchart of the ABR algorithm.
Detailed description
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to a concrete example.
Fig. 1 shows the application scenario of the invention, which mainly consists of a video server, a content delivery network (CDN), and a video player. The video server cuts the source video file into chunks and transcodes them into different bitrates (representing different quality levels) for storage. The CDN fetches the chunks at each bitrate from the video server and computes each chunk's visual sensitivity value. The video player's ABR controller decides each chunk's bitrate and requests the chunk at the corresponding bitrate from the CDN over the Internet.
An adaptive streaming media method based on visual sensitivity, as shown in Fig. 2, comprises the following steps:
Step 1: use the FFmpeg tool to cut the source video file into chunks of fixed duration (e.g. 4 seconds) and transcode each chunk into several bitrate levels (e.g. 750 kbps, 1200 kbps, and 1850 kbps), where different bitrate levels correspond to different quality levels (750 kbps, 1200 kbps, and 1850 kbps correspond to low, standard, and high definition respectively).
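The chunking and ladder transcoding of step 1 can be sketched as FFmpeg command lines assembled in Python. The specific FFmpeg flags and the output-naming scheme below are illustrative assumptions; the text only fixes the tool, the chunk length, and the bitrate levels.

```python
# Sketch of step 1: split the source video into fixed-length chunks, then
# build one transcode command per bitrate rung for each chunk.
# FFmpeg flag choices and output names are illustrative assumptions.

def segment_cmd(src, chunk_seconds=4, pattern="chunk_%03d.mp4"):
    # "-f segment" cuts the input into equal-duration pieces without re-encoding
    return ["ffmpeg", "-i", src, "-c", "copy", "-map", "0",
            "-f", "segment", "-segment_time", str(chunk_seconds),
            "-reset_timestamps", "1", pattern]

def transcode_cmds(chunk, bitrates_kbps=(750, 1200, 1850)):
    # one re-encode per quality level: 750k low, 1200k standard, 1850k high
    return [["ffmpeg", "-i", chunk, "-b:v", f"{b}k",
             chunk.replace(".mp4", f"_{b}k.mp4")]
            for b in bitrates_kbps]

print(segment_cmd("source.mp4")[0])          # ffmpeg
print(len(transcode_cmds("chunk_000.mp4")))  # 3
```

Running the commands (e.g. via `subprocess.run`) would produce one file per chunk per bitrate rung, which is the ladder the CDN stores.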
Step 2: sample K+1 video frames from the highest-bitrate version of each chunk, and take the first K sampled frames as that chunk's sampled frames, where K is a preset value.
The highest-bitrate chunks obtained after cutting and transcoding the source video are subsequently used to compute each chunk's visual sensitivity. Fig. 3 shows the flowchart of chunk visual-sensitivity modelling.
Step 3: compute, per pixel, four feature maps for the first K sampled frames of each highest-bitrate chunk: the spatial randomness map, the luminance map, the temporal map, and the saliency map.
(1) Per-pixel spatial randomness map SMRk(i,j) of the k-th sampled frame.
The per-pixel spatial randomness map is extracted by computing the neighbourhood prediction error at each pixel of the sampled frame:

SMRk(i,j) = |Lk(i,j) - Ck·Rk^-1·x̄k(i,j)|

where x̄k(i,j) = [Lk(i,j+1), Lk(i+1,j), Lk(i,j-1), Lk(i-1,j)]ᵀ is the four-neighbourhood luminance vector of the k-th sampled frame at pixel (i,j); Lk(i,j+1), Lk(i+1,j), Lk(i,j-1), and Lk(i-1,j) are the frame's luminance values at pixels (i,j+1), (i+1,j), (i,j-1), and (i-1,j); Lk(i,j) is its luminance at pixel (i,j); Rk is the autocorrelation matrix of x̄k(i,j); Ck is the covariance matrix of Lk(i,j) and x̄k(i,j); and | | denotes absolute value.
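A minimal sketch of the prediction-error idea behind the spatial randomness map. For brevity the optimal linear predictor built from the autocorrelation matrix Rk and the covariance Ck is replaced by a plain four-neighbour mean; the structure, predicting each pixel from its neighbourhood and keeping the absolute error, is the same.

```python
# Simplified spatial-randomness sketch: the patent predicts each pixel from
# its four-neighbourhood with an optimal linear predictor (autocorrelation
# matrix R_k, covariance C_k); here a plain neighbour mean stands in for the
# predictor, and the map stores the absolute prediction error.

def spatial_randomness_map(L):
    H, W = len(L), len(L[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            # four-neighbourhood: right, below, left, above
            pred = (L[i][j + 1] + L[i + 1][j] + L[i][j - 1] + L[i - 1][j]) / 4.0
            out[i][j] = abs(L[i][j] - pred)
    return out

flat = [[10] * 5 for _ in range(5)]   # perfectly predictable content
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 100                     # unpredictable pixel
print(spatial_randomness_map(flat)[2][2], spatial_randomness_map(spike)[2][2])
# -> 0.0 100.0
```

Smooth regions score near zero (easy to predict), while texture and edges score high, which is the behaviour the masking model relies on.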
(2) Per-pixel luminance map LMk(i,j) of the k-th sampled frame.
LMk(i,j) = Lk(i,j)
where Lk(i,j) is the luminance of the k-th sampled frame at pixel (i,j).
(3) Per-pixel temporal map TMk(i,j) of the k-th sampled frame.
Compute the motion intensity between two adjacent sampled frames (the per-pixel luminance difference):
TMk(i,j) = |Lk+1(i,j) - Lk(i,j)|
where Lk+1(i,j) is the luminance of the (k+1)-th sampled frame at pixel (i,j) and Lk(i,j) is the luminance of the k-th sampled frame at pixel (i,j). When computing the temporal map TMK(i,j) of the last sampled frame, LK+1(i,j) is the luminance at pixel (i,j) of the (K+1)-th frame sampled from the chunk's highest-bitrate version.
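The temporal map follows directly from the formula; a toy example on two 2x2 luminance frames:

```python
# Temporal map TM_k(i,j) = |L_{k+1}(i,j) - L_k(i,j)|: per-pixel absolute
# luminance difference between consecutive sampled frames (motion intensity).

def temporal_map(L_k, L_k1):
    return [[abs(b - a) for a, b in zip(row_k, row_k1)]
            for row_k, row_k1 in zip(L_k, L_k1)]

frame_k  = [[10, 20], [30, 40]]
frame_k1 = [[12, 18], [30, 45]]   # the (k+1)-th sampled frame
print(temporal_map(frame_k, frame_k1))  # [[2, 2], [0, 5]]
```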
(4) Per-pixel saliency map SMk(i,j) of the k-th sampled frame.
First, extract the k-th sampled frame's per-pixel feature maps on the CBY colour space, Fk^CBY(i,j); on the CRG colour space, Fk^CRG(i,j); on the luminance (L) space, Fk^L(i,j); and on the orientation (OT) space, Fk^OT(i,j). Then compute each pixel's saliency value from the extracted feature maps:

[equation rendered as an image in the source: SMk(i,j) combines the four feature maps Fk^CBY(i,j), Fk^CRG(i,j), Fk^L(i,j), and Fk^OT(i,j)]

where Fk^CBY(i,j) is the k-th sampled frame's per-pixel feature map on the CBY colour space, i.e. its CBY colour value at pixel (i,j); Fk^CRG(i,j) is its per-pixel feature map on the CRG colour space, i.e. its CRG colour value at pixel (i,j); Fk^L(i,j) is its per-pixel feature map on the luminance (L) space, i.e. its luminance value Lk(i,j) at pixel (i,j); and Fk^OT(i,j) is its per-pixel feature map on the orientation (OT) space, i.e. its orientation value at pixel (i,j).
Step 4: build the overall masking-effect model; cut each sampled frame's spatial randomness, luminance, temporal, and saliency maps into regions on a grid of preset size, randomly select a number of regions from each map as training region samples, and feed these region samples to the overall masking-effect model to obtain the frame's predicted quantization parameter at the first just noticeable difference (FJND) point.
The FJND point is the transition point between lossless and lossy perception, usually expressed as a quantization parameter (QP) value, and its magnitude accurately reflects the overall masking effect of the video content. In this invention, the overall masking-effect model is therefore in essence an FJND-point prediction model. Because the number of available samples is limited, the spatial randomness, luminance, temporal, and saliency maps are all pre-processed: each feature map is divided into several regions, and a number of regions are randomly selected as training samples, whose training label is the corresponding chunk's FJND point.
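The step-4 pre-processing, cutting each feature map on a fixed grid and randomly picking region samples, can be sketched as follows; the grid size, sample count, and fixed seed are illustrative assumptions.

```python
import random

# Step-4 pre-processing sketch: cut a feature map into non-overlapping
# size x size tiles, then randomly select n of them as region samples.

def grid_regions(fmap, size):
    H, W = len(fmap), len(fmap[0])
    return [[row[x:x + size] for row in fmap[y:y + size]]
            for y in range(0, H - size + 1, size)
            for x in range(0, W - size + 1, size)]

def sample_regions(fmap, size, n, rng=None):
    regions = grid_regions(fmap, size)
    rng = rng or random.Random(0)  # fixed seed only to keep the demo deterministic
    return rng.sample(regions, min(n, len(regions)))

fmap = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 feature map
print(len(grid_regions(fmap, 4)), len(sample_regions(fmap, 4, 2)))  # 4 2
```

Each selected tile would be fed to one of the four sub-channel modules of the masking-effect model, with the chunk's FJND QP as the training label.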
As shown in Fig. 4, the overall masking-effect model consists of four identical sub-channel modules, a concatenation layer, a weighting module, a regression module, and a weighted pooling layer. Each sub-channel module is, in series, two convolutional layers, one max-pooling layer, and a VGG (Visual Geometry Group) convolutional neural network. The weighting module is, in series, a fully connected layer, an activation layer, a regularization layer, a fully connected layer, and an activation layer. The regression module is, in series, a fully connected layer, an activation layer, a regularization layer, and a fully connected layer. The four sub-channel modules respectively take the region samples cut and randomly selected from the spatial randomness, luminance, temporal, and saliency maps; all four outputs feed the concatenation layer, whose output feeds both the weighting and regression modules; their outputs feed the weighted pooling layer, which outputs the predicted FJND-point QP. Training the model involves two main parts, feature fusion and spatial pooling. Each sub-channel extends a typical VGG structure with three extra layers, Conv1, Conv2, and Maxpool, to suit the smaller input region size. After feature extraction through the convolutional layers, the feature vectors are fused with a concat() operation; the fused features enter the regression part of the network, and a weighted-average region aggregation strategy is used.
Step 5: compute each chunk's visual sensitivity from the FJND-point predictions of all its sampled frames.
Step 5.1: average the FJND-point QP predictions over all of the chunk's sampled frames to obtain the chunk's FJND-point QP prediction QPt.
Step 5.2: apply a non-linear transform to the chunk's FJND-point QP prediction QPt to obtain the chunk's visual sensitivity value.
A smaller FJND-point value means a lower degree of distortion in the video encoded at the corresponding QP, indicating that the content's overall masking effect is weak and the HVS perceives distortion more easily; visual sensitivity therefore decreases as the FJND point grows. From this correspondence, the chunk's visual sensitivity is computed as:

[equation rendered as an image in the source: VSt is a monotonically decreasing function of QPt bounded by QPmax]

where VSt is the visual sensitivity of the t-th chunk, QPt is the t-th chunk's FJND-point QP prediction, QPmax is the maximum QP threshold set by the video provider, and t = 1, 2, ..., T, where T is the number of chunks in the source video file. In this embodiment, QPmax of the VideoSet dataset used is set to 51.
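Step 5 in code: averaging the per-frame FJND QP predictions and mapping the result to a sensitivity value. The patent's exact non-linear mapping is rendered as an image in the source, so the decreasing linear map below is only a stand-in with the right shape: VS falls as the FJND QP rises, reaching its minimum at QPmax = 51.

```python
QP_MAX = 51  # maximum QP threshold of the VideoSet dataset used in the embodiment

def chunk_fjnd_qp(frame_qps):
    # step 5.1: chunk-level FJND QP = mean of the per-frame predictions
    return sum(frame_qps) / len(frame_qps)

def visual_sensitivity(qp_t, qp_max=QP_MAX):
    # step 5.2 stand-in: the source renders the real (non-linear) transform as
    # an image; this linear map only preserves the monotone decreasing shape.
    return 1.0 - qp_t / qp_max

qp = chunk_fjnd_qp([30, 32, 34])
print(qp, round(visual_sensitivity(qp), 3))  # 32.0 0.373
```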
Step 6: use a linear QoE (quality of experience) model, which jointly considers the chunk's visual-sensitivity- and bitrate-dependent video quality, quality smoothness, and rebuffering time, as the optimization target of the ABR (adaptive bitrate) algorithm; model the bitrate decision as a reinforcement-learning optimization problem and, according to the observed current network environment, continually learn to optimize the bitrate decision by maximizing the reward function, i.e. the defined linear QoE model.
The optimization target of the ABR algorithm is to maximize user QoE, on which video quality, quality smoothness, rebuffering time, and visual sensitivity all have a major influence. The invention therefore uses a linear QoE model based on these factors as the ABR optimization target; see Fig. 5.
(1) Video quality
In this invention, video quality is determined mainly by the chunk's visual sensitivity and by the bitrate-based VMAF (Video Multimethod Assessment Fusion) metric. The higher the chunk's bitrate, the larger its VMAF value and the higher the video quality; conversely, the quality is lower. If a chunk's visual sensitivity is high, transmitting it at higher quality yields higher QoE, so higher video quality is required; conversely, lower quality suffices. To keep bitrate allocation consistent with the content's visual sensitivity, chunks with high visual sensitivity are allocated more bitrate resources. Video quality is therefore modelled as:

Q(Rt) = (μ·(VSt - min(VSt))/(max(VSt) - min(VSt)) + ξ)·VMAF(Rt)

where Q(Rt) is the video-quality reward function; Rt is the bitrate of chunk t; VMAF(Rt) is the t-th chunk's VMAF metric; max(VSt) and min(VSt) are the maximum and minimum visual sensitivity over all chunks of the source video file, and VSt is the t-th chunk's visual sensitivity; μ is a preset normalization weight and ξ a preset normalization offset, whose role is to map the t-th chunk's visual sensitivity VSt into the range [ξ, μ+ξ]. In this embodiment, μ and ξ are set to 2 and 0.6 respectively.
(2) Quality smoothness
The time-varying nature of network bandwidth may cause video-quality fluctuation, which produces negative quality smoothness and reduces user QoE. To avoid frequent downward quality fluctuation, positive and negative quality-smoothness functions are defined, and quality smoothness in the different cases is modelled as:

S(Rt) = γ·(VMAF(Rt) - VMAF(Rt-1)) if VMAF(Rt) ≥ VMAF(Rt-1); otherwise S(Rt) = -δ·(VMAF(Rt-1) - VMAF(Rt))

where S(Rt) is the quality-smoothness function; VMAF(Rt) is the t-th chunk's VMAF metric and Rt its bitrate; VMAF(Rt-1) is the (t-1)-th chunk's VMAF metric and Rt-1 its bitrate; γ and δ are the weight parameters for positive and negative quality smoothness respectively.
(3) Rebuffering time
During video transmission, repeatedly selecting high bitrates may exceed the network's bandwidth capacity, causing playback stalls and reducing user QoE, so stalling should be avoided in bitrate decisions. Rebuffering time is therefore modelled as:

B(Rt) = β·max(C·Rt/vt - Lt-1, 0)

where B(Rt) is the rebuffering-time penalty function; max(a,b) takes the larger of a and b; C is the chunk duration; Rt is the t-th chunk's bitrate; Lt-1 is the video player's buffer occupancy before chunk t is downloaded; vt is the t-th chunk's average download speed; and β is the rebuffering penalty weight.
(4) QoE model
The goal of the ABR algorithm is to maximize overall QoE under time-varying network bandwidth by jointly considering video quality, visual sensitivity, quality smoothness, and rebuffering time. Overall QoE is therefore modelled as:

QoE(Rt) = Q(Rt) + S(Rt) - B(Rt)

where Q(Rt) is the video-quality reward function, S(Rt) the quality-smoothness function, B(Rt) the rebuffering-time penalty function, and Rt the t-th chunk's bitrate.
The model realizes a trade-off between video quality, quality smoothness, and rebuffering time. If γ, δ, and β are small, the video player tends to choose higher bitrates, which, however, leads to larger quality variation and longer stalls; otherwise the player keeps a lower, unchanged bitrate to avoid frequent quality changes and stalling. Meanwhile, the ABR policy allocates more bandwidth to content with high visual sensitivity to maximize user QoE.
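A sketch of the linear QoE model. The three terms follow the descriptions in (1) to (3); the defaults μ = 2 and ξ = 0.6 come from the embodiment, while γ, δ, β, and the sample numbers are illustrative assumptions.

```python
# Linear QoE sketch: quality reward weighted by normalized visual sensitivity,
# a smoothness term, and a rebuffering penalty. gamma/delta/beta are assumed.

def q_term(vmaf, vs, vs_min, vs_max, mu=2.0, xi=0.6):
    # mu and xi map the normalized visual sensitivity into [xi, mu + xi]
    weight = mu * (vs - vs_min) / (vs_max - vs_min) + xi
    return weight * vmaf

def s_term(vmaf_t, vmaf_prev, gamma=0.5, delta=1.0):
    # reward upward quality moves with gamma, penalize downward ones with delta
    diff = vmaf_t - vmaf_prev
    return gamma * diff if diff >= 0 else delta * diff

def b_term(bitrate_kbps, speed_kbps, buffer_s, chunk_s=4.0, beta=10.0):
    # stall time ~ max(download time - buffer, 0), download time = C*R/v
    return beta * max(chunk_s * bitrate_kbps / speed_kbps - buffer_s, 0.0)

def qoe(vmaf_t, vmaf_prev, vs, vs_min, vs_max, bitrate, speed, buffer_s):
    return (q_term(vmaf_t, vs, vs_min, vs_max)
            + s_term(vmaf_t, vmaf_prev)
            - b_term(bitrate, speed, buffer_s))

# a mid-sensitivity chunk, a small quality drop, no stall risk
print(qoe(vmaf_t=80, vmaf_prev=90, vs=0.5, vs_min=0.0, vs_max=1.0,
          bitrate=1850, speed=2000, buffer_s=5.0))
```

With these numbers the quality reward is 1.6·80 = 128, the downward switch costs 10, and no stall occurs, so the chunk scores 118.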
(5) Optimization method
The ABR algorithm of the invention adopts A3C, an advanced reinforcement learning (RL) algorithm, which involves training two neural networks, an actor network and a critic network. Reinforcement learning is an unsupervised learning process that interacts with the environment in real time and responds to it with actions. RL consists of five main components: agent, environment, state, action, and reward. RL defines any decision-maker (learner) as the agent and everything other than the agent as the environment. The interaction between agent and environment is described by three basic elements: state, action, and reward. At each time step, the agent inspects the current state and performs a corresponding action; the environment then changes its state to that of the next time step and provides the agent with a reward as feedback. The essence of RL is to make the agent learn to make sequential action decisions automatically.
On top of network conditions and player state, the invention takes the visual sensitivity of the next T chunks as an additional input to the current environment state and designs the reward function to encourage bitrate decisions consistent with visual sensitivity. After the video player downloads each chunk t, the RL agent passes an input state st containing seven parameters to the actor and critic networks. The first six are features of the network and player state: the throughput of the past K chunks; the download times of the past K chunks; the vector of next-chunk sizes; the current buffer size bt; the number of remaining chunks et; and the previous chunk's bitrate lt. The seventh describes the visual sensitivity of the next T chunks. For a given state st, the RL agent follows a policy to output an action at, namely the next chunk's bitrate. The policy is defined as πθ(st, at) → [0,1], the probability of taking action at in state st. After each action is applied, the simulated environment provides the agent with that chunk's reward Rewardt. The RL agent's training objective is to maximize the cumulative reward obtained. The reward function is usually set to the target one wishes to optimize, such as a specific QoE metric, to reflect the performance of each chunk's bitrate decision. During A3C training, the invention sets the reward function to the linear QoE model defined in (4) and generates several RL agents in parallel to speed up training. Each agent has different input parameters and sends {state, action, reward} tuples to a central agent; for each group of data received, the central agent computes gradients, updates the model by gradient descent, and pushes the new model to the corresponding RL agent. The agents are independent of each other, so training can proceed asynchronously.
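The seven-part observation fed to the actor and critic networks can be sketched as a plain container. The field names and the throughput computation are assumptions, since the original symbols are rendered as images in the source.

```python
from collections import namedtuple

# Observation handed to the actor/critic networks after chunk t downloads.
# Field names are illustrative; the patent lists the seven quantities but the
# original symbols are rendered as images in the source.
State = namedtuple("State", [
    "throughputs",     # throughput of the past K chunks
    "download_times",  # download time of the past K chunks
    "next_sizes",      # sizes of the next chunk at each bitrate level
    "buffer",          # current buffer size b_t
    "remaining",       # number of chunks left e_t
    "last_bitrate",    # previous chunk's bitrate l_t
    "future_vs",       # visual sensitivity of the next T chunks
])

def make_state(history, next_sizes, buffer_s, remaining, last_bitrate,
               future_vs, K=8):
    recent = history[-K:]
    throughputs = [h["bytes"] / h["seconds"] for h in recent]
    download_times = [h["seconds"] for h in recent]
    return State(throughputs, download_times, next_sizes,
                 buffer_s, remaining, last_bitrate, future_vs)

hist = [{"bytes": 4_000_000, "seconds": 2.0},
        {"bytes": 3_000_000, "seconds": 3.0}]
s = make_state(hist, next_sizes=[0.4, 0.9, 1.6], buffer_s=12.0,
               remaining=50, last_bitrate=1200, future_vs=[0.3, 0.7])
print(s.throughputs, s.last_bitrate)  # [2000000.0, 1000000.0] 1200
```

In a full implementation this tuple would be flattened into the tensors consumed by the A3C actor and critic networks, with `future_vs` as the extra input the invention adds on top of the usual network and player features.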
The invention is an adaptive streaming media method based on visual sensitivity. On top of traditional adaptive methods, which consider only network conditions and player state, it accounts for the effect on quality of experience (QoE) of the human visual system's sensitivity to quality distortion in different video content. From four extracted feature maps, the method builds an overall masking-effect model with a deep convolutional neural network and derives a visual sensitivity model from it. By modelling the optimization target QoE within a reinforcement learning framework, it makes bitrate decisions that jointly consider the available information to maximize user QoE. The invention achieves visual-sensitivity-aware bitrate allocation, further improving streaming resource utilization and quality of experience.
The above embodiment is merely a concrete example further detailing the purpose, technical solution, and effect of the invention, and the invention is not limited to it. Any modification, equivalent substitution, or improvement made within the scope of the disclosure of the invention falls within the scope of protection of the invention.

Claims (4)

1. An adaptive streaming media method based on visual sensitivity, characterized by comprising the following steps:
Step 1: cut the source video file into chunks of equal length and transcode each chunk into several bitrate levels;
Step 2: sample K+1 video frames from the highest-bitrate version of each chunk, and take the first K sampled frames as that chunk's sampled frames, where K is a preset value;
Step 3: compute each sampled frame's spatial randomness map, luminance map, temporal map, and saliency map;
Step 4: build an overall masking-effect model; cut each sampled frame's spatial randomness, luminance, temporal, and saliency maps into regions on a grid of preset size, randomly select a number of regions from each as region samples, and feed these region samples to the overall masking-effect model to obtain the frame's predicted quantization parameter at the first just noticeable difference point;
Step 5: take the mean of the first-just-noticeable-difference QP predictions over all of a chunk's sampled frames as the chunk's prediction, and use it to compute the chunk's visual sensitivity;

[equation rendered as an image in the source: VSt is a monotonically decreasing function of QPt bounded by QPmax]

where VSt is the visual sensitivity of the t-th chunk, QPt is the t-th chunk's predicted QP at the first just noticeable difference point, QPmax is the maximum QP threshold provided by the video provider, and t = 1, 2, ..., T, where T is the number of chunks in the source video file;
Step 6: use a linear quality-of-experience model, which jointly considers the chunk's visual-sensitivity- and bitrate-dependent video quality, quality smoothness, and rebuffering time, as the optimization target of the adaptive bitrate algorithm; model the bitrate decision as a reinforcement-learning optimization problem and, according to the observed current network environment, continually learn to optimize the current bitrate decision by maximizing the reward function, i.e. the defined linear quality-of-experience model.
2. The adaptive streaming media method based on visual sensitivity according to claim 1, characterized in that, in step 3:
The spatial randomness map SMRk(i,j) of the k-th sampled frame is:

SMRk(i,j) = |Lk(i,j) - Ck·Rk^-1·x̄k(i,j)|

The temporal map TMk(i,j) of the k-th sampled frame is:

TMk(i,j) = |Lk+1(i,j) - Lk(i,j)|

The luminance map LMk(i,j) of the k-th sampled frame is:

LMk(i,j) = Lk(i,j)

The saliency map SMk(i,j) of the k-th sampled frame is:

[equation rendered as an image in the source: SMk(i,j) combines the four feature maps Fk^CBY(i,j), Fk^CRG(i,j), Fk^L(i,j), and Fk^OT(i,j)]

where x̄k(i,j) = [Lk(i,j+1), Lk(i+1,j), Lk(i,j-1), Lk(i-1,j)]ᵀ is the four-neighbourhood luminance vector of the k-th sampled frame at pixel (i,j); Lk(i,j+1), Lk(i+1,j), Lk(i,j-1), and Lk(i-1,j) are the luminance values of the k-th sampled frame at pixels (i,j+1), (i+1,j), (i,j-1), and (i-1,j); Lk(i,j) is the luminance of the k-th sampled frame at pixel (i,j); Rk is the autocorrelation matrix of x̄k(i,j); Ck is the covariance matrix of Lk(i,j) and x̄k(i,j); | | denotes absolute value; Lk+1(i,j) is the luminance of the (k+1)-th sampled frame at pixel (i,j); Fk^CBY(i,j) is the k-th sampled frame's CBY colour value at pixel (i,j); Fk^CRG(i,j) is its CRG colour value at pixel (i,j); Fk^OT(i,j) is its orientation value at pixel (i,j); k = 1, 2, ..., K, where K is the number of sampled frames per chunk.
3. The adaptive streaming media method based on visual sensitivity according to claim 1, characterized in that the overall masking-effect model built in step 4 consists of four identical sub-channel modules, a concatenation layer, a weighting module, a regression module, and a weighted pooling layer;
each sub-channel module is, in series, two convolutional layers, one max-pooling layer, and a VGG convolutional neural network; the weighting module is, in series, a fully connected layer, an activation layer, a regularization layer, a fully connected layer, and an activation layer; the regression module is, in series, a fully connected layer, an activation layer, a regularization layer, and a fully connected layer;
the four sub-channel modules respectively take the region samples of the spatial randomness, luminance, temporal, and saliency maps as input; all four outputs feed the concatenation layer, whose output feeds both the weighting and regression modules; their outputs feed the weighted pooling layer, which outputs the predicted quantization parameter of the first just noticeable difference point.
4. The adaptive streaming media method based on visual sensitivity according to claim 1, characterized in that, in step 6, the quality-of-experience model QoE(Rt) is:

QoE(Rt) = Q(Rt) + S(Rt) - B(Rt)

where:

Q(Rt) = (μ·(VSt - min(VSt))/(max(VSt) - min(VSt)) + ξ)·VMAF(Rt)

S(Rt) = γ·(VMAF(Rt) - VMAF(Rt-1)) if VMAF(Rt) ≥ VMAF(Rt-1); otherwise S(Rt) = -δ·(VMAF(Rt-1) - VMAF(Rt))

B(Rt) = β·max(C·Rt/vt - Lt-1, 0)

In the above, Rt is the bitrate of chunk t; Q(Rt) is the video-quality reward function; S(Rt) is the quality-smoothness function; B(Rt) is the rebuffering-time penalty function; VSt is the t-th chunk's visual sensitivity, and max(VSt) and min(VSt) are the maximum and minimum visual sensitivity over all chunks of the source video file; VMAF(Rt) is the t-th chunk's VMAF metric; μ is a preset normalization weight and ξ a preset normalization offset; Rt-1 is the bitrate of chunk t-1 and VMAF(Rt-1) the (t-1)-th chunk's VMAF metric; γ and δ are preset weight parameters for positive and negative quality smoothness; max(a,b) takes the larger of a and b; β is the rebuffering penalty weight; Lt-1 is the video player's buffer occupancy before chunk t is downloaded; C is the chunk duration; and vt is the t-th chunk's average download speed.
CN202210272937.9A 2022-03-18 2022-03-18 Adaptive streaming media method based on visual sensitivity Active CN114666620B (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210272937.9A CN114666620B (zh) 2022-03-18 2022-03-18 Adaptive streaming media method based on visual sensitivity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210272937.9A CN114666620B (zh) 2022-03-18 2022-03-18 Adaptive streaming media method based on visual sensitivity

Publications (2)

Publication Number Publication Date
CN114666620A true CN114666620A (zh) 2022-06-24
CN114666620B CN114666620B (zh) 2023-08-22

Family

ID=82028843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210272937.9A Active CN114666620B (zh) 2022-03-18 2022-03-18 基于视觉敏感度的自适应流媒体方法

Country Status (1)

Country Link
CN (1) CN114666620B (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190289296A1 (en) * 2017-01-30 2019-09-19 Euclid Discoveries, Llc Video Characterization For Smart Encoding Based On Perceptual Quality Optimization
CN111083477A (zh) * 2019-12-11 2020-04-28 北京航空航天大学 基于视觉显著性的hevc优化算法
US20200162535A1 (en) * 2018-11-19 2020-05-21 Zhan Ma Methods and Apparatus for Learning Based Adaptive Real-time Streaming
CN114173132A (zh) * 2021-12-15 2022-03-11 中山大学 一种面向动态比特率视频的自适应比特率选择方法及系统


Non-Patent Citations (2)

Xiao, Wingyu, et al.: "Adaptive Video Streaming via Deep Reinforcement Learning from User Trajectory", IEEE International Performance Computing and Communications Conference, pp. 1-8.
Chen Chao; Wang Xiaodong; Yao Ting: "A macroblock importance model for stereoscopic video based on image saliency", Computer Engineering, no. 01, pp. 266-270.

Also Published As

Publication number Publication date
CN114666620B (zh) 2023-08-22


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Tang Zhong; Liang Zhisheng; Liu Xiaohong; Ye Jin; Dan Meng

Inventor before: Ye Jin; Dan Meng

GR01 Patent grant
GR01 Patent grant