CN102547293B - Method for coding session video by combining time domain dependence of face region and global rate distortion optimization - Google Patents

Method for coding session video by combining time domain dependence of face region and global rate distortion optimization

Info

Publication number
CN102547293B
CN102547293B CN201210034708.XA
Authority
CN
China
Prior art keywords
face
roi
unit
diffusion
coding
Prior art date
Application number
CN201210034708.XA
Other languages
Chinese (zh)
Other versions
CN102547293A (en)
Inventor
范小九
彭强
杨天武
王琼华
Original Assignee
西南交通大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西南交通大学 filed Critical 西南交通大学
Priority to CN201210034708.XA priority Critical patent/CN102547293B/en
Publication of CN102547293A publication Critical patent/CN102547293A/en
Application granted granted Critical
Publication of CN102547293B publication Critical patent/CN102547293B/en

Abstract

The invention discloses a conversational video coding method that combines the temporal dependence of the face region with global rate-distortion optimization. It exploits the temporal dependence of the face region of interest (ROI) between adjacent coded frames within the same group of pictures (GOP) to estimate in advance the distortion of the face ROI and how that distortion propagates, providing an effective aid for selecting the best motion vectors and mode partitions. With the method of the invention, the face ROI coding units are preferentially optimized from a global perspective, which better preserves the subjective and objective quality of the face ROI coding units and of the future coding units that reference them, and avoids the extra bit overhead that distortion propagation causes in conventional coding. While maintaining or improving the subjective and objective quality of the coded pictures, the method effectively reduces the bit rate of conversational video coding and improves coding performance. It is fully compatible with the conventional sequential coding structure and is suitable for video storage and for real-time video coding applications whose latency requirement is larger than one GOP.

Description

Conversational video coding method combining temporal dependence of the face region with global rate-distortion optimization

Technical Field

[0001] The invention belongs to the field of video coding and processing, and relates in particular to rate-distortion optimized coding methods for conversational video coding.

Background Art

[0002] The human face, as one of the key features that distinguish humans from other creatures, plays the role of the primary information carrier in interpersonal and social activities, so comprehensive and in-depth research on it has great theoretical and practical significance. With the rise of real-time multimedia services, applications such as video conferencing, video telephony and news broadcasting are all directly or indirectly related to the human face, and as these applications spread, the importance of face research keeps growing. The video coding and communication communities usually summarize such applications under the term "conversational video sequences", and the corresponding coding technology is called conversational video coding. In classical video compression theory, all frames and coding units are treated as equally important and are coded sequentially. As research progressed, it was gradually recognized that, in addition to compression ratio and Peak Signal to Noise Ratio (PSNR), the coding quality of the Region of Interest (ROI) should also be considered when evaluating a video coding algorithm. In fact, users often judge the acceptability of a coding result directly by their subjective impression of the ROI compression quality. Therefore, how to maintain or improve the coding and decoding quality of the face ROI in conversational video sequences is a pressing frontier topic in conversational video coding.

[0003] Existing work on face-ROI video coding falls into two main categories: 1) giving the face ROI priority protection at the encoder, e.g. face-ROI-based intra coding mode refresh and face-ROI-based bit allocation and resource optimization; 2) recovering the face ROI preferentially at the decoder, especially when errors occur, e.g. face-ROI-based error concealment. Most of this work, by giving the face ROI a higher coding and decoding priority, improves the subjective and objective quality of the face ROI to some extent and advances conversational video coding. However, one issue researchers have overlooked is that, although the face ROI plays a special role in video quality assessment, it is only one part of the conversational video sequence; coding the face ROI preferentially necessarily lowers the coding priority of the rest of the sequence, i.e. the non-face ROI. This is especially evident during encoding: with a limited bit budget, bit allocation that favors the face ROI comes at the expense of the bits spent on the non-face regions. When the sacrificed bits noticeably degrade the coding quality of the non-face regions, those suddenly degraded regions, rather than the face ROI, become the focus of the viewer's attention. As a result, even though the coding quality of the face ROI is clearly improved by the biased bit allocation, the overall coding quality of the sequence as perceived by the human eye does not improve and may even decrease. On the other hand, the parts of the face ROI are not all equally important; some publications give a finer-grained division of face coding priorities (e.g. eye, ear, mouth and nose regions) to address this, but such divisions remain too subjective. Therefore, face-ROI-based coding should also take into account how the face ROI actually behaves during encoding, such as its rate-distortion (RD) performance.

[0004] Rate Distortion Optimization (RDO) is one of the effective means of providing the best video quality at the decoder under limited bandwidth. In theory, the optimal RDO solution for video coding is obtained by jointly optimizing all coding units. To make the problem tractable, researchers usually adopt an independence assumption, i.e. that coding units do not affect one another, so that the rate and distortion of each coding unit can be measured independently. On this basis, combined with the Lagrange multiplier method, the video coding RDO problem is solved in a divide-and-conquer manner. In fact, since the number of bits of a single coding unit under a particular coding mode is only available after the other coding units have been computed, strictly speaking the optimal mode decisions of the coding units are interdependent. Because the key task of video coding is to remove the redundancy between coding units (temporal, spatial and statistical redundancy), the associated techniques such as motion estimation, motion compensation and entropy coding create complex coding dependencies, and these dependencies mean that the RDO of each coding unit cannot be a completely self-contained problem. Therefore, RDO based on the independence assumption is not well founded, and taking coding dependency into account in the RDO of each coding unit has become an important means of improving video coding performance.
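As an illustration of the conventional, independence-based per-unit RDO just described (this sketch is not part of the patent; the mode list and the distortion and rate callables are placeholders), each coding unit simply picks the mode that minimizes the Lagrangian cost J = D + λ·R:

```python
# Illustrative sketch of conventional per-unit RDO under the independence
# assumption: each coding unit chooses the mode with minimal J = D + lambda*R,
# ignoring how this choice affects later units. All names are placeholders.
def rdo_mode_decision(unit, modes, lam, distortion_of, rate_of):
    best_mode, best_cost = None, float("inf")
    for mode in modes:
        d = distortion_of(unit, mode)   # e.g. SSD between source and reconstruction
        r = rate_of(unit, mode)         # bits needed to code the unit in this mode
        j = d + lam * r                 # Lagrangian rate-distortion cost
        if j < best_cost:
            best_mode, best_cost = mode, j
    return best_mode, best_cost
```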

[0005] In recent years, coding dependency has been addressed in much video coding research, but those methods generally suffer from high computational complexity. To balance coding efficiency against time complexity, many RDO methods have to give up part of the dependency modeling in exchange for a performance gain. In the conversational video coding considered by this invention, because the face ROI coding units have similar texture and consistent motion, the dependency exhibited between adjacent frames within a Group of Pictures (GOP) is particularly strong.

Summary of the Invention

[0006] In view of the above shortcomings of the prior art, the object of the invention is to design a new method for improving the performance of conversational video coding, so that it achieves better coding performance together with good practical value and theoretical significance, and is applicable to video storage (the maximum GOP length can be set to the number of frames of the entire sequence) and to real-time video coding whose latency requirement is larger than one GOP.

[0007] The object of the invention is achieved by the following means.

[0008] A conversational video coding method combining the temporal dependence of the face region with global rate-distortion optimization, which exploits the temporal dependence of the face region of interest (ROI) between adjacent coded frames within the same group of pictures (GOP) to estimate in advance the distortion of the face ROI and its propagation, providing an effective aid for selecting the best motion vectors and mode partitions, so that the subjective and objective quality of both the whole video sequence and the face ROI is improved simultaneously. Its implementation comprises the following sequence of steps:

[0009] A. (Before coding each GOP of the conversational video sequence) Perform face ROI detection on all coded frames of the current GOP, thereby determining the exact positions of the face ROI coding units. For definitions and illustrations of conversational video sequence, GOP, coding unit and face ROI coding unit, see item 1 of the notes on the drawings and terminology below.

[0010] B. Depending on whether the current coding unit belongs to the face ROI, select a different RDO method for optimized coding.

[0011] For face ROI coding units,

[0012] B.1 Construct the temporal diffusion chain of the face ROI coding unit. The temporal diffusion chain of a face ROI coding unit is defined in item 2 of the notes on the drawings and terminology below. To reduce the time complexity of constructing the chain, the invention gives the following simplified construction method:

[0013] (1) Perform a forward motion search for every coding unit in the current coded GOP of the conversational video sequence to obtain the best matching unit position of each coding unit in the next frame, and record the corresponding forward motion vector and forward prediction difference (this step is performed only once per GOP). For forward motion search, best matching unit, forward motion vector and forward prediction difference, see item 3 of the notes on the drawings and terminology below.

[0014] (2) Using the forward motion vector obtained in step (1), derive the diffusion position of the face ROI coding unit in the next coded frame of the current GOP. The unit at that diffusion position, of the same size as the face ROI coding unit, is called a face ROI diffusion unit. For clarity, the face ROI diffusion unit of this step is called face ROI diffusion unit No. 1. In fact, face ROI diffusion unit No. 1 is exactly the best matching unit of the current face ROI coding unit found in step (1). This step stores the forward prediction difference of the face ROI coding unit and the position of face ROI diffusion unit No. 1.

[0015] (3) Take the forward motion vector of the actual coding unit in which the center of face ROI diffusion unit No. 1 lies as the forward motion vector of that face ROI diffusion unit, so that its diffusion position in the frame after next of the current GOP is obtained. The unit at that position, of the same size as the face ROI coding unit, is the face ROI diffusion unit of the face ROI coding unit in the frame after next of the current GOP, called face ROI diffusion unit No. 2. The diffusion unit obtained here must not fall outside the face ROI range of the corresponding coded frame obtained in step A; if it does, the diffusion unit is translated horizontally into the face ROI range to serve as face ROI diffusion unit No. 2, and if it still lies outside the face ROI range after the horizontal translation, it is translated vertically until it lies entirely within the face ROI (a code sketch of this propagation and clamping is given after step B.1 below). Meanwhile, according to the proportions of face ROI diffusion unit No. 1 over the actual coding units it overlaps, obtained in step (2), the forward prediction differences of those actual coding units are summed in proportion to form the forward prediction difference of face ROI diffusion unit No. 1. This step stores the forward prediction difference of face ROI diffusion unit No. 1 and the position of face ROI diffusion unit No. 2.

[0016] (4) Derive the subsequent face ROI diffusion units in the same way as step (3), until a face ROI diffusion unit lies in the last frame of the current GOP. Connect the face ROI coding unit and all its diffusion units in the subsequent frames to form the temporal diffusion chain of the face ROI coding unit; the forward prediction differences are kept for later use.

[0017] A diagram and explanation of this method are given in item 2 of the notes on the drawings and terminology below.
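As an illustrative aid to steps (1)-(4) above (not part of the patent text), the following sketch shows how a diffusion unit's position is propagated frame by frame with the forward motion vector of the coding unit containing its center and then translated into the next frame's face ROI; the grid geometry, the motion-vector lookup mv_of and the bounding-box lookup roi_of are assumptions made for illustration only:

```python
# Illustrative sketch of the simplified chain construction of B.1. Rectangles
# are (x, y, w, h) in pixels; mv_of(frame, x, y) is assumed to return the
# stored forward motion vector of the actual coding unit containing that pixel,
# and roi_of(frame) the face ROI bounding box of that frame.
def clamp_into_roi(x, y, w, h, roi):
    rx, ry, rw, rh = roi
    x = min(max(x, rx), rx + rw - w)   # horizontal translation into the ROI first
    y = min(max(y, ry), ry + rh - h)   # then vertical, if still outside
    return x, y

def build_diffusion_chain(face_unit, start_frame, last_frame, mv_of, roi_of):
    x, y, w, h = face_unit
    chain = []
    for frame in range(start_frame, last_frame):
        cx, cy = x + w // 2, y + h // 2          # center of the current unit
        dx, dy = mv_of(frame, cx, cy)            # forward MV of the unit holding the center
        x, y = clamp_into_roi(x + dx, y + dy, w, h, roi_of(frame + 1))
        chain.append((frame + 1, (x, y, w, h)))  # diffusion unit on the next frame
    return chain
```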

[0018] B.2 Compute the distortion estimates of the face ROI coding unit and of all diffusion units on its temporal diffusion chain. A distortion estimate is a reasonable estimate, made before the current coding unit or diffusion unit is actually coded, of the distortion that its actual coding will produce. The invention gives the following distortion estimation method derived from the Laplacian distribution of the residual DCT coefficients:

[0019] Formula 1:

(Formula 1 is given as patent image CN102547293BD00061.)

[0020] where D is the distortion estimate, the prediction-difference term is the forward prediction difference of the coding unit or diffusion unit immediately preceding the current unit on the temporal diffusion chain, and Q is the quantization step size. Since the face ROI coding unit is the starting unit of the temporal prediction chain, its backward prediction difference is used when computing its distortion estimate. The backward prediction difference is obtained from a backward motion search; for backward motion search and backward prediction difference, see item 4 of the notes on the drawings and terminology below. The function F(·) in Formula 1 is computed as follows,

[0021] Formula 2:

(Formula 2 is given as patent image CN102547293BD00062.)

[0023] B.3 Compute the distortion diffusion coefficient of every diffusion unit on the temporal diffusion chain with respect to the influence of the face ROI coding unit, and sum these coefficients to obtain the total distortion diffusion coefficient. The distortion diffusion coefficient measures how much the coding result of a coding unit or diffusion unit affects the coding of the next adjacent diffusion unit on its temporal diffusion chain. The invention gives the following experimentally derived way of computing the distortion diffusion coefficient,

[0024] Formula 3:

(Formula 3 is given as patent image CN102547293BD00063.)

[0025] where β_t denotes the distortion diffusion coefficient of the current diffusion unit with respect to the preceding coding unit or diffusion unit on the temporal diffusion chain, D_t denotes the distortion estimate of the current diffusion unit, D_{t-1} denotes the distortion estimate of the preceding coding unit or diffusion unit, and the remaining term denotes the forward prediction difference of the current diffusion unit. To compute the distortion diffusion coefficient of every other diffusion unit on the chain with respect to the face ROI coding unit, and from these the total distortion diffusion coefficient, the invention first computes, for each diffusion unit on the chain, its distortion diffusion coefficient with respect to the immediately preceding coding unit or diffusion unit, and then uses a derived multiplicative relationship to obtain its distortion diffusion coefficient with respect to the face ROI coding unit. For example, if the per-link distortion diffusion coefficients of the current diffusion unit N and of the N-1 diffusion units before it are β_N, ..., β_1, then its distortion diffusion coefficient with respect to the face ROI coding unit is β_1 · β_2 · ... · β_N.
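To make the multiplicative relationship above concrete, a small illustrative sketch (not part of the patent) turns the per-link coefficients β_1, ..., β_N into the coefficients with respect to the face ROI coding unit (cumulative products) and into the total distortion diffusion coefficient, counting the face ROI coding unit's own coefficient of 1 as stated in the embodiment below:

```python
# Illustrative: total distortion diffusion coefficient eta from per-link
# coefficients beta_1, ..., beta_N along the temporal diffusion chain.
def total_diffusion_coefficient(link_betas):
    total = 1.0          # the face ROI coding unit's coefficient w.r.t. itself
    acc = 1.0
    for beta in link_betas:      # beta_1, beta_2, ..., beta_N along the chain
        acc *= beta              # coefficient of this diffusion unit w.r.t. the face unit
        total += acc
    return total

# Example: betas 0.8, 0.6, 0.5 -> eta = 1 + 0.8 + 0.48 + 0.24 = 2.52
```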

[0026] B.4 Update the Lagrange multiplier.

[0027] (1) Record the actual coding mode (SKIP, DIRECT, intra, inter, etc.), the motion-compensated prediction distortion value and the reconstruction distortion value of each face ROI coding unit. The motion-compensated prediction distortion value is the mean absolute difference between the face ROI coding unit and the coding unit matched to it by the motion search during video coding; the reconstruction distortion value is the mean absolute difference between the face ROI coding unit and its reconstructed unit after video coding (see the sketch after step (3) below).

[0028] (2) If the current face ROI coding unit is the last face ROI coding unit of the current frame (in spatial order, front to back and top to bottom), compute, over all coded GOPs and the coded frames of the current GOP, the percentage of face ROI coding units coded in intra mode, the average motion-compensated prediction distortion value of the face ROI coding units and the average reconstruction distortion value of the face ROI coding units. Otherwise, skip to step (3).

[0029] (3) Adjust the Lagrange multiplier. The corresponding adjustment formula is,

(Formula 4 is given as patent image CN102547293BD00071.)

[0031] where λ_new is the adjusted Lagrange multiplier, λ_old is the Lagrange multiplier before adjustment, η is the total distortion diffusion coefficient obtained in step B.3, γ is the percentage of face ROI coding units coded in intra mode over all coded frames of the current GOP, the two averaged distortion terms are the average motion-compensated prediction distortion value and the average reconstruction distortion value of the face ROI coding units, and α is a constant whose selectable range is [0.88, 1.0).
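The two per-unit distortion statistics averaged above are mean absolute differences between 16x16 blocks; a minimal illustrative sketch (array shapes and names are assumptions, not the patent's implementation) is:

```python
import numpy as np

# Illustrative: mean absolute difference between two blocks, used both for the
# motion-compensated prediction distortion (unit vs. its motion-search match)
# and for the reconstruction distortion (unit vs. its reconstruction).
def mean_abs_diff(block_a, block_b):
    a = np.asarray(block_a, dtype=np.int32)   # avoid uint8 wrap-around
    b = np.asarray(block_b, dtype=np.int32)
    return float(np.mean(np.abs(a - b)))
```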

[0032] B.5 Using the Lagrange multiplier updated in B.4, call the Lagrangian optimization method to perform RDO on the face ROI coding unit.

[0033] For non-face ROI coding units,

[0034] B.6 If a Lagrange multiplier from B.4 currently exists, perform the RDO coding of the non-face ROI coding unit with the product of that multiplier and the corresponding total distortion diffusion coefficient, i.e. η · λ_new, in place of the conventional RDO Lagrange multiplier. Otherwise, perform the optimized coding of the non-face ROI coding unit with conventional RDO and the corresponding conventional Lagrange multiplier.
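A tiny illustrative sketch of the multiplier selection rule of B.6 (names are assumptions; the chosen multiplier would then be fed to a conventional Lagrangian mode decision such as the one sketched after paragraph [0004]):

```python
# Illustrative: choose the Lagrange multiplier for a non-face ROI coding unit.
def lambda_for_non_face_unit(lambda_new, eta, lambda_conventional):
    if lambda_new is not None:           # an updated multiplier from B.4 exists
        return eta * lambda_new
    return lambda_conventional            # fall back to conventional RDO
```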

[0035] With the conversational video coding method of the invention, which combines the temporal dependence of the face region with global rate-distortion optimization, the face ROI coding units, which have the greater influence on the subjective visual quality perceived by the human eye, are preferentially optimized from a global perspective. This better preserves the subjective and objective quality of the face ROI coding units and of the future coding units that reference them, and avoids the extra bit overhead that distortion propagation causes in conventional coding, so that the bit rate of conversational video coding is effectively reduced while the subjective and objective quality of the coded pictures is maintained or improved. At the same time, according to the different importance of face ROI coding units and non-face ROI coding units in conversational video coding, the invention performs selective RDO coding, improving the coding performance of the conversational video sequence while safeguarding the subjective and objective quality of the coding result. In addition, the method, which combines temporal dependence of face region coding with global rate-distortion optimization, is fully compatible with the conventional sequential coding structure, is easy to implement, and is applicable to video storage and to real-time video coding applications whose latency requirement is larger than one GOP.

Brief Description of the Drawings

[0036] Fig. 1 is a series of example pictures of conversational video sequences.

[0037] Fig. 2 is a diagram of the frames, GOPs and coding units of a conversational video sequence.

[0038] Fig. 3 is an example diagram of face ROI basic units.

[0039] Fig. 4 is a reference flowchart for constructing the temporal diffusion chain of a face ROI coding unit.

[0040] Fig. 5 illustrates the forward search process.

[0041] Fig. 6 illustrates the backward search process.

[0042] Fig. 7 is a block diagram of the steps of the method of the invention.

[0043] Notes on the drawings and terminology:

[0044] 1. (Conversational) video sequence, GOP, (face ROI) coding unit

[0045] A conversational video sequence is a set of video segments composed of several consecutive frames of video images (also called pictures) whose subject is a head-and-shoulders view of a human face. The video sequences produced in video conferencing, video telephony, news broadcasting and similar applications are typical conversational video sequences, as shown in Fig. 1.

[0046] Since a conversational video sequence is still an ordinary video sequence, the descriptions of GOPs and coding units below are given for video sequences in general.

[0047] A GOP is a group of pictures in a video sequence; one GOP corresponds to a group of consecutive pictures of the video sequence to be coded. In the common H.264/AVC or MPEG video coding standards, frames are of three types: I (intra-coded frame), P (forward predicted frame) and B (bi-directionally predicted frame), arranged for example in the pattern IBBPBBPBBPBBP...; such a group of consecutive frame pictures forms one GOP. Depending on the coding scheme used, the GOP length is typically set between 1 and 15.

[0048] The coding unit is a basic concept in video coding and usually consists of one luma pixel coding unit and two additional chroma coding units. Taking the common coding unit, the macroblock, as an example: a coded picture (video image frame) may contain a number of macroblocks, and each macroblock consists of one luma pixel block and two chroma pixel blocks, where the luma block is a 16x16 pixel block and the size of the two chroma pixel blocks depends on the sampling format of the image; for a YUV420-sampled image, each chroma block is an 8x8 pixel block. In each image frame, a number of macroblocks are arranged into slices; the video coding algorithm codes the picture macroblock by macroblock and organizes the result into a continuous video bitstream.

[0049] Suppose a (conversational) video sequence contains N frames and each GOP has length M; the relationship between the (conversational) video sequence, the GOPs and the coding units is illustrated in Fig. 2.

[0050] When each coded frame of the conversational video sequence is partitioned into coding units as in Fig. 2, the coding units fully or partly covered by the face ROI are called face ROI coding units. In Fig. 3, the light-gray dashed lines mark the positions of the coding units of the current coded picture and the black dashed box marks the face ROI detection result; all coding units inside the black box are face ROI coding units.

[0051] 2. Temporal diffusion chain of a face ROI coding unit

[0052] Taking temporal coding dependency into account, the face ROI coding unit and all the other coding units (in full or in part) that it influences in the temporal domain can be connected to form the temporal diffusion chain of the face ROI coding unit. The units on the chain other than the face ROI coding unit itself are called face ROI diffusion units.

[0053] In theory, this propagation of influence comes mainly from motion-compensated prediction references, so the diffusion units correspond to the coding units (in full or in part) that use all or part of the face ROI coding unit as a reference block, and to the subsequent coding units (in full or in part) that in turn use those units as reference blocks. Using the motion compensation information of the actual coding process, the temporal diffusion chain of a face ROI coding unit can be constructed exactly.

[0054] A diagram of the simplified construction method for the temporal diffusion chain of a face ROI coding unit mentioned in the Summary is shown in Fig. 4.

[0055] In Fig. 4, the dark dashed boxes represent the face ROI coding unit and its face ROI diffusion units on the subsequent frames that make up the temporal diffusion chain, and the solid black boxes near the dark dashed boxes represent the actual coding units in which the centers of the face ROI diffusion units lie in each frame while the chain is being constructed. The face ROI diffusion unit on frame n+1 is obtained directly from the forward prediction vector of the face ROI coding unit on frame n, while the face ROI diffusion unit on frame n+2 is obtained from the forward prediction vector of the coding unit on frame n+1 in which the center of the face ROI diffusion unit lies. Correspondingly, the face ROI diffusion unit on frame n+3 is obtained from the forward prediction vector of the coding unit on frame n+2 in which the center of the face ROI diffusion unit lies, and the subsequent diffusion units follow the same rule.

[0056] 3. Forward motion search, best matching unit, forward motion vector, forward prediction difference

[0057] The forward motion search process is similar to the motion search of conventional video coding, as illustrated in Fig. 5. Its basic idea is to assume that all pixels within a coding unit of the video sequence share the same displacement, so that for each coding unit the most similar unit, the best matching unit, is found in the reference frame within a search range according to a matching criterion (usually the minimum mean absolute difference MAD or the minimum sum of absolute differences SAD). The relative displacement between the best matching unit and the current coding unit is called the forward motion vector, and the matching error between them is called the forward prediction difference. Unlike the motion search of conventional video coding, the reference frame chosen for the forward motion search is the next original (not reconstructed) coded frame of the current GOP in coding order. In Fig. 5, CU_n is a particular coding unit of frame n, CU_{n+1} is the best matching unit in frame n+1 obtained for CU_n by the forward motion search, and MV_n is the corresponding forward motion vector.
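The following is an illustrative full-search sketch of the forward motion search just described, using SAD as the matching criterion; the frame layout, the search range and all names are assumptions made for illustration, not the patent's implementation:

```python
import numpy as np

# Illustrative full-search block matching with SAD; the reference frame is the
# next *original* frame of the GOP, as described above. cur_frame/ref_frame
# are 2-D luma arrays, (y, x) is the top-left corner of the 16x16 coding unit,
# and search_range is in pixels.
def forward_motion_search(cur_frame, ref_frame, y, x, block=16, search_range=16):
    cur = cur_frame[y:y + block, x:x + block].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    h, w = ref_frame.shape
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                continue                      # candidate block outside the frame
            cand = ref_frame[yy:yy + block, xx:xx + block].astype(np.int32)
            sad = int(np.abs(cur - cand).sum())   # matching error
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad   # forward motion vector and forward prediction difference
```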

[0058] 4. Backward motion search, backward prediction difference

[0059] The backward motion search is similar to the forward motion search. The difference is that the reference frame chosen for the backward motion search is the previous original coded frame in coding order, as illustrated in Fig. 6, where CU_{n+1} is a particular coding unit of frame n+1, CU_n is the best matching unit in frame n obtained for it by the backward motion search, and MV_{n+1} is the corresponding backward motion vector. The matching error between CU_{n+1} and CU_n is called the backward prediction difference, and the matching criterion is the same as for the forward motion search.

Detailed Description

[0060] The invention is further described below with reference to the drawings and an embodiment.

[0061] For ease of description, and without loss of generality, the following assumptions are made about the conversational video sequence to be coded:

[0062] the coding unit size is assumed to be 16x16;

[0063] the coded picture resolution is assumed to be 352x288, so there are 22x18 coding units, numbered 1-396 in row order;

[0064] the total number of coded frames is assumed to be 100 and the GOP size to be 5;

[0065] each coded frame is assumed to undergo face detection with a suitable face detection method.
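For reading the unit-number lists that follow, a small illustrative sketch (not part of the patent) converts the 1-based, row-order macroblock numbering of the assumed 352x288 picture into (row, column) coordinates on its 22x18 macroblock grid:

```python
# Illustrative conversion between the 1-based, row-order macroblock numbering
# assumed above (352x288 picture -> 22 columns x 18 rows, units 1-396) and
# 0-based (row, col) grid coordinates.
COLS, ROWS = 352 // 16, 288 // 16   # 22, 18

def index_to_pos(idx):
    row, col = divmod(idx - 1, COLS)
    return row, col

def pos_to_index(row, col):
    return row * COLS + col + 1

# Example: unit 53 of the lists below sits at row 2, column 8,
# i.e. pixel position x = 8*16 = 128, y = 2*16 = 32.
assert index_to_pos(53) == (2, 8)
```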

[0066] Based on the above assumptions, this embodiment is described using the first GOP as an example.

[0067] A. Perform face ROI detection on all coded frames of the current GOP, thereby determining the exact positions of the face ROI coding units. Suppose that, after face detection, the index numbers of the face ROI coding units in each frame are:

[0068] Frame 1:

[0069] 53, 54, 55, 74, 75, 76, 77, 78, 96, 97, 98, 99, 100, 118, 119, 120, 121, 122, 141, 142, 143

[0070] Frame 2:

[0071] 52, 53, 54, 73, 74, 75, 76, 77, 95, 96, 97, 98, 99, 118, 119, 120, 121, 122, 140, 141, 142

[0072] Frame 3:

[0073] 53, 54, 55, 74, 75, 76, 77, 78, 96, 97, 98, 99, 100, 118, 119, 120, 121, 122, 140, 141, 142, 143, 144

[0074] Frame 4:

[0075] 53, 54, 55, 74, 75, 76, 77, 96, 97, 98, 99, 118, 119, 120, 121, 140, 141, 142, 143, 163, 164, 165

[0076] Frame 5:

[0077] 53, 54, 55, 74, 75, 76, 77, 96, 97, 98, 99, 118, 119, 120, 121, 140, 141, 142, 143

[0078] B. Starting from the first coding unit of frame 1, determine whether each coding unit of each frame belongs to the face ROI and apply the corresponding RDO method for optimized coding.

[0079] Coding units 1 to 52 of frame 1 are all non-face ROI coding units. At this point no Lagrange multiplier updated on the basis of the face ROI exists, so the invention codes them by default with the conventional RDO method based on the independence assumption; see step B.6 for details.

[0080] Coding unit 53 of frame 1 is a face ROI coding unit, but since the current frame is the first frame of the current GOP, the corresponding reference frame for the backward motion search is missing, so the distortion estimate based on the Laplacian distribution of the residual DCT coefficients cannot be computed. Therefore, the rate-distortion optimized coding of the face ROI coding units in frame 1 is still carried out with the conventional method based on the independence assumption.

[0081] The first 51 coding units of frame 2 are optimized with the conventional RDO method based on the independence assumption.

[0082] Coding unit 52 of frame 2 is a face ROI coding unit; its RDO method is as follows:

[0083] B.1 Construct the temporal diffusion chain of the face ROI coding unit. The steps are as follows:

[0084] (1) Perform a forward motion search for all coding units in the current GOP and save the corresponding forward motion vectors and forward prediction differences (this step is performed only once per GOP).

[0085] (2) From the forward motion vector of coding unit 52 of frame 2, derive its diffusion position in frame 3, obtaining the diffusion unit of coding unit 52 of frame 2 on frame 3. Obviously, this diffusion unit corresponds to the best matching unit of coding unit 52 of frame 2 in the forward motion search. Note that the diffusion unit obtained here must not fall outside the face ROI detected in frame 3; if it does, the diffusion unit is translated horizontally into the face ROI range, and if it still lies outside after the horizontal translation, it is translated vertically until it lies entirely within the face ROI.

[0086] (3) Take the forward motion vector of the actual coding unit in frame 3 in which the center of the diffusion unit lies as the forward motion vector of that diffusion unit, and derive the diffusion position of coding unit 52 of frame 2 on frame 4, obtaining its diffusion unit on frame 4. As in the previous step, it must again be checked whether the resulting diffusion unit falls outside the face ROI. From the proportions of the frame-3 diffusion unit over the actual coding units it overlaps and the forward prediction differences of those actual coding units, the forward prediction difference of that diffusion unit is obtained (see the sketch after step (4) below).

[0087] (4) Repeat the step above to derive the forward prediction differences of the diffusion units of coding unit 52 of frame 2 in the other coded frames of the current GOP, i.e. frames 4 and 5. Connecting coding unit 52 of frame 2 and all its diffusion units on the subsequent frames forms the temporal diffusion chain of the face ROI coding unit. The forward prediction differences are saved for the later steps.
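The proportional summing mentioned in step (3) can be illustrated as follows (the overlap computation and all names are assumptions made for illustration, not the patent's implementation):

```python
# Illustrative: forward prediction difference of a diffusion unit as the
# overlap-weighted sum of the forward prediction differences of the actual
# coding units it covers. Rectangles are (x, y, w, h); diffs maps each actual
# coding unit rectangle to its stored forward prediction difference.
def weighted_prediction_difference(diff_unit, actual_units, diffs):
    dx, dy, dw, dh = diff_unit
    area = float(dw * dh)
    total = 0.0
    for unit in actual_units:
        ux, uy, uw, uh = unit
        ox = max(0, min(dx + dw, ux + uw) - max(dx, ux))   # horizontal overlap
        oy = max(0, min(dy + dh, uy + uh) - max(dy, uy))   # vertical overlap
        total += (ox * oy) / area * diffs[unit]            # proportional share
    return total
```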

[0088] B.2 Compute the distortion estimates of coding unit 52 of frame 2 and of all diffusion units on its temporal diffusion chain. The procedure is: first, perform a backward motion search for coding unit 52 of frame 2 to obtain its best matching unit position in frame 1 and record the corresponding backward prediction difference. Next, compute the distortion estimate of coding unit 52 of frame 2 with Formulas 1 and 2 of the Summary. Finally, from the forward prediction difference of coding unit 52 of frame 2, compute the distortion estimate of the first diffusion unit on the temporal diffusion chain. Continue in the same way until the distortion estimates of all diffusion units on the chain have been obtained.

[0089] B.3 Compute the distortion diffusion coefficient of each diffusion unit on the temporal diffusion chain with respect to the influence of coding unit 52 of frame 2, and sum them to obtain the total distortion diffusion coefficient. The procedure is: first, from the distortion estimate of coding unit 52 of frame 2 and the distortion estimate and forward prediction difference of the first diffusion unit on the chain, compute the diffusion coefficient of the first diffusion unit with respect to the face ROI coding unit using Formula 3 of the Summary. Next, in the same way, compute the diffusion coefficient of the second diffusion unit on the chain with respect to the first diffusion unit; its product with the diffusion coefficient of the preceding diffusion unit is the diffusion coefficient of the second diffusion unit with respect to the face ROI coding unit. Third, compute, for each remaining diffusion unit on the chain, its distortion diffusion coefficient with respect to the preceding diffusion unit, and use the multiplicative relationship to obtain its distortion diffusion coefficient with respect to the face ROI coding unit, until the last diffusion unit is reached. Finally, sum the distortion diffusion coefficients, with respect to the face ROI coding unit, of the face ROI coding unit itself and of all diffusion units on the chain (the coefficient of the face ROI coding unit with respect to itself is 1), giving the total distortion diffusion coefficient of the face ROI coding unit.

[0090] B.4 Update the Lagrange multiplier of coding unit 52 of frame 2. The steps are as follows:

[0091] (1) Record the actual coding modes (SKIP, DIRECT, intra, inter, etc.), the motion-compensated prediction distortion values and the reconstruction distortion values of the face ROI coding units of the coded frames.

[0092] (2) If the current face ROI coding unit is the last face ROI coding unit of the whole GOP (in spatio-temporal order, front to back and top to bottom), compute the percentage of face ROI coding units coded in intra mode over all coded frames, the average motion-compensated prediction distortion value of the face ROI coding units and their average reconstruction distortion value. Otherwise, skip to (3).

[0093] (3) Update the Lagrange multiplier according to Formula 4 of the Summary.

[0094] B.5 Perform RDO coding on coding unit 52 of frame 2 with the updated Lagrange multiplier.

[0095] Coding units 53 and 54 of frame 2 are coded with the same RDO steps as coding unit 52 of frame 2.

[0096] Coding unit 55 of frame 2 is a non-face ROI coding unit; its RDO coding is carried out with the conventional method based on the independence assumption, described as follows:

[0097] B.6 Since face ROI coding units have already been coded before it and the Lagrange multiplier update step has been performed, the Lagrange multiplier used is replaced by η · λ_new of the most recent face ROI coding unit, i.e. coding unit 54 of frame 2, and the RDO method is otherwise the same as conventional RDO. For the coding units of frame 1, since no Lagrange multiplier update has occurred before them, the invention performs RDO directly with the conventional Lagrange multiplier (computed from the current quantization parameter value).

[0098] The other face ROI coding units and non-face ROI coding units in the current GOP are coded with the RDO methods described above for coding unit 52 of frame 2, for coding unit 1 of frame 1, or for coding unit 55 of frame 2, respectively.

Claims (1)

1. A conversational video coding method combining the temporal dependence of the face region with global rate-distortion optimization, which exploits the temporal dependence of the face region of interest (ROI) between adjacent coded frames within the same group of pictures (GOP) to estimate in advance the distortion of the face ROI and its propagation, providing an effective aid for selecting the best motion vectors and mode partitions, so that the subjective and objective quality of both the whole video sequence and the face ROI is improved simultaneously, its implementation comprising the following series of steps:
A. Before coding each GOP of the conversational video sequence, perform face ROI detection on all coded frames of the current GOP, thereby determining the exact positions of the face ROI coding units;
B. Depending on whether the current coding unit belongs to the face ROI, select a different rate-distortion optimization method, i.e. RDO, for optimized coding:
For face ROI coding units,
B.1 Construct the temporal diffusion chain of the face ROI coding unit, as follows:
(1) Perform a forward motion search for every coding unit in the current coded GOP of the conversational video sequence to obtain the best matching unit position of each coding unit in the next frame, and record the corresponding forward motion vector and forward prediction difference; this step is performed only once per GOP;
(2) Using the forward motion vector obtained in step (1), derive the diffusion position of the face ROI coding unit in the next coded frame of the current GOP; the unit at that diffusion position, of the same size as the face ROI coding unit, is called a face ROI diffusion unit; for clarity, the face ROI diffusion unit of this step is called face ROI diffusion unit No. 1; store the forward prediction difference of the face ROI coding unit and the position of face ROI diffusion unit No. 1;
(3) Take the forward motion vector of the actual coding unit in which the center of face ROI diffusion unit No. 1 lies as the forward motion vector of that face ROI diffusion unit, obtaining its diffusion position in the frame after next of the current GOP; the unit at that diffusion position, of the same size as the face ROI coding unit, is the face ROI diffusion unit of the face ROI coding unit in the frame after next of the current GOP, called face ROI diffusion unit No. 2; the diffusion unit obtained here must not fall outside the range of positions of the face ROI coding units of that coded frame obtained in step A; if it does, the diffusion unit is translated horizontally into the face ROI range to serve as face ROI diffusion unit No. 2, and if it still lies outside the face ROI range after the translation, it is translated vertically until it lies entirely within the face ROI; meanwhile, according to the proportions of face ROI diffusion unit No. 1 over the actual coding units it overlaps, obtained in step (2), the forward prediction differences of those actual coding units are summed in proportion as the forward prediction difference of face ROI diffusion unit No. 1; store the forward prediction difference of face ROI diffusion unit No. 1 and the position of face ROI diffusion unit No. 2;
(4) Repeat step (3) for the subsequent face ROI diffusion units until a face ROI diffusion unit lies in the last frame of the current GOP, then connect the face ROI coding unit and all its diffusion units in the subsequent frames to form the temporal diffusion chain of the face ROI coding unit, the forward prediction differences being saved for the later steps;
B.2 Compute the distortion estimates of the face ROI coding unit and of all diffusion units on its temporal diffusion chain; the distortion estimation method is as follows:
Formula 1: where D is the distortion estimate, the prediction-difference term is the forward prediction difference of the coding unit or diffusion unit immediately preceding the current unit on the temporal diffusion chain, and Q is the quantization step size; the function F(·) of Formula 1 is computed as follows,
Formula 2:
(Formula 2 is given as patent image CN102547293BC00031.)
B.3 Compute the distortion diffusion coefficient of every diffusion unit on the temporal diffusion chain of the face ROI coding unit with respect to the influence of the face ROI coding unit and sum them to obtain the total distortion diffusion coefficient; the experimentally derived distortion diffusion coefficient is computed as
Formula 3 (as given in the description): where β_t denotes the distortion diffusion coefficient of the current diffusion unit with respect to the preceding coding unit or diffusion unit on the temporal diffusion chain, D_t denotes the distortion estimate of the current diffusion unit, D_{t-1} denotes the distortion estimate of the preceding coding unit or diffusion unit, and the remaining term denotes the forward prediction difference of the current diffusion unit;
B.4 Update the Lagrange multiplier,
(1) Record the actual coding mode of each face ROI coding unit, including SKIP, DIRECT, intra and inter, together with its motion-compensated prediction distortion value and reconstruction distortion value; the motion-compensated prediction distortion value is the mean absolute difference between the face ROI coding unit and the coding unit matched to it by the motion search during video coding, and the reconstruction distortion value is the mean absolute difference between the face ROI coding unit and its reconstructed unit after video coding;
(2) In spatial order from front to back and top to bottom, if the current face ROI coding unit is the last face ROI coding unit of the current frame, compute the percentage of face ROI coding units coded in intra mode over all coded GOPs and the coded frames of the current GOP, the average motion-compensated prediction distortion value of the face ROI coding units and their average reconstruction distortion value; otherwise, skip to (3);
(3) Adjust the Lagrange multiplier; the corresponding adjustment formula is,
Formula 4:
(Formula 4 is given as patent image CN102547293BC00032.)
where λ_new is the adjusted Lagrange multiplier, λ_old is the Lagrange multiplier before adjustment, η is the total distortion diffusion coefficient obtained in step B.3, γ is the percentage of face ROI coding units coded in intra mode over all coded frames of the current GOP, the two averaged distortion terms are the average motion-compensated prediction distortion value and the average reconstruction distortion value of the face ROI coding units, and α is a constant whose selectable range is [0.88, 1.0);
B.5 Using the Lagrange multiplier updated in B.4, call the Lagrangian optimization method to perform RDO on the face ROI coding unit;
For non-face ROI coding units,
B.6 If a Lagrange multiplier from B.4 currently exists, perform the RDO coding of the non-face ROI coding unit with the product of that multiplier and the corresponding total distortion diffusion coefficient, i.e. η · λ_new, in place of the conventional RDO Lagrange multiplier; otherwise, perform the optimized coding of the non-face ROI coding unit with conventional RDO and the corresponding conventional Lagrange multiplier.
CN201210034708.XA 2012-02-16 2012-02-16 Method for coding session video by combining time domain dependence of face region and global rate distortion optimization CN102547293B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210034708.XA CN102547293B (en) 2012-02-16 2012-02-16 Method for coding session video by combining time domain dependence of face region and global rate distortion optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210034708.XA CN102547293B (en) 2012-02-16 2012-02-16 Method for coding session video by combining time domain dependence of face region and global rate distortion optimization

Publications (2)

Publication Number Publication Date
CN102547293A CN102547293A (en) 2012-07-04
CN102547293B true CN102547293B (en) 2015-01-28

Family

ID=46353091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210034708.XA CN102547293B (en) 2012-02-16 2012-02-16 Method for coding session video by combining time domain dependence of face region and global rate distortion optimization

Country Status (1)

Country Link
CN (1) CN102547293B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103297801A (en) * 2013-06-09 2013-09-11 浙江理工大学 No-reference video quality evaluation method aiming at video conference

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7574016B2 (en) * 2003-06-26 2009-08-11 Fotonation Vision Limited Digital image processing using face detection information

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004051981A3 (en) * 2002-11-29 2005-07-07 Clive Gillard Video camera
CN101146226A (en) * 2007-08-10 2008-03-19 中国传媒大学 A highly-clear video image quality evaluation method and device based on self-adapted ST area
CN101572810A (en) * 2008-04-29 2009-11-04 合肥坤安电子科技有限公司 Video encoding method based on interested regions
CN101945275A (en) * 2010-08-18 2011-01-12 镇江唐桥微电子有限公司 Video coding method based on region of interest (ROI)

Also Published As

Publication number Publication date
CN102547293A (en) 2012-07-04

Similar Documents

Publication Publication Date Title
CN102301710B (en) Multiple bit rate video encoding using variable bit rate and dynamic resolution for adaptive video streaming
CA2748374C (en) Video encoding using previously calculated motion information
CN101189882B (en) Method and apparatus for encoder assisted-frame rate up conversion (EA-FRUC) for video compression
US8582660B2 (en) Selective video frame rate upconversion
CN101919255B (en) Reference selection for video interpolation or extrapolation
CN101385356B (en) Process for coding images using intra prediction mode
US9071841B2 (en) Video transcoding with dynamically modifiable spatial resolution
CN100394447C (en) Improved optimizing technology for data compression
KR20070117660A (en) Content adaptive multimedia processing
KR20090003300A (en) Dynamic selection of motion estimation search ranges and extended motion vector ranges
JP2005236990A (en) Video codec system equipped with real-time complexity adaptation and region-of-interest coding
JP2007166617A (en) Method and device for intra prediction coding and decoding of image
CN101563925A (en) Decoder-side region of interest video processing
CN102067610A (en) Rate control model adaptation based on slice dependencies for video coding
CN1977539A (en) Method and apparatus for generating coded picture data and for decoding coded picture data
KR20050105271A (en) Video encoding
CN1875637A (en) Method and apparatus for minimizing number of reference pictures used for inter-coding
CN102724498A (en) Methods and device for data alignment with time domain boundary
CN1585495A (en) Quick selection of prediction modes in H.264/AVC frame
CN101406056B (en) Method of reducing computations in intra-prediction and mode decision processes in a digital video encoder
CN1719735A (en) Method or device for coding a sequence of source pictures
JP2006519565A (en) Video encoding
CN101640802B (en) Video inter-frame compression coding method based on macroblock features and statistical properties
CN101352046B (en) Image encoding/decoding method and apparatus
CN101710995B (en) Video coding system based on vision characteristic

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C14 Granted