CN101164342B - Region-of-interest coding in video telephony using rho domain bit allocation - Google Patents


Info

Publication number
CN101164342B
Authority
CN
China
Prior art keywords
roi
frame
video
coding
zone
Prior art date
Legal status
Expired - Fee Related
Application number
CN200680013595.8A
Other languages
Chinese (zh)
Other versions
CN101164342A (en)
Inventor
Haohong Wang (王浩宏)
Khaled Helmi El-Maleh (哈立德·希勒米·厄勒-马列)
Current Assignee
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date
Filing date
Publication date
Priority claimed from US11/199,935 (granted as US8768084B2)
Application filed by Qualcomm Inc
Publication of CN101164342A
Application granted
Publication of CN101164342B
Legal status: Expired - Fee Related (current)
Anticipated expiration


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The disclosure is directed to techniques for content-adaptive background skipping for region-of-interest (ROI) video coding. The techniques may be useful in video telephony (VT) applications such as video streaming and videoconferencing, and especially useful in low bit-rate wireless communication applications, such as mobile VT. The disclosed techniques analyze content information of a video frame to dynamically determine whether to skip a non-ROI area within the frame. For example, the skipping determination may be based on content activity, such as ROI shape deformation, ROI motion, non-ROI motion, non-ROI texture complexity, and accumulated distortion due to non-ROI skipping. The skip determination may operate in conjunction with either frame-level or macroblock-level bit allocation.

Description

Region-of-interest coding method and device in video telephony using ρ-domain bit allocation
This application claims priority to U.S. Provisional Application No. 60/658,008, filed March 1, 2005.
Technical field
This disclosure relates to digital video coding and, more particularly, to techniques for coding region-of-interest (ROI) information for video telephony (VT) applications.
Background
Many different video encoding standards have been established for encoding digital video sequences. For example, the Moving Picture Experts Group (MPEG) has developed a number of standards, including MPEG-1, MPEG-2, and MPEG-4. Other examples include the International Telecommunication Union (ITU) H.263 standard and the emerging H.264 standard. These video encoding standards generally improve the transmission efficiency of video sequences by encoding data in a compressed form.
Video telephony (VT) allows users to share video and audio information to support applications such as videoconferencing. Exemplary video telephony standards include those defined by the Session Initiation Protocol (SIP), the ITU H.323 standard, and the ITU H.324 standard. In a VT system, a user may send and receive video information, only receive video information, or only send video information. A recipient generally views received video information in the form in which it was transmitted by the sender.
Preferential encoding of selected portions of video information has been proposed. For example, a sender may specify a region of interest (ROI) to be encoded with higher quality for transmission to a recipient, because the sender wishes to emphasize the ROI to the remote recipient. A typical example of an ROI is a human face, although a sender may instead wish to focus attention on other objects within a video scene. With preferential encoding of the ROI relative to non-ROI areas, the recipient can view the ROI more clearly.
Summary of the invention
This disclosure is directed to techniques for region-of-interest (ROI) coding for video telephony (VT). The disclosed techniques include a technique for adaptively skipping the non-ROI area of a video frame to conserve encoding bits for allocation to the ROI. The disclosed techniques also include a technique for allocating bits to the ROI using a weighted bit allocation model at the macroblock (MB) level within the ρ domain. In addition, the disclosed techniques include a technique for generating a quality metric for ROI video that jointly considers the user's degree of interest in the ROI, ROI video fidelity, and ROI perceptual quality when evaluating the quality of an encoded video sequence.
The non-ROI skipping technique serves to enhance the image quality of the ROI without significantly degrading the image quality of the non-ROI area. In particular, non-ROI skipping conserves non-ROI bits so that additional bits can be allocated to the ROI. The quality metric may be applied to bias the bit allocation technique toward better subjective image quality in the encoded video scene. Bit allocation in the ρ domain can provide more accurate and consistent control of ROI quantization, enhancing visual quality. Non-ROI skipping, ρ-domain bit allocation, and the quality metric may be used jointly or separately to achieve effective control of ROI and non-ROI encoding.
In one embodiment, this disclosure provides a method comprising generating a quality metric for an encoded video frame containing a region of interest based on the video fidelity of a preceding frame, the perceptual quality of the preceding frame, and a user preference for the region of interest.
In another embodiment, this disclosure provides a device comprising: a video encoder that encodes a video frame containing a region of interest; and a quality metric calculator that generates a quality metric for the video frame based on the video fidelity of a preceding frame, the perceptual quality of the preceding frame, and a user preference for the region of interest.
In another embodiment, this disclosure provides a method comprising: obtaining a definition of a region of interest within a video frame; obtaining a frame budget defining the number of encoding bits available for the frame; and allocating ρ-domain values to macroblocks within the frame based on the frame budget and a weighting between macroblocks inside the region of interest and macroblocks in areas of the video frame outside the region of interest.
In an additional embodiment, this disclosure provides a device comprising: a region-of-interest mapper that generates a definition of a region of interest within a video frame; a frame-level rate controller that generates a frame budget defining the number of encoding bits available for the frame; and a bit allocation module that allocates ρ-domain values to macroblocks within the frame based on the frame budget and a weighting between macroblocks inside the region of interest and macroblocks in areas of the video frame outside the region of interest.
In another embodiment, this disclosure provides a method comprising: grouping successive frames into a frame unit; encoding the region of interest in each frame of the frame unit; and skipping encoding of the area outside the respective region of interest for at least one of the frames in the frame unit.
In another embodiment, this disclosure provides a device comprising: a region-of-interest mapper that generates a definition of a region of interest within video frames; a video encoder that encodes the video frames; and a skipping module that groups successive frames into a frame unit, directs the video encoder to encode the region of interest in each frame of the frame unit, and directs the video encoder to skip encoding of the area outside the respective region of interest for at least one of the frames in the frame unit.
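The frame-unit grouping performed by the skipping module can be illustrated with a small scheduling sketch. It assumes, purely for illustration, a unit size of two frames and that the non-ROI area is encoded only in the first frame of each unit and skipped in the others; the function name `skip_plan` and its dictionary keys are hypothetical, and the content-adaptive skip decision described in the abstract is not modeled.

```python
from typing import Dict, List


def skip_plan(num_frames: int, unit_size: int = 2) -> List[Dict[str, object]]:
    """Build a per-frame encode/skip schedule (hypothetical sketch).

    Successive frames are grouped into units of `unit_size` frames. The ROI
    is encoded in every frame; the non-ROI (background) area is encoded only
    in the first frame of each unit and skipped in the remaining frames.
    """
    if unit_size < 1:
        raise ValueError("unit_size must be at least 1")
    plan: List[Dict[str, object]] = []
    for n in range(num_frames):
        plan.append({
            "frame": n,
            "unit": n // unit_size,                # index of the frame unit
            "encode_roi": True,                    # ROI is never skipped
            "encode_non_roi": n % unit_size == 0,  # background only in first frame
        })
    return plan
```

For example, `skip_plan(4)` places frames 0 and 1 in unit 0 and frames 2 and 3 in unit 1, and encodes the background only in frames 0 and 2. A unit size of 1 disables skipping entirely.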
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the techniques may be realized in part by a computer-readable medium comprising program code containing instructions that, when executed, perform one or more of the methods described herein.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Brief description of the drawings
Fig. 1 is a block diagram illustrating a video encoding and decoding system incorporating ROI-enabled video encoder-decoders (CODECs).
Fig. 2 is a diagram illustrating definition of an ROI within a video scene presented on a display associated with a wireless communication device.
Figs. 3A and 3B are diagrams illustrating the ROI and non-ROI areas of the video scene depicted in Fig. 2.
Fig. 4 is a block diagram illustrating a video communication device incorporating an ROI-enabled encoder with a non-ROI skipping module, an ROI ρ-domain bit allocation module, and an ROI weights calculator.
Fig. 5 is a block diagram illustrating an ROI quality metric calculator.
Fig. 6 is a diagram further illustrating a wireless communication device incorporating an ROI user preference input device for ROI quality metric calculation.
Fig. 7 is a block diagram illustrating use of an ROI quality metric calculator to analyze a video sequence in order to optimize coding parameters applied by a video encoder.
Fig. 8 is a block diagram illustrating use of an ROI quality metric calculator to analyze encoded video in order to adjust coding parameters applied by a video encoder.
Fig. 9 is a flow diagram illustrating ROI quality metric calculation for encoded video.
Fig. 10 is a flow diagram illustrating ROI quality metric calculation for a video sequence.
Fig. 11 is a flow diagram illustrating ROI ρ-domain bit allocation.
Fig. 12 is a graph comparing the overall perceptual quality of an encoding technique using a weighted bit allocation model with that of an optimal solution.
Fig. 13 is a flow diagram illustrating a non-ROI skipping technique.
Fig. 14 is a diagram illustrating grouping of successive frames into frame units to support non-ROI skipping.
Fig. 15 is a diagram illustrating encoding of successive ROI areas with a common non-ROI area to support non-ROI skipping.
Fig. 16 is a graph comparing the overall perceptual quality of ROI encoding techniques using normal bit allocation, weighted bit allocation, and background skipping, with user preference factor α = 0.9.
Fig. 17 is a graph comparing the overall video fidelity of ROI encoding techniques using normal bit allocation, weighted bit allocation, and background skipping, with user preference factor α = 0.9.
Fig. 18 is a graph comparing the ROI video fidelity of ROI encoding techniques using normal bit allocation, weighted bit allocation, and background skipping, with user preference factor α = 0.9.
Fig. 19 is a graph comparing the non-ROI video fidelity of ROI encoding techniques using normal bit allocation, weighted bit allocation, and background skipping, with user preference factor α = 0.9.
Fig. 20 is a graph comparing the overall perceptual quality of ROI encoding techniques using normal bit allocation, weighted bit allocation, and background skipping, with user preference factor α = 0.7.
Fig. 21 is a graph comparing the overall video fidelity of ROI encoding techniques using normal bit allocation, weighted bit allocation, and background skipping, with user preference factor α = 0.7.
Fig. 22 is a graph comparing the overall perceptual quality of ROI encoding techniques using normal bit allocation, weighted bit allocation, and background skipping, with user preference factor α = 0.5.
Fig. 23 is a graph comparing the overall video fidelity of ROI encoding techniques using normal bit allocation, weighted bit allocation, and background skipping, with user preference factor α = 0.5.
Fig. 24 is a graph comparing the perceptual quality of ROI encoding techniques using standard frame skipping and non-ROI skipping at various values of the user preference factor.
Fig. 25 is a graph comparing the perceptual quality of an ROI encoding technique with non-ROI skipping turned on and turned off.
Fig. 26 is a graph illustrating the distortion caused by non-ROI skipping over an exemplary video sequence.
Fig. 27 is a graph comparing the overall perceptual quality of ROI encoding techniques with non-ROI skipping, without non-ROI skipping, and with adaptive non-ROI skipping.
Fig. 28 is a graph comparing the overall perceptual quality of ROI encoding techniques using various bit allocation techniques for an exemplary video sequence over a range of encoding rates.
Fig. 29 is a graph comparing the overall perceptual quality of ROI encoding techniques using various bit allocation techniques at an encoding rate of 40 kilobits per second (kbps).
Fig. 30 is a graph comparing the overall video fidelity of ROI encoding techniques using various bit allocation techniques at an encoding rate of 40 kbps.
Fig. 31 is a graph comparing the ROI video fidelity of ROI encoding techniques using various bit allocation techniques at an encoding rate of 40 kbps.
Fig. 32 is a graph comparing the non-ROI video fidelity of ROI encoding techniques using various bit allocation techniques at an encoding rate of 40 kbps.
Fig. 33 is a graph comparing the overall perceptual quality of ROI encoding techniques using various bit allocation techniques for another exemplary video sequence over a range of encoding rates.
Detailed description
Fig. 1 is a block diagram illustrating a video encoding and decoding system 10 incorporating ROI-enabled video CODECs. As shown in Fig. 1, system 10 includes a first video communication device 12 and a second video communication device 14. Communication devices 12, 14 are connected by a transmission channel 16, which may be a wired or wireless communication medium. System 10 supports two-way video transmission between video communication devices 12, 14 for video telephony, and devices 12, 14 may operate in a substantially symmetrical manner. In some embodiments, however, one or both of video communication devices 12, 14 may be configured for one-way communication only, to support ROI-enabled video streaming.
One or both of video communication devices 12, 14 may be configured to apply ROI coding techniques for video telephony (VT), as described herein. The ROI coding techniques include: adaptively skipping the non-ROI area to conserve encoding bits for allocation to the ROI; allocating bits to the ROI using a weighted bit allocation model at the video block level, e.g., the macroblock (MB) level, within the ρ domain; and generating an ROI video quality metric that jointly considers the user's degree of interest in the ROI, ROI video fidelity, and ROI perceptual quality when evaluating the quality of an encoded video sequence. The ρ (rho) parameter represents the number of non-zero AC coefficients in a video block, e.g., an MB. Rate control in the ρ domain tends to be more accurate than rate control in the QP domain. Non-ROI skipping, ρ-domain bit allocation, and the quality metric may be used jointly or separately to achieve effective control of ROI and non-ROI encoding.
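To make the ρ parameter concrete, the sketch below counts non-zero AC coefficients in quantized transform blocks. It assumes each block is given as a small 2-D array of quantized levels (for instance 4 × 4 or 8 × 8) with the DC term at the top-left, and it takes an MB's ρ as the sum over its blocks; the function names are illustrative, not taken from the disclosure.

```python
from typing import Iterable, Sequence

Block = Sequence[Sequence[int]]


def rho_of_block(qcoeffs: Block) -> int:
    """Count non-zero AC coefficients in one quantized transform block.

    Assumes the DC coefficient sits at row 0, column 0 and excludes it.
    """
    count = 0
    for r, row in enumerate(qcoeffs):
        for c, level in enumerate(row):
            if (r, c) != (0, 0) and level != 0:
                count += 1
    return count


def rho_of_macroblock(blocks: Iterable[Block]) -> int:
    """rho of a macroblock, taken here as the sum over its transform blocks."""
    return sum(rho_of_block(b) for b in blocks)
```

A widely cited observation in the ρ-domain rate-control literature (background, not a statement of this section) is that coded bits grow roughly linearly with ρ, which is one reason a ρ target can be hit more predictably than a QP setting.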
A macroblock is a video block that forms part of a frame. An MB may be 16 × 16 pixels in size, although other MB sizes are possible. Macroblocks are described herein for purposes of illustration, with the understanding that macroblocks or other video blocks may have a variety of different sizes.
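As a worked illustration of the MB layout, the helper below computes how many 16 × 16 macroblocks tile a frame, rounding partial blocks at the right and bottom edges up to whole MBs. The helper name is hypothetical, and the frame sizes in the usage note are common low-bit-rate formats chosen as examples; this section does not name a resolution.

```python
import math
from typing import Tuple


def mb_grid(width: int, height: int, mb_size: int = 16) -> Tuple[int, int, int]:
    """Return (columns, rows, total) of macroblocks covering a frame.

    Partial blocks at the right/bottom edges are rounded up to whole MBs.
    """
    cols = math.ceil(width / mb_size)
    rows = math.ceil(height / mb_size)
    return cols, rows, cols * rows
```

A QCIF frame (176 × 144) yields an 11 × 9 grid of 99 MBs, and a CIF frame (352 × 288) yields 22 × 18 = 396 MBs.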
For two-way applications, reciprocal encoding, decoding, multiplexing (MUX), and demultiplexing (DEMUX) components may be provided on opposite ends of channel 16. In the example of Fig. 1, video communication device 12 includes MUX/DEMUX component 18, ROI-enabled video CODEC 20, and audio CODEC 22. Similarly, video communication device 14 includes MUX/DEMUX component 26, ROI-enabled video CODEC 28, and audio CODEC 30.
System 10 may support video telephony according to the Session Initiation Protocol (SIP), the ITU H.323 standard, the ITU H.324 standard, or other standards. Each video CODEC 20, 28 generates encoded video data according to a video compression standard such as MPEG-2, MPEG-4, ITU H.263, or ITU H.264. As further shown in Fig. 1, video CODECs 20, 28 may be integrated with respective audio CODECs 22, 30, and include appropriate MUX/DEMUX components 18, 26 to handle the audio and video portions of a data stream. The audio portion may carry voice or other audio content. MUX/DEMUX units 18, 26 may conform to the ITU H.223 multiplexer protocol or to other protocols such as the User Datagram Protocol (UDP).
Each ROI-enabled video CODEC 20, 28 may be capable of processing ROI information provided locally by the local user of the respective video communication device 12, 14, or ROI information provided remotely by the remote user of the other video communication device 12, 14. For example, the local user of video communication device 12 may specify an ROI in "near-end" video generated locally by video communication device 12 to emphasize a region of the transmitted video to the remote user of device 14. Conversely, the local user of video communication device 12 may specify an ROI in "far-end" video generated remotely by video communication device 14, and communicate that ROI to the remote video communication device. In this case, the user of video communication device 12 remotely controls preferential encoding of the ROI by video communication device 14, e.g., to view the ROI in the video received from video communication device 14 more clearly.
Video communication devices 12, 14 may be implemented as wireless mobile terminals or wired terminals equipped for video streaming, video telephony, or both. To that end, video communication devices 12, 14 may further include appropriate wireless transmitters, receivers, modems, and processing electronics to support wireless communication. Examples of wireless mobile terminals include mobile radio telephones, mobile personal digital assistants (PDAs), mobile computers, and other mobile devices equipped with wireless communication capability and video encoding and/or decoding capability. Examples of wired terminals include desktop computers, video telephones, network appliances, set-top boxes, interactive televisions, and the like. Either video communication device 12, 14 may be configured to send video information, receive video information, or both.
For video telephony applications, device 12 generally needs to support both video send and video receive capabilities. However, streaming video applications are also contemplated. In video telephony, and especially mobile video telephony over wireless links, bandwidth is a significant concern because extremely low bit rates are often required. In particular, communication channel 16 may have limited bandwidth, making efficient real-time transmission of high-quality video sequences over channel 16 very challenging. For example, communication channel 16 may be a wireless communication link whose bandwidth is limited by the physical constraints of channel 16, or by quality-of-service (QoS) limitations or bandwidth allocation constraints imposed by the provider of communication channel 16.
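A quick calculation shows how tight such a channel is. Assuming, for illustration only, a 40 kbps target (the rate used in Figs. 29 to 32), 15 frames per second, and a QCIF frame of 99 macroblocks, the average budget works out to a few thousand bits per frame and a few dozen bits per MB. The frame rate and resolution are assumptions, not values stated in this section, and the helper names are hypothetical.

```python
def bits_per_frame(bitrate_bps: float, fps: float) -> float:
    """Average bit budget per frame for a constant target bit rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return bitrate_bps / fps


def bits_per_macroblock(bitrate_bps: float, fps: float, num_mbs: int) -> float:
    """Average bits per macroblock if the frame budget were spread evenly."""
    return bits_per_frame(bitrate_bps, fps) / num_mbs
```

Here `bits_per_frame(40000, 15)` is about 2666.7 bits and `bits_per_macroblock(40000, 15, 99)` about 26.9 bits. With so few bits per MB, moving bits from the background to the ROI has a visible effect on ROI quality.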
Accordingly, selectively allocating additional encoding bits to the ROI, applying stronger error protection, or taking other preferential encoding steps can improve the image quality of a portion of the video while maintaining overall coding efficiency. For preferential encoding, additional bits may be allocated to the ROI while a reduced number of bits is allocated to non-ROI regions, such as the background of a video scene. Non-ROI areas will be referred to as "background" areas, although a non-ROI area more generally encompasses any area of the video scene that does not form part of the ROI. Accordingly, the terms non-ROI and background are used interchangeably in this disclosure to refer to areas outside the specified ROI.
In general, system 10 employs region-of-interest (ROI) processing techniques for video telephony (VT) applications. However, such techniques also may be applicable to video streaming applications, as mentioned above. For purposes of illustration, it is assumed that each video communication device 12, 14 can operate as both a sender and a recipient of video information, and thus as a full participant in a VT session. For video information transmitted from video communication device 12 to video communication device 14, video communication device 12 is the sender device and video communication device 14 is the recipient device. Conversely, for video information transmitted from video communication device 14 to video communication device 12, video communication device 12 is the recipient device and video communication device 14 is the sender device. The techniques described herein also may be applicable to devices that only send or only receive such video. Video information to be encoded and transmitted by a local video communication device 12, 14 may be referred to as "near-end" video, as mentioned above. Video information to be encoded by, and received from, a remote video communication device 12, 14 may be referred to as "far-end" video.
In accordance with the disclosed techniques, when operating as a recipient device, video communication device 12 or 14 defines ROI information for far-end video information received from a sender device. Again, video information received from a sender device is referred to as "far-end" video information because it is received from another (sender) device situated at the far end of the communication channel. Likewise, ROI information defined for video information received from a sender device is referred to as "far-end" ROI information. The far-end ROI generally refers to the region of the far-end video that is of most interest to the recipient of that video. The recipient device decodes the far-end video information and presents the decoded far-end video to the user via a display device. The user selects an ROI within the video scene presented by the far-end video. Alternatively, the ROI may be defined automatically.
The recipient device generates far-end ROI information based on the ROI selected by the user at the recipient device, and sends the far-end ROI information to the sender device so that the sender device can use it. The far-end ROI information may take the form of an ROI macroblock (MB) map that defines the ROI in terms of the MBs residing within it. The ROI MB map may flag MBs inside the ROI with a 1 and MBs outside the ROI with a 0, so that MBs included in (1) and excluded from (0) the ROI are readily identified.
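An ROI MB map of the kind just described could be derived from a rectangular ROI given in pixel coordinates, as sketched below. The rule that an MB is flagged 1 whenever it overlaps the rectangle is an assumption made for illustration (a stricter rule could require full containment), and `roi_mb_map` is a hypothetical name.

```python
from typing import List, Tuple


def roi_mb_map(width: int, height: int,
               roi: Tuple[int, int, int, int],
               mb_size: int = 16) -> List[List[int]]:
    """Flag MBs overlapping a rectangular ROI with 1, all others with 0.

    `roi` is (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    mb_map: List[List[int]] = []
    for r in range(rows):
        top, bottom = r * mb_size, (r + 1) * mb_size
        row: List[int] = []
        for c in range(cols):
            left, right = c * mb_size, (c + 1) * mb_size
            overlaps = left < x1 and right > x0 and top < y1 and bottom > y0
            row.append(1 if overlaps else 0)
        mb_map.append(row)
    return mb_map
```

For a 64 × 48 frame and an ROI spanning pixels (16, 16) to (40, 32), only the two MBs in the middle row at columns 1 and 2 are flagged.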
Using the far-end ROI information transmitted by the recipient device, the sender device applies preferential encoding to the corresponding ROI within the video scene. In particular, additional encoding bits may be allocated to the ROI while a reduced number of encoding bits is allocated to non-ROI areas, thereby improving the image quality of the ROI. In this manner, the recipient device can remotely control the sender device's ROI encoding of far-end video information.
Preferential encoding applies higher-quality encoding to the ROI area than to the non-ROI areas of the video scene, e.g., by preferential bit allocation or preferential quantization in the ROI area. The preferentially encoded ROI allows the user of the recipient device to view an object or region more clearly. For example, the user of the recipient device may wish to view a face or some other object more clearly than the background regions of the video scene.
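Preferential quantization can be pictured as giving ROI macroblocks a lower QP (finer quantization) than background MBs. The sketch below does this with fixed offsets applied to an ROI MB map. The offset values and function name are arbitrary assumptions, the 0 to 51 clamp matches the H.264 QP range, and this direct QP approach is a simpler alternative to, not a description of, the ρ-domain allocation used in this disclosure.

```python
from typing import List

QP_MIN, QP_MAX = 0, 51  # H.264 quantization parameter range


def preferential_qp_map(mb_map: List[List[int]], base_qp: int,
                        roi_offset: int = -4,
                        non_roi_offset: int = 2) -> List[List[int]]:
    """Per-MB QP: finer quantization in the ROI (1), coarser outside it (0).

    Offsets are illustrative assumptions, not values from the disclosure.
    """
    def clamp(qp: int) -> int:
        return max(QP_MIN, min(QP_MAX, qp))

    return [[clamp(base_qp + (roi_offset if flag else non_roi_offset))
             for flag in row]
            for row in mb_map]
```

With a base QP of 30, ROI MBs would be coded at QP 26 and background MBs at QP 32.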
When operating as a sender device, video communication device 12 or 14 also may define ROI information for video information transmitted by the sender device. Again, video information generated in the sender device is referred to as "near-end" video because it is generated at the near end of the communication channel. ROI information generated by the sender device is referred to as "near-end" ROI information.
The near-end ROI generally refers to the region of the near-end video that the sender wishes to emphasize to the recipient. Hence, an ROI may be specified by the recipient device user as far-end ROI information, or by the sender device user as near-end ROI information. The sender device presents the near-end video to the user via a display device. The user associated with the sender device selects an ROI within the video scene presented by the near-end video. The sender device encodes the near-end video using the user-selected ROI, so that the ROI in the near-end video is preferentially encoded, e.g., with higher-quality encoding, relative to non-ROI areas.
The near-end ROI selected or defined by the local user at the sender device allows that user to emphasize regions or objects in the video scene and thereby draw them to the attention of the recipient device user. Notably, the near-end ROI selected by the sender device user need not be transmitted to the recipient device. Instead, the sender device uses the selected near-end ROI information to encode the near-end video locally before it is transmitted to the recipient device. In some embodiments, however, the sender device may send ROI information to the recipient device to permit application of preferential decoding techniques, such as higher-quality error correction or post-processing.
If ROI information is provided by both the sender device and the recipient device, the sender device encodes the near-end video using either the far-end ROI information received from the recipient device or the locally generated near-end ROI information. Conflicts may arise between the near-end and far-end ROI selections provided by the sender device and the recipient device. Such conflicts may need to be resolved, e.g., actively by the local user or according to specified access rights and levels. In either case, the sender device preferentially encodes the ROI based on near-end ROI information provided locally by the sender device or remotely by the recipient device.
Given an ROI specified by either a local user or a remote user, this disclosure focuses generally on ROI encoding techniques. In particular, this disclosure addresses how the ROI is preferentially encoded in terms of bit allocation between the ROI and non-ROI areas of the video scene. An ROI video quality metric may be applied to bias a weighted bit allocation between ROI and non-ROI areas. The video quality metric takes into account the user's degree of preference for (i.e., interest in) the ROI, ROI video fidelity, and ROI perceptual quality when evaluating the quality of an encoded video sequence. The weighted bit allocation is applied within the ρ domain. In addition, a non-ROI, or "background," skipping algorithm may be applied to conserve encoding bits for allocation to the ROI.
Fig. 2 is a diagram illustrating definition of an ROI within a video scene 32 presented on a display 34 associated with a wireless communication device 36. In the example of Fig. 2, the ROI is depicted as either a rectangular ROI 38 or a non-rectangular ROI 40. Non-rectangular ROI 40 may have a rounded or irregular shape. In each case, ROI 38 or ROI 40 contains the face 42 of a person presented in video scene 32. Figs. 3A and 3B are diagrams illustrating ROI 38 and non-ROI area 43 of video scene 32 depicted in Fig. 2. Non-ROI area 43, i.e., the background, is highlighted by shading in Fig. 3B.
ROI 38 or 40 may be defined manually by the user, automatically by device 36, or by a combination of manual ROI description by the user and automatic ROI definition by device 36. Rectangular ROI 38 may be selected by the user. Non-rectangular ROI 40 may be drawn by the user, e.g., using a stylus and a touchscreen, or selected automatically by device 36 using any of a variety of object detection or segmentation techniques. For VT applications, ROI 38 or 40 may encompass the portion of video scene 32 containing the face 42 of a participant in a videoconference. The size, shape, and position of ROI 38 or 40 may be fixed or adjustable, and may be defined, described, or adjusted in a variety of ways.
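A non-rectangular ROI such as ROI 40 still has to be expressed on the MB grid before encoding. One simple approach, assumed here for illustration, approximates the ROI by an ellipse and flags an MB when its center falls inside it. The ellipse parameters, the center-sample rule, and the function name are hypothetical choices; a mask produced by an actual segmentation technique could be substituted.

```python
from typing import List


def ellipse_roi_mb_map(width: int, height: int,
                       cx: float, cy: float, rx: float, ry: float,
                       mb_size: int = 16) -> List[List[int]]:
    """Flag MBs whose center lies inside the ellipse centered at (cx, cy)."""
    if rx <= 0 or ry <= 0:
        raise ValueError("ellipse radii must be positive")
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    mb_map: List[List[int]] = []
    for r in range(rows):
        yc = r * mb_size + mb_size / 2      # MB center, vertical
        row: List[int] = []
        for c in range(cols):
            xc = c * mb_size + mb_size / 2  # MB center, horizontal
            inside = ((xc - cx) / rx) ** 2 + ((yc - cy) / ry) ** 2 <= 1.0
            row.append(1 if inside else 0)
        mb_map.append(row)
    return mb_map
```

For a 64 × 64 frame with a circle of radius 16 centered at (32, 32), the four central MBs are flagged and the surrounding ring of MBs is not.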
Individual objects in the video scene 32 that ROI 38 or 40 permission video senders emphasize to be transmitted, for example people's face 42.On the contrary, ROI 38 or 40 permission video reception persons more clearly watch the required object in the video scene 32 that is received.In either case, the face 42 in the ROI 38 or 40 all encodes with higher image quality with respect to the non-ROI zone (for example, background area) of video scene 32.In this way, the user can more clearly watch facial expression, lip activity, eye activity etc.
ROI 38 or 40 may, however, be used to designate objects other than a face. In general, the ROI in VT applications can be very subjective and may differ from user to user. The desired ROI also depends on how VT is being used. In some cases, VT may be used to view and evaluate objects, as distinguished from videoconferencing. For example, a user may wish to focus on a region of a whiteboard containing equations or drawings rather than on the speaker's face, particularly when the speaker is facing away from the camera and toward the whiteboard. In some cases, a video scene may include two or more ROIs designated for priority encoding.
Fig. 4 is a block diagram illustrating an ROI-enabled video encoding system 44 for use in video communication device 12. As shown in Fig. 4, system 44 includes ROI weights calculator 46, ROI ρ-domain bit allocation module 48, non-ROI (i.e., background) skipping module 50, ROI macroblock (MB) mapper 52, frame-level rate controller 54, ρ-to-quantization parameter (QP) mapper 56, video encoder 58, and distortion analyzer 60. MUX-DEMUX and audio components are omitted from Fig. 4 for ease of illustration.
The components depicted in Fig. 4 may be formed in a variety of ways, either as discrete functional modules or as a monolithic module that encompasses the functionality ascribed to each module. In either case, the components of video encoding system 44 may be realized in hardware, software, firmware, or a combination thereof. For example, such components may operate as software processes executing on one or more microprocessors or digital signal processors (DSPs), one or more application-specific integrated circuits (ASICs), one or more field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
In the example of Fig. 4, ROI weights calculator 46 receives a user preference factor α entered by the local user of video communication device 12 or by the remote user of video communication device 14. User preference α is a perceptual importance factor for the ROI, expressing how important the visual quality of the ROI is from the actual user's point of view; it quantifies the degree to which the user values visual quality within the ROI. If the user strongly values ROI visual quality, α will be higher. If the visual quality of the ROI is less important, α will be lower. Based on preference α, ROI weights calculator 46 generates a set of weights w_i that are applied in ROI ρ-domain bit allocation module 48 to bias the weighted bit allocation between the non-ROI and ROI areas of the video frame being encoded by video encoder 58. A weight w_i may be designated for each video block, e.g., each macroblock (MB), in the video frame. ROI weights calculator 46 receives an ROI MB map from ROI MB mapper 52 and assigns respective weights w_i to the ROI and non-ROI MBs identified by ROI MB mapper 52. Macroblocks with higher weights w_i receive a larger number of coding bits.
ρ-domain bit allocation module 48 receives the weights w_i from ROI weights calculator 46, a skip indication (SKIP ON/OFF) from non-ROI background skipping module 50, the ROI MB map from ROI MB mapper 52, a rate budget R_BUDGET from frame-level rate controller 54, and the standard deviation σ of the encoded MBs from video encoder 58. The standard deviation may be that of the actual residue obtained after motion estimation, and may be taken from stored residue statistics of a previous frame. The ROI MB map provided by ROI MB mapper 52 specifies which MBs in a given video frame fall within the ROI. Using the ROI MB map, ρ-domain bit allocation module 48 distinguishes ROI MBs from non-ROI MBs in order to give the ROI MBs preferential bit allocation, i.e., using the weights w_i provided by ROI weights calculator 46. Bit allocation module 48 generates a ρ parameter for each MB. The ρ parameter represents the number of non-zero AC coefficients in an MB. Rate control in the ρ domain tends to be more accurate than rate control in the QP domain.
For purposes of this disclosure, it is assumed that a suitable process for generating the ROI MB map is available. For example, the ROI mapping process may be based on manual input from a user who defines the ROI, or on automatic definition or detection of the ROI, e.g., using conventional techniques such as face detection, face segmentation, and target tracking with acceptable accuracy. For purposes of illustration, this disclosure considers head or head-and-shoulders video sequences, but the techniques described herein may be applied to other types of video sequences containing a variety of objects in addition to, or instead of, people.
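As an illustration of what an ROI MB map might look like, the following Python sketch converts a rectangular pixel-domain ROI into a per-macroblock boolean map. The 16×16 macroblock size matches common H.263/MPEG-4 practice, but the rule that an MB belongs to the ROI if it overlaps the rectangle at all, and the function name `roi_mb_map`, are assumptions for illustration rather than the mapping used by ROI MB mapper 52.

```python
def roi_mb_map(width, height, roi, mb_size=16):
    """Return a rows x cols list of booleans marking MBs that overlap the ROI.

    roi is a hypothetical (x0, y0, x1, y1) pixel rectangle with exclusive x1/y1.
    Assumption: any overlap with the rectangle puts the MB in the ROI.
    """
    x0, y0, x1, y1 = roi
    cols = (width + mb_size - 1) // mb_size      # MBs per row (rounded up)
    rows = (height + mb_size - 1) // mb_size     # MB rows (rounded up)
    c0, c1 = x0 // mb_size, (x1 - 1) // mb_size  # first/last MB column touched
    r0, r1 = y0 // mb_size, (y1 - 1) // mb_size  # first/last MB row touched
    return [[r0 <= r <= r1 and c0 <= c <= c1 for c in range(cols)]
            for r in range(rows)]


# QCIF frame (176x144) with a face-sized ROI
mb_map = roi_mb_map(176, 144, (40, 30, 90, 100))
print(sum(sum(row) for row in mb_map))  # 24 ROI MBs out of 99
```

A face that is not axis-aligned would need a finer rule (e.g., a minimum pixel-coverage fraction), which this sketch does not attempt.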
Frame-level rate controller 54 generates a bit allocation for each frame in the video sequence. In particular, frame-level rate controller 54 generates a value R_BUDGET that indicates the number of bits available for encoding all of the MBs (i.e., both ROI and non-ROI MBs) in the current frame. As further shown in Fig. 4, ρ-domain bit allocation module 48 receives a skip indication (SKIP ON/OFF) from non-ROI background skipping module 50, which indicates whether the background in the current frame will be encoded or skipped. If the background will be skipped, ρ-domain bit allocation module 48 can, in effect, reclaim the bits that would otherwise have been allocated to the non-ROI and return them to the pool of bits available for encoding the ROI. Hence, if skipping is ON in a particular frame, ρ-domain bit allocation module 48 has more bits within R_BUDGET to allocate to the ROI. If the background is skipped in a particular frame, the background from a previously encoded frame may be substituted in its place. Alternatively, the skipped background may be produced by interpolation.
Using the weights w_i, the ROI MB map, R_BUDGET, the SKIP ON/OFF indication, and the standard deviation, ρ-domain bit allocation module 48 generates a ρ-domain output indicating the ρ budget for each MB. The ρ-domain output is applied to ρ-to-QP mapper 56, which maps the ρ value to a corresponding QP value for each MB. Using the QP values for the MBs in the frame, video encoder 58 encodes the input video to produce encoded video. In addition, skipping module 50 provides the skip indication (SKIP ON/OFF) to video encoder 58 to direct the video encoder to group successive frames into a frame unit, encode the ROI areas of the frames, and skip encoding of the non-ROI area of one of the frames in the frame unit. Skipping may be adaptive, in that skipping module 50 may direct video encoder 58 to skip encoding of the non-ROI area of one of the frames in the frame unit when a distortion value associated with the preceding frame unit is less than a threshold value. In this manner, skipping module 50 can apply adaptive skipping based on the level of distortion in order to maintain visual quality.
The input video may be obtained from a video capture device, such as a video camera, that is integrated with, or operably coupled to, video communication device 12. For example, in some embodiments, the video capture device may be integrated with a mobile telephone to form a so-called camera phone or video phone. In this manner, video capture device 40 may support mobile VT applications. The video may be presented locally on video communication device 12 and, upon transmission, on video communication device 14 via a display device, such as a liquid crystal display (LCD), a plasma screen, or the like, which may be integrated with, or operably coupled to, video communication device 12 or 14.
Distortion analyzer 60 analyzes the encoded video against the original input video. For example, distortion analyzer 60 compares an original input video frame F with a reconstructed video frame F'. Distortion analyzer 60 generates a distortion value D_NONROI_SKIP for application to non-ROI background skipping module 50. The distortion value D_NONROI_SKIP indicates whether the non-ROI area of the next video frame should be skipped. Hence, for adaptive non-ROI skipping in the current frame, non-ROI skipping module 50 may generally rely on distortion information relating to a preceding frame, or to a frame unit containing two or more frames.
If the distortion value D_NONROI_SKIP exceeds a desired threshold, non-ROI background skipping module 50 indicates that the non-ROI in the next frame should not be skipped. In this case, both the ROI and non-ROI areas are encoded. If, however, the distortion value is less than the desired threshold, indicating that there is no excessive level of distortion, the non-ROI area can be skipped. In this case, the non-ROI area encoded for the previous frame is used for the current frame. As will be described, non-ROI skipping module 50 may group successive frames into frame units and direct video encoder 58 to skip encoding of the non-ROI of one of the frames depending on the distortion value D_NONROI_SKIP of the preceding frame unit, i.e., the frame unit containing the frames that precede the current frame being encoded.
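The frame-unit skipping rule described above can be sketched as follows. This is a simplified simulation under stated assumptions: frame units hold two frames by default, the per-unit distortion values D_NONROI_SKIP are supplied as a precomputed list (in the real encoder loop each value is only known after that unit has been encoded), and the non-ROI area of the last frame in a unit is the one skipped. The function name is hypothetical.

```python
def plan_non_roi_skips(unit_distortions, threshold, unit_size=2):
    """Return one flag per frame: True means that frame's non-ROI area is skipped.

    unit_distortions[k] is the D_NONROI_SKIP value measured for frame unit k.
    A unit may skip one non-ROI area only if the preceding unit's distortion
    is below the threshold; the first unit has no history and is coded fully.
    """
    flags = []
    previous = None
    for distortion in unit_distortions:
        unit = [False] * unit_size
        if previous is not None and previous < threshold:
            unit[-1] = True  # assumption: skip the background of the unit's last frame
        flags.extend(unit)
        previous = distortion
    return flags


print(plan_non_roi_skips([0.5, 2.0, 0.3], threshold=1.0))
# [False, False, False, True, False, False]
```

In the example, the second unit may skip because the first unit's distortion (0.5) is below the threshold, while the third unit is fully coded because the second unit's distortion (2.0) is not.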
Fig. 5 is a block diagram illustrating an ROI quality metric calculator 61 according to another embodiment of this disclosure. ROI weights calculator 46 of Fig. 4 may form part of ROI quality metric calculator 61. Accordingly, ROI quality metric calculator 46 may produce a set of weights w_i, which may be based on user preference factor α and on video fidelity, spatial quality, and/or temporal quality values. As shown in Fig. 5, ROI quality metric calculator 61 receives the user preference value α and one or more video distortion values. The video distortion values may be divided into ROI values and non-ROI values, and may include video fidelity values D_RF, D_NF, spatial quality values D_RS, D_NS, and temporal quality values D_RT, D_NT. D_RF represents the video fidelity within the ROI, and D_NF represents the video fidelity within the non-ROI region. D_RS represents the spatial quality within the ROI area, and D_NS represents the spatial quality within the non-ROI area. D_RT represents the temporal quality within the ROI area, and D_NT represents the temporal quality within the non-ROI area. When evaluating the quality of an encoded video sequence, the ROI quality metric jointly considers the user's interest, video fidelity, and perceptual quality (spatial, temporal, or both). In some embodiments, the metric may be used to bias the bit allocation algorithm applied by ρ-domain bit allocation module 48 to achieve better subjective visual quality.
Although ROI video coding has been studied extensively, quality measurement for ROI video has not yet been addressed in sufficient detail. Most quality measurement techniques use peak signal-to-noise ratio (PSNR) as the distortion measure for evaluating the quality of the ROI and non-ROI portions of a video frame. An ROI video quality metric is useful not only for analysis, but also as an input for biasing weighted bit allocation techniques (e.g., as applied by bit allocation module 48 of Fig. 4) toward a solution that is subjectively favorable. In general, as discussed above, the evaluation of ROI video quality considers at least three aspects: the user's interest in, or preference α for, ROI visual quality; the video fidelity of the reconstructed video data; and the perceptual quality (spatial, temporal, or both) of the reconstructed video data.
User preference α directly determines how the video frame is classified into ROI and non-ROI portions and the perceptual importance factors associated with them. In video telephony applications, the speaker's face region is a typical ROI, because human facial expressions are very complex and small variations can convey a large amount of information. For the video fidelity factor, PSNR is a good measure: it indicates the total amount of distortion in the reconstructed video frame compared with the original frame. The reconstructed frame is produced by decoding the encoded video frame, and the original frame is the video frame prior to encoding.
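As a concrete reading of the fidelity factor, the sketch below computes, separately for ROI and non-ROI pixels, the mean squared error normalized by 255² (one plausible form of the normalized per-pixel distortion used for D_RF and D_NF) together with the corresponding PSNR. The 8-bit peak value, the boolean pixel mask, and the function name are illustrative assumptions.

```python
import math


def region_mse_psnr(orig, recon, roi_mask, peak=255.0):
    """Per-region (normalized MSE, PSNR in dB) for ROI and non-ROI pixels.

    orig, recon, roi_mask are equally sized 2-D lists; a truthy mask entry
    marks an ROI pixel. Regions with no pixels are omitted from the result.
    """
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for row_o, row_r, row_m in zip(orig, recon, roi_mask):
        for o, r, m in zip(row_o, row_r, row_m):
            key = bool(m)
            sums[key] += (o - r) ** 2
            counts[key] += 1
    result = {}
    for key, name in ((True, "roi"), (False, "non_roi")):
        if counts[key] == 0:
            continue
        mse = sums[key] / counts[key]
        normalized = mse / (peak * peak)  # per-pixel distortion scaled to [0, 1]
        psnr = math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)
        result[name] = (normalized, psnr)
    return result


res = region_mse_psnr([[0, 0], [10, 10]], [[255, 0], [10, 10]], [[1, 1], [0, 0]])
print(res)  # roi: (0.5, ~3.01 dB); non_roi: (0.0, inf)
```

Note that PSNR is simply −10·log10 of the normalized MSE, so the two quantities carry the same information on different scales.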
In many cases, video fidelity will be the most important consideration in video coding, and any improvement in it may yield better subjective visual quality. This is not always the case, however, which is why perceptual quality factors should also be considered in some cases. Perceptual quality considers both spatial errors and temporal errors. Spatial errors may include the presence of blocking artifacts (i.e., blockiness), ringing artifacts, or both. Temporal errors may include the presence of temporal flicker, i.e., when the visual quality of the video frames changes non-uniformly along the time axis. Temporal errors can result in choppy motion in the video sequence, which is undesirable.
D_R and D_NR denote the normalized per-pixel distortion of the ROI and the non-ROI, and α denotes the ROI perceptual importance factor. If the relationship among the aspects mentioned above can be simplified to a linear function in video quality evaluation, the overall distortion of a video sequence can be expressed as:
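$$
\begin{aligned}
D_{sequence} &= \alpha D_R + (1-\alpha) D_{NR} \\
&= \frac{\alpha}{M}\Big[\beta \sum_{i=1}^{M} D_{RF}(f_i, \tilde f_i) + \gamma \sum_{i=1}^{M} D_{RS}(\tilde f_i) + (1-\beta-\gamma)\, D_{RT}(\tilde f_1, \ldots, \tilde f_M)\Big] \\
&\quad + \frac{1-\alpha}{M}\Big[\beta \sum_{i=1}^{M} D_{NF}(f_i, \tilde f_i) + \gamma \sum_{i=1}^{M} D_{NS}(\tilde f_i) + (1-\beta-\gamma)\, D_{NT}(\tilde f_1, \ldots, \tilde f_M)\Big] \qquad (1)
\end{aligned}
$$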
where f_i and f̃_i are the i-th original and reconstructed frames among the M frames of the video sequence; β and γ are weighting factors; D_R and D_NR are the total distortion of the ROI and non-ROI; D_RF, D_RS and D_RT are the normalized errors of the ROI in fidelity, spatial perceptual quality, and temporal perceptual quality; and D_NF, D_NS and D_NT are their counterparts for the non-ROI area. The values α, β and γ should be assigned real values between 0 and 1. The resulting quality metric can be used as a cost function to formulate an optimization problem for the ρ parameter in weighted bit allocation, or for other problems in ROI processing.
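To show how the terms of equation (1) combine, the sketch below evaluates D_sequence from per-frame fidelity and spatial terms and a sequence-level temporal term for each region. It assumes every term is already normalized to [0, 1] and that β + γ ≤ 1, so that the temporal weight 1 − β − γ is non-negative (an assumption the text implies but does not state). The dictionary layout and function name are illustrative.

```python
def sequence_distortion(alpha, beta, gamma, roi, non_roi):
    """Evaluate equation (1) from normalized per-region distortion terms.

    roi and non_roi are dicts with per-frame lists "F" (fidelity, D_RF / D_NF),
    "S" (spatial, D_RS / D_NS) and a sequence-level scalar "T" (temporal,
    D_RT / D_NT). All terms are assumed already normalized to [0, 1].
    """
    if not (0 <= alpha <= 1 and 0 <= beta and 0 <= gamma and beta + gamma <= 1):
        raise ValueError("need alpha in [0, 1], beta, gamma >= 0 and beta + gamma <= 1")

    def region(terms):
        m = len(terms["F"])  # number of frames M
        return (beta * sum(terms["F"]) + gamma * sum(terms["S"])
                + (1 - beta - gamma) * terms["T"]) / m

    return alpha * region(roi) + (1 - alpha) * region(non_roi)


d = sequence_distortion(0.8, 0.5, 0.3,
                        roi={"F": [0.1, 0.1], "S": [0.2, 0.2], "T": 0.5},
                        non_roi={"F": [0.2, 0.2], "S": [0.4, 0.4], "T": 1.0})
print(round(d, 3))  # 0.192
```

Raising α shifts the total toward the ROI terms, which is how the user's preference enters the metric.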
In low bit-rate video applications, such as wireless video telephony, blocking artifacts are the major concern for spatial perceptual quality. This artifact is caused by quantization, in which most of the high-frequency coefficients are removed, i.e., set to zero. The resulting smoothed image blocks make the block boundaries quite noticeable. At extremely low bit rates, only the DC coefficients are coded, so the decoded image becomes piecewise constant, block by block. In this disclosure, the ROI spatial quality value D_RS (and similarly D_NS) is defined as the normalized blockiness distortion, which may be expressed as:
[Equation (2): D_RS, normalized blockiness distortion (equation image not reproduced)]
where the boundaries between blocks are checked to determine whether perceivable discontinuities exist. A suitable discontinuity detection method, which examines the sum of the mean squared difference of intensity slope across block boundaries, is described in S. Minami and A. Zakhor, "An optimization approach for removing blocking effects in transform coding," IEEE Trans. Circuits and Systems for Video Technology, vol. 5, no. 2, pp. 74-82, April 1995, the entire content of which is incorporated herein by reference. The method assumes that the slopes on the two sides of a block boundary should be identical, so an abrupt change in slope is probably caused by quantization.
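To make the boundary-slope test concrete, the following sketch computes a mean-squared-difference-of-slope style score across the vertical block boundaries of a grayscale frame, in the spirit of the Minami-Zakhor measure cited above: the slope across each boundary is compared with the average of the slopes just inside the two neighboring blocks. It is an illustrative approximation only; horizontal boundaries, the normalization that turns the score into D_RS, and the exact form used in the cited paper are not reproduced, and `msds_vertical` is a hypothetical name.

```python
def msds_vertical(img, block=8):
    """Sum of squared slope mismatches across vertical block boundaries.

    img is a 2-D list of pixel intensities; block (>= 2) is the block width.
    For each boundary between columns c-1 and c, the slope across the
    boundary is compared with the mean of the slopes inside each block.
    """
    rows, cols = len(img), len(img[0])
    total = 0.0
    for c in range(block, cols - 1, block):  # c = first column of the right block
        for r in range(rows):
            left_inner = img[r][c - 1] - img[r][c - 2]   # slope inside left block
            right_inner = img[r][c + 1] - img[r][c]      # slope inside right block
            across = img[r][c] - img[r][c - 1]           # slope across the boundary
            expected = 0.5 * (left_inner + right_inner)  # smooth-image prediction
            total += (across - expected) ** 2
    return total


ramp = [[float(c) for c in range(16)] for _ in range(4)]  # smooth gradient
step = [[0.0] * 8 + [10.0] * 8 for _ in range(2)]         # hard edge at the boundary
print(msds_vertical(ramp), msds_vertical(step))  # 0.0 200.0
```

A smooth gradient scores zero because its slope is continuous across the boundary, whereas a quantization-induced step produces a large mismatch.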
In equation (1), the value of D_RT (or D_NT) is defined as an assigned score in the range [0, 1], based on the variance of D_RS (or D_NS) over all frames in the video sequence. In this way, the terms for video fidelity, spatial perceptual quality, and temporal perceptual quality are each normalized, and can be bridged by the weighting parameters α, β and γ to form a controllable video quality measure. The choice of these weighting parameters is left to the user, based on his or her requirements and expectations. Again, this measure is useful as an input for biasing the bit allocation process toward favoring subjective perception. Consequently, the user can achieve a more visually pleasing result in ROI coding.
Fig. 6 is a diagram illustrating a wireless communication device 36 incorporating an ROI user preference input device 62 for ROI quality metric calculation. In Fig. 6, wireless communication device 36 generally conforms to Fig. 2, but further incorporates input device 62 to capture the user preference α, which specifies the relative importance assigned to the ROI and non-ROI portions of video scene 32. In the example of Fig. 6, input device 62 takes the form of a slider bar with a slider 64 that can be moved along the length of the bar to indicate the degree of user preference α.
Using input device 62, the user can selectively adjust user preference α, e.g., via quality metric calculator 61, to influence ROI bit allocation on a dynamic basis. As user preference α changes, the bit allocation between the ROI and non-ROI portions of the video frame may change. Although a horizontal slider bar is depicted in Fig. 6, input device 62 may be realized by any of a variety of equivalent input media, such as a vertical slider bar, buttons, dials, drop-down percentage menus, and the like. Such input media may be manipulated via a touchscreen or any of a variety of hard keys, soft keys, pointing devices, and so forth.
Fig. 7 is a block diagram illustrating use of ROI quality metric calculator 61 to analyze a video sequence in order to optimize the coding parameters applied by an ROI-enabled video encoder 63. As shown in Fig. 7, ROI quality metric calculator 61 analyzes the distortion values of the incoming video sequence before the sequence is encoded by ROI-enabled video encoder 63. Hence, the ROI quality metric calculator analyzes the distortion values of the video bitstream, e.g., as described with reference to Fig. 5. Based on the distortion values and the user preference value α, the ROI quality metric calculator produces a set of optimized parameters for video encoder 63 to use in encoding the incoming video sequence. The optimized parameters may include the weights used by the bit allocation module to allocate coding bits between the ROI and non-ROI areas of a video frame, or values of other parameters used in bit allocation, such as weighting factors β and γ. In a sense, Fig. 7 represents an open-loop implementation, in which ROI quality metric calculator 61 analyzes the incoming video stream before encoding but does not analyze the encoded video. The quality metric results in optimal coding parameters for use by encoder 63.
Fig. 8 is a block diagram illustrating use of ROI quality metric calculator 61 to analyze encoded video in order to adjust the coding parameters applied by ROI-enabled video encoder 63. In the example of Fig. 8, ROI quality metric calculator 61 analyzes the distortion values associated with the encoded video, together with the user preference value α, to produce adjustments to the coding parameters used by ROI-enabled video encoder 63. Hence, ROI quality metric calculator 61 analyzes the video after it has been encoded by ROI-enabled video encoder 63 and produces adjustments on a closed-loop basis, e.g., to improve the performance of the video encoder and the quality of the encoded video. The adjustments to the coding parameters may include adjustments to the weights used by the bit allocation module to allocate coding bits between the ROI and non-ROI areas of a video frame, or to the values of other parameters used in bit allocation, such as weighting factors β and γ. In the example of Fig. 8, the quality metric is used in a loop that iteratively encodes and evaluates quality until the quality metric compares satisfactorily with a threshold. In each iteration, quality metric calculator 61 sends an improved set of coding parameters. Eventually, the iteration stops either because the quality metric threshold is satisfied or because the results converge.
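The closed-loop refinement of Fig. 8 can be sketched as a generic iterate-until-satisfied loop. The callables `encode`, `evaluate`, and `adjust` are hypothetical stand-ins for ROI-enabled video encoder 63 and quality metric calculator 61; the source does not specify how an improved parameter set is computed, so that step is left to the caller. The score is treated as a distortion (lower is better), consistent with D_sequence, and declaring convergence when the score changes by less than a tolerance is an assumption.

```python
def closed_loop_tune(encode, evaluate, adjust, params, threshold,
                     tol=1e-3, max_iter=20):
    """Iterate encode -> evaluate -> adjust until the quality target is met.

    Returns (params, score, iterations). Stops when the score meets the
    threshold, when it changes by less than tol between iterations
    (converged), or after max_iter iterations.
    """
    previous = None
    score = None
    for iteration in range(1, max_iter + 1):
        score = evaluate(encode(params))
        if score <= threshold:
            return params, score, iteration   # quality target satisfied
        if previous is not None and abs(previous - score) < tol:
            return params, score, iteration   # converged without meeting target
        previous = score
        params = adjust(params, score)        # improved coding parameters
    return params, score, max_iter


# Toy stand-ins: the "encoder" passes the parameter through, the distortion
# is its square, and each adjustment halves the parameter.
params, score, iters = closed_loop_tune(
    encode=lambda p: p, evaluate=lambda x: x * x,
    adjust=lambda p, s: p / 2, params=1.0, threshold=0.01)
print(params, score, iters)  # 0.0625 0.00390625 5
```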
Fig. 9 is a flow diagram illustrating ROI quality metric calculation. As shown in Fig. 9, given a suitable ROI MB map, ROI quality metric calculator 46 obtains the ROI user preference α (68) and the ROI and non-ROI portions of an encoded video frame (70). When the encoded video frame is reconstructed, distortion analyzer 60 analyzes the previously encoded video frame and the original video frame to determine the video fidelity D_RF and D_NF of the ROI and non-ROI portions, respectively, of the previous video frame. In addition, distortion analyzer 60 generates ROI and non-ROI perceptual temporal quality values D_RT, D_NT and ROI and non-ROI perceptual spatial quality values D_RS, D_NS, respectively. ROI quality metric calculator 46 obtains the video fidelity (72), the ROI and non-ROI temporal quality (74), and the ROI and non-ROI spatial quality (76) from distortion analyzer 60.
Based on the user preference α, video fidelity, spatial quality, and temporal quality, ROI quality metric calculator 46 determines the ROI quality metric (78). Video fidelity measures the video error in the reconstructed video frame relative to the original frame, e.g., in terms of color intensity values on a pixel-by-pixel basis. Spatial quality measures spatial errors in the reconstructed frame relative to the original frame, such as blocking and ringing artifacts. Temporal quality measures errors such as temporal flicker, in which the visual quality of the frames changes non-uniformly along the time axis.
Note that user preference α is the current value applied by the user, whereas video fidelity, spatial quality, and temporal quality are derived from one or more frames preceding the current frame being processed by bit allocation module 48. User preference α may remain fixed from frame to frame until the user changes the value. If the user has not specified a value, a default value may be assigned to user preference factor α. The ROI quality metric may be applied to bias the bit allocation between the ROI and non-ROI of the current video frame (80), as described above with reference to Fig. 5. For example, the quality metric may be used to adjust the weights used for ROI bit allocation. In some embodiments, the functionality shown in Fig. 9 may represent the operations performed by ROI quality metric calculator 61 in the "closed-loop" example of Fig. 8.
Fig. 10 is a flow diagram illustrating ROI quality metric calculation for a video sequence. Fig. 10 corresponds substantially to Fig. 9, but represents an embodiment in which the quality metric calculation is performed on a video stream before the stream is encoded. Accordingly, the process of Fig. 10 further includes obtaining the video stream (67). In addition, in contrast to Fig. 9, video encoding (70) is performed after biasing the ROI/non-ROI bit allocation (80). In some embodiments, the functionality shown in Fig. 10 may represent the operations performed by ROI quality metric calculator 61 in the "open-loop" example of Fig. 7.
Fig. 11 is a flow diagram illustrating ROI ρ-domain bit allocation. As shown in Fig. 11, bit allocation module 48 obtains both an ROI definition (82) and a frame rate budget (84). The ROI definition may take the form of an ROI MB map that identifies the MBs or other video blocks falling within the ROI. The rate budget gives the number of bits available for encoding the entire frame, including both ROI and non-ROI areas. In addition, bit allocation module 48 obtains ROI weights w_i (86) from ROI weights calculator 46, which bias the bit allocation between the ROI and non-ROI. Upon determining the non-ROI skip mode for the frame (88), i.e., whether skipping is ON or OFF for that frame, bit allocation module 48 obtains statistics for the current frame (89). The current frame statistics (89) can then be used to make skip mode decisions for subsequent frames. The frame statistics may include, for example, the standard deviation of the residue of the frame after motion estimation. Alternatively, the frame statistics of a previous frame may be obtained. With the skip mode indication (88), bit allocation module 48 can determine whether all of the available bits can be devoted to the ROI (non-ROI frame skipping ON) or whether the bits must be shared between the ROI and non-ROI (non-ROI frame skipping OFF).
Using the ROI definition, frame rate budget, quality metric bias, and non-ROI skip mode, bit allocation module 48 generates a weighted ρ-domain allocation of bits between ROI MBs and non-ROI MBs (90). Once the ρ-domain bit allocation has been determined, mapper 56 performs a ρ-to-QP mapping to provide MB QP values (92) for application to video encoder 58 (94). Mapper 56 may apply a ρ-to-QP mapping table, or an equation or function that generates a QP for a particular ρ. Video encoder 58 uses the QP values provided by bit allocation module 48 and mapper 56 to encode the individual ROI and non-ROI MBs in the applicable video frame. The resulting bit allocation may take into account not only the applicable frame budget, but also the availability of non-ROI skipping and the quality metric associated with previous frames in the video sequence. The operation of bit allocation module 48 is described in greater detail below.
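One common way to realize a ρ-to-QP mapping, assumed here rather than taken from the source, is to count, for each candidate QP, how many of an MB's transform coefficients would survive quantization, and then pick the finest QP whose count does not exceed the target ρ. The sketch uses a plain rounding quantizer (a coefficient survives if |c| ≥ step/2) and a caller-supplied list of step sizes; real H.263 or H.264 quantizers use dead zones and codec-specific step tables, which are not modeled.

```python
def rho_of_step(coeffs, step):
    """Count coefficients that stay nonzero under a rounding quantizer of this step."""
    return sum(1 for c in coeffs if abs(c) >= 0.5 * step)


def map_rho_to_qp(coeffs, rho_target, steps):
    """Return the smallest QP whose rho does not exceed rho_target.

    steps[qp] is a hypothetical quantizer step size for each QP (increasing).
    rho is non-increasing in QP, so the first QP that meets the target is the
    finest quantizer that still respects the rho budget.
    """
    for qp, step in enumerate(steps):
        if rho_of_step(coeffs, step) <= rho_target:
            return qp
    return len(steps) - 1  # target unreachable: fall back to the coarsest quantizer


ac_coeffs = [10.0, -6.0, 3.0, 1.0]
print(map_rho_to_qp(ac_coeffs, rho_target=2, steps=[1, 2, 4, 8, 16, 32]))  # 3
```

In practice the per-QP counts would be tabulated once per MB, which is what makes a ρ-QP table cheap to consult after ρ-domain allocation.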
The bit allocation techniques described in this disclosure generally assume that adequate ROI detection or definition is available, and that acceptable frame-level rate control is available. On this basis, the bit allocation techniques generally focus on MB-level rate control between ROI and non-ROI MBs. Most conventional ROI bit allocation algorithms are based on a weighted version of the ITU-T H.263+ TMN8 model, in which a cost function is constructed and the distortion components of the different regions in the function are penalized differently using a set of preset weights. Like most other video standards, TMN8 uses a Q-domain rate control scheme, which models rate and distortion as functions of QP. The bit allocation techniques described in this disclosure, however, use a ρ-domain rate control module, where ρ represents the number of non-zero quantized AC coefficients in an MB during video coding. As described herein, ρ-domain bit allocation tends to be more accurate than QP-domain rate control and can effectively reduce rate fluctuations.
In video coding applications, a typical problem is to minimize the distortion value D_sequence given a bit budget for the video sequence. The optimal solution to this complicated problem depends on an optimal frame-level rate control algorithm and an optimal macroblock-level bit allocation scheme. For real-time applications such as video telephony, however, where very little information about future frames is available when the current frame is coded, pursuing optimal frame-level rate control is neither practical nor feasible. Typically, a popular "greedy" algorithm is used instead. The greedy algorithm assumes that the complexity of the video content is uniformly distributed across the frames of the video sequence. On this basis, the greedy algorithm allocates a fraction of the available bits to each frame in the sequence. In real-time applications, the limited availability of future frame information also makes it difficult to account for temporal quality in rate control.
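The greedy frame-level allocation described above can be illustrated with a minimal sketch that assumes the "fraction" given to each frame is an equal share of the bits that remain, so that over- or under-spending on one frame is spread over the frames that follow. The function name and the equal-share rule are assumptions for illustration.

```python
def greedy_frame_budgets(total_bits, bits_spent):
    """Greedy frame-level rate control: each frame gets an equal share of what is left.

    bits_spent[k] is the number of bits frame k actually consumed, so
    over- or under-spending on one frame is absorbed by the frames after it.
    """
    budgets = []
    remaining = total_bits
    frames_left = len(bits_spent)
    for spent in bits_spent:
        budgets.append(remaining // frames_left)  # equal share of remaining bits
        remaining -= spent
        frames_left -= 1
    return budgets


print(greedy_frame_budgets(1000, [300, 200, 250, 250]))  # [250, 233, 250, 250]
```

Because the rule never looks ahead, it cannot reserve bits for a complex scene that has not arrived yet, which is exactly the limitation the text notes for real-time use.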
In this disclosure, to find a practical solution and to simplify the bit allocation problem, it is generally assumed that good frame-level rate control is available. This assumption reduces the bit allocation problem to macroblock-level allocation. At the same time, the bit allocation scheme can take advantage of a non-ROI skipping approach. Non-ROI skipping increases the chance of reducing the value of the temporal distortion term $D_{NT}(\tilde f_1, \ldots, \tilde f_M)$, because the skipped region presents the same perceptual quality as in the previous frame. Hence, skipping the non-ROI area can reduce fluctuations in perceptual quality between consecutive frames.
For purposes of illustration, the image quality of a video frame is evaluated according to equation (1). For simplicity, however, β and γ are set so that β + γ = 1. Denoting by R_budget the total bit budget for a given frame f and by R the bit rate used to code the frame, the problem can be expressed as:
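$$
\text{minimize } \alpha\left[\beta D_{RF}(f, \tilde f) + (1-\beta) D_{RS}(\tilde f)\right] + (1-\alpha)\left[\beta D_{NF}(f, \tilde f) + (1-\beta) D_{NS}(\tilde f)\right] \quad \text{subject to } R \le R_{budget}. \qquad (3)
$$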
The above optimization problem could be solved by Lagrangian relaxation and dynamic programming. However, the computational complexity of such an approach would be much higher than a real-time system can afford. Therefore, in accordance with this disclosure, a low-complexity, near-optimal solution is preferred. In particular, this disclosure applies a two-stage bit allocation algorithm in the ρ domain. The first stage involves the following optimization problem:
$$
\text{minimize } \alpha D_{RF}(f, \tilde f) + (1-\alpha) D_{NF}(f, \tilde f) \quad \text{subject to } R \le R_{budget}. \qquad (4)
$$
After the optimal coding parameters for equation (4) are obtained, the second stage iteratively adjusts the coding parameters to reduce the term $\alpha D_{RS}(\tilde f) + (1-\alpha) D_{NS}(\tilde f)$ until a local minimum is reached. When β is relatively large, the result of this two-stage algorithm may be very close to the optimal solution. When β = 1, problems (3) and (4) are identical. This disclosure focuses on the first stage and on the solution to problem (4).
In ROI video coding, N is the number of MBs in the frame, and {ρ_i}, {σ_i}, {R_i} and {D_i} are the sets of ρ values, standard deviations, rates, and distortions (sums of squared error), respectively, of the i-th macroblock. Thus, $R = \sum_{i=1}^{N} R_i$. A set of weights {w_i} is defined for all MBs in the frame as:
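$$
w_i = \begin{cases} \dfrac{\alpha}{K} & \text{if the } i\text{-th MB belongs to the ROI} \\[6pt] \dfrac{1-\alpha}{N-K} & \text{if the } i\text{-th MB belongs to the non-ROI} \end{cases} \qquad (5)
$$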
where K is the number of MBs within the ROI. Equation (5) may be implemented, for example, by ROI weights calculator 46.
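A minimal sketch of per-MB weight assignment, assuming equation (5) takes the form w_i = α/K for ROI MBs and w_i = (1 − α)/(N − K) for non-ROI MBs. That form is an assumption here, chosen because it makes Σ w_i D_i reduce to the α-weighted mix of average ROI and average non-ROI distortion in equation (6); the function name is hypothetical.

```python
def roi_weights(roi_flags, alpha):
    """Per-MB weights: alpha/K for ROI MBs, (1 - alpha)/(N - K) for the rest.

    roi_flags is a flat list of booleans, one per MB (True = ROI MB).
    The weights always sum to 1, so alpha directly sets the ROI share.
    """
    n = len(roi_flags)
    k = sum(1 for flag in roi_flags if flag)
    if k == 0 or k == n:
        raise ValueError("need at least one ROI MB and one non-ROI MB")
    roi_w = alpha / k
    non_roi_w = (1.0 - alpha) / (n - k)
    return [roi_w if flag else non_roi_w for flag in roi_flags]


w = roi_weights([True, False, False, True], alpha=0.8)
print(w)  # approximately [0.4, 0.1, 0.1, 0.4]
```

Note that each ROI MB's weight shrinks as the ROI grows: α is shared across K ROI MBs, so a large ROI does not automatically claim the whole budget.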
Therefore, the weighted distortion of the frame is:
$$
D = \sum_{i=1}^{N} w_i D_i = \left[\alpha D_{RF}(f, \tilde f) + (1-\alpha) D_{NF}(f, \tilde f)\right] \cdot 255^2 \cdot 384. \qquad (6)
$$
Therefore, problem (4) can be rewritten as:
$$
\text{minimize } D \quad \text{subject to } R \le R_{budget}. \qquad (7)
$$
Equation (7) is solved using a model-based bit allocation method. The distribution of the AC coefficients of a natural image is best approximated by a Laplacian distribution $p(x) = \frac{\eta}{2} e^{-\eta |x|}$. Therefore, the rate and distortion of the i-th macroblock can be modeled as functions of ρ, as in equations (8) and (9) below.
For example, the rate can be expressed as:
$$
R_i = A \rho_i + B, \qquad (8)
$$
where A and B are constant modeling parameters; A can be thought of as the average number of bits needed to encode a non-zero coefficient, and B as the bits attributable to non-texture information.
In addition, distortion can be expressed as:
$$
D_i = 384\, \sigma_i^2 e^{-\theta \rho_i / 384}, \qquad (9)
$$
where θ is an unknown constant and σ_i is the standard deviation of the residual data. Here, the bit allocation technique optimizes ρ_i rather than the quantizer, on the assumption that a sufficiently accurate ρ-QP table exists to generate an acceptable quantizer from any selected ρ_i. In general, equation (7) can be solved using Lagrangian relaxation, which converts the constrained problem into an unconstrained one, as follows:
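$$ \{\rho_i^{*}\} = \arg\min_{\rho_1,\ldots,\rho_N} J_{\lambda^{*}} = \arg\min_{\rho_1,\ldots,\rho_N} \sum_{i=1}^{N}\left[\lambda^{*}(A\rho_i + B) + 384\,w_i\sigma_i^2 e^{-\theta\rho_i/384}\right] \qquad (10) $$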
where $\lambda^{*}$ is the solution that achieves $\sum_{i=1}^{N} R_i = R_{budget}$. Setting the partial derivatives in equation (10) to zero yields the following expression for the optimized $\rho_i$:
$$ \text{Let}\quad \frac{\partial J_\lambda}{\partial \rho_i} = \frac{\partial}{\partial \rho_i}\sum_{j=1}^{N}\left[\lambda(A\rho_j + B) + 384\,w_j\sigma_j^2 e^{-\theta\rho_j/384}\right] = 0, \qquad (11) $$
that is,
$$ \lambda A - \theta w_i \sigma_i^2 e^{-\theta\rho_i/384} = 0. \qquad (12) $$
Therefore
$$ e^{-\theta\rho_i/384} = \frac{\lambda A}{\theta w_i \sigma_i^2}, \qquad (13) $$
and
$$ \rho_i = \frac{384}{\theta}\left[\ln(\theta w_i \sigma_i^2) - \ln(\lambda A)\right]. \qquad (14) $$
On the other hand, since
$$ R_{budget} = \sum_{i=1}^{N} R_i = \frac{384A}{\theta}\sum_{i=1}^{N}\left[\ln(\theta w_i \sigma_i^2) - \ln(\lambda A)\right] + NB, \qquad (15) $$
the following relation holds:
$$ \ln(\lambda A) = \frac{1}{N}\sum_{i=1}^{N}\ln(\theta w_i \sigma_i^2) - \frac{\theta}{384NA}\left(R_{budget} - NB\right). \qquad (16) $$
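From equations (14) and (16), bit allocation model I is obtained as follows:
$$ \begin{aligned} \rho_i &= \frac{384}{\theta}\left[\ln(\theta w_i \sigma_i^2) - \frac{1}{N}\sum_{j=1}^{N}\ln(\theta w_j \sigma_j^2) + \frac{\theta}{384NA}\left(R_{budget} - NB\right)\right] \\ &= \frac{R_{budget} - NB}{NA} + \frac{384}{\theta}\left[\ln(\theta w_i \sigma_i^2) - \frac{1}{N}\sum_{j=1}^{N}\ln(\theta w_j \sigma_j^2)\right]. \end{aligned} \qquad (17) $$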
The resulting ρ is then mapped to the corresponding QP, which is used to assign the appropriate number of coding bits to each ROI or non-ROI MB.
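Once A, B, θ, the weights $w_i$, and the residual standard deviations $\sigma_i$ are known, equation (17) can be evaluated directly. The Python sketch below is illustrative only. The function name and the sample parameter values are hypothetical. The sketch stops before the ρ-to-QP mapping, because that step depends on an encoder-specific ρ-QP table.

```python
import math
from typing import List, Sequence


def allocate_rho_model1(weights: Sequence[float], sigmas: Sequence[float],
                        r_budget: float, a: float, b: float,
                        theta: float) -> List[float]:
    """Per-macroblock rho values from bit allocation model I, equation (17)."""
    n = len(weights)
    if n == 0 or n != len(sigmas):
        raise ValueError("weights and sigmas must be non-empty and equal length")
    # ln(theta * w_i * sigma_i^2) for every macroblock
    logs = [math.log(theta * w * s * s) for w, s in zip(weights, sigmas)]
    mean_log = sum(logs) / n
    # (R_budget - N*B) / (N*A): the share each MB would get if all were alike
    base = (r_budget - n * b) / (n * a)
    return [base + (384.0 / theta) * (lg - mean_log) for lg in logs]


# Hypothetical values: one ROI macroblock and two non-ROI macroblocks.
rho = allocate_rho_model1([0.9, 0.05, 0.05], [10.0, 6.0, 4.0],
                          r_budget=3000.0, a=6.0, b=20.0, theta=7.0)
```

The log-deviation terms sum to zero, so the allocated values always satisfy $\sum_i (A\rho_i + B) = R_{budget}$. In this example, $\sum_i \rho_i = (3000 - 3 \cdot 20)/6 = 490$. Equation (17) does not by itself keep $\rho_i$ within [0, 384], so a practical encoder would presumably clamp the values before the QP lookup.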
An alternative distortion model yields another bit allocation model (bit allocation model II). Under this model, assume a uniform quantizer with step size q. The distortion caused by quantization is then given by:
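$$ \begin{aligned} D(q) &= 2\int_{0}^{0.5q} p(x)\,x\,dx + 2\sum_{i=1}^{\infty}\int_{(i-0.5)q}^{(i+0.5)q} p(x)\,\lvert x - iq\rvert\,dx \\ &= \frac{1}{\eta}\left[1 + \frac{e^{-\eta q}}{1 - e^{-\eta q}}\left(2 - e^{-0.5\eta q} - e^{0.5\eta q}\right) - e^{-0.5\eta q}\right] \end{aligned} \qquad (18) $$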
and the fraction of zero coefficients is given by
$$ \Psi = \int_{-0.5q}^{0.5q} \frac{\eta}{2} e^{-\eta|x|}\,dx = 1 - e^{-0.5\eta q}. \qquad (19) $$
Therefore,
$$ D(q) = \frac{\Psi}{\eta(2 - \Psi)}. \qquad (20) $$
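The closed forms in equations (18)-(20) can be checked numerically. The Python sketch below integrates the absolute quantization error of a Laplacian source with the midpoint rule. It uses a dead zone around zero and reconstruction points at multiples of q, and compares the result with equations (18) and (20). The function names, the truncation of the infinite sum, and the step counts are illustrative choices.

```python
import math


def laplace_pdf(x: float, eta: float) -> float:
    """Laplacian density p(x) = (eta / 2) * exp(-eta * |x|)."""
    return 0.5 * eta * math.exp(-eta * abs(x))


def distortion_numeric(q: float, eta: float, bins: int = 100,
                       steps: int = 500) -> float:
    """Midpoint-rule evaluation of the integrals in equation (18).

    The infinite sum over reconstruction levels is truncated after `bins`
    terms. `steps` must be even so that the kink of |x - i*q| at the centre
    of each bin falls on a sub-interval boundary.
    """
    total = 0.0
    h = 0.5 * q / steps
    for k in range(steps):  # dead zone [0, q/2), reconstructed as 0
        x = (k + 0.5) * h
        total += 2.0 * laplace_pdf(x, eta) * x * h
    h = q / steps
    for i in range(1, bins + 1):  # bins centred on i*q
        lo = (i - 0.5) * q
        for k in range(steps):
            x = lo + (k + 0.5) * h
            total += 2.0 * laplace_pdf(x, eta) * abs(x - i * q) * h
    return total


def distortion_closed(q: float, eta: float) -> float:
    """Right-hand side of equation (18)."""
    e = math.exp
    ratio = e(-eta * q) / (1.0 - e(-eta * q))
    return (1.0 + ratio * (2.0 - e(-0.5 * eta * q) - e(0.5 * eta * q))
            - e(-0.5 * eta * q)) / eta


def distortion_psi(q: float, eta: float) -> float:
    """Equation (20), with Psi taken from equation (19)."""
    psi = 1.0 - math.exp(-0.5 * eta * q)
    return psi / (eta * (2.0 - psi))
```

All three functions should agree to within the integration error for any positive q and η.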
According to Shannon's source coding theorem, as described in T. M. Cover and J. A. Thomas, "Elements of Information Theory" (Wiley, New York, NY, 1991), the minimum number of bits required to represent a symbol from a Laplacian source is given by $R(q) = \log_2\left(\frac{1}{\eta D(q)}\right)$, so that
$$ R_i = \sum_{j=1}^{384} R(q) = 384\log_2\left(\frac{1}{\eta D(q)}\right) = 384\log_2\frac{2 - \Psi_i}{\Psi_i}. \qquad (21) $$
Because $\Psi_i = 1 - \frac{\rho_i}{384}$, where 384 is the total number of coefficients in the i-th macroblock for 4:2:0 video, equation (21) can be expanded in a Taylor series. The relationship between bit rate and ρ can then be approximated by:
$$ R_i = A\rho_i + B, \qquad (22) $$
where, as in equation (8), A and B are constant modeling parameters. A is the average number of bits needed to encode a nonzero coefficient, and B is the bits attributable to non-texture information.
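A short numerical example illustrates the linearization in equation (22). Substituting $\Psi_i = 1 - \rho_i/384$ into equation (21) gives $R_i = 384\log_2\frac{384 + \rho_i}{384 - \rho_i}$. Its first-order Taylor expansion about $\rho_i = 0$ is $R_i \approx \frac{2}{\ln 2}\rho_i \approx 2.885\rho_i$. Equation (21) counts coefficient bits only, so this idealized expansion has no intercept; the constant B in equation (22) accounts separately for non-texture bits. The Python sketch below compares the two and is illustrative only.

```python
import math


def rate_exact(rho: float) -> float:
    """Equation (21) with Psi_i = 1 - rho_i / 384 (bits per 4:2:0 macroblock)."""
    if not 0.0 <= rho < 384.0:
        raise ValueError("rho must lie in [0, 384)")
    psi = 1.0 - rho / 384.0
    return 384.0 * math.log2((2.0 - psi) / psi)


# First-order Taylor slope of rate_exact at rho = 0: 2 / ln 2 (about 2.885).
SLOPE = 2.0 / math.log(2.0)


def rate_linear(rho: float) -> float:
    """Linear approximation R_i ~ A * rho_i with A = 2 / ln 2 and B = 0."""
    return SLOPE * rho
```

For small $\rho_i$ the two agree closely; the relative error at $\rho_i = 20$ is below 0.1%. For larger $\rho_i$ the exact rate grows faster than the linear approximation.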
In addition, the variance of the coefficients is given by:
$$ \sigma^2 = \int_{-\infty}^{+\infty} p(x)\,x^2\,dx = \int_{-\infty}^{+\infty} \frac{\eta}{2} x^2 e^{-\eta|x|}\,dx = \frac{2}{\eta^2}. \qquad (23) $$
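Therefore, using $\eta_i = \sqrt{2}/\sigma_i$ from equation (23), the distortion of the i-th macroblock can be expressed as:
$$ D_i = \sum_{j=1}^{384} D(q) = \frac{384\,\Psi_i}{\eta_i(2 - \Psi_i)} = \frac{384\,(384 - \rho_i)}{\sqrt{2}\,(384 + \rho_i)}\,\sigma_i. \qquad (24) $$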
As in the derivation of bit allocation model I, the optimal bit allocation scheme is obtained by solving optimization problem (7), i.e., the following problem:
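$$ \min_{\rho_1,\ldots,\rho_N} \sum_{i=1}^{N} \frac{384\,(384 - \rho_i)}{\sqrt{2}\,(384 + \rho_i)}\,\sigma_i \quad \text{subject to} \quad \sum_{i=1}^{N}(A\rho_i + B) \le R_{budget}. \qquad (25) $$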
In general, equation (25) can be solved by Lagrangian relaxation, which converts the constrained problem into an unconstrained one, as follows:
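$$ \{\rho_i^{*}\} = \arg\min_{\rho_1,\ldots,\rho_N} J_{\lambda^{*}} = \arg\min_{\rho_1,\ldots,\rho_N} \sum_{i=1}^{N}\left[\lambda^{*}(A\rho_i + B) + \frac{384\,(384 - \rho_i)}{\sqrt{2}\,(384 + \rho_i)}\,\sigma_i\right] \qquad (26) $$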
where $\lambda^{*}$ is the solution that achieves $\sum_{i=1}^{N} R_i = R_{budget}$. Setting the partial derivatives in equation (26) to zero yields the following expression for the optimized $\rho_i$:
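$$ \text{Let}\quad \frac{\partial J_\lambda}{\partial \rho_i} = \frac{\partial}{\partial \rho_i}\sum_{j=1}^{N}\left[\lambda(A\rho_j + B) + \frac{384\,(384 - \rho_j)}{\sqrt{2}\,(384 + \rho_j)}\,\sigma_j\right] = 0, \qquad (27) $$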
that is,
$$ \lambda A - \frac{384^2\sqrt{2}}{(384 + \rho_i)^2}\,\sigma_i = 0. \qquad (28) $$
Therefore
$$ \rho_i = 384\sqrt{\frac{\sqrt{2}\,\sigma_i}{A\lambda}} - 384. \qquad (29) $$
On the other hand, since
$$ R_{budget} = \sum_{i=1}^{N} R_i = A\sum_{i=1}^{N} 384\sqrt{\frac{\sqrt{2}\,\sigma_i}{A\lambda}} - 384NA + NB, \qquad (30) $$
it follows that
$$ 384\sqrt{\frac{\sqrt{2}}{A\lambda}} = \frac{R_{budget} + 384NA - NB}{A\sum_{i=1}^{N}\sqrt{\sigma_i}}. \qquad (31) $$
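From equations (29) and (31), the following is obtained:
$$ \begin{aligned} \rho_i &= \frac{\sqrt{\sigma_i}}{\sum_{j=1}^{N}\sqrt{\sigma_j}}\left(\frac{R_{budget}}{A} - \frac{NB}{A}\right) + 384\,\frac{\sqrt{\sigma_i} - \frac{1}{N}\sum_{j=1}^{N}\sqrt{\sigma_j}}{\frac{1}{N}\sum_{j=1}^{N}\sqrt{\sigma_j}} \\ &= \frac{\sqrt{\sigma_i}}{\sum_{j=1}^{N}\sqrt{\sigma_j}}\,\rho_{budget} + 384\,\frac{\sqrt{\sigma_i} - \frac{1}{N}\sum_{j=1}^{N}\sqrt{\sigma_j}}{\frac{1}{N}\sum_{j=1}^{N}\sqrt{\sigma_j}}, \end{aligned} \qquad (32) $$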
where $\rho_{budget} = (R_{budget} - NB)/A$ is the total ρ budget of the frame.
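Equation (32) can be checked against its derivation. The Python sketch below assumes equation (29) has the form $\rho_i = 384\sqrt{\sqrt{2}\sigma_i/(A\lambda)} - 384$ and equation (31) has the form $384\sqrt{\sqrt{2}/(A\lambda)} = (R_{budget} + 384NA - NB)/(A\sum_i\sqrt{\sigma_i})$. It evaluates equation (32) directly, and separately recovers λ from equation (31) and substitutes it into equation (29). The two routes should agree. The allocated ρ values should also sum to $\rho_{budget}$, because the second term of equation (32) sums to zero over the frame. The function names and sample values are hypothetical.

```python
import math
from typing import List, Sequence


def allocate_rho_eq32(sigmas: Sequence[float], r_budget: float,
                      a: float, b: float) -> List[float]:
    """Unweighted allocation of equation (32)."""
    n = len(sigmas)
    roots = [math.sqrt(s) for s in sigmas]
    total = sum(roots)
    mean = total / n
    rho_budget = (r_budget - n * b) / a
    return [r / total * rho_budget + 384.0 * (r - mean) / mean for r in roots]


def lagrange_multiplier(sigmas: Sequence[float], r_budget: float,
                        a: float, b: float) -> float:
    """Solve equation (31) for lambda."""
    n = len(sigmas)
    c = (r_budget + 384.0 * n * a - n * b) / (
        a * sum(math.sqrt(s) for s in sigmas))
    return math.sqrt(2.0) / (a * (c / 384.0) ** 2)


def rho_from_lambda(sigma: float, lam: float, a: float) -> float:
    """Equation (29): rho_i = 384 * sqrt(sqrt(2) * sigma_i / (A * lambda)) - 384."""
    return 384.0 * math.sqrt(math.sqrt(2.0) * sigma / (a * lam)) - 384.0


sigmas = [16.0, 12.25, 9.0]  # hypothetical residual standard deviations
direct = allocate_rho_eq32(sigmas, r_budget=3000.0, a=6.0, b=20.0)
lam = lagrange_multiplier(sigmas, r_budget=3000.0, a=6.0, b=20.0)
via_lambda = [rho_from_lambda(s, lam, a=6.0) for s in sigmas]
```

Here $\rho_{budget} = (3000 - 3 \cdot 20)/6 = 490$. Both routes give the same three values, and those values sum to 490.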
Although the distortion is modeled differently in equation (32), bit allocation model II is obtained from this model as follows, with the macroblock weights $w_i$ incorporated:
$$ \rho_i = \frac{\sqrt{w_i\sigma_i}}{\sum_{j=1}^{N}\sqrt{w_j\sigma_j}}\,\rho_{budget}. \qquad (33) $$
Equation (33) may be implemented, for example, by bit allocation module 48.
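Bit allocation model II reduces to a proportional split of the frame's ρ budget. The Python sketch below assumes equation (33) has the form $\rho_i = \sqrt{w_i\sigma_i}\,\rho_{budget}/\sum_j\sqrt{w_j\sigma_j}$, i.e., the weighted counterpart of the first term of equation (32). The function name and sample values are hypothetical.

```python
import math
from typing import List, Sequence


def allocate_rho_model2(weights: Sequence[float], sigmas: Sequence[float],
                        rho_budget: float) -> List[float]:
    """Bit allocation model II: rho_i proportional to sqrt(w_i * sigma_i)."""
    roots = [math.sqrt(w * s) for w, s in zip(weights, sigmas)]
    total = sum(roots)
    if total <= 0.0:
        raise ValueError("at least one macroblock needs positive w_i * sigma_i")
    return [rho_budget * r / total for r in roots]


# Hypothetical frame: one ROI macroblock (w = 0.9), two non-ROI (w = 0.05).
rho = allocate_rho_model2([0.9, 0.05, 0.05], [16.0, 16.0, 4.0], rho_budget=300.0)
```

The whole of $\rho_{budget}$ is distributed. Among macroblocks with equal weight the split follows $\sqrt{\sigma_i}$, so the two non-ROI macroblocks above receive ρ values in the ratio √16 : √4 = 2 : 1.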
Figure 12 is a graph comparing the overall perceptual quality of coding techniques that use weighted bit allocation models I and II with that of the optimal solution. The optimal solution is obtained by Lagrangian relaxation, and bit allocation models I and II are implemented as described above. Figure 12 plots PSNR (in dB) versus frame number for ROI coding of the first 100 frames of the standard Foreman video test sequence. In Figure 12, the optimal solution, bit allocation model I, and bit allocation model II are identified by reference numerals 91, 93, and 95, respectively. For bit allocation models I and II, the value of α used in weight assignment equation (5) is 0.9. As shown in Figure 12, both bit allocation models I and II perform very well compared with the optimal solution.
Figure 13 is a flowchart illustrating a non-ROI ("background") skipping technique. The ability to skip encoding of the non-ROI area of a video frame can yield significant savings in bit allocation. If the non-ROI is not encoded (i.e., is skipped), the bits that would otherwise be allocated to it can instead be reallocated to encode the ROI, improving the visual quality of the MBs in the ROI. When the non-ROI is skipped for a given frame, the non-ROI encoded for the previous frame is repeated, or an interpolated non-ROI area is substituted into the current frame. Besides conserving bits for ROI coding, skipping the non-ROI area can also improve the temporal quality of the current frame. In particular, presenting the same non-ROI area in two or more consecutive frames tends to reduce temporal flicker in the non-ROI area.
At very low bit rates (e.g., 32 kbps), the non-ROI region is usually coded coarsely even if bits are distributed evenly among MBs. Temporal visual quality problems such as flicker then become significant. On the other hand, in most video telephony applications the background is the non-ROI, and it contains very limited motion. Background skipping is therefore a way to reallocate bits to improve the quality of the ROI and of the coded non-ROI regions, provided the skipping does not severely degrade video fidelity.
Frame skipping is a common approach for conserving coding bits in very low bit-rate applications. Non-ROI skipping differs from frame skipping in that it still codes the ROI of every frame, which ensures good visual quality in the ROI. Frame skipping is very useful in many applications. In ROI video coding, however, it risks losing important information such as facial expressions. The risk is greatest when α in equation (1) is set to a large value, because any ROI distortion is then heavily penalized and can degrade overall performance. Non-ROI skipping is therefore the better choice. Because background MBs dominate a typical video frame, it can usually save a large number of bits for improving ROI quality.
As shown in Figure 13, the non-ROI skipping technique groups consecutive frames into a unit. The unit includes the ROI area of each frame and a common non-ROI area shared among the frames. In the example of Figure 13, two consecutive frames are grouped. Non-ROI background skipping module 50 groups frames i and i+1 into a unit (96) and notifies video encoder 58 of the frame in which the non-ROI area will be skipped. In response, video encoder 58 encodes the respective ROI areas of frames i and i+1 (98), using the weighted bit allocation provided by bit allocation module 48. Video encoder 58 also encodes the non-ROI area of frame i using the weighted bit allocation. It does not, however, encode the non-ROI area of frame i+1. Instead, the non-ROI area of frame i+1 is skipped, and the non-ROI area of the preceding frame i is provided in its place.
Non-ROI skipping may be provided on a full-time basis. For example, every two frames may be grouped into a unit so that the non-ROI is skipped continuously on an alternating-frame basis. In other words, the non-ROI may be skipped in every other frame on a full-time basis. Alternatively, skipping may be activated and deactivated adaptively, for instance deactivated when the non-ROI distortion produced by the most recent previous frame exceeds a distortion threshold. As shown in Figure 13, if the distortion in the non-ROI area of the previous frame is less than the threshold (102), the non-ROI of frame i+1 is skipped (104). The process then continues to the next group of two consecutive frames, as represented by the frame increment i = i+2 (106). In this case the level of non-ROI distortion is acceptable, and skipping is activated. If the non-ROI distortion is greater than the distortion threshold (102), the non-ROI area of frame i+1 is encoded using the weighted bit allocation (108). In this case skipping is deactivated because of excessive non-ROI distortion, i.e., excessive distortion in the non-ROI area of the relevant video scene.
Figure 14 is a diagram illustrating the grouping of consecutive frames into units to support non-ROI skipping. As shown in Figure 14, frames 0, 1, 2, and 3 are consecutive frames in a video sequence. In this example, frames 0 and 1 are grouped into unit 1, and frames 2 and 3 are grouped into unit 2. Each unit shares a common non-ROI area. With full-time skipping, or with adaptive skipping when the distortion is acceptable, the non-ROI area of frame 0 is repeated for frame 1, so the non-ROI area of frame 1 need not be encoded. Grouping frames into units can be applied throughout a video sequence. In the example of Figure 14, two frames are grouped into a unit. In some applications, however, more than two frames may be grouped into a unit, and the non-ROI is skipped in all but one of the frames in the unit.
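The full-time grouping of Figures 13 and 14 can be expressed as a per-frame schedule. In the Python sketch below (the function name is hypothetical), frames are grouped into consecutive units of `unit_size` frames. Only the first frame of each unit has its non-ROI area encoded, and the remaining frames reuse it. ROI areas are encoded in every frame and are not modeled.

```python
from typing import List


def non_roi_schedule(num_frames: int, unit_size: int = 2) -> List[bool]:
    """Return, for each frame, True if its non-ROI area is encoded.

    Frames 0..unit_size-1 form the first unit, the next unit_size frames
    form the second unit, and so on. Within a unit only the first frame's
    non-ROI area is encoded; the other frames repeat it.
    """
    if num_frames < 0 or unit_size < 1:
        raise ValueError("num_frames must be >= 0 and unit_size >= 1")
    return [frame % unit_size == 0 for frame in range(num_frames)]


# Figure 14: frames 0-3 grouped into two-frame units.
print(non_roi_schedule(4))  # [True, False, True, False]
```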
Figure 15 is a diagram illustrating the encoding of consecutive ROI areas and a common non-ROI area. When consecutive frames 0 and 1 are grouped into a unit, ROI areas 110 and 112 in frames 0 and 1, respectively, are each encoded. The non-ROI area 114 of frame 0, however, is repeated for both frame 0 and frame 1, so the non-ROI area (not shown) of frame 1 is skipped. This avoids the bits that would otherwise be consumed encoding the non-ROI of frame 1. Note that non-ROI area 114 in Figure 15 is referred to as "background" even though it may include foreground features such as a person's shoulders. Accordingly, "background" is used in this disclosure to refer to any area outside the ROI and should not be taken as strictly limited to background imagery in the video scene. Non-ROI skipping is described in further detail below.
An exemplary prototype system for implementing non-ROI skipping module 50 of Fig. 4 will now be described. In the prototype system, every two frames are grouped into a unit, as described above with reference to Figures 13-15. In each unit, the first non-ROI area is encoded and the second non-ROI area is skipped (e.g., by using predicted MBs with zero motion vectors). Bit allocation for each unit can follow the same logic as "greedy" frame-level bit allocation, which assumes that the content complexity of the video frames is evenly distributed across the sequence. Under this assumption, bits should be distributed evenly among the two-frame units:
$$ \rho_{unit_i} = \frac{\rho_{Sequence} - \rho_{used}}{\frac{M}{2} - i}, \qquad (34) $$
where $\rho_{Sequence}$ is the total ρ budget for a group of M consecutive frames in the video sequence, $\rho_{unit_i}$ is the ρ allocation for the i-th unit, and $\rho_{used}$ is the ρ consumed by the first (i-1)/2 units. Within a unit, either bit allocation model (I or II) can be used to allocate bits to the MBs in the ROI and non-ROI areas.
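The greedy unit-level budget of equation (34) can be sketched as follows. This is an illustration only. It assumes M is even and reads i as the number of two-frame units already coded (so that M/2 − i units remain), which is one interpretation of the indexing in equation (34). The function name is hypothetical.

```python
def unit_rho_budget(rho_sequence: float, rho_used: float,
                    num_frames: int, units_coded: int) -> float:
    """Equation (34): split the remaining rho budget evenly over remaining units.

    Assumes num_frames (M) is even and units_coded (i) counts the two-frame
    units already coded, so M/2 - i units are left.
    """
    units_left = num_frames // 2 - units_coded
    if units_left <= 0:
        raise ValueError("no units left in this group of frames")
    return (rho_sequence - rho_used) / units_left


# 10 frames -> 5 units sharing a rho budget of 3000.
first = unit_rho_budget(3000.0, 0.0, num_frames=10, units_coded=0)  # 600.0
# If the first two units overspend (1300 used), the remaining three get less.
third = unit_rho_budget(3000.0, 1300.0, num_frames=10, units_coded=2)
```

Because each call divides whatever budget remains, overspending or underspending in earlier units is automatically spread over the units that follow.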
Several tests were performed to evaluate the non-ROI skipping described herein. The tests compared the following bit allocation techniques: (a) a weighted bit allocation algorithm based on model II with full-time non-ROI skipping; (b) a weighted bit allocation algorithm based on model II without non-ROI skipping; and (c) a "greedy" algorithm, which treats ROI and non-ROI MBs equally in the bit allocation process. The tests were run on the first 150 frames of the standard "Carphone" QCIF video sequence at 15 frames per second (fps). The results of the comparison are shown in Figures 16-23.
Figure 16 is a graph comparing the overall perceptual quality of coding techniques (a), (b), and (c) above. In particular, Figure 16 plots perceptual PSNR (in decibels (dB)) over a range of coding rates (in kilobits per second (kbps)). Figure 17 is a graph comparing the overall video fidelity of coding techniques (a), (b), and (c). "Overall" video fidelity refers to the ROI and non-ROI areas combined (i.e., the video fidelity of the entire frame) and may also be called "frame" video fidelity. Figure 17 plots "frame" PSNR (in dB) over a range of coding rates (in kbps).
Figures 18 and 19 are graphs comparing the ROI video fidelity and the non-ROI video fidelity, respectively, of coding techniques (a), (b), and (c). Both figures plot PSNR (in dB) over a range of coding rates (in kbps). In Figure 18, ROI video fidelity refers to the video fidelity within the ROI area of the video frame. In Figure 19, non-ROI video fidelity refers to the video fidelity within the non-ROI area of the video frame. Figures 16-19 use a user preference factor α = 0.9 in the weighted bit allocation algorithm. In each of Figures 16-19, the curves for the following techniques are identified by reference numerals 116, 118, and 120, respectively:
(a) weighted bit allocation with full-time non-ROI skipping;
(b) weighted bit allocation without skipping;
(c) the greedy algorithm.
Figures 20 and 21 are graphs comparing the overall perceptual quality and the overall video fidelity, respectively, of coding techniques (a), (b), and (c). Figure 20 plots perceptual PSNR (in dB), and Figure 21 plots PSNR (in dB), each over a range of coding rates (in kbps). Figures 20 and 21 use a user preference factor α = 0.7 in the weighted bit allocation algorithm. Figures 22 and 23 are graphs comparing the overall perceptual quality and the overall video fidelity, respectively, of coding techniques (a), (b), and (c), using a user preference factor α = 0.5. In Figures 20-23, the curves for the following techniques are identified by reference numerals 116, 118, and 120, respectively:
(a) weighted bit allocation with full-time non-ROI skipping;
(b) weighted bit allocation without skipping;
(c) the greedy algorithm.
For the test results shown in Figures 16-23, four video quality measures (perceptual PSNR, frame PSNR, ROI PSNR, and non-ROI PSNR) are defined as follows:
1. Perceptual PSNR $= -10\log_{10} D_{Frame}$;
2. Frame PSNR $= -10\log_{10} D_F(f,\tilde{f})$;
3. ROI PSNR $= -10\log_{10} D_{RF}(f,\tilde{f})$; and
4. Non-ROI PSNR $= -10\log_{10} D_{NF}(f,\tilde{f})$.
In the above expressions:
- $D_{Frame}$ is the overall temporal and spatial distortion of the frame.
- $D_F$ is the video fidelity between the original frame and the reconstructed frame.
- $D_{RF}$ is the video fidelity between the ROI areas of the original and reconstructed frames.
- $D_{NF}$ is the video fidelity between the non-ROI areas of the original and reconstructed frames.

Perceptual PSNR is shown in Figures 16, 20, and 22, and frame PSNR in Figures 17, 21, and 23. ROI PSNR is shown in Figure 18 and non-ROI PSNR in Figure 19. The results in Figures 16-23 show that, in all tests, the proposed non-ROI skipping approach gains more than 1 dB in perceptual PSNR (PPSNR) over the other methods. The gain comes mainly from improved ROI quality, as shown in Figures 18 and 19. It is achieved by reallocating bits from the non-ROI to the ROI in the coded frames.
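The Python sketch below shows the frame, ROI, and non-ROI PSNR measures. It assumes, as the $255^2$ scaling in equation (6) suggests, that $D_F$, $D_{RF}$, and $D_{NF}$ are mean squared errors over the relevant pixels normalized by $255^2$. The perceptual measure $D_{Frame}$ also includes temporal and spatial terms defined elsewhere in the disclosure, so it is not reproduced here. The function names and sample pixel values are hypothetical.

```python
import math
from typing import Optional, Sequence


def normalized_mse(orig: Sequence[int], recon: Sequence[int],
                   mask: Optional[Sequence[bool]] = None) -> float:
    """Mean squared error over 8-bit samples divided by 255**2.

    If a mask is given, only samples where mask is True are used (e.g. the
    ROI for D_RF, or its complement for D_NF).
    """
    if len(orig) != len(recon):
        raise ValueError("frames must have the same number of samples")
    if mask is None:
        mask = [True] * len(orig)
    pairs = [(o, r) for o, r, m in zip(orig, recon, mask) if m]
    if not pairs:
        raise ValueError("no samples selected")
    return sum((o - r) ** 2 for o, r in pairs) / (len(pairs) * 255.0 ** 2)


def psnr(d: float) -> float:
    """PSNR in dB from a positive normalized distortion d: -10 * log10(d)."""
    return -10.0 * math.log10(d)


orig = [100, 120, 140, 160]
recon = [101, 120, 130, 160]
roi = [True, True, False, False]  # hypothetical ROI mask
frame_psnr = psnr(normalized_mse(orig, recon))
roi_psnr = psnr(normalized_mse(orig, recon, roi))
non_roi_psnr = psnr(normalized_mse(orig, recon, [not m for m in roi]))
```

In this toy frame most of the error lies outside the ROI. ROI PSNR is therefore the highest, non-ROI PSNR the lowest, and frame PSNR falls between them.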
One interesting observation is that, at low bit rates, the non-ROI (background) skipping approach also outperforms the other methods in frame PSNR, as shown in Figures 17, 21, and 23. The graphs also show that the frame PSNR gain increases as the user preference factor α decreases. These observations indicate that non-ROI skipping is very attractive for very low bit-rate applications such as wireless VT, because it outperforms the other methods in both video fidelity and visual quality. The weighted bit allocation approach is expected to outperform the greedy algorithm when α is large (e.g., α = 0.9 in Figure 16). This advantage, however, diminishes as α decreases, as shown in Figures 20 and 22.
Additional tests compared two weighted bit allocation techniques: one incorporating non-ROI skipping, and one relying on frame skipping (i.e., skipping the entire frame rather than only the non-ROI area). Figure 24 is a graph comparing the perceptual quality of ROI coding with standard frame skipping against ROI coding with background skipping. Both cases use weighted bit allocation as described herein. In one case, non-ROI (background) skipping is applied. In the other, full-time frame skipping is applied, so that every other frame is skipped on an alternating basis. Figure 24 plots perceptual PSNR (in dB) versus rate (in kbps). Reference numerals 122, 124, and 126 identify the curves for weighted bit allocation with frame skipping and user preference factors α = 0.9, 0.7, and 0.5, respectively. Reference numerals 128, 130, and 132 identify the corresponding curves with non-ROI skipping. As shown in Figure 24, weighted bit allocation with non-ROI skipping outperforms weighted bit allocation with frame skipping at every setting of α. The performance gain from non-ROI skipping increases as α increases. This result is reasonable: when α is large, the penalty for skipping the ROI through frame skipping is heavier.
As Figures 16-24 show, the non-ROI background skipping approach performs well, especially when the non-ROI remains relatively static. For video sequences whose non-ROI areas contain a large amount of motion, however, the performance gain may diminish. Important background information may also be skipped, degrading system performance. Background skipping should therefore be turned off when it would severely degrade video fidelity, e.g., when the background contains important information. As an example, ROI coding using weighted bit allocation was applied with non-ROI skipping turned on and off to frames 180 to 209 of the standard Carphone video test sequence, in which the background moves quickly. Figure 25 shows the results of this analysis. More specifically, Figure 25 is a graph comparing the perceptual quality of ROI coding with weighted bit allocation, as described herein, with non-ROI skipping on and off.
Figure 25 plots perceptual PSNR (in dB) versus rate (in kbps). Reference numerals 134 and 136 identify the curves for weighted bit allocation with non-ROI skipping on and user preference factors α = 0.9 and 0.5, respectively. Reference numerals 138 and 140 identify the corresponding curves with non-ROI skipping off. The results in Figure 25 indicate that the advantage of non-ROI skipping diminishes as α decreases (e.g., from 0.9 to 0.5). They also point to the value of an adaptive non-ROI skipping approach that dynamically controls skipping based on the content of the video sequence and the user's degree of interest (represented by the user preference factor α).
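The distortion produced by weighted bit allocation with and without non-ROI skipping can be compared explicitly, as follows:
$$ D_{Skip\_on} = \alpha D_{RF}(\rho_1) + (1-\alpha) D_{NF}(\rho_2) + \alpha D_{RF}(\rho_{unit} - \rho_1 - \rho_2) + (1-\alpha) D_{NonROI\_skip}, \qquad (35) $$
$$ D_{Skip\_off} = \alpha D_{RF}(\rho_1') + (1-\alpha) D_{NF}(\rho_2') + \alpha D_{RF}(\rho_3') + (1-\alpha) D_{NF}(\rho_{unit} - \rho_1' - \rho_2' - \rho_3'), \qquad (36) $$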
where:
- $D_{Skip\_on}$ is the total distortion of the unit when non-ROI skip mode is on.
- $D_{Skip\_off}$ is the total distortion of the unit when background skip mode is off.
- $D_{NonROI\_skip}$ is the distortion caused by skipping the non-ROI in the second frame of the unit.
- $\rho_1$ and $\rho_2$ in equation (35), and $\rho_1'$, $\rho_2'$, and $\rho_3'$ in equation (36), are the numbers of AC coefficients (ρ) allocated to the ROI and non-ROI.
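Equations (35) and (36) show that $D_{Skip\_on} > D_{Skip\_off}$ holds only when $D_{NonROI\_skip} \gg D_{NF}(\rho_{unit} - \rho_1' - \rho_2' - \rho_3')$, because the following usually holds:
$$ \alpha D_{RF}(\rho_1) + (1-\alpha) D_{NF}(\rho_2) + \alpha D_{RF}(\rho_{unit} - \rho_1 - \rho_2) < \alpha D_{RF}(\rho_1') + (1-\alpha) D_{NF}(\rho_2') + \alpha D_{RF}(\rho_3') $$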
The statistics of $D_{NonROI\_skip}$ for the Carphone video test sequence, shown in Figure 26, confirm this observation. Figure 26 is a graph illustrating the distortion caused by background skipping over an exemplary video sequence. It plots the average residual energy of the non-ROI region, $D_{NonROI\_skip}$, versus frame number over the first 240 frames of the Carphone video test sequence. Figure 26 makes clear that $D_{NonROI\_skip}$ is far larger during frames 180-209 than elsewhere, and these frames are characterized by a high degree of motion. Therefore, although non-ROI skipping is usually advantageous, it is not advantageous during the high-motion portion represented by frames 180-209.
Based on this observation, finding a criterion for turning background skip mode on and off reduces to finding a threshold for the distortion $D_{NonROI\_skip}$. If the unit distortion in a video sequence is assumed to change smoothly (which is usually the case), the average distortion of the most recently processed units can be used to derive the threshold. Let $\bar{D}_n$ denote the average distortion of the most recent n units. Then, based on (35) and (36), if $(1-\alpha) D_{NonROI\_skip} > \frac{\bar{D}_n}{2}$ holds, it is very likely that $D_{Skip\_on} > D_{Skip\_off}$. In other words, the criterion for turning off non-ROI skipping can be specified as $D_{NonROI\_skip} > \frac{\bar{D}_n}{2(1-\alpha)}$. This criterion can serve as the basis of an adaptive non-ROI skip algorithm.
The adaptive non-ROI skip algorithm can be substantially consistent with the process shown in Figure 13. It can be described further as follows.
Step 0: Initialize the data; set $\bar{D}_n = 0$ and skip mode = on.
Step 1: Allocate the ρ budget for the current unit (a group of two consecutive frames $F_n$ and $F_{n+1}$) using equation (34).
Step 2: Within the current unit, allocate bits to each macroblock using equation (32). If skip mode is on, allocate no bits to the non-ROI of the second frame in the unit.
Step 3: After obtaining the distortion $D_n$ of the current unit, update $\bar{D}_n = (1-\eta)\bar{D}_{n-1} + \eta D_n$, where η is a learning factor in the range [0, 1].
Step 4: Obtain the data for the next unit. If this is the last unit, go to Step 6.
Step 5: Compute the distortion $D_{NonROI\_skip}$ for the new unit (a group of the next two frames $F_{n+2}$ and $F_{n+3}$). If $D_{NonROI\_skip} > \frac{\bar{D}_n}{2(1-\alpha)}$, turn skip mode off; otherwise, turn skip mode on. Return to Step 1.
Step 6: Terminate the adaptive skip algorithm.
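Steps 0-6 can be written as a loop, as in the minimal Python sketch below; it is not the prototype implementation. The callbacks are hypothetical: `encode_unit` stands in for Steps 1-2 (budgeting with equation (34) and per-macroblock allocation with equation (32)), and `skip_distortion` returns $D_{NonROI\_skip}$ for a unit. The parameter `eta` is the learning factor of Step 3, not the Laplacian parameter η of equations (18)-(23).

```python
from typing import Callable, List


def adaptive_non_roi_skip(num_units: int,
                          encode_unit: Callable[[int, bool], float],
                          skip_distortion: Callable[[int], float],
                          alpha: float,
                          eta: float = 0.25) -> List[bool]:
    """Run Steps 0-6 and return the skip-mode setting used for each unit."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1) for the threshold to exist")
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    d_avg = 0.0       # Step 0: running average distortion
    skip_on = True    # Step 0: skip mode starts on
    decisions: List[bool] = []
    for n in range(num_units):
        decisions.append(skip_on)
        d_n = encode_unit(n, skip_on)              # Steps 1-2
        d_avg = (1.0 - eta) * d_avg + eta * d_n    # Step 3
        if n + 1 == num_units:                     # Step 4 -> Step 6
            break
        threshold = d_avg / (2.0 * (1.0 - alpha))  # Step 5
        skip_on = skip_distortion(n + 1) <= threshold
    return decisions


# Toy run: constant unit distortion, one high-motion unit (index 2).
skip_d = [0.0, 1.0, 100.0, 1.0]
modes = adaptive_non_roi_skip(4, lambda n, on: 10.0, lambda n: skip_d[n],
                              alpha=0.5)
print(modes)  # [True, True, False, True]
```

With α = 0.5 the threshold equals the running average $\bar{D}_n$. Only the high-motion unit, whose skip distortion far exceeds that average, has skipping turned off.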
Figure 27 is a graph comparing the overall perceptual quality of ROI coding with non-ROI skipping, without non-ROI skipping, and with adaptive non-ROI skipping. All three cases use the weighted bit allocation algorithm described herein. Figure 27 plots perceptual PSNR (in dB) versus rate (in kbps) for ROI video coding of frames 180-209 of the standard Carphone video test sequence. Each pair of reference numerals below corresponds to user preference factors α = 0.9 and 0.5, respectively:
- 142 and 144: weighted bit allocation with non-ROI skipping on.
- 146 and 148: weighted bit allocation with non-ROI skipping off.
- 150 and 152: weighted bit allocation with adaptive non-ROI skipping.

In this evaluation, η was set to 0.25. The results in Figure 27 show that, for the various values of α, the adaptive non-ROI skipping approach comes very close to the best solution.
Figures 28-33 show additional experimental results for ROI coding with the weighted bit allocation technique described in this disclosure. Figures 28-32 apply various ROI coding techniques to the standard Carphone video test sequence. For Figures 28-32, the user preference factor α in the weighted bit allocation methods ("proposed method" and "weighted bit allocation") is set to 0.9. The "proposed method" label refers to weighted bit allocation with non-ROI skipping. The "weighted bit allocation" label refers to weighted bit allocation without non-ROI skipping.
Figure 28 is a graph comparing the overall perceptual quality of ROI coding with various bit allocation techniques, plotting perceptual PSNR versus rate. In Figure 28, reference numerals 154, 156, 158, 160, and 162 identify the curves for the following techniques, respectively:
- a frame skipping method;
- the weighted bit allocation method with non-ROI skipping;
- the greedy algorithm;
- a constant QP algorithm;
- the weighted bit allocation method without non-ROI skipping.
Figure 29 is a graph comparing the overall perceptual quality of ROI coding with various bit allocation techniques at a coding rate of 40 kilobits per second (kbps). In particular, Figure 29 plots perceptual PSNR versus frame number for weighted bit allocation with non-ROI skipping, the greedy algorithm, and the constant QP algorithm.
Figure 30 is a graph comparing the overall video fidelity of ROI coding with various bit allocation techniques at a coding rate of 40 kbps, plotting PSNR versus frame number. Figure 31 compares the ROI video fidelity of the same techniques at 40 kbps, plotting PSNR in the ROI versus frame number. Figure 32 compares their non-ROI video fidelity at 40 kbps, plotting non-ROI PSNR versus frame number.
In Figures 29-32, weighted bit allocation with non-ROI skipping is indicated by reference numeral 164, the greedy algorithm by reference numeral 166, and the constant QP algorithm by reference numeral 168. The constant QP algorithm is a frame-level-only rate control algorithm in which all MBs in a frame are assigned the same quantizer. The greedy algorithm, described above, operates at the MB level. The frame skipping algorithm applies standard frame skipping, so that the content of every other frame is not encoded; both the ROI and non-ROI areas of that frame are skipped. Weighted bit allocation without non-ROI skipping and weighted bit allocation with adaptive non-ROI skipping (the "proposed method") are described above.
Figure 28 shows that the proposed method outperforms all other methods over the entire bit rate range, with a performance gain of up to 2 dB. Figures 29-32 show frame-level details of the proposed method, the greedy algorithm, and the constant QP algorithm.
Figure 33 is a graph comparing the overall perceptual quality of ROI coding with various bit allocation techniques on another exemplary video sequence, over a range of coding rates. In particular, Figure 33 plots perceptual PSNR versus rate over the first 180 frames of the standard Foreman video test sequence. As in Figure 28, reference numerals 154, 156, 158, 160, and 162 identify the curves for, respectively:
- the frame skipping method;
- the weighted bit allocation method with non-ROI skipping;
- the greedy algorithm;
- the constant QP algorithm;
- the weighted bit allocation method without non-ROI skipping.
As shown in Figure 33, the frame skipping method performs worse than on the Carphone sequence, because the face in the Foreman sequence contains much more motion. Frame skipping therefore discards more ROI information in the Foreman sequence, resulting in unsatisfactory performance. Notably, the proposed method of weighted bit allocation with adaptive non-ROI skipping performs very well on the Foreman sequence, as Figure 33 shows.
This disclosure has described various techniques that support ROI coding for video telephony and video streaming applications, especially under very low bit-rate requirements (e.g., in wireless video telephony). The disclosure provides two different optimized weighted bit allocation schemes in the ρ domain for ROI video coding. It also provides an adaptive non-ROI ("background") skipping approach that works together with the weighted bit allocation models to achieve better performance. In addition, it provides a video quality metric for measuring ROI video quality. The ROI quality metric jointly considers the user's preference for the ROI, video fidelity, spatial perceptual quality, and temporal perceptual quality. It can thereby guide the optimized bit allocation techniques toward better subjective visual quality. The metric also enables user interaction to bias the coding parameters so that subjective perceptual quality requirements are met.
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the techniques may be realized in part by a computer-readable medium comprising program code that contains instructions which, when executed, perform one or more of the methods described above. In this case, the computer-readable medium may comprise random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like.
The program code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. In some embodiments, the functionality described herein may be provided within dedicated software modules or hardware units configured for encoding and decoding, or incorporated in a combined video encoder-decoder (CODEC).
Various embodiments have been described. These and other embodiments are within the scope of the following claims.

Claims (29)

1. A method for region-of-interest coding of a video signal using ρ-domain bit allocation, comprising:
obtaining a definition of a region of interest within a video frame;
obtaining a frame budget defining a number of encoding bits available for the frame; and
allocating ρ-domain values to macroblocks within the frame based on the frame budget and a weighting between macroblocks within the region of interest and macroblocks in areas of the video frame that are not within the region of interest, wherein each of the ρ-domain values represents the number of non-zero AC coefficients in a macroblock.
2. The method of claim 1, further comprising mapping the ρ-domain values to corresponding quantization parameter (QP) values to allocate a number of encoding bits to each of the macroblocks.
3. The method of claim 2, further comprising encoding the macroblocks of the video frame using the allocated encoding bits.
4. The method of claim 2, wherein the number of allocated bits is less than or equal to the number of bits specified by the frame budget.
5. The method of claim 2, wherein the weighting is based at least in part on distortion of a preceding frame.
6. The method of claim 2, wherein the weighting is based at least in part on video fidelity of a preceding frame, perceptual quality of the preceding frame, and user preference for the region of interest.
7. The method of claim 6, wherein the perceptual quality includes a temporal quality value and a spatial quality value of the preceding frame.
8. The method of claim 7, wherein the temporal quality value includes a first temporal quality value for the region of interest and a second temporal quality value for areas of the video frame that are not within the region of interest.
9. The method of claim 7, wherein the spatial quality value includes a first spatial quality value for the region of interest and a second spatial quality value for areas of the video frame that are not within the region of interest.
10. The method of claim 7, wherein the spatial quality value is based at least in part on the presence of blocking artifacts in the preceding frame, and wherein the temporal quality value is based at least in part on the presence of temporal flicker artifacts in the preceding frame.
11. The method of claim 6, wherein the video fidelity includes a distortion value based at least in part on a peak signal-to-noise ratio (PSNR) of the preceding frame.
12. The method of claim 1, wherein allocating the ρ-domain values includes allocating the ρ-domain values based on an indication of whether coding of the areas that are not within the region of interest will be skipped.
13. The method of claim 12, further comprising grouping successive frames into frame units, encoding the regions of interest associated with each of the frames, and skipping encoding of areas of the video frame that are not within the respective region of interest for at least one of the frames.
14. The method of claim 12, further comprising selectively skipping encoding of areas that are not within the respective region of interest for at least one of the frames when a distortion value associated with a preceding frame unit is less than a threshold value.
15. A device for region-of-interest coding of a video signal using ρ-domain bit allocation, comprising:
a region-of-interest mapper that generates a definition of a region of interest within a video frame;
a frame-level rate controller that generates a frame budget defining a number of encoding bits available for the frame; and
a bit allocation module that allocates ρ-domain values to macroblocks within the frame based on the frame budget and a weighting between macroblocks within the region of interest and macroblocks in areas of the video frame that are not within the region of interest, wherein each of the ρ-domain values represents the number of non-zero AC coefficients in a macroblock.
16. The device of claim 15, further comprising a ρ-QP mapper that maps the ρ-domain values to corresponding quantization parameter (QP) values to allocate a number of encoding bits to each of the macroblocks.
17. The device of claim 15, further comprising a video encoder that encodes the macroblocks of the video frame using the allocated encoding bits.
18. The device of claim 15, wherein the number of allocated bits is less than or equal to the number of bits specified by the frame budget.
19. The device of claim 15, wherein the weighting is based at least in part on video fidelity of a preceding frame, perceptual quality of the preceding frame, and user preference for the region of interest.
20. The device of claim 19, wherein the perceptual quality includes a temporal quality value and a spatial quality value of the preceding frame.
21. The device of claim 20, wherein the temporal quality value includes a first temporal quality value for the region of interest and a second temporal quality value for areas of the video frame that are not within the region of interest.
22. The device of claim 20, wherein the spatial quality value includes a first spatial quality value for the region of interest and a second spatial quality value for areas of the video frame that are not within the region of interest.
23. The device of claim 20, wherein the spatial quality value is based at least in part on the presence of blocking artifacts in the preceding frame, and wherein the temporal quality value is based at least in part on the presence of temporal flicker artifacts in the preceding frame.
24. The device of claim 19, wherein the video fidelity includes a distortion value based at least in part on a peak signal-to-noise ratio (PSNR) of the preceding frame.
25. The device of claim 15, wherein the bit allocation module allocates the ρ-domain values based on an indication of whether coding of the areas that are not within the region of interest will be skipped.
26. The device of claim 25, further comprising:
a video encoder that encodes the macroblocks of the video frame; and
a skipping module that directs the video encoder to group successive frames into frame units, encode the regions of interest associated with each of the frames, and skip encoding of areas of the video frame that are not within the respective region of interest for at least one of the frames in the frame unit.
27. The device of claim 26, wherein the skipping module directs the video encoder to selectively skip encoding of areas of the video frame that are not within the respective region of interest for at least one of the frames when a distortion value associated with a preceding frame unit is less than a threshold value.
28. The device of claim 15, further comprising a wireless transmitter that transmits the encoded video frames via wireless communication, wherein the device is configured to support mobile video telephony.
29. A device for region-of-interest coding of a video signal using ρ-domain bit allocation, comprising:
means for obtaining a definition of a region of interest within a video frame;
means for obtaining a frame budget defining a number of encoding bits available for the frame; and
means for allocating ρ-domain values to macroblocks within the frame based on the frame budget and a weighting between macroblocks within the region of interest and macroblocks in areas of the video frame that are not within the region of interest, wherein each of the ρ-domain values represents the number of non-zero AC coefficients in a macroblock.
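The claims above describe three steps: split a frame budget of ρ-domain values (non-zero AC coefficients) across macroblocks with an ROI/non-ROI weighting, map each macroblock's ρ value to a QP, and, in frame units, optionally skip non-ROI coding. The following is a minimal sketch of those steps under loudly stated assumptions. The proportional weighting rule (α inside the ROI, 1 − α outside, scaled by a per-macroblock complexity) is a simplification and not the optimized allocation the claims refer to. The quantizer model (a coefficient survives when |c| ≥ 2·QP, QP in 1 to 31), and the policy that the first frame of each unit always codes its background, are also assumptions. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def allocate_rho(rho_budget: int, is_roi: Sequence[bool], alpha: float,
                 complexity: Optional[Sequence[float]] = None,
                 skip_non_roi: bool = False) -> List[int]:
    """Split a frame's rho budget across macroblocks (simplified proportional rule).

    Macroblock i gets a share proportional to w_i * complexity_i, where w_i = alpha
    inside the ROI and 1 - alpha outside it. When skip_non_roi is set, non-ROI
    macroblocks get nothing. The integer results never sum to more than rho_budget.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(is_roi)
    comp = list(complexity) if complexity is not None else [1.0] * n
    if len(comp) != n:
        raise ValueError("complexity must have one entry per macroblock")
    weights = []
    for roi, c in zip(is_roi, comp):
        if roi:
            w = alpha
        elif skip_non_roi:
            w = 0.0
        else:
            w = 1.0 - alpha
        weights.append(w * c)
    total = sum(weights)
    if rho_budget <= 0 or total <= 0.0:
        return [0] * n
    shares = [rho_budget * w / total for w in weights]
    alloc = [math.floor(s) for s in shares]
    # Give the units lost to rounding down to the largest fractional remainders.
    leftover = rho_budget - sum(alloc)
    candidates = sorted((i for i in range(n) if weights[i] > 0.0),
                        key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in candidates[:leftover]:
        alloc[i] += 1
    return alloc


def rho_to_qp(ac_coeffs: Sequence[float], rho_target: int,
              qp_min: int = 1, qp_max: int = 31) -> int:
    """Smallest QP that leaves at most rho_target non-zero AC coefficients.

    Assumed quantizer: a coefficient stays non-zero when |c| >= 2 * QP. A real
    encoder would build its rho-QP table from its own quantizer. Returns qp_max
    if even that QP cannot meet the target.
    """
    for qp in range(qp_min, qp_max + 1):
        nonzero = sum(1 for c in ac_coeffs if abs(c) >= 2 * qp)
        if nonzero <= rho_target:
            return qp
    return qp_max


def skip_flags_for_unit(unit_size: int, prev_unit_distortion: Optional[float],
                        threshold: float) -> List[bool]:
    """Per-frame skip_non_roi flags for one frame unit of successive frames.

    Assumption: the first frame of every unit codes its background, and the other
    frames skip it only if the previous unit's distortion was below the threshold.
    """
    if unit_size < 1:
        raise ValueError("unit_size must be at least 1")
    skip = prev_unit_distortion is not None and prev_unit_distortion < threshold
    return [False] + [skip] * (unit_size - 1)


# One two-frame unit with four macroblocks: two in the ROI, two in the background.
for skip in skip_flags_for_unit(2, prev_unit_distortion=0.02, threshold=0.05):
    print(skip, allocate_rho(120, [True, True, False, False], alpha=0.8, skip_non_roi=skip))
```

Searching QP upward returns the finest quantization that still fits the macroblock's ρ share, because the non-zero count can only fall as QP grows. When the skip flag is set, the whole ρ budget flows to ROI macroblocks, which mirrors the dependence of allocation on the skip indication in claims 12 and 25.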
CN200680013595.8A 2005-03-01 2006-02-28 Region-of-interest coding in video telephony using rho domain bit allocation Expired - Fee Related CN101164342B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US65800805P 2005-03-01 2005-03-01
US60/658,008 2005-03-01
US11/199,935 US8768084B2 (en) 2005-03-01 2005-08-09 Region-of-interest coding in video telephony using RHO domain bit allocation
US11/199,935 2005-08-09
PCT/US2006/007154 WO2006093999A2 (en) 2005-03-01 2006-02-28 Region-of-interest coding in video telephony using rho domain bit allocation

Publications (2)

Publication Number Publication Date
CN101164342A CN101164342A (en) 2008-04-16
CN101164342B true CN101164342B (en) 2011-03-02

Family

ID=39298314

Family Applications (4)

Application Number Title Priority Date Filing Date
CN200680013595.8A Expired - Fee Related CN101164342B (en) 2005-03-01 2006-02-28 Region-of-interest coding in video telephony using rho domain bit allocation
CN200680013727.7A Expired - Fee Related CN101164343B (en) 2005-03-01 2006-02-28 Region-of-interest coding with background skipping for video telephony
CN200680013366.6A Expired - Fee Related CN101164341B (en) 2005-03-01 2006-02-28 Quality metric-biased region-of-interest coding for video telephony
CN200680013746.XA Expired - Fee Related CN101164344B (en) 2005-03-01 2006-03-01 Content-adaptive background skipping for region-of-interest video coding


Country Status (1)

Country Link
CN (4) CN101164342B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101742296B (en) * 2008-11-14 2016-01-20 北京中星微电子有限公司 Video coding and decoding method and device for reducing fluctuation in bitstream data volume
CN101494785B (en) * 2008-12-19 2012-05-09 无锡云视界科技有限公司 Method for H.264 region-of-interest encoding
RU2540846C2 (en) 2010-01-11 2015-02-10 Телефонактиеболагет Л М Эрикссон (Пабл) Video quality assessment technology
KR20120127409A (en) * 2010-01-22 2012-11-21 소니 주식회사 Reception device, transmission device, communication system, method for controlling reception device, and program
CN101854532B (en) * 2010-05-25 2012-12-19 无锡中星微电子有限公司 Video encoding device and method
CN102263943B (en) * 2010-05-25 2014-06-04 财团法人工业技术研究院 Video bit rate control device and method
CN101977309B (en) * 2010-06-30 2012-07-25 无锡中星微电子有限公司 Code rate control method and device thereof
TWI427531B (en) * 2010-10-05 2014-02-21 Aten Int Co Ltd Remote management system and the method thereof
KR20130129471A (en) 2011-04-11 2013-11-28 인텔 코오퍼레이션 Object of interest based image processing
CN102427533B (en) * 2011-11-22 2013-11-06 苏州科雷芯电子科技有限公司 Video transmission device and method
CN103428483B (en) * 2012-05-16 2017-10-17 华为技术有限公司 Media data processing method and device
US20140254659A1 (en) 2013-03-11 2014-09-11 Mediatek Inc. Video coding method using at least evaluated visual quality and related video coding apparatus
US9832521B2 (en) * 2014-12-23 2017-11-28 Intel Corporation Latency and efficiency for remote display of non-media content
CN105049947A (en) * 2015-06-19 2015-11-11 美国掌赢信息科技有限公司 Transmission method of instant video and electronic equipment
CN105163196A (en) * 2015-06-19 2015-12-16 美国掌赢信息科技有限公司 Real-time video coding method and electronic equipment
CN105163125A (en) * 2015-06-19 2015-12-16 美国掌赢信息科技有限公司 Real-time video coding method and electronic equipment
CN104994405A (en) * 2015-06-19 2015-10-21 美国掌赢信息科技有限公司 Instant-video transmission method and electronic equipment
WO2016202287A1 (en) * 2015-06-19 2016-12-22 美国掌赢信息科技有限公司 Real-time video transmission method and electronic apparatus
CN105163195A (en) * 2015-06-19 2015-12-16 美国掌赢信息科技有限公司 Real-time video coding method and electronic equipment
CN105163197A (en) * 2015-06-19 2015-12-16 美国掌赢信息科技有限公司 Real-time video coding method and electronic equipment
CN106162177B (en) * 2016-07-08 2018-11-09 腾讯科技(深圳)有限公司 Method for video coding and device
CN108833915A (en) * 2018-06-21 2018-11-16 杭州雄迈集成电路技术有限公司 Intelligent coding method with adaptive frame skipping
JP7092050B2 (en) * 2019-01-17 2022-06-28 日本電信電話株式会社 Multipoint control methods, devices and programs
CN111901603B (en) * 2020-07-28 2023-06-02 上海工程技术大学 Coding method and decoding method for static background video
CN116760988B (en) * 2023-08-18 2023-11-10 瀚博半导体(上海)有限公司 Video coding method and device based on human visual system
CN118101938B (en) * 2024-04-24 2024-06-25 华侨大学 VVC code rate control method and device based on region of interest

Citations (2)

Publication number Priority date Publication date Assignee Title
US6111991A (en) * 1998-01-16 2000-08-29 Sharp Laboratories Of America Method and apparatus for optimizing quantizer values in an image encoder
EP0892564B1 (en) * 1997-07-18 2002-04-03 Tektronix, Inc. Attentional maps in objective measurement of video quality degradation

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
JP3086396B2 (en) * 1995-03-10 2000-09-11 シャープ株式会社 Image encoding device and image decoding device
SE513356C2 (en) * 1998-11-20 2000-08-28 Ericsson Telefon Ab L M Method and apparatus for encrypting images
KR100643454B1 (en) * 2001-11-17 2006-11-10 엘지전자 주식회사 Method for video data transmission control
KR20050035539A (en) * 2002-09-06 2005-04-18 코닌클리케 필립스 일렉트로닉스 엔.브이. Content-adaptive multiple description motion compensation for improved efficiency and error resilience
JP3814592B2 (en) * 2003-06-27 2006-08-30 キヤノン株式会社 Imaging apparatus and control method thereof


Also Published As

Publication number Publication date
CN101164343B (en) 2013-02-13
CN101164344B (en) 2012-04-25
CN101164341A (en) 2008-04-16
CN101164343A (en) 2008-04-16
CN101164341B (en) 2012-05-02
CN101164344A (en) 2008-04-16
CN101164342A (en) 2008-04-16

Similar Documents

Publication Publication Date Title
CN101164342B (en) Region-of-interest coding in video telephony using rho domain bit allocation
JP5160903B2 (en) Region of interest coding using background skipping for videophones.
JP5410553B2 (en) Quality Metric Bias Region of Interest Coding for Videophones
JP5318424B2 (en) Content Adaptive Background Skip for Region of Interest Video Coding
JP5301645B2 (en) Region of interest coding using background skipping for videophones.
JP2008532427A5 (en)
JP2008532428A5 (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1117687

Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1117687

Country of ref document: HK

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110302

Termination date: 20190228
