CN101529913A - Picture identification for multi-view video coding - Google Patents

Picture identification for multi-view video coding

Info

Publication number
CN101529913A
Authority
CN
China
Prior art keywords
view
image
said image
dependency information
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200780039739
Other languages
Chinese (zh)
Inventor
Purvin Bibhas Pandit
Yeping Su
Peng Yin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thomson Licensing SAS
Original Assignee
Thomson Licensing SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing SAS
Publication of CN101529913A

Abstract

According to a general aspect, a picture from a first view, a picture from a second view, and dependency information are accessed. The dependency information describes one or more inter-view dependency relationships for the picture from the first view. Based on the dependency information, it is determined whether the picture from the first view is a reference picture for the picture from the second view. One application area involves determining whether pictures in a decoded picture buffer are reference pictures for pictures that have not yet been decoded. The pictures in the buffer may be marked to indicate whether they continue to be needed as inter-view reference pictures.

Description

Picture identification for multi-view video coding
Cross-Reference to Related Applications
This application claims the benefit of the following applications: (1) U.S. Provisional Application No. 60/853,932, filed October 24, 2006, and entitled "Decoded Reference Picture Management for MVC", and (2) U.S. Provisional Application No. 60/860,367, filed November 21, 2006, and entitled "Inter-View and Temporal Reference Picture Identification for MVC". Both of these prior applications are incorporated herein by reference in their entirety.
Technical field
The present principles relate generally to video coding and decoding.
Background
A video decoder may decode a picture and store that picture in memory until the decoder determines that the decoded picture is no longer needed. The decoded picture may be needed, for example, for decoding subsequent pictures that were encoded based on the decoded picture. In various systems, pictures are encoded as differences from a previous picture, known as a "reference picture", and the decoded reference picture is stored at the decoder until all subsequent pictures that use the reference picture have been decoded. Storing reference pictures consumes valuable memory at the decoder.
Summary of the invention
According to a general aspect, a picture from a first view, a picture from a second view, and dependency information are accessed. The dependency information describes one or more inter-view dependency relationships for the picture from the first view. Based on the dependency information, it is determined whether the picture from the first view is a reference picture for the picture from the second view.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that the implementations may be configured or embodied in various ways. For example, an implementation may be performed as a method, or embodied as an apparatus configured to perform a set of operations, or embodied as an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
Brief Description of the Drawings
Fig. 1 is a block diagram of an exemplary encoder.
Fig. 2 is a block diagram of an exemplary decoder.
Fig. 3 is a diagram of an exemplary inter-view temporal prediction structure with 8 views, based on the MPEG-4 AVC Standard.
Fig. 4 is a diagram of an exemplary method for encoding reference picture management data.
Fig. 5 is a diagram of an exemplary method for decoding reference picture management data.
Fig. 6 is a diagram of an exemplary method for determining inter-view dependency.
Fig. 7 is a diagram of another exemplary method for determining inter-view dependency.
Fig. 8 is a high-level view of an exemplary encoder.
Fig. 9 is a high-level view of an exemplary decoder.
Fig. 10 is a flow chart for an implementation of a method for determining dependency.
Fig. 11 is a flow chart for an implementation of a method for deleting stored pictures.
Detailed Description
At least one implementation described herein provides for a given video encoder and/or video decoder to delete a decoded picture from memory based on inter-view dependency information. The inter-view dependency information describes one or more inter-view dependency relationships for the given decoded picture. Thus, for example, given access to information describing all subsequent pictures that depend on the given decoded picture as a reference picture, a video decoder (for example) can delete the given decoded picture after all of those subsequent pictures have been decoded. Other implementations mark the given decoded picture after all of those subsequent pictures have been decoded, without immediately deleting it. The dependency information may be coded according to a high-level syntax, such as that of MVC based on the MPEG-4 AVC Standard (defined below).
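As a rough illustration of the deletion behavior described above, the following sketch models a decoded picture buffer that frees a picture once every view listed in its inter-view dependency information has been decoded at the same time instant. The class, its method names, and the (view_id, POC) keying are assumptions made for illustration, not part of any standard or reference software; temporal dependencies are ignored for brevity.

```python
class DecodedPictureBuffer:
    """Toy buffer that frees pictures based on inter-view dependency info."""

    def __init__(self, inter_view_deps):
        # inter_view_deps: view_id -> set of view_ids that use it as reference
        self.deps = inter_view_deps
        self.pictures = {}   # (view_id, poc) -> picture payload
        self.pending = {}    # (view_id, poc) -> dependent views not yet decoded

    def store(self, view_id, poc, picture):
        self.pictures[(view_id, poc)] = picture
        self.pending[(view_id, poc)] = set(self.deps.get(view_id, ()))

    def picture_decoded(self, view_id, poc):
        """Report that (view_id, poc) was decoded; free pictures it relied on."""
        freed = []
        for key, waiting in list(self.pending.items()):
            if key[1] == poc and view_id in waiting:
                waiting.discard(view_id)
                if not waiting:  # no remaining inter-view dependents
                    del self.pictures[key]
                    del self.pending[key]
                    freed.append(key)
        return freed
```

For instance, if view 0 is referenced by views 1 and 2, a view 0 picture stored at some POC is freed exactly when the second of those views finishes decoding at that POC.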
In the current implementation of multi-view video coding (MVC) based on the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 recommendation (hereinafter the "MPEG-4 AVC Standard"), the reference software achieves multi-view prediction by encoding each view with a single encoder and taking into consideration cross-view references. In addition, the current implementation of MVC based on the MPEG-4 AVC Standard (hereinafter "MVC based on the MPEG-4 AVC Standard") also decouples the picture order count (POC) and frame number (frame_num) between different views, allowing pictures having the same frame_num and POC to be present in the decoded picture buffer (DPB). Such pictures are distinguished using the view identifier (view_id) associated with them.
To manage the decoded picture buffer, MVC based on the MPEG-4 AVC Standard uses MPEG-4 AVC Standard compatible memory management control operation (MMCO) commands. These MMCO commands only operate on pictures having the same view_id as the picture carrying the MMCO commands.
This may be overly restrictive, and possibly inefficient, because MMCO commands are not allowed to mark pictures with a view_id different from their own, so a larger decoded picture buffer size is generally required. Therefore, in order to allow a smaller decoded picture buffer size (and thus less memory use), the pictures should be managed in a more efficient way.
According to the MPEG-4 AVC Standard, a picture that has been encoded or decoded and is available for reference is stored in the decoded picture buffer. The picture is then marked as either (a) a short-term reference picture or (b) a long-term reference picture. A short-term reference picture may be assigned a LongTermPicNum at a later time (and thereby "changed" into a long-term reference picture). This marking process is accomplished using the MMCO commands shown in Table 1, which shows the decoded reference picture marking syntax. Efficient management of the decoded picture buffer can be achieved using MMCO commands.
Table 1
dec_ref_pic_marking(){ C Descriptor
if(nal_unit_type==5||nal_unit_type==21){ /* nal_unit_type 21 is specified in Annex G */
no_output_of_prior_pics_flag 2|5 u(1)
long_term_reference_flag 2|5 u(1)
}else{
adaptive_ref_pic_marking_mode_flag 2|5 u(1)
if(adaptive_ref_pic_marking_mode_flag)
do{
memory_management_control_operation 2|5 ue(v)
if(memory_management_control_operation==1|| memory_management_control_operation==3)
difference_of_pic_nums_minus1 2|5 ue(v)
if(memory_management_control_operation==2)
long_term_pic_num 2|5 ue(v)
if(memory_management_control_operation==3|| memory_management_control_operation==6)
long_term_frame_idx 2|5 ue(v)
if(memory_management_control_operation==4)
max_long_term_frame_idx_plus1 2|5 ue(v)
}while(memory_management_control_operation!=0)
}
}
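The ue(v) descriptor appearing in Table 1 denotes an unsigned Exp-Golomb code. As background, the following is a minimal sketch of how such a field can be parsed from a bit sequence; it is a generic textbook implementation of Exp-Golomb decoding, not code taken from any reference software.

```python
def read_ue(bits):
    """Decode one unsigned Exp-Golomb code ue(v) from an iterator of bits.

    A codeword has the form <n leading zero bits> 1 <n info bits>,
    and decodes to 2**n - 1 + info.
    """
    leading_zeros = 0
    while next(bits) == 0:   # count zeros up to the marker 1 bit
        leading_zeros += 1
    info = 0
    for _ in range(leading_zeros):
        info = (info << 1) | next(bits)
    return (1 << leading_zeros) - 1 + info
```

For example, the bit strings 1, 010, and 00100 decode to the values 0, 1, and 3 respectively.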
The selection between adaptive reference picture management and sliding window marking is made using the adaptive_ref_pic_marking_mode_flag present in the slice header. Table 2 shows the interpretation of adaptive_ref_pic_marking_mode_flag.
Table 2
adaptive_ref_pic_marking_mode_flag Reference picture marking mode specified
0 Sliding window reference picture marking mode: a marking mode providing a first-in, first-out mechanism for short-term reference pictures
1 Adaptive reference picture marking mode: a reference picture marking mode providing syntax elements to specify marking of reference pictures as "unused for reference" and to assign long-term frame indices
The interpretation of each memory management control operation command is shown in Table 3. Table 3 shows the memory management control operation (memory_management_control_operation) values.
Table 3
memory_management_control_operation Memory management control operation
0 End memory_management_control_operation syntax element loop
1 Mark a short-term reference picture as "unused for reference"
2 Mark a long-term reference picture as "unused for reference"
3 Mark a short-term reference picture as "used for long-term reference" and assign a long-term frame index to it
4 Specify the maximum long-term frame index and mark all long-term reference pictures having long-term frame indices greater than the maximum value as "unused for reference"
5 Mark all reference pictures as "unused for reference" and set the MaxLongTermFrameIdx variable to "no long-term frame indices"
6 Mark the current picture as "used for long-term reference" and assign a long-term frame index to it
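The operations in Table 3 can be pictured as edits to the markings of pictures held in the decoded picture buffer. The toy sketch below applies a small subset of the MMCO commands (operations 1, 2, and 5) to a dictionary standing in for the buffer; the data layout and function name are illustrative assumptions, and operations 3, 4, and 6 (long-term index handling) are omitted for brevity.

```python
def apply_mmco(dpb, op, arg=None):
    """Apply one MMCO command from Table 3 to a toy DPB.

    dpb: dict mapping a picture id to its marking,
    one of "short-term", "long-term", or "unused".
    Supports ops 1, 2, and 5 only.
    """
    if op == 1:    # mark a short-term reference picture "unused for reference"
        if dpb.get(arg) == "short-term":
            dpb[arg] = "unused"
    elif op == 2:  # mark a long-term reference picture "unused for reference"
        if dpb.get(arg) == "long-term":
            dpb[arg] = "unused"
    elif op == 5:  # mark all reference pictures "unused for reference"
        for pic in dpb:
            dpb[pic] = "unused"
    return dpb
```

Note that op 1 leaves long-term pictures untouched, mirroring the short-term/long-term split in the table.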
In one MPEG-4 AVC Standard compatible solution for multi-view video coding, all the video sequences are interleaved into a single sequence. This single interleaved sequence is then fed to an MPEG-4 AVC Standard compatible encoder, which produces an MPEG-4 AVC Standard compatible bitstream.
Since this is an MPEG-4 AVC Standard compatible implementation, the view to which any given picture belongs cannot be identified. Since frame numbers (frame_num) and picture order counts are assigned without regard to views, MPEG-4 AVC Standard compatible MMCO commands can achieve efficient management of the decoded picture buffer size.
In MVC based on the MPEG-4 AVC Standard, additional syntax has been added to the sequence parameter set, as shown in Table 4, to signal cross-view references. Table 4 shows the sequence parameter set (SPS) multi-view video coding extension syntax. This syntax is used to indicate the cross-view references to be used for anchor and non-anchor pictures, as follows.
Table 4
seq_parameter_set_mvc_extension(){ C Descriptor
num_views_minus_1 ue(v)
for(i=0;i<=num_views_minus_1;i++)
view_id[i] ue(v)
for(i=0;i<=num_views_minus_1;i++){
num_anchor_refs_l0[i] ue(v)
for(j=0;j<num_anchor_refs_l0[i];j++)
anchor_ref_l0[i][j] ue(v)
num_anchor_refs_l1[i] ue(v)
for(j=0;j<num_anchor_refs_l1[i];j++)
anchor_ref_l1[i][j] ue(v)
}
for(i=0;i<=num_views_minus_1;i++){
num_non_anchor_refs_l0[i] ue(v)
for(j=0;j<num_non_anchor_refs_l0[i];j++)
non_anchor_ref_l0[i][j] ue(v)
num_non_anchor_refs_l1[i] ue(v)
for(j=0;j<num_non_anchor_refs_l1[i];j++)
non_anchor_ref_l1[i][j] ue(v)
}
}
The following process is performed in order to place reference pictures from a view different from the current view into the reference prediction lists:
- If the current picture is an anchor picture or a V-IDR picture, then for each value of i from 0 to num_anchor_refs_lX - 1, the picture with view_id equal to anchor_ref_lX[i], with inter_view_flag equal to 1, and with the same PicOrderCnt() as the current picture shall be appended to RefPicListX.
- Otherwise (the current picture is not an anchor picture and not a V-IDR picture), then for each value of i from 0 to num_non_anchor_refs_lX - 1, the picture with view_id equal to non_anchor_ref_lX[i], with inter_view_flag equal to 1, and with the same PicOrderCnt() as the current picture shall be appended to RefPicListX.
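The appending step above can be sketched as follows, under some simplifying assumptions: the decoded picture buffer is modeled as a plain list of records, and the anchor/non-anchor reference arrays are assumed to have already been selected according to the current picture's type. Field names follow Table 4; everything else is illustrative.

```python
def append_inter_view_refs(ref_pic_list, dpb, current, ref_view_ids):
    """Append inter-view reference pictures to ref_pic_list (RefPicListX).

    dpb: list of dicts with 'view_id', 'poc', and 'inter_view_flag' keys.
    ref_view_ids: anchor_ref_lX[...] or non_anchor_ref_lX[...] entries for
    the current view, chosen by whether `current` is an anchor picture.
    """
    for view_id in ref_view_ids:
        for pic in dpb:
            if (pic["view_id"] == view_id
                    and pic["inter_view_flag"] == 1
                    and pic["poc"] == current["poc"]):
                ref_pic_list.append(pic)
    return ref_pic_list
```

The outer loop preserves the signaled order of the reference views, which is what gives the appended pictures their list positions.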
In this implementation, memory management control operation commands are only associated with a single view, and do not mark pictures in other views. A direct consequence is that cross-view reference pictures may be kept in the decoded picture buffer longer than necessary, because a given cross-view reference picture can only be marked as "unused for reference" by a later picture in the bitstream from its own view.
In MVC based on the MPEG-4 AVC Standard, it is not specified how to distinguish among the following cases (shown in Table 5): a picture used only for inter-view reference; a picture used only for temporal reference; a picture used for both inter-view reference and temporal reference; and a picture not used for reference. Table 5 shows the reference picture cases for temporal reference and inter-view reference.
Table 5
Temporal reference Inter-view reference
0 0
0 1
1 0
1 1
The current implementation of the joint multi-view video model (JMVM) specifies conditions under which a picture appearing in a view other than the current view is to be marked as "unused for reference". These conditions are as follows:
- If the current picture is an anchor picture, all reference pictures satisfying the following conditions shall be marked as "unused for reference":
- the reference picture has the same PicOrderCnt() as the current picture;
- the reference picture is not needed for decoding subsequent pictures, in decoding order, of the different views indicated by anchor_ref_lX (with X being 0 or 1); and
- the reference picture is not needed for decoding subsequent pictures in its own view.
- If the current picture is not an anchor picture, all reference pictures satisfying the following conditions shall be marked as "unused for reference":
- the reference picture has the same PicOrderCnt() as the current picture;
- the reference picture is not needed for decoding subsequent pictures, in decoding order, of the different views indicated by non_anchor_ref_lX (with X being 0 or 1); and
- the reference picture is not needed for decoding subsequent pictures in its own view.
Marking pictures that satisfy the above conditions is referred to as "implicit marking". More generally, implicit marking refers to marking that is performed using existing syntax, without additional explicit signaling. For the decoded picture buffer to be managed efficiently using the implicit marking described above, it is important to distinguish among the cases shown in Table 5. However, it is not known how this distinction can be achieved as specified in MVC based on the MPEG-4 AVC Standard.
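The JMVM implicit-marking test above can be sketched as a predicate over a single candidate reference picture. The representation of pictures and dependencies below is an assumption made for illustration, not the JMVM data model: the cross-view condition is modeled by checking whether any dependent view at the current time instant remains to be decoded, and the own-view condition is collapsed into a precomputed boolean.

```python
def implicitly_markable(ref_pic, current, deps_by_view, remaining_views):
    """True if ref_pic may be marked "unused for reference".

    deps_by_view: view_id -> views that reference it (from anchor_ref_lX
    or non_anchor_ref_lX, chosen by the current picture's anchor flag).
    remaining_views: views at this POC still to be decoded.
    ref_pic["needed_in_own_view"]: whether later pictures of its own
    view still use it as a (temporal) reference.
    """
    if ref_pic["poc"] != current["poc"]:
        return False                          # condition 1 fails
    dependents = set(deps_by_view.get(ref_pic["view_id"], ()))
    if dependents & set(remaining_views):
        return False                          # still needed cross-view
    return not ref_pic["needed_in_own_view"]  # condition 3
```

All three conditions must hold, which is why the function short-circuits to False on the first failure.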
The sequence parameter set for the multi-view video coding extension shown in Table 4 contains the information about which views are used as references for a given view. This information can be used to generate a lookup table or other data structure indicating which views are used as inter-view references and which are not. Moreover, this information can be known separately for anchor pictures and non-anchor pictures.
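For example, the Table 4 information might be inverted into such a lookup table as sketched below. The dictionary-based SPS representation and the function name are assumptions for illustration; only the field names follow Table 4.

```python
def build_inter_view_ref_table(sps):
    """Return sets of view_ids used as inter-view references,
    kept separate for anchor and non-anchor pictures."""
    table = {"anchor": set(), "non_anchor": set()}
    for i, _view in enumerate(sps["view_id"]):
        # Any view listed in either reference list of any view is a
        # potential inter-view reference.
        table["anchor"].update(sps["anchor_ref_l0"][i])
        table["anchor"].update(sps["anchor_ref_l1"][i])
        table["non_anchor"].update(sps["non_anchor_ref_l0"][i])
        table["non_anchor"].update(sps["non_anchor_ref_l1"][i])
    return table
```

A view absent from both sets is never used as an inter-view reference, so its pictures are candidates for earlier removal from the buffer.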
In another approach, a new flag indicates whether a picture is used as an inter-view prediction reference. This is signaled in the network abstraction layer (NAL) unit header of the scalable video coding/multi-view video coding extension, while the syntax element nal_ref_idc indicates only whether a picture is used as an intra-view (also referred to as "temporal") prediction reference. nal_ref_idc is signaled in the NAL unit syntax shown in Table 6.
Table 6
nal_unit(NumBytesInNALunit){ C Descriptor
forbidden_zero_bit All f(1)
nal_ref_idc All u(2)
nal_unit_type All u(5)
NumBytesInRBSP=0
for(i=1;i<NumBytesInNALunit;i++){
if(i+2<NumBytesInNALunit&&next_bits(24)==0x000003){
rbsp_byte[NumBytesInRBSP++] All b(8)
rbsp_byte[NumBytesInRBSP++] All b(8)
i+=2
emulation_prevention_three_byte /* equal to 0x03 */ All f(8)
}else
rbsp_byte[NumBytesInRBSP++] All b(8)
}
}
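The inner loop of Table 6 extracts the raw byte sequence payload (RBSP) by removing each emulation-prevention byte (0x03) that follows a pair of zero bytes. A minimal generic sketch of that extraction, not taken from any decoder:

```python
def nal_to_rbsp(nal_payload):
    """Strip emulation-prevention bytes: 00 00 03 -> 00 00."""
    rbsp = bytearray()
    i = 0
    while i < len(nal_payload):
        if i + 2 < len(nal_payload) and nal_payload[i:i + 3] == b"\x00\x00\x03":
            rbsp += nal_payload[i:i + 2]  # keep the two zero bytes
            i += 3                        # skip the 0x03 byte
        else:
            rbsp.append(nal_payload[i])
            i += 1
    return bytes(rbsp)
```

The guard `i + 2 < len(...)` mirrors the Table 6 condition, so a trailing 00 00 pair at the very end of the payload is left untouched.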
nal_ref_idc is currently defined with the following semantics:
nal_ref_idc not equal to 0 specifies that the content of the NAL unit contains a sequence parameter set, a picture parameter set, a slice of a reference picture, or a slice data partition of a reference picture.
For NAL units containing a slice or slice data partition, nal_ref_idc equal to 0 indicates that the slice or slice data partition is part of a non-reference picture.
nal_ref_idc shall not be equal to 0 for sequence parameter set, sequence parameter set extension, or picture parameter set NAL units. When nal_ref_idc is equal to 0 for one slice or slice data partition NAL unit of a particular picture, it shall be equal to 0 for all slice and slice data partition NAL units of that picture.
nal_ref_idc shall not be equal to 0 for IDR NAL units, i.e., NAL units with nal_unit_type equal to 5.
nal_ref_idc shall be equal to 0 for all NAL units having nal_unit_type equal to 6, 9, 10, 11, or 12.
The syntax modification is shown in Table 7 below. Table 7 shows the network abstraction layer (NAL) scalable video coding (SVC) multi-view video coding extension syntax.
Table 7
nal_unit_header_svc_mvc_extension(){ C Descriptor
svc_mvc_flag All u(1)
if(!svc_mvc_flag){
priority_id All u(6)
discardable_flag All u(1)
temporal_level All u(3)
dependency_id All u(3)
quality_level All u(2)
layer_base_flag All u(1)
use_base_prediction_flag All u(1)
fragmented_flag All u(1)
last_fragment_flag All u(1)
fragment_order All u(2)
reserved_zero_two_bits All u(2)
}else{
inter_view_reference_flag All u(1)
temporal_level All u(3)
view_level All u(3)
anchor_pic_flag All u(1)
view_id All u(10)
reserved_zero_five_bits All u(5)
}
nalUnitHeaderBytes+=3
}
The semantics of inter_view_reference_flag are defined as follows:
inter_view_reference_flag equal to 0 indicates that the current picture is not used as an inter-view prediction reference. inter_view_reference_flag equal to 1 indicates that the current picture is used as an inter-view prediction reference.
Thus, by examining the combination of nal_ref_idc and inter_view_reference_flag, the type of a given reference picture can be determined. Table 8 shows nal_ref_idc and inter_view_reference_flag for each reference picture type.
Table 8
nal_ref_idc inter_view_reference_flag Type
0 0 Not needed for reference
0 1 Inter-view reference only
Not equal to 0 0 Temporal reference only
Not equal to 0 1 Temporal and inter-view reference
This approach, clearly, makes use of additional syntax.
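The Table 8 mapping is a straightforward two-flag classification, sketched below; the function name and the returned strings are illustrative, while the four cases transcribe the table directly.

```python
def reference_picture_type(nal_ref_idc, inter_view_reference_flag):
    """Classify a picture per Table 8."""
    temporal = nal_ref_idc != 0
    inter_view = inter_view_reference_flag == 1
    if temporal and inter_view:
        return "temporal and inter-view reference"
    if temporal:
        return "temporal reference only"
    if inter_view:
        return "inter-view reference only"
    return "not needed for reference"
```

A picture classified as "not needed for reference" (or whose last remaining role has lapsed) is the candidate for removal from the decoded picture buffer.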
The present description illustrates the present principles. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the present principles and are included within their spirit and scope.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the present principles and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the present principles, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the present principles. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage.
Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode, or the like, combined with appropriate circuitry for executing that software to perform the function. The present principles as defined by such claims reside in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
" embodiment " or " embodiment " that mention principle of the present invention in the specification is meant that described in conjunction with the embodiments special characteristic, structure, characteristic or the like are included among at least one embodiment of principle of the present invention.Thereby the term " in one embodiment " or " in an embodiment " that occur everywhere in specification might not refer to same embodiment.
To recognize for example in the situation of " A and/or B ", use term " and/or " be to comprise selection to first option of listing (A), to the selection of second option (B) listed or to the selection of these two options (A and B).As another example, in the situation of " A, B and/or C ", this statement be to comprise selection to first option of listing (A), to the selection of second option (B) listed, to the selection of the 3rd option (C) listed, to the selection of first and second options (A and B) listed, to the selection of the first and the 3rd option (A and C) listed, to the selection of the second and the 3rd option (B and C) listed or to the selection of all three options (A and B and C).For a plurality of projects of listing, this can be expanded, and this area and correlative technology field personnel are readily understood that.
Here used " high-level syntax " is meant the grammer that exists in bit stream, it is positioned on the macroblock layer in hierarchy.For example, used here high-level syntax can refer to that (but being not limited to) is in the slice header rank, in supplemental enhancement information (SEI) rank, in picture parameter set (PPS) rank, in sequence parameter set (SPS) rank and at other grammer of network abstract layer (NAL) unit header level.
And, will recognize that although one or more embodiment of principle of the present invention are at the MPEG-4AVC standard to describe, principle of the present invention is not limited in this standard or is limited to arbitrary standards here.Thereby, for other video coding implementations and system's (comprising other video encoding standards, recommendation and expansion thereof (comprising the expansion of MPEG-4AVC standard)), also can utilize principle of the present invention.
Turning to Fig. 1, an exemplary MVC encoder is indicated generally by the reference numeral 100. The encoder 100 includes a combiner 105 having an output connected in signal communication with an input of a transformer 110. An output of the transformer 110 is connected in signal communication with an input of a quantizer 115. An output of the quantizer 115 is connected in signal communication with an input of an entropy coder 120 and an input of an inverse quantizer 125. An output of the inverse quantizer 125 is connected in signal communication with an input of an inverse transformer 130. An output of the inverse transformer 130 is connected in signal communication with a first non-inverting input of a combiner 135. An output of the combiner 135 is connected in signal communication with an input of an intra predictor 145 and an input of a deblocking filter 150. An output of the deblocking filter 150 is connected in signal communication with an input of a reference picture store 155 (for view i). An output of the reference picture store 155 is connected in signal communication with a first input of a motion compensator 175 and a first input of a motion estimator 180. An output of the motion estimator 180 is connected in signal communication with a second input of the motion compensator 175.
A reference picture store 160 (for other views) is connected in signal communication with a first input of a disparity estimator 170 and a first input of a disparity compensator 165. An output of the disparity estimator 170 is connected in signal communication with a second input of the disparity compensator 165.
An output of the entropy coder 120 is available as an output of the encoder 100. A non-inverting input of the combiner 105 is available as an input of the encoder 100, and is connected in signal communication with a second input of the disparity estimator 170 and a second input of the motion estimator 180. An output of a switch 185 is connected in signal communication with a second non-inverting input of the combiner 135 and with an inverting input of the combiner 105. The switch 185 includes a first input connected in signal communication with an output of the motion compensator 175, a second input connected in signal communication with an output of the disparity compensator 165, and a third input connected in signal communication with an output of the intra predictor 145.
Turning to Fig. 2, an exemplary MVC decoder is indicated generally by the reference numeral 200. Note that the encoder 100 and the decoder 200 may be configured to perform the various methods described throughout this disclosure. Moreover, the encoder 100 may perform the marking and/or deleting functions during a reconstruction process. For example, the encoder 100 may maintain a current state of the decoded picture buffer so as to mirror the expected actions of a decoder. Thus, the encoder 100 may perform substantially all of the operations performed by the decoder 200.
The decoder 200 includes an entropy decoder 205 having an output connected in signal communication with an input of an inverse quantizer 210. An output of the inverse quantizer is connected in signal communication with an input of an inverse transformer 215. An output of the inverse transformer 215 is connected in signal communication with a first non-inverting input of a combiner 220. An output of the combiner 220 is connected in signal communication with an input of a deblocking filter 225 and an input of an intra predictor 230. An output of the deblocking filter 225 is connected in signal communication with an input of a reference picture store 240 (for view i). An output of the reference picture store 240 is connected in signal communication with a first input of a motion compensator 235.
A reference picture store 245 (for other views) is connected in signal communication with a first input of a disparity compensator 250.
An input of the entropy decoder 205 is available as an input to the decoder 200, for receiving a residue bitstream. Moreover, an input of a mode module 260 is also available as an input to the decoder 200, for receiving control syntax to control which input is selected by a switch 255. Further, a second input of the motion compensator 235 is available as an input of the decoder 200, for receiving motion vectors. Also, a second input of the disparity compensator 250 is available as an input to the decoder 200, for receiving disparity vectors.
An output of the switch 255 is connected in signal communication with a second non-inverting input of the combiner 220. A first input of the switch 255 is connected in signal communication with an output of the disparity compensator 250. A second input of the switch 255 is connected in signal communication with an output of the motion compensator 235. A third input of the switch 255 is connected in signal communication with an output of the intra predictor 230. An output of the mode module 260 is connected in signal communication with the switch 255, for controlling which input is selected by the switch 255. An output of the deblocking filter 225 is available as an output of the decoder.
One or more embodiments provide an implicit decoded reference picture marking process for the multi-view video coding extension of the MPEG-4 AVC standard, for efficient management of decoded reference pictures. The implicit marking of decoded reference pictures is derived based on information available at the decoder side, without the need to explicitly signal marking commands. The proposed implicit marking process can be realized using high-level syntax.
Implementations are also provided that remove one or more decoded pictures from memory based on dependency information (without requiring the dependency information to be explicitly signaled). Such removal may be performed with or without marking.
In the current implementation of multi-view video coding based on the MPEG-4 AVC standard, the reference software achieves multi-view prediction by encoding each view with a single encoder and taking cross-view references into account. In addition, the current multi-view video coding implementation decouples the picture order count (POC) and frame number (frame_num) between different views, so that pictures with identical frame_num and POC are allowed to appear in the decoded picture buffer (DPB); such pictures are distinguished using the view_id associated with them.
Turning to FIG. 3, an inter-view and temporal prediction structure with 8 views (S0 to S7), based on the MPEG-4 AVC standard, is indicated generally by the reference numeral 300. In FIG. 3, the pictures T0-T11 in view S0 are needed only by views S1 and S2, so those pictures are no longer needed once views S1 and S2 have been decoded. However, in the current implementation of multi-view video coding (MVC) based on the MPEG-4 AVC standard, these pictures are still marked as used for reference, and therefore a larger decoded picture buffer is required. These pictures are only marked (as unused for reference) at the first picture of the next group of pictures (GOP) of that view. Hence, the current implementation of MVC based on the MPEG-4 AVC standard cannot manage the decoded picture buffer efficiently.
To manage the decoded picture buffer, the current implementation uses memory management control operation (MMCO) commands compatible with the MPEG-4 AVC standard. These MMCO commands operate only on pictures having the same view_id as the picture that carries them.
In multi-view video coding, there are different ways to encode a group of views. One method is called time-first coding. It can be described as first encoding all pictures from all views sampled at the same time instant. Returning to FIG. 3, this implies encoding S0-S7 sampled at T0, then S0-S7 sampled at T8, then S0-S7 sampled at T4, and so on.
Another method is called view-first coding. It can be described as first encoding a set of pictures from different time instants of a single view, and then encoding a set of pictures from another view. Referring again to FIG. 3, this means encoding T0-T8 of view S0, then T0-T8 of view S2, then T0-T8 of view S1, and so on.
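The two coding orders above can be sketched as follows. This is an illustrative sketch only, not code from the patent; the view coding order and the hierarchical time order are assumptions taken from the FIG. 3 example described in the surrounding text.

```python
# Contrast time-first and view-first coding order for the FIG. 3 example.
# VIEW_ORDER and TIME_ORDER are assumed from the description (S0, S2, S1, ...
# per time instant; T0, then T8, then T4, and so on for one GOP).

VIEW_ORDER = ["S0", "S2", "S1", "S4", "S3", "S6", "S5", "S7"]
TIME_ORDER = ["T0", "T8", "T4", "T2", "T6", "T1", "T3", "T5", "T7"]

def coding_order(time_first: bool):
    """Return the sequence of (view, time) pictures in coding order."""
    if time_first:
        # All views at one time instant before moving to the next instant.
        return [(v, t) for t in TIME_ORDER for v in VIEW_ORDER]
    # View-first: all time instants of one view before the next view.
    return [(v, t) for v in VIEW_ORDER for t in TIME_ORDER]

# Time-first starts with every view at T0; view-first stays within S0 first.
assert coding_order(True)[:2] == [("S0", "T0"), ("S2", "T0")]
assert coding_order(False)[:2] == [("S0", "T0"), ("S0", "T8")]
```

The choice between the two orders matters for buffer management, which is why (as discussed below) the patent proposes signaling it to the decoder.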
To manage decoded reference pictures efficiently, at least one implementation marks decoded reference pictures having a view_id different from the current view (as if they were no longer needed for reference) without an explicitly signaled marking command. For a picture that is used as a cross-view reference but not as a temporal reference, the decoder can mark that picture as "unused for reference" after decoding all pictures that use it as a cross-view reference.
It is to be appreciated that, given the teachings of the present principles provided herein, one of ordinary skill in this and related arts can readily extend the proposed idea of implicit decoded reference marking to other marking commands, for example "mark as long-term reference picture", while maintaining the spirit of the present principles.
The sequence parameter set (SPS) defines syntax describing the dependency structure between the different views. This is shown in Table 4. From Table 4, the implicit marking process can derive a dependency map/table that indicates the complete dependencies of the views. Hence, at any given time, the derived map/table can be consulted to determine which pictures from a view can be marked as "unused for reference".
As a simple example, the inter-view dependency information of FIG. 3 can be generated from the information in Table 4. For the implementation employed in FIG. 3, the number of views will be known. Moreover, for a given view (view_id[i]): (1) the inter-view references are the same for every anchor time instant, and (2) the inter-view references are the same for every non-anchor time instant.
Then, for a given view, the number of inter-view anchor references is indicated by the sum of num_anchor_refs_l0[i] (for example, having value j1) and num_anchor_refs_l1[i] (for example, having value j2). Each anchor reference for a given view "i" is indexed in two lists by anchor_ref_l0[i][j] (for example, for j=1 to j1) and anchor_ref_l1[i][j] (for example, for j=1 to j2).
Similarly, for a given view, the number of inter-view non-anchor references is indicated by the sum of num_non_anchor_refs_l0[i] (for example, having value j1) and num_non_anchor_refs_l1[i] (for example, having value j2). Each non-anchor reference for a given view "i" is indexed in two lists by non_anchor_ref_l0[i][j] (for example, for j=1 to j1) and non_anchor_ref_l1[i][j] (for example, for j=1 to j2).
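The dependency map derivation described above can be sketched as follows. The function name and data layout are illustrative assumptions; the example reference lists are taken from the FIG. 3 anchor dependencies discussed later (S1 references S0 and S2; S2 references S0).

```python
# Derive, from Table 4 style SPS reference lists, which views depend on
# each view -- i.e., invert the anchor_ref_l0/anchor_ref_l1 relations.

def build_dependents(num_views, refs_l0, refs_l1):
    """refs_l0[i]/refs_l1[i]: lists of views that view i references
    (the anchor_ref_l0[i][...] / anchor_ref_l1[i][...] entries).
    Returns dependents[v] = set of views that use view v as a reference."""
    dependents = {v: set() for v in range(num_views)}
    for i in range(num_views):
        for ref in refs_l0[i] + refs_l1[i]:
            dependents[ref].add(i)
    return dependents

# FIG. 3 anchor dependencies for views 0..2 (assumed from the example):
anchor_l0 = {0: [], 1: [0], 2: [0]}
anchor_l1 = {0: [], 1: [2], 2: []}
dep = build_dependents(3, anchor_l0, anchor_l1)
assert dep[0] == {1, 2}   # S0 is an inter-view reference for S1 and S2
assert dep[1] == set()    # no view references S1
```

Once such a map is available, consulting it at any given time tells the decoder which stored pictures no further view will reference.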
The status of whether a picture is required as a temporal reference can be signaled in various ways. For example, this status can be signaled in nal_ref_idc in the NAL unit header syntax. Moreover, the status can also be indicated at the temporal level (if such information is available for temporal scalability); in that case, pictures having the highest temporal level are not used as temporal references. Further, some other high-level syntax can be utilized to indicate this status, for example, syntax explicitly indicating that the picture is used only as a temporal reference.
The following is an embodiment for performing implicit decoded reference marking. If a picture is not used as a temporal reference but is used as a cross-view reference, then the decoder marks it as "unused for reference" when the following condition is satisfied: all pictures that use the current picture as a cross-view reference picture have been coded.
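The marking condition above can be sketched as follows. The function name and data structures are illustrative assumptions, not syntax from the patent; the picture identities are taken from the FIG. 3 example.

```python
# A picture used only as a cross-view reference is marked "unused for
# reference" once every picture that references it across views has
# been decoded.

def maybe_mark_unused(picture, decoded, dependents):
    """picture: (view, time); decoded: set of decoded (view, time);
    dependents[view]: views that use `view` as a cross-view reference.
    Returns True if the picture can be marked 'unused for reference'."""
    view, t = picture
    needed_by = {(v, t) for v in dependents[view]}
    return needed_by <= decoded   # all cross-view users already decoded

# S0,T1 is referenced by S1 and S2 at T1 (FIG. 3 assumption).
dependents = {0: {1, 2}, 1: set(), 2: {1}}
decoded = {(0, 1), (2, 1)}
assert not maybe_mark_unused((0, 1), decoded, dependents)  # S1,T1 pending
decoded.add((1, 1))
assert maybe_mark_unused((0, 1), decoded, dependents)
```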
By enabling implicit reference picture marking, cross-view reference pictures can be managed efficiently without changing the existing marking process and without changing the syntax in the MPEG-4 AVC standard.
Whether a multi-view video coded sequence is time-first coded or view-first coded is an encoder choice. This information needs to be transmitted to the decoder so that the correct implicit marking can be derived. Hence, it is proposed to include a flag as high-level syntax to signal the type of coding scheme. This flag is called mvc_coding_mode_flag. In one embodiment, this flag is signaled in the sequence parameter set (SPS), as shown in Table 9. Table 9 shows the sequence parameter set (SPS) multi-view video coding (MVC) extension syntax. The semantics of the flag can be described as follows:
mvc_coding_mode_flag indicates whether the MVC sequence uses a time-first or view-first coding scheme. When mvc_coding_mode_flag equals 1, the MVC sequence is time-first coded. When mvc_coding_mode_flag equals 0, the MVC sequence is view-first coded.
Table 9
seq_parameter_set_mvc_extension( ) {                            C   Descriptor
    num_views_minus_1                                               ue(v)
    mvc_coding_mode_flag                                            u(1)
    implicit_marking                                                u(1)
    for( i = 0; i <= num_views_minus_1; i++ )
        view_id[ i ]                                                ue(v)
    for( i = 0; i <= num_views_minus_1; i++ ) {
        num_anchor_refs_l0[ i ]                                     ue(v)
        for( j = 0; j < num_anchor_refs_l0[ i ]; j++ )
            anchor_ref_l0[ i ][ j ]                                 ue(v)
        num_anchor_refs_l1[ i ]                                     ue(v)
        for( j = 0; j < num_anchor_refs_l1[ i ]; j++ )
            anchor_ref_l1[ i ][ j ]                                 ue(v)
    }
    for( i = 0; i < num_views_minus_1; i++ ) {
        num_non_anchor_refs_l0[ i ]                                 ue(v)
        for( j = 0; j < num_non_anchor_refs_l0[ i ]; j++ )
            non_anchor_ref_l0[ i ][ j ]                             ue(v)
        num_non_anchor_refs_l1[ i ]                                 ue(v)
        for( j = 0; j < num_non_anchor_refs_l1[ i ]; j++ )
            non_anchor_ref_l1[ i ][ j ]                             ue(v)
    }
}
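Parsing the Table 9 extension can be sketched as follows. The bit-reader API here is a stand-in assumption (a real parser reads ue(v) Exp-Golomb codes and u(1) bits from the bitstream), and the non-anchor loop is omitted for brevity since it mirrors the anchor loop.

```python
# Parse the (Table 9) SPS MVC extension from pre-decoded values, to keep
# the example self-contained without a real bitstream reader.

class Reader:
    """Stand-in for an Exp-Golomb bitstream reader."""
    def __init__(self, values):
        self.values = iter(values)
    def ue(self):   # ue(v): unsigned Exp-Golomb-coded value
        return next(self.values)
    def u1(self):   # u(1): single bit
        return next(self.values)

def parse_sps_mvc_extension(r):
    sps = {"num_views_minus_1": r.ue(),
           "mvc_coding_mode_flag": r.u1(),
           "implicit_marking": r.u1()}
    n = sps["num_views_minus_1"] + 1
    sps["view_id"] = [r.ue() for _ in range(n)]
    sps["anchor_refs"] = []
    for _ in range(n):
        l0 = [r.ue() for _ in range(r.ue())]   # count, then the refs
        l1 = [r.ue() for _ in range(r.ue())]
        sps["anchor_refs"].append((l0, l1))
    # (non-anchor loop omitted; it mirrors the anchor loop)
    return sps

# 2 views with view_ids 0 and 2; the second has one l0 anchor reference.
r = Reader([1, 1, 1, 0, 2, 0, 0, 1, 0, 0])
sps = parse_sps_mvc_extension(r)
assert sps["view_id"] == [0, 2]
assert sps["anchor_refs"][1] == ([0], [])
```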
Assume that the method used for the multi-view coded sequence is time-first. As can be seen from FIG. 3, there are some pictures (T1, T3, ...) in the even views (S0, S2, ...) that are used only as cross-view (also called "inter-view") references, not as temporal references. These pictures will have the highest temporal level. These pictures could be identified by a special flag in the bitstream indicating that they are cross-view-only pictures. It can be seen that once these pictures have been used as cross-view references, they are no longer needed, and they can be marked as used neither for temporal reference nor for cross-view reference. For example, once (S1, T1) has referenced (S0, T1), (S0, T1) is no longer needed.
Moreover, there are some pictures (T1, T3) in the odd views (S1, S3) that are used neither as temporal references nor as cross-view references. These pictures will also have the highest temporal level, and can be non-reference pictures. With implicit marking, these pictures can be marked as unused for reference (temporal or inter-view).
In one embodiment, it is proposed to introduce a flag as high-level syntax that enables or disables this implicit marking process. The flag is called implicit_marking. In one embodiment, this flag is signaled in the sequence parameter set (SPS), as shown in Table 9.
The implicit_marking flag can also be constrained by the coding scheme in use. For example, the implicit_marking flag may be used only when the coding scheme is time-first coding. This is shown in Table 10. Table 10 shows the sequence parameter set (SPS) multi-view video coding (MVC) extension syntax.
implicit_marking indicates whether the implicit marking process is used to mark pictures as "unused for reference". When implicit_marking equals 1, implicit marking is enabled. When implicit_marking equals 0, implicit marking is disabled.
Table 10
seq_parameter_set_mvc_extension( ) {                            C   Descriptor
    num_views_minus_1                                               ue(v)
    mvc_coding_mode_flag                                            u(1)
    if( mvc_coding_mode_flag )
        implicit_marking                                            u(1)
    for( i = 0; i <= num_views_minus_1; i++ )
        view_id[ i ]                                                ue(v)
    for( i = 0; i <= num_views_minus_1; i++ ) {
        num_anchor_refs_l0[ i ]                                     ue(v)
        for( j = 0; j < num_anchor_refs_l0[ i ]; j++ )
            anchor_ref_l0[ i ][ j ]                                 ue(v)
        num_anchor_refs_l1[ i ]                                     ue(v)
        for( j = 0; j < num_anchor_refs_l1[ i ]; j++ )
            anchor_ref_l1[ i ][ j ]                                 ue(v)
    }
    for( i = 0; i < num_views_minus_1; i++ ) {
        num_non_anchor_refs_l0[ i ]                                 ue(v)
        for( j = 0; j < num_non_anchor_refs_l0[ i ]; j++ )
            non_anchor_ref_l0[ i ][ j ]                             ue(v)
        num_non_anchor_refs_l1[ i ]                                 ue(v)
        for( j = 0; j < num_non_anchor_refs_l1[ i ]; j++ )
            non_anchor_ref_l1[ i ][ j ]                             ue(v)
    }
}
In accordance with one or more embodiments, a method is proposed for implicitly deriving information about the type of a reference picture. The method does not require additional syntax, but uses the existing syntax in the current implementation of the joint multi-view video model (JMVM).
The current implementation of the joint multi-view video model (JMVM) includes high-level syntax in the sequence parameter set to indicate the inter-view references for a view. It further distinguishes the dependencies of anchor pictures and non-anchor pictures by sending the reference view identifiers separately. This is shown in Table 4, which includes information about which views are used as references for a certain view. This information can be used to generate a reference table or other data structure indicating which views are, and which are not, used as inter-view references. Moreover, this information can be known separately for anchor and non-anchor pictures. In short, by utilizing the reference view information in the sequence parameter set, whether a picture is needed for inter-view prediction can be derived.
In the MPEG-4 AVC standard, nal_ref_idc in the network abstraction layer unit header can be used to identify a picture as a reference picture. In the context of multi-view video coding, nal_ref_idc is used only to indicate whether the picture is used as a temporal reference (that is, a reference within its own view).
The information from the sequence parameter set of the current joint multi-view video model (JMVM) implementation and nal_ref_idc (Table 7) in the network abstraction layer unit header can be used to distinguish the cases shown in Table 5. Hence, using the value of nal_ref_idc together with the reference view information from the sequence parameter set, all combinations of Table 5 can be addressed.
For example, returning to FIG. 3, consider the different cases below. Assume that view S0 has view_id=0, S1 has view_id=1, and S2 has view_id=2.
For S0:
The SPS syntax will have the following values, where "i" has the value corresponding to S0:
num_anchor_refs_l0[i], num_anchor_refs_l1[i], num_non_anchor_refs_l0[i], and num_non_anchor_refs_l1[i] all equal 0.
For S1:
The sequence parameter set syntax will have the following values, where "i" has the value corresponding to S1, and j=0:
num_anchor_refs_l0[i]=1, num_anchor_refs_l1[i]=1, num_non_anchor_refs_l0[i]=1, num_non_anchor_refs_l1[i]=1.
anchor_ref_l0[i][j]=0, anchor_ref_l1[i][j]=2, non_anchor_ref_l0[i][j]=0, non_anchor_ref_l1[i][j]=2.
For S2:
The sequence parameter set syntax will have values indicating that inter-view references are used for this view only for anchor pictures. The indices can be set such that "i" has the value corresponding to S2, and j=0.
num_anchor_refs_l0[i]=1, num_anchor_refs_l1[i]=0, num_non_anchor_refs_l0[i]=0, num_non_anchor_refs_l1[i]=0.
anchor_ref_l0[i][j]=0.
And so on for S3 to S7.
For all views, the pictures at time instants T1 and T3 will have nal_ref_idc equal to 0. Moreover, the pictures at time instants T0/T2/T4 will have nal_ref_idc not equal to 0.
Using the above information, the information shown in Table 11 can be derived. Note that all pictures from all views can be classified using the above method, although only one example is given for each of the four cases.
Table 11
Temporal reference   Inter-view reference   Picture type                        Picture
(nal_ref_idc)        (from SPS)
0                    0                      Unused for reference                S1, T1
0                    1                      Inter-view reference only           S2, T1
Not equal to 0       0                      Temporal reference only             S1, T2
Not equal to 0       1                      Temporal and inter-view reference   S2, T4
Hence, no extra signaling is needed to identify pictures distinguished by the conditions in Table 5.
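The four-way classification of Table 11 can be sketched as follows; the function name is an illustrative assumption, but the decision logic follows the table directly.

```python
# Classify a picture from nal_ref_idc (temporal reference) and the
# SPS-derived inter-view reference information, per Table 11.

def classify(nal_ref_idc, is_inter_view_ref):
    if nal_ref_idc == 0 and not is_inter_view_ref:
        return "unused for reference"
    if nal_ref_idc == 0:
        return "inter-view reference only"
    if not is_inter_view_ref:
        return "temporal reference only"
    return "temporal and inter-view reference"

assert classify(0, False) == "unused for reference"              # S1, T1
assert classify(0, True) == "inter-view reference only"          # S2, T1
assert classify(2, False) == "temporal reference only"           # S1, T2
assert classify(2, True) == "temporal and inter-view reference"  # S2, T4
```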
One application of this derived information is the implicit marking process described above. Of course, the present principles are not limited solely to applications involving the above implicit marking process, and one of ordinary skill in this and related arts will contemplate this and other applications to which the present principles may be applied, while maintaining the spirit of the present principles.
The above method can also be used to determine when to remove a picture from memory (for example, the decoded picture buffer). Note that marking need not be performed, although it may be. For example, consider picture S2, T1, which is only an inter-view reference. Assume a time-first coding implementation in which the views of a given time instant (equivalently, for this implementation, having the same picture order count) are coded in the following order: S0, S2, S1, S4, S3, S6, S5, and S7. One implementation removes S2, T1 from the decoded picture buffer using the following algorithm:
- After decoding a picture in T1 (for example, S1, T1), determine whether other pictures from T1 are still stored in the decoded picture buffer. This will reveal that S2, T1 is stored in the decoded picture buffer.
- If there are any such other pictures, determine whether they are inter-view references only. This will reveal that S2, T1 is only an inter-view reference picture.
- For each such picture that is only an inter-view reference, consider the views remaining to be decoded at T1, and determine whether any of those views references the stored picture. For example, determine whether any remaining view references S2.
The final step of considering all remaining views can be performed separately for anchor and non-anchor pictures. That is, different syntax can be evaluated for anchor pictures and for non-anchor pictures. For example, S2, T1 is a non-anchor picture, so for all subsequent views "i" the following syntax may be evaluated: num_non_anchor_refs_l0[i], num_non_anchor_refs_l1[i], non_anchor_ref_l0[i][j], and non_anchor_ref_l1[i][j]. The views subsequent to S1 (the view currently decoded) are S4, S3, S6, S5, and S7. The syntax for these views will reveal that S3 depends on S2, so S2 is not removed. However, after S3 has been decoded, the above algorithm will again consider the stored S2 picture and will reveal that no remaining view (S6, S5, or S7) references S2. Hence, after S3 has been decoded, S2 is removed from the decoded picture buffer. This occurs after S0, S2, S1, S4, and S3 have been decoded.
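The removal steps above can be sketched as follows for a single time instant. All names are illustrative assumptions; the coding order and the simplified dependency lists are taken from the FIG. 3 example.

```python
# After each view of a time instant decodes, drop any stored
# inter-view-only picture that no remaining view references.

def remove_unneeded(dpb, decoded_idx, coding_order, refs):
    """dpb: set of views whose inter-view-only picture at this time
    instant is stored; decoded_idx: index into coding_order of the view
    just decoded; refs[v]: views that view v uses as inter-view
    references. Returns the views removed from the DPB."""
    remaining = coding_order[decoded_idx + 1:]
    removed = set()
    for stored in list(dpb):
        if not any(stored in refs[v] for v in remaining):
            dpb.discard(stored)
            removed.add(stored)
    return removed

# FIG. 3 non-anchor example (simplified): S1 and S3 both reference S2.
order = ["S0", "S2", "S1", "S4", "S3", "S6", "S5", "S7"]
refs = {v: set() for v in order}
refs["S1"] = {"S0", "S2"}
refs["S3"] = {"S2", "S4"}
dpb = {"S2"}
assert remove_unneeded(dpb, order.index("S1"), order, refs) == set()  # S3 still needs S2
assert remove_unneeded(dpb, order.index("S3"), order, refs) == {"S2"}
```

As in the text, S2 survives the check after S1 (because S3 still depends on it) and is removed only after S3 has been decoded.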
Turning back to FIG. 4, an exemplary method for encoding reference picture management data for multi-view video coding is indicated generally by the reference numeral 400.
The method 400 includes a start block 402 that passes control to a function block 404. The function block 404 reads the encoder configuration file, and passes control to a function block 406. The function block 406 sets the anchor and non-anchor picture references in the sequence parameter set (SPS) extension, and passes control to a function block 408. The function block 408 sets mvc_coding_mode to indicate time-first or view-first coding, and passes control to a decision block 410. The decision block 410 determines whether mvc_coding_mode equals 1. If so, control is passed to a function block 412. Otherwise, control is passed to a function block 414.
The function block 412 sets implicit_marking to 1 or 0, and passes control to the function block 414.
The function block 414 sets the number of views equal to a variable N, sets both a variable i and a variable j to 0, and passes control to a decision block 416. The decision block 416 determines whether the variable i is less than the variable N. If so, control is passed to a decision block 418. Otherwise, control is passed to a decision block 442.
The decision block 418 determines whether the variable j is less than the number of pictures in view i. If so, control is passed to a function block 420. Otherwise, control is passed to a function block 440. As can be seen, the implementation of FIG. 4 is a view-first coding implementation. FIG. 4 can be used to provide a similar process that performs time-first coding.
The function block 420 starts encoding the current macroblock of the picture having a given frame_num and POC in view i, and passes control to a function block 422. The function block 422 selects the macroblock mode, and passes control to a function block 424. The function block 424 encodes the macroblock, and passes control to a decision block 426. The decision block 426 determines whether all macroblocks have been encoded. If so, control is passed to a function block 428. Otherwise, control is returned to the function block 420.
The function block 428 increments the variable j, and passes control to a function block 430. The function block 430 increments frame_num and the picture order count (POC), and passes control to a decision block 432. The decision block 432 determines whether implicit_marking equals 1. If so, control is passed to a function block 434. Otherwise, control is returned to the decision block 418.
The function block 434 determines, based on the dependency information signaled (in this implementation) at a high level, whether the (currently evaluated) reference view is needed as a reference for a future view. If so, control is returned to the decision block 418. Otherwise, control is passed to a function block 436.
The function block 440 increments the variable i, resets frame_num, POC, and the variable j, and returns control to the decision block 416.
The function block 436 marks the reference view picture as "unused for reference", and returns control to the decision block 418.
The decision block 442 determines whether the sequence parameter set (SPS), picture parameter set (PPS), and view parameter set (VPS) are to be signaled in-band. If so, control is passed to a function block 444. Otherwise, control is passed to a function block 446.
The function block 444 sends the SPS, PPS, and VPS in-band, and passes control to a function block 448.
The function block 446 sends the SPS, PPS, and VPS out-of-band, and passes control to the function block 448.
The function block 448 writes the bitstream to a file or streams it over a network, and passes control to an end block 499. It is to be appreciated that if the SPS, PPS, or VPS is signaled in-band, this signaling will be sent with the video data bitstream.
Turning to FIG. 5, an exemplary method for decoding reference picture management data for multi-view video coding is indicated generally by the reference numeral 500.
The method 500 includes a start block 502 that passes control to a function block 504. The function block 504 parses view_id from the sequence parameter set (SPS), picture parameter set (PPS), view parameter set (VPS), slice header, or network abstraction layer (NAL) unit header, and passes control to a function block 506. The function block 506 parses mvc_coding_mode from the SPS, PPS, NAL unit header, slice header, or a supplemental enhancement information (SEI) message to indicate time-first or view-first coding, and passes control to a function block 508. The function block 508 parses the other SPS parameters, and passes control to a decision block 510. The decision block 510 determines whether mvc_coding_mode equals 1. If so, control is passed to a function block 512. Otherwise, control is passed to a decision block 514.
The function block 512 parses implicit_marking, and passes control to the decision block 514. The decision block 514 determines whether the current picture needs decoding. If so, control is passed to a function block 528. Otherwise, control is passed to a function block 546.
The function block 528 parses the slice header, and passes control to a function block 530. The function block 530 parses the macroblock mode, motion vector, and ref_idx, and passes control to a function block 532. The function block 532 decodes the current macroblock (MB), and passes control to a decision block 534. The decision block 534 determines whether all macroblocks are done. If so, control is passed to a function block 536. Otherwise, control is returned to the function block 530.
The function block 536 inserts the current picture into the decoded picture buffer (DPB), and passes control to a decision block 538. The decision block 538 determines whether implicit_marking equals 1. If so, control is passed to a decision block 540. Otherwise, control is passed to a decision block 544.
The decision block 540 determines, based on the dependency information signaled at a high level, whether the current reference view is needed as a reference for a future view. If so, control is passed to the decision block 544. Otherwise, control is passed to a function block 542.
The decision block 544 determines whether all pictures have been decoded. If so, control is passed to an end block 599. Otherwise, control is returned to the function block 546.
The function block 546 gets the next picture, and returns control to the decision block 514.
FIG. 5 provides a decoder implementation that can be used with both view-first coded data and time-first coded data.
Turning to FIG. 6, an exemplary method for determining inter-view dependency for multi-view video content is indicated generally by the reference numeral 600. In one embodiment, the method 600 is performed by an encoder (for example, the encoder 100 of FIG. 1).
The method 600 includes a start block 602 that passes control to a function block 604. The function block 604 reads the encoder configuration file, and passes control to a function block 606. The function block 606 sets the anchor and non-anchor picture references in the sequence parameter set (SPS) extension, and passes control to a function block 608. The function block 608 sets the other SPS parameters based on the encoder configuration file, and passes control to a decision block 610. The decision block 610 determines whether the current (anchor/non-anchor) picture is a temporal reference. If so, control is passed to a function block 612. Otherwise, control is passed to a function block 624.
The function block 612 sets nal_ref_idc equal to 1, and passes control to a decision block 614. The decision block 614 determines, based on the SPS syntax, whether the current view is used as a reference for any other view. If so, control is passed to a function block 616. Otherwise, control is passed to a function block 626.
The function block 616 marks the current picture as an inter-view reference picture, and passes control to a decision block 618. The decision block 618 determines whether nal_ref_idc equals 0. If so, control is passed to a decision block 620. Otherwise, control is passed to a decision block 630.
The decision block 620 determines whether the current picture is an inter-view reference picture. If so, control is passed to a function block 622. Otherwise, control is passed to a function block 628.
The function block 622 sets the current picture as an inter-view reference picture only, and passes control to an end block 699.
The function block 624 sets nal_ref_idc equal to 0, and passes control to the decision block 614.
The function block 626 marks the current picture as not used as any inter-view reference picture, and passes control to the decision block 618.
The function block 628 sets the current picture as unused for reference, and passes control to the end block 699.
The decision block 630 determines whether the current picture is an inter-view reference picture. If so, control is passed to a function block 632. Otherwise, control is passed to a function block 634.
The function block 632 sets the current picture as both a temporal and an inter-view reference picture, and passes control to the end block 699.
The function block 634 sets the current picture as a temporal reference only, and passes control to the end block 699.
Turning to FIG. 7, an exemplary method for determining inter-view dependency for multi-view video content is indicated generally by the reference numeral 700. In one embodiment, the method 700 is performed by a decoder (for example, the decoder 200 of FIG. 2).
The method 700 includes a start block 702 that passes control to a function block 704. The function block 704 reads the sequence parameter set (SPS) (reading the view dependency structure), picture parameter set (PPS), network abstraction layer (NAL) header, and slice header, and passes control to a decision block 706. The decision block 706 determines, based on the SPS syntax, whether the current view is used as a reference for any other view. If so, control is passed to a function block 708. Otherwise, control is passed to a function block 716.
The function block 708 marks the current picture as an inter-view reference picture, and passes control to a decision block 710. The decision block 710 determines whether nal_ref_idc equals 0. If so, control is passed to a decision block 712. Otherwise, control is passed to a decision block 720.
The decision block 712 determines whether the current picture is an inter-view reference picture. If so, control is passed to a function block 714. Otherwise, control is passed to a function block 718.
The function block 714 sets the current picture as an inter-view reference picture only, and passes control to an end block 799.
The function block 718 sets the current picture as unused for reference, and passes control to the end block 799.
The function block 716 marks the current picture as not used as an inter-view reference picture, and passes control to the decision block 710.
The decision block 720 determines whether the current picture is an inter-view reference picture. If so, control is passed to a function block 722. Otherwise, control is passed to a function block 724.
The function block 722 sets the current picture as both a temporal and an inter-view reference picture, and passes control to the end block 799.
The function block 724 sets the current picture as a temporal reference only, and passes control to the end block 799.
Turning to FIG. 8, a high-level view of an exemplary encoder to which the present principles may be applied is indicated generally by the reference numeral 800.
The encoder 800 includes a high-level syntax generator 810 having an output connected in signal communication with an input of a video data encoder 820. An output of the video data encoder 820 is available as an output of the encoder 800, for outputting a bitstream and, optionally, one or more high-level syntax elements in-band with the bitstream. An output of the high-level syntax generator 810 is also available as an output of the encoder 800, for outputting one or more high-level syntax elements out-of-band with respect to the bitstream. An input of the video data encoder 820 and an input of the high-level syntax generator 810 are available as inputs of the encoder 800, for receiving input video data.
The high-level syntax generator 810 is used to generate one or more high-level syntax elements. As noted above, "high-level syntax" as used herein refers to syntax present in the bitstream that resides hierarchically above the macroblock layer. For example, high-level syntax, as used herein, may refer to, but is not limited to, syntax at the slice header level, the supplemental enhancement information (SEI) level, the picture parameter set (PPS) level, the sequence parameter set (SPS) level, and the network abstraction layer (NAL) unit header level. The video data encoder 820 is used to encode the video data.
Turning to FIG. 9, an exemplary decoder to which the present principles may be applied is indicated generally by the reference numeral 900.
The decoder 900 includes a high level syntax reader 910 having an output connected in signal communication with an input of a video data decoder 920. An output of the video data decoder 920 is available as an output of the decoder 900, for outputting pictures. An input of the video data decoder 920 is available as an input of the decoder 900, for receiving a bitstream. An input of the high level syntax reader 910 is available as an input of the decoder 900, for optionally receiving one or more high level syntax elements out-of-band with respect to the bitstream.
The video data decoder 920 decodes the video data, which includes reading high level syntax. Accordingly, if in-band syntax is received in the bitstream, the video data decoder 920 is fully capable of decoding the data, including reading the high level syntax. If high level syntax is sent out-of-band, it may be received by the high level syntax reader 910 (or directly by the video data decoder 920).
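The two reception paths just described (in-band via the video data decoder 920, out-of-band via the high level syntax reader 910) can be sketched as a simple dispatch; the function and labels below are hypothetical illustrations, not part of the disclosure.

```python
def receive_syntax(in_band=None, out_of_band=None):
    """Sketch of decoder 900's two paths for obtaining high level syntax:
    in-band syntax is read from the bitstream by the video data decoder 920;
    out-of-band syntax is taken by the high level syntax reader 910."""
    if in_band is not None:
        return ("video_data_decoder", in_band)
    return ("high_level_syntax_reader", out_of_band)

# In-band case: the syntax travels inside the bitstream itself.
path, syntax = receive_syntax(in_band={"sps": "..."})
```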
Referring to FIG. 10, a process 1000 is shown. The process 1000 includes accessing data (1010) and determining a dependency (1020) based on the accessed data. In one particular implementation, the accessed data (1010) include a picture from a first view, a picture from a second view, and dependency information. The dependency information describes one or more inter-view dependency relationships for the picture from the first view. For example, the dependency information may describe that the picture from the first view is a reference picture for the picture from the second view. In this implementation, determining a dependency (1020) includes determining whether the picture from the first view is a reference picture for the picture from the second view.
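Under the assumption that the dependency information is exposed as, for each view, the list of views it references, determination 1020 reduces to a membership test. The names below are hypothetical; this is a sketch, not the patent's implementation.

```python
def is_inter_view_reference(first_view, second_view, dependency_info):
    """Operation 1020: is the picture from first_view an inter-view
    reference for the co-timed picture from second_view?

    dependency_info: view id -> list of view ids that view references."""
    return first_view in dependency_info.get(second_view, [])

deps = {1: [0], 2: [0, 1]}                     # e.g. view 2 references views 0 and 1
needed = is_inter_view_reference(0, 2, deps)   # view 0's picture is needed by view 2
unused = is_inter_view_reference(2, 0, deps)   # view 0 references nothing
```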
Referring to FIG. 11, a process 1100 is shown. The process 1100 includes accessing data (1110), decoding a picture (1120), storing the decoded picture (1130), and removing the stored picture (1140). In one particular implementation, the accessed data (1110) include a picture from a first view and dependency information. The dependency information describes one or more inter-view dependency relationships for the picture from the first view. For example, the dependency information may describe that the picture from the first view is not a reference picture for any not-yet-decoded picture having the same picture order count. In this implementation, the picture from the first view is decoded in operation 1120 and stored in memory in operation 1130. The stored decoded picture is removed (1140) from memory based on the dependency information. For example, the dependency information may indicate that the picture from the first view is not a reference picture for any not-yet-decoded picture having the same picture order count. In that case, the picture from the first view is no longer needed as a reference picture and can be removed from memory.
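A minimal sketch of the storage-and-removal logic of operations 1130-1140 follows, assuming the dependency information maps each view id to the view ids it references. All structures and names are hypothetical simplifications; temporal-reference lifetime management is deliberately out of scope.

```python
def prune_dpb(dpb, poc, decoded_views, dependency_info):
    """Drop stored pictures no longer needed as inter-view references.

    dpb:             list of (view_id, poc) entries for stored pictures
    poc:             current picture order count
    decoded_views:   set of view ids already decoded at this poc
    dependency_info: view id -> list of view ids that view references
    """
    def still_needed(view_id, pic_poc):
        if pic_poc != poc:
            return True  # temporal-reference lifetime is handled elsewhere
        # Needed if some not-yet-decoded view at this poc references it.
        return any(view_id in dependency_info.get(v, [])
                   for v in dependency_info if v not in decoded_views)
    return [(v, p) for (v, p) in dpb if still_needed(v, p)]

deps = {1: [0], 2: [0, 1]}                               # view 2 references views 0 and 1
kept = prune_dpb([(0, 5), (1, 5)], 5, {0, 1}, deps)      # view 2 pending: both retained
done = prune_dpb([(0, 5), (1, 5)], 5, {0, 1, 2}, deps)   # all views decoded: both removable
```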
Note also that in another implementation, operations 1110-1130 are optional and need not be included. That is, one implementation performs only operation 1140. Alternatively, operations 1110-1130 may be performed by one device, while operation 1140 is performed by a separate device.
Note that the terms "encoder" and "decoder" connote general structures and are not limited to any particular functions or features. For example, a decoder may receive a modulated carrier that carries an encoded bitstream, demodulate the encoded bitstream, and decode the bitstream.
Further, several implementations are described with reference to high level syntax for conveying certain information. It should be understood, however, that other implementations may use lower level syntax, or indeed other mechanisms altogether (for example, sending the information as part of encoded data), to provide the same information (or variations of that information).
Additionally, several implementations are described as "removing" a picture from memory. The term "remove" encompasses any of a variety of actions that have the effect of, for example, deleting the picture, erasing the picture, dropping the picture from a list, no longer referencing the picture, or making the picture unavailable or inaccessible. For example, a picture may be "removed" by deallocating the memory associated with the picture and returning that memory to the operating system or to a memory pool.
Various implementations describe that a picture may depend on another picture (a reference picture). Such a dependency may be based on one of several variations of a "reference picture". For example, a picture may be coded as the difference between that picture and either an unencoded original reference picture or a decoded reference picture. Further, regardless of which variation of the reference picture is used as the basis for coding a given picture, a decoder may use whatever variation is actually available. For example, a decoder may only have access to an imperfectly decoded reference picture. The term "reference picture" is intended to encompass the various possibilities.
The implementations described herein may be implemented in, for example, a method or process, an apparatus, or a software program. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the features discussed may also be implemented in other forms (for example, an apparatus or a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processing devices also include communication devices, such as computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with digital encoding and decoding. Examples of equipment include video encoders, video decoders, video codecs, web servers, set-top boxes, laptops, personal computers, cell phones, PDAs, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier, or another storage device such as a hard disk, a compact disc, a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions may form an application program tangibly embodied on a processor-readable medium. As should be clear, a processor may include a processor-readable medium having, for example, instructions for carrying out a process. Such application programs may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units ("CPU"), a random access memory ("RAM"), and input/output ("I/O") interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU. In addition, various other peripheral units may be connected to the computer platform, such as an additional data storage unit and a printing unit.
As should be evident to one of skill in the art, implementations may also produce a signal formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the implementations described herein. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream, producing syntax, and modulating a carrier with the encoded data stream and the syntax. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different known wired or wireless links.
It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present principles are programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present principles.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed, and that the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. In particular, although illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present principles are not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present principles. All such changes and modifications are intended to be included within the scope of the present principles as set forth in the appended claims.
Claims (as amended under Article 19 of the Treaty)
1. An apparatus (100, 200, 800, 900) configured to:
access (1010) a picture from a first view, a picture from a second view, and dependency information, the dependency information describing one or more inter-view dependency relationships for the picture from the first view, and
determine (1020), based on the dependency information, whether the picture from the first view is a reference picture for the picture from the second view.
2. The apparatus of claim 1, wherein the apparatus comprises an encoder, and accessing comprises encoding the picture from the first view and the picture from the second view, and formatting the dependency information.
3. The apparatus of claim 1, wherein the apparatus comprises a decoder, and accessing comprises receiving the picture from the first view, the picture from the second view, and the dependency information.
4. A method (1000) comprising:
accessing (1010) a picture from a first view, a picture from a second view, and dependency information, the dependency information describing one or more inter-view dependency relationships for the picture from the first view, and
determining (1020), based on the dependency information, whether the picture from the first view is a reference picture for the picture from the second view.
5. The method of claim 4, wherein accessing comprises encoding the picture from the first view and the picture from the second view, and formatting the dependency information.
6. The method of claim 4, wherein the determining is performed by an encoder during a reconstruction process performed by the encoder.
7. The method of claim 4, wherein accessing comprises receiving the picture from the first view, the picture from the second view, and the dependency information.
8. The method of claim 4, wherein the dependency information comprises a high level syntax element.

Claims (24)

1. An apparatus (100, 200, 800, 900) configured to:
access (1010) a picture from a first view, a picture from a second view, and dependency information, the dependency information describing one or more inter-view dependency relationships for the picture from the first view, and
determine (1020), based on the dependency information, whether the picture from the first view is a reference picture for the picture from the second view.
2. The apparatus of claim 1, wherein the apparatus comprises an encoder, and accessing comprises encoding the picture from the first view and the picture from the second view, and formatting the dependency information.
3. The apparatus of claim 1, wherein the apparatus comprises a decoder, and accessing comprises receiving the picture from the first view, the picture from the second view, and the dependency information.
4. A method (1000) comprising:
accessing (1010) a picture from a first view, a picture from a second view, and dependency information, the dependency information describing one or more inter-view dependency relationships for the picture from the first view, and
determining (1020), based on the dependency information, whether the picture from the first view is a reference picture for the picture from the second view.
5. The method of claim 4, wherein accessing comprises encoding the picture from the first view and the picture from the second view, and formatting the dependency information.
6. The method of claim 4, wherein the determining is performed by an encoder during a reconstruction process performed by the encoder.
7. The method of claim 4, wherein accessing comprises receiving the picture from the first view, the picture from the second view, and the dependency information.
8. The method of claim 4, wherein the dependency information comprises a high level syntax element.
9. The method of claim 8, wherein:
the high level syntax element comprises sequence parameter set data, and
determining whether the picture from the first view is a reference picture comprises evaluating the sequence parameter set data.
10. The method of claim 4, wherein the dependency information for the picture from the first view is included in syntax elements indicating: (1) a number of anchor references for the picture from the first view, (2) a number of non-anchor references for the picture from the first view, (3) view numbers of the anchor references for the picture from the first view, and (4) view numbers of the non-anchor references for the picture from the first view.
11. The method of claim 4, further comprising determining whether the picture from the first view is a reference picture for another picture from the first view.
12. The method of claim 4, further comprising:
determining, based on the dependency information, whether the picture from the first view is a reference picture for any picture from another view that has not yet been decoded at a decoder.
13. The method of claim 12, wherein:
it is determined that the picture from the first view is not a reference picture for any not-yet-decoded picture from another view, and
the method further comprises marking the picture from the first view as no longer needed as an inter-view reference picture.
14. The method of claim 13, further comprising removing the picture from the first view based on the marking.
15. The method of claim 4, further comprising marking the picture from the first view based on the dependency information.
16. The method of claim 15, further comprising removing the picture from the first view based on the marking.
17. The method of claim 4, wherein the picture from the first view is an anchor picture or a non-anchor picture.
18. The method of claim 8, wherein the high level syntax element is an existing high level syntax element in an extension of an existing video coding standard or an existing video coding recommendation.
19. The method of claim 8, wherein the high level syntax element is an existing high level syntax element in an extension of the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation.
20. The method of claim 4, wherein the dependency information is used to determine whether the picture from the first view is used as an inter-view reference only, as a temporal reference only, as both an inter-view reference and a temporal reference, or as neither an inter-view reference nor a temporal reference.
21. The method of claim 4, wherein the dependency information includes an inter-view reference indication in a sequence parameter set, and the inter-view reference indication is combined with an nal_ref_idc syntax element to determine whether the picture from the first view is used as an inter-view reference only, as a temporal reference only, as both an inter-view reference and a temporal reference, or as neither an inter-view reference nor a temporal reference.
22. An apparatus (100, 200, 800, 900) comprising:
means for accessing a picture from a first view, a picture from a second view, and dependency information, the dependency information describing one or more inter-view dependency relationships for the picture from the first view, and
means for determining, based on the dependency information, whether the picture from the first view is a reference picture for the picture from the second view.
23. The apparatus of claim 22, further comprising:
means for storing at least one of the picture from the first view, the picture from the second view, or a high level syntax element.
24. An apparatus comprising a processor-readable medium having stored thereon instructions for performing at least the following:
accessing (1010) a picture from a first view, a picture from a second view, and dependency information, the dependency information describing one or more inter-view dependency relationships for the picture from the first view, and
determining (1020), based on the dependency information, whether the picture from the first view is a reference picture for the picture from the second view.
CN 200780039739 2006-10-24 2007-10-11 Picture identification for multi-view video coding Pending CN101529913A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US85393206P 2006-10-24 2006-10-24
US60/853,932 2006-10-24
US60/860,367 2006-11-21

Publications (1)

Publication Number Publication Date
CN101529913A true CN101529913A (en) 2009-09-09

Family

ID=41095900

Family Applications (2)

Application Number Title Priority Date Filing Date
CN 200780039819 Pending CN101529914A (en) 2006-10-24 2007-10-11 Picture management for multi-view video coding
CN 200780039739 Pending CN101529913A (en) 2006-10-24 2007-10-11 Picture identification for multi-view video coding

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN 200780039819 Pending CN101529914A (en) 2006-10-24 2007-10-11 Picture management for multi-view video coding

Country Status (2)

Country Link
CN (2) CN101529914A (en)
ZA (1) ZA200902051B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104904213A (en) * 2012-12-07 2015-09-09 高通股份有限公司 Advanced residual prediction in scalable and multi-view video coding
CN105308963A (en) * 2013-04-05 2016-02-03 三星电子株式会社 Method and device for decoding multi-layer video, and method and device for coding multi-layer video
CN108322744A (en) * 2012-01-31 2018-07-24 Vid拓展公司 For scalable efficient video coding(HEVC)Reference picture set(RPS)Signaling
CN108377379A (en) * 2016-10-20 2018-08-07 聚晶半导体股份有限公司 The optimization method and image processor of image depth information

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
MX342314B (en) * 2011-08-25 2016-09-26 Panasonic Ip Corp America Methods and apparatuses for encoding and decoding video using periodic buffer description.
PL2755387T3 (en) 2011-09-07 2017-08-31 Sun Patent Trust Image coding method and image coding apparatus
ES2844148T3 (en) 2011-09-19 2021-07-21 Sun Patent Trust Image decoding procedure, image decoding device
ES2924280T3 (en) 2011-10-19 2022-10-05 Sun Patent Trust Image decoding method and image decoding apparatus
CN107634928B (en) * 2016-07-18 2020-10-23 华为技术有限公司 Code stream data processing method and device
CN115914672A (en) * 2021-08-10 2023-04-04 腾讯科技(深圳)有限公司 File encapsulation method, device, equipment and storage medium for free view angle video

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108322744A (en) * 2012-01-31 2018-07-24 Vid拓展公司 For scalable efficient video coding(HEVC)Reference picture set(RPS)Signaling
CN108322744B (en) * 2012-01-31 2022-08-30 Vid拓展公司 Reference Picture Set (RPS) signaling for scalable High Efficiency Video Coding (HEVC)
US11445172B2 (en) 2012-01-31 2022-09-13 Vid Scale, Inc. Reference picture set (RPS) signaling for scalable high efficiency video coding (HEVC)
CN104904213A (en) * 2012-12-07 2015-09-09 高通股份有限公司 Advanced residual prediction in scalable and multi-view video coding
US9948939B2 (en) 2012-12-07 2018-04-17 Qualcomm Incorporated Advanced residual prediction in scalable and multi-view video coding
CN104904213B (en) * 2012-12-07 2018-06-01 高通股份有限公司 Advanced residual prediction in the decoding of scalable and multi-angle video
US10334259B2 (en) 2012-12-07 2019-06-25 Qualcomm Incorporated Advanced residual prediction in scalable and multi-view video coding
CN105308963A (en) * 2013-04-05 2016-02-03 三星电子株式会社 Method and device for decoding multi-layer video, and method and device for coding multi-layer video
CN105308963B (en) * 2013-04-05 2019-05-10 三星电子株式会社 Method and apparatus for being decoded to multi-layer video and the method and apparatus for being encoded to multi-layer video
US10349074B2 (en) 2013-04-05 2019-07-09 Samsung Electronics Co., Ltd. Method and apparatus for encoding and decoding multi-layer video using decoded picture buffers which operate identically
CN108377379A (en) * 2016-10-20 2018-08-07 聚晶半导体股份有限公司 The optimization method and image processor of image depth information

Also Published As

Publication number Publication date
CN101529914A (en) 2009-09-09
ZA200902051B (en) 2010-07-28

Similar Documents

Publication Publication Date Title
CN101529913A (en) Picture identification for multi-view video coding
CN102780883B (en) Method for reference picture management involving multiview video coding
KR20090085581A (en) Picture management for multi-view video coding
KR101353193B1 (en) Methods and apparatus for use in a multi-view video coding system
US20190379904A1 (en) Inter-view prediction
CN101496407B (en) Method and apparatus for decoupling frame number and/or picture order count (POC) for multi-view video encoding and decoding
CN101518086B (en) Method and apparatus for signaling view scalability in multi-view video coding
CN101485208B (en) The coding of multi-view video and coding/decoding method and device
CA2886688C (en) Decoding and encoding of pictures of a video sequence
US8982183B2 (en) Method and apparatus for processing a multiview video signal
CN101523920B (en) Method for using a network abstract layer unit to signal an instantaneous decoding refresh during a video operation
CN101491079A (en) Methods and apparatus for use in multi-view video coding
JP2010516098A (en) Method and apparatus for error concealment using high-level syntax reference views in multi-view coded video
CN101675667A (en) Methods and apparatus for video error correction in multi-view coded video
WO2015008266A1 (en) Apparatus, method, and computer product for inter-layer reference picture list construction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20090909