WO2008145560A1 - Method for selecting a coding data and coding device implementing said method

Info

Publication number: WO2008145560A1
Application number: PCT/EP2008/056149
Authority: WO
Inventors: Julien Haddad; Olivier Le Meur; Philippe Guillotel
Original assignee: Thomson Licensing
Priority date: 2007-05-29
Filing date: 2008-05-20
Publication date: 2008-12-04
Also published as: FR2916931A1

Abstract

The invention relates to a method for selecting a coding data from a predefined set (E) of coding data, said coding data being associated with a picture portion (Bi) with a view to its subsequent coding. The method comprises the following steps: determine (12) a subset (SEi) of the set (E) of coding data, and select (14) at least one coding data from the determined subset (SEi). According to an essential characteristic of the invention, the coding data subset (SEi) is determined (12) for the picture portion (Bi) according to a predetermined value (Si) representative of the perceptual interest of the picture portion (Bi), called perceptual interest value.

Description

METHOD FOR SELECTING A CODING DATA AND CODING DEVICE IMPLEMENTING SAID METHOD

1. Scope of the invention

The invention relates to the general domain of video coding.

The invention relates, more particularly, to a method for selecting a coding data from a predefined set of coding data, said coding data being associated with a picture portion with a view to its subsequent coding. It also relates to a coding device of a sequence of pictures suited to implement said selection method.

2. Prior art

Video coders are known that are suitable to code pictures in INTRA mode, i.e. independently from the other pictures of the sequence, and pictures in INTER mode, i.e. by temporal prediction from other pictures of the sequence, called reference pictures. In a picture divided into blocks of picture data (e.g. luminance data), each block is coded in INTRA mode if the picture is of the INTRA type and in INTRA mode or INTER mode if the picture is of the INTER type. The most recent video coding standards, e.g. MPEG-4 AVC, define several coding modes of the INTRA type and several coding modes of the INTER type. Figure 1 shows different INTER coding modes as defined in the document ISO/IEC 14496-10:2005 relating to the MPEG-4 AVC standard. Such video coders are suited to select, for a current block of index i, a coding mode mode_i from a set E of K coding modes m_k. They are also suitable to generate, for this current block, a prediction block according to the selected coding mode mode_i. The video coder is suitable to subtract the prediction block from the current block and to code, in the form of a stream of binary data, the residual data thus generated. Generally, the coding mode mode_i is selected from the set E by means of a predefined criterion. This criterion is, for example, a bitrate/distortion type criterion. In this case, the video coder calculates, for the block of index i and for each of the modes m_k of the set E, a value J_i(m_k) equal to D_i(m_k) + λ·R_i(m_k), where R_i(m_k) is the coding cost of the block of index i coded according to the mode m_k and D_i(m_k) is the distortion associated with the block of index i coded according to the mode m_k then reconstructed. The video coder then selects from the set E the coding mode mode_i of the block of index i such that

mode_i = argmin_{m_k ∈ E} J_i(m_k).

Now, adding new coding modes to the set E, as is the case with the standard MPEG-4 AVC with respect to the standard MPEG2, enables the picture data of the block of index i to be predicted more finely and thus enables the reconstruction quality of said block to be increased for a given coding cost, i.e. a given number of bits. Moreover, this enables the coding cost of said block to be reduced for a given reconstruction quality. However, the greater the number K of coding modes m_k in the set E, the greater the selection time of the coding mode mode_i associated with the block of index i, as the number of values J_i(m_k) to calculate is greater. More generally, it is often necessary to select a coding data from a predefined set according to a given criterion before carrying out the coding itself of the block of index i. Now, the more elements this set comprises, the greater the selection time of the coding data. This is notably problematic for the production of a real-time coding device. As previously illustrated, this coding data is, for example, the coding mode. It can also be a transform type, a reference picture number, etc.
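As a minimal illustration of the exhaustive selection described above, the following sketch evaluates J_i(m_k) for every mode of a set and keeps the minimum. The function name, the `evaluate` callback and the (distortion, rate) figures are hypothetical and serve only the example; an actual coder would obtain D and R by coding and reconstructing the block.

```python
from typing import Callable, Iterable, Tuple


def select_mode(
    modes: Iterable[str],
    evaluate: Callable[[str], Tuple[float, float]],
    lam: float,
) -> Tuple[str, float]:
    """Return (mode, cost) minimising J = D + lam * R over `modes`.

    `evaluate(mode)` is assumed to return the pair (distortion, rate)
    obtained when the current block is coded with `mode`.
    """
    best_mode, best_cost = None, float("inf")
    for mode in modes:
        distortion, rate = evaluate(mode)
        cost = distortion + lam * rate
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    if best_mode is None:
        raise ValueError("the set of modes is empty")
    return best_mode, best_cost


# Hypothetical (distortion, rate) measurements for one block.
TOY_STATS = {
    "INTER16x16": (120.0, 10.0),
    "INTER8x8": (80.0, 30.0),
    "INTER4x4": (60.0, 70.0),
}

best_mode, best_cost = select_mode(TOY_STATS, TOY_STATS.__getitem__, lam=1.0)
print(best_mode, best_cost)
```

The loop body runs once per mode, so the number of cost evaluations grows linearly with K, which is the selection-time issue raised in this section.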

3. Summary of the invention

The purpose of the invention is to compensate for at least one disadvantage of the prior art.

The invention relates to a method for selecting a coding data from a predefined set of coding data, said coding data being associated with a picture portion with a view to its subsequent coding. The method comprises the following steps:

- determine a subset of the coding data set, and

- select at least one coding data from the determined subset.

According to an essential characteristic of the invention, the coding data subset is determined for the picture portion according to a predetermined value representative of the perceptual interest of the picture portion, called perceptual interest value. Advantageously, by pre-selecting coding data, the invention reduces selection time of the coding data finally selected. Further, this pre-selection being carried out as a function of perceptual interest data, the reconstruction quality of the sequence is not degraded.

According to a characteristic of the invention, the coding data is a coding mode. According to another characteristic of the invention, the picture portion is a picture data block.

Advantageously, the predetermined value is a saliency value associated with the picture portion.

According to a characteristic of the invention, the subset is equal to the set if the perceptual interest value is greater than a predetermined threshold. If the perceptual interest value of the block is less than or equal to the predetermined threshold, the subset comprises the p coding modes of the set for which the selection probability is the highest, this probability having been determined beforehand for each coding mode of the set.

According to a characteristic of the invention, the subset is equal to a first subset if the perceptual interest value is greater than a predefined threshold and is equal to a second subset different from the first subset if the perceptual interest value is less than the predefined threshold.

According to a particular characteristic of the invention, the first subset is equal to the set and the second subset comprises the p coding modes of the set for which the selection probability is the highest, this probability having been determined beforehand for each coding mode of the set.

The invention also relates to a coding device of a sequence of pictures, each picture being divided into picture data portions. The device comprises:

- selection means suitable to select, for each picture data portion, at least one coding data, and

- coding means suitable to code each of the picture data portions according to the coding data selected.

According to an essential characteristic of the invention, the selection means comprise:

- means to determine, for each picture data portion, a subset of the set of coding data according to a predetermined value representative of the perceptual interest of the picture data portion, and

- means to select the at least one coding data from the determined subset.

4. List of figures

The invention will be better understood and illustrated by means of embodiments and implementations, by no means limiting, with reference to the figures attached in the appendix, wherein:

- figure 1 shows different INTER coding modes according to the MPEG-4 AVC standard,

- figure 2 shows a selection method of a coding mode according to the invention,

- figure 3 illustrates a video coding device according to the invention, and

- figure 4 illustrates a video coding device according to a variant of the invention.

5. Detailed description of the invention

The invention described within the framework of the MPEG-4 AVC standard can be extended to any type of standard in which the selection of a coding data must be carried out. The invention described within the framework of the selection of a coding mode can be extended to the general case of the selection of a coding data within a set of predefined coding data. For example, the invention can be applied to the case of the selection of the number of reference pictures used to code a current picture of the INTER type. Likewise, it can be extended to the selection of a particular transform type.

With reference to figure 2, the invention relates to a method for selecting, for each portion B_i of a current picture divided into N picture portions, a coding data within a predefined set E comprising K coding data. According to a particular embodiment, the coding data is a coding mode. According to a particular characteristic of the invention, each picture portion B_i is a picture data block. In the rest of the description, B_i is called a block.

In step 10, the index i of the block B_i is initialised to zero.

In step 12, a subset SE_i of the set E is determined for the block B_i according to a predetermined value S_i associated with the block B_i, this value being representative of the perceptual interest of the block B_i. In a particular embodiment, the subset SE_i is equal to the set E if the value S_i is greater than a predefined threshold T, and the subset SE_i comprises the p most probable modes m_k of the set E otherwise, with p an integer belonging to [1 ; K]. In order to determine the p most probable modes of the set E, the K modes of the set E are ordered according to their selection probability, which was calculated beforehand from coding statistics on a representative number of sequences. The p modes m_k for which the selection probability is the highest then form the subset SE_i if S_i < T. In the particular case of the INTRA modes defined by the standard MPEG-4 AVC in section 8.3 of the document ISO/IEC 14496-10 (version 3), the p most probable modes of the set E can be determined by an analysis of the direction of the contours in the block B_i. If the contours in the block B_i are mostly oriented in the vertical direction, then the p modes closest to the vertical direction are the most probable and form the subset SE_i, i.e. the vertical INTRA mode, the vertical INTRA mode to the right and the vertical INTRA mode to the left. Obviously, the invention is not limited by the manner in which the p most probable modes of the set E are determined.
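The determination of the subset in step 12 can be sketched as follows, assuming the selection probabilities are available as a dictionary. The function names and the probability figures are hypothetical; the text only requires that the probabilities be computed beforehand from coding statistics.

```python
from typing import Dict, List


def most_probable_modes(selection_prob: Dict[str, float], p: int) -> List[str]:
    """Return the p modes with the highest selection probability."""
    if not 1 <= p <= len(selection_prob):
        raise ValueError("p must belong to [1 ; K]")
    ranked = sorted(selection_prob, key=selection_prob.get, reverse=True)
    return ranked[:p]


def determine_subset(
    s_i: float, threshold: float, selection_prob: Dict[str, float], p: int
) -> List[str]:
    """Step 12: full set E if S_i > T, the p most probable modes otherwise."""
    if s_i > threshold:
        return list(selection_prob)
    return most_probable_modes(selection_prob, p)


# Hypothetical probabilities, as if measured on representative sequences.
SELECTION_PROB = {
    "INTER16x16": 0.50,
    "INTER16x8": 0.15,
    "INTER8x16": 0.20,
    "INTER8x8": 0.10,
    "INTER4x4": 0.05,
}

salient_subset = determine_subset(0.8, 0.5, SELECTION_PROB, p=2)
non_salient_subset = determine_subset(0.2, 0.5, SELECTION_PROB, p=2)
print(salient_subset, non_salient_subset)
```

Since the ranking depends only on precomputed statistics, it could be computed once and reused for every block; it is recomputed here only to keep the sketch short.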

According to a first variant, the subset SE_i is equal to the set E if the value S_i is greater than the predefined threshold T and the subset SE_i comprises p modes m_k of the set E otherwise, said p modes being selected according to the sub-block sizes that are associated with them. For example, if the current picture to which the block B_i belongs is a picture of the INTER type and the set E comprises the coding modes shown in figure 1, then if S_i is less than or equal to T, the subset SE_i comprises the coding modes associated with the greatest sub-block sizes, for example INTER16x16, INTER16x8 and INTER8x16. In this case, the other coding modes associated with the smaller sub-block sizes, i.e. INTER8x8, INTER8x4, INTER4x8, INTER4x4, do not belong to the subset SE_i.

According to a second variant, the subset SE_i is equal to the set E if the value S_i is greater than the predefined threshold T and the subset SE_i comprises p modes m_k of the set E otherwise, said p modes being the ones that require the least calculation.
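The first variant (selection by sub-block size) can be illustrated by the sketch below. Deriving the sub-block area from the mode name and ranking the modes by that area are assumptions made for the example; the function names are hypothetical.

```python
import re
from typing import List, Sequence

# INTER modes named in the text for figure 1.
INTER_MODES = [
    "INTER16x16", "INTER16x8", "INTER8x16",
    "INTER8x8", "INTER8x4", "INTER4x8", "INTER4x4",
]


def sub_block_area(mode: str) -> int:
    """Area in pixels of the sub-block encoded in a name such as 'INTER16x8'."""
    match = re.fullmatch(r"INTER(\d+)x(\d+)", mode)
    if match is None:
        raise ValueError(f"unexpected mode name: {mode}")
    width, height = map(int, match.groups())
    return width * height


def subset_by_sub_block_size(
    s_i: float, threshold: float, modes: Sequence[str], p: int
) -> List[str]:
    """Full set if S_i > T, otherwise the p modes with the largest sub-blocks."""
    if s_i > threshold:
        return list(modes)
    # sorted() is stable, so modes of equal area keep their original order.
    return sorted(modes, key=sub_block_area, reverse=True)[:p]


low_interest_subset = subset_by_sub_block_size(0.1, 0.5, INTER_MODES, p=3)
print(low_interest_subset)
```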

According to another variant, several thresholds can be defined. For example, if the value S_i is greater than a first predefined threshold T1, then the subset SE_i is equal to the set E; if the value S_i is less than T1 and greater than a second predefined threshold T2, then the subset SE_i comprises the p most probable modes m_k of the set E; and if the value S_i is less than T2, then the subset SE_i comprises the q most probable modes m_k of the set E, with q an integer less than or equal to p.
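A possible sketch of the two-threshold variant is given below. It assumes that the modes are already ranked by decreasing selection probability and, since the text does not say how a value exactly equal to T1 or T2 is handled, it assigns such a value to the smaller subset. The function name and the figures are hypothetical.

```python
from typing import List, Sequence


def determine_subset_two_thresholds(
    s_i: float,
    t1: float,
    t2: float,
    ranked_modes: Sequence[str],
    p: int,
    q: int,
) -> List[str]:
    """Return E, the p most probable modes or the q most probable modes.

    `ranked_modes` is assumed to be ordered by decreasing selection
    probability. Equality with T1 or T2 falls into the smaller subset,
    which is an assumption of this sketch.
    """
    if not t2 < t1:
        raise ValueError("T2 must be lower than T1")
    if not 1 <= q <= p <= len(ranked_modes):
        raise ValueError("expected 1 <= q <= p <= K")
    if s_i > t1:
        return list(ranked_modes)
    if s_i > t2:
        return list(ranked_modes[:p])
    return list(ranked_modes[:q])


RANKED = ["INTER16x16", "INTER8x16", "INTER16x8", "INTER8x8", "INTER4x4"]
for value in (0.9, 0.5, 0.1):
    print(value, determine_subset_two_thresholds(value, 0.7, 0.3, RANKED, p=3, q=1))
```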

The value S_i is determined beforehand for the block B_i according to a method known from the prior art. Such a value S_i is, for example, obtained by applying the method described in the patent application EP03293216.2 (published under the number 1544792). This method is suitable to generate a saliency map for the current picture. This saliency map is a topographical representation of the degree of saliency of each pixel of the current picture. This map is normalised, for example between 0 and 1, but can also be normalised between 0 and 255. The saliency map thus provides a saliency value S(x,y) per pixel (where (x,y) are the co-ordinates of a pixel of the picture), which characterizes the perceptual interest of this pixel. The higher the value of S(x,y), the more relevant the pixel of co-ordinates (x,y) is from a perceptual viewpoint. In order to obtain a saliency value S_i per block B_i, the mean value of the saliency values S(x,y) associated with each of the pixels of B_i is calculated, for example. The median value can also be used instead of the mean value to represent the block B_i. According to this document, the saliency map is generated by applying the following steps:

- projection of the picture into a psycho-visual colour space according to the luminance component in the case of a monochrome picture and according to the luminance component and according to each one of its chrominance components in the case of a coloured picture; in the rest, it will be considered that the picture processed is a coloured picture,

- perceptual decomposition of the projected components into subbands (one luminance component and two chrominance components) in the frequency domain according to a visibility threshold of the human eye; the subbands are obtained by sharing the frequency domain according to the radial spatial frequency and the orientation (angular selectivity); each subband can be considered as the neuronal image corresponding to a population of visual cells aligned on a spatial frequency interval and a particular orientation,

- extraction of the salient elements of the subbands relating to the luminance component and relating to each of the chrominance components, i.e. the most important information of the subbands,

- improvement of the contours of the salient elements in each subband relating to the luminance component and relating to each of the chrominance components,

- calculation of a saliency map for the luminance from the improved contours of the salient elements of each subband relating to the luminance component,

- calculation of a saliency map for each of the chrominance components from the improved contours of the salient elements of each subband relating to the chrominance components, and

- generation of a final saliency map from the luminance and chrominance saliency maps.

In step 14, the coding mode mode_i associated with the block B_i is selected from the subset SE_i according to a criterion, for example of the bitrate-distortion type. Advantageously, if the block B_i is a block of which the value S_i representative of the perceptual interest of the block is less than T, only the modes of the subset SE_i are tested. In this case, the selection method calculates, for each of the modes m_k of the subset SE_i, the value J_i(m_k) equal to D_i(m_k) + λ·R_i(m_k). The method selects from the subset SE_i the coding mode mode_i of the block such that

mode_i = argmin_{m_k ∈ SE_i} J_i(m_k).

The selection of the coding mode mode_i thus requires less calculation. The reconstruction quality can be slightly reduced for blocks with a low perceptual interest, i.e. such that S_i < T, because not all the coding modes are tested for these blocks. However, this degradation does not disturb the human eye as it is produced in the zones of the picture of the least interest for the human eye. Moreover, the computation resources thus saved on the blocks of which the perceptual interest is low can be advantageously used to code the zones of high perceptual interest and to increase the reconstruction quality. Indeed, the human eye is less sensitive to degradations in the zones of which the perceptual interest is low than to degradations in the zones of which the perceptual interest is greater.
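Step 14 can be sketched as the same bitrate-distortion minimisation, restricted to the subset. The example below also counts the number of cost evaluations to make the saving visible. The function name and the (distortion, rate) figures are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def select_mode_in_subset(
    subset: Sequence[str],
    evaluate: Callable[[str], Tuple[float, float]],
    lam: float,
) -> Tuple[str, float, int]:
    """Return (mode, cost, evaluations) for the minimum of J over `subset`."""
    if not subset:
        raise ValueError("the subset must contain at least one mode")
    costs = {}
    for mode in subset:
        distortion, rate = evaluate(mode)
        costs[mode] = distortion + lam * rate  # J = D + lam * R
    best = min(costs, key=costs.get)
    return best, costs[best], len(costs)


# Hypothetical (distortion, rate) measurements for one block.
TOY_STATS = {
    "INTER16x16": (120.0, 10.0),
    "INTER16x8": (105.0, 20.0),
    "INTER8x8": (80.0, 30.0),
}
FULL_SET = list(TOY_STATS)
REDUCED_SET = ["INTER16x16", "INTER16x8"]

full_result = select_mode_in_subset(FULL_SET, TOY_STATS.__getitem__, lam=1.0)
reduced_result = select_mode_in_subset(REDUCED_SET, TOY_STATS.__getitem__, lam=1.0)
print(full_result, reduced_result)
```

In this toy case the reduced search needs two evaluations instead of three and returns a mode with a slightly higher cost, which mirrors the trade-off stated in the text for blocks of low perceptual interest.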

At step 16, the i index is incremented by 1.

At step 18, i is compared with N. If i is greater than or equal to N then the selection of the coding modes for the current picture is terminated 20, otherwise the method continues to step 12 with the next block.
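The complete loop of figure 2 (steps 10 to 20) can be sketched as follows. This is a simplified illustration: the subset of step 12 is chosen between two precomputed lists, and the `evaluate` callback, the function name and all numeric values are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def select_modes_for_picture(
    saliency_values: Sequence[float],
    full_set: Sequence[str],
    reduced_set: Sequence[str],
    threshold: float,
    evaluate: Callable[[int, str], Tuple[float, float]],
    lam: float,
) -> List[str]:
    """Select one coding mode per block, following steps 10 to 20."""
    n = len(saliency_values)
    selected: List[str] = []
    if n == 0:
        return selected
    i = 0  # step 10: initialise the block index
    while True:
        # Step 12: determine the subset from the perceptual interest value.
        subset = full_set if saliency_values[i] > threshold else reduced_set

        def cost(mode: str, block: int = i) -> float:
            distortion, rate = evaluate(block, mode)
            return distortion + lam * rate

        # Step 14: bitrate-distortion selection within the subset.
        selected.append(min(subset, key=cost))
        i += 1  # step 16
        if i >= n:  # step 18
            break  # step 20: selection terminated for this picture
    return selected


def toy_evaluate(block: int, mode: str) -> Tuple[float, float]:
    """Hypothetical (distortion, rate) figures, identical for every block."""
    return {"INTER16x16": (120.0, 10.0), "INTER8x8": (80.0, 30.0)}[mode]


modes = select_modes_for_picture(
    [0.9, 0.1], ["INTER16x16", "INTER8x8"], ["INTER16x16"], 0.5, toy_evaluate, 1.0
)
print(modes)
```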

With reference to figures 3 and 4, the invention relates to coding devices 30 and 40. Only the essential elements of the invention are shown in these figures. The elements that are well known by those skilled in the art of video coders are not shown, e.g. motion estimation module, motion compensation module, etc. In figures 3 and 4, the modules shown are functional units that may or may not correspond to physically distinguishable units. For example, these modules or some of them can be grouped together in a single component, or constitute functions of the same software. Conversely, some modules may be composed of separate physical entities.

With reference to figure 3, the coding device 30 comprises a first input 300, a second input 302, an output 310, a selection module 304, a coding module 306 and a memory 308. The first input 300 is suitable to receive saliency values S_i and the second input 302 is suitable to receive the picture data of the block B_i. The selection module 304 is suitable to select, for each block B_i received from the second input 302, a coding mode mode_i according to the saliency value S_i received from the first input 300. The selection module 304 is suited to implement the selection method of the invention. For this purpose, it comprises a unit 3040 suitable to determine, for the block B_i, a subset SE_i of the set E according to the perceptual interest value S_i of said block B_i in accordance with step 12 of the method, and a unit 3042, connected to the unit 3040, suitable to select from the subset SE_i, in accordance with step 14 of the method, the coding mode mode_i finally retained to subsequently code the block B_i. The unit 3042 is suitable, for example, to calculate the bitrate-distortion type function J_i(m_k) and to carry out the selection of mode_i from the calculated values. The coding module 306 is suitable to code in binary form the picture data of the block B_i transmitted by the second input 302 according to the coding mode mode_i transmitted by the selection module 304 and possibly according to picture data previously coded and reconstructed by said coding module 306 and stored in the memory 308, e.g. picture data belonging to a previously coded picture (temporal prediction) or to a previously coded block of the same picture (spatial prediction). The coding module 306 is linked to the output 310 of the coding device. The output 310 is suitable to transmit, e.g. to a decoding device or to a broadcast network, a bitstream F representative of the picture data received on the second input 302 and coded by the coding module 306.

A variant of the coding device 30 is shown in figure 4. The shared elements of the two coding devices are identified by the same numerical references. The coding device 40 comprises a single input 302 suitable to receive the picture data of the blocks B_i. It further comprises a module 400 suitable to calculate, for each block B_i, a perceptual interest value S_i. This value S_i is, for example, calculated according to the method described above for the selection method. In this variant, the perceptual interest values S_i are calculated directly by the coding device 40 from the picture data received on the input 302.
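As an illustration of how a per-block value S_i could be derived from a per-pixel saliency map, the sketch below reduces each block to the mean (or median) of its pixel saliency values, as suggested in the description. It does not compute the saliency map itself; the function name, the list-of-rows representation and the toy map are assumptions of the example.

```python
from statistics import mean, median
from typing import List, Sequence


def block_saliency(
    saliency_map: Sequence[Sequence[float]],
    block_size: int,
    use_median: bool = False,
) -> List[List[float]]:
    """Return one saliency value per block of block_size x block_size pixels.

    The map dimensions are assumed to be multiples of `block_size`.
    """
    height, width = len(saliency_map), len(saliency_map[0])
    if height % block_size or width % block_size:
        raise ValueError("map dimensions must be multiples of block_size")
    reduce_fn = median if use_median else mean
    values = []
    for top in range(0, height, block_size):
        row = []
        for left in range(0, width, block_size):
            pixels = [
                saliency_map[y][x]
                for y in range(top, top + block_size)
                for x in range(left, left + block_size)
            ]
            row.append(reduce_fn(pixels))
        values.append(row)
    return values


# Hypothetical 4x4 saliency map normalised between 0 and 1.
TOY_MAP = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.5, 0.5, 0.0, 1.0],
    [0.5, 0.5, 0.0, 1.0],
]

print(block_saliency(TOY_MAP, 2))
print(block_saliency(TOY_MAP, 2, use_median=True))
```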

Naturally, the invention is not limited to the embodiment examples mentioned above. In particular, the person skilled in the art may apply any variant to the stated embodiments and combine them to benefit from their various advantages. Notably, the invention described for coding data of the coding mode type can be extended to the selection of any other type of coding data, notably a number of reference pictures, a type of transform, a size of search window for motion estimation, etc. In MPEG4 AVC, it is indeed possible to select the reference picture used for the prediction of a picture data block in a set of 5 reference pictures. According to the invention, it is possible to reduce the number of pictures to test for certain blocks of the picture, i.e. the blocks of which the perceptual interest value S_i is less than or equal to T. Likewise, in the FRExt extension (standing for "Fidelity Range Extension") of MPEG4 AVC, it is possible to transform each block of a picture using a 4x4 type transform or else an 8x8 type transform. According to the invention, it is possible to limit this choice for the blocks of which the perceptual interest value S_i is less than or equal to T. Moreover, the invention is neither limited by the type of saliency map generated, nor by the selection function, which can be a function other than the J function described above.
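The extension to other coding data can be sketched with a generic helper that shortens any candidate list for blocks of low perceptual interest. Which candidates are kept (here simply the first ones of the list) and how many are kept are assumptions of the example, as the text leaves both open; the names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def restrict_candidates(
    s_i: float, threshold: float, candidates: Sequence[T], reduced_count: int
) -> List[T]:
    """All candidates if S_i > T, otherwise only the first `reduced_count`."""
    if not 1 <= reduced_count <= len(candidates):
        raise ValueError("reduced_count must be between 1 and len(candidates)")
    if s_i > threshold:
        return list(candidates)
    return list(candidates[:reduced_count])


# Set of 5 reference pictures, identified here by hypothetical indices.
REFERENCE_PICTURES = [0, 1, 2, 3, 4]
# Transform types available in the FRExt extension according to the text.
TRANSFORMS = ["4x4", "8x8"]

print(restrict_candidates(0.2, 0.5, REFERENCE_PICTURES, 2))
print(restrict_candidates(0.2, 0.5, TRANSFORMS, 1))
print(restrict_candidates(0.8, 0.5, TRANSFORMS, 1))
```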

Claims

1. Method of selection of a coding data from a predefined set of coding data (E), said coding data being associated with a picture portion with a view to its subsequent coding, said method comprising the following steps:

- determine (12) a subset (SE_i) of said set (E) of coding data,

- select (14) at least one coding data from said subset (SE_i), said method being characterized in that said subset (SE_i) is determined (12) for said picture portion (B_i) according to a predetermined value (S_i) representative of the perceptual interest of said picture portion (B_i), called perceptual interest value.

2. Method according to claim 1, wherein at least one coding data is a coding mode.

3. Method according to claim 1 or 2, wherein said picture portion is a picture data block.

4. Method according to one of claims 1 to 3, wherein said predetermined value is a saliency value associated with said picture portion.

5. Method according to one of claims 1 to 4, wherein said subset (SE_i) is equal to a first subset if said perceptual interest value (S_i) is greater than a predefined threshold (T) and is equal to a second subset different from the first subset if said perceptual interest value (S_i) is less than or equal to said predefined threshold (T).

6. Method according to claim 5, wherein said first subset is equal to said set (E).

7. Method according to claim 5 or 6, wherein said second subset comprises the p coding modes of said set (E) for which the selection probability is the highest, this probability having been determined beforehand for each coding mode of the set (E).

8. Coding device of a sequence of pictures, each picture being divided into portions of picture blocks (B_i), said device comprising:

- selection means (304) suitable to select, for each picture data portion (B_i), at least one coding data, and

- coding means (306) suitable to code each of the picture data portions (B_i) according to the coding data selected, said selection means (304) being characterized in that they comprise:

- means (3040) to determine, for each picture data portion (B_i), a subset (SE_i) of said set (E) of coding data according to a predetermined value (S_i) representative of the perceptual interest of said picture data portion (B_i), and

- means (3042) to select said at least one coding data from said subset (SE_i).