WO2009003885A2 - Video indexing method, and video indexing device - Google Patents

Video indexing method, and video indexing device

Info

Publication number
WO2009003885A2
WO2009003885A2 (PCT/EP2008/058050)
Authority
WO
WIPO (PCT)
Prior art keywords
interest
regions
picture
region
video data
Prior art date
Application number
PCT/EP2008/058050
Other languages
French (fr)
Other versions
WO2009003885A3 (en)
Inventor
Sylvain Fabre
Régis Sochard
Pierre Laurent Lagalaye
Olivier Le Meur
Philippe Guillotel
Samuel Vermeulen
Original Assignee
Thomson Licensing
Priority date
Filing date
Publication date
Application filed by Thomson Licensing filed Critical Thomson Licensing
Priority to JP2010513897A priority Critical patent/JP5346338B2/en
Priority to CN200880022001.9A priority patent/CN101690228B/en
Priority to EP08761351A priority patent/EP2174500A2/en
Priority to KR1020107002047A priority patent/KR101488548B1/en
Publication of WO2009003885A2 publication Critical patent/WO2009003885A2/en
Publication of WO2009003885A3 publication Critical patent/WO2009003885A3/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display

Definitions

  • the invention relates to a video indexing method, and a video indexing device,
  • ROI regions of interest
  • coding applications often detect regions of interest and deploy more resources for coding these regions.
  • the detection of regions of interest is today principally used prior to coding, in such a manner as to privilege the regions of interest during coding by according them more bandwidth, for example by reducing the quantization step for these regions.
  • the present invention is principally concerned not with the detection of regions of interest, but rather with the transmission of these regions of interest to the devices or applications that take them into account for different applications and can at least resolve the picture display problem on a terminal with a low display capacity, whether mobile or not.
  • the present invention proposes a method for indexing a coded video data stream.
  • the video data stream comprises information relative to the location of regions of interest of each picture, the method comprises steps of:
  • the selected regions of interest are recorded in a temporary memory as they are being selected and decoded; when all the selected regions of interest are recorded in the temporary memory, the selected regions of interest are transferred to a permanent memory support (503).
  • the regions of interest are formatted in order to obtain a homogeneous size for all the selected regions of interest.
  • the method comprises a step of encrypting the location of the regions of interest thanks to an encryption key.
  • the method comprises a step of obtaining a decryption key upon payment by the user.
  • the video data stream is coded according to the coding standard H.264/AVC and the location information is contained in a Supplemental Enhancement Information (SEI) type message.
  • SEI Supplemental Enhancement Information
  • the SEI messages are encapsulated into Real-time Transport Protocol (RTP) packets, the RTP packets being encrypted.
  • the Supplemental Enhancement Information type messages relative to regions of interest location information are inserted in the coded data before or after each picture to which they refer.
  • the location information comprises information chosen from:
  • the selection step of a region of interest per picture selects a region of interest according to the weight relative to the importance of the region of interest.
  • the video coding standard uses flexible macroblock ordering, the regions of interest being coded into slice groups, independently from the other picture data, the location information of regions of interest comprising the slice group numbers in which the regions of interest are coded.
  • the Supplemental Enhancement Information message comprises an identifier indicating for each slice group if it is related to one region of interest.
  • the method comprises a further step of reading the SEI messages, and the step of decoding of video data decodes only the slice groups containing the regions of interest.
  • the invention concerns also a device for indexing a coded video data stream.
  • the video data stream comprises information relative to the location of regions of interest of each picture
  • the device comprises means for:
  • the detection of the regions of interest of a picture is made in general prior to coding. This data is then used to facilitate the encoding.
  • the inventors realized that the location of regions of interest can also be of interest during the decoding of a picture and particularly during the display on a device whose display capacity is limited. The reception terminal can in fact choose to display the regions of interest only, which enables better visibility of these regions relative to the display of the complete picture.
  • FIG. 1 shows a coding device according to a preferred embodiment of the invention
  • - figure 2 shows a coding method according to a preferred embodiment of the invention
  • - figure 3 shows a decoding device according to a preferred embodiment of the invention
  • - figure 4 shows a decoding method according to another embodiment of the invention
  • - figure 5 shows a personal recording type device according to another embodiment of the invention
  • FIG. 6 shows an indexing method in a personal recording type device implementing an embodiment of the invention.
  • Figure 1 shows a coding device in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention.
  • a video stream is coded.
  • a current frame F n is presented at the coder input to be coded by it.
  • This frame is coded in the form of slices, namely it is divided into sub-units which each contain a certain number of macroblocks corresponding to groups of 16x16 pixels.
  • Each macroblock is coded in intra or inter mode. Whether in intra mode or inter mode, a macroblock is coded by being based on a reconstructed frame.
  • a module 109 decides the intra coding mode of the current picture, according to the content of the picture.
  • in intra mode, P comprises samples of the current frame Fn that were previously coded, decoded and reconstructed (uF'n in figure 1, u meaning non-filtered).
  • in inter mode, P is formed from a motion estimation based on one or more frames F'n-1.
  • a motion estimation module 101 establishes an estimation of motion between the current frame Fn and at least one preceding frame F'n-1. From this motion estimation, a motion compensation module 102 produces a frame P when the current picture Fn must be coded in inter mode. A subtractor 103 produces a signal Dn, the difference between the picture Fn to be coded and the picture P. This signal is then transformed by a DCT transform in a module 104. The transformed picture is then quantized by a quantization module 105. Then, the pictures are reorganized by a module 111.
  • a CABAC (Context-based Adaptive Binary Arithmetic Coding) type entropy coding module 112 then codes each picture.
  • the modules 106 and 107 respectively of quantization and inverse transformation enable a difference D'n to be reconstituted after transformation and quantization then inverse quantization and inverse transformation.
  • an intra prediction module 108 codes the picture.
  • a uF'n picture is obtained at the output of the adder 114, as the sum of the D'n signal and the P signal.
  • This module 108 also receives at input the reconstructed non-filtered F'n picture.
  • a filter module 110 can obtain an F'n picture reconstructed and filtered from a uF'n picture.
  • the entropy coding module 112 transmits the coded slices encapsulated in NAL type units.
  • in addition to the slices, the NALs contain information relating to the headers, for example.
  • the NAL type units are transmitted to a module 113.
  • a module 116 enables the regions of interest to be determined.
  • the means 116 then establish a salience map for each picture of the video.
  • parameters entered by the user can also be taken into account. For example, it is possible to define, according to the event to which the video is related, certain important objects of the filmed scene and particularly for sporting events to specify that it concerns a football match.
  • this allows a salience map to be obtained that weights the salience zones according to the event. In a football match, it would be preferable to focus on the ball rather than on the terraces.
  • the region of interest module therefore enables one or more salient zones to be extracted, also referred to as regions of interest. These regions of interest are then geographically located on the picture. They are identified by their coordinates according to the height and width of the picture. Their size can also be extracted for each of the regions of interest. It is also possible to associate them with an element of semantic information. In fact for a football match, one may require information on a region of interest if the user can select the regions of interest to be displayed from a choice of several regions of interest to be displayed.
  • the module 115 receives information relating to the regions of interest in order to code them into an SEI ("Supplemental Enhancement Information") type message.
  • SEI Supplemental Enhancement Information
  • the SEI message is coded as indicated in the table below:
  • uuid_iso_iec_11578 single word of 128 bits to indicate our message type to the decoder.
  • user_data_payload_byte 8 bits comprising a part of the SEI message.
  • payloadSize 17 (bytes) thus 16 for the UUID and 1 for the proprietary data.
  • number_of_ROI Number of regions of interest present in the picture
  • roi_x_16 Position X in the picture of the region of interest, in multiples of 16 pixels.
  • roi_y_16 Position Y in the picture of the region of interest, in multiples of 16 pixels.
  • roi_w_16 Width in the picture of the region of interest, in multiples of 16 pixels.
  • semantic_information title characterizing the region of interest.
  • Relative weight gives the weight of each region of interest of the picture, so as to know which region of interest is in principle the most important.
  • Macroblock_alignment gives the number of the starting macroblock in which the region of interest is found, as well as the size of the region of interest in number of macroblocks, in width and in height.
  • the regions of interest are classified as salient if their salience is higher than a certain threshold predetermined by the method for obtaining salience maps.
  • the regions of interest are classed in increasing order of salience for all regions where the salience is higher than a fixed threshold.
  • the module 113 inserts the SEI message into the data stream and sends the video stream thus coded to the transmission network.
  • An SEI message is transmitted before each picture to which it refers. In other embodiments, it is also possible to transmit the SEI message only when the location of at least one region of interest changes between two or more pictures. Hence, during decoding, the decoder takes into account the last SEI message received, whether it is immediately before the picture to be decoded or if it relates to a picture previously received if the current picture is not preceded by such an SEI message.
  • Figure 2 shows a coding method in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention.
  • the salience map associated with the video to be broadcast is determined.
  • information relating to the video content can also be received to take account of this information during the establishment of the salience map.
  • the position of the ball corresponds to a region of interest for the user and in this case, privilege the zones of the picture in which the ball is situated.
  • the presenter corresponds to a region of interest, and in this case, determine the regions of interest by privileging the zones containing the presenter by detecting for example the face using known picture processing techniques.
  • one or more regions of interest relating to the video content are thus obtained.
  • the coordinates of the regions of interest in the pictures are determined.
  • the size of the regions of interest can also be determined in pixels and semantic information on the content can be associated with each region of interest.
  • the video stream is coded according to the coding standard H.264.
  • zones are privileged that were detected as regions of interest.
  • a lower quantization step is applied to them.
  • an SEI message is created from location and semantic information associated with the regions of interest.
  • the SEI message thus created is in accordance with the SEI message previously described in tables 1 and 2.
  • the stream is constituted by inserting SEI messages into the stream to obtain a coded stream according to the H.264 standard.
  • the video stream thus coded is transmitted to decoding devices in real time or in a deferred manner during a step E6, the decoding devices can be local or remote.
  • Figure 3 represents a preferred embodiment of a decoding device according to the invention, in accordance with the coding standard H.264/AVC.
  • a module 209 receives SEI messages at the input. It extracts the different SEI messages.
  • the NALs of useful data are transmitted to an entropy decoding module 201.
  • the SEI messages are analyzed by a module 210. This module enables decoding of the content of SEI messages representative of the regions of interest. The regions of interest of each picture are thus identified at the level of the decoding device in a simple manner and prior to the decoding of each picture using information contained in the field macroblock_alignment.
  • the macroblocks are transmitted to a re-ordering module 202 to obtain a set of coefficients. These coefficients undergo an inverse quantization in the module 203 and an inverse DCT transformation in the module 204 at the output of which D'n macroblocks are obtained, D'n being a deformed version of Dn.
  • a predictive block P is added to D'n, by an adder 205, to reconstruct a macroblock uF'n.
  • the block P is obtained after motion compensation, carried out by a module 208, of the preceding decoded frame, during a coding in inter mode or after intra prediction of the macroblock uF'n, by the module 207, in the case of coding in intra mode.
  • a filter 206 is applied to the signal uF'n to reduce the effects of the distortion and the reconstructed frame F'n is created from a series of macroblocks.
  • thanks to the SEI messages, the blocks representative of regions of interest are detected in the stream and, prior to display, these blocks are identified and can be cropped according to the choice of the user and transmitted for display to a device such as a PDA or mobile telephone.
  • this region of interest is displayed zoomed on the screen so as to take up the full screen.
  • the decoding device thus only decodes the macroblocks likely to contain information of interest to the user. In this way the decoding is faster and requires less resources at the level of the decoding device and therefore at reception. This is particularly advantageous when the receiving device is a mobile terminal comprising limited processing capacity.
  • Figure 4 shows a decoding method in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention.
  • Such a method can be implemented in a mobile terminal having a limited display capacity.
  • step S1 the type of display required is selected.
  • the selection is made by means of the user interface present on the mobile terminal. Either it is decided to function in full picture mode, in which case the integrality of the video stream is displayed as it is transmitted by the transmitter. Or it is decided to display only the regions of interest of the picture. This mode is particular to the invention.
  • if this particular mode is selected, the method passes to step S2; if not, it passes to step S8.
  • regarding step S8, it is understood that different types of SEI messages can be inserted into the video stream for other applications; in this case, prior to step S8 or during step S8, there can be a step of SEI message analysis.
  • the user selects the use that he wants to make of the regions of interest. Particularly, he can select: - the maximum number of regions of interest that he wants to display.
  • the regions of interest whose "semantic information" field comprises a keyword. It is also possible to specify whether a single region of interest per picture comprising the keyword is to be displayed (in this case the one for which the salience is maximum) or several regions of interest comprising the keyword.
  • the SEI messages present in the stream are analyzed as they are being received.
  • the SEI message is used to code the location of regions of interest of the picture as they were detected prior to the picture coding. Hence for each picture, there can be one or more regions of interest according to the visual properties of the picture or according to picture content or both.
  • the SEI message is coded according to the tables 1 and 2 previously described. Information relating to SEI messages is recorded temporarily up until the display of the corresponding picture.
  • the pictures are all decoded in conformance with the decoding standard.
  • the decoded regions of interest are processed according to those that the user selected during the S2 step. If the user selects a zoom of the principal region of interest of the picture, then during step S6, the zone is magnified so as to reach the maximum display size. If the user has selected a mosaic of regions of interest, then the picture is recomposed from the regions of interest, each being magnified according to the screen size and the number of regions of interest selected for display. If the user has specified a keyword, then the regions of interest comprising the keyword are displayed and zoomed. During a step S7, the regions of interest are displayed on the screen of the mobile terminal, according to the user's desire.
  • during a step S8, following a non-selection by the user to display only the regions of interest, the entire video stream is decoded for display.
  • Figure 5 shows a video indexing application of the invention.
  • FIG. 5 partially shows a personal recorder (PVR) type device 500.
  • PVR personal recorder
  • the PVR 500 receives a compressed video stream at its input.
  • this video data stream is in accordance with the coding standard H.264.
  • the compressed video stream comprises particularly
  • This video data stream is partly transmitted to a recording support 503.
  • Recording support can be understood as a hard disk, holographic support, memory card or Blu-ray disc. This recording support can be remote in other embodiments.
  • the video data stream is transmitted in another part to a decoder 501 to be decoded in real time, this for example to be displayed on a television set.
  • the stream is transmitted to the decoder 501 when the user wants to view it in real time. If not, it is not decoded but simply recorded, when recording is requested.
  • the present invention offers to decode part of the video data stream, even when viewing in real time is not requested.
  • by a part of the video stream is understood particularly the regions of interest, or certain regions of interest.
  • when the decoder 501 receives a video stream for which a recording is requested, the data is transmitted to the recording support 503.
  • the recording support 503 records the data as it is received.
  • the decoder 501 receives the video data stream and progressively decodes the SEI messages.
  • the decoded regions of interest are transmitted to the video indexing module 502 responsible for their temporary recording before transmitting them to the recording support 503.
  • Figure 6 illustrates the method implemented by the decoder 501 and the indexing module 502.
  • the video data stream is received by the decoder 501.
  • the decoder 501 decodes the SEI messages present in the video data stream.
  • the decoded SEI messages are SEI messages as previously described in the tables 1 and 2.
  • the decoder can also decode other SEI messages but that is not the object of the present invention.
  • Each SEI message can describe one or more regions of interest per picture as described in tables 1 and 2.
  • the decoder 501 analyzes each SEI message and decodes each picture. During this step, the weight indicated in the SEI message is used to select which region of interest will be recorded for each picture. In a preferred embodiment, the region of interest with the maximum salience is kept, i.e. the one having the highest weight.
  • the indexing module 502 decides which picture is used to index the video. According to the preferred embodiment described here, only about 10 pictures will be selected for a video of one and a half hours. It can be imagined that in other embodiments the number of pictures will be greater. These 10 pictures are taken at regular intervals. These selected pictures are recorded temporarily in a RAM type memory comprised in the indexing module 502 and not shown.
  • the pictures are zoomed during a step T5, that is they are enlarged so that they are all the same size.
  • this size can be the size of the picture.
  • they are read in the temporary memory and re-recorded after their enlargement.
  • the pictures are enlarged prior to their recording in the temporary memory.
  • the images are presented as a mosaic on the display. In that case, instead of being enlarged, the images are reduced to one single size, the same for all of them.
  • the indexing pictures are also transferred from the temporary memory to the recording support 503 and recorded in a file.
  • the regions of interest are used for the indexing, or can also be used for display on a PVR type device when the user wants to consult the content of the database.
  • this encrypting step in respect of figure 2, would be a step E4' (not shown) but inserted after the step E4.
  • Obtaining of the decryption key could be the object of a paid for service from the programme broadcaster for example.
  • the SEI messages relating to regions of interest are encapsulated in RTP (Real-time Transport Protocol) type packets and transmitted on a port different from that of the video.
  • Temporal CTS type labels can link the SEI messages relating to regions of interest with corresponding pictures.
  • this transmission mode enables encrypting only RTP packets containing the SEI messages and not the video.
  • the decryption is carried out at the level of the terminal receiver.
  • the encryption standard used is DVB-CSA and the SEI messages relating to regions of interest are encapsulated in a different PID than that of the video.
  • the SEI messages relating to regions of interest are linked to corresponding pictures via the PTS (timestamp) of the PES packet header. This transmission mode allows encryption only of the PIDs that contain SEI messages relating to regions of interest and not the video PID.
  • the video data stream is coded in accordance with the coding standard H.264/AVC using FMO (Flexible Macroblock Ordering) which enables coding of different parts of the picture independently and so decoding of them independently.
  • FMO Flexible Macroblock Ordering
  • the FMO mode uses "slice groups".
  • the "slice groups" are defined in the standard.
  • the regions of interest are coded in slice groups different from the rest of the picture.
  • a PPS type NAL comprises a map of "slice groups". SEI messages such as those described hereafter are inserted, indicating in which "slice groups" the regions of interest are coded.
  • uuid_iso_iec_11578 single word of 128 bits to indicate our message type to the decoder.
  • user_data_payload_byte 8 bits comprising a part of the SEI message.
  • for each slice_group representing a region of interest, a semantic information element, a relative weight and the macroblocks it concerns can be specified.
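The field layout described in the tables above (a 128-bit UUID, a count of regions, then per-region position, size and weight fields in multiples of 16 pixels) can be illustrated with a minimal pack/unpack sketch. The one-byte field widths and the zeroed placeholder UUID below are assumptions, since the text does not give the exact byte encoding:

```python
import struct

# Placeholder for uuid_iso_iec_11578; the real 128-bit value is not given.
ROI_SEI_UUID = b"\x00" * 16

def pack_roi_sei(rois):
    """Serialize a list of (roi_x_16, roi_y_16, roi_w_16, roi_h_16, weight)
    tuples into an illustrative SEI payload: UUID, count, then regions."""
    payload = ROI_SEI_UUID + struct.pack("B", len(rois))  # number_of_ROI
    for x, y, w, h, weight in rois:
        payload += struct.pack("5B", x, y, w, h, weight)
    return payload

def unpack_roi_sei(payload):
    """Inverse of pack_roi_sei: recover the list of region tuples."""
    assert payload[:16] == ROI_SEI_UUID
    count = payload[16]
    return [struct.unpack_from("5B", payload, 17 + 5 * i)
            for i in range(count)]
```

A round trip through these two functions preserves the region list, which is the property a decoder relies on when reading the message back.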

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a method and a device for indexing a coded video data stream. According to the invention, the video data stream comprises information relative to the location of regions of interest of each picture, said method comprising steps of: reception (T1) of the coded video stream, recording the coded video stream on a recording support, decoding (T2) location information of regions of interest, selection (T3) of a region of interest per picture, decoding (T3) of video data, selecting (T4) a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture, and recording (T6) of the selected regions of interest.

Description

VIDEO INDEXING METHOD, AND VIDEO INDEXING DEVICE
FIELD OF THE INVENTION
The invention relates to a video indexing method, and a video indexing device,
BACKGROUND OF THE INVENTION
Several picture processing applications use the detection of regions of interest (ROI) to improve picture quality. For example, coding applications often detect regions of interest and deploy more resources for coding these regions.
Different methods enable detection of regions of interest in a picture.
Particularly, methods are known based on the establishment of salience maps of a picture or a video, which take into account visual parameters and enable the definition of regions on which the human eye lingers when viewing a picture or a video.
The detection of regions of interest is today principally used prior to coding, in such a manner as to privilege the regions of interest during coding by according them more bandwidth, for example by reducing the quantization step for these regions.
The emergence of mobile terminals, such as mobile telephones, PDAs, game consoles and portable DVD players, the development of display and screen techniques and the emergence of new services have all combined to render necessary the display of video on terminals with a low display capacity. For example, the possibility to receive television on a mobile telephone raises display problems for dense pictures on low dimension screens.
The present invention is principally concerned not with the detection of regions of interest, but rather with the transmission of these regions of interest to the devices or applications that take them into account for different applications and can at least resolve the picture display problem on a terminal with a low display capacity, whether mobile or not.
SUMMARY OF THE INVENTION
For this purpose, the present invention proposes a method for indexing a coded video data stream. According to the invention, the video data stream comprises information relative to the location of regions of interest of each picture, the method comprises steps of:
- reception of coded video stream,
- recording the coded video stream on a recording support,
- decoding location information of regions of interest,
- selection of a region of interest per picture,
- decoding of video data,
- selecting a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture,
- recording of the selected regions of interest.
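The sequence of steps above can be sketched as a small pipeline. The dictionary field names (`rois`, `weight`, `pixels`) and the callback signatures are illustrative stand-ins, not taken from the patent:

```python
def index_stream(coded_pictures, n_index, decode_roi_locations, decode_region):
    """Sketch of the indexing method: decode ROI location info, keep one
    region per picture (by weight), decode only that region, then keep
    n_index regions for the whole stream at regular intervals."""
    selected = []
    for coded in coded_pictures:
        rois = decode_roi_locations(coded)           # location info (e.g. SEI)
        if not rois:
            continue
        best = max(rois, key=lambda r: r["weight"])  # one ROI per picture
        best = dict(best, pixels=decode_region(coded, best))
        selected.append(best)
    step = max(1, len(selected) // n_index)          # regular intervals
    return selected[::step][:n_index]
```

With hypothetical decode callbacks plugged in, the function returns the predetermined number of regions retained for the recording step.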
According to a preferred embodiment, during the recording step,
- the selected regions of interest are recorded in a temporary memory as they are being selected and decoded,
- when all the selected regions of interest are recorded in the temporary memory, the selected regions of interest are transferred to a permanent memory support (503).
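The two-stage recording described above (a temporary memory filled as regions are selected, then a single transfer to the permanent support 503) might look like this minimal sketch, with a plain file standing in for the recording support:

```python
class RoiRecorder:
    """Buffer selected regions in RAM, then flush them once to permanent
    storage; `support_path` plays the role of the recording support 503."""

    def __init__(self, support_path):
        self.support_path = support_path
        self.buffer = []                      # temporary memory

    def record(self, roi_bytes):
        """Called as each selected region is decoded."""
        self.buffer.append(roi_bytes)

    def flush(self):
        """All selected regions are present: transfer to permanent support."""
        with open(self.support_path, "wb") as f:
            for chunk in self.buffer:
                f.write(chunk)
        self.buffer.clear()
```

The design keeps the permanent support untouched until the whole selection is known, mirroring the claimed two-phase recording.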
Preferentially, prior to their recording, the regions of interest are formatted in order to obtain a homogeneous size for all the selected regions of interest.
Preferentially, the method comprises a step of encrypting the location of the regions of interest by means of an encryption key.
Preferentially, the method comprises a step of obtaining a decryption key upon payment by the user.
Preferentially, the video data stream is coded according to the coding standard H.264/AVC and the location information is contained in a Supplemental Enhancement Information (SEI) type message. According to a preferred embodiment, the SEI messages are encapsulated into real-time protocol packets (RTP), the RTP packets being encrypted.
Preferentially, the Supplemental Enhancement Information type messages relative to regions of interest location information are inserted in the coded data before or after each picture to which they refer.
According to a preferred embodiment, the location information comprises information chosen from:
- the number of regions of interest in each picture,
- the coordinates of each region of interest for each of the picture dimensions,
- the surface of each region of interest,
- a weight relative to the importance of the region of interest with respect to other regions of interest of the picture,
- information relating to the content of each region of interest,
and any combination of this information.
Preferentially, the selection step of a region of interest per picture selects a region of interest according to the weight relative to the importance of the region of interest.
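The weight-based selection of a single region of interest per picture can be sketched as follows. This is a minimal illustration, not the claimed implementation; the dictionary keys (`x`, `y`, `w`, `h`, `weight`) are assumed names mirroring the fields of the SEI message described later.

```python
def select_roi_per_picture(rois):
    """Pick the single region of interest with the highest relative weight.

    Each ROI is a dict with hypothetical keys 'x', 'y', 'w', 'h'
    (in multiples of 16 pixels, as in the SEI message) and 'weight'.
    Returns None when the picture carries no region of interest.
    """
    if not rois:
        return None
    return max(rois, key=lambda roi: roi["weight"])


rois = [{"x": 0, "y": 0, "w": 4, "h": 4, "weight": 2},
        {"x": 8, "y": 2, "w": 6, "h": 4, "weight": 5}]
best = select_roi_per_picture(rois)  # the ROI with weight 5
```

Ties and empty pictures are the only edge cases; here a tie keeps the first ROI returned by `max`.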
Preferentially, the video coding standard uses flexible macroblock ordering, the regions of interest being coded into slice groups, independently from the other picture data, the location information of regions of interest comprising the slice group numbers in which the regions of interest are coded.
Preferentially, the Supplemental Enhancement Information message comprises an identifier indicating for each slice group if it is related to one region of interest. Preferentially, the method comprises a further step of reading the SEI messages, and the step of decoding of video data decodes only the slice groups containing the regions of interest.
The invention concerns also a device for indexing a coded video data stream. According to the invention, the video data stream comprises information relative to the location of regions of interest of each picture, the device comprises means for:
- receiving the coded video stream,
- recording the coded video stream on a recording support (503),
- decoding (501 ) location information of the regions of interest,
- decoding (501 ) video data,
- selecting (502) a region of interest per picture,
- selecting (502) a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture,
- recording (503) the selected regions of interest.
The detection of the regions of interest of a picture is made in general prior to coding. This data is then used to facilitate the encoding. The inventors realized that the location of regions of interest can also be of interest during the decoding of a picture, and particularly during the display on a device whose display capacity is limited. The reception terminal can in fact choose to display the regions of interest only, which gives better visibility of these regions relative to the display of the complete picture.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be better understood and illustrated by means of embodiments and implementations, by no means limiting, with reference to the figures attached in the appendix, wherein:
- figure 1 shows a coding device according to a preferred embodiment of the invention,
- figure 2 shows a coding method according to a preferred embodiment of the invention,
- figure 3 shows a decoding device according to a preferred embodiment of the invention,
- figure 4 shows a decoding method according to another embodiment of the invention,
- figure 5 shows a personal recording type device according to another embodiment of the invention,
- figure 6 shows an indexing method in a personal recording type device implementing an embodiment of the invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Figure 1 shows a coding device in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention. In this preferred embodiment, a video stream is coded.
A current frame Fn is presented at the coder input to be coded by it.
This frame is coded in the form of slices, namely it is divided into sub-units which each contain a certain number of macroblocks corresponding to groups of 16x16 pixels. Each macroblock is coded in intra or inter mode. Whether in intra mode or inter mode, a macroblock is coded on the basis of a reconstructed frame. A module 109 decides the coding mode, intra or inter, of the current picture, according to the content of the picture. In intra mode, P (shown in figure 2) comprises samples of the current frame Fn that were previously coded, decoded and reconstructed (uF'n in figure 2, u meaning non-filtered). In inter mode, P is obtained from a motion estimation based on one or more frames F'n-1.
A motion estimation module 101 establishes an estimation of motion between the current frame Fn and at least one preceding frame F'n-1. From this motion estimation, a motion compensation module 102 produces a frame P when the current picture Fn must be coded in inter mode. A subtractor 103 produces a signal Dn, the difference between the picture Fn to be coded and the picture P. This difference is then transformed by a DCT transform in a module 104. The transformed picture is then quantized by a quantization module 105. Then, the pictures are reorganized by a module 111. A CABAC (Context-based Adaptive Binary Arithmetic Coding) type entropy coding module 112 then codes each picture.
The modules 106 and 107, respectively of inverse quantization and inverse transformation, enable a difference D'n to be reconstituted after transformation and quantization, then inverse quantization and inverse transformation.
When a picture is coded in intra mode, according to module 109, an intra prediction module 108 codes the picture. A picture uF'n is obtained at the output of the adder 114, being the sum of the signal D'n and the signal P. This module 108 also receives at input the reconstructed non-filtered picture F'n. A filter module 110 obtains a reconstructed and filtered picture F'n from the picture uF'n.
The entropy coding module 112 transmits the coded slices encapsulated in NAL type units. Besides the slices, the NALs contain, for example, information relating to the headers. The NAL type units are transmitted to a module 113.
A module 116 enables the regions of interest to be determined. Several methods now enable regions of interest to be located in a picture. Particularly known are methods based on the establishment of salience maps. For example, the patent application WO2006/07263, filed in the name of Thomson Licensing on 10th January 2006 and published on 13th July 2006, discloses an effective method for establishing a salience map.
The means 116 then establish a salience map for each picture of the video. To establish this salience map, parameters entered by the user can also be taken into account. For example, it is possible to define, according to the event to which the video is related, certain important objects of the filmed scene and particularly for sporting events to specify that it concerns a football match. Advantageously, this allows a salience map to be obtained that weights the salience zones according to the event. In a football match, it would be preferable to focus on the ball rather than on the terraces.
The region of interest module therefore enables one or more salient zones to be extracted, also referred to as regions of interest. These regions of interest are then geographically located on the picture. They are identified by their coordinates according to the height and width of the picture. The size of each of the regions of interest can also be extracted. It is also possible to associate them with an element of semantic information. In fact, for a football match, semantic information on a region of interest is useful if the user can select the regions of interest to be displayed from among several regions of interest.
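The extraction of regions of interest from a salience map can be sketched as below. The connected-component grouping and the block-level grid are illustrative assumptions, not the method of the cited application; the sketch simply thresholds a 2-D map of salience scores and returns one bounding box per salient zone, with the zone's peak salience kept as a weight.

```python
def extract_rois(salience, threshold):
    """Group above-threshold cells of a salience map into bounding boxes.

    `salience` is a 2-D list of scores (one per block of the picture).
    A simple 4-connected flood fill gathers each salient zone; the
    returned boxes are (x, y, w, h, weight) in block units, the weight
    being the zone's peak salience.
    """
    h, w = len(salience), len(salience[0])
    seen = [[False] * w for _ in range(h)]
    rois = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or salience[y0][x0] < threshold:
                continue
            stack, cells = [(x0, y0)], []
            seen[y0][x0] = True
            while stack:
                x, y = stack.pop()
                cells.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] \
                            and salience[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            xs = [c[0] for c in cells]
            ys = [c[1] for c in cells]
            weight = max(salience[y][x] for x, y in cells)
            rois.append((min(xs), min(ys),
                         max(xs) - min(xs) + 1, max(ys) - min(ys) + 1, weight))
    return rois
```

A real salience map would be weighted by the event type as described above (ball rather than terraces); here the scores are taken as given.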
The module 115 receives information relating to the regions of interest in order to code them into an SEI ("Supplemental Enhancement Information") type message.
The SEI message is coded as indicated in the table below:
(SEI message syntax table not reproduced)
Table 1
uuid_iso_iec_11578: single word of 128 bits to indicate our message type to the decoder.
user_data_payload_byte: 8 bits comprising a part of the SEI message.
Typically in this case: payloadSize = 17 (bytes), thus 16 for the UUID and 1 for the proprietary data.
user_data_payload_byte:
(user_data_payload_byte syntax table not reproduced)
Table 2
Where:
number_of_ROI: number of regions of interest present in the picture (or the following pictures).
roi_x_16: position X in the picture of the region of interest, in multiples of 16 pixels.
roi_y_16: position Y in the picture of the region of interest, in multiples of 16 pixels.
roi_w_16: width in the picture of the region of interest, in multiples of 16 pixels.
roi_h_16: height in the picture of the region of interest, in multiples of 16 pixels.
semantic_information: title characterizing the region of interest.
relative_weight: gives the weight of each region of interest of the picture, so as to indicate which region of interest is, in principle, the most important.
macroblock_alignment: gives the number of the starting macroblock in which the region of interest is found, as well as the size of the region of interest in number of macroblocks, in width and in height.
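Serializing these fields can be sketched as follows. The exact bit layout of the proprietary payload is not reproduced in the tables above, so the byte-aligned packing here is an assumption made for illustration: one count byte, then, for each region of interest, its position and size converted to multiples of 16 pixels plus one weight byte.

```python
import struct

def pack_roi_payload(rois):
    """Serialize region-of-interest fields in the spirit of Table 2.

    Assumed layout (hypothetical, byte-aligned): number_of_ROI on one
    byte, then per ROI the bytes roi_x_16, roi_y_16, roi_w_16,
    roi_h_16 (pixel values divided by 16) and a relative weight.
    """
    out = struct.pack("B", len(rois))
    for x, y, w, h, weight in rois:
        out += struct.pack("5B", x // 16, y // 16, w // 16, h // 16, weight)
    return out


# One ROI at pixel (32, 16), 64x48 pixels, weight 3.
payload = pack_roi_payload([(32, 16, 64, 48, 3)])
```

The semantic_information and macroblock_alignment fields, whose encodings are not detailed in the text, are left out of the sketch.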
When regions of interest are detected using the salience maps, a rate of salience is obtained for each region of interest; the regions are classified as salient if their salience is higher than a certain threshold predetermined by the method for obtaining salience maps. Hence, in the SEI messages, the regions of interest are classed in increasing order of salience for all regions where the salience is higher than a fixed threshold. The module 113 inserts the SEI message into the data stream and sends the video stream thus coded to the transmission network.
An SEI message is transmitted before each picture to which it refers. In other embodiments, it is also possible to transmit the SEI message only when the location of at least one region of interest changes between two or more pictures. Hence, during decoding, the decoder takes into account the last SEI message received, whether it is immediately before the picture to be decoded or if it relates to a picture previously received if the current picture is not preceded by such an SEI message.
Figure 2 shows a coding method in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention.
During a step E1, the salience map associated with the video to be broadcast is determined. In order to determine this salience map, which shows the regions of interest, information relating to the video content can also be received and taken into account during the establishment of the salience map. In particular, during a sporting event, it can be considered that the position of the ball corresponds to a region of interest for the user, and in this case the zones of the picture in which the ball is situated are privileged. When the video corresponds to the broadcast of a televised report, it can also be assumed that the presenter corresponds to a region of interest, and in this case the regions of interest are determined by privileging the zones containing the presenter, for example by detecting the face using known picture processing techniques.
At the end of the E1 step, one or more regions of interest relating to the video content are thus obtained.
During a step E2, the coordinates of the regions of interest in the pictures are determined. The size of the regions of interest can also be determined in pixels and semantic information on the content can be associated with each region of interest.
In parallel, during a step E3, the video stream is coded according to the coding standard H.264. During the coding, zones are privileged that were detected as regions of interest. In order to privilege the regions of interest at the coding level, a lower quantization step is applied to them.
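The lowering of the quantization step over the regions of interest can be sketched as a per-macroblock QP map. The base QP of 30 and the offset of -6 are illustrative values, not ones given by the text; ROI geometry is taken in macroblock units.

```python
def qp_map(width_mb, height_mb, rois, base_qp=30, roi_qp_offset=-6):
    """Build a per-macroblock quantization-parameter map.

    Macroblocks covered by a region of interest get a lower QP (finer
    quantization, hence more bandwidth). Each ROI is (x, y, w, h) in
    macroblock units; base_qp and roi_qp_offset are assumed values.
    """
    qps = [[base_qp] * width_mb for _ in range(height_mb)]
    for x, y, w, h in rois:
        for my in range(y, min(y + h, height_mb)):
            for mx in range(x, min(x + w, width_mb)):
                qps[my][mx] = base_qp + roi_qp_offset
    return qps


# A 4x3-macroblock picture with one 2x2 ROI at macroblock (1, 0).
qps = qp_map(4, 3, [(1, 0, 2, 2)])
```

A real encoder would clamp the result to the QP range allowed by the standard; the sketch omits that detail.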
Following step E2, during a step E4, an SEI message is created from location and semantic information associated with the regions of interest. The SEI message thus created is in accordance with the SEI message previously described in tables 1 and 2.
During a step E5, the stream is constituted by inserting SEI messages into the stream to obtain a coded stream according to the H.264 standard.
The video stream thus coded is transmitted to decoding devices in real time or in a deferred manner during a step E6; the decoding devices can be local or remote.
Figure 3 represents a preferred embodiment of a decoding device according to the invention, in accordance with the coding standard H.264/AVC.
A module 209 receives the NAL units at the input and extracts the different SEI messages. The NALs of useful data are transmitted to an entropy decoding module 201. The SEI messages are analyzed by a module 210. This module enables decoding of the content of the SEI messages representative of the regions of interest. The regions of interest of each picture are thus identified at the level of the decoding device in a simple manner, prior to the decoding of each picture, using the information contained in the field macroblock_alignment.
The macroblocks are transmitted to a re-ordering module 202 to obtain a set of coefficients. These coefficients undergo an inverse quantization in the module 203 and an inverse DCT transformation in the module 204 at the output of which D'n macroblocks are obtained, D'n being a deformed version of Dn. A predictive block P is added to D'n, by an adder 205, to reconstruct a macroblock uF'n. The block P is obtained after motion compensation, carried out by a module 208, of the preceding decoded frame, during a coding in inter mode or after intra prediction of the macroblock uF'n, by the module 207, in the case of coding in intra mode. A filter 206 is applied to the signal uF'n to reduce the effects of the distortion and the reconstructed frame F'n is created from a series of macroblocks.
Using information relating to the regions of interest comprised in the SEI messages, the blocks representative of regions of interest are detected in the stream; prior to display, these blocks are identified and can be cropped according to the choice of the user and transmitted for display to a device such as a PDA or mobile telephone.
It is also possible to let the user choose which regions of interest he wants to display, by entering semantic information for example. He enters for example "ball", and in this case the regions of interest containing a ball are displayed. If no region of interest is associated with this semantic information, then all the regions of interest can be displayed. The different regions of interest can be displayed in the form of a mosaic on the screen.
When a single region of interest is displayed, this region of interest is zoomed to take up the full screen.
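The full-screen zoom can be sketched as the computation of a scale factor. The aspect-ratio-preserving fit is an assumption; the text does not specify whether the zoomed region may be distorted to fill the screen exactly.

```python
def zoom_to_screen(roi_w, roi_h, screen_w, screen_h):
    """Scale factor and output size to display one ROI full screen.

    Keeps the region's aspect ratio, so the zoomed region fits within
    the screen rather than being distorted; all sizes are in pixels.
    Returns (scale, zoomed_width, zoomed_height).
    """
    scale = min(screen_w / roi_w, screen_h / roi_h)
    return scale, round(roi_w * scale), round(roi_h * scale)


# A 160x120 region on a 320x240 screen is doubled in each dimension.
scale, zw, zh = zoom_to_screen(160, 120, 320, 240)
```

For a mosaic display, the same function with a per-tile screen area gives the reduction factor for each region.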
The decoding device thus only decodes the macroblocks likely to contain information of interest to the user. In this way the decoding is faster and requires fewer resources at the level of the decoding device and therefore at reception. This is particularly advantageous when the receiving device is a mobile terminal with limited processing capacity.
Figure 4 shows a decoding method in accordance with the coding standard H.264/AVC implementing a preferred embodiment of the invention.
Such a method can be implemented in a mobile terminal having a limited display capacity.
During a step S1, the type of display required is selected. The selection is made by means of the user interface present on the mobile terminal. Either it is decided to operate in full picture mode, in which case the entirety of the video stream is displayed as transmitted by the transmitter. Or it is decided to display only the regions of interest of the picture. This mode constitutes the particularity of the invention. When it is decided to display the regions of interest, the method passes to step S2; if not, it passes to step S8. It is understood that different types of SEI messages can be inserted into the video stream for other applications, in which case, prior to step S8 or during step S8, there can be a step of SEI message analysis.
During a step S2, the user selects the use that he wants to make of the regions of interest. Particularly, he can select:
- the maximum number of regions of interest that he wants to display,
- the manner in which he wants to display the various regions of interest on the screen, for example in the form of a mosaic,
- the degree of zoom that he wants on the region of interest,
- using a keyword, the regions of interest whose "semantic information" field comprises the keyword. In this case, for each picture, it is also possible to specify whether a single region of interest per picture comprising the keyword is to be displayed (and in this case the one for which the salience is maximum) or several regions of interest comprising the keyword.
During a step S3, the SEI messages present in the stream are analyzed as they are being received. The SEI message is used to code the location of the regions of interest of the picture as they were detected prior to the picture coding. Hence, for each picture, there can be one or more regions of interest according to the visual properties of the picture, according to the picture content, or both. The SEI message is coded according to the tables 1 and 2 previously described. Information relating to the SEI messages is recorded temporarily until the display of the corresponding picture.
During a step S4, the pictures are all decoded in conformance with the decoding standard. During a step S5, the decoded regions of interest are processed according to the selections made by the user during step S2. If the user selects a zoom of the principal region of interest of the picture, then during a step S6, the zone is magnified so as to reach the maximum display size. If the user has selected a mosaic of regions of interest, then the picture is recomposed from the regions of interest, each being magnified according to the screen size and the number of regions of interest selected for display. If the user has specified a keyword, then the regions of interest comprising the keyword are displayed and zoomed. During a step S7, the regions of interest are displayed on the screen of the mobile terminal, according to the user's wishes.
During a step S8, following a choice by the user not to display only the regions of interest, the entire video stream is decoded for display.
Figure 5 shows a video indexing application of the invention.
Figure 5 partially shows a personal recorder (PVR) type device 500.
The PVR 500 receives a compressed video stream at its input. According to the embodiment described, this video data stream is in accordance with the coding standard H.264. The compressed video stream comprises in particular SEI messages as previously described in tables 1 and 2.
This video data stream is partly transmitted to a recording support 503. The recording support can be understood as a hard disk, holographic support, memory card or Blu-ray disc. This recording support can be remote in other embodiments.
The video data stream is transmitted in another part to a decoder 501 to be decoded in real time, for example to be displayed on a television set. In the known devices, the stream is transmitted to the decoder 501 when the user wants to view it in real time. If not, it is not decoded but simply recorded, when recording is requested.
The present invention, according to this aspect, offers to decode part of the video data stream, even when viewing in real time is not requested. By a part of the video stream is understood, in particular, the regions of interest or certain regions of interest.
When the decoder 501 receives a video stream for which a recording is requested, the data is transmitted to the recording support 503. The recording support 503 records the data as it is received. In a simultaneous manner, the decoder 501 receives the video data stream and progressively decodes the SEI messages. The decoded regions of interest are transmitted to the video indexing module 502 responsible for their temporary recording before transmitting them to the recording support 503. Figure 6 illustrates the method implemented by the decoder 501 and the indexing module 502.
During a step T1, the video data stream is received by the decoder 501. During a step T2, the decoder 501 decodes the SEI messages present in the video data stream. The decoded SEI messages are SEI messages as previously described in tables 1 and 2. The decoder can also decode other SEI messages, but that is not the object of the present invention. Each SEI message can describe one or more regions of interest per picture, as described in tables 1 and 2. During a step T3, the decoder 501 analyzes each SEI message and decodes each picture. During this step, the weight indicated in the SEI message is used to select which region of interest will be recorded for each picture. In a preferred embodiment, the region of interest with the maximum salience, i.e. the one having the highest weight, is kept.
Once the region of interest has been decoded, during a step T4, it is transmitted to the indexing module 502. The recording of a region of interest per picture, for all the pictures, is of little interest, as it represents a large volume of information and does not enable efficient indexing of the video. Hence the indexing module decides which pictures are used to index the video. According to the preferred embodiment described here, only about 10 pictures are selected for a video of one and a half hours. It can be imagined that in other embodiments the number of pictures will be greater. These 10 pictures are taken at regular intervals. The selected pictures are recorded temporarily in a RAM type memory comprised in the indexing module 502 and not shown. In order to display them in the best manner, the pictures are zoomed during a step T5, that is, they are enlarged so that they are all the same size. According to a preferred embodiment, this size can be the size of the picture. For that, they are read in the temporary memory and re-recorded after their enlargement. According to another embodiment, the pictures are enlarged prior to their recording in the temporary memory.
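The regular-interval selection of indexing pictures can be sketched as follows. Centring each chosen instant in its interval is an assumption; the text only states that the pictures are taken at regular intervals.

```python
def index_picture_times(duration_s, count=10):
    """Choose evenly spaced timestamps for indexing pictures.

    For the 90-minute video and 10 pictures of the embodiment, this
    yields one picture every 9 minutes, centred in each interval.
    Returns timestamps in seconds.
    """
    step = duration_s / count
    return [step * (i + 0.5) for i in range(count)]


times = index_picture_times(90 * 60, 10)  # 10 instants over 90 minutes
```

In practice the module would snap each instant to the nearest decodable picture carrying a region of interest.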
According to another embodiment, the images are presented as a mosaic on the display. Therefore, instead of being enlarged, the images are reduced to a single size, the same for all of them. When the entire video has been received and thus recorded on the recording support 503, the indexing pictures are also transferred, during a step T6, from the temporary memory to the recording support 503 and recorded in a file.
Then, according to the desired use, the regions of interest are used for indexing, or can also be used for display on a PVR type device when the user wants to consult the content of the database.
According to another aspect of the invention, it is also possible to encrypt the location data of the regions of interest during the coding of the SEI messages. Hence, only users having the decryption key can access the regions of interest, and so access the visualization of the regions of interest or the indexing of video streams by means of the location information of the regions of interest. With respect to figure 2, this encryption step would be a step E4' (not shown) inserted after the step E4.
Obtaining the decryption key could be the object of a paid service from the programme broadcaster, for example.
To do this, the SEI messages relating to regions of interest are encapsulated in RTP (Real-time Transport Protocol) type packets and transmitted on a port different from that of the video. Temporal CTS type labels can link the SEI messages relating to regions of interest with the corresponding pictures. Advantageously, this transmission mode enables encrypting only the RTP packets containing the SEI messages and not the video.
The decryption is carried out at the level of the receiver terminal. In the case of an MPEG-2 TS encapsulation, the encryption standard used is DVB-CSA, and the SEI messages relating to regions of interest are encapsulated in a different PID from that of the video. The SEI messages relating to regions of interest are linked to the corresponding pictures via the PTS (timestamp) of the PES packet header. This transmission mode allows encryption only of the PIDs that contain the SEI messages relating to regions of interest, and not the video PID.
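The timestamp linkage described for both carriages (CTS for RTP, PTS for MPEG-2 TS) can be sketched as a nearest-timestamp match. Treating timestamps as plain integers and taking the closest picture are simplifying assumptions; real clocks wrap and the association rule may be stricter.

```python
def match_sei_to_picture(sei_ts, picture_ts_list):
    """Associate an ROI SEI message with a picture by timestamp.

    Both RTP (CTS) and MPEG-2 TS (PTS) carriage link SEI messages to
    pictures through a clock value; here the picture whose timestamp
    is closest to the SEI timestamp is chosen.
    """
    return min(picture_ts_list, key=lambda pts: abs(pts - sei_ts))


# An SEI stamped 3003 attaches to the picture stamped 3000.
picture = match_sei_to_picture(3003, [0, 3000, 6000])
```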
According to another embodiment, the video data stream is coded in accordance with the coding standard H.264/AVC using FMO (Flexible Macroblock Ordering), which enables different parts of the picture to be coded, and thus decoded, independently. The FMO mode uses "slice groups", which are defined in the standard. In this embodiment, the regions of interest are coded in slice groups different from the rest of the picture. A PPS type NAL comprises a map of the "slice groups". SEI messages such as those described hereafter are inserted, indicating in which "slice groups" the regions of interest are coded.
The tables below illustrate the format of the SEI message used according to this embodiment:
(SEI message syntax table not reproduced)
Table 3
uuid_iso_iec_11578: single word of 128 bits to indicate our message type to the decoder.
user_data_payload_byte: 8 bits comprising a part of the SEI message.
Typically in this case:
• payloadSize = 17 (bytes) thus 16 for the UUID and 1 for the proprietary data.
• user_data_payload_byte:
(user_data_payload_byte syntax table not reproduced)
Table 4
- Slice_group(i)_id: if the slice_group_id equals "1", then the slice_group represents a region of interest; if it equals "0", then the slice_group represents the rest of the picture.
For each slice_group representing a region of interest, semantic information, a relative weight and the macroblocks it concerns can be specified.
Hence, only the macroblocks corresponding to the regions of interest can be decoded during reception, as they are identified and coded independently.
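The ROI-only decoding choice under FMO can be sketched as follows. Representing the Slice_group(i)_id flags as a mapping from slice-group number to 0/1 is an assumption made for illustration.

```python
def slice_groups_to_decode(slice_group_ids):
    """List the slice groups a decoder needs for ROI-only display.

    `slice_group_ids` maps slice-group number to its id flag from the
    SEI message: 1 marks a region of interest, 0 the rest of the
    picture. Only groups flagged 1 are decoded.
    """
    return [group for group, flag in slice_group_ids.items() if flag == 1]


# Group 0 carries the background; groups 1 and 2 carry ROIs.
wanted = slice_groups_to_decode({0: 0, 1: 1, 2: 1})
```

A decoder would then skip every slice whose slice group is not in the returned list, which is what makes the FMO variant cheaper at reception.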

Claims
1. Method for indexing a coded video data stream, characterized in that said video data stream comprises information relative to the location of regions of interest of each picture, said method comprises steps of:
- reception (T1 ) of coded video stream,
- recording the coded video stream on a recording support,
- decoding (T2) location information of regions of interest,
- selection (T3) of a region of interest per picture,
- decoding (T3) of video data,
- selecting (T4) a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture,
- recording (T6) of the selected regions of interest.
2. Indexing method according to claim 1 characterized in that during the recording step,
- the selected regions of interest are recorded in a temporary memory as they are being selected and decoded,
- when all the selected regions of interest are recorded in said temporary memory, said selected regions of interest are transferred to a permanent memory support (503).
3. Indexing method according to one of claims 1 or 2, characterized in that prior to their recording said regions of interest are formatted in order to obtain a homogeneous size for all the selected regions of interest.
4. Indexing method according to any of the previous claims characterized in that it comprises a step of encrypting the location of the regions of interest by means of an encryption key.
5. Indexing method according to claim 4 characterized in that it comprises a step of obtaining a decryption key upon payment by the user.
6. Indexing method according to any of the previous claims characterized in that the video data stream is coded according to the coding standard H.264/AVC and the location information is contained in a Supplemental Enhancement Information (SEI) type message.
7. Indexing method according to claims 5 and 6 characterized in that said SEI messages are encapsulated into real-time protocol packets (RTP), said RTP packets being encrypted.
8. Indexing method according to one of claims 5 or 6, characterized in that said Supplemental Enhancement Information type messages relative to regions of interest location information are inserted in the coded data before or after each picture to which they refer.
9. Indexing method according to one of the preceding claims, characterized in that said location information comprises information chosen from:
- the number of regions of interest in each picture,
- the coordinates of each region of interest for each of the picture dimensions,
- the surface of each region of interest,
- a weight relative to the importance of the region of interest with respect to other regions of interest of said picture,
- information relating to the content of each region of interest,
and any combination of this information.
10. Indexing method according to any of the previous claims, characterized in that said selection step (T3) of a region of interest per picture selects a region of interest according to the weight relative to the importance of said region of interest.
11. Indexing method according to any of claims 6 to 10 characterized in that the video coding standard uses flexible macroblock ordering, said regions of interest being coded into slice groups, independently from the other picture data, said location information of regions of interest comprising the slice group numbers in which the regions of interest are coded.
12. Indexing method according to claim 11 characterized in that the Supplemental Enhancement Information message comprises an identifier indicating for each slice group if it is related to one region of interest.
13. Indexing method according to claim 12 characterized in that it comprises a further step of reading the SEI messages and in that the step of decoding of video data (T3) decodes only the slice groups containing the regions of interest.
14. Device for indexing a coded video data stream, characterized in that said video data stream comprises information relative to the location of regions of interest of each picture, said device comprises means for:
- receiving the coded video stream,
- recording the coded video stream on a recording support (503),
- decoding (501 ) location information of the regions of interest,
- decoding (501 ) video data,
- selecting (502) a region of interest per picture,
- selecting (502) a predetermined number of regions of interest for the video data stream from among the regions of interest selected per picture,
- recording (503) the selected regions of interest.
US10715843B2 (en) 2015-08-20 2020-07-14 Koninklijke Kpn N.V. Forming one or more tile streams on the basis of one or more video streams
US11523185B2 (en) 2019-06-19 2022-12-06 Koninklijke Kpn N.V. Rendering video stream in sub-area of visible display area

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10349077B2 (en) * 2011-11-21 2019-07-09 Canon Kabushiki Kaisha Image coding apparatus, image coding method, image decoding apparatus, image decoding method, and storage medium
CN103246658B (en) * 2012-02-03 2017-02-08 展讯通信(上海)有限公司 Index table building method and coding method
TWI527466B (en) 2012-04-13 2016-03-21 Ge影像壓縮有限公司 Low delay picture coding
TWI737990B (en) 2012-06-29 2021-09-01 美商Ge影像壓縮有限公司 Video data stream concept
US10390024B2 (en) 2013-04-08 2019-08-20 Sony Corporation Region of interest scalability with SHVC
US9532086B2 (en) 2013-11-20 2016-12-27 At&T Intellectual Property I, L.P. System and method for product placement amplification
US10582201B2 (en) * 2016-05-19 2020-03-03 Qualcomm Incorporated Most-interested region in an image
CN109644284B (en) * 2016-08-30 2022-02-15 索尼公司 Transmission device, transmission method, reception device, and reception method
CN108810600B (en) * 2017-04-28 2020-12-22 华为技术有限公司 Video scene switching method, client and server
US10771163B2 (en) * 2017-10-24 2020-09-08 Mediatek Inc. Apparatus and method for decoding ROI regions in image
CN111510752B (en) * 2020-06-18 2021-04-23 平安国际智慧城市科技股份有限公司 Data transmission method, device, server and storage medium
CN113747151B (en) * 2021-07-30 2024-04-12 咪咕文化科技有限公司 Video encoding and decoding method, device, equipment and computer readable storage medium
CN116074585B (en) * 2023-03-03 2023-06-23 乔品科技(深圳)有限公司 Super-high definition video coding and decoding method and device based on AI and attention mechanism

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020044696A1 (en) * 1999-11-24 2002-04-18 Sirohey Saad A. Region of interest high resolution reconstruction for display purposes and a novel bookmarking capability
EP1322104A1 (en) * 2001-11-30 2003-06-25 Eastman Kodak Company Method for selecting and recording a subject of interest in a still digital image
US20040095477A1 (en) * 2002-08-09 2004-05-20 Takashi Maki ROI setting method and apparatus, electronic camera apparatus, program, and recording medium
US6909745B1 (en) * 2001-06-05 2005-06-21 At&T Corp. Content adaptive video encoder
US20060045381A1 (en) * 2004-08-31 2006-03-02 Sanyo Electric Co., Ltd. Image processing apparatus, shooting apparatus and image display apparatus
US20060072838A1 (en) * 2000-10-12 2006-04-06 Chui Charles K Multi-resolution image data management system and method based on tiled wavelet-like transform and distinct bitstreams for distinct groups of bit planes
EP1748385A2 (en) * 2005-07-28 2007-01-31 THOMSON Licensing Method and device for generating a sequence of images of reduced size
US20070061862A1 (en) * 2005-09-15 2007-03-15 Berger Adam L Broadcasting video content to devices having different video presentation capabilities

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07148155A (en) * 1993-11-26 1995-06-13 Toshiba Corp Computerized tomographic apparatus
JP2005110145A (en) * 2003-10-02 2005-04-21 Ricoh Co Ltd Code string converter, code string converting method, photographing system, image display system, monitoring system, program, and information recording
US7598977B2 (en) * 2005-04-28 2009-10-06 Mitsubishi Electric Research Laboratories, Inc. Spatio-temporal graphical user interface for querying videos
KR101255226B1 (en) * 2005-09-26 2013-04-16 한국과학기술원 Method and Apparatus for defining and reconstructing ROIs in Scalable Video Coding

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020044696A1 (en) * 1999-11-24 2002-04-18 Sirohey Saad A. Region of interest high resolution reconstruction for display purposes and a novel bookmarking capability
US20060072838A1 (en) * 2000-10-12 2006-04-06 Chui Charles K Multi-resolution image data management system and method based on tiled wavelet-like transform and distinct bitstreams for distinct groups of bit planes
US6909745B1 (en) * 2001-06-05 2005-06-21 At&T Corp. Content adaptive video encoder
EP1322104A1 (en) * 2001-11-30 2003-06-25 Eastman Kodak Company Method for selecting and recording a subject of interest in a still digital image
US20040095477A1 (en) * 2002-08-09 2004-05-20 Takashi Maki ROI setting method and apparatus, electronic camera apparatus, program, and recording medium
US20060045381A1 (en) * 2004-08-31 2006-03-02 Sanyo Electric Co., Ltd. Image processing apparatus, shooting apparatus and image display apparatus
EP1748385A2 (en) * 2005-07-28 2007-01-31 THOMSON Licensing Method and device for generating a sequence of images of reduced size
US20070061862A1 (en) * 2005-09-15 2007-03-15 Berger Adam L Broadcasting video content to devices having different video presentation capabilities

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
CHEN L-Q ET AL: "A VISUAL ATTENTION MODEL FOR ADAPTING IMAGES ON SMALL DISPLAYS" MULTIMEDIA SYSTEMS, ACM, NEW YORK, NY, US, vol. 9, no. 4, October 2003 (2003-10), pages 353-364, XP001196335 ISSN: 0942-4962 *
CHEN Z ET AL: "SEI for functional app" VIDEO STANDARDS AND DRAFTS, XX, XX, no. JVT-U059, 26 October 2006 (2006-10-26), XP030006705 *
HANNUKSELA M M ET AL: "H.264/AVC video for wireless transmission" IEEE WIRELESS COMMUNICATIONS, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 12, no. 4, August 2005 (2005-08), pages 6-13, XP011137994 ISSN: 1536-1284 *
JUNQING YU ET AL: "Content-Based News Video Mining" ADVANCED DATA MINING AND APPLICATIONS; [LECTURE NOTES IN COMPUTER SCIENCE;LECTURE NOTES IN ARTIFICIAL INTELLIGENCE;LNCS], SPRINGER-VERLAG, BERLIN/HEIDELBERG, vol. 3584, 13 August 2005 (2005-08-13), pages 431-438, XP019012983 ISBN: 978-3-540-27894-8 *
LIU H XIE X ET AL: "Automatic Browsing of Large Pictures on Mobile Devices" PROCEEDINGS OF THE 11TH. ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA. MM'03. BERKELEY, CA, NOV. 4 - 6, 2003, ACM INTERNATIONAL MULTIMEDIA CONFERENCE, NEW YORK, NY : ACM, US, vol. CONF. 11, 2 November 2003 (2003-11-02), pages 1-8, XP002471139 ISBN: 1-58113-722-2 *
See also references of EP2174500A2 *
SMOLLAR S W ET AL: "CONTENT-BASED VIDEO INDEXING AND RETRIEVAL" IEEE MULTIMEDIA, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 12, 1 January 1994 (1994-01-01), pages 62-72, XP002921947 ISSN: 1070-986X *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010283508A (en) * 2009-06-03 2010-12-16 National Institute Of Information & Communication Technology Hologram encoder and hologram decoder, and hologram encoding program and hologram decoding program
US8358918B2 (en) 2009-06-24 2013-01-22 Kabushiki Kaisha Toshiba Video processing apparatus and video processing method
US9083954B2 (en) 2011-11-02 2015-07-14 Huawei Technologies Co., Ltd. Video processing method and system and related device
WO2014051992A1 (en) * 2012-09-25 2014-04-03 Intel Corporation Video indexing with viewer reaction estimation and visual cue detection
US9247225B2 (en) 2012-09-25 2016-01-26 Intel Corporation Video indexing with viewer reaction estimation and visual cue detection
WO2015014773A1 (en) * 2013-07-29 2015-02-05 Koninklijke Kpn N.V. Providing tile video streams to a client
US10721530B2 (en) 2013-07-29 2020-07-21 Koninklijke Kpn N.V. Providing tile video streams to a client
EP3562170A1 (en) * 2013-07-29 2019-10-30 Koninklijke KPN N.V. Providing tile video streams to a client
KR20160032184A (en) * 2013-07-29 2016-03-23 코닌클리즈케 케이피엔 엔.브이. Providing tile video streams to a client
KR101879519B1 (en) * 2013-07-29 2018-07-17 코닌클리즈케 케이피엔 엔.브이. Providing tile video streams to a client
TWI569629B (en) * 2014-02-18 2017-02-01 英特爾公司 Techniques for inclusion of region of interest indications in compressed video data
CN105917649A (en) * 2014-02-18 2016-08-31 英特尔公司 Techniques for inclusion of region of interest indications in compressed video data
WO2015126545A1 (en) * 2014-02-18 2015-08-27 Intel Corporation Techniques for inclusion of region of interest indications in compressed video data
US20150237351A1 (en) * 2014-02-18 2015-08-20 Penne Lee Techniques for inclusion of region of interest indications in compressed video data
US10397666B2 (en) 2014-06-27 2019-08-27 Koninklijke Kpn N.V. Determining a region of interest on the basis of a HEVC-tiled video stream
US10694192B2 (en) 2014-06-27 2020-06-23 Koninklijke Kpn N.V. HEVC-tiled video streaming
US10715843B2 (en) 2015-08-20 2020-07-14 Koninklijke Kpn N.V. Forming one or more tile streams on the basis of one or more video streams
US10674185B2 (en) 2015-10-08 2020-06-02 Koninklijke Kpn N.V. Enhancing a region of interest in video frames of a video stream
CN110073662A (en) * 2016-11-17 2019-07-30 英特尔公司 The suggestion viewport of panoramic video indicates
CN110073662B (en) * 2016-11-17 2023-07-18 英特尔公司 Method and device for indicating suggested viewport of panoramic video
US11792378B2 (en) 2016-11-17 2023-10-17 Intel Corporation Suggested viewport indication for panoramic video
US11523185B2 (en) 2019-06-19 2022-12-06 Koninklijke Kpn N.V. Rendering video stream in sub-area of visible display area

Also Published As

Publication number Publication date
JP5346338B2 (en) 2013-11-20
KR20100042632A (en) 2010-04-26
CN101690228A (en) 2010-03-31
CN101690228B (en) 2012-08-08
KR101488548B1 (en) 2015-02-02
WO2009003885A3 (en) 2009-03-26
JP2010532121A (en) 2010-09-30
EP2174500A2 (en) 2010-04-14

Similar Documents

Publication Publication Date Title
KR101488548B1 (en) Video indexing method, and video indexing device
US10911786B2 (en) Image processing device and method
US9918108B2 (en) Image processing device and method
US7933327B2 (en) Moving picture coding method and moving picture decoding method
EP0895694B1 (en) System and method for creating trick play video streams from a compressed normal play video bitstream
JP4877852B2 (en) Image encoding apparatus and image transmitting apparatus
US7650032B2 (en) Method for encoding moving image and method for decoding moving image
US8081678B2 (en) Picture coding method and picture decoding method
US20010028725A1 (en) Information processing method and apparatus
US20060062299A1 (en) Method and device for encoding/decoding video signals using temporal and spatial correlations between macroblocks
US8750631B2 (en) Image processing device and method
US20060256853A1 (en) Moving picture encoding method and moving picture decoding method
KR100630983B1 (en) Image processing method, and image encoding apparatus and image decoding apparatus capable of employing the same
JPWO2013031315A1 (en) Image processing apparatus and image processing method
JP2006311079A (en) Image bit stream conversion apparatus
JPH1079941A (en) Picture processor
EP1926104A1 (en) Encoding device, decoding device, recording device, audio/video data transmission system
KR101012760B1 (en) System and Method for transmitting and receiving of Multi-view video
JP3519722B2 (en) Data processing method and data processing device
JP2006109060A (en) Blur correcting method and device using image coding information
RU2628198C1 (en) Method for interchannel prediction and interchannel reconstruction for multichannel video made by devices with different vision angles
JP2003284067A (en) Image relay system, image transmitter and its program, and image receiver and its program

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200880022001.9

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08761351

Country of ref document: EP

Kind code of ref document: A2

WWE Wipo information: entry into national phase

Ref document number: 2008761351

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2010513897

Country of ref document: JP

NENP Non-entry into the national phase in:

Ref country code: DE

ENP Entry into the national phase in:

Ref document number: 20107002047

Country of ref document: KR

Kind code of ref document: A