WO2011138628A1 - Method and device for optimal playback positioning in digital content - Google Patents
- Publication number
- WO2011138628A1 (PCT/IB2010/001065)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video content
- frames
- tagged
- search area
- content
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/765—Interface circuits between an apparatus for recording and another apparatus
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/78—Television signal recording using magnetic recording
- H04N5/782—Television signal recording using magnetic recording on tape
- H04N5/783—Adaptations for reproducing at a rate different from the recording rate
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/102—Programmed access in sequence to addressed parts of tracks of operating record carriers
- G11B27/105—Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
Definitions
- the present disclosure generally relates to digital content systems and digital video recording systems, and more particularly, to a method and device for optimal playback positioning in digital video content.
- a method and device for optimal playback positioning in digital video content are provided.
- the present disclosure relates to a mechanism of tagging scenes or significant points in content in a prioritized way, and defines a mechanism to utilize this tagging associated with the content to facilitate stopping or starting at appropriate points for playback, e.g., when pressing a scene skip button to jump forward or back to another scene, or when pressing Play after inputting a fast-forward (FF) or rewind (Rew) instruction.
- According to an aspect of the present disclosure, a method is provided for determining an optimal playback position in video content including a plurality of frames.
- the method includes, inter alia, displaying video content at a playback speed for viewing, receiving a first navigation instruction to navigate the video content at a speed faster than the playback speed for viewing, receiving a second navigation instruction to resume displaying the video content at the playback speed for viewing, and determining a playback position of the video content, in response to the second navigation instruction, based on at least one tagged frame of the video content.
- According to another aspect of the present disclosure, a device is provided for playing back video content, the video content including a plurality of frames.
- the device includes, inter alia, a video processor for providing video content at a playback speed for viewing to a display device, a user interface for receiving a first navigation instruction to navigate the video content at a speed faster than the playback speed for viewing and receiving a second navigation instruction to resume displaying the video content at the playback speed for viewing, and a controller coupled to the user interface for receiving the second navigation instruction, determining a playback position of the video content based on at least one tagged frame of the video content, and providing the determined playback position to the video processor.
- FIG. 1 is a block diagram of an exemplary system for delivering video content in accordance with the present disclosure
- FIG. 2 is a block diagram of an exemplary set-top box/digital video recorder (DVR) in accordance with the present disclosure
- FIG. 3 is a flowchart of an exemplary method for playing back content in an environment when the content has been pre-tagged in accordance with the present disclosure
- FIG. 4 is a flowchart of an exemplary method for playing back content in an environment when the content is dynamically tagged in accordance with the present disclosure
- FIG. 5 is a flowchart of an exemplary method for playing back content and navigating the content with a scene skip function in accordance with the present disclosure
- FIG. 6 is a flowchart of an exemplary method for playing back content and navigating the content with a scene skip function in accordance with another embodiment of the present disclosure.
- FIG. 7 illustrates a video playback timeline and how various zones are determined to be searched for tagged frames of video content in accordance with the present disclosure.

It should be understood that the drawings are for purposes of illustrating the concepts of the disclosure and are not necessarily the only possible configuration for illustrating the disclosure.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
- the elements shown in the figures may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory and input/output interfaces.
- the phrase "coupled" is defined to mean directly connected to or indirectly connected with through one or more intermediate components. Such intermediate components may include both hardware and software based components.
- the functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software.
- the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared.
- explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor ("DSP") hardware, read only memory ("ROM") for storing software, random access memory ("RAM"), and nonvolatile storage. Other hardware, conventional and/or custom, may also be included.
- any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
- any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function.
- the disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
- a method and device for optimal playback positioning in digital video content are provided.
- the present disclosure relates to a mechanism of tagging scenes or significant points in content in a prioritized way, and defines a mechanism to utilize this tagging associated with the content to facilitate stopping or starting at appropriate points for playback, e.g., when pressing a scene skip button to jump forward or back to another scene, or when pressing Play after inputting a fast-forward (FF) or rewind (Rew) instruction.
- Referring now to FIG. 1, a block diagram of an embodiment of a system 100 for delivering video content to the home or end user is shown.
- the content originates from a content source 102, such as a movie studio or production house.
- the content may be supplied in at least one of two forms.
- One form may be a broadcast form of content.
- the broadcast content is provided to the broadcast affiliate manager 104, which is typically a national broadcast service, such as the American Broadcasting Company (ABC), NBC, CBS, etc.
- the broadcast affiliate manager may collect and store the content, and may schedule delivery of the content over a delivery network, shown as delivery network 1 (106).
- Delivery network 1 (106) may include satellite link transmission from a national center to one or more regional or local centers.
- Delivery network 1 (106) may also include local content delivery using local delivery systems such as over the air broadcast, satellite broadcast, or cable broadcast.
- the locally delivered content is provided to a user's set top box/digital video recorder (DVR) 108 in a user's home.
- Special content may include content delivered as premium viewing, pay-per-view, or other content otherwise not provided to the broadcast affiliate manager. In many cases, the special content may be content requested by the user.
- the special content may be delivered to a content manager 110.
- the content manager 110 may be a service provider, such as an Internet website, affiliated, for instance, with a content provider, broadcast service, or delivery network service.
- the content manager 110 may also incorporate Internet content into the delivery system.
- the content manager 110 may deliver the content to the user's set top box/digital video recorder 108 over a separate delivery network, delivery network 2 (112). Delivery network 2 (112) may include high-speed broadband Internet type communications systems.
- the content from the broadcast affiliate manager 104 may also be delivered using all or parts of delivery network 2 (112) and content from the content manager 110 may be delivered using all or parts of delivery network 1 (106).
- the user may also obtain content directly from the Internet via delivery network 2 (112) without necessarily having the content managed by the content manager 110.
- the set top box/digital video recorder 108 may receive different types of content from one or both of delivery network 1 and delivery network 2.
- the set top box/digital video recorder 108 processes the content, and provides a separation of the content based on user preferences and commands.
- the set top box/digital video recorder may also include a storage device, such as a hard drive or optical disk drive, for recording and playing back audio and video content. Further details of the operation of the set top box/digital video recorder 108 and features associated with playing back stored content will be described below in relation to FIG. 2.
- the processed content is provided to a display device 114.
- the display device 114 may be a conventional 2-D type display or may alternatively be an advanced 3-D display.
- Referring now to FIG. 2, a block diagram of an embodiment of the core of a set top box/digital video recorder 200 is shown.
- the device 200 shown may also be incorporated into other systems, including the display device 114 itself. In either case, several components necessary for complete operation of the system are not shown in the interest of conciseness, as they are well known to those skilled in the art.
- the content is received in an input signal receiver 202.
- the input signal receiver 202 may be one of several known receiver circuits used for receiving, demodulating, and decoding signals provided over one of the several possible networks including over the air, cable, satellite, Ethernet, fiber and phone line networks.
- the desired input signal may be selected and retrieved in the input signal receiver 202 based on user input provided through a control interface (not shown).
- the decoded output signal is provided to an input stream processor 204.
- the input stream processor 204 performs the final signal selection and processing, and includes separation of video content from audio content for the content stream.
- the audio content is provided to an audio processor 206 for conversion from the received format, such as compressed digital signal, to an analog waveform signal.
- the analog waveform signal is provided to an audio interface 208 and further to the display device 114 or an audio amplifier (not shown).
- the audio interface 208 may provide a digital signal to an audio output device or display device using an HDMI (High-Definition Multimedia Interface) cable or an alternate audio interface such as SPDIF (Sony/Philips Digital Interconnect Format).
- the audio processor 206 also performs any necessary conversion for the storage of the audio signals.
- the video output from the input stream processor 204 is provided to a video processor 210.
- the video signal may be one of several formats.
- the video processor 210 provides, as necessary, a conversion of the video content based on the input signal format.
- the video processor 210 also performs any necessary conversion for the storage of the video signals.
- a storage device 212 stores audio and video content received at the input.
- the storage device 212 allows later retrieval and playback of the content under the control of a controller 214 and also based on commands, e.g., navigation instructions such as fast-forward (FF) and rewind (Rew), received from a user interface 216.
- the storage device 212 may be a hard disk drive, one or more large capacity integrated electronic memories, such as static random access memory, or dynamic random access memory, or may be an interchangeable optical disk storage system such as a compact disk drive or digital video disk drive.
- the converted video signal from the video processor 210, either originating from the input or from the storage device 212, is provided to the display interface 218.
- the display interface 218 further provides the display signal to a display device of the type described above.
- the display interface 218 may be an analog signal interface such as red-green-blue (RGB) or may be a digital interface such as high definition multimedia interface (HDMI).
- the controller 214 is interconnected via a bus to several of the components of the device 200, including the input stream processor 204, audio processor 206, video processor 210, storage device 212, and a user interface 216.
- the controller 214 manages the conversion process for converting the input stream signal into a signal for storage on the storage device or for display.
- the controller 214 also manages the retrieval and playback of stored content.
- the controller 214 is further coupled to control memory 220 (e.g., volatile or non-volatile memory, including random access memory, static RAM, dynamic RAM, read only memory, programmable ROM, flash memory, EPROM, EEPROM, etc.) for storing information and instruction code for controller 214.
- the implementation of the memory may include several possible embodiments, such as a single memory device or, alternatively, more than one memory circuit connected together to form a shared or common memory. Still further, the memory may be included with other circuitry, such as portions of bus communications circuitry, in a larger circuit.
- a method for controlling fast-forward (FF) and rewind (Rew) functions in a video recording device is described below.
- the physical implementation of the algorithm or function may be done in hardware, such as discrete circuitry related to the video processor 210, or in software, such as software residing in the control memory 220 and read and executed by the controller 214.
- the method involves analyzing the content to recognize and tag important points in the content that may represent the starts of scenes or other important reference points. Then, under a number of circumstances, the device 200 will be capable of automatically determining the right position to jump to, based on several criteria.
- the analysis may be done prior to broadcast, on ingest to the device or at playback, though the preferred implementation is likely to be upon ingest to the device or when the content is written to disk.
- One practical example of the present disclosure is to make it simple for a user to easily start at the right point when pressing play after fast-forwarding through an ad (or advertisement) break or to easily rewind to the end of the previous ad break.
- The right start point, or playback position, is determined by looking at the speed of the FF or Rew. When the play button is pressed, the controller 214 will examine recently passed "tagged" positions and determine which scene tags, and at what priority, have recently been passed, in effect determining the proximity to previously or dynamically recognized scene transition points that represent a valid point to start playing.
- A black reference frame could represent a significant marker (as black reference frames are typically used at the start and end of ad breaks), and if one has recently been passed in the FF or Rew, then it would be used as the start point.
- reference frames outside the regular intervals could also be tagged as less significant trigger points, as they may also represent the start of a scene.
- the speed of the FF/Rew function needs to be considered along with the user reaction time to determine the area in which to search the content for tags. If the FF/Rew speed is fast, the user may have passed several reference points between seeing where they wanted to start playing and playback will need to start from the appropriate one. At slower speeds, it is likely that the last reference point passed will be the appropriate starting point.
- the method and device of the present disclosure are predicated on having tags associated with the content so that when it is played back, information is available upon which to make a decision.
- This tag information could be obtained in one of three primary modes of operation.
- First, content could be pre-analyzed at the head end of the broadcast affiliate manager 104 or content manager 110 and have metadata broadcast along with it. This could be implemented by putting the tagging data as part of the SI data in the transport stream and sending the tagging data along with the content so there is no work at the DVR or device 200.
- Second, content could be analyzed and tagged as it flows in to the device 200 or as it is written to disk.
- Third, content could be analyzed dynamically upon playback and/or during trick mode operation so that reference points are created dynamically. For example, as a user fast-forwards or rewinds, the device is actually doing some frame analysis in either direction as the content is passing through.
- Each mode of tagging will now be further described.
- In the first mode of operation, tagging is performed at the headend before the content is transmitted over a delivery network. Broadcasters are unlikely to support the tagging of content (particularly as it relates to the potential of skipping ads) due to the potential loss of revenue. However, the concept of actually having this capability at the encoder itself presents other opportunities, as there are also other implications of being able to have scene detection. If scene tagging existed in the stream itself, several possibilities emerge including, for example, tagging preferred commercials to indicate they can't be skipped.
- the headend may not be relevant as the device 200 is likely to have a digital terrestrial tuner, so, like any other DVR, the device 200 is being fed content that it is processing on the fly.
- the headend may also be used to receive streamed, pre-prepared content.
- it may be an advantage to have some sort of enhanced scene detection within the film.
- the broadcaster might want to have content having a very long GOP (group of pictures), with a high maximum I-frame interval.
- having tagging done at the headend may be of value and facilitate playback and searching through the content.
- In the second mode of operation, the tagging will occur during ingest to the set-top box 200 by the video processor 210, i.e., where the content is received and/or written to a disk, hard drive or other memory device.
- the point at which content is being ingested into the device and/or being processed and written to disk is likely to be the optimal point at which to analyze the content and provide tagging.
- the level of processing will vary depending on requirements, and may be as simple as just tagging non-regularly spaced I-frames and "black" I-frames, or may involve more sophisticated scene detection. There are considerations as to how much additional disk space can be used and how much additional information should be stored.
- thumbnails of the frame starting the scene may also be captured to allow graphical browsing of the content.
- the third mode of tagging frames involves tagging content in real time.
- the video processor 210 can perform scene analysis where the scene analysis can be done on the fly during fast-forwarding and rewind events. In the event the user does a fast-forward or rewind, the video processor 210 essentially does the tagging on the fly, keeping counters as to where the appropriate scene points are.
- the algorithms or functions described below will be applied to jump to the appropriate tag position.
- the tagging of content will be implemented as an automated solution that is completely invisible to the user, though there are potentially significant variations in how much information is tagged, what is used to determine those tags and how the tags are used.
- the tags may constitute a very small amount of data that defines the key transition points in the file. For example, for a two-hour program which had six ad breaks, the start and end of those ad breaks could be defined by analyzing the scene changes where you have a black reference frame.
- an I-frame will typically be inserted every half second or second, and there are a few interspersed I-frames that represent scene changes.
- I-frames are typically spaced at regular intervals in addition to the scene changes; one difficulty is that a scene may change on a regular-interval I-frame, making it difficult to identify as a new scene. It is relatively simple to calculate the actual maximum I-frame interval of the content, as looking through a short history will reveal I-frames at least every N frames.
- If the content has a maximum GOP size of ½ second, there would be a minimum of 100 I-frames in every 50 seconds. However, due to additional I-frames for scene changes, there may be, for example, 110 I-frames per 50-second period. From this we can still deduce that the regular interval is roughly half a second, and that there are additional I-frames that represent scene changes.
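As a rough illustration of this deduction (a sketch, not part of the disclosure; the function name and tolerance are illustrative assumptions), the dominant gap in a short history of I-frame positions approximates the encoder's regular interval, and any I-frame that falls off that grid is a likely scene change:

```python
# Sketch: deduce the regular I-frame interval from a short history of
# I-frame positions (in frame numbers) and flag off-interval I-frames
# as likely scene changes. Names and the tolerance are illustrative.
from collections import Counter

def split_regular_and_scene_iframes(iframe_positions, tolerance=2):
    gaps = [b - a for a, b in zip(iframe_positions, iframe_positions[1:])]
    # The dominant gap approximates the encoder's regular GOP interval.
    interval = Counter(gaps).most_common(1)[0][0]
    scene_changes = []
    expected = iframe_positions[0]
    for pos in iframe_positions[1:]:
        expected += interval
        if abs(pos - expected) > tolerance:
            scene_changes.append(pos)  # off the regular grid: likely new scene
            expected = pos             # the GOP interval restarts here
    return interval, scene_changes

# In the example above, 110 I-frames in a 50-second window with a
# dominant half-second gap would yield roughly ten off-interval
# I-frames marking scene changes.
```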
- motion picture video content data is generally captured, stored, transmitted, processed, and output as a series of still images.
- Small frame-by-frame data content changes are perceived as motion when the output is directed to a viewer at sufficiently close time intervals.
- a large data content change between two adjacent frames is perceived as a scene change (e.g., a change from an indoor to an outdoor scene, a change in camera angle, an abrupt change in illumination within an image, and the like).
- Encoding and compression processes take advantage of small frame-by-frame video content data changes to reduce the amount of data needed to store, transmit, and process video data content.
- the amount of data required to describe the changes is less than the amount of data required to describe the original still image.
- Encoding and compression standards, such as those defined by the Moving Pictures Experts Group (MPEG), organize encoded video into groups of frames.
- a group of frames begins with an intra-coded frame (I-frame) in which encoded video content data corresponds to visual attributes (e.g., luminance, chrominance) of the original still image.
- Subsequent frames in the group of frames, such as predictive coded frames (P-frames) and bi-directional coded frames (B-frames), are encoded based on changes from earlier frames in the group.
- New groups of frames, and thus new I-frames, are begun at regular time intervals to prevent, for instance, noise from inducing false video content data changes.
- New groups of frames, and thus new I-frames, are also begun at scene changes, when the video content data changes are large, because less data is required to describe a new still image than to describe the large changes between the adjacent still images. In other words, two pictures from different scenes have little correlation between them. Compression of the new picture into an I-frame is more efficient than using one picture to predict the other picture. Therefore, during content data encoding, it is important to identify scene changes between adjacent video content data frames.
- the method and device of the present disclosure may detect scene change by using a Sum of Absolute Histogram Difference (SAHD) and a Sum of Absolute Display Frame Difference (SADFD).
- Such methods use the temporal information in the same scene to smooth out variations and accurately detect scene changes.
- These methods can be used for both real-time (e.g., real-time video compression) and non-real-time (e.g., film post-production) applications.
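As an illustration of the kind of measures named above (a minimal sketch, assuming frames arrive as 8-bit grayscale NumPy arrays; the threshold ratio is an assumption, not a value from the disclosure):

```python
# Sketch: scene-change detection using the Sum of Absolute Histogram
# Difference (SAHD) and Sum of Absolute Display Frame Difference (SADFD)
# between consecutive frames. Frames are assumed to be 8-bit grayscale
# numpy arrays; the threshold ratio is illustrative only.
import numpy as np

def sahd(frame_a, frame_b, bins=256):
    hist_a, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hist_b, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    return int(np.abs(hist_a - hist_b).sum())

def sadfd(frame_a, frame_b):
    # Pixelwise absolute difference, summed over the whole frame.
    return int(np.abs(frame_a.astype(np.int32) - frame_b.astype(np.int32)).sum())

def is_scene_change(frame_a, frame_b, threshold_ratio=0.5):
    # Normalize by pixel count so the decision is resolution-independent.
    return sahd(frame_a, frame_b) / frame_a.size > threshold_ratio
```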
- Levels may, for example, be: blank reference frames (highest priority), off-interval I-frames representing new scenes (second priority), other defined tags such as look-up points (tertiary priority), and regular-interval reference frames (lowest priority).
- the playback would commence from a reference frame, though the tagging allows a better estimate of which frames the user is most likely to want to start from. If a priority 1 frame is found in the primary or secondary search zone, then playback will begin there. If a priority 1 frame is found in the primary zone, no further searching will take place. If there is no priority 1 tagged frame in the primary or secondary zones, the second-priority tag closest to the center is selected for the start position. There may be "other" tags that need to be considered as a tertiary priority, in the same way as the priority 2 tags, though in the absence of any of these, the reference frame closest to the center of the primary search zone will be selected as the starting position.
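The fallback order above can be sketched as follows (illustrative only; tags are assumed to be (position, priority) pairs with priority 1 highest, and zones are (start, end) ranges):

```python
# Sketch: pick a playback start point per the fallback rules above.
# Tags are (position, priority) tuples; priority 1 is highest (e.g.,
# blank reference frames). All names here are illustrative.
def choose_start(tags, primary, secondary, center, reference_frames):
    def hits(zone, priority):
        lo, hi = zone
        return [p for p, pr in tags if lo <= p <= hi and pr == priority]

    # Priority 1: the primary zone first; a hit there ends the search.
    for zone in (primary, secondary):
        best = hits(zone, 1)
        if best:
            return min(best, key=lambda p: abs(p - center))

    # Second-priority tags, then tertiary "other" tags, closest to center.
    for priority in (2, 3):
        found = hits(primary, priority) + hits(secondary, priority)
        if found:
            return min(found, key=lambda p: abs(p - center))

    # Absent any tag, fall back to the reference frame nearest the
    # center of the primary search zone.
    lo, hi = primary
    in_primary = [f for f in reference_frames if lo <= f <= hi]
    return min(in_primary or reference_frames, key=lambda f: abs(f - center))
```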
- In one embodiment, in the case of video playback with pre-tagged content, assume that there is a content file on the disk or storage device 212 that has been tagged, or a separate file associated with the content file that contains the tagging information.
- the tagging information will indicate the scene points generally within the video content file, and in particular would have weighted tags for how important these markers are as reference points.
- Tag types include a defined "look-up point", a regular-interval I-frame (reference frame), an off-interval I-frame (representing a new scene), and a blank I-frame. Blank (black) I-frames would have a very low data rate as they contain little data, and are generally inserted between ad breaks, indicating a transition from a commercial to the beginning of a scene or between scenes, for example.
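One plausible in-memory representation of such tags (a sketch; the field names and numeric priorities are assumptions, not defined by the disclosure):

```python
# Sketch: a compact record for tagged frames. The priority ordering
# mirrors the tag types described above; all names are illustrative.
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

class TagType(IntEnum):            # lower value = higher priority
    BLANK_IFRAME = 1               # black/blank I-frame, e.g., ad-break boundary
    OFF_INTERVAL_IFRAME = 2        # off-interval I-frame: likely a new scene
    LOOKUP_POINT = 3               # explicitly defined look-up point
    REGULAR_IFRAME = 4             # regular-interval reference frame

@dataclass
class Tag:
    position_ms: int               # offset of the tagged frame in the file
    tag_type: TagType
    thumbnail: Optional[bytes] = None  # optional frame grab for browsing
```

Stored this way, the two-hour program with six ad breaks mentioned earlier might be described by only a dozen blank-frame tags plus the off-interval I-frames.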
- the flow chart shown in FIG. 3 represents the process flow of playing back content in an environment when the content has been pre-tagged either prior to broadcast of the content or as it was ingested into the DVR device 200 or written to disk.
- The information is read off the disk, such as a hard disk drive (step 302), and normal playback occurs at a speed for viewing (step 304).
- a user may input a navigation instruction via user interface 216, e.g., fast-forward or rewind the content (step 306).
- the navigation instruction e.g., a fast-forward (FF), rewind (Rew), skip scene, etc., will cause the user to navigate the video content at a speed faster than the normal playback speed for viewing.
- When the user inputs a fast-forward or rewind, no additional processing takes place until the user presses play again, i.e., inputs a subsequent navigation instruction.
- the controller 214 will examine the tagged information and determine what tags have occurred within the appropriate scope of the position at which the user pressed play (step 310). Then, the controller 214 will make a determination where to jump to, to start playback, based on a tag weight and FF/Rew speed (step 312). Once the playback position is determined, the video processor 210 will seek the play head to that point and begin video playback from the selected tagged frame (step 314).
- the playback process itself could be used to effectively dynamically tag the content.
- the content will be read from the disk and normal playback will occur (step 404).
- the video processor 210 will apply dynamic or "on-the-fly" frame tagging (step 408). That is, the device will detect blank scenes, reference frames, etc., as they are passed during the FF/Rew process. These detected frames or points of reference will be tagged. These tags may or may not be stored along with the content for later use.
- the device 200 will proceed as described above.
- the controller 214 will make a determination where to jump to, to start playback, based on a tag weight and FF/Rew speed (step 412).
- the video processor 210 will seek the play head to that point and begin video playback from the selected tagged frame (step 414).
- the tagging can also be used to provide a better or different experience for users to be able to skip from "Scene to Scene" with a press of a button, or skip a larger amount of content (with a pre-defined base time period), though still begin playback on a scene boundary as defined in the tags.
- This process is shown in FIG. 5. Referring to FIG. 5, video is read from the disk (step 502) and normal playback occurs at a speed for viewing (step 504).
- Upon the user requesting a "scene skip" function (step 506), the controller 214 will set a "scene search" position according to a predefined "scene definition" setting (step 508), i.e., jump forward or backward a fixed amount of time to begin the scene search.
- In step 510, the controller 214 will examine the tag information for tagged frames within the proximity of the "scene search" start point. Then, the controller 214 will make a determination of where to jump to start playback, based on the tag weights in the selection area (step 512). Once the playback position is determined, the video processor 210 will seek the play head to that point and begin video playback from the selected tagged frame (step 514).
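A sketch of that computation, reusing the illustrative Tag record from above (the scene-definition offset and search window are assumed values, not taken from the disclosure):

```python
# Sketch: scene skip — jump a fixed "scene definition" amount, then snap
# to the best tagged frame near the scene-search position. Illustrative.
def scene_skip(current_ms, direction, tags,
               scene_definition_ms=30_000, window_ms=10_000):
    # direction is +1 for skip forward, -1 for skip back.
    search_pos = current_ms + direction * scene_definition_ms
    nearby = [t for t in tags if abs(t.position_ms - search_pos) <= window_ms]
    if not nearby:
        return search_pos          # no tag in proximity: use the raw offset
    # Highest-priority tag wins; ties go to the tag nearest the search point.
    best = min(nearby, key=lambda t: (t.tag_type, abs(t.position_ms - search_pos)))
    return best.position_ms
```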
- In addition to being able to perform scene skipping with tagged content, the device 200 can also perform scene skipping dynamically with content that has not been pre-tagged, as shown in FIG. 6.
- video is read from the disk (step 602) and normal playback occurs at a speed for viewing (step 604).
- Upon the user requesting a "scene skip" function in step 606, the controller 214 will set a "scene search" position according to a predefined "scene definition" setting (step 608), i.e., jump forward or backward a fixed amount of time to begin the scene search.
- the video processor 210 will apply dynamic or "on-the-fly" frame tagging (step 610). That is, the video processor 210 will detect blank scenes, reference frames, etc., as they are passed during the scene skip process. These detected frames or points of reference will be tagged. These tags may or may not be stored along with the content for later use. Then, the controller 214 will make a determination of where to jump to start playback, based on the tag weights in the selection area (step 612). Once the playback position is determined, the video processor 210 will seek the play head to that point and begin video playback from the selected tagged frame (step 614). The function of how to determine the appropriate playback position after a user presses play will now be described.
- the controller 214 will set a start point based on one of a number of factors, then specify a period or zone in which to search in either direction from that reference point. The controller 214 will then search to see what tags fall within that range and apply an algorithm or function to determine the most appropriate start point for playback.
- While the play start position is likely to be a reference frame of some form, it is also possible to key off an alternate pre-defined time stamp, which could be other than a reference frame. Indeed, the tagging mechanism may include a facility to indicate that a tagged frame is other than an I-frame, say a B-frame, but one that is easily buildable from the last four frames.
- the tag could contain data (or a reference to the data) to allow the device to go back several frames to get back into all the video data that is needed to build this non-reference frame and treat it as such. In this instance, the tag would likely contain the offset information required to make it quicker and easier to get the data required rather than have to calculate it from scratch on the fly.
- the present disclosure provides mechanisms to get reference frames from somewhere else so the device and method can actually support fast-forward and rewind with such video by augmenting it with external data, dynamically getting additional frames from the Internet or some other medium and/or source.
- In this case, the stream has minimal reference frames, and there would be another source for the rest of the I-frames or intervening data required to build complete frames.
- DVRs typically employ algorithms or functions where during trick mode playback, the DVR will jump from I-frame to I-frame or determine which reference frames are to be displayed.
- the present disclosure expands on this basic idea so that rather than just referencing I-frames, there are multiple possible points at which the DVR may stop, which will nominally be defined as a scene. While the tags define possible points from which to start playback, an algorithm or function is applied to determine the time interval within the content in which to search for these tags, and which tag represents the optimal start point within that content.
- the start and end positions for any playback position search are bounded by the position in the content file at which the user started the fast-forward/rewind, i.e., input a first navigation instruction, and where they pressed play i.e., input a second navigation instruction. No searching will occur outside these boundaries.
- the controller 214 will calculate both a "search position" (in the center of the search area), and a size of the area (or zone) in which to search for tags as illustrated in FIG. 7.
- a search start position is defined in the file based on the following criteria: 1) the speed at which the user is doing the FF/Rew, and 2) a nominal reaction time assigned to the user.
- the reaction time of the user may initially be set at 2-5 seconds and can be modified according to user input and/or experience of the device 200 as to actual likely reaction times, as will be described in detail below.
- Assume the user is fast-forwarding at 30x real speed and presses play 43 minutes and 10 seconds into the file (43:10).
- Assume further that the user has an assigned reaction time of 4 seconds.
- This means that the central position 702 for the search would be 4x30 seconds (i.e., 2 minutes) before the position where the user pressed play (i.e., at 41:10).
- The search for tagged frames would therefore start at this position, with the primary search zone 704 being a fixed percentage of this distance on either side of the center point 702. Assuming this is 50%, the tag search zone would be 1 minute either side of the central point, i.e., between 40:10 and 42:10 in the file.
- If any priority tagged frame is found within this range, a hit is registered and video playback will commence from the tagged frame having the highest priority. If more than one match is found and the weight of the tag priorities is the same, playback will commence from the point closest to the center position 702.
- the user's reaction time may also be measured and potentially used to alter the expected response time for future searches. If no match is found, a secondary zone 706 will also be searched; this may be, for example, 100% of the distance from the position at which the user pressed play to the center point 702. If a key tag is found in this search, this may indicate that the user's reaction was abnormal, and if a key frame exists in this area, it can still be selected as the start position.
- the final learning search zone 708 extends from the central point 702 to the play position, and 200% from the central point back. This will only be searched in the case that no key frame was found in either of the first two zones. If a key tagged frame is found here, the delay can be recorded, and if this is consistent behavior, the reaction time of the user may be adjusted to ensure that the key frame lands in the primary zone more often. Note that the percentages of the distance from the central point are illustrative only, and will be better determined through user profiling. In addition, regardless of the percentage, the search will take place within the extreme bounds of the search as described earlier.
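Putting the zone arithmetic together, here is a sketch consistent with the worked example above (the 50%/100%/200% zone sizes are the illustrative values already given; the function name, variable names, and the assumed FF start position are not from the disclosure):

```python
# Sketch: compute the search center and zones from the FF speed and the
# user's reaction time (fast-forward case; for rewind the center would
# lie after the play position). Zone percentages follow the example.
def search_zones(play_pressed_s, speed, reaction_s, ff_start_s):
    offset = speed * reaction_s          # content passed during the reaction
    center = play_pressed_s - offset     # central position 702

    def clamp(x):                        # never search outside the FF bounds
        return max(ff_start_s, min(play_pressed_s, x))

    primary = (clamp(center - 0.5 * offset), clamp(center + 0.5 * offset))
    secondary = (clamp(center - offset), clamp(center + offset))
    learning = (clamp(center - 2 * offset), play_pressed_s)
    return center, primary, secondary, learning

# The example above: 30x FF, play pressed at 43:10 (2590 s), 4 s
# reaction time, with the FF assumed to have started at 35:00 (2100 s).
center, primary, secondary, learning = search_zones(2590, 30, 4, 2100)
assert center == 2470                # 41:10
assert primary == (2410, 2530)       # 40:10 to 42:10
```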
- the device 200 will employ both automated and manual mechanisms. This may include a user preference that lets users define and/or test their own reaction time.
- a typical reaction time might be two seconds, for example, so as the user fast-forwards through the content, it will take a certain amount of time from when the user sees the point at which they would like to start playing before they press the play button.
- If the user has a 2-second reaction time and is fast-forwarding at 30x normal playback, a minute's worth of video will pass between what triggered the user to press play and them actually doing so. If the FF rate were, for example, only 2x normal playback, only 4 seconds of video would have passed in this time.
- the user's reaction time will be highly variable, with a slow reaction time being around 5 seconds and a fast reaction time probably half a second.
- the device 200 will determine if the user's reaction time is fast or not. As a rule of thumb, default values will be used to set the average user response based on testing. Additionally, the device 200 may provide a user interface for users to configure their reaction time, and/or have it calculated dynamically. If the device were to define a default time for the average user of, say, 2 seconds, it can then build up a record of how the user actually reacts over time, e.g., based on testing whether there are high-priority "Blank Frame" tags found consistently at an unusually long distance from where the user presses play.
- the response time may also be connected to a user based system on the device 200 such that separate profiling may be conducted for multiple users of the system.
- Manual reaction time may be set using a traditional slider displayed on the display device 114.
- Another option is a mechanism to determine the reaction speed of the user by, for example, showing a series of images in random order and asking the user to press the play button when they see a particular image (such as a picture of a dog), then measuring the time between when the image was displayed and when the user pressed play.
- the test may be repeated multiple times to gain better accuracy, and may be user specific (i.e. the system may allow a user to identify themselves individually, both from a testing perspective and for using the device).
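A sketch of such a test loop (the display and remote-control input calls, show_image and wait_for_play_press, are hypothetical placeholders, not a real API):

```python
# Sketch: estimate a user's reaction time by showing images in random
# order and timing how long they take to press play on the target image.
# show_image() and wait_for_play_press() are hypothetical placeholders.
import random
import time

def measure_reaction_time(images, target, trials=5, dwell_s=1.0):
    samples = []
    for _ in range(trials):
        order = random.sample(images, len(images))  # fresh random order
        for image in order:
            show_image(image)                  # hypothetical display call
            if image == target:
                shown_at = time.monotonic()
                wait_for_play_press()          # hypothetical blocking input
                samples.append(time.monotonic() - shown_at)
                break
            time.sleep(dwell_s)                # dwell before the next image
    return sum(samples) / len(samples)         # average over trials
```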
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Television Signal Processing For Recording (AREA)
- Indexing, Searching, Synchronizing, And The Amount Of Synchronization Travel Of Record Carriers (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP10730826.4A EP2548370B8 (en) | 2010-03-17 | 2010-05-07 | Method and device for optimal playback positioning in digital content |
CN201080066658.2A CN102884786B (en) | 2010-05-07 | 2010-05-07 | The method and apparatus of optimal playback location in digital content |
US13/634,984 US8891936B2 (en) | 2010-05-07 | 2010-05-07 | Method and device for optimal playback positioning in digital content |
KR1020127026867A KR101656520B1 (en) | 2010-05-07 | 2010-05-07 | Method and device for optimal playback positioning in digital content |
PCT/IB2010/001065 WO2011138628A1 (en) | 2010-05-07 | 2010-05-07 | Method and device for optimal playback positioning in digital content |
JP2013508561A JP2013532405A (en) | 2010-05-07 | 2010-05-07 | Method and apparatus for optimal playback positioning in digital content |
BR112012023885-0A BR112012023885B1 (en) | 2010-03-17 | 2010-05-07 | METHOD FOR DETERMINING A PLAYBACK POSITION ON VIDEO CONTENT AND DEVICE FOR PLAYING VIDEO CONTENT |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2010/001065 WO2011138628A1 (en) | 2010-05-07 | 2010-05-07 | Method and device for optimal playback positioning in digital content |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011138628A1 true WO2011138628A1 (en) | 2011-11-10 |
Family
ID=43382374
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2010/001065 WO2011138628A1 (en) | 2010-03-17 | 2010-05-07 | Method and device for optimal playback positioning in digital content |
Country Status (5)
Country | Link |
---|---|
US (1) | US8891936B2 (en) |
JP (1) | JP2013532405A (en) |
KR (1) | KR101656520B1 (en) |
CN (1) | CN102884786B (en) |
WO (1) | WO2011138628A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3086321A1 (en) * | 2015-04-24 | 2016-10-26 | ARRIS Enterprises LLC | Designating partial recordings as personalized multimedia clips |
EP3416395B1 (en) * | 2012-03-13 | 2023-05-10 | TiVo Solutions Inc. | Automatic commercial playback system |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103220582B (en) * | 2013-02-20 | 2017-11-07 | 商松 | A kind of video file management method |
US9274673B2 (en) | 2013-12-31 | 2016-03-01 | Google Inc. | Methods, systems, and media for rewinding media content based on detected audio events |
US9591365B2 (en) | 2014-02-26 | 2017-03-07 | Rovi Guides, Inc. | Methods and systems for supplementing media assets during fast-access playback operations |
US9760275B2 (en) * | 2014-04-11 | 2017-09-12 | Intel Corporation | Technologies for skipping through media content |
WO2016179386A1 (en) * | 2015-05-06 | 2016-11-10 | Arris Enterprises Llc | Intelligent multimedia playback re-positioning |
BR112017028445A2 (en) * | 2015-06-30 | 2018-08-28 | Thomson Licensing | method and apparatus for controlling media playback using a single control |
US10817169B2 (en) * | 2016-10-14 | 2020-10-27 | Microsoft Technology Licensing, Llc | Time-correlated ink |
KR20180092163A (en) * | 2017-02-08 | 2018-08-17 | 삼성전자주식회사 | Electronic device and server for video playback |
US10390077B2 (en) | 2017-03-15 | 2019-08-20 | The Directv Group, Inc. | Collective determination of interesting portions of a media presentation, media tagging and jump playback |
CN107122617A (en) * | 2017-05-16 | 2017-09-01 | 上海联影医疗科技有限公司 | The acquisition methods and medical imaging devices of medical imaging data |
US11895369B2 (en) | 2017-08-28 | 2024-02-06 | Dolby Laboratories Licensing Corporation | Media-aware navigation metadata |
CN110971857B (en) | 2018-09-28 | 2021-04-27 | 杭州海康威视系统技术有限公司 | Video playback method and device and computer readable storage medium |
JP7164465B2 (en) * | 2019-02-21 | 2022-11-01 | i-PRO株式会社 | wearable camera |
JP6752349B1 (en) * | 2019-12-26 | 2020-09-09 | 株式会社ドワンゴ | Content distribution system, content distribution method, and content distribution program |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000062298A1 (en) * | 1999-03-30 | 2000-10-19 | Tivo, Inc. | System for automatic playback position correction after fast forward or reverse |
US20030071971A1 (en) * | 1999-10-19 | 2003-04-17 | Samsung Electronics Co., Ltd. | Recording and/or reproducing apparatus and method using key frame |
JP2005293680A (en) * | 2004-03-31 | 2005-10-20 | Nec Personal Products Co Ltd | Content cue position control method, content cue position control device and content cue position control program |
EP1748438A1 (en) * | 2005-07-27 | 2007-01-31 | Samsung Electronics Co., Ltd. | Video apparatus and control method |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1994016442A1 (en) | 1993-01-08 | 1994-07-21 | Arthur D. Little Enterprises, Inc. | Method and apparatus for eliminating television commercial messages |
JP3909130B2 (en) * | 1997-09-30 | 2007-04-25 | 株式会社日立製作所 | Stream event point detection display method and apparatus |
JP2000354225A (en) * | 1999-06-11 | 2000-12-19 | Toshiba Corp | Disk recorder |
EP1251515A1 (en) * | 2001-04-19 | 2002-10-23 | Koninklijke Philips Electronics N.V. | Method and system for selecting a position in an image sequence |
US7333712B2 (en) * | 2002-02-14 | 2008-02-19 | Koninklijke Philips Electronics N.V. | Visual summary for scanning forwards and backwards in video content |
CN101025987A (en) | 2006-02-21 | 2007-08-29 | 广州市纽帝亚资讯科技有限公司 | Video play fast forward/fast rewind method and device based on video content |
KR100834959B1 (en) | 2006-08-11 | 2008-06-03 | 삼성전자주식회사 | Method and apparatus for multimedia contents playing |
JP2008277967A (en) | 2007-04-26 | 2008-11-13 | Sony Corp | Information processing device and information processing method, program, and recording medium |
US8479229B2 (en) * | 2008-02-29 | 2013-07-02 | At&T Intellectual Property I, L.P. | System and method for presenting advertising data during trick play command execution |
-
2010
- 2010-05-07 JP JP2013508561A patent/JP2013532405A/en active Pending
- 2010-05-07 CN CN201080066658.2A patent/CN102884786B/en active Active
- 2010-05-07 WO PCT/IB2010/001065 patent/WO2011138628A1/en active Application Filing
- 2010-05-07 US US13/634,984 patent/US8891936B2/en active Active
- 2010-05-07 KR KR1020127026867A patent/KR101656520B1/en active IP Right Grant
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000062298A1 (en) * | 1999-03-30 | 2000-10-19 | Tivo, Inc. | System for automatic playback position correction after fast forward or reverse |
US20030071971A1 (en) * | 1999-10-19 | 2003-04-17 | Samsung Electronics Co., Ltd. | Recording and/or reproducing apparatus and method using key frame |
JP2005293680A (en) * | 2004-03-31 | 2005-10-20 | Nec Personal Products Co Ltd | Content cue position control method, content cue position control device and content cue position control program |
EP1748438A1 (en) * | 2005-07-27 | 2007-01-31 | Samsung Electronics Co., Ltd. | Video apparatus and control method |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3416395B1 (en) * | 2012-03-13 | 2023-05-10 | TiVo Solutions Inc. | Automatic commercial playback system |
EP3086321A1 (en) * | 2015-04-24 | 2016-10-26 | ARRIS Enterprises LLC | Designating partial recordings as personalized multimedia clips |
US10504557B2 (en) | 2015-04-24 | 2019-12-10 | Arris Enterprises Llc | Designating partial recordings as personalized multimedia clips |
Also Published As
Publication number | Publication date |
---|---|
US8891936B2 (en) | 2014-11-18 |
CN102884786A (en) | 2013-01-16 |
CN102884786B (en) | 2016-08-17 |
US20130011116A1 (en) | 2013-01-10 |
KR101656520B1 (en) | 2016-09-22 |
JP2013532405A (en) | 2013-08-15 |
KR20130086521A (en) | 2013-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8891936B2 (en) | Method and device for optimal playback positioning in digital content | |
US11503348B2 (en) | Smart TV detection of STB user-control actions related to STB- originated content presentation speed | |
JP6701137B2 (en) | Automatic commercial playback system | |
US6909837B1 (en) | Method and system for providing alternative, less-intrusive advertising that appears during fast forward playback of a recorded video program | |
JP4202316B2 (en) | Black field detection system and method | |
KR102010513B1 (en) | Method and apparatus for playing back recorded video | |
US8793733B2 (en) | Information processing apparatus, information processing method, and program for enabling computer to execute same method | |
JP2005524271A (en) | System and method for indexing commercials in video presentation | |
US9445144B2 (en) | Apparatus, systems and methods for quick speed presentation of media content | |
US20220150596A1 (en) | Apparatus, systems and methods for song play using a media device having a buffer | |
US11234060B2 (en) | Weave streaming content into a linear viewing experience | |
KR101324067B1 (en) | A search tool | |
EP2548370B1 (en) | Method and device for optimal playback positioning in digital content | |
JP5970090B2 (en) | Method and apparatus for optimal playback positioning in digital content | |
US8189986B2 (en) | Manual playback overshoot correction | |
US20140226956A1 (en) | Method and apparatus for changing the recording of digital content |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080066658.2 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10730826 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2013508561 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13634984 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2010730826 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 20127026867 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112012023885 Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 112012023885 Country of ref document: BR Kind code of ref document: A2 Effective date: 20120917 |