JP2015097426A

JP2015097426A - Method and device for optimal playback positioning in digital content

Info

Publication number: JP2015097426A
Application number: JP2015012066A
Authority: JP
Inventors: Timothy Alan Barrett
Original assignee: Thomson Licensing SAS
Current assignee: Thomson Licensing SAS
Priority date: 2015-01-26
Filing date: 2015-01-26
Publication date: 2015-05-21
Anticipated expiration: 2030-05-07
Also published as: JP5970090B2

Abstract

PROBLEM TO BE SOLVED: To provide a method and device for optimal playback positioning in video content. SOLUTION: A mechanism tags scenes or significant points in content in a prioritized way. This tagging, associated with the content, is then used to stop or start playback at appropriate points. This applies when a scene skip button is pressed to jump forward or back to another scene, or when a play button is pressed (308) after a fast-forward (FF) or rewind (Rew) instruction has been input (306).

Description

Reference to Related Provisional Application
This application claims priority from provisional application 61/314700, filed March 17, 2010, entitled "Content Tagging in DVR for Optimal Playback Positioning".

TECHNICAL FIELD OF THE INVENTION
The present disclosure relates generally to digital content systems and digital video recording systems, and more particularly to methods and apparatus for optimal playback positioning in digital video content.

When using a digital video recorder (DVR), one often wants to skip forward or backward within content such as a movie or TV show. At present, however, there is no mechanism for determining an appropriate start or end point of a scene, or an appropriate point at which to begin content playback. Many DVRs simply start playback at whatever point the user presses the play button. Some DVRs assume a fixed amount of delay, depending on how fast the fast-forward (FF) or rewind (Rew) was. To compensate, they automatically skip back by a certain amount to determine the playback start point. Even in the best implementations that exist today, content playback does not necessarily start at any kind of scene boundary. It merely brings the user close to where they are likely to want to be.

A method and apparatus for optimal playback positioning in digital video content are provided.

The present disclosure relates to a mechanism for tagging scenes or significant points in content in a prioritized manner. It defines a mechanism that uses this tagging, associated with the content, to stop or start playback at appropriate points. Examples include pressing a scene skip button to jump to another scene forward or backward, and pressing the play button after a fast-forward (FF) or rewind (Rew) command has been entered.

According to one aspect of the present disclosure, a method is provided for determining an optimal playback position in video content that includes a plurality of frames. The method includes, among other things, the following steps:
displaying the video content at a playback speed for viewing;
receiving a first navigation instruction to navigate the video content at a speed faster than the viewing playback speed;
receiving a second navigation instruction to resume playback of the video content at the viewing playback speed; and
in response to the second navigation instruction, determining a playback position in the video content based on at least one tagged frame of the video content.
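The method steps above can be modeled as a small player state machine. This is a minimal sketch under stated assumptions, not the disclosed implementation. Positions are frame indices, and negative speeds model rewind. The tag policy is deliberately the simplest one: snap back to the latest tagged frame at or before the stop point. The names `Player`, `navigate`, `advance`, and `resume` are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass

NORMAL_SPEED = 1.0  # playback speed for ordinary viewing


@dataclass
class Player:
    """Hypothetical player model; positions are frame indices."""
    tagged_frames: list   # sorted frame indices that carry tags
    position: int = 0
    speed: float = NORMAL_SPEED

    def navigate(self, speed: float) -> None:
        # First navigation instruction: move faster than viewing speed
        # (a negative value models rewind).
        if abs(speed) <= NORMAL_SPEED:
            raise ValueError("navigation speed must exceed normal speed")
        self.speed = speed

    def advance(self, ticks: int) -> None:
        # Simulate `ticks` frame periods of wall-clock time during trick play.
        self.position = max(0, self.position + int(self.speed * ticks))

    def resume(self) -> int:
        # Second navigation instruction: resume normal playback at a
        # position derived from the tagged frames.
        self.position = self._snap_to_tag(self.position)
        self.speed = NORMAL_SPEED
        return self.position

    def _snap_to_tag(self, pos: int) -> int:
        # Assumed policy: latest tagged frame at or before `pos`; if there is
        # none, stay where the user pressed play.
        i = bisect_right(self.tagged_frames, pos)
        return self.tagged_frames[i - 1] if i else pos
```

For example, starting at frame 0 and fast-forwarding at 8x for 40 frame periods reaches frame 320. With tags at 0, 100, 250 and 400, pressing play resumes at frame 250.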

According to another aspect of the present disclosure, an apparatus for playing video content that includes a plurality of frames is provided. The apparatus includes, among other things:
a video processor that provides the video content to a playback device at a playback speed for viewing;
a user interface that receives a first navigation instruction to navigate the video content at a speed faster than the viewing playback speed, and a second navigation instruction to resume playback at the viewing playback speed; and
a controller, coupled to the user interface, that receives the second navigation instruction, determines a playback position in the video content based on at least one tagged frame of the video content, and provides the determined playback position to the video processor.

These and other aspects, features and advantages of the present disclosure will be described in, or become apparent from, the following detailed description of the preferred embodiments. That description is to be read in conjunction with the accompanying drawings.

In the drawings, like reference numerals denote like elements throughout the views.
FIG. 1 is a block diagram of an exemplary system for delivering video content in accordance with the present disclosure.
FIG. 2 is a block diagram of an exemplary set top box / digital video recorder (DVR) in accordance with the present disclosure.
FIG. 3 is a flowchart of an exemplary method for playing content in an environment where the content is pre-tagged, in accordance with the present disclosure.
FIG. 4 is a flowchart of an exemplary method for playing content in an environment where the content is dynamically tagged, in accordance with the present disclosure.
FIG. 5 is a flowchart of an exemplary method for playing content and navigating it with a scene skip function, in accordance with the present disclosure.
FIG. 6 is a flowchart of an exemplary method for playing content and navigating it with a scene skip function, according to another embodiment of the present disclosure.
FIG. 7 illustrates a video playback timeline and how the various zones searched for tagged frames of the video content are determined, in accordance with the present disclosure.
It should be understood that the drawings are for purposes of illustrating the concepts of the disclosure. They are not necessarily the only possible configuration for illustrating the disclosure.

It should be understood that the elements shown in the drawings may be implemented in various forms of hardware, software, or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices. Such devices may include a processor, memory and input/output interfaces. Herein, the phrase "coupled" is defined to mean directly connected, or indirectly connected through one or more intermediate components. Such intermediate components may include both hardware-based and software-based components.

The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that embody these principles and fall within their spirit and scope, even if such arrangements are not explicitly described or shown herein.

All examples and conditional language recited herein are intended for pedagogical purposes. They are meant to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art. They are to be construed as being without limitation to such specifically recited examples and conditions.

Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Such equivalents are intended to include both currently known equivalents and equivalents developed in the future, that is, any elements developed that perform the same function, regardless of structure.

Thus, for example, those skilled in the art will appreciate that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, any flowcharts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes. These processes may be substantially represented in computer-readable media and so executed by a computer or processor, whether or not such a computer or processor is explicitly shown.

The functions of the various elements shown in the drawings may be provided through the use of dedicated hardware. They may also be provided through hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software. It may implicitly include, without limitation, digital signal processor ("DSP") hardware, read-only memory ("ROM") for storing software, random access memory ("RAM"), and non-volatile storage.

Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the drawings are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually. The particular technique is selectable by the implementer, as more specifically understood from the context.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function. This includes, for example, a) a combination of circuit elements that performs that function, or b) software in any form, including firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner the claims call for. Any means that can provide those functionalities are thus regarded as equivalent to those shown herein.

A method and apparatus for optimal playback positioning in digital video content are provided. The present disclosure relates to a mechanism for tagging scenes or significant points in content in a prioritized manner. It defines a mechanism that uses this tagging, associated with the content, to stop or start playback at appropriate points. Examples include pressing a scene skip button to jump to another scene forward or backward, and pressing play after a fast-forward (FF) or rewind (Rew) command has been entered.

Turning now to FIG. 1, a block diagram of an embodiment of a system 100 for delivering video content to a home or end user is shown. The content originates from a content source 102, such as a movie studio or production company. The content may be supplied in at least one of two forms. One form may be a broadcast form of the content. Broadcast content is provided to the broadcast affiliate manager 104, which is typically a national broadcast service such as ABC (American Broadcasting Company), NBC, or CBS. The broadcast affiliate manager may collect and store the content, and may schedule delivery of the content over a delivery network, shown as delivery network 1 (106). Delivery network 1 (106) may include satellite link transmission from a national center to one or more regional or local centers. Delivery network 1 (106) may also include local content delivery using local delivery systems such as over-the-air broadcast, satellite broadcast, or cable broadcast. The locally delivered content is provided to a user's set top box / digital video recorder (DVR) 108 in the user's home.

A second form of content is referred to as special content. Special content may include content delivered as premium viewing, pay-per-view, or other content not otherwise provided to the broadcast affiliate manager. In many cases, the special content may be content requested by the user. The special content may be delivered to a content manager 110. The content manager 110 may be a service provider, such as an Internet website, affiliated with, for instance, a content provider, broadcast service, or delivery network service. The content manager 110 may also incorporate Internet content into the delivery system. The content manager 110 may deliver the content to the user's set top box / digital video recorder 108 over a separate delivery network, delivery network 2 (112). Delivery network 2 (112) may include high-speed broadband Internet-type communication systems. It is important to note that the two networks are not exclusive. Content from the broadcast affiliate manager 104 may be delivered using all or part of delivery network 2 (112). Likewise, content from the content manager 110 may be delivered using all or part of delivery network 1 (106). In addition, the user may obtain content directly from the Internet via delivery network 2 (112) without the content necessarily being managed by the content manager 110.

The set top box / digital video recorder 108 may receive different types of content from one or both of delivery network 1 and delivery network 2. The set top box / digital video recorder 108 processes the content and provides a separation of the content based on user preferences and commands. It may also include a storage device, such as a hard drive or optical disk drive, for recording and playing back audio and video content. Further details of the operation of the set top box / digital video recorder 108, and of features associated with playing back stored content, are described below in relation to FIG. 2. The processed content is provided to a display device 114. The display device 114 may be a conventional 2D display or, alternatively, an advanced 3D display.

Turning now to FIG. 2, a block diagram of an embodiment of a set top box / digital video recorder 200 is shown. The device 200 shown may be incorporated into other systems, including the display device 114 itself. In either case, several components necessary for complete operation of the system are not shown in the interest of conciseness, as they are well known to those skilled in the art.

In the device 200 shown in FIG. 2, content is received at an input signal receiver 202. The input signal receiver 202 may be one of several known receiver circuits used for receiving, demodulating, and decoding signals. These signals may be provided over one of several possible networks, including over-the-air, cable, satellite, Ethernet, fiber, and telephone line networks. The desired input signal is selected and retrieved at the input signal receiver 202 based on user input provided through a control interface (not shown). The decoded output signal is provided to an input stream processor 204. The input stream processor 204 performs the final signal selection and processing, including separating the video content from the audio content of the content stream. The audio content is provided to an audio processor 206 for conversion from the received format, such as a compressed digital signal, to an analog waveform signal. The analog waveform signal is provided to an audio interface 208 and further to the display device 114 or to an audio amplifier (not shown). Alternatively, the audio interface 208 may provide a digital signal to an audio output device or display device using an alternative audio interface, such as an HDMI (High-Definition Multimedia Interface) cable or SPDIF (Sony/Philips Digital Interconnect Format). The audio processor 206 also performs any conversion necessary for storing the audio signals.

The video output from the input stream processor 204 is provided to a video processor 210. The video signal may be in one of several formats. The video processor 210 provides, as necessary, a conversion of the video content based on the input signal format. The video processor 210 also performs any conversion necessary for storing the video signals.

A storage device 212 stores the audio and video content received at the input. The storage device 212 allows later retrieval and playback of the content under the control of a controller 214. Retrieval is also based on commands received from a user interface 216, for example navigation instructions such as fast-forward (FF) and rewind (Rew). The storage device 212 may be a hard disk drive, or one or more large-capacity integrated electronic memories such as static random access memory or dynamic random access memory. It may also be an interchangeable optical disk storage system such as a compact disk drive or digital video disk drive.

The converted video signal from the video processor 210 is provided to a display interface 218, whether it originates from the input or from the storage device 212. The display interface 218 in turn provides a display signal to a display device of the type described above. The display interface 218 may be an analog signal interface such as red-green-blue (RGB), or a digital interface such as High-Definition Multimedia Interface (HDMI).

The controller 214 is interconnected via a bus to several of the components of the device 200, including the input stream processor 204, audio processor 206, video processor 210, storage device 212, and user interface 216. The controller 214 manages the conversion process that converts the input stream signal into a signal for storage on the storage device or for display. The controller 214 also manages the retrieval and playback of stored content. The controller 214 is further coupled to a control memory 220 for storing information and instruction code for the controller 214. The control memory 220 may be volatile or non-volatile memory, including random access memory, static RAM, dynamic RAM, read-only memory, programmable ROM, flash memory, EPROM, EEPROM, and the like. Further, the memory may be implemented in several possible embodiments. These include a single memory device, or more than one memory circuit connected together to form a shared or common memory. Still further, the memory may be included with other circuitry, such as portions of bus communication circuitry, in a larger circuit.

A method for controlling the fast-forward (FF) and rewind (Rew) functions in a video recording device is described below. The algorithm or function may be physically implemented in hardware, such as discrete circuitry associated with the video processor 210. It may also be implemented in software, such as software residing in the control memory 220 and read and executed by the controller 214. The method involves analyzing the content to recognize and tag important points in it that may represent the start of a scene or other important reference points. Under certain circumstances, the device 200 can then automatically determine the correct position to jump to based on a number of criteria. The analysis may be performed prior to broadcast, upon ingest into the device, or at playback time. The preferred embodiment, however, performs it upon ingest into the device or when the content is written to disk.

One practical example of the present disclosure makes it simple for a user to start at the correct point when pressing play after fast-forwarding through a commercial (or advertising) break. It likewise makes it simple to rewind to the end of the previous commercial break. In this case, the correct start point or playback position is determined by examining the speed of the FF or Rew. When the play button is pressed, the controller 214 examines the "tagged" locations that were recently passed. It determines whether any scene tags were recently passed and what their priority is. In effect, it determines proximity to previously or dynamically recognized scene transition points that represent valid points at which to begin playback. A "Black Reference Frame" can represent a significant marker, because black reference frames are typically used at the beginning and end of commercial breaks. If a black reference frame was recently passed during FF or Rew, it is used as the start point. Reference frames that fall outside the regular interval can also be tagged, as less significant trigger points, since they may represent the beginning of a scene.
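The priority rule above can be sketched as a selection function. This is a minimal sketch, not the controller 214's actual logic. The two-level `TagPriority` ordering and the tie-break (most recently passed tag wins among equal priorities) are assumptions. The names `Tag` and `choose_start` are hypothetical. The `window` argument stands in for the speed-dependent search region discussed next.

```python
from dataclasses import dataclass
from enum import IntEnum


class TagPriority(IntEnum):
    # Higher value = more significant playback start point (assumed ordering).
    IRREGULAR_IFRAME = 1   # I-frame outside the regular GOP spacing
    BLACK_REFERENCE = 2    # black frame, typical at commercial-break edges


@dataclass(frozen=True)
class Tag:
    frame: int
    priority: TagPriority


def choose_start(tags, stop_frame, window, forward=True):
    """Pick a playback start among tags passed within `window` frames.

    Hypothetical policy: prefer the highest-priority tag, and break ties
    by taking the one passed most recently (closest to stop_frame).
    """
    if forward:   # fast-forward: passed tags lie before the stop point
        passed = [t for t in tags if stop_frame - window <= t.frame <= stop_frame]
    else:         # rewind: passed tags lie after the stop point
        passed = [t for t in tags if stop_frame <= t.frame <= stop_frame + window]
    if not passed:
        return stop_frame  # nothing nearby: start where the user pressed play
    best = max(passed, key=lambda t: (t.priority, -abs(stop_frame - t.frame)))
    return best.frame
```

Ranking priority ahead of recency means a black reference frame passed a little earlier beats a closer irregular I-frame. This mirrors the text's rule that a recently passed black reference frame is used as the start point.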

To determine the region of content to search for tags, the speed of the FF/Rew function must be considered together with the user's reaction time. At high FF/Rew speeds, the user may already have passed several reference points by the time they see where they want to start. Playback then needs to begin at the appropriate one of those points. At slower speeds, the last reference point passed is more likely to be the appropriate start point.
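One way to turn this into a search region is to count the frames that scroll past during the user's reaction time. Under that assumption the window grows linearly with trick-play speed, $W = \lvert s \rvert \cdot f \cdot t_r$, where $s$ is the FF/Rew multiple, $f$ the frame rate, and $t_r$ the reaction time. The sketch below is illustrative only. The default frame rate, the 0.5 s reaction time, and the minimum window are assumed values, not figures from the disclosure.

```python
def search_window(speed: float, fps: float = 30.0, reaction_s: float = 0.5,
                  min_frames: int = 15) -> int:
    """Frames of content that pass by during the user's reaction time.

    speed:      trick-play multiple (e.g. 4 for 4x FF, -4 for 4x rewind)
    fps:        frame rate of the content (assumed)
    reaction_s: assumed user reaction time in seconds (illustrative value)
    min_frames: floor so that slow speeds still look back a little
    """
    frames = abs(speed) * fps * reaction_s
    return max(min_frames, round(frames))
```

With these assumptions, 2x FF searches back 30 frames and 32x FF searches back 480 frames (16 s of content). This matches the text's point that fast trick play can carry the user past several reference points, while slow trick play usually passes at most the last one.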

The methods and apparatus of the present disclosure are based on associating tags with the content. As a result, the information on which decisions are based is available when the content is played back. This tag information can be obtained in one of three main modes of operation.
First, the content can be pre-analyzed at the headend of the broadcast affiliate manager 104 or the content manager 110, and the metadata broadcast along with it. This can be done by including the tagging data as part of the SI data in the transport stream. The tagging data is then sent along with the content, and no work is required of the DVR or device 200.
Second, the content can be analyzed and tagged as it flows into the device 200 or as it is written to disk.
Third, the content can be analyzed dynamically during playback and/or during trick-mode operation, with reference points generated on the fly. For example, as the user fast-forwards or rewinds, the device performs some frame analysis as the content passes in either direction.
Each tagging mode is described further below.
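Because tags may come from any of the three modes, a player benefits from a single tag index that does not care where a tag originated. The sketch below shows one hypothetical way to hold that index. `TagSource`, `TagStore`, and the rule that a more significant tag overrides a weaker tag on the same frame are all assumptions for illustration.

```python
from bisect import bisect_left, bisect_right, insort
from dataclasses import dataclass, field
from enum import Enum


class TagSource(Enum):
    HEADEND = "headend"   # pre-analyzed, carried with the stream (e.g. SI data)
    INGEST = "ingest"     # analyzed by the device while recording
    DYNAMIC = "dynamic"   # generated on the fly during trick play


@dataclass
class TagStore:
    """Hypothetical per-recording tag index, independent of tag origin."""
    _tags: dict = field(default_factory=dict)     # frame -> (priority, source)
    _frames: list = field(default_factory=list)   # sorted frame indices

    def add(self, frame: int, priority: int, source: TagSource) -> None:
        if frame in self._tags:
            # Two modes tagged the same frame: keep the more significant tag.
            if priority > self._tags[frame][0]:
                self._tags[frame] = (priority, source)
            return
        self._tags[frame] = (priority, source)
        insort(self._frames, frame)

    def between(self, lo: int, hi: int):
        """Tags with lo <= frame <= hi, as (frame, priority, source) in frame order."""
        i, j = bisect_left(self._frames, lo), bisect_right(self._frames, hi)
        return [(f, *self._tags[f]) for f in self._frames[i:j]]
```

Keeping the frame list sorted lets `between` answer a search-window query with two binary searches, whichever mode produced the tags.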

In the first mode of tagging frames of video content, tagging is performed at the headend before the content is transmitted over the delivery network. Broadcasters are unlikely to support content tagging because of the potential loss of revenue, particularly since it makes commercials easier to skip. However, building this function into the encoder itself presents other opportunities, because the ability to detect scenes has other implications. If scene tagging were present in the stream itself, several possibilities would arise. For example, priority commercials could be tagged to indicate that they cannot be skipped. In a typical embodiment, the headend may not be important. The device 200 is likely to have a digital terrestrial tuner and, like any other DVR, is fed content that it processes on the fly. In an alternative embodiment, however, the headend may be used to receive streamed, pre-prepared content. In that case, it may be advantageous to have some advanced scene detection within the film using a similar solution. For example, a broadcaster may wish to deliver content with a very long GOP (group of pictures) and a large maximum I-frame interval. Having the tagging already done at the headend may then be valuable, and may make playback and searching through the content easier.

In the second mode of tagging frames of video content, tagging is performed by the video processor 210 during ingestion into the set-top box 200, that is, where the content is received and/or written to a disk, hard drive, or other memory device. The point at which content is being ingested into the device and/or processed and written to disk is likely to be the best point at which to analyze the content and add tags. The level of processing depends on the requirements. It may be as simple as tagging irregularly spaced I-frames and "black" I-frames, or it may include more sophisticated scene detection. Two considerations are how much additional disk space can be used and how much additional information should be stored. In some embodiments, when scenes are detected, thumbnails of the frames that start each scene may also be captured to allow graphical browsing of the content.
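The simplest ingest-time processing can be illustrated with a sketch that tags "black" I-frames from their encoded size alone. It relies on the point, made later in this description, that blank I-frames carry very little data. The `FrameInfo` record and the 20%-of-median threshold are assumptions. Irregular I-frame detection and thumbnail capture are not included.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class FrameInfo:
    """Hypothetical per-frame record produced while parsing the incoming stream."""
    time_s: float     # presentation time in seconds
    frame_type: str   # "I", "P" or "B"
    size_bytes: int   # encoded size of the frame


def tag_black_i_frames(frames, ratio=0.2):
    """Return the times of I-frames whose encoded size is below `ratio` times
    the median I-frame size; a near-blank picture compresses to very little."""
    i_sizes = [f.size_bytes for f in frames if f.frame_type == "I"]
    if not i_sizes:
        return []
    limit = ratio * median(i_sizes)
    return [f.time_s for f in frames
            if f.frame_type == "I" and f.size_bytes < limit]
```

In a real recorder, the returned times would be written to the tag file alongside the recording. The ratio would need tuning per channel and encoder.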

A third mode of tagging frames involves tagging the content in real time. If the content has not been tagged in advance, the video processor 210 can perform scene analysis, and it can do so on the fly during a fast-forward or rewind event. When the user fast-forwards or rewinds, the video processor 210 essentially tags on the fly and keeps track of where suitable scene points are. When the user presses play, the algorithm or function described below is applied to jump to the appropriate tag position.

In all cases, content tagging is implemented as an automated solution that is completely invisible to the user. There can, however, be considerable variation in how much information is tagged, what is used to determine the tags, and how the tags are used. In some embodiments, the tags may consist of a very small amount of data that defines the key transition points in the file. For example, for a two-hour program with six commercial breaks, the beginning and end of each break can be defined by analyzing scene changes that involve black reference frames.
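The text does not say how break boundaries are derived from black frames. One simple heuristic, assumed here purely for illustration, is to group black-frame times that lie within a maximum gap of each other (60 s below). Each group of at least two black frames is then reported as a break that runs from its first to its last black frame. The function name and both parameters are hypothetical.

```python
def commercial_breaks(black_times, max_gap_s=60.0, min_blacks=2):
    """Group ascending black-frame times (seconds) into (start, end) breaks.

    Consecutive black frames no more than `max_gap_s` apart belong to the
    same break; isolated black frames (e.g. a fade inside the program) are
    ignored unless at least `min_blacks` of them cluster together."""
    breaks = []
    run = []
    for t in black_times:
        if run and t - run[-1] > max_gap_s:
            if len(run) >= min_blacks:
                breaks.append((run[0], run[-1]))
            run = []
        run.append(t)
    if len(run) >= min_blacks:
        breaks.append((run[0], run[-1]))
    return breaks
```

For black frames at 600, 630, 660, 690, 1800, 1815, 1845 and 3000 s, this yields two breaks, (600, 690) and (1800, 1845). The lone black frame at 3000 s is ignored.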

The process of detecting tag points in video content is now described. When video is compressed, I-frames are typically inserted every 0.5 or 1 second, along with a few sporadic I-frames that represent scene changes. Because most I-frames are spaced at regular intervals, one difficulty is that a scene may change exactly at a regularly spaced I-frame, which makes it hard to identify as a new scene. Calculating the actual maximum I-frame interval of the content is relatively easy: examining a short history reveals an I-frame at least every N frames. For example, if the maximum GOP size of the content is 0.5 seconds, there will be at least 100 I-frames in every 50 seconds. Because of the additional I-frames at scene changes, however, there may be, for example, 110 I-frames per 50-second period. From this it can be inferred that the regular interval is approximately 0.5 seconds and that the additional I-frames represent scene changes.
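The interval estimate can be sketched with I-frame positions expressed as frame numbers. The sketch assumes the encoder restarts its GOP at each scene-change I-frame. Under that assumption, the largest gap in a short history is taken as the maximum I-frame interval N, and any I-frame that arrives sooner than N after its predecessor is treated as a scene change. The function name is hypothetical. In the counting example above, 50 s at one I-frame per 0.5 s gives at least 50 / 0.5 = 100 I-frames, so a count of 110 points to roughly 10 scene-change I-frames.

```python
def classify_i_frames(i_frame_positions):
    """Return (max_interval, scene_change_positions) for ascending I-frame
    positions given as frame numbers.

    Assumes the GOP restarts at every scene-change I-frame, so a regular
    I-frame follows its predecessor by the full maximum interval while a
    scene-change I-frame arrives early."""
    gaps = [b - a for a, b in zip(i_frame_positions, i_frame_positions[1:])]
    if not gaps:
        return 0, []
    max_interval = max(gaps)  # estimate of N from the short history
    scene_changes = [pos for pos, gap in zip(i_frame_positions[1:], gaps)
                     if gap < max_interval]
    return max_interval, scene_changes
```

With I-frames at frames 0, 12, 24, 30, 42, 54, 66, 70 and 82, the estimated interval is 12, and frames 30 and 70 are flagged as scene changes. An encoder that does not reset its GOP at cuts would need a grid-based test instead.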

The actual methodology for detecting suitable frames to tag is relatively well known to those skilled in the art. In one known approach, moving video content data is generally captured, stored, transmitted, processed, and output as a series of still images. When the output is presented to a viewer at sufficiently short time intervals, small frame-to-frame changes in the data content are perceived as motion. A large change in data content between two adjacent frames is perceived as a scene change. Examples include a change from an indoor to an outdoor scene, a change of camera angle, or a sudden change in illumination within the image.

The encoding and compression process exploits small frame-to-frame changes in video content data to reduce the amount of data needed to store, transmit, and process the video content. This works because the data needed to describe a change is less than the data needed to describe the original still image. For example, under the standards developed by the Moving Picture Experts Group (MPEG), a group of frames begins with an intra-coded frame (I-frame). In an I-frame, the encoded video content data corresponds to the visual attributes (e.g., luminance and chrominance) of the original still image. Subsequent frames in the group, such as predictively coded frames (P-frames) and bidirectionally coded frames (B-frames), are encoded based on changes from earlier frames in the group. New groups of frames, and therefore new I-frames, are started at regular time intervals, for example to prevent noise from introducing erroneous changes in the video content data. New groups of frames, and therefore new I-frames, are also started at scene changes, where the change in video content data is large. At such points, describing a new still image requires less data than describing a large change between adjacent still images. In other words, two images from different scenes have little correlation, so compressing the new picture as an I-frame is more efficient than predicting one picture from the other. It is therefore important to identify scene changes between adjacent frames of video content data during encoding.

The method and apparatus of the present disclosure may detect scene changes using the sum of absolute histogram differences (SAHD) and the sum of absolute display frame differences (SADFD). Such methods use temporal information within the same scene to smooth out variations and detect scene changes accurately. They can be used both in real-time applications (e.g., real-time video compression) and in non-real-time applications (e.g., film post-production).
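The text names SAHD and SADFD without giving formulas, so the following is only one plausible reading. Each frame is a flat list of 8-bit luma values, and both sums are normalized to the range 0 to 1. A cut is declared when both metrics exceed three times their running mean within the current scene, subject to a floor of 0.3. The bin count, factor and floor are assumptions chosen for the example.

```python
def histogram(frame, bins=16):
    """Luma histogram of one frame given as a flat list of 0-255 values."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // 256] += 1
    return counts


def sahd(prev, cur, bins=16):
    """Sum of absolute histogram differences, normalized to [0, 1]."""
    diff = sum(abs(a - b) for a, b in zip(histogram(prev, bins), histogram(cur, bins)))
    return diff / (2 * len(cur))


def sadfd(prev, cur):
    """Sum of absolute display-frame (pixel) differences, normalized to [0, 1]."""
    return sum(abs(a - b) for a, b in zip(prev, cur)) / (255 * len(cur))


def detect_scene_changes(frames, factor=3.0, floor=0.3):
    """Return indices of frames that start a new scene.

    A cut needs both metrics above `factor` times their mean within the
    current scene (and above `floor`); these per-scene means provide the
    temporal smoothing and are reset at every detected cut."""
    changes = []
    h_hist, d_hist = [], []
    for i in range(1, len(frames)):
        h = sahd(frames[i - 1], frames[i])
        d = sadfd(frames[i - 1], frames[i])
        h_thr = max(factor * (sum(h_hist) / len(h_hist) if h_hist else 0.0), floor)
        d_thr = max(factor * (sum(d_hist) / len(d_hist) if d_hist else 0.0), floor)
        if h > h_thr and d > d_thr:
            changes.append(i)
            h_hist, d_hist = [], []  # start statistics for the new scene
        else:
            h_hist.append(h)
            d_hist.append(d)
    return changes
```

Requiring both metrics to agree is a design choice. A histogram change alone can fire on a flash, and a pixel change alone can fire on fast motion.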

In another embodiment of the present disclosure, there are several levels of tags; that is, tags are assigned weights or priorities. In this embodiment, the search zone within the content plays a larger role. The levels may be, for example, as follows:
1) Black reference frames (highest priority)
2) Irregular reference frames (secondary priority, but representing scene changes)
3) Other (optional).

Typically, when stored content is played, playback begins at a reference frame. Tagging, however, allows a better estimate of the frame from which the user most likely wants to start. If a priority 1 frame is found in the primary or secondary search zone, playback begins there. If a priority 1 frame is found in the primary zone, no further search is performed. If there is no priority 1 tagged frame in either the primary or the secondary zone, the priority 2 tag closest to the center is selected as the start position. There may also be "other" tags to consider, such as a tertiary priority handled in the same way as priority 2 tags. If there are no such tags at all, the reference frame closest to the center of the primary search zone is selected as the start position.
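The selection order described above can be expressed as a small function. This is a sketch under stated assumptions. Tags are hypothetical (time_s, priority) pairs, with 1 = black reference frame, 2 = irregular reference frame and 3 = other. Zones are given as (start, end) in seconds. Priority 2 and 3 tags are searched in the secondary zone, a point the text leaves open.

```python
def choose_start(tags, center, primary, secondary, reference_frames):
    """Pick a playback start time following the priority order in the text.

    tags: list of (time_s, priority); 1 is the highest priority.
    primary, secondary: (start, end) windows in seconds, secondary contains primary.
    reference_frames: times of ordinary reference frames (at least one assumed)."""
    def inside(t, zone):
        return zone[0] <= t <= zone[1]

    def nearest(times):
        return min(times, key=lambda t: abs(t - center))

    p1_primary = [t for t, p in tags if p == 1 and inside(t, primary)]
    if p1_primary:
        return nearest(p1_primary)        # no further search is performed
    p1_secondary = [t for t, p in tags if p == 1 and inside(t, secondary)]
    if p1_secondary:
        return nearest(p1_secondary)
    for prio in (2, 3):                   # secondary, then tertiary ("other") tags
        candidates = [t for t, p in tags if p == prio and inside(t, secondary)]
        if candidates:
            return nearest(candidates)
    return nearest(reference_frames)      # fallback: closest reference frame
```

For example, with a center of 2470 s, a primary zone of 2410 to 2530 s and a secondary zone of 2350 to 2590 s, a black-frame tag at 2480 s is chosen immediately. If only priority 2 tags at 2450 s and 2500 s exist, the one at 2450 s wins because it is closer to the center.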

The process of playing video content using tags or tagged frames is now described. In one embodiment, for playback of pre-tagged content, assume that there is a tagged content file on the disk or storage device 212, or a separate file associated with the content file that contains the tagging information. The tagging information generally indicates scene points within the video content file. In particular, it carries tags weighted according to how important those markers are as reference points. There are several possible tag types, such as defined "look-up points", regularly spaced I-frames (reference frames), off-interval I-frames (representing a new scene), and blank I-frames. Blank (black) I-frames contain very little data and therefore have a very low data rate. They are generally inserted, for example, between commercials within a commercial break, at the transition from a commercial to the start of a scene, or between scenes.

The flowchart shown in FIG. 3 represents the process of playing content in an environment where the content has been tagged in advance. The tagging is done either before the content is broadcast, or when it is ingested into the DVR device 200 or written to disk. When the information is read from a disk such as a hard drive (step 302), normal playback is performed at viewing speed (step 304). During normal playback, the user may enter a navigation command through the user interface 216, for example a command to fast-forward or rewind the content (step 306). It should be understood that navigation commands such as fast-forward (FF), rewind (Rew), and scene skip let the user move through the video content faster than normal viewing speed.

When the user enters fast-forward or rewind, no additional processing is performed until the user presses play again, i.e., until the subsequent navigation command. Once the user presses play after fast-forwarding or rewinding (step 308), the controller 214 examines the tagging information to determine which tags occur within an appropriate range of the position at which the user pressed play (step 310). The controller 214 then decides where to jump to start playback, based on the tag weights and the FF/Rew speed (step 312). Once the playback position is determined, the video processor 210 seeks the playhead to that point and starts video playback from the selected tagged frame (step 314).
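Step 310 is essentially a range query over the tagging information. One way to sketch it keeps the tags sorted and uses Python's `bisect` module. The sketch assumes tags are loaded as (time_s, weight) pairs, with a higher weight meaning a better start point. `TagIndex`, `on_play`, the default 2 s reaction time and the 50% zone ratio are illustrative assumptions. A signed speed lets the same code handle rewind, where the search center lies after the play position.

```python
import bisect


class TagIndex:
    """Tags from a tagging file, kept sorted by time for range queries."""

    def __init__(self, tags):
        # tags: iterable of (time_s, weight); higher weight = better start point
        self._tags = sorted(tags)
        self._times = [t for t, _ in self._tags]

    def in_range(self, start, end):
        """All tags with start <= time <= end (cf. step 310)."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return self._tags[lo:hi]


def on_play(index, play_pos, speed, reaction_s=2.0, zone_ratio=0.5):
    """Choose where playback resumes after PLAY (cf. step 312).

    speed is signed: +30 for 30x fast-forward, -30 for 30x rewind."""
    overshoot = reaction_s * speed   # content that passed while the user reacted
    center = play_pos - overshoot
    half = abs(overshoot) * zone_ratio
    hits = index.in_range(center - half, center + half)
    if not hits:
        return play_pos              # no tag nearby: resume where PLAY was pressed
    # highest weight first, then the tag closest to the center
    best = max(hits, key=lambda tag: (tag[1], -abs(tag[0] - center)))
    return best[0]
```

A binary search keeps step 310 cheap even for a long recording with many tags, which matters because the lookup runs while the user is waiting for playback to resume.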

In an alternative embodiment shown in FIG. 4, the playback process itself can be used to tag the content dynamically. As above, the content is first read from the disk (step 402) and played normally (step 404). When the user performs FF/Rew, i.e., enters a navigation command (step 406), the video processor 210 applies dynamic, or "on-the-fly", frame tagging (step 408). That is, the device detects the blank scenes, reference frames, and so on that pass during the FF/Rew process. These tags may or may not be stored with the content for later use.
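The on-the-fly tagging of step 408 can be sketched minimally, assuming the processor can read the decoded luma of each frame shown during trick play. The class name and the weights (higher = better start point) are hypothetical. Both thresholds are also assumptions rather than values from the text: a mean luma below 16 is treated as blank, and a mean-luma jump above 60 is treated as a cut.

```python
from statistics import fmean


class DynamicTagger:
    """Tags frames on the fly as they pass during FF/Rew (cf. step 408)."""

    BLACK, SCENE_CHANGE = 3, 2   # hypothetical weights: higher is a better start point

    def __init__(self, black_luma=16.0, jump=60.0):
        self.black_luma = black_luma   # mean luma at or below which a frame counts as blank
        self.jump = jump               # mean-luma change treated as a scene cut
        self.tags = []                 # (time_s, weight); may be saved for later use
        self._last_mean = None

    def observe(self, time_s, luma):
        """Inspect one displayed frame given as a flat list of 0-255 luma values."""
        mean = fmean(luma)
        if mean < self.black_luma:
            self.tags.append((time_s, self.BLACK))
        elif self._last_mean is not None and abs(mean - self._last_mean) > self.jump:
            self.tags.append((time_s, self.SCENE_CHANGE))
        self._last_mean = mean
```

During rewind, frames arrive in decreasing time order, so the list is built backwards. Sorting it before the play-position search handles both directions.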

Once the user presses play after fast-forwarding or rewinding (step 410), device 200 proceeds as described above. The controller 214 decides where to jump to start playback, based on the tag weights and the FF/Rew speed (step 412). Once the playback position is determined, the video processor 210 seeks the playhead to that point and starts video playback from the selected tagged frame (step 414).

Beyond supporting fast-forward and rewind through the content, tagging can also provide a better or different experience. With a single button press, the user can skip "from scene to scene", or skip a larger amount of content (with a predefined base time period). Playback still begins at a scene boundary defined in the tags. This process is shown in FIG. 5.

Referring to FIG. 5, the video is read from the disk (step 502), and normal playback is performed at viewing speed (step 504). When the user requests the "scene skip" function at step 506, i.e., enters a navigation command, the controller 214 sets a "scene search" position according to a predefined "scene definition" setting (step 508). That is, it jumps forward or backward by a fixed amount of time to begin the scene search. Next, at step 510, the controller 214 examines the tag information for tagged frames in the vicinity of the "scene search" start point. The controller 214 then decides where to jump to start playback, based on the tag weights in the selected region (step 512). Once the playback position is determined, the video processor 210 seeks the playhead to that point and starts video playback from the selected tagged frame (step 514).
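The scene-skip flow of FIG. 5 (steps 508 to 512) might look like the sketch below. The skip length and search window stand in for the predefined "scene definition" setting and are assumed values. Tags are hypothetical (time_s, weight) pairs, with a higher weight meaning a better start point. Falling back to the fixed jump position when no tag is nearby is an assumption; the text does not specify this case.

```python
def scene_skip(tags, current_s, direction, skip_s=30.0, window_s=15.0):
    """Return the start point for a scene-skip request.

    tags: list of (time_s, weight); higher weight = better start point.
    direction: +1 to skip forward, -1 to skip back.
    skip_s and window_s are hypothetical 'scene definition' settings."""
    search_pos = current_s + direction * skip_s           # cf. step 508
    nearby = [(t, w) for t, w in tags
              if abs(t - search_pos) <= window_s]         # cf. step 510
    if not nearby:
        return search_pos                                 # assumed fallback
    best = max(nearby, key=lambda tag: (tag[1], -abs(tag[0] - search_pos)))  # cf. step 512
    return best[0]
```

For example, from 100 s a forward skip searches around 130 s and would land on a weight-3 tag at 128 s rather than a weight-2 tag at 140 s.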

Device 200 can perform scene skips not only on tagged content but also, dynamically, on content that has not been tagged in advance, as shown in FIG. 6. As above, the video is read from the disk (step 602), and normal playback is performed at viewing speed (step 604). When the user requests the "scene skip" function at step 606, the controller 214 sets the "scene search" position according to the predefined "scene definition" setting (step 608). That is, it jumps forward or backward by a fixed amount of time to begin the scene search. Next, in step 510, the controller 214 examines the tag information for tagged frames in the vicinity of the "scene search" start point. The video processor 210 applies dynamic, or "on-the-fly", frame tagging (step 610). That is, the video processor 210 detects the blank scenes, reference frames, and so on that pass during the scene-skip process, and tags these detected frames or reference points. These tags may or may not be stored with the content for later use. The controller 214 then decides where to jump to start playback, based on the tag weights in the selected region (step 612). Once the playback position is determined, the video processor 210 seeks the playhead to that point and starts video playback from the selected tagged frame (step 614).

This section describes how the appropriate playback position is determined after the user presses play. To find the position at which playback should begin, the controller 214 sets a starting point based on one of several factors. It then specifies a period, or zone, to search in either direction from that reference point. The controller 214 searches to find which tags fall within that range, and applies an algorithm or function to determine the most appropriate starting point for playback.

The starting point is likely to be some form of reference frame, but it is also possible to start from an alternative, predefined timestamp, which may be something other than a reference frame. Indeed, as part of the tagging mechanism, the starting point could just as easily be a frame other than an I-frame, for example a B-frame that can readily be constructed from the last four frames. If playback is to start there, the tag can include data (or a reference to data) that lets the device go back a few frames and obtain all the video data needed to construct and handle this non-reference frame. In this case, the tag would contain the offset information needed to obtain that data quickly and easily, so the device would not have to compute it from scratch on the fly.

In another embodiment, video compression may produce a very long GOP of, for example, 10 seconds. For such video, the present disclosure provides a mechanism for obtaining reference frames from elsewhere, so that the apparatus and method can still support fast-forward and rewind. The video is augmented with external data by dynamically retrieving additional frames from the Internet or from some other medium and/or source. In this example, the stream has a minimal number of reference frames. A separate source supplies the remaining I-frames and the intervening data needed to build complete frames.

During trick-mode playback, a DVR typically uses an algorithm or function to jump from I-frame to I-frame, or to determine which reference frames should be displayed. The present disclosure extends this basic idea. Rather than referring only to I-frames, there are multiple possible points at which the DVR can stop, and these points are nominally defined as scenes. The tags define the possible points at which playback can start. An algorithm or function is then applied to determine the time interval within the content in which to search for these tags, and which tag represents the best starting point within that content.

In this implementation, any playback-position search is bounded by two positions in the content file. The first is where the user started fast-forwarding or rewinding, i.e., entered the first navigation command. The second is where the user pressed play, i.e., entered the second navigation command. No search is performed outside these bounds. To determine where the tag search begins, the controller 214 calculates both the "search position" (at the center of the search area) and the size of the area, or zone, in which to search for tags, as shown in FIG. 7.

When the user presses the play button during FF or Rew, the search start position within the file is defined by two criteria: 1) the speed at which the user is performing FF/Rew, and 2) the nominal reaction time assigned to the user. The user's reaction time may initially be set to 2 to 5 seconds. It can then be adjusted toward the user's actual likely reaction time, based on user input and/or the experience of device 200. This is described in detail below.

For example, suppose the user fast-forwards at 30 times normal speed and presses play 43 minutes 10 seconds (43:10) into the file. Suppose also that the user has been assigned a reaction time of 4 seconds. The center position 702 for the search is then 4 × 30 seconds (i.e., 2 minutes) before the position at which the user pressed play, i.e., 41:10. The search for tagged frames therefore starts from this position, and the primary search zone 704 extends on each side of the center point 702 by a fixed percentage of this distance. If that percentage is 50%, the tag search zone is 1 minute on each side of the center point, i.e., from 40:10 to 42:10 in the file. If any priority-tagged frame is found within this range, a hit is registered, and video playback begins at the tagged frame with the highest priority. If two or more matches have the same tag priority weight, playback begins at the one closest to the center position 702. When a match is made, the user's reaction time may be measured and possibly used to adjust the expected response time for future searches.
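The arithmetic of this example can be written out using symbols introduced here for illustration. Let p be the position at which play was pressed, s the fast-forward speed as a multiple of real time, r the assigned reaction time, and α = 0.5 the primary-zone percentage:

```latex
\begin{aligned}
c   &= p - r\,s = 2590\ \text{s} - (4\ \text{s})(30) = 2470\ \text{s} \quad (41{:}10)\\
Z_1 &= [\,c - \alpha r s,\; c + \alpha r s\,] = [\,2470 - 60,\; 2470 + 60\,]\ \text{s} = [\,40{:}10,\; 42{:}10\,]
\end{aligned}
```

For rewind, the sign of the r s term flips, so the center lies after the play position.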

If no match is found, the secondary zone 706 is also searched. This zone may, for example, extend by 100% of the distance from the position at which the user pressed play to the center point 702. Finding a key tag in this search may indicate that the user's reaction was unusual. If a key frame exists in this area, it can still be selected as the start position.

The final, learning search zone 708 extends from the center point 702 to the position at which the user pressed play, and back from the center point by 200%. It is searched only if no key frame was found in either of the first two zones. If a key tagged frame is found here, the delay can be recorded. If this behavior is consistent, the user's reaction time may be adjusted so that key frames fall within the primary zone more often. Note that the percentages of distance from the center point are merely exemplary and are better determined through user profiling. Further, whatever the percentages, the search is performed within the bounds of the search end points described above.
In one embodiment, if there are no tagged frames in the first search area (704) and the second search area (706), the following steps may be performed:
selecting a third search area (708) larger than the second search area (706);
adjusting the reaction time assigned to the user upon identifying the at least one tagged frame in the third search area (708); and
increasing the length of the first search area.
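The zone cascade of FIG. 7 can be sketched as below. This illustration rests on several assumptions:
- The 50%, 100% and 200% widths are the example percentages from the text.
- The learning zone is read as running from the play position back to 200% of the search distance beyond the center, mirrored for rewind.
- All zones are clipped to the span between the FF/Rew start and the play press.
- The function names and the (time_s, weight) tag tuples are hypothetical, with a higher weight meaning a better start point.

Lengthening the first search area is not modeled.

```python
def search_zones(ff_start, play_pos, speed, reaction_s):
    """Return (center, [primary, secondary, learning]) as (lo, hi) seconds."""
    d = reaction_s * speed                      # signed content distance while reacting
    center = play_pos - d                       # search position 702
    span = abs(d)
    zones = [
        (center - 0.5 * span, center + 0.5 * span),   # primary zone 704 (50%)
        (center - span, center + span),               # secondary zone 706 (100%)
        tuple(sorted((play_pos, center - 2 * d))),    # learning zone 708 (assumed reading)
    ]
    lo_bound, hi_bound = sorted((ff_start, play_pos))  # never search outside these
    return center, [(max(lo, lo_bound), min(hi, hi_bound)) for lo, hi in zones]


def find_start(tags, ff_start, play_pos, speed, reaction_s):
    """Return (start_time, zone_number), or (play_pos, None) if no tag is found."""
    center, zones = search_zones(ff_start, play_pos, speed, reaction_s)
    for number, (lo, hi) in enumerate(zones, start=1):
        hits = [(t, w) for t, w in tags if lo <= t <= hi]
        if hits:
            best = max(hits, key=lambda tag: (tag[1], -abs(tag[0] - center)))
            return best[0], number
    return play_pos, None


def implied_reaction_time(play_pos, tag_time, speed):
    """Reaction time that would have centered the search on tag_time."""
    return abs(play_pos - tag_time) / abs(speed)
```

When `find_start` reports zone 3, `implied_reaction_time` gives the delay that could be recorded. A profile would normally blend this value in only if it recurs, rather than adopting it outright.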

To determine the user's reaction time, device 200 uses both automated and manual mechanisms. These may include user preferences that let users define and/or test their own reaction time. A typical reaction time is, for example, 2 seconds. When the user is fast-forwarding through content, some time therefore elapses between the moment the user sees the point at which they want playback to start and the moment they press the play button. With a reaction time of 2 seconds and a fast-forward speed of 30 times normal, 1 minute of video passes between whatever prompted the user to press play and the actual press. If the FF rate is only twice normal speed, for example, only 4 seconds of video pass in that time. Reaction times vary greatly between users: a slow reaction time is about 5 seconds, and a fast one perhaps 0.5 seconds.

Device 200 determines whether the user's reaction time is fast or slow. As a rule of thumb, a default value based on testing is used to set the average user response. In addition, device 200 may provide a user interface that lets users configure their own reaction time and/or have it calculated dynamically. If the device defines a default time for an average user, for example 2 seconds, it can then build up a record over time of how the user actually reacts. This can be based, for example, on checking whether high-priority "blank frame" tags are consistently found at a typical distance from the point at which the user presses play. Reaction times may be tied to user-based accounts on device 200, so that separate profiles can be kept for multiple users of the system.
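One way to keep a record of how each user actually reacts is an exponentially weighted average per user, sketched below. The smoothing factor, the 0.5 to 5 s clamp (taken from the typical range quoted above), and the names `ReactionProfile` and `profile_for` are assumptions. Each observed delay is meant to be the wall-clock time between a confidently matched tag and the play press, i.e. the content distance divided by the FF speed.

```python
from dataclasses import dataclass


@dataclass
class ReactionProfile:
    """Per-user reaction-time estimate, learned from observed delays."""
    estimate_s: float = 2.0   # default for an average user
    alpha: float = 0.2        # weight given to each new observation (assumed)
    min_s: float = 0.5        # fastest plausible reaction
    max_s: float = 5.0        # slowest plausible reaction

    def observe(self, delay_s: float) -> float:
        """Fold one measured delay into the estimate and return the new value."""
        delay = min(max(delay_s, self.min_s), self.max_s)   # reject outliers by clamping
        self.estimate_s += self.alpha * (delay - self.estimate_s)
        return self.estimate_s


_profiles = {}  # user id -> ReactionProfile, one per identified user


def profile_for(user: str) -> ReactionProfile:
    """Return the user's profile, creating one with default values if needed."""
    return _profiles.setdefault(user, ReactionProfile())
```

A small alpha means a single unusual press, like the "unusual reaction" case of the secondary zone, shifts the estimate only slightly. A consistent pattern moves it steadily.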

The reaction time may be set manually using a traditional slider displayed on the display device 114. Another option is a mechanism that measures the user's reaction speed directly. For example, it can show a series of images in random order and ask the user to press a button upon seeing a particular image (e.g., a picture of a dog). It then measures the time between when that image is displayed and when the user presses the button. The test may be repeated several times for better accuracy, and it may be user-specific. That is, the system may let users identify themselves individually, both for testing and for using the device.
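The image-recognition test could be scored as below. The sketch only builds the random image sequence and turns (shown_at, pressed_at) timestamps into a reaction time. Two rules are assumptions meant to make repeated trials robust: taking the median, and discarding presses made before the target appeared or more than 5 s after it. The function names are hypothetical.

```python
import random
from statistics import median


def make_sequence(images, target, length, rng=random):
    """Random image sequence of `length` items containing `target` exactly once."""
    others = [img for img in images if img != target]
    seq = [rng.choice(others) for _ in range(length - 1)]
    seq.insert(rng.randrange(length), target)   # random position for the target
    return seq


def reaction_from_trials(trials, max_s=5.0):
    """Median reaction time (s) over (shown_at, pressed_at) pairs.

    Presses before the target appeared, or more than max_s after it,
    are treated as misses and discarded."""
    delays = [pressed - shown for shown, pressed in trials
              if 0 < pressed - shown <= max_s]
    return median(delays) if delays else None
```

Passing a seeded `random.Random` instance as `rng` makes a test sequence reproducible. The default uses the module-level generator.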

Embodiments incorporating the teachings of the present disclosure have been shown and described in detail herein, but those skilled in the art can readily devise many other varied embodiments that still incorporate these teachings. Preferred embodiments of a method and apparatus for optimal playback positioning in digital content have been described; they are intended to be illustrative rather than limiting. Modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments disclosed that are within the scope of the disclosure as set forth in the appended claims.

Here are some supplementary notes.
[Appendix 1]
A method for determining an optimal playback position in video content that includes multiple frames, comprising:
Displaying the video content at a playback speed for viewing;
Receiving a first navigation instruction for navigating the video content at a speed faster than the playback speed for viewing;
Receiving a second navigation instruction to resume playback of the video content at the playback speed for viewing; and
Determining, in response to the second navigation instruction, a playback position of the video content based on at least one tagged frame of the video content.
[Appendix 2]
The method according to Appendix 1, wherein the at least one tagged frame of the video content is tagged prior to the displaying step.
[Appendix 3]
The method according to Appendix 1, further comprising dynamically tagging at least one frame of the video content as frames are passed within a time period between receiving the first navigation instruction and receiving the second navigation instruction.
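Appendix 3 leaves open how frames are recognized as tag-worthy while they stream past during fast-forward or rewind. As one hypothetical example, a device could flag near-black frames, matching the “blank frame” tags mentioned in the description, by comparing each frame's mean luma against a threshold. The frame representation, the threshold of 20 on an 8-bit scale, the priority value, and all names below are assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple


class Tag(NamedTuple):
    position_s: float  # content time of the tagged frame, in seconds
    priority: int      # larger value means higher priority (assumed convention)


BLANK_LUMA_THRESHOLD = 20.0  # hypothetical cutoff on 8-bit mean luma (0-255)
BLANK_PRIORITY = 3           # hypothetical priority assigned to blank-frame tags


def tag_passed_frames(frames: Iterable[tuple], tags: List[Tag]) -> List[Tag]:
    """Append a Tag for each near-black frame passed during FF/Rew.

    frames: (position_s, mean_luma) pairs in the order they are passed.
    tags: existing tag list, extended in place and returned.
    """
    for position_s, mean_luma in frames:
        if mean_luma < BLANK_LUMA_THRESHOLD:
            tags.append(Tag(position_s, BLANK_PRIORITY))
    return tags


passed = [(10.0, 120.0), (10.5, 4.0), (11.0, 95.0), (11.5, 12.0)]
print(tag_passed_frames(passed, []))
```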
[Appendix 4]
The method according to Appendix 1, wherein the determining step further includes:
Determining a search start position among the frames passed within the time period between receiving the first navigation instruction and receiving the second navigation instruction; and
Selecting a first search area, for searching for tagged frames, that includes a predetermined time of the video content on both sides of the determined search start position.
[Appendix 5]
The method according to Appendix 4, wherein the search start position is based on a speed of the first navigation instruction.
[Appendix 6]
The method according to Appendix 5, wherein the search start position is further based on a reaction time assigned to a user.
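Appendices 4 to 6 tie the search start position to the speed of the first navigation instruction and to the user's reaction time, without giving a formula. A plausible reading, used here purely as an assumption, is that while the user reacts the content keeps moving at the trick-play speed, so the intended point lies roughly speed × reaction time behind the point where play was pressed (ahead of it when rewinding). The first search area then spans a predetermined time on both sides, clamped to the content. All names are hypothetical.

```python
def search_start(pressed_at_s: float, speed: float, reaction_s: float,
                 forward: bool = True) -> float:
    """Estimate the content time the user meant to stop at.

    pressed_at_s: content position when play was pressed.
    speed: trick-play multiplier of the first instruction (e.g. 8 for 8x).
    reaction_s: reaction time assigned to the user, in real seconds.
    """
    overshoot_s = speed * reaction_s  # content that passed while the user reacted
    return pressed_at_s - overshoot_s if forward else pressed_at_s + overshoot_s


def first_search_area(start_s: float, half_width_s: float,
                      duration_s: float) -> tuple:
    """Return (begin, end) covering half_width_s on both sides of start_s."""
    return (max(0.0, start_s - half_width_s),
            min(duration_s, start_s + half_width_s))


start = search_start(600.0, speed=8, reaction_s=2.0)
print(start, first_search_area(start, 5.0, duration_s=3600.0))
```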
[Appendix 7]
The method according to Appendix 6, further comprising, if there are at least two tagged frames in the first search area, selecting the tagged frame with the highest priority as the playback position.
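Selecting among several tagged frames in the first search area can be expressed as a maximum over the candidates. In this sketch, tags are assumed to be (position_s, priority) pairs with larger numbers meaning higher priority, and ties between equal priorities are broken by proximity to the search start; that tie-break rule is an assumption, since Appendix 7 only specifies the highest priority. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

TagT = Tuple[float, int]  # (position_s, priority); larger priority wins (assumed)


def pick_playback_tag(tags: List[TagT], area: Tuple[float, float],
                      start_s: float) -> Optional[TagT]:
    """Return the highest-priority tag inside area, or None if there is none."""
    begin, end = area
    candidates = [t for t in tags if begin <= t[0] <= end]
    if not candidates:
        return None
    # Primary key: priority. Secondary key: closeness to the search start.
    return max(candidates, key=lambda t: (t[1], -abs(t[0] - start_s)))


tags = [(580.0, 1), (586.0, 3), (588.0, 3), (700.0, 5)]
print(pick_playback_tag(tags, (579.0, 589.0), start_s=584.0))
```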
[Appendix 8]
The method according to Appendix 7, further comprising selecting a second search area that is larger than the first search area if there are no tagged frames in the first search area.
[Appendix 9]
The method according to Appendix 8, further comprising, if there are no tagged frames in the first search area and the second search area:
Selecting a third search area larger than the second search area;
Upon identifying the at least one tagged frame in the third search area, adjusting the reaction time assigned to the user; and
Increasing the predetermined time of the first search area.
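The fallback in Appendices 8 and 9 amounts to retrying with progressively larger areas and, when a tag is only found in the third area, treating that as evidence that the assigned reaction time and first-area size were off. The sketch below is hypothetical: it assumes each fallback area is twice as wide as the previous one, re-derives the reaction time that would have placed the search start exactly on the found tag, widens the first area by the same factor, and returns the computed start unchanged if no tag is found at all. None of these rules is specified by the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SearchSettings:
    reaction_s: float     # reaction time currently assigned to the user
    half_width_s: float   # half-width of the first search area, in content seconds
    growth: float = 2.0   # hypothetical widening factor between search areas


def find_playback_position(tags, pressed_at_s, speed, settings, forward=True):
    """Search up to three nested areas; return (position_s, updated settings).

    tags: (position_s, priority) pairs; larger priority wins (assumed).
    """
    sign = 1 if forward else -1
    start = pressed_at_s - sign * speed * settings.reaction_s
    for level in range(3):  # first, second and third search areas
        half = settings.half_width_s * settings.growth ** level
        hits = [t for t in tags if abs(t[0] - start) <= half]
        if not hits:
            continue
        pos = max(hits, key=lambda t: (t[1], -abs(t[0] - start)))[0]
        if level == 2:
            # Found only in the third area: adopt the reaction time that would
            # have started the search on this tag and enlarge the first area.
            implied_s = max(0.0, sign * (pressed_at_s - pos) / speed)
            settings = replace(settings, reaction_s=implied_s,
                               half_width_s=settings.half_width_s * settings.growth)
        return pos, settings
    return start, settings  # no tag in any area: play from the computed start


base = SearchSettings(reaction_s=2.0, half_width_s=5.0)
print(find_playback_position([(566.0, 2)], 600.0, 8, base))
```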
[Appendix 10]
The method according to Appendix 1, wherein the first navigation instruction is a fast-forward function or a rewind function.
[Appendix 11]
The method according to Appendix 10, wherein the second navigation instruction is a playback function.
[Appendix 12]
The method according to Appendix 1, wherein the first navigation instruction is a scene skip function.
[Appendix 13]
The method according to Appendix 12, wherein the determining step further includes:
Determining a search start position by moving forward or backward by a predetermined time from the time at which the first navigation instruction is received; and
Searching for at least one tagged frame within the vicinity of the search start position.
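For the scene-skip case of Appendices 12 to 14, the search start comes from a fixed jump rather than from trick-play speed. A minimal sketch, assuming a hypothetical 30-second jump, a 10-second vicinity, and tags given as (position_s, priority) pairs with larger priority winning, falls back to the plain jump target when no tag lies nearby. The names and constants are illustrative only.

```python
SKIP_S = 30.0      # hypothetical predetermined jump, in content seconds
VICINITY_S = 10.0  # hypothetical search radius around the jump target


def scene_skip_target(tags, current_s, forward=True,
                      skip_s=SKIP_S, vicinity_s=VICINITY_S):
    """Return the content position at which playback resumes after a scene skip.

    tags: (position_s, priority) pairs; larger priority wins (assumed).
    """
    start = current_s + skip_s if forward else max(0.0, current_s - skip_s)
    nearby = [t for t in tags if abs(t[0] - start) <= vicinity_s]
    if not nearby:
        return start  # no tagged frame near the jump target
    # Highest priority first, then closest to the jump target.
    return max(nearby, key=lambda t: (t[1], -abs(t[0] - start)))[0]


tags = [(125.0, 1), (133.0, 4), (200.0, 5)]
print(scene_skip_target(tags, 100.0), scene_skip_target(tags, 100.0, forward=False))
```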
[Appendix 14]
The method according to Appendix 13, further comprising, if there are at least two tagged frames in the first search area, selecting the tagged frame with the highest priority as the playback position.
[Appendix 15]
An apparatus for playing back video content that includes multiple frames, comprising:
A video processor that provides the video content to a playback device at a playback speed for viewing;
A user interface that receives a first navigation instruction for navigating the video content at a speed faster than the playback speed for viewing, and receives a second navigation instruction to resume playback of the video content at the playback speed for viewing; and
A controller, coupled to the user interface, that receives the second navigation instruction, determines a playback position of the video content based on at least one tagged frame of the video content, and provides the determined playback position to the video processor.
[Appendix 16]
The apparatus according to Appendix 15, wherein the video processor tags the at least one tagged frame of the video content prior to storing the video content in a storage device.
[Appendix 17]
The apparatus according to Appendix 15, wherein the video processor dynamically tags the at least one frame of the video content as frames are passed within a time period between receipt of the first navigation instruction and receipt of the second navigation instruction.
[Appendix 18]
The apparatus according to Appendix 15, wherein the controller is further configured to determine a search start position among the frames passed within the time period between receipt of the first navigation instruction and receipt of the second navigation instruction, and to select a first search area, for searching for tagged frames, that includes a predetermined time of the video content on both sides of the determined search start position.
[Appendix 19]
The apparatus according to Appendix 18, wherein the search start position is based on a speed of the first navigation instruction.
[Appendix 20]
The apparatus according to Appendix 19, wherein the search start position is further based on a reaction time assigned to a user.
[Appendix 21]
The apparatus according to Appendix 20, wherein, if there are at least two tagged frames in the first search area, the controller selects the tagged frame with the highest priority as the playback position.
[Appendix 22]
The apparatus according to Appendix 21, wherein, if there are no tagged frames in the first search area, the controller selects a second search area that is larger than the first search area.
[Appendix 23]
The apparatus according to Appendix 22, wherein, if there are no tagged frames in the first and second search areas, the controller selects a third search area that is larger than the second search area; and
Upon identifying the at least one tagged frame in the third search area, the controller adjusts the reaction time assigned to the user and increases the predetermined time of the first search area.
[Appendix 24]
The apparatus according to Appendix 15, wherein the first navigation instruction is a fast-forward function or a rewind function.
[Appendix 25]
The apparatus according to Appendix 24, wherein the second navigation instruction is a playback function.
[Appendix 26]
The apparatus according to Appendix 15, wherein the first navigation instruction is a scene skip function.
[Appendix 27]
The apparatus according to Appendix 26, wherein the controller is further configured to determine a search start position by moving forward or backward by a predetermined time from the time at which the first navigation instruction is received, and to search for at least one tagged frame within the vicinity of the search start position.
[Appendix 28]
The apparatus according to Appendix 27, wherein, if there are at least two tagged frames in the first search area, the controller selects the tagged frame with the highest priority as the playback position.

Claims

A method for determining an optimal playback position in video content that includes multiple frames, comprising:
Displaying the video content at a playback speed for viewing;
Receiving a first navigation instruction for navigating the video content at a speed faster than the playback speed for viewing;
Receiving a second navigation instruction to resume playback of the video content at the playback speed for viewing; and
Determining a playback position of the video content in response to the second navigation instruction, wherein the determining step includes determining a search start position among the frames passed within the time period between receiving the first navigation instruction and receiving the second navigation instruction, and selecting a first search area, for searching for at least one tagged frame, that includes a predetermined time of the video content on both sides of the determined search start position, the playback position being associated with a selected one of the at least one tagged frame;
Wherein, if at least two tagged frames are found in the first search area, the determining step further includes determining a priority for each of the at least two tagged frames and selecting the tagged frame having the highest priority among the at least two tagged frames to be associated with the playback position.

The method of claim 1, wherein the at least one tagged frame of the video content is tagged prior to the displaying.

The method of claim 1, further comprising dynamically tagging at least one frame of the video content as frames are passed within a time period between receiving the first navigation instruction and receiving the second navigation instruction.

The method of claim 1, wherein the search start position is based on a speed of the first navigation instruction.

The method of claim 4, wherein the search start position is further based on a reaction time assigned to a user.

The method of claim 1, further comprising selecting a second search area that is larger than the first search area if there are no tagged frames in the first search area.

The method of claim 6, further comprising, if there are no tagged frames in the first search area and the second search area:
Selecting a third search area larger than the second search area;
Upon identifying the at least one tagged frame in the third search area, adjusting the reaction time assigned to the user; and
Increasing the predetermined time of the first search area.

The method of claim 1, wherein the first navigation instruction is a fast forward function or a rewind function.

The method of claim 8, wherein the second navigation instruction is a playback function.

The method of claim 1, wherein the first navigation instruction is a scene skip function.

The method of claim 10, wherein the determining step further includes:
Determining a search start position by moving forward or backward by a predetermined time from the time at which the first navigation instruction is received; and
Searching for at least one tagged frame within the vicinity of the search start position.

An apparatus for playing back video content that includes multiple frames, comprising:
A video processor that provides the video content to a playback device at a playback speed for viewing;
A user interface that receives a first navigation instruction for navigating the video content at a speed faster than the playback speed for viewing, and receives a second navigation instruction to resume playback of the video content at the playback speed for viewing; and
A controller, coupled to the user interface, that receives the second navigation instruction and determines a playback position of the video content by determining a search start position among the frames passed within the time period between receiving the first navigation instruction and receiving the second navigation instruction, and selecting a first search area, for searching for at least one tagged frame, that includes a predetermined time of the video content on both sides of the determined search start position, the playback position being associated with a selected one of the at least one tagged frame, the controller further providing the determined playback position to the video processor;
Wherein, if at least two tagged frames are found in the first search area, the determining further includes determining a priority for each of the at least two tagged frames and selecting the tagged frame having the highest priority among the at least two tagged frames to be associated with the playback position.

The apparatus of claim 12, wherein the video processor tags the at least one tagged frame of the video content prior to storing the video content in a storage device.

The apparatus of claim 12, wherein the video processor dynamically tags the at least one frame of the video content as frames are passed within a time period between receipt of the first navigation instruction and receipt of the second navigation instruction.

The apparatus of claim 12, wherein the search start position is based on a speed of the first navigation instruction.

The apparatus of claim 15, wherein the search start position is further based on a reaction time assigned to a user.

The apparatus of claim 12, wherein the controller selects a second search area that is larger than the first search area if there are no tagged frames in the first search area.

The apparatus of claim 17, wherein, if there are no tagged frames in the first and second search areas, the controller selects a third search area that is larger than the second search area; and
Upon identifying the at least one tagged frame in the third search area, the controller adjusts the reaction time assigned to the user and increases the predetermined time of the first search area.

The apparatus of claim 12, wherein the first navigation instruction is a fast forward function or a rewind function.

The apparatus of claim 19, wherein the second navigation instruction is a playback function.

The apparatus of claim 12, wherein the first navigation instruction is a scene skip function.

The apparatus of claim 21, wherein the controller is further configured to determine a search start position by moving forward or backward by a predetermined time from the time at which the first navigation instruction is received, and to search for at least one tagged frame within the vicinity of the search start position.