JP2011029795A

JP2011029795A - System and method for providing annotation information of video content

Info

Publication number: JP2011029795A
Application number: JP2009171668A
Authority: JP
Inventors: Katsumi Inoue; 克己井上
Original assignee: Individual
Current assignee: Individual
Priority date: 2009-07-23
Filing date: 2009-07-23
Publication date: 2011-02-10

Abstract

PROBLEM TO BE SOLVED: To provide a system that adds annotation information to scenes of video content in real time and that covers all genres of video content.

SOLUTION: Heading terms common to scenes of all genres of video content, and the genre-specific terms related to them, are organized into a hierarchical dictionary. A graphical user interface part displays the dictionary and allows selection from it, so that the user can select an appropriate term with the operation buttons of a remote control to add character information and the like. For high-speed operation, a speech recognition function restricted to a limited set of terms is also provided, so that an appropriate term can be selected from the dictionary by voice to add character information and the like.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a system that adds annotation information to an arbitrary scene of video content, and is applied to video devices such as recording devices, video cameras, and editing devices that handle video content such as broadcast programs and self-made videos.

Editing is indispensable for using video content, including self-made videos, more comfortably, efficiently, and actively, but various factors keep it from being done casually.
The first step in editing video content is finding the desired scenes and classifying them, which imposes a heavy burden in time, physical effort, and mental effort.
Searching for highlight scenes and commercial message scenes (hereinafter, CMs) is the prime example. Other tasks include separating necessary scenes from unnecessary ones to bring the content to an appropriate length. It is no exaggeration to say that most of the time spent editing video content accurately according to the production intent goes to finding and classifying these scenes.
For CMs, methods that detect them automatically from changes in the image or in the audio mode have long been in practical use.
Techniques that detect highlight scenes automatically, for example by audio level detection, have also been proposed, but what counts as a highlight scene depends heavily on the individual viewer, so automation is difficult except for specific kinds of video content.
For example, a magnificent view in a travel program moves everyone, but even an ordinary place becomes a nostalgic and valuable highlight scene for a person who once lived there or has visited it.
Whether in a high school baseball program or a self-made video of a school athletic meet, a scene in which one's own child appears is a precious highlight scene regardless of the image or the result.
In a news program with political or economic content, agreement or disagreement with one's own opinion in each scene also makes a highlight scene.
Thus highlight scenes include many kinds of scenes, such as those an individual finds nostalgic, moving, or infuriating, and they cannot be automated as a single category.
Even within video content that one wants to keep, there are scenes one does not want to see or does not want to show, as well as unnecessary or wasteful scenes.
As described above, in editing video content for various purposes, human judgment and information centered on text (character information and the like) are indispensable for finding the target scene positions and classifying them appropriately, and this imposes a burden in time, physical effort, and mental effort.
For this reason, methods that support this human work by improving workability and efficiency, that is, methods that support adding information (annotation information) to the scenes to be edited, have been proposed in various forms, particularly for professional use in the broadcast and video content industries.

The following means exist for inputting annotation information that gives character information and the like to an arbitrary scene of such video content.
One method uses a remote control and a GUI (graphical user interface) keyboard on the display screen: kana characters are selected one at a time with the cursor buttons of the remote control, converted to kanji, and registered as a term.
This method is used in many current video devices, but it requires practice and remains a complicated, inefficient way to enter text even for an experienced user. In practice it is limited to editing the title of video content offline (while not playing) and cannot be used online (during playback).
Another method registers terms by entering characters one at a time on an external keyboard.
This method is efficient once mastered, but it is difficult to keep the added characters, terms, and other information consistent; the added information becomes ad hoc, which burdens later searches.
Another method registers terms by speech recognition.
General speech recognition also requires practice, misrecognition is unavoidable, and the load on the system is large. As with a keyboard, it is difficult to keep the added characters, terms, and other information consistent, and when there are several viewers the utterances are a nuisance.
Another method uses a dictionary for editing.
Because this method can give consistency to the added terms, dictionaries for each genre have been proposed. However, they serve merely as templates for speech recognition, and unless efficient methods for searching and selecting the registered terms are found, the method is difficult to use, particularly online.

As examples of prior art documents on annotation information addition that uses speech recognition and a dictionary, JP 2004-86124 A and JP 2004-153764 A relate to a metadata (annotation information) production device and search device for content production. In both, the misrecognition rate of speech recognition and the nuisance of utterances are problems. The latter is a system in which the operator plays back produced video and audio content to confirm the information to be used as metadata, enters it into a computer or the like by voice to produce the metadata, and then searches it. Because the produced video and audio content must be checked in advance, the system is difficult to use in real time, for example for highlight scenes of a program being broadcast.
JP 2007-140198 A relates to a metadata (annotation information) creation device that creates metadata related to video and audio content. It addresses the problem that keywords are assigned erroneously with speech recognition, and aims to create keywords based on importance. However, because the operator's voice is still registered as keywords by speech recognition, the misrecognition rate and the nuisance of utterances remain, and because the importance is registered mechanically in advance, the weighting is not necessarily optimal for the scene.
Therefore, as with the documents above, it is difficult to use in real time, for example for highlight scenes of a program being broadcast.

JP 2004-86124 A
JP 2004-153764 A
JP 2007-140198 A

To overcome the technical background described above, the present invention provides an annotation information adding system that does not require a skilled operator or advance knowledge of the scenes, that covers all genres of video content, and that lets anyone, without special training and with consideration for the viewing environment, add information centered on character information to an arbitrary scene in real time while viewing online (on-air) video content. The added information is appropriate to the scene and consolidated for retrieval during editing, and it can be added with high accuracy and at high speed.
In addition, aiming at a cost that allows wide adoption in general-purpose home recorders and similar devices, which are the main application of the present invention, the invention provides an annotation information adding system that can be realized with devices, components, and assembly techniques already widely available on the market, without any special ones.

To solve the above problems, claim 1 provides a system for adding annotation information to an arbitrary scene of video content, including self-made video, the system comprising a video device and a user interface device for the video device, wherein:
the video device comprises an editing term dictionary in which heading terms, which are impression terms expressing the impression of viewing a scene and are common to scenes of all genres of video content, and genre terms, which are terms specific to a genre of video content, are associated by genre and by hierarchy level;
the user interface device comprises means for designating, in order from the start of viewing the video content, the scene positions to which annotation information is to be added, for selecting in sequence the heading term and the genre term from the editing term dictionary for each designated scene, and for transmitting signal information on these designations and selections to the video device; and
the video device further comprises an annotation information creation unit that creates annotation information data from the terms of the editing term dictionary based on the signal information received from the user interface device.
In claim 2, the heading terms and the genre terms in the editing term dictionary are arranged in groups of at most 12 terms each.
In claim 3, information on the degree of impression of a scene is registered in the editing term dictionary, and the degree of impression is selected and used as annotation information.
In claim 4, the video device includes a remote control signal receiving unit; the user interface device is a remote control having at least 20 operation buttons, and the signal information for designation and selection is a remote control transmission signal; the scene position is designated, and the heading term and the genre term are selected from the editing term dictionary, by operating the buttons of the remote control; and the video device receives the signal information at the remote control signal receiving unit and creates, in the annotation information creation unit, the annotation information data from the terms of the editing term dictionary.
In claim 5, the video device includes a speech recognition unit; the user interface device is a microphone, and the signal information for designation and selection is a microphone audio signal; the scene position is designated, and the heading term and the genre term are selected from the editing term dictionary, by uttering at most 30 kinds of sounds into the microphone; and the video device recognizes the microphone audio signal as signal information in the speech recognition unit and creates, in the annotation information creation unit, the annotation information data from the terms of the editing term dictionary.
In claim 6, the video device includes genre selection means that automatically selects the genre of the video content from the EPG (electronic program guide) genre.
In claim 7, the video device includes time-shift playback (chasing playback) means and pauses playback while annotation information is being edited.
In claim 8, the name of the person adding the annotation information is registered in the annotation information.
In claim 9, the video device further includes a dictionary download unit that downloads the editing term dictionary over a communication line, and a keyboard input unit for an external keyboard used for term registration.
In claim 10, the video device builds an editing term dictionary for each individual program from either the EPG data or data downloaded from the Internet.
In claim 11, annotation information data is created by selecting terms from an editing term dictionary in which heading terms common to all genres of video content and genre terms, which are terms specific to a genre of video content, are associated by genre and by hierarchy level.
In claim 12, the heading terms common to all genres are terms that express the impression of viewing a scene.

Simply by developing a dictionary in which heading terms common to scenes of all genres of video content are linked in a hierarchy to the genre-specific terms related to them, and without imposing a particularly large technical development burden, it is possible to realize a video content annotation information adding system that is applicable to video devices such as recording devices, video cameras, and editing devices, is easy to use and practical, and can be operated in real time with consideration for the viewing environment.

FIG. 1 shows an example of the overall configuration of the system of the present invention.
FIG. 2 shows an example of the genre classification of video content.
FIG. 3 shows an example of the dictionary structure for a baseball program.
FIG. 4 shows an example of the dictionary structure for a cooking program.
FIG. 5 shows an example of the dictionary structure for a news program.
FIG. 6 shows an example of the dictionary structure for a movie program.
FIG. 7 shows an example of the dictionary structure for a travel program.
FIG. 8 shows an example of the dictionary structure for a classical music program.
FIG. 9 shows an example of the dictionary structure for a comedy program.
FIG. 10 shows an example of the dictionary structure for a self-made wedding video.
FIG. 11 shows an example of specific scene registration and character registration.
FIG. 12 shows an example of the remote control.
FIG. 13 shows an example of the operation flow.
FIG. 14 shows an example of genre selection.
FIG. 15 shows an example of the term selection function operation.
FIG. 16 shows an example of scene position designation.
FIG. 17 shows an example of the first-level display of the editing term dictionary.
FIG. 18 shows an example of the second-level display of the editing term dictionary.
FIG. 19 shows an example of the third-level display of the editing term dictionary.
FIG. 20 shows an example of the fourth-level display of the editing term dictionary.
FIG. 21 shows an example of the fifth-level display of the editing term dictionary.
FIG. 22 shows an example of five-sense impression terms.
FIG. 23 shows an example of annotation information data.
FIG. 24 shows an example of editing after viewing.
FIG. 25 shows an example of a database search.

FIG. 1 shows an example of the overall configuration of the system of the present invention.
The video device 1 is a recording device, a video camera, an editing device, or the like. The system is configured so that it can be operated through a UI (user interface) device 2, which is either a remote control 3 that sends a remote control transmission signal 5 or a microphone 4 that sends a microphone audio signal 6.
A main display 9 and a sub display 10, each a display such as a television or a liquid crystal monitor, are connected to the video device 1. The video content recording, playback, and display unit 14 receives broadcast programs from the antenna input 41 and video content from the external video input 42, records and plays them back, and outputs a video signal 7 for display on the main display 9.
The video content 33 is recorded in the video content storage unit 31 under the title 32 of each item of video content.

The GUI display unit 13 of the GUI (graphical user interface) unit 11 outputs to the display a GUI display signal 8 for the editing that adds annotation information.
The display changeover switch 40 routes the GUI display signal 8 to either the main display 9 or the sub display 10.
In this configuration diagram the display changeover switch 40 is set to the B side, so the GUI is shown on the sub display 10 for editing, and viewing of the video content itself and editing can be carried out independently. When the sub display 10 is not used, the display changeover switch 40 is set to the A side and the GUI display signal 8 is superimposed on the video content shown on the main display 9.
The following description uses the case in which the sub display 10 is not used and the GUI display signal 8 is superimposed on the video content on the main display 9.
The GUI control unit 12 of the GUI unit 11 executes the various controls related to editing according to the UI reception signal 15 from the UI device information recognition unit 22, in which signal information from the remote control 3 of the UI device 2 is received by the remote control signal receiving unit 21 and signal information from the microphone 4 is handled by the speech recognition unit 24.
The genre selection unit 23 of the GUI unit 11 obtains the genre of the video content from EPG (electronic program guide) data and selects the dictionary terms for the genre of the video content being viewed.

The editing term dictionary 19 is the core of the present invention. It is organized by genre of video content and further by program. Based on signal information from either the remote control signal receiving unit 21 (for signals from the remote control 3) or the speech recognition unit 24 (for the microphone audio signal 6 from the microphone 4), the dictionary term selection unit 20 selects a term from the editing term dictionary 19 and outputs the selected term 29. The selected term 29 is shown on the display through the GUI display unit 13, and the annotation information creation unit 25 stores it, according to the operation of the UI device 2, as annotation information data 37 in the annotation information database 34.
In the annotation information database 34, annotation information data 37 is created for each video content title 32, and time information 35 and related information 36 are associated with the annotation information data 37.
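The relationship between titles, annotation data, time information, and related information can be pictured with a small data model. The following is a minimal sketch in Python, not the patent's implementation: the class names, field names, and the `search` method are hypothetical, and the storage format of the actual database 34 is not specified in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AnnotationRecord:
    """One annotated scene (hypothetical structure for illustration)."""
    time_sec: float                 # stands in for time information 35
    terms: list[str]                # selected terms 29, heading term first
    related: dict[str, str] = field(default_factory=dict)  # related information 36


class AnnotationDatabase:
    """Stand-in for annotation information database 34, keyed by title 32."""

    def __init__(self) -> None:
        self._by_title: dict[str, list[AnnotationRecord]] = {}

    def add(self, title: str, record: AnnotationRecord) -> None:
        # Annotation data is kept separately for each video content title.
        self._by_title.setdefault(title, []).append(record)

    def search(self, term: str) -> list[tuple[str, float]]:
        """Return sorted (title, time) pairs for scenes annotated with `term`."""
        hits = []
        for title, records in self._by_title.items():
            for rec in records:
                if term in rec.terms:
                    hits.append((title, rec.time_sec))
        return sorted(hits)
```

Because every record carries terms taken from one shared dictionary, a search by term is an exact match rather than a free-text comparison, which is the consistency benefit the description attributes to dictionary-based input.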

The annotation information database 34 configured as above is searched by the annotation information data search unit 30, and the results are displayed as selected terms 29 on either the main display 9 or the sub display 10.
For the editing term dictionary 19, the dictionary registration unit 16 obtains necessary information such as title information, genre information, program information, and performer information from the EPG data of broadcast program video content. The dictionary term download unit 17 can download the latest term data at any time via the Internet communication signal 26, and the dictionary term keyboard input unit 18 can write terms from the keyboard signal 27 produced by an externally connected keyboard.
The above is an example of the overall configuration of the system of the present invention.

FIG. 2 shows an example of the genre classification of video content.
For the video content of current digital broadcast programs, the EPG data defines 12 genres, each with further sub-genres.
These can be used as they are, but in this example, as shown in FIG. 2, the EPG main genre "anime / special effects" is folded into "movie" and self-made video is added, giving 12 categories in total.
Thus, apart from self-made video, this example uses the EPG main genres of the broadcast standard as they are, so that the genre can be selected automatically from the EPG; however, the content may instead be classified by another method and the genre selected manually.
For self-made video, the system is configured so that the types of video produced can be registered as sub-genres and selected appropriately.
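The genre mapping described for FIG. 2 can be sketched as a small function. This is a minimal illustration under stated assumptions: the English genre labels are hypothetical placeholders (the text does not list the broadcast standard's genre names), and treating content without EPG data as self-made video is an assumption made only for this example.

```python
from typing import Optional

# Hypothetical labels; the actual EPG genre names are not given in the text.
ANIME_SPECIAL_EFFECTS = "anime/special effects"
MOVIE = "movie"
SELF_MADE_VIDEO = "self-made video"


def dictionary_genre(epg_main_genre: Optional[str]) -> str:
    """Map an EPG main genre to the genre used to pick dictionary terms."""
    if epg_main_genre is None:
        # Assumption for this sketch: no EPG data means self-made video.
        return SELF_MADE_VIDEO
    if epg_main_genre == ANIME_SPECIAL_EFFECTS:
        # In the example of FIG. 2 this EPG genre is folded into "movie".
        return MOVIE
    # All other EPG main genres are used as they are.
    return epg_main_genre
```

A manual selection path, which the description also allows, would simply bypass this function and take the genre from the user.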

For video content of such diverse genres, the goal is to add character information and the like to an arbitrary scene with high accuracy and at high speed, even while viewing online (on-air) video content, without advance knowledge of the scenes and without special training, in a form that is appropriate to the scene and consolidated for retrieval during editing. The present invention achieves this by identifying heading terms that are common to scenes of all genres of video content, are shared and easily understood by everyone, and are limited enough in number to be practical for searching; by preparing a dictionary that uses them as heading terms; and by selecting the necessary terms from this dictionary in sequence.

The principle of the dictionary structure of the present invention is as follows.
The first step by which a viewer (user) chooses a video and audio scene as a scene for editing, including a highlight scene, rests on emotions, the five senses, and in some cases bodily sensations, which in turn arise from each viewer's preferences, experience, environment, and circumstances. This holds for scenes of all genres of video content.
These physiological areas are being actively studied in research on brain waves and on the recognition technology and recognition control that use them, and are expected to become usable in the future for sophisticated human judgments such as editing video content. At present, however, editing by direct use of brain waves or the like requires a cumbersome interface for acquiring biological information, so it is difficult to apply to mass-market products.
Instead, a method that assigns meaning through a dictionary whose heading terms express impressions based on human emotions, the five senses, and in some cases bodily sensations has major advantages that other text input methods lack.
The first advantage is that, as noted above, a term expressing an impression is the impression as felt, so it is easy for everyone to understand and can be used in common for scenes of all genres of video content.
Using such terms as heading terms therefore provides both an efficient way to register the terms needed to annotate arbitrary scenes of video content and an efficient way to select the registered terms, both of which have been difficult until now.
The next important advantage is that when terms expressing impressions, that is, adjectives that modify nouns and similar descriptive terms, are used as heading terms, they naturally call for the noun that they describe.
Specifically, with impression terms as heading terms, the genre-specific nouns related to each of them can be linked and registered hierarchically in a way that is intuitive, simple, and limited in number. Conversely, by selecting an impression term as the heading term, the user can easily select the genre-specific nouns for editing that are linked to it, level by level, as if guided by the on-screen display.
Classification by Japanese syllabary (a-i-u-e-o) order, or by other heading schemes, does not produce this effect of leading to the terms that follow.
Apart from this guiding effect, a method that assigns meaning starting from specialized terms needs separate heading terms for each genre of video content, which makes the headings difficult to share; it does not yield a dictionary with a common hierarchical structure whose heading terms are common to all video content, as in this embodiment.
Having heading terms common to all genres not only reduces the burden on the developers of devices and systems but is also significant for users in terms of familiarity and efficiency.

Impression terms alone can give meaning to a wide variety of video scenes and can classify most scenes to be edited. Linking genre-specific terms for each genre of video content hierarchically to these impression terms further makes it possible to attach meaning, mainly as character information, to every scene of video content.
Deepening the hierarchy allows more complex character information to be attached. Another approach is to avoid deepening the hierarchy needlessly and to register only the genre terms essential for editing, in the expectation that they will serve as recommended terms for editing.

In this embodiment, adjective-type terms that are common to all video content genres and that express every kind of impression people receive from video and audio scenes, such as emotions (joy, anger, sorrow, pleasure) and preferences (likes and dislikes), were identified and consolidated into impression terms. These are classified into positive impressions, for scenes that leave a good impression, and negative impressions, for scenes that leave a bad one.
This exploits the fact that a scene to be edited carries either a positive or a negative impression.

Furthermore, for convenience, positive and negative impressions are each divided into a person type, for impressions of people, and a scene type, for things and events other than people. This gives four types in total (called impression categories below) containing 38 heading terms, although terms can be added or changed.
Building a dictionary in which terms specific to the various genres of video content (called genre terms below) are registered hierarchically under these heading terms (called impression terms below) achieves the original objective.
Heading terms such as emotions and preferences can be used in common across all genres of all video content. Dictionary examples for eight representative genres of video content using these heading terms are shown below.

FIG. 3 shows an example of the dictionary structure for a baseball program.
In the first level of the hierarchy 103 of the editing term dictionary 19, which is organized by video content genre 101, the impression categories 104 described above are divided into four types: positive and negative impressions, each of scene type and person type. In the second level, a total of 38 adjective-type impression terms 105 are registered under these impression categories 104 in this embodiment.
To explain the dictionary structure, only impression terms common to video content of all genres, such as emotions and preferences, are shown. With this scheme, however, up to 12 × 12 = 144 second-level impression terms can be registered if the first level is divided appropriately, so terms for any impression a person ordinarily feels, including the five senses and bodily sensations, can be registered.
Terms are allocated with easy-to-understand impression categories and the full range of video content genres in mind; details are given later.
The structure up to this point is common to all genres.
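The group limit and the capacity figure above can be sketched as a small tree. This is a minimal illustration rather than the patent's implementation: the `Node` class, the English labels, and the order of the four categories are assumptions made for the example.

```python
from dataclasses import dataclass, field

MAX_GROUP = 12  # each group holds at most 12 entries, one per selection number


@dataclass
class Node:
    """One entry in the editing term dictionary (hypothetical representation)."""
    label: str
    children: list = field(default_factory=list)

    def add(self, label):
        # Refuse a 13th entry so every group stays selectable with numbers 1-12.
        if len(self.children) >= MAX_GROUP:
            raise ValueError(f"group '{self.label}' already has {MAX_GROUP} entries")
        child = Node(label)
        self.children.append(child)
        return child


# Level 1: the four impression categories (display order is an assumption).
root = Node("baseball")
categories = [
    root.add("positive impression, scene type"),
    root.add("positive impression, person type"),
    root.add("negative impression, scene type"),
    root.add("negative impression, person type"),
]

# Level 2: impression terms under one category (labels are illustrative).
categories[0].add("amazing/wonderful")
categories[0].add("moved/grateful/thrilled")

# Upper bound on level-2 impression terms if level 1 used all 12 slots.
level2_capacity = MAX_GROUP * MAX_GROUP
```

Capping every group at 12 keeps each level addressable with a single number from 1 to 12, which is where the 12 × 12 figure for the second level comes from.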

In the third level, genre terms 106, which are nouns specific to baseball programs, are registered under each second-level impression term 105 that is relevant to baseball programs. In this example, for six genre terms 106 (player, batting, pitching, throwing, base running, and stolen base), more detailed noun terms are registered in the fourth level as baseball genre terms 106 related to the third-level baseball genre terms 106.

For baseball, positive impressions arise in scenes where the favorite team wins or plays well, or where a favorite player appears. Baseball-specific genre terms 106 are associated with the impression term 105 for each such scene.

In this example, CM is registered under the impression term 105 "boring/dull", and it can be selected and attached as annotation information. In programs other than baseball, scenes the viewer does not want children to see, or does not want to see personally, can likewise be annotated by selecting a suitable impression term 105, so appropriate editing-related terms can be selected and attached to any scene.

For example, for the same grand slam scene, a fan of the batting team selects, level by level, "positive impression, scene type", "amazing/wonderful", "batting", "grand slam", and this becomes annotation information data 37. A fan of the opposing team selects "negative impression, scene type", "poor", "pitching", "grand slam", and this likewise becomes annotation information data 37.

For high school baseball or a game that decides a championship, "positive impression, scene type", "moved/grateful/thrilled", "batting", "grand slam" is also registered so that it can be selected level by level; the user chooses whichever matches their impression.
For a scene in which a favorite pitcher simply appears, "positive impression, person type", "favorite/fan of", "player", "pitcher" is registered so that it can be selected level by level.
As described above, anyone who knows baseball can easily link genre terms 106 to impression terms 105 by imagining the scenes involved, and the same holds for other genres.
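The grand slam example can be sketched as paths through a nested dictionary. The excerpt below is an assumption-laden illustration: the nested-dict layout, the English labels, and the helper `is_registered` are hypothetical and cover only the terms named in the text.

```python
# Excerpt of a baseball dictionary: level 1 -> level 2 -> level 3 -> level 4.
BASEBALL = {
    "positive impression, scene type": {
        "amazing/wonderful": {"batting": ["grand slam"]},
        "moved/grateful/thrilled": {"batting": ["grand slam"]},
    },
    "negative impression, scene type": {
        "poor": {"pitching": ["grand slam"]},
    },
    "positive impression, person type": {
        "favorite/fan of": {"player": ["pitcher", "infielder"]},
    },
}


def is_registered(dictionary, path):
    """Return True if the four-level path exists in the dictionary excerpt."""
    category, impression, genre, detail = path
    return detail in dictionary.get(category, {}).get(impression, {}).get(genre, [])


# The same scene, annotated by a fan of the batting team and by a fan of the other team.
fan_view = ("positive impression, scene type", "amazing/wonderful", "batting", "grand slam")
rival_view = ("negative impression, scene type", "poor", "pitching", "grand slam")
```

Both paths end at the same fourth-level noun, yet they start from different impression categories, so the stored annotation records how the viewer felt about the scene as well as what happened in it.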

As shown in FIG. 3, the editing term dictionary 19 of the present invention is structured so that, at every level, the impression categories 104, impression terms 105, and genre terms 106 are arranged in groups of at most 12.
The reason is explained later.

The genre term hierarchy could be made deeper still, but in this embodiment the fifth level lets the user select the degree of impression, that is, the strength of the emotion or preference, using star symbols on a five-step scale.
Because the present invention uses impression terms as heading terms, another of its features is that editing information based on the degree of impression, and therefore the important scenes for editing, can be set easily.
The symbols may be replaced by character information, and the number of steps may be decreased or increased freely.
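A minimal sketch of the fifth level follows, assuming the degree is stored as an integer from 1 to 5. The star rendering, the record fields, and the threshold in `important_scenes` are illustrative choices, not details given in the text.

```python
MAX_DEGREE = 5  # five-step scale used in this embodiment


def stars(degree):
    """Render a degree of impression as filled and empty stars."""
    if not 1 <= degree <= MAX_DEGREE:
        raise ValueError("degree must be between 1 and 5")
    return "★" * degree + "☆" * (MAX_DEGREE - degree)


def important_scenes(annotations, threshold=4):
    """Keep only the annotations whose degree reaches the threshold."""
    return [a for a in annotations if a["degree"] >= threshold]


# Hypothetical annotation records; times and terms are made up for the example.
annotations = [
    {"time_sec": 754, "terms": ("amazing/wonderful", "batting", "grand slam"), "degree": 5},
    {"time_sec": 1310, "terms": ("boring/dull", "CM"), "degree": 2},
]
```

Filtering on the stored degree is one way the "important scenes" mentioned above could be pulled out after viewing.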

FIG. 4 shows an example of the dictionary structure for a cooking program.
The first, second, and fifth levels are the same as for the baseball program above.
For cooking programs, the impression terms 105 are more suitable if terms related to taste and smell, such as delicious and unpalatable, are registered. Terms for the other senses and for bodily sensations are also candidates for impression terms, and the first and second levels should be designed with frequency of use in mind; details are given later.
In the third and fourth levels, genre terms specific to cooking programs are registered in association with impression terms.
In a typical cooking program the key points are the recipe, cooking method, presentation, and finished dish, but the structure also handles cases where impressions of the performers and participants carry much of the weight.
The performers' proper names in the fourth level are obtained per program from the EPG data of the broadcast program.
Many broadcast programs carry performer information as EPG data in the data broadcast, and this can be used. More detailed personal names, place names, and terms related to program content can be downloaded from an external Internet site and used as dictionary terms for each program. The same applies to video content of other genres.

FIG. 5 shows an example of the dictionary structure for a news program.
The first, second, and fifth levels are the same as for the baseball program above.
In the third and fourth levels, genre terms specific to news programs are registered in association with impression terms.
A typical news program carries content from various subgenres, so under "report content" the terms politics, economy, culture, society, international, advanced technology, environment, hobbies, and welfare are registered, and more can be added.

FIG. 6 shows an example of the dictionary structure for a movie program.
The first, second, and fifth levels are the same as for the baseball program above.
In the third and fourth levels, genre terms specific to movie programs are registered in association with impression terms.
Movie programs make frequent use of both scene-type and person-type terms, a variety of specific terms can be selected for each, and the names of stars are registered.
The names of these stars, such as actors and actresses, are an example obtained by downloading from the Internet site mentioned above; if they are treated simply as performers, they can be obtained from EPG data.

For movie programs, defining finer genres such as action, comedy, science fiction, animation, and true story makes it possible to use genre terms specialized to each of them.
The same applies to video content of genres other than movies, provided a clear definition is possible.

FIG. 7 shows an example of the dictionary structure for a travel program.
The first, second, and fifth levels are the same as for the baseball program above.
In the third and fourth levels, genre terms specific to travel programs are registered in association with impression terms.
As with movies, travel programs make frequent use of both scene-type and person-type terms, and a variety of genre-specific terms can be selected for each. Specific names are registered under land, nature, and cultural heritage.
Types are registered under accommodation and vehicles.

FIG. 8 shows an example of the dictionary structure for a classical music program.
The first, second, and fifth levels are the same as for the baseball program above.
In the third and fourth levels, genre terms specific to classical music programs are registered in association with impression terms.
Specific names are registered under piece title, conductor, performer, soloist, and singer.
As with cooking programs, annotation information for music programs can be made more detailed if auditory impression terms are available; this is discussed later.

FIG. 9 shows an example of the dictionary structure for a comedy program.
The first, second, and fifth levels are the same as for the baseball program above.
In the third and fourth levels, genre terms specific to comedy programs are registered in association with impression terms.
Specific names are registered under host, performer, participant, and singer.

As the seven representative broadcast program genres above show, impression terms such as emotions and preferences can be used in common across every video content genre and form the basis for building the dictionary.
For proper nouns, which are numerous and change frequently, two methods are available: obtaining proper nouns such as performers' names from the program EPG data, as in the cooking program of Example 4, and, for more detailed information, obtaining proper nouns such as personal names and place names related to various programs from a specific site over the Internet, as in the movie program of Example 6.
Proper nouns can also be registered individually by the method described later.

Organizing the dictionary by genre, and further by program, limits the number of terms (characters) to choose from and keeps them relevant. It eliminates cumbersome operations such as picking one entertainer or one place name out of a dictionary of countless entertainers and place names registered in, for example, aiueo (Japanese syllabary) order.

FIG. 10 shows an example of the dictionary structure for a self-made wedding video.
The first, second, and fifth levels are the same as for the baseball program above.
In the third and fourth levels, genre terms specific to weddings are registered in association with impression terms.
Specific names are registered under friends/acquaintances and ceremony hall.
For self-made videos, it is essential that each individual register proper nouns such as names, regions, and organizations. These are registered as follows.

FIG. 11 shows an example of specific scene registration and character registration.
In FIG. 3, CM scenes were registered as a term related to "boring/dull". In this example, five kinds of specific scenes such as CMs are registered together under item 11 of the first level, and the start, middle, and end scenes of each can be designated directly.

Furthermore, in this embodiment, registration of proper nouns such as the names of friends and acquaintances in FIG. 10 is assigned, as shown in FIG. 11, to item 12 of the first level as character registration. In the second level, item 1 is assigned kanji, item 2 kana, and item 3 the alphabet and numerals, and individual characters are registered in the fourth level. Selecting these characters lets each user register in advance, or enter as needed, personal terms other than the dictionary terms of FIGS. 3 to 9, which makes the resulting character information more useful.

Entering text by selecting from these arranged characters requires far fewer button presses than mobile phone text entry or text entry with a remote control's cursor buttons, and greatly improves text entry efficiency.

FIG. 12 shows an example of a remote control.
The remote controls used to operate current video devices are mainly button-operated and use infrared transmission.
The present invention intends to use, as the UI device 2, the button-type remote control of digital broadcast video devices that is in wide use today.

The cursor buttons 49 are normally used for various editing operations, and also for text entry such as video content titles. As explained earlier, however, text entry with these buttons is extremely inefficient and impractical, so the present invention enters text with the following three kinds of buttons instead of the cursor buttons 49.

In normal operation of a video device, the channel buttons 43 are the twelve channel-switching buttons numbered 1 to 12. Here they are used to select a term within each group of the editing term dictionary, whose groups hold at most 12 terms as described earlier.
The color buttons 44 are four buttons (blue, red, green, and yellow) for executing various functions, and the chapter button 45 assigns a chapter mark.
During normal viewing of video content these buttons keep their usual roles such as channel selection; when the device is put into the character input/edit mode described later, they are repurposed as buttons for character input and editing.

By operating these three kinds of buttons, 17 in all and 20 at most, the viewer attaches the desired terms or symbols as character information to any scene of the video content while watching it.
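The way the same 17 buttons change role between normal viewing and the edit mode can be sketched as follows. The button names and the action labels returned by `interpret` are hypothetical; only the button counts and the idea of repurposing come from the text.

```python
CHANNEL_BUTTONS = [str(n) for n in range(1, 13)]    # 12 channel buttons
COLOR_BUTTONS = ["blue", "red", "green", "yellow"]  # 4 color buttons
CHAPTER_BUTTON = "chapter"                          # 1 chapter button

EDIT_BUTTONS = CHANNEL_BUTTONS + COLOR_BUTTONS + [CHAPTER_BUTTON]


def interpret(button, edit_mode):
    """Map a button press to an action, depending on the current mode."""
    if button not in EDIT_BUTTONS:
        return ("ignored", button)
    if not edit_mode:
        # Normal viewing: the buttons keep their usual meaning.
        if button in CHANNEL_BUTTONS:
            return ("change_channel", int(button))
        if button in COLOR_BUTTONS:
            return ("device_function", button)
        return ("chapter_mark", None)
    # Character input/edit mode: the same buttons drive the dictionary.
    if button in CHANNEL_BUTTONS:
        return ("select_number", int(button))
    if button in COLOR_BUTTONS:
        return ("edit_function", button)
    return ("mark_scene_position", None)
```

For instance, `interpret("2", False)` yields a channel change while `interpret("2", True)` yields a selection number, so no extra hardware buttons are needed for annotation.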

The remote control transmission signal 5 generated by these button operations is received by the remote control signal receiving unit 21 of the video device 1 shown in FIG. 1 and drives the GUI control unit 12 and the dictionary term selection unit 20.

Similarly, when the microphone 4 is used as the UI device 2, the user speaks words corresponding to the remote control buttons: "koko" (here) for the chapter button; "ichi", "ni", "san", "shi" or "yon", "go", "roku", "nana" or "shichi", "hachi", "kyuu" or "ku", "juu", "juuichi", and "juuni" for 1 to 12; and "ao", "aka", "midori", and "ki" or "kiiro" for the blue, red, green, and yellow color buttons. From these 30 or fewer utterances, the speech recognition unit 22 of the video device 1 recognizes each word as signal information in the same way as a remote control signal and performs the same operations.

With a remote control, the user must check the position of each button before pressing it. Where the viewing environment permits, speech recognition removes the need to look at the remote control each time, so the user can speak while watching the display and enter text quickly.
Speech recognition restricted to such a small set of words, used as signal information, can achieve a high recognition rate even with a greatly simplified configuration.
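The limited vocabulary can be sketched as a lookup table from recognized words to button signals. The romanized spellings, the `VOCABULARY` table, and `to_button_signals` are assumptions for illustration; the acoustic recognition step itself is outside the scope of this sketch.

```python
# Romanized utterances mapped to the button each one stands for.
VOCABULARY = {
    "koko": "chapter",
    "ichi": "1", "ni": "2", "san": "3", "shi": "4", "yon": "4",
    "go": "5", "roku": "6", "nana": "7", "shichi": "7", "hachi": "8",
    "kyuu": "9", "ku": "9", "juu": "10", "juuichi": "11", "juuni": "12",
    "ao": "blue", "aka": "red", "midori": "green", "ki": "yellow", "kiiro": "yellow",
}


def to_button_signals(recognized_words):
    """Convert recognized words to button signals, dropping unknown words."""
    signals = []
    for word in recognized_words:
        button = VOCABULARY.get(word.lower())
        if button is not None:
            signals.append(button)
    return signals
```

With alternative readings included, this table holds 21 entries, which stays within the limit of 30 utterances stated above; anything outside the table is simply ignored.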

As described above, the present invention supports both remote control operation and voice operation, depending on the viewing environment.

FIG. 13 shows an example of the operation flow.
It shows how the UI device 2 operates the video device 1 (labeled "device main body" in the figure) to create data such as the desired character information, using the system configuration and dictionary structure described above.

As shown in STEP 1, the genre of the video content is selected before viewing; for broadcast programs, as opposed to self-made videos, this can be done automatically from the program's EPG data.
Once the video content has started and a scene appears to which the viewer wants to attach character information, a scene position designation signal for that scene is sent to the device main body, as shown in STEP 2.
As shown in STEP 3, the GUI display unit 13 of the device main body then displays editing information, such as character information from the dictionary, on the viewing screen.
As shown in STEP 4, the viewer uses the UI device 2 to select the desired character information level by level until the character information for that scene is complete.
As shown in STEP 5, annotation information data 37 containing the character information is then created.
As shown in STEP 6, the screen display is restored. This is repeated until viewing of the video content ends.
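The six steps can be sketched as a simple loop over the scenes a viewer marks. This is a minimal sketch under stated assumptions: `run_session`, the record layout, and the log strings are hypothetical, and real input would arrive as button or voice events rather than a prepared list.

```python
def run_session(genre, marked_scenes):
    """Walk through STEP 1 to STEP 6 for each marked scene.

    marked_scenes is a list of (time_sec, terms) pairs, where terms are the
    labels the viewer picked level by level.
    """
    log = [f"STEP1: genre selected ({genre})"]
    annotations = []
    for time_sec, terms in marked_scenes:
        log.append(f"STEP2: scene position sent ({time_sec}s)")
        log.append("STEP3: dictionary shown on viewing screen")
        log.append("STEP4: terms selected level by level: " + " > ".join(terms))
        annotations.append({"genre": genre, "time_sec": time_sec, "terms": tuple(terms)})
        log.append("STEP5: annotation information data created")
        log.append("STEP6: screen display restored")
    return annotations, log


annotations, log = run_session(
    "baseball",
    [
        (754, ["positive impression, scene type", "amazing/wonderful", "batting", "grand slam"]),
        (1310, ["negative impression, scene type", "boring/dull", "CM"]),
    ],
)
```

STEP 1 runs once per session, while STEP 2 through STEP 6 repeat for every marked scene until viewing ends.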

Because this can be done with such simple operations, online, real-time input and editing of character information becomes possible for video content of every genre, which has been difficult until now. The operation of the system is described in detail below.

FIG. 14 shows an example of genre selection.
For ordinary digital broadcast programs, EPG data for the program is delivered over the broadcast signal, so the video device 1 can select the genre automatically from this information. The embodiment shown here, however, is the case where the genre is specified with the UI device 2 (self-made videos have no EPG data, so their genre is selected this way).

Putting the video device 1 into an editing mode from its main menu, and having the GUI control unit 12 set the character input/edit mode described below, causes the video device 1 to accept signals from the remote control and the microphone as character input and editing signals.

This embodiment selects the genre of the video content shown in FIG. 2, described earlier, using the remote control 3 or the microphone 4 of the UI device 2.
On the left side of the figure, the main genres 101 are displayed as selection items 47 with selection numbers 46 from 1 to 12.
The function contents 48 corresponding to the color buttons 44 are also displayed.
Initially the cursor 50 is not displayed. With remote control operation 51, pressing the 2 button displays the cursor 50 on the Sports line, which has selection number 46 of 2. After checking this, the user presses the green button to proceed. If the wrong genre was pressed, pressing the red button cancels it, the cursor disappears, and the user can select again.
With voice operation 52, saying "ni" (two) into the microphone displays the cursor; after checking it, saying "midori" (green) selects the main genre.

When the main genre has been selected, the subgenres 101 are displayed as shown on the right side of the figure. To select baseball with remote control operation 51, pressing the 1 button displays the cursor 50 on the Baseball line, which has selection number 46 of 1, and pressing the blue button completes the selection of the genre 101.
With voice operation 52, saying "ichi" and then "ao" completes the selection of the subgenre 101.
A mistaken press or a change of genre can be corrected using the red and green buttons.

This completes genre selection for the video content. The GUI control unit 12 enters the character input/edit mode, in which the viewer can attach whatever character information is needed to any scene from the start to the end of the broadcast program.

FIG. 15 shows an example of the term selection function operations.
In this embodiment, operations for character input and editing are carried out by pressing 17 buttons on the remote control, namely the chapter button, the channel buttons, and the color buttons, or by speech recognition. The chapter button designates the scene position, the twelve channel buttons 43 choose the selection number 46 of a selection item 47, and the color buttons 44 perform the remaining operations.
In this example, the blue button of the color buttons 44 corresponds to Select, the green button to Back, the yellow button to Next, and the red button to Cancel. Normal operation uses only the blue button; operations outside the normal flow, such as making changes or jumping to the next level, are outlined in the figure.

FIG. 16 shows an example of scene position designation.
It illustrates attaching character information to a grand slam scene while viewing the sports/baseball video content whose genre was selected above.

When the viewer (user) feels that a scene should be edited, with remote control operation 51 using the remote control 3 they press the chapter button 45 to designate the time position of the scene.

With voice operation 52, saying "koko" (here) into the microphone 4 designates the time position of the scene.
"Koko" is shown as the utterance for time position designation, but another utterance such as "chapter" may be used instead; it suffices to define the function that corresponds to the utterance.
With either remote control or voice operation, the user need not think at this moment about which terms to use for the annotation information; they simply designate the time position according to their feeling or impression. This is a key point of the present invention.
This activates the dictionary function, and in this example the display moves to the first level of the editing term dictionary 19 shown in FIG. 3.

At this point, the time-shift playback (chase playback) function of the video content recording, playback, and display unit 14 of the video device 1 can also automatically pause the video being viewed and display the first level. This way the viewer does not miss, even for a short time, the important scenes that follow a highlight, and can enter and edit text without worry.

FIG. 17 shows an example of the display of the first hierarchy level of the editing term dictionary, reached from FIG. 16.
In this example, items 1 to 4 of the first hierarchy level of the editing term dictionary 19 for a baseball program, described with reference to FIG. 3, are displayed by the GUI display unit 13.
In addition, items 11 and 12 of the first hierarchy level show menu entries so that the specific scene registration and character registration of FIG. 11, described earlier, can be used.

When selecting a term for the time position of the scene designated above, a viewer who is a fan of the batting team will naturally choose selection number 46 "1", "positive impression (scene)", as the impression category 104.
For remote control operation 51, pressing the "1" button displays the cursor 50; if the choice is correct, pressing the "blue" button confirms it, and the annotation information creation unit 25 stores "positive impression (scene)" in the first hierarchy level of the annotation information data 37 in the annotation information database 34.
Likewise, for voice operation 52, the utterances are "ichi" (one) and "ao" (blue).

If the viewer favors a particular player, the viewer can instead select and register from the dictionary of FIG. 3 along the path "positive impression (person)", "favorite / fan of", "player", "infielder".
In the opposite case, "negative impression (scene)" is naturally selected as the impression category 104.
Of course, a viewer with a neutral standpoint can also register an excellent play by either team under a positive impression category 104.
The above is the first hierarchy level display screen; the display then moves to the second hierarchy level.

FIG. 18 shows an example of the display of the second hierarchy level of the editing term dictionary, reached from FIG. 17.
It displays the impression terms 105 of the second hierarchy level when "positive impression (scene)" was selected as the impression category 104 in the first hierarchy level.
To choose "awesome / wonderful", with selection number 46 "4", as the most suitable impression term 105 among these, the remote control operation 51 is "3", "blue", and the voice operation 52 is the utterances "san" (three), "ao" (blue).
The above is the second hierarchy level selection screen. The selected impression term 105 is stored in the second hierarchy level of the annotation information data 37, and the display then moves to the third hierarchy level.

FIG. 19 shows an example of the display of the third hierarchy level of the editing term dictionary, reached from FIG. 18.
It displays the genre terms 106 of the third hierarchy level when "awesome / wonderful" was selected as the impression term 105 in the second hierarchy level.
To choose "batting", with selection number 46 "1", as the most suitable genre term 106 among these, the remote control operation 51 is "1", "blue", and the voice operation 52 is the utterances "ichi", "ao".
The above is the third hierarchy level selection screen. The selected genre term 106 is stored in the third hierarchy level of the annotation information data 37, and the display then moves to the fourth hierarchy level.

FIG. 20 shows an example of the display of the fourth hierarchy level of the editing term dictionary, reached from FIG. 19.
It displays the genre terms 106 of the fourth hierarchy level when "batting" was selected as the genre term 106 in the third hierarchy level.
To choose "grand slam home run", with selection number 46 "1", as the most suitable genre term 106 among these, the remote control operation 51 is "1", "blue", and the voice operation 52 is the utterances "ichi", "ao".
The above is the fourth hierarchy level selection screen. The selected genre term 106 is stored in the fourth hierarchy level of the annotation information data 37, and the display then moves to the fifth hierarchy level.

FIG. 21 shows an example of the display of the fifth hierarchy level of the editing term dictionary, reached from FIG. 20.
It displays the fifth hierarchy level when "grand slam home run" was selected as the genre term 106 in the fourth hierarchy level.

As described with reference to FIG. 3, the fifth hierarchy level is where a character or symbol expressing the degree of impression of the scene is selected. In this example, selection number 46 "4", indicating level 4 on a five-level scale, is selected; the remote control operation 51 is "4", "blue", and the voice operation 52 is the utterances "yon" (four), "ao" (blue).
Because the present invention uses adjectival terms expressing impressions as heading terms, editing information based on the degree of impression, and consequently the scenes that matter most for editing, can easily be set as described above.

The above is the fifth hierarchy level display screen. The data entered are stored as the fifth hierarchy level of the annotation information data 37 and the system returns to normal operation; if the video was paused by the time-shift playback (chase playback) means, playback resumes and the normal viewing screen returns.
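The walk through the five hierarchy levels can be sketched roughly as follows. This is only an illustration under stated assumptions: the nested-dictionary layout, the names `EDITING_TERM_DICTIONARY` and `annotate`, and the sample time position are hypothetical and do not appear in the specification, and only the path used in the grand slam example is filled in. The key 3 for "awesome / wonderful" follows the button press given in the text.

```python
# Hypothetical layout: each menu maps a selection number to (term, next-level menu).
# Only the path from the grand slam example is included; other entries are omitted.
EDITING_TERM_DICTIONARY = {
    1: ("positive impression (scene)", {
        3: ("awesome / wonderful", {
            1: ("batting", {
                1: ("grand slam home run", {
                    4: ("degree of impression: 4 of 5", {}),
                }),
            }),
        }),
    }),
}


def annotate(time_position, selections, dictionary=EDITING_TERM_DICTIONARY):
    """Descend one hierarchy level per selection number and collect the chosen terms."""
    terms = []
    menu = dictionary
    for number in selections:
        if number not in menu:
            raise KeyError(f"selection {number} is not on the current menu")
        term, menu = menu[number]
        terms.append(term)
    return {"time_position": time_position, "levels": terms}


# Time position marked first, then one selection number per hierarchy level.
record = annotate("00:42:10", [1, 3, 1, 1, 4])
print(record["levels"])
```

Storing the terms as an ordered list keeps the position in the list equal to the hierarchy level, which is what later per-level searches would rely on.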

While watching a program, the viewer (user) selects an impression category and an adjectival impression term; from there, without having to think up the subsequent genre terms, the viewer selects genre terms as guided by the on-screen display, and can thereby add annotation information, such as character information, best suited to any scene of the video content.
By repeating the operations of FIGS. 16 to 21, the annotation information data 37 is completed in real time.
This is the greatest benefit of making adjectival impression terms the heading terms and associating them with the genre terms, which are noun terms, that follow.
By designating the time position of a scene according to emotion, the five senses, or bodily sensation, and then selecting an appropriate impression term, the viewer is guided by the display to the most suitable genre term.
Corrections of operating mistakes during editing, cancellations, and similar operations can be performed freely using the term selection function operations of FIG. 15 described earlier.

The character information and the like described above can be added with the same simple remote control operation 51 or voice operation 52 not only for live broadcast programs but also when re-viewing recorded video content.
It can also be applied to video content 33 loaded as removable video content, other than the video content recorded on the video device 1.

In this embodiment, both the remote control method and the voice recognition method perform all editing with a total of 17 buttons or utterances. The remote control buttons were assigned so that terms are selected with the channel buttons, taking advantage of the fact that channels are not switched while annotation information is being edited; the other function buttons may be assigned differently. An important point of the present invention is that it can be implemented with a remote control having as few as 20 buttons, including operations such as power, or with voice input limited to a maximum of 30 kinds of utterance used as signal information.

As described above, the speech recognition of the present invention does not read dictionary terms directly, which would make the recognition rate an issue; utterances serve only as signal information for selecting terms in the dictionary.
Accordingly, the speech recognition need not be grammar-based recognition that judges context. Simple acoustic pattern matching suffices, so a high recognition rate can be achieved without increasing the load on the system.
Because the vocabulary is limited to 30 utterances or fewer, registering a specific speaker is also easy if needed.
For still higher accuracy, various forms suited to the usage environment are possible, such as using a headset-type microphone, attaching a speech switch, or using an ear-type microphone that picks up eardrum vibration.
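The idea that utterances act only as selection signals can be illustrated with a small lookup table. The sketch below assumes the acoustic matching step has already produced a word label; the table `UTTERANCE_TO_SIGNAL`, the signal names, and the function name are hypothetical, and the vocabulary lists only the utterances mentioned in this description.

```python
# Hypothetical table: romanized utterances mapped to the signals a remote control would send.
UTTERANCE_TO_SIGNAL = {
    "koko": "CHAPTER",  # designate the time position of the scene
    "ichi": "1",
    "san": "3",
    "yon": "4",
    "ao": "BLUE",       # confirm the item under the cursor
}


def utterances_to_signals(utterances):
    """Translate recognized word labels into selection signals, skipping unknown words."""
    signals = []
    for word in utterances:
        signal = UTTERANCE_TO_SIGNAL.get(word.lower())
        if signal is None:
            continue  # out-of-vocabulary sounds are ignored rather than guessed
        signals.append(signal)
    return signals


print(utterances_to_signals(["koko", "ichi", "ao"]))
```

Because the recognizer only has to tell a few dozen fixed words apart, anything outside the table can simply be dropped, which is one way the small vocabulary keeps misrecognition low.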

As described above, in the present invention the genre-specific genre terms 106 associated with the impression terms 105, which are based on the impression of the scene, can be selected in a menu format on the GUI display unit 13, with the viewer guided by the impression term. For all genres of video content, even someone who is not an expert such as an announcer can therefore edit in real time, intuitively, with simple operations and no special training, without having to think up terms or hesitate over choices. In this embodiment, designating the time position of the scene and entering a total of 29 characters and symbols from the first through fifth hierarchy levels takes 11 button operations in all; if button operation time plus system operation time averages 1.0 second, information entry can be completed in as little as 11.0 seconds.
For voice input, if utterance time plus system operation time averages 0.5 second, entry can be completed in as little as 5.5 seconds.
This embodiment emphasizes reliability, so after designating the selection number 46 the viewer must also operate "select". If editing speed is the priority, the system can be configured so that designating the selection number 46 moves directly to the next lower hierarchy level, cutting the above times almost in half.
As described above, a major feature of the present invention, which uses a dictionary with impression terms as heading terms, is that text-centered annotation information using the editing terms best suited to each scene can be added in real time for all genres of video content, through simple, intuitive operation requiring no special training.
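The timing figures can be checked with a few lines of arithmetic. The breakdown used here (one operation to mark the time position, then a number plus a confirmation at each of the five hierarchy levels) is inferred from the walkthrough of FIGS. 16 to 21 rather than stated as a formula in the text, and the function name is hypothetical.

```python
LEVELS = 5  # hierarchy levels in the editing term dictionary


def entry_time(seconds_per_operation, confirm_each_selection=True):
    """Return (number of operations, total seconds) for annotating one scene."""
    per_level = 2 if confirm_each_selection else 1  # number, plus optional confirmation
    operations = 1 + LEVELS * per_level             # the leading 1 marks the time position
    return operations, operations * seconds_per_operation


print(entry_time(1.0))         # remote control with confirmation
print(entry_time(0.5))         # voice with confirmation
print(entry_time(1.0, False))  # remote control, moving down a level on number alone
```

Dropping the confirmation step leaves 6 of the 11 operations, which matches the statement that the time is cut almost in half.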

For video content being viewed for the first time, it is also possible, for example, to add in real time only the position of a highlight scene and the terms up to the impression term 105 of the second hierarchy level, and to register the subsequent genre terms in detail at the next playback or when editing.
When marks that specify only a time position, like ordinary chapter marks, are used heavily, it is difficult to tell later what each mark was intended to indicate.
Attaching at least the impression term 105 makes the intent of the scene understandable and allows subsequent editing to be fast and efficient.
In such a case, the system can be set to return automatically to the normal viewing screen once the impression term 105 of the second hierarchy level has been selected.

FIG. 22 shows examples of five-sense impression terms.
The positive and negative impression terms 105 described so far, based on emotions such as joy, anger, sorrow, and pleasure and on preferences such as likes and dislikes, can be used in common across all genres of video content. FIG. 22 additionally collects adjectival impression terms 105 relating to the five senses of sight, hearing, taste, smell, and touch; these are assigned and registered in impression categories 104 numbered 5 to 10 in the first hierarchy level.
As is natural for video content, visual impressions are the most numerous, and they are divided into three categories: shape impressions, spatial impressions, and light/dark and color-tone impressions.
For auditory impressions, terms are registered so that music programs can also be handled.
Smell is assigned together with taste, and terms are registered so that gourmet and cooking programs can be handled.
For tactile impressions, terms are registered for impressions felt as if the viewer were the main character, as in action movies.

Adding together the positive and negative impression terms for emotions and preferences, the five-sense impression terms described above, and, where necessary, bodily-sensation impression terms not covered by the five senses, such as sleepy, tired, or drunk, fully covers the impressions of a scene as human reactions to the stimuli of its images and sound.
By fitting terms related to each genre of video content to these five-sense and bodily-sensation impression terms 105, and by appropriately registering the related genre terms 106, suitable heading terms become selectable for editing any kind of scene.
However, impression terms for taste and smell, for example, are hardly ever used in editing baseball programs, while they are important for cooking programs. Impression terms for the five senses and bodily sensations therefore need not be provided for every genre of video content; it is sufficient to make available, genre by genre, the ones needed for that genre, such as sports programs, cooking programs, music programs, travel programs, dramas, and action movies.
Also, because impression terms 105 for the five senses and bodily sensations already carry the meaning of what was seen, heard, tasted, smelled, touched, or physically felt in the scene, genre terms 106 need not necessarily be assigned in the third and fourth hierarchy levels; only the genre terms 106 that are needed have to be associated.

FIG. 23 shows an example of annotation information data.
It is an example of the annotation information data 37 to which annotation information has been added in the examples so far. The title section of the video content records the genre, program name, broadcast station name, broadcast start time, and end time. Below the title, the name of the editor, that is, the person who added the annotation information, is registered as individual information 108 in the related information 36. Multiple different users can also each edit character information and the like for the same title.

Below this information are the times designated between the start and end of the broadcast, each with the annotation information (characters, symbols, and so on) selected for each hierarchy level: the impression category 104, impression term 105, and genre terms 106, selected level by level.
Simply looking through this list gives a clear picture of the approximate time position of each scene and its content.

Entry 3 is a commercial (CM) for which "boring / dull" was selected. Unneeded scenes, for example when editing a self-made video, can likewise be picked out by selecting "boring / dull".
As described earlier, such scenes can also be selected and registered from the specific scenes shown in FIGS. 11 and 17.

FIG. 24 shows an example of post-viewing editing.
As described so far, annotation information is added as a result of having watched the video content, so strictly speaking the timing of the scene position designation lags behind the scene.
The time position should therefore be reset to a scene preceding the designated one; the scene position designation can also be corrected automatically to the preceding scene.

In this case, the time position can be finely adjusted automatically, or designated as a range, based on the genre of the video content and the selected term. Examples include a CM scene, whose time position can be designated almost as soon as it starts; a scene that takes several seconds to recognize, such as the interval from the pitcher's throw to the grand slam home run; and a highlight scene in a movie or drama, where the lead-in to the highlight begins about one minute earlier.

To further improve the editing result, the edit points (cut points) of the adjacent video can be detected automatically and used as the time position.
Automatic detection of such edit points (cut points) is described in various publications.
It is also possible to review the video content as a whole and make corrections, for example checking whether the annotation terms entered are appropriate and whether the degree of impression is suitable.
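One way to picture the automatic time-position adjustment is to shift the marked time back by a per-term lead-in and then snap to the nearest earlier cut point. The sketch below is an assumption-laden illustration: the table `LEAD_IN_SECONDS`, its values (10 seconds stands in for "several seconds", 60 for "about one minute"), the key names, and the function are all hypothetical, and the cut points are taken as an already detected, sorted list of seconds.

```python
from bisect import bisect_right

# Hypothetical lead-in offsets in seconds, keyed by the selected term.
LEAD_IN_SECONDS = {
    "CM": 0.0,                    # recognizable almost as soon as it starts
    "grand slam home run": 10.0,  # assumed value for "several seconds"
    "drama highlight": 60.0,      # lead-in begins about one minute earlier
}


def adjusted_start(marked_time, term, cut_points):
    """Shift the marked time back by the lead-in, then snap to the last cut at or before it.

    cut_points must be sorted in ascending order.
    """
    target = max(0.0, marked_time - LEAD_IN_SECONDS.get(term, 0.0))
    index = bisect_right(cut_points, target)
    return cut_points[index - 1] if index > 0 else target


cuts = [0.0, 98.5, 112.0, 120.0]
print(adjusted_start(125.0, "grand slam home run", cuts))
```

If no cut point lies at or before the shifted time, the sketch falls back to the shifted time itself rather than jumping forward to a later cut.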

Performing secondary editing after viewing in this way yields highly accurate, high-quality annotation information data 37.
The ultimate aim of the present invention is to use the time information 35 of this annotation information data 37 not only for cutting, joining, and editing, but also for freely performing edits such as random access to the video content and playlist creation, thereby broadening the ways video content can be used. One example is shown below.

FIG. 25 shows an example of a database search.
The most important feature of the present invention is that impression terms serve as heading terms under which genre terms specific to the genre of the video content are selected and registered. The terms available for selection are therefore limited and appropriate terms can be chosen, so the search system is spared burdens such as fuzzy search.

The resulting annotation information data 37 reflects the editor's intent and can be personal data tailored to each individual user.
Accordingly, when the database contains annotation information data 37 from multiple editors, selecting the individual information 108, that is, the name of the person who added the annotation information, enables a search based on the impressions of the selected editor, so that scenes of video content matching the intent of the search can be found.

The annotation information database 34, organized by video content title, can also be processed into various forms of data. Searches can be run on any condition, such as the genre of the video content, the category at each hierarchy level, terms, and symbols, to retrieve the detailed content of the video content.

FIG. 25 outlines the case where search information is entered separately for the editor, the genre of the video content, and each of the first through fifth hierarchy levels, and the scenes 107 that match the search conditions are found among the scenes 107 to which character information and the like has been added in multiple items of video content.
It is also possible to search across all fields at once for an editing term, rather than field by field.
Search results like these can be used for playlists of various kinds of highlight scenes.
Based on various search results, such as highlight digests by impression term across the whole video content, or further by degree of impression, the range of uses of the video content is expanded.
It goes without saying that this is also effective for editing unneeded scenes and scenes the viewer does not want to see, not only highlight scenes.
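A search over annotation records by editor, genre, and hierarchy level might look like the following sketch. Everything here is illustrative: the record layout, field names, sample data, and the `search` function are assumptions, not the format of the annotation information database 34. Exact matching is used throughout, since the terms all come from a fixed dictionary.

```python
# Hypothetical annotation records; "levels" holds the terms chosen at hierarchy levels 1 to 5.
records = [
    {"editor": "A", "genre": "sports/baseball", "title": "Game 1", "time": "00:42:10",
     "levels": ["positive impression (scene)", "awesome / wonderful",
                "batting", "grand slam home run", "4"]},
    {"editor": "A", "genre": "sports/baseball", "title": "Game 1", "time": "00:55:00",
     "levels": ["negative impression (scene)", "boring / dull"]},
    {"editor": "B", "genre": "sports/baseball", "title": "Game 1", "time": "00:42:15",
     "levels": ["positive impression (scene)", "awesome / wonderful",
                "batting", "grand slam home run", "5"]},
]


def search(records, editor=None, genre=None, levels=None, any_term=None):
    """Return records matching every given condition; levels maps level number to term."""
    levels = levels or {}
    hits = []
    for r in records:
        if editor is not None and r["editor"] != editor:
            continue
        if genre is not None and r["genre"] != genre:
            continue
        # A record annotated only down to level 2 cannot match a level 3 to 5 condition.
        if any(len(r["levels"]) < n or r["levels"][n - 1] != term
               for n, term in levels.items()):
            continue
        if any_term is not None and any_term not in r["levels"]:
            continue
        hits.append(r)
    return hits


# Per-field search, then a search for one term across all levels at once.
by_editor = search(records, editor="A", levels={2: "awesome / wonderful"})
all_fields = search(records, any_term="grand slam home run")
playlist = [(r["title"], r["time"]) for r in all_fields]
print(playlist)
```

The list of title and time pairs at the end is the kind of result that could feed a highlight playlist.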

As development approach 1 of the present invention, as shown in FIG. 1, the editing term dictionary 19 could be updated with the latest data over an Internet connection, and data for individual programs could be distributed or downloaded before the program starts.

As development approach 2 of the present invention, if the hierarchical structure of the EPG data of broadcast programs could be structured to link directly with the editing term dictionary 19, the device configuration would be simplified and the range of use would expand further.

To summarize the description so far, the benefits for the system developer are as follows.
In the dictionary structure, adjectival terms expressing impressions (impression terms), which can be used in common across all genres of video content for a variety of editing scenes including highlight scenes, serve as heading terms, and the genre terms related to these heading terms can be registered per genre of video content. The dictionary terms are therefore limited within each genre, the burden of building the dictionary is small, and genre terms can be freely added or deleted according to how much they are used.

Because the genres of EPG data can be used as the genres of video content, the dictionary structure is easy to standardize, and EPG data also makes it possible to hold terms per program. Because dictionary data can be downloaded over a communication line such as the Internet, it can be freely updated to the latest version.

Because the terms used in the dictionary are limited, searching the completed annotation information data requires no fuzzy search or the like, and accurate, efficient searches can be performed without burdening the device.

Because the button functions of button-type remote controls widely available on the market can be used as they are, hardware development is easy.

For voice operation as well, annotation information data can be created by recognizing at most 30 utterances as signal information, so speech recognition also places little load on the device.

Meanwhile, the benefits for users of the system of the present invention are as follows.
Character information and the like can be entered, with impression-expressing terms as heading terms, using the button functions of a familiar button-type remote control that does not interfere with the viewing environment. Operation therefore feels natural, and anyone can easily use the system in real time, keeping up with the broadcast, for all genres of video content.

Depending on the viewing environment, a microphone can be connected and character information and the like entered by speech recognition. Because this is done with at most about 30 utterances, no special training is needed and misrecognition is rare.

Because appropriate editing terms are presented as the viewer operates based on the impression felt while viewing, anyone can create suitable annotation information, and the data can be private annotation information data reflecting each individual's own perspective. Because the degree of impression can also be registered easily, searches can be tailored to the type and degree of each individual's impressions.

Applications become possible in which the annotation information added to scenes during viewing is searched, the matching scenes are accessed directly at random, and only the favorite scenes from multiple items of video content are played back consecutively as a digest.

Using the time-shift (chase playback) means also prevents scenes from being missed partway through the video content.

By downloading terms such as new information per genre or per individual program via broadcast data or a communication line, annotation information can be created with the latest terms best suited to the video content.

As described above, the present invention is an annotation information adding system that can be realized with devices, components, and assembly techniques now widely available on the market, without special devices, components, or assembly techniques. It can be widely used in home general-purpose recorders, video cameras, and editing devices, as well as professional video equipment.

1 Video device
2 UI (user interface) device
3 Remote control
4 Microphone
5 Remote control transmission signal
6 Microphone audio signal
7 Video signal
8 GUI (graphical user interface) display signal
9 Main display
10 Sub display
11 GUI unit
12 GUI control unit
13 GUI display unit
14 Video content recording, playback, and display unit
15 UI reception signal
16 Dictionary registration unit
17 Dictionary term download unit
18 Dictionary term keyboard input unit
19 Editing term dictionary
20 Dictionary term selection unit
21 Remote control signal reception unit
22 UI device information recognition unit
23 Genre selection unit
24 Voice recognition unit
25 Annotation information creation unit
26 Internet communication signal
27 Keyboard signal
28 Dictionary data
29 Selected term
30 Annotation information data search unit
31 Video content storage unit (or loaded video content)
32 Title
33 Video content
34 Annotation information database
35 Time information
36 Related information
37 Annotation information data
40 Display selection switch
41 Antenna input
42 External video input
43 Channel button
44 Color button
45 Chapter button
46 Selection number
47 Selection item
48 Function content
49 Cursor button
50 Cursor
51 Remote control operation
52 Voice operation
101 Genre
102 Genre category
103 Hierarchy
104 Impression category
105 Impression term
106 Genre term
107 Scene with added character information and the like
108 Individual information

Claims

1. A system for adding annotation information to an arbitrary scene of video content, including self-made video, the system comprising a video device and a user interface device, wherein
the video device comprises
an editing term dictionary in which heading terms, which are impression terms expressing the impression of viewing a scene and are common to scenes of all genres of video content, and genre terms, which are terms specific to a genre of video content, are associated with one another by genre and hierarchy level of the video content,
the user interface device comprises
means for designating, sequentially from the start of viewing the video content, a scene position to which annotation information is to be added, sequentially selecting the heading term and the genre term in the editing term dictionary for the designated scene, and transmitting signal information of the designation and selection to the video device,
and the video device further comprises
an annotation information creation unit that creates annotation information data from the editing term dictionary on the basis of the signal information received from the user interface device.
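
As an illustration only, the relationship in claim 1 between the editing term dictionary and the annotation information creation unit can be sketched in Python as follows. The class names, field names, index-based selection, and sample terms are all hypothetical; the claim does not prescribe any data layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EditingTermDictionary:
    # Heading (impression) terms shared by scenes of every genre.
    heading_terms: List[str]
    # Assumed layout: genre -> heading term -> genre terms one level below it.
    genre_terms: Dict[str, Dict[str, List[str]]]

    def terms_for(self, genre: str, heading: str) -> List[str]:
        """Return the genre terms associated with a heading term, if any."""
        return self.genre_terms.get(genre, {}).get(heading, [])


@dataclass
class AnnotationData:
    title: str
    time_sec: float              # designated scene position
    heading_term: str
    genre_term: Optional[str] = None


def create_annotation(dictionary, title, genre, time_sec,
                      heading_index, genre_index=None):
    """Turn received selection indices into annotation information data."""
    heading = dictionary.heading_terms[heading_index]
    genre_term = None
    if genre_index is not None:
        genre_term = dictionary.terms_for(genre, heading)[genre_index]
    return AnnotationData(title, time_sec, heading, genre_term)
```

In this sketch the user interface device only has to send a scene position and one or two small indices; the video device resolves them against its own dictionary.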

2. The video content annotation information adding system according to claim 1, wherein the heading terms and the genre terms in the editing term dictionary are organized into groups of at most 12 terms each.

3. The video content annotation information adding system according to claim 1, wherein information on the degree of impression of a scene is registered in the editing term dictionary, and a selected degree of impression is used as annotation information.

4. The video content annotation information adding system according to claim 1, wherein
the video device includes a remote control signal reception unit,
the user interface device is a remote control having at least 20 operation buttons, and the signal information of the designation and selection is a remote control transmission signal,
the scene position is designated, and the heading term and the genre term in the editing term dictionary are selected, by operating the buttons of the remote control, and
the video device receives the signal information at the remote control signal reception unit, and the annotation information creation unit creates the annotation information data based on the terms in the editing term dictionary.
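
A minimal sketch of how a stream of remote control button presses might be interpreted under claim 4 is given below. The button assignment is an assumption made for illustration: a "CHAPTER" press designates the scene position, and numbered buttons then select first a heading term and next a genre term. The function name and the sample terms are likewise hypothetical.

```python
def process_buttons(events, heading_terms, genre_terms_by_heading):
    """Convert (time_sec, button) events into annotation records.

    Assumed protocol: "CHAPTER" marks a scene position; the next valid
    number selects a heading term, and the one after that selects a genre
    term listed under that heading.
    """
    annotations = []
    current = None
    for time_sec, button in events:
        if button == "CHAPTER":
            current = {"time_sec": time_sec, "heading": None, "genre_term": None}
            annotations.append(current)
        elif button.isdigit() and current is not None:
            index = int(button) - 1  # buttons are numbered from 1
            if current["heading"] is None:
                if 0 <= index < len(heading_terms):
                    current["heading"] = heading_terms[index]
            elif current["genre_term"] is None:
                options = genre_terms_by_heading.get(current["heading"], [])
                if 0 <= index < len(options):
                    current["genre_term"] = options[index]
    return annotations
```

Out-of-range numbers are simply ignored here; an actual device would presumably give feedback through the GUI.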

5. The video content annotation information adding system according to claim 1, wherein
the video device includes a voice recognition unit,
the user interface device is a microphone, and the signal information of the designation and selection is a microphone audio signal,
the scene position is designated, and the heading term and the genre term in the editing term dictionary are selected, by speaking at most 30 kinds of utterances into the microphone, and
the video device recognizes the microphone audio signal, as the signal information, at the voice recognition unit, and the annotation information creation unit creates the annotation information data based on the terms in the editing term dictionary.

6. The video content annotation information adding system according to claim 1, wherein the video device comprises genre selection means for automatically selecting the genre of the video content from the genre given in an EPG (Electronic Program Guide).

7. The video content annotation information adding system according to claim 1, wherein the video device includes time-shift playback (chasing playback) means and pauses playback while the annotation information is being edited.

8. The video content annotation information adding system according to claim 1, wherein the name of the person who added the annotation information is registered in the annotation information.

9. The video content annotation information adding system according to claim 1, wherein the video device further comprises a dictionary download unit for downloading the editing term dictionary over a communication line, and a keyboard input unit for registering terms from an external keyboard.

10. The video content annotation information adding system according to claim 1, wherein the video device uses an editing term dictionary specific to each individual program, based on either EPG data or data downloaded from the Internet.

11. A method for adding annotation information to video content, characterized in that annotation information data is created by selecting terms from an editing term dictionary in which
heading terms common to all genres of video content and
genre terms, which are terms specific to a genre of video content,
are associated with one another by genre and hierarchy level of the video content.

12. The method for adding annotation information to video content according to claim 11, wherein the heading terms common to all genres are terms expressing the impression of viewing a scene.