JP4341503B2

JP4341503B2 - Information signal processing method, information signal processing apparatus, and program recording medium

Info

Publication number: JP4341503B2
Application number: JP2004233944A
Authority: JP
Inventors: 昇村林; 裕成岡本; 勝宮本
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2004-08-10
Filing date: 2004-08-10
Publication date: 2009-10-07
Anticipated expiration: 2024-08-10
Also published as: JP2006054619A

Description

The present invention relates to an information signal processing method, an information signal processing apparatus, and a program recording medium for performing special playback operations, such as predetermined digest playback, in a recording/playback apparatus that applies predetermined band compression such as MPEG to video/audio information signals (for example, the video and audio signals of a broadcast program) and records them on, and plays them back from, a predetermined recording medium such as a magneto-optical disk, a hard disk drive (HDD), or a semiconductor memory.

In a conventional VTR or disc recording/playback apparatus, when content recorded over a long time is played back in a short time to grasp what it contains, playback is performed at roughly 1.5 to 2 times normal speed, in consideration of the speed at which audio information can be understood.

When playback in an even shorter time, that is, summary playback (digest playback), is attempted, the content of high-speed audio output is hard to understand, so it has been common to mute the audio and play back only the image output.

It has therefore been proposed to extract predetermined feature data from the video/audio data of the broadcast program being recorded, use that feature data to detect key frame sections that are considered to contain key frames (important frames), and select and play back predetermined key frame sections in sequence according to a rule determined in advance, thereby performing digest playback within a predetermined time shorter than the recording time of the originally recorded broadcast program.

It has also been proposed to perform so-called chapter data generation, in which position information data indicating playback positions in a recorded section is either generated automatically at fixed time intervals (for example, every 3, 5, or 10 minutes) or generated manually by the user at desired positions, and to use that position information data (chapter data) for skip playback, editing operations, and thumbnail image display.

JP 2002-344872 A

Multiple kinds of such feature data can be detected for each of the image signal and the audio signal. It is conceivable, for example, to extract each kind of feature data while the video/audio signal is being recorded and to record the extracted data on the recording medium together with the recorded video/audio data.

The recorded feature data is then read out, and signal processing that determines the digest playback sections is performed by predetermined rule processing. However, if each of the multiple kinds of feature data is recorded on the recording medium as a separate file, the number of files grows and file handling during signal processing becomes cumbersome and inefficient. No effective technique for solving these problems has existed so far.

In view of the conventional situation described above, an object of the present invention is to provide an information signal processing method, an information signal processing apparatus, and a program recording medium that process feature extraction data efficiently, so that effective digest playback and the various operations that use chapter data can be performed efficiently.

Further objects of the present invention and the specific advantages it provides will become clearer from the description of the embodiments below.

The present invention provides an information signal processing method for recording or playing back a predetermined video/audio information signal on a predetermined recording medium using predetermined band compression signal processing, characterized in that, when a video/audio information signal is recorded: for a video/audio information signal of an analog input system, a predetermined characteristic signal in the image or audio signal is extracted automatically at the time of recording; and for a video/audio information signal of a digital input system, either characteristic data extraction processing that extracts a predetermined characteristic signal in the image or audio signal is performed automatically after recording of the video/audio information signal ends, or a predetermined selection operation selects whether the characteristic data extraction processing is performed automatically after recording ends or manually at a desired time.

An information signal processing apparatus according to the present invention comprises: recording means for recording a predetermined video/audio information signal on a predetermined recording medium using predetermined band compression signal processing; characteristic data extraction means for performing characteristic data extraction processing that extracts predetermined characteristic data for each predetermined section of the video/audio information signal; and information processing means that, when the recording means records a video/audio information signal, (i) for a video/audio information signal of an analog input system, causes the characteristic data extraction means to perform the characteristic data extraction processing automatically while the signal is being recorded, and (ii) for a video/audio information signal of a digital input system, causes the characteristic data extraction means either to perform the characteristic data extraction processing automatically after recording of the signal ends, or to perform it after recording ends automatically or manually at a desired time, as selected through predetermined selection operation means.

The present invention also provides a program recording medium on which a control program for recording or playing back a predetermined video/audio information signal on a predetermined recording medium using predetermined band compression signal processing is recorded so as to be readable and executable by a computer. The control program is characterized in that, when a video/audio information signal is recorded: for a video/audio information signal of an analog input system, a predetermined characteristic signal in the image or audio signal is extracted automatically at the time of recording; and for a video/audio information signal of a digital input system, either characteristic data extraction processing that extracts a predetermined characteristic signal in the image or audio signal is performed automatically after recording ends, or a predetermined selection operation selects whether the characteristic data extraction processing is performed automatically after recording ends or manually at a desired time.

According to the present invention, predetermined feature extraction processing can be performed efficiently in video/audio information signal recording and playback, so that functions such as effective digest playback and chapter data (predetermined recording position information data) setting based on the feature extraction data can be realized efficiently.

In addition, according to the present invention, even after a user has purchased a recording/playback apparatus that does not include such a function, the function can easily be made operable on the purchased apparatus itself if the user later wants it.

Furthermore, a recording/playback apparatus equipped with only basic functions can be sold initially, and the functions each user wants can easily be added later to the purchased apparatus itself according to that user's needs, so users can purchase the apparatus efficiently.

According to the present invention, multiple different types of image feature data and audio feature data can be organized efficiently, either as individual feature data files or as a single feature data file. For example, multiple kinds of feature data, such as camera features, telop features, scene features, and color features of the image, and audio features, can be handled efficiently as a data file in a predetermined format, and can be recorded on a predetermined recording medium together with the video/audio data file so that file management and file handling during signal processing are also efficient.

Also, because a separate file is not created for each kind of feature data, the files occupy less recording capacity on the recording medium than when a file is created for each kind.

Hereinafter, embodiments of the present invention will be described in detail in the following order with reference to the drawings. Needless to say, the present invention is not limited to the following examples and can be modified as desired without departing from the gist of the invention.
(1) Outline of the system to which the present invention is applied
1.1 Summary playback and predetermined time point setting processing using feature extraction data
Here, an outline of the operation processing of the present invention is described.

The signal processing related to the operation outline below is described here and, in more detail, in later sections.

In the description below, playlist (playlist data) generation is described explicitly in some places. Even where it is not, playlist data and predetermined chapter data (predetermined recording position information data; the same applies hereinafter) may be generated together.

Explanatory diagrams of summary playback (digest playback) and chapter processing using feature extraction data are shown in (A) to (G) of FIG. 1.

First, the summary playback operation using feature extraction data is described.
(Digest playback processing using feature extraction data)
Here, assume a video/audio data sequence such as that shown in (A) of FIG. 1.

This video/audio data sequence may be a broadcast program, movie software, or the like, and is assumed to be recorded and played back using predetermined band compression signal processing such as MPEG on a predetermined recording medium such as a hard disk (HDD), a magneto-optical disk, or a large-capacity semiconductor memory.

(B) of FIG. 1 is a conceptual diagram of predetermined sections obtained by assigning predetermined meanings to the video/audio data sequence and dividing it into a predetermined video structure (semantic video structure) according to scene changes, audio segments, and the like.

The meaning setting processing, the section setting processing, the video structure, and so on are described later.

Here, as shown in (C) of FIG. 1, for each of the sections that have been given a meaning and delimited as described above, a predetermined evaluation value is set for that section relative to a predetermined whole range, such as everything recorded within a predetermined time or a predetermined program section.

It is assumed here that the more a section constitutes a key frame (important frame, important video/audio section) within the whole range, the higher the evaluation value (evaluation data) assigned to it.

In other words, if the sections selected by their evaluation data are played back, those sections include the key frame sections, so the outline of the content can be grasped without playing back every section.

(C) of FIG. 1 outlines the evaluation value sections. In the video/audio data sequence shown in (A) of FIG. 1, the sections f1 to f2, f4 to f5, and f7 to f8 are those whose evaluation value is at or above the set threshold Th. As shown in (D) of FIG. 1, predetermined digest playback is performed by skip playback of the sections A1, A2, and A3 in a predetermined summary playback mode.
(Automatic chapter processing using feature extraction data)
(E) of FIG. 1 is a conceptual diagram of chapter point setting. As described above, chapter points can be set at or near the start of a key frame section (important frame, important evaluation value section), and at or near the start of the non-key-frame section that immediately follows the end of that key frame section.
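The thresholding and chapter point placement described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the evaluation values are available as one number per section, and the function names `key_sections` and `chapter_points` and the sample values are hypothetical.

```python
def key_sections(scores, threshold):
    """Return (start, end) index pairs of consecutive runs whose
    evaluation value is at or above the threshold; end is exclusive."""
    sections = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # a key frame section begins
        elif score < threshold and start is not None:
            sections.append((start, i))    # the section ended at i
            start = None
    if start is not None:
        sections.append((start, len(scores)))
    return sections


def chapter_points(sections, total):
    """Place a chapter point at the start of each key frame section and
    at the start of the non-key-frame section that follows it."""
    points = []
    for start, end in sections:
        points.append(start)
        if end < total:                    # a following section exists
            points.append(end)
    return points


# Hypothetical evaluation values, one per section, and a threshold Th.
scores = [0.2, 0.9, 0.8, 0.1, 0.3, 0.7, 0.2, 0.95]
Th = 0.6
digest = key_sections(scores, Th)
chapters = chapter_points(digest, len(scores))
```

Digest playback would then play only the sections in `digest`, while `chapters` gives the positions usable for skip playback or editing.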

Setting breakpoints between predetermined sections, for example with what is called an automatic chapter function in conventional DVD recording/playback apparatus, lets those points serve as guides for editing operations or be used during fast-forward playback (FF playback), fast-reverse playback (rewind playback, REW playback), and the like.

In the prior art, automatic chapter functions are known that place chapter points at equal time intervals, for example every 5, 10, or 15 minutes. With such chapter processing, as shown in (G) of FIG. 1, a chapter point may not fall at the start of what is considered a key frame.

The prior art also offers manual chapter processing, which lets the user set a chapter point at any desired time. However, the user must actually watch the program that has been or is being recorded in order to set the points, which is tedious and inefficient.

In contrast, as described above, the chapter point setting processing of the present invention, which uses feature extraction data, can automatically set chapter points at or near the start of each key frame section and at or near the start of the non-key-frame section that immediately follows it, as shown in (E) of FIG. 1. Chapter points can therefore be set more effectively than with prior-art chapter processing, and editing operations, FF playback, and REW playback that use these chapter points also become more effective.

FIG. 2 is a conceptual diagram of displaying the automatically set chapter points shown in (F) of FIG. 1 as thumbnail images of a predetermined size on a predetermined image monitor.

As shown in (F) of FIG. 1, f1, f4, and f7 are at or near the starts of the key frame sections A1, A2, and A3, respectively, and f3, f6, and f9 are at or near the starts of the non-key-frame sections B1, B2, and B3 that follow A1, A2, and A3, respectively. By viewing a display screen such as that in FIG. 2, the user can, for example, cut out the key frame sections A1, A2, and A3 shown in (D) of FIG. 1 from a broadcast program recorded on the hard disk serving as the recording medium of the recording/playback apparatus and record them on a disc recording medium such as a DVD as an editing operation, or skip playback to the time points f1, f4, and f7.

(G) of FIG. 1 shows an example of predetermined time point setting (chapter points) according to the prior art. The chapter points are set at fixed or nearly fixed intervals, for example every 5 or 10 minutes, but as (C) and (G) of FIG. 1 show, they do not necessarily fall on key frames (important frames).

Thus, by automatically performing predetermined chapter point (predetermined set point or breakpoint) setting or segment processing using the feature extraction data of the present invention, more effective editing operations and skip playback become possible.

1.2 Example of the processing process of the present invention
An example of the processing process of the present invention is shown in FIG. 3.

The example processing process shown in FIG. 3 includes feature extraction processing (2), which extracts image and audio feature data from MPEG video/audio stream data.

For simplicity, the MPEG data (1) is assumed here to be data that is to be recorded, or has already been recorded, on a predetermined recording medium. The present invention can be applied in the same way to video/audio data transmitted over a predetermined transmission system (wired or wireless).

The feature extraction processing (2) can be performed at the same time as recording. When video/audio data has already been recorded on a predetermined recording medium, the data may instead be played back from that medium and subjected to the predetermined feature extraction processing.

Next, rule processing is described.

Rule processing performs predetermined processing using a file or data in which the rules are described in a predetermined format.

The rule file or rule data describes, for example, rules based on feature data according to program genre. A predetermined play unit file is generated by an operation between this rule file or rule data and a PU feature data file (playback-unit feature data) in which the feature data of each predetermined section is described.

For ease of explanation, let Rf(n) be the rule file for a predetermined program genre n, Pu the PU file (feature data file), Df the playlist file, and t the desired summary time. The operation can then be expressed as equation (1) below.

Df = Pu (*) Rf(n) (*) t    (1)
Here, (*) is assumed to be a predetermined operator that uses the data of the given files.

As explained below, the rule file or rule data Rf(n) is written in a predetermined format and consists of data for predetermined parameters such as a predetermined time correction function, descriptions of meanings, and weighting coefficients for the meanings (evaluation values, importance).
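The source leaves the operator (*) in equation (1) unspecified. Purely as an illustration of how a playlist Df might be derived from Pu, Rf(n), and t, the sketch below assumes that Rf(n) reduces to one weight per meaning and that play units are chosen greedily, highest weight first, until the summary time t is used up. The names `PlayUnit` and `make_playlist` and the greedy strategy are assumptions, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class PlayUnit:
    start: float   # start time in seconds
    end: float     # end time in seconds
    meaning: str   # meaning character assigned to this play unit


def make_playlist(pu, rule_weights, t):
    """Hypothetical stand-in for Df = Pu (*) Rf(n) (*) t.

    pu           -- list of PlayUnit (stands in for the PU file Pu)
    rule_weights -- meaning -> weight (stands in for the rule file Rf(n))
    t            -- desired summary time in seconds
    Returns (start, end) pairs in playback order.
    """
    ranked = sorted(pu, key=lambda u: rule_weights.get(u.meaning, 0.0),
                    reverse=True)
    chosen = []
    total = 0.0
    for unit in ranked:
        duration = unit.end - unit.start
        weight = rule_weights.get(unit.meaning, 0.0)
        if weight > 0 and total + duration <= t:
            chosen.append(unit)
            total += duration
    chosen.sort(key=lambda u: u.start)     # restore time order
    return [(u.start, u.end) for u in chosen]


units = [
    PlayUnit(0, 30, "a"),
    PlayUnit(30, 90, "b"),
    PlayUnit(90, 100, "c"),
    PlayUnit(100, 140, "a"),
]
weights = {"a": 1.0, "b": 0.5, "c": 0.0}
playlist = make_playlist(units, weights, t=80)
```

A real rule file would also carry the time correction function mentioned above, which this sketch omits.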

(Playback unit processing)
After the feature extraction processing (2), PU (playback unit, or play unit) processing (3), which is one of the features of the present invention, is performed.

In the PU processing (3), each item of feature data is recorded (stored) on a predetermined recording medium or in a buffer memory, as predetermined data or as a file, in segments called PUs (4).
(Rule 1 processing)
The PU data is given a meaning by predetermined rule 1 processing (5). Details are given later; in outline, the rule 1 processing (5) is as follows.

(Process 1) Extract each feature.
(Process 2) From the combination of features, select the meaning that best satisfies the conditions among the meanings expressed by the first rule.
(Process 3) Adopt the selected meaning as the meaning of the PU.
In the rule 1 processing (5), when parameters or side information such as the program genre, the program genres the user has viewed in the past, time zone, playback count, playback time, and playback date and time are available from the EPG (electronic program guide) or other sources, the predetermined processing may take these parameters into account.
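The three steps of the rule 1 processing can be sketched as below. This is only one possible reading: it assumes each meaning is described by a set of feature conditions and that "best satisfies" means the highest fraction of matched conditions. The function name `assign_meaning`, the feature names, and the scoring are hypothetical.

```python
def assign_meaning(features, rules, default=None):
    """Pick the meaning whose conditions are best satisfied by a PU.

    features -- feature name -> detected value for one PU   (process 1)
    rules    -- meaning -> {feature name: expected value}
    Returns the best-matching meaning, or default if nothing matches.
    """
    best, best_score = default, 0.0
    for meaning, conditions in rules.items():
        if not conditions:
            continue
        matched = sum(1 for name, expected in conditions.items()
                      if features.get(name) == expected)
        score = matched / len(conditions)          # process 2
        if score > best_score:
            best, best_score = meaning, score
    return best                                    # process 3


# Hypothetical rules for two meaning characters.
rules = {
    "a": {"audio": "speech", "camera": "still"},
    "b": {"audio": "music", "camera": "pan"},
}
pu_features = {"audio": "speech", "camera": "still", "telop": True}
meaning = assign_meaning(pu_features, rules)
```

Side information such as the program genre could be used to choose which `rules` table is passed in.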

The time correction function processing performed in connection with this processing is described later.

(Rule 2 processing)
The PUs that have been given a meaning (6) undergo predetermined evaluation value processing in predetermined rule 2 processing (7).

The rule 2 processing (7) performs evaluation value processing for the following two kinds of importance, (Process 1) and (Process 2).

(Process 1) Importance of the meaning
(Process 2) Importance according to the appearance pattern of the meanings
After the predetermined evaluation value processing, the PUs (8) carry a predetermined evaluation value, assigned either to a single PU or to a PU group in which several PUs are concatenated.
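As a rough sketch of the two kinds of importance in the rule 2 processing, the code below assumes a base weight per meaning (process 1) and an additive bonus for every PU that takes part in a listed sequence of consecutive meanings (process 2). The function name `evaluate`, the additive combination, and all numeric values are assumptions; the source does not specify how the two kinds of importance are combined.

```python
def evaluate(meanings, meaning_weight, pattern_bonus):
    """Return one evaluation value per PU.

    meanings       -- meaning characters of consecutive PUs
    meaning_weight -- meaning -> base importance            (process 1)
    pattern_bonus  -- tuple of meanings -> bonus added to every
                      PU that is part of that sequence      (process 2)
    """
    scores = [meaning_weight.get(m, 0.0) for m in meanings]
    for pattern, bonus in pattern_bonus.items():
        n = len(pattern)
        for i in range(len(meanings) - n + 1):
            if tuple(meanings[i:i + n]) == pattern:
                for j in range(i, i + n):
                    scores[j] += bonus
    return scores


# Hypothetical weights and one appearance pattern ("a" followed by "b").
sequence = ["a", "b", "c", "a"]
weights = {"a": 1.0, "b": 0.5, "c": 0.25}
bonuses = {("a", "b"): 0.5}
values = evaluate(sequence, weights, bonuses)
```

The resulting values play the role of the evaluation data that is compared against the threshold Th during digest playback.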

Here, as shown in FIG. 4, the rule 1 processing (5) and the rule 2 processing (7) may be provided with several sets of rule processing data for different program genres (genre A rule data, genre B rule data, genre C rule data, and so on), and the rule switching processing system 900 may switch both the rule 1 processing (5) and the rule 2 processing (7), or only one of them, according to the program genre information data input to the system controller system 20.

As also shown in FIG. 4, several sets of rule processing data may be provided for individual users and switched between.

In this case, in a predetermined operation mode, one of the rule processing data sets for individual 1, individual 2, individual 3, and so on is selected via the system controller system 20 according to a setting made by the user and input to the system controller, and the predetermined rule processing is performed based on the selected rule processing data.

Providing per-individual rule processing data as shown in FIG. 4 makes it possible, for example, to perform predetermined playback operations such as normal or special playback for each individual, store operation information such as the playback state and playback position in predetermined memory means so that it can be reflected in that individual's rule processing, and update the per-individual rule processing data at predetermined timings through predetermined learning processing on that information. This is considered an effective signal processing method for such per-individual learning.

As shown in FIG. 4, when the rule switching processing system 901 switches the per-individual rule processing, it may likewise switch both the rule 1 processing (5) and the rule 2 processing (7), or only one of them.
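The rule switching described above amounts to selecting one set of rule processing data by a key, either the program genre or the individual user. A minimal sketch follows, assuming the rule data is held in plain dictionaries; `select_rule_data`, the keys, the contents, and the fallback behavior are hypothetical and not taken from the patent.

```python
def select_rule_data(rule_sets, key, fallback_key):
    """Return the rule data registered under key, or the fallback
    set when the key is unknown."""
    return rule_sets.get(key, rule_sets[fallback_key])


# Hypothetical per-genre rule data (meaning -> weight).
genre_rules = {
    "news": {"a": 1.0, "b": 0.5},
    "sumo": {"a": 0.8, "b": 1.0},
}
# Hypothetical per-individual rule data.
user_rules = {
    "individual1": {"a": 1.0},
    "individual2": {"b": 1.0},
}

rules_for_program = select_rule_data(genre_rules, "sumo", "news")
rules_for_user = select_rule_data(user_rules, "individual2", "individual1")
```

The same lookup could be applied separately to the rule 1 data and the rule 2 data when only one of them is to be switched.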

(Format of rule processing)
(Rule 1 processing)
A PU that has been given a meaning is described by defining letters and their meanings as below, assuming a particular broadcast program, and associating them with the predetermined image and audio feature data.

The meanings assigned to the characters are chosen to describe scenes expected to be key frames (important frames, important scenes) in that broadcast program, or predetermined recording or playback sections expected to be useful for summary playback, chapter setting, and the like.

In some cases, scenes that the user wants may also be described.

In that case, the apparatus may allow the user to describe the desired rules in a predetermined adjustment mode or the like.

As shown in the upper part of FIG. 5, taking a news program and a sumo program as examples, the definitions for the news program are as in Table 1 below.

In this example, a describes a rule for extracting announcer scenes. Because a single rule is unlikely to extract every expected a scene (every scene in which an announcer appears), the description may be split into several rules.

ｂ，ｃ，ｄ，ｅなど他の場合についても同様に複数の規則に分けて考えることもできる。 Similarly, other cases such as b, c, d, and e can be divided into a plurality of rules.

相撲番組の場合では、次の表２のようになる。 In the case of a sumo program, it is as shown in Table 2 below.

この場合についても、各文字に対して複数の規則を設定することが考えられる。 Also in this case, it is conceivable to set a plurality of rules for each character.

場合に応じて、抽出したいシーン（キーフレーム）の想定される規則を分けて記述を行う。 Depending on the case, the description is made by dividing the assumed rules of the scene (key frame) to be extracted.

また、放送番組では、一義的に意味付けできないシーンも想定できるので、例えば、設定文字として、＠を考え、次の表３のようの設定することも考えられる。 Also, since it is possible to assume scenes that cannot be uniquely defined in a broadcast program, for example, it is possible to consider @ as a setting character and set as shown in Table 3 below.

このように設定した設定文字（意味文字）に対する規則１処理について、ニュース番組の場合を例に具体的に説明する。 The rule 1 process for the set characters (meaning characters) set in this way will be specifically described taking a news program as an example.

図１８に示すように、各所定の特徴データが検出される場合に、上述したニュース番組の場合の定義文字ａ，ｂ，ｃ，ｄ，ｅに対する各シーンが対応すると仮定する。 As shown in FIG. 18, when each predetermined feature data is detected, it is assumed that each scene corresponding to the definition characters a, b, c, d, and e in the case of the news program described above corresponds.

ここで、○の場合は論理積、△の場合は論理和の所定処理と仮定し、例えば、定義文字ａのアナウンサーのシーンでは、音声特徴の属性が話者音声、色特徴の検出領域２又は検出領域３で所定の色が検出され、類似画像情報の頻度１位又は２位が検出され、人物特徴の検出領域１又は検出領域２又は検出領域５で検出され、カメラ特徴は静止の場合と想定できる。 Here, ○ is assumed to denote a predetermined logical product (AND) process and △ a predetermined logical sum (OR) process. For example, in the announcer scene of definition character a, it can be assumed that the audio feature attribute is speaker voice, a predetermined color is detected in detection area 2 or detection area 3 of the color feature, similar image information with the first or second highest frequency is detected, a person feature is detected in detection area 1, detection area 2, or detection area 5, and the camera feature is stationary.

他のｂ，ｃ，ｄ，ｅなども図５の各○、△印に応じて、上記ａの場合と同様に、各所定の特徴データと関係付けて、意味文字と特徴データを関係付けできる。 Other b, c, d, e, and the like can be associated with each predetermined characteristic data and can be associated with the meaning character and the characteristic data in the same manner as in the case of a, according to the marks ◯ and Δ in FIG. .

上述したように各意味文字と各特徴データは、所定の処理すなわち規則１処理、規則２処理を行うために、所定の書式にしたがって記述することが考えられる。 As described above, each semantic character and each feature data can be described according to a predetermined format in order to perform predetermined processing, that is, rule 1 processing and rule 2 processing.

図６の（Ａ）はその一例で、ベクトル成分のように想定して記述するものである。 FIG. 6 (A) is an example, and is described assuming a vector component.

すなわち、図５に示した各特徴データを、例えば、音声特徴の属性とし、属性が話者音声はＡ１、属性が音楽はＡ２、属性がその他はＡ３とする。 That is, each feature data shown in FIG. 5 is, for example, an attribute of a voice feature, the attribute is A1 for speaker voice, A2 for music, and A3 for other attributes.

映像特徴の色特徴で領域１はＢ１、領域２はＢ２などとする。 In the color feature of the video feature, region 1 is B1, region 2 is B2, and so on.

以下、同様に、各特徴に対して、Ｂ１〜Ｂ４、Ｃ１〜Ｃ２、Ｄ１〜Ｄ５、Ｅ１〜Ｅ４、Ｆ１〜Ｆ４、Ｇ１などが設定できる。 Similarly, B1 to B4, C1 to C2, D1 to D5, E1 to E4, F1 to F4, and G1 can be set for each feature.

図６の（Ａ）において、例えば、意味文字ａの場合は次の（２）式のように記述できる。 In FIG. 6A, for example, the meaning character a can be described as the following equation (2).

a = 1.0(A1)100 * (1.0(B2)100 + 1.0(B3)100) * (1.0(C1)100 + 1.0(C2)100) * (1.0(D1)100 + 1.0(D2)100 + 1.0(D5)100) * 1.0(F1)100   (2)

他の意味文字に対しても、図６の（Ａ）のように記述できる。 Other meaning characters can also be described as shown in FIG. 6(A).

なお、ここで、「＊」は論理積（ＡＮＤ）、「＋」は論理和（ＯＲ）と同様の所定演算を表現するものと仮定する。 Here, it is assumed that “*” represents a predetermined operation similar to a logical product (AND) and “+” represents a logical sum (OR).

ここで、例えば１．０（Ａ１）１００の記述について考える。 Here, for example, a description of 1.0 (A1) 100 is considered.

上記したように、（Ａ１）は、音声特徴で属性が話者音声の場合を表現している。 As described above, (A1) represents the case where the attribute is speaker voice and the voice feature.

（重み付け係数） (Weighting coefficient)
１．０（Ａ１）１００の１．０は、（Ａ１）に対する重み付け係数で、ここでは、便宜上、０〜１．０の範囲を想定している。 The 1.0 in 1.0(A1)100 is the weighting coefficient for (A1); here, for convenience, a range of 0 to 1.0 is assumed.

重み付け係数は、所定演算を行うための、便宜的な係数なので、重み付け係数は、０〜１００、又は０〜１０の範囲で設定する（記述する）ことも考えられる。 Since the weighting coefficient is a convenient coefficient for performing a predetermined calculation, the weighting coefficient may be set (described) in the range of 0 to 100, or 0 to 10.

（検出割合係数） (Detection ratio coefficient)
１．０（Ａ１）１００の１００は、（Ａ１）に対する検出割合係数で、その再生ユニット区間で、１００％検出される場合に、１．０（Ａ１）１００は、その条件を満たすものとする。 The 100 in 1.0(A1)100 is the detection ratio coefficient for (A1): 1.0(A1)100 satisfies its condition when the feature is detected in 100% of the playback unit section.

例えば、１．０（Ａ１）５０の場合は、その再生ユニット区間で、５０％検出される場合に、１．０（Ａ１）５０は、その条件を満たすものとする。 For example, 1.0(A1)50 satisfies its condition when the feature is detected in 50% of the playback unit section.

この検出割合については、下記の（３）式で説明する。 This detection ratio will be described with the following equation (3).

ここで、検出割合係数は、便宜上、０〜１００の範囲を想定している。 Here, the detection ratio coefficient assumes the range of 0-100 for convenience.

検出割合係数は、所定演算を行うための、便宜的な係数なので、０〜１の範囲で設定することや、０〜１０の範囲で設定する（記述する）ことも考えられる。 Since the detection ratio coefficient is a convenient coefficient for performing a predetermined calculation, it may be set in the range of 0 to 1 or set (described) in the range of 0 to 10.

ここで、この検出割合係数は、その特性がその再生ユニット区間で検出できた割合と考えることもできる。 Here, the detection ratio coefficient can be considered as a ratio at which the characteristic can be detected in the reproduction unit section.

例えば、上記１．０（Ａ１）１００では、話者音声が１００％検出しなければ、（Ａ１）の特性を検出したと判定しないと考えることができる。 For example, in the above 1.0 (A1) 100, it can be considered that if the speaker voice is not detected 100%, it is not determined that the characteristic (A1) is detected.

例えば、１．０（Ａ１）５０では、５０％検出したらその特性を検出したと判定する。 For example, in 1.0 (A1) 50, when 50% is detected, it is determined that the characteristic is detected.

すなわち、その所定区間において、所定の特性が検出された割合を係数で表現できるとも考えられる。 That is, it can be considered that the ratio of the predetermined characteristic detected in the predetermined section can be expressed by the coefficient.

（特徴（抽出）データの検出割合） (Detection ratio of feature (extraction) data)
そこで、特性の検出の割合について考える。 Next, consider the ratio at which a characteristic is detected.

後に図３４〜図３５を参照して処理方法について説明するが、本発明では、音声セグメント特徴とシーンチェンジ特徴に応じて設定処理される再生ユニット（又はプレイユニット）（ＰＵ）という所定の区間を設定する処理概念を導入している。 The processing method will be described later with reference to FIGS. 34 to 35. In the present invention, a predetermined section called a playback unit (or play unit) (PU) that is set according to the audio segment feature and the scene change feature is used. The processing concept to be set is introduced.

そこで、例えば、そのＰＵ区間全体に対する所定の特徴データが検出された割合で、上記で言及した各所定の特性の割合を演算することが考えられる。 Therefore, for example, it is conceivable to calculate the ratio of each predetermined characteristic mentioned above at the ratio at which the predetermined feature data is detected with respect to the entire PU section.

例えば、図７において、ある再生ユニットの区間長（フレーム長、時間長など）をｆａとし、ある特徴データＰの検出区間をｆ０，ｆ１と仮定すると、この場合の特徴データＰの検出割合Ｆは、次の（３）式にて演算処理することができる。 For example, in FIG. 7, assuming that the section length (frame length, time length, etc.) of a certain playback unit is fa and the detection sections of certain feature data P are f0 and f1, the detection rate F of feature data P in this case is The arithmetic processing can be performed by the following equation (3).

F = Σ fi / fa = (f0 + f1) / fa   (3)

この（３）式による演算値は、後で説明する評価値処理において用いることになる。 The value calculated by equation (3) is used in the evaluation value processing described later.
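As a minimal sketch of equation (3), the detection ratio can be computed from the lengths of the detected intervals and the length of the playback unit. The function name `detection_ratio` and the scaling to a 0 to 100 percentage (so that the result is comparable with the detection ratio coefficient) are assumptions made here for illustration, not part of the specification.

```python
def detection_ratio(detected_lengths, unit_length):
    """Return F = sum(f_i) / f_a, scaled to a percentage (assumed 0-100 scale)."""
    if unit_length <= 0:
        raise ValueError("unit_length must be positive")
    return 100.0 * sum(detected_lengths) / unit_length


# Hypothetical playback unit of 300 frames in which feature P is
# detected in two intervals, f0 = 90 frames and f1 = 150 frames.
ratio = detection_ratio([90, 150], 300)
print(ratio)
```

Lengths may be given in frames or in time units, as long as the intervals and the playback unit use the same unit.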

（評価値の演算方法の例）（例１） (Example of evaluation value calculation method) (Example 1)
評価値（重要度）の演算方法の一例を示す。 An example of a method of calculating the evaluation value (importance) is shown.

各特徴データについて、理想値と検出結果を考えて以下のような処理を行う。 For each feature data, the following processing is performed in consideration of the ideal value and the detection result.
例えば、ｐ＝ｍ（Ｍ）ｎを考えて次の（処理１）〜（処理５）を行う。 For example, considering p = m(M)n, the following (Process 1) to (Process 5) are performed.

（処理１）上記（３）式を用いて、各所定特徴データの検出割合ｓを演算する。 (Process 1) The detection ratio s of each predetermined feature data is calculated using the above equation (3).

（処理２）上記検出割合係数ｎと比較して、 (Process 2) The detection ratio s is compared with the detection ratio coefficient n, and p is set as follows:
s < n: p = m × s   (4)
s ≥ n: p = m × 100   (5)

（処理３）上記処理で、各特徴Ｍにおいて、Ｍ１，Ｍ２・・・など同じ属性の特徴の場合で論理和（＋）処理の場合は平均処理を考える。 (Process 3) In the above processing, when features M1, M2, ... of the same attribute within a feature M are combined by logical sum (+), averaging is used.

論理積（＊）処理の場合は、便宜上、論理積係数ｒというような処理概念を導入し、その平均処理の結果に掛けた値を考える。 In the case of logical product (*) processing, for the sake of convenience, a processing concept such as a logical product coefficient r is introduced, and a value multiplied by the result of the average processing is considered.

（処理４）上記の処理を各特徴データＭ毎に行い、各演算値の加算処理を行ってその処理結果をその評価値とする。 (Process 4) The above-described process is performed for each feature data M, the calculation values are added, and the processing result is used as the evaluation value.

（処理５）演算した評価値を比較して評価値が最も大きい場合の意味をその再生ユニットａの意味とする。 (Process 5) The calculated evaluation value is compared, and the meaning when the evaluation value is the largest is the meaning of the reproduction unit a.

上記の評価値処理は、処理方法の一例で、検出された特徴データ、又はその再生ユニット区間において検出された割合などと、設定した「意味」との対応が所定の妥当性を持った処理方法であれば、上記以外の処理方法でも良い。 The above evaluation value processing is an example of a processing method, and the processing method in which the correspondence between the detected feature data or the ratio detected in the playback unit section and the set “meaning” has a predetermined validity. If so, other processing methods may be used.

例えば、上記（処理３）の処理で論理積処理の場合は、平均化処理や論理積係数を掛けないで、同じ属性の特徴データを加算する処理だけにするなどの場合も考えられる。 For example, in the case of the logical product process in the above (Process 3), there may be a case where only the process of adding feature data having the same attribute is performed without applying the averaging process or the logical product coefficient.

上記（処理３）の処理の場合で、同じ特徴データで論理積処理の場合は、検出条件が論理和処理の場合と比較して厳しくなるので、検出値を論理和処理の場合よりも大きく取るように処理を行うように考えることもできる。 In the case of the above processing (processing 3), in the case of logical product processing with the same feature data, the detection condition becomes stricter than in the case of logical sum processing, so the detection value is taken larger than in the case of logical sum processing. It can also be considered to perform the process.

ここで、上記（２）式の場合について考える。 Here, consider the case of the above equation (2).

例えば、各特徴の検出割合を以下の表４のようにし、検出割合係数、重み係数を一緒に示す。 For example, the detection ratio of each feature is as shown in Table 4 below, and the detection ratio coefficient and the weighting coefficient are shown together.

ここで、Ｂ２、Ｂ３や、Ｃ１、Ｃ２などのように同じ特徴の種類で、検出属性が異なる場合や、あるいは検出領域が異なる場合などで、論理和処理（＋）の場合は、便宜上、平均処理を考えて、（２）式から、評価値ｈは、次の（６）式にて示される。 Here, in the case of logical sum processing (+) when the same feature type, such as B2, B3, C1, C2, and the like, or when the detection attribute is different or when the detection area is different, the average is used for convenience. Considering the processing, the evaluation value h is expressed by the following equation (6) from the equation (2).

h = 100 + (80 + 80)/2 + (100 + 100)/2 + (80 + 80 + 80)/3 + 80
  = 100 + 80 + 100 + 80 + 80
  = 440   (6)

又は、特徴データの種類で平均化した値を評価値とすることも考えられ、その場合は、特徴データは、Ａ〜Ｆの５種類なので、次の（７）式に示すような評価値と考えることもできる。 Alternatively, the value averaged over the feature data types may be used as the evaluation value. In that case, since there are five types of feature data, A to F, the evaluation value can be considered as shown in the following equation (7).

h = 440/5 = 88   (7)
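The steps (Process 1) to (Process 4) and equations (6) and (7) can be sketched as follows. This is only an illustration: the function names are hypothetical, and the detected ratios in `rule_a` are assumed values chosen so that the terms match those appearing in equation (6).

```python
def term_value(weight, detected, threshold):
    """(Process 2): p = m * s when s < n, otherwise p = m * 100."""
    return weight * detected if detected < threshold else weight * 100.0


def group_value(terms):
    """(Process 3), logical-sum case: average the terms of one feature type."""
    return sum(term_value(*t) for t in terms) / len(terms)


def evaluation_value(groups):
    """(Process 4): add up the per-type values."""
    return sum(group_value(terms) for terms in groups)


# Each term is (weight m, detected ratio s, detection ratio coefficient n).
# The detected ratios below are assumptions that reproduce equation (6).
rule_a = [
    [(1.0, 100, 100)],                                  # A1
    [(1.0, 80, 100), (1.0, 80, 100)],                   # B2 + B3
    [(1.0, 100, 100), (1.0, 100, 100)],                 # C1 + C2
    [(1.0, 80, 100), (1.0, 80, 100), (1.0, 80, 100)],   # D1 + D2 + D5
    [(1.0, 80, 100)],                                   # F1
]

h = evaluation_value(rule_a)   # sum over feature types, as in equation (6)
h_mean = h / len(rule_a)       # average over feature types, as in equation (7)
print(h, h_mean)
```

For (Process 5), the same calculation would be repeated for each candidate meaning, and the meaning with the largest value would be assigned to the playback unit.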

（属性が同じ特徴データ間の関係が論理積処理の場合） (When feature data with the same attribute are related by logical product processing)
ここで、（２）式の処理で、同じ属性の特徴データ、例えば、Ｂ２、Ｂ３が論理積処理の場合、すなわち、（１．０（Ｂ２）１００＊１．０（Ｂ３）１００）のような場合について考える。 Here, in the processing of equation (2), consider the case where feature data with the same attribute, for example B2 and B3, are combined by logical product, that is, (1.0(B2)100 * 1.0(B3)100).

上記評価値処理の（処理３）から論理積処理係数ｒという概念を導入し、ｒ（８０＋８０）／２のような処理を考える。 The concept of logical product processing coefficient r is introduced from (process 3) of the evaluation value process, and a process such as r (80 + 80) / 2 is considered.

例えば、ｒ＝１．５とすると、 For example, if r = 1.5,

h = 100 + 1.5 × (80 + 80)/2 + (100 + 100)/2 + (80 + 80 + 80)/3 + 80
  = 100 + 120 + 100 + 80 + 80
  = 480   (8)

また、特徴データの種類５で平均化処理して Averaging over the five feature data types then gives

h = 480/5 = 96   (9)

という、評価値を考えることができる。 as the evaluation value.

これは、論理積処理の場合が論理和処理に比較して条件が厳しいので、検出した「意味」の評価値を大きく設定した方が良いと考える場合である。 This is a case where the condition of the logical product process is stricter than that of the logical sum process, so that it is better to set the evaluation value of the detected “meaning” larger.

また、場合によっては、ｒ＝０．８として、 In some cases, r = 0.8 may be used instead, giving

h = 100 + 0.8 × (80 + 80)/2 + (100 + 100)/2 + (80 + 80 + 80)/3 + 80
  = 100 + 64 + 100 + 80 + 80
  = 424   (10)

また、特徴データの種類５で平均化処理して Averaging over the five feature data types then gives

h = 424/5 = 84.8   (11)

という評価値を考えることもできる。 as the evaluation value.

これは、上記の場合と逆の考えかたで、論理積処理の場合が論理和処理に比較して条件が厳しいので、評価値を小さく設定した方が良いと考える場合である。 This is a case opposite to the above case, where the condition for the logical product process is stricter than that for the logical sum process, and it is considered that it is better to set the evaluation value smaller.
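The effect of the logical product coefficient r can be sketched by scaling only the feature type whose terms are combined by logical product. The function name and the list layout are hypothetical; the per-type averages are the ones that appear in equation (6).

```python
def evaluation_with_and_coefficient(type_values, and_index, r):
    """Scale the per-type average at and_index by r, then return (sum, mean)."""
    total = sum(r * v if i == and_index else v
                for i, v in enumerate(type_values))
    return total, total / len(type_values)


# Per-type averages for A, B, C, D, F taken from equation (6).
type_values = [100, 80, 100, 80, 80]

# Index 1 is feature type B, whose terms B2 and B3 are ANDed in this example.
raised = evaluation_with_and_coefficient(type_values, 1, 1.5)   # r > 1
lowered = evaluation_with_and_coefficient(type_values, 1, 0.8)  # r < 1
print(raised, lowered)
```

Choosing r above or below 1 corresponds to the two design views described above: rewarding the stricter AND condition or penalizing it.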

（属性の異なる特徴データ間の関係が論理和処理の場合） (When feature data with different attributes are related by logical sum processing)
ここで、例えば、（２）式で示したように、属性の異なる特徴データは、論理積演算子（＊）で表現しているが、論理和演算子（＋）の場合も考えられる。 Here, feature data with different attributes are combined with the logical product operator (*), as shown in equation (2), but the logical sum operator (+) may also be used.
簡単のため、（２）式で第１項目Ａ１、第２項目Ｂ２だけを考え、 For simplicity, consider only the first term A1 and the second term B2 of equation (2):

a = 1.0(A1)100 + 1.0(B2)100   (12)

上記評価値演算方法（３）で説明したような、便宜上、論理和係数ｗというような概念を考えて処理を行うことが考えられる。 For convenience, the processing may use a concept such as a logical sum coefficient w, analogous to the coefficient described in (Process 3) of the evaluation value calculation method above.
この場合、（１２）式から、評価値ｈは、 In this case, from equation (12), the evaluation value h is

h = (100 + 80) w   (13)

となる。ここで、ｗ＝１の場合は、論理積処理の場合で、 Here, w = 1 corresponds to the logical product case,

a = 1.0(A1)100 * 1.0(B2)100   (14)
h = 100 + 80 = 180   (15)

例えば、（８）式の論理和処理の場合には、ｗ＝１．５（１６）式として、ｈ＝（１００＋８０）×１．５＝２７０（１７）式と、論理積処理の場合よりも高い評価値となるような処理を考えることもできる。 For example, in the case of the logical sum processing of equation (8), setting w = 1.5 (16) gives h = (100 + 80) × 1.5 = 270 (17), so processing that yields a higher evaluation value than the logical product case can also be considered.

また、ｗ＝０．８（１８）式として、ｈ＝（１００＋８０）×０．８＝１４４（１９）式のように、論理積処理よりも小さい評価値となるような処理も考えられる。 Alternatively, setting w = 0.8 (18) gives h = (100 + 80) × 0.8 = 144 (19), so processing that yields a smaller evaluation value than the logical product case is also conceivable.

評価値処理は、設定した意味と各特徴データ、各種の係数などを結びつけた式の値の評価のために便宜上、導入した概念なので、上記評価式で考えた各係数の範囲、値などは、上記の説明でのべた場合に限らず、小さく、又は大きく設定することも考えられる。 Since the evaluation value processing is a concept introduced for the sake of evaluation of the value of an expression that combines the set meaning and each feature data, various coefficients, etc., the range, value, etc. of each coefficient considered in the above evaluation expression are: Not only the case described above but also a small or large setting is conceivable.

以下のような評価値の演算により、ルールファイルにより、ルールに記述された再生ユニットの各区間の評価値が決められ、例えば、要約再生モードの場合は、要約再生時間に応じて、評価値の大きいＰＵ区間が選択され、要約時間にできるだけ近くなるように、段々と評価値の小さいＰＵ区間を選択していく。 Through evaluation value calculations such as those described, the evaluation value of each playback unit section described in the rules is determined according to the rule file. In summary playback mode, for example, PU sections with large evaluation values are selected first according to the summary playback time, and PU sections with progressively smaller evaluation values are then selected so that the total comes as close as possible to the summary time.

そして、選択した各ＰＵ区間を再生することで、所定の要約再生が実現できる。 Then, by reproducing each selected PU section, predetermined summary reproduction can be realized.
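The selection of PU sections for summary playback can be sketched as a greedy procedure. The text only states that sections are chosen in descending order of evaluation value until the summary time is approached; the rule of skipping a section that would overshoot the target, the tuple layout, and the function name are assumptions made for this illustration.

```python
def select_summary_units(units, target_seconds):
    """Pick PU sections in descending order of evaluation value.

    units: list of (start, end, evaluation_value) tuples, times in seconds.
    A section is skipped if adding it would exceed target_seconds
    (an assumed policy). The result is returned in chronological order.
    """
    selected = []
    total = 0.0
    for start, end, value in sorted(units, key=lambda u: u[2], reverse=True):
        length = end - start
        if total + length <= target_seconds:
            selected.append((start, end, value))
            total += length
    return sorted(selected)


# Hypothetical playback units and a 40-second summary target.
units = [(0, 30, 40), (30, 50, 90), (50, 100, 70), (100, 110, 60)]
summary = select_summary_units(units, 40)
print(summary)
```

Playing the returned sections in order then corresponds to the summary playback described above.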

（評価値処理の他の処理方法） (Other methods of evaluation value processing)
上記で述べた各特徴データｎの一項と、所定演算子＊とからｗ（Ｍ）＊ｋを考え、各所定特徴データの検出割合ｄｅｔ、検出割合係数ｋとして、評価式の各項の特徴データｎの重み係数をｗ（ｎ）として、演算関数Ｐと演算子＊を考える。 Consider a term w(M)*k formed from one term of each feature data n described above and a predetermined operator *. Let det be the detection ratio of each predetermined feature data, k the detection ratio coefficient, and w(n) the weighting coefficient of feature data n in each term of the evaluation formula, and consider an operation function P and the operator *.

Ｐ（＊ｋ（ｎ），ｄｅｔ（ｎ））を考え、 Considering P(*k(n), det(n)), let

d(n) = P(*k(n), det(n))   (20)

ここで、演算子＊以下の何れかに該当するものとして、ｄ（ｎ）は、 Here, the operator * is one of the following, and d(n) is given as follows.
（１）＊＝（｜｜＞）の場合、すなわちＰ（（｜｜＞）ｋ（ｎ），ｄｅｔ（ｎ））で、 (1) When * = (||>), that is, for P((||>)k(n), det(n)):
if (k(n) ≤ det(n)) then d(n) = 0   (21)
else d(n) = 100   (22)
（２）＊＝（｜｜＜）の場合、すなわちＰ（（｜｜＜）ｋ（ｎ），ｄｅｔ（ｎ））で、 (2) When * = (||<), that is, for P((||<)k(n), det(n)):
if (k(n) > det(n)) then d(n) = 0   (23)
else d(n) = 100   (24)

上記（１）、（２）のような処理の場合は、検出ｄｅｔ（ｎ）と設定検出割合ｋ（ｎ）に応じて、途中処理値ｄ（ｎ）を１００又は０に処理するので、下記の（３）又は（４）で説明する途中処理値が差分値になる場合に比較して、特徴データを顕著に特徴付けたい場合には有効であると考えられる。 In the case of the processes (1) and (2) above, the midway processing value d (n) is processed to 100 or 0 according to the detection det (n) and the set detection ratio k (n). Compared with the case where the midway processing value described in (3) or (4) is a difference value, it is considered effective when it is desired to characterize feature data significantly.

また、さらに、 In addition,
（３）＊＝（｜＞）の場合、すなわちＰ（（｜＞）ｋ（ｎ），ｄｅｔ（ｎ））で、 (3) When * = (|>), that is, for P((|>)k(n), det(n)):
if (k(n) < det(n)) then d(n) = 0   (25)
else d(n) = |k(n) - det(n)|   (26)
（４）＊＝（｜＜）の場合、すなわちＰ（（｜＜）ｋ（ｎ），ｄｅｔ（ｎ））で、 (4) When * = (|<), that is, for P((|<)k(n), det(n)):
if (k(n) > det(n)) then d(n) = 0   (27)
else d(n) = |k(n) - det(n)|   (28)
であるから、評価値は次の（２９）式のようになる。 Accordingly, the evaluation value is given by the following equation (29).

上記の演算子の導入により、例えば、Ａ１、Ｂ２の特徴データがあった場合に以下のように記述することができる。 By introducing the above operator, for example, when there is feature data of A1 and B2, it can be described as follows.

a = 1.0(A1)(||<)100 + 1.0(B2)(|<)100   (30)

この場合、例えば、Ａ１特徴の検出割合（実際の検出値）を１００、Ｂ２特徴の検出割合（実際の検出値）を８０と仮定すると、上記（１）、（４）から、評価値ｈは、 In this case, assuming for example that the detection ratio (actual detected value) of feature A1 is 100 and that of feature B2 is 80, then from (1) and (4) above the evaluation value h is

h = (1.0 × (100 - 0) + 1.0 × (100 - 80)) / (1.0 + 1.0)
  = (100 + 20)/2
  = 60   (31)

という評価値を考えることができる。 which can be taken as the evaluation value.

上記のように、評価値処理の方法には、幾つかの方法を考えることができ、ここで説明した方法に限らずとも良い。 As described above, several methods can be considered as the evaluation value processing method, and the method is not limited to the method described here.
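The four operators defined by equations (21) to (28) can be sketched as a single function that maps a detection ratio coefficient k(n) and a detected ratio det(n) to the intermediate value d(n). The function name and the use of plain strings for the operators are assumptions made for illustration; the branch conditions follow the equations as written.

```python
def intermediate_value(op, k, det):
    """Return d(n) for operator op, coefficient k(n) and detected ratio det(n)."""
    if op == "||>":   # equations (21), (22)
        return 0.0 if k <= det else 100.0
    if op == "||<":   # equations (23), (24)
        return 0.0 if k > det else 100.0
    if op == "|>":    # equations (25), (26)
        return 0.0 if k < det else float(abs(k - det))
    if op == "|<":    # equations (27), (28)
        return 0.0 if k > det else float(abs(k - det))
    raise ValueError(f"unknown operator: {op}")


# The two-level operators give only 0 or 100, while the difference
# operators give a graded value.
print(intermediate_value("||<", 100, 100))
print(intermediate_value("|<", 50, 80))
```

The two-level forms suit cases where a feature should be emphasized strongly, and the difference forms suit cases where a graded response is wanted.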

ここで、図６の（Ａ）に示した規則１記述においては、記述するデータの出現パターン（意味）の表現方法の一例で、意味として、ａ，ｂ，ｃ・・・などとしたが、その否定としてＡ，Ｂ，Ｃ，・・・、また、ワイルドカードとして、＊などを用いることも考えられる。 Here, the rule 1 description shown in FIG. 6(A) is one example of how to express the appearance patterns (meanings) of the described data. The meanings are written as a, b, c, ..., but A, B, C, ... may be used for their negations, and * or the like may be used as a wildcard.

（規則２処理の場合） (Rule 2 processing)
規則２処理では、上記規則１処理で意味付けされた所定区間である再生ユニット同士の意味の接続を考慮して処理を行うようにする。 In the rule 2 processing, processing takes into account the connection between the meanings of the playback units, which are the predetermined sections given meanings in the rule 1 processing.

また、時間補正関数を用いて時間的な補正、すなわち時間的な重み付け処理を行う。 In addition, temporal correction, that is, temporal weighting processing is performed using a time correction function.

例えば、上記規則１処理において、意味ａの評価値を７０、意味ｂの評価値を８０と仮定すると、（ａｂ）の評価値ｇは、 For example, assuming that in the rule 1 processing the evaluation value of meaning a is 70 and that of meaning b is 80, the evaluation value g of (ab) is

g = 70 + 80 = 150

又は、意味数の平均を考えて、ここでは、ａｂの２つなので、 or, taking the average over the number of meanings, which is two here (a and b),

g = 150/2 = 75

又は、それぞれの評価値の積を考えて、 or, taking the product of the evaluation values,

g = 70 × 80 = 5600

例えば、便宜上、最大値を１００と仮定して最大値で正規化することを考え、 and then, for convenience, assuming a maximum value of 100 and normalizing by it,

g = 5600/100 = 56

などのようにすることも考えられる。 Calculations such as these are conceivable.
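The three ways of combining the evaluation values of connected meanings can be sketched as follows. The function names are hypothetical, and extending the normalized product to more than two meanings by dividing by the maximum value once per additional factor is an assumption; the text only gives the two-meaning case.

```python
def combine_sum(values):
    """Sum of the evaluation values of the connected meanings."""
    return sum(values)


def combine_mean(values):
    """Average over the number of meanings."""
    return sum(values) / len(values)


def combine_normalized_product(values, max_value=100.0):
    """Product of the values, normalized so the result stays within max_value."""
    product = 1.0
    for v in values:
        product *= v
    # Assumed generalization: divide by max_value for each factor beyond the first.
    return product / max_value ** (len(values) - 1)


ab = [70, 80]  # evaluation values of meanings a and b from the example
print(combine_sum(ab), combine_mean(ab), combine_normalized_product(ab))
```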

時間補正関数の重み付けは、例えば、上記（ａｂ）が、ある時点ｔで検出でき、その評価値がｇで、ｔでの時間補正係数（重み付け係数）をｗとすると、ｇｔを最終的な評価値とする。 As for the weighting by the time correction function, if, for example, (ab) above is detected at a certain time t, its evaluation value is g, and the time correction coefficient (weighting coefficient) at t is w, then gt, the value of g weighted by w, is taken as the final evaluation value.

時間補正関数は、ルールファイルにおいて規則２の所定記述場所に、所定記述規則に従って、その変化点（変化点座標系の情報データ）を記述する。 The time correction function describes the change point (information data of the change point coordinate system) at the predetermined description location of rule 2 in the rule file according to the predetermined description rule.

規則２処理の一例を図６の（Ｂ）に示す。 An example of the rule 2 processing is shown in FIG. 6(B).
（時間補正関数） (Time correction function)
はじめに、時間補正関数について説明する。 First, the time correction function will be described.

この時間補正関数は、ルールファイルにおける所定番組ジャンルにおける要約時間補正を行うために利用することができる。 This time correction function can be used to correct the summary time in a predetermined program genre in the rule file.

これは、ユーザーによっては、所定の放送番組によっては、放送時間の前半や後半を重点的に再生したいという場合も想定できる。 For some users, it may be assumed that the user wants to play the first half or second half of the broadcast time intensively depending on a predetermined broadcast program.

そこで、記録する番組のジャンル、放送時間、その他などの、その放送番組に応じた種々のパラメータを考慮して、ダイジェスト再生（要約再生）を行う所定の再生区間に対して時間（場合によっては時刻）の重み付けを行うことを考えることができる。 Therefore, in consideration of various parameters depending on the broadcast program such as the genre of the program to be recorded, the broadcast time, etc., the time (in some cases, the time in some cases) for a predetermined playback section in which digest playback (summary playback) is performed. ) Weighting can be considered.

すなわち、この重み付けを行う区間は、時間的にそれ以外の重み付けを行わない区間に比較して、ダイジェスト再生を行う場合の再生の重要度を大きく処理することになる。 In other words, the interval for performing the weighting is processed so that the importance of reproduction when performing digest reproduction is larger than the interval for which no other weighting is performed in time.

図８の（Ａ）〜（Ｉ）は、上記で述べた時間の重み付けを行うための時間補正関数の一例を示すものである。 8A to 8I show an example of a time correction function for performing the time weighting described above.

図８の（Ａ）は、フラットな特性で、所定の要約再生区間に対して時間の重み付けを行わない場合である。 (A) of FIG. 8 is a case where the time is not weighted for a predetermined summary reproduction section with a flat characteristic.

図８の（Ｂ）所定の区間内において、前半部の方を後半部に比較して、要約再生における重要度としての再生の重みを大きくする重み付けを行っている場合である。 FIG. 8B shows a case where weighting for increasing the weight of reproduction as importance in summary reproduction is performed by comparing the first half with the second half within a predetermined section.

図８の（Ｃ）所定の区間内において、後半部の方を前半部に比較して、要約再生における重要度としての再生の重みを大きくする重み付けを行っている場合である。 FIG. 8C shows a case where weighting is performed to increase the weight of reproduction as the importance in summary reproduction by comparing the latter half with the first half within a predetermined section.

図８の（Ｄ）所定の区間内において、前半部と後半部を中間部に比較して、要約再生における重要度としての再生の重みを大きくする重み付けを行っている場合である。 FIG. 8D shows a case where weighting is performed to increase the weight of reproduction as the importance in summary reproduction by comparing the first half and the second half with the middle in a predetermined section.

図８の（Ｅ）所定の区間内において、中間部を前半部及び後半部に比較して、要約再生における重要度としての再生の重みを大きくする重み付けを行っている場合である。 FIG. 8 (E) shows a case where weighting is performed to increase the weight of reproduction as the importance level in the summary reproduction by comparing the middle part with the first half part and the second half part within a predetermined section.

図８の（Ｆ）は、図８の（Ｄ）の違った形の補正関数を２つ接続したようなもので前半部、前半と中央部の間、中央部、中央部と後半部の間、後半部にそれぞれ重みをつけて、さらに各重み付けを異なったものにしている。 (F) in FIG. 8 is like connecting two correction functions having different shapes from those in FIG. 8 (D), and between the first half, the first half and the middle, the middle, and between the middle and the second half. The weights are assigned to the latter half, and the weights are made different.

図８の（Ｇ）は、図８の（Ｅ）の違った形の補正関数を２つ接続したようなもので前半部、前半と中央部の間、中央部、中央部と後半部の間、後半部にそれぞれ重みをつけて、さらに各重み付けを異なったものにしている。 (G) in FIG. 8 is such that two correction functions having different shapes from those in (E) in FIG. 8 are connected, and between the first half, the first half and the middle, the middle, and between the middle and the second half. The weights are assigned to the latter half, and the weights are made different.

図８の（Ｈ）は、図８の（Ｃ）と（Ｄ）の組合せ関数で、図８の（Ｉ）は、図８の（Ｄ）と（Ｂ）の組合せ関数のかたちである。 (H) in FIG. 8 is a combination function of (C) and (D) in FIG. 8, and (I) in FIG. 8 is a combination function in (D) and (B) of FIG.

図９は一般的な時間補正関数の様子を示したもので、開始点、変化点、終点の座標をそれぞれ、Ｐ０（ｔｓ，ｓ３），Ｐ１（ｔ１，ｓ３），・・・，Ｐｅ（ｔｅ，ｓ０）としている。 FIG. 9 shows a general time correction function. The coordinates of the start point, the change point, and the end point are P0 (ts, s3), P1 (t1, s3),..., Pe (te , S0).

ここで、座標のｙ成分は重み付けを表しているので、ここでは、便宜上最大値を１００最小値を０とし、０〜１００の間の値をとるものとし、ｘ座標は、位置情報で、後述する図４１〜図４３に示す「開始終了位置情報」のデータと同じディメンジョンの値、又は開始終了点間の区間に基づく開始点からの割合で、０〜１００の間で設定して位置を示すことも考えられる。 Here, since the y component of the coordinates represents weighting, here, for convenience, the maximum value is 100, the minimum value is 0, and the value is between 0 and 100, and the x coordinate is position information, which will be described later. The same dimension value as the “start / end position information” data shown in FIG. 41 to FIG. 43, or the ratio from the start point based on the section between the start and end points, set between 0 and 100 to indicate the position It is also possible.
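A time correction function described by start, change, and end point coordinates can be sketched as a lookup over those points. Linear interpolation between points, clamping outside the range, and dividing the 0 to 100 weight by 100 when applying it to an evaluation value are all assumptions made for this illustration; the specification only states that the points are described in the rule file and that the weight ranges from 0 to 100.

```python
def time_weight(points, x):
    """Return the weight at position x.

    points: list of (position, weight) pairs sorted by position, e.g. the
    start point, change points and end point of a time correction function.
    Values between points are linearly interpolated (assumed behavior).
    """
    if x <= points[0][0]:
        return float(points[0][1])
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return float(points[-1][1])


def apply_time_weight(g, w):
    """Weight evaluation value g by w, treating w as a percentage (assumption)."""
    return g * w / 100.0


# Hypothetical function that emphasizes the first half of a program,
# with positions given as a 0-100 ratio from the start point.
first_half_heavy = [(0, 100), (40, 100), (60, 20), (100, 20)]
w = time_weight(first_half_heavy, 50)
print(w, apply_time_weight(150, w))
```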

（再生ユニットの意味と接続関係、判定処理） (Meanings of playback units, connection relationships, and determination processing)
上記で説明したが、所定の特徴抽出処理による特徴データからその再生ユニット（ＰＵ）における意味設定することができる。 As described above, the meaning of a playback unit (PU) can be set from feature data obtained by predetermined feature extraction processing.

ここで、図１０に示すようなビデオデータの構造について検討する。 Here, the structure of video data as shown in FIG. 10 will be considered.

ある一つのプログラム（番組）ｋを想定すると、幾つかのシーンｍ、ｍ＋１、・・・に分類することが考えられ、そしてシーンは幾つかのショットに分類することが考えられる。 Assuming one program (program) k, it can be considered to classify into several scenes m, m + 1,..., And the scene can be classified into several shots.

そしてショットを構成するのは一つ一つのフレームと考えられる。 Each shot, in turn, is considered to be made up of individual frames.

シーンの切れ目（区切り）はシーンチェンジになる。 A break (boundary) between scenes is a scene change.

セグメント（又はショット、又は画像セグメント。以下同様）は、シーン毎に、そのシーンに応じた類似画像のまとまり、又は、類似した画像（映像）特性のまとまりとも考えられる。 A segment (or a shot or an image segment, the same applies hereinafter) is considered to be a group of similar images or a group of similar image (video) characteristics according to the scene.

セグメントやシーンなどは、そのプログラム（番組）の中において、固有な意味の概念を持っていると考えられる。 A segment, a scene, or the like is considered to have a unique meaning concept in the program (program).

そして、各々の意味を持ったセグメント、シーンが幾つかまとまって、その番組を構成しているというビデオ構造を考えることができる。 Then, a video structure can be considered in which several segments and scenes having respective meanings are grouped to constitute the program.

例えば、野球の番組を想定した場合に、打者の画面が続いていたとすると、打者の類似画像が検出され、その類似特性セグメントに分類できると考えられる。 For example, if a baseball program is assumed and the batter's screen continues, it is considered that a similar image of the batter is detected and classified into the similar characteristic segments.

そして、そのセグメントは、「打者の画像」という意味（意味の概念）を持つことになる。 The segment has the meaning (concept of meaning) of “batter's image”.

また、ピッチャーの投球する画面が続いていたら、ピッチャーの類似画面が検出され、その類似特性に応じてセグメントに分類できると考えられる。 Further, if the pitcher throwing screen continues, a similar pitcher screen is detected, and it can be considered that the screen can be classified into segments according to the similar characteristics.

そして、そのセグメントは、「ピッチャーの画像」という意味（意味の概念）を持つことになる。 Then, the segment has a meaning (concept of meaning) of “pitcher image”.

ここで、例えば、ピッチャーが投球して、打者が打撃し、その打者が走塁するような場合を考えた場合に、「ピッチャーの画像シーン」、「打者の画像シーン」、「打者の走塁の画像シーン」という、各々意味を持った画像シーンのつながりを考えることができる。 Here, for example, when the pitcher throws, the batter hits the ball, and the batter then runs the bases, a connection of image scenes that each carry a meaning can be considered: the "pitcher image scene", the "batter image scene", and the "batter base-running image scene".

所定プログラム（番組）において、上記したＰＵ毎に、画像特徴データ、音声特徴データが処理され、それら特徴データに応じてそのＰＵの意味を設定することを考える。 Consider that image feature data and audio feature data are processed for each PU described above in a predetermined program (program), and the meaning of the PU is set according to the feature data.

例えば、ニュース番組を考えた場合に、キャスター（アナウンサー）が最初にニュースの項目を読み上げるシーン（ニュース番組のヘッドライン）を想定した場合に、そのシーン（画像）の特徴として、人物特徴が１人〜２人、テロップ（Ｔｌｐ特徴）、音声特徴の属性が話者音声、その他に、ニュース番組を考えた場合に、そのニュース番組の中で、ニュースを読み上げるシーンは幾つか存在すると考えられるので、そのニュース読み上げシーンと類似するシーンは幾つか存在することになり、類似画像特徴すなわち、特定のシーンＩＤは出現頻度が高くなることが考えられる。 For example, when a news program is considered, if a caster (announcer) first reads a news item (news program headline) and assumes a scene (image headline), the scene (image) has one person feature. ~ Two people, telop (Tlp feature), voice feature attribute is speaker voice, etc. In addition, when considering a news program, it is considered that there are several scenes that read the news in the news program, There are some scenes similar to the news reading scene, and it is considered that the appearance frequency of the similar image feature, that is, the specific scene ID increases.

このように、また、規則１処理でも説明したように人物特徴、音声特徴、テロップ特徴、類似画像特徴、その他所定の特徴データに応じて、そのＰＵの意味を設定することが考えられる。 In this way, as described in the rule 1 process, it is conceivable to set the meaning of the PU according to the person feature, voice feature, telop feature, similar image feature, and other predetermined feature data.

例えば上記野球の番組を考えたように、所定の意味を持つＰＵの接続関係を考えることができる。 For example, as in the case of the above baseball program, it is possible to consider PU connection relationships having a predetermined meaning.

すなわち、所定の特徴データ又は特性データを持つＰＵ間の所定の接続を考えることができる。 That is, a predetermined connection between PUs having predetermined feature data or characteristic data can be considered.

上述した所定の意味を持つすなわち所定の意味が設定されたＰＵの接続関係を図１１に示す。 FIG. 11 shows a connection relationship of PUs having the above-described predetermined meaning, that is, a predetermined meaning set.

図１１において、今考えているプログラム（番組）で所定の意味ａ〜意味ｄが設定されており、ある区間ＰＵ（ｎ）〜ＰＵ（ｎ＋２）で、接続関係を考えた場合、ＰＵ（ｎ）の意味ａ、ＰＵ（ｎ＋１）の意味ｂ、ＰＵ（ｎ＋２）の意味ｃがもっとも自然なつながりと考えられることを示している。 In FIG. 11, when a predetermined meaning a to meaning d is set in the program (program) currently considered, and connection relation is considered in a certain section PU (n) to PU (n + 2), PU (n) Meaning a, PU (n + 1) meaning b, and PU (n + 2) meaning c are considered to be the most natural connections.

すなわち、この図１１に示した例は相撲の場合を想定しており、意味ａ「取組み紹介シーン」の後は意味ｂ「立会いシーン」が続くことが一番妥当性であり合理的と考え、意味ｂ「立会いシーン」の後は意味ｃ「取り組みシーン」が続くことが一番妥当性であり合理的と考える。 That is, the example shown in FIG. 11 assumes a sumo program: it is considered most plausible and reasonable that meaning a, the “bout introduction scene”, is followed by meaning b, the “tachiai (initial charge) scene”, and that meaning b is in turn followed by meaning c, the “bout scene”.

そして接続関係として、ａｂｃという意味を定義した文字系列を考えることができ、このａｂｃの系列がキーフレームとすれば、いま考えているプログラム（番組）の中でａｂｃを探して、探した所定区間の最初と最後、又は、その近傍などを所定設定点として設定処理を行うことが考えられる。 Then, as a connection relationship, a character sequence defining the meaning of abc can be considered. If this abc sequence is a key frame, the abc is searched for in the currently considered program (program), and the predetermined section searched. It is conceivable that the setting process is performed with the first and last of or the vicinity thereof as a predetermined set point.

他の例として、例えば、番組ジャンルが野球の場合は、ある区間で再生ユニットが、各々、「投球」、「打った」、「意味無し」、「得点」という場合であったとしたら、「意味無し」を除いて、３つの意味、「投球」、「打った」、「得点」を持つと判定されたＰＵを１つにまとめて、「投球、打った、得点」という所定ＰＵのかたまりを想定することができる。 As another example, for example, when the program genre is baseball, if the playback unit is “throw”, “hit”, “no meaning”, “score” in a certain section, Except for “None”, PUs determined to have three meanings, “throw”, “hit” and “score” are combined into one, and a set of predetermined PUs “throw, hit, score” is collected. Can be assumed.

ここで、「意味無し」のＰＵは、意味がないと判定されたのだから含めても問題ないと考え、上記４つのＰＵを１つにまとめて「投球、打った、意味無し、得点」という所定ＰＵのまとまりを考えることもできる。 Here, since it is determined that there is no meaning for PU of “no meaning”, it is considered that there is no problem even if it is included, and the above four PUs are combined into one and called “throw, hit, meaningless, score” A set of predetermined PUs can also be considered.

ここで、「意味なし」を考えたのは、上記の規則１処理で所定の特徴データから所定の評価処理で、設定した幾つかの意味の中から、所定の意味付け処理を行うすなわち、複数の意味から所定の信号処理に基づいて確からしい意味付けが行えない場合も想定できるからである。 The “no meaning” category is introduced because, in the rule 1 process described above, a meaning is assigned by evaluating the predetermined feature data against the several meanings that have been set, and there may be cases in which none of those meanings can be assigned with sufficient confidence by the predetermined signal processing.

「意味なし」の代わりに、「どのような意味でも良い」というものを設定することも考えられる。これは、上記した＠と同様の処理である。 Instead of “no meaning”, it is possible to set “any meaning”. This is the same processing as @ described above.

あるニュース番組の場合で、ａａｂｂという接続、すなわち、「アナウンサーシーン」、「アナウンサーシーン」、「現場シーン」、「現場シーン」という接続が、妥当で合理的であると考えられる場合を図１２の（Ａ）に示す。 In the case of a certain news program, a connection of aabb, that is, a connection of “announcer scene”, “announcer scene”, “site scene”, “site scene” is considered to be reasonable and reasonable in FIG. Shown in (A).

また、先に説明した相撲番組の場合を図１２の（Ｂ）に示す。 Moreover, the case of the sumo program demonstrated previously is shown to (B) of FIG.

図１３は、上記の番組ジャンルがニュース番組の場合で、図１３の（Ａ）に示すように、参照パターン（参照文字系列）を上記で説明した「ａａｂｂ」として、図１３の（Ｂ）に示すように、いま考えている所定の番組記録区間の中で「ａａｂｂ」の区間を探していき、区間Ａ１、区間Ａ２が「ａａｂｂ」に一致して、検索できたことを示している。 FIG. 13 shows a case where the above-mentioned program genre is a news program. As shown in FIG. 13A, the reference pattern (reference character sequence) is “aabb” described above, and FIG. As shown, the section “aabb” is searched for in the predetermined program recording section under consideration, and the sections A1 and A2 match “aabb”, indicating that the search has been completed.

そして、図１３の（Ｂ）に示すように、例えば、探すことができた「ａａｂｂ」区間の最初の位置ｐ１、ｐ３、最後の位置ｐ２、ｐ４を所定設定位置として設定し、後で説明するプレイリストの位置情報データとして所定の処理を行う。 Then, as shown in FIG. 13B, for example, the first positions p1 and p3 and the last positions p2 and p4 of the searched “aabb” section are set as predetermined setting positions, which will be described later. A predetermined process is performed as position information data of the playlist.

例えば、要約再生モードの場合には、上記設定位置ｐ１〜ｐ２、ｐ３〜ｐ４を再生するように再生制御処理を行う。 For example, in the summary playback mode, playback control processing is performed so that the set positions p1 to p2 and p3 to p4 are played back.
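
The search for a reference pattern such as “aabb” can be sketched as follows. This is only a minimal illustration, assuming that each PU has already been assigned a one-letter meaning and that the program is represented as a string with one letter per PU in recording order; the function name `find_pattern_sections`, the sample string, and the choice to report non-overlapping matches are assumptions made for this example and do not appear in the description.

```python
from typing import List, Tuple


def find_pattern_sections(meanings: str, pattern: str) -> List[Tuple[int, int]]:
    """Return (first_pu_index, last_pu_index) for each non-overlapping match.

    `meanings` holds one letter per PU; `pattern` is the reference
    character sequence (for example "aabb").
    """
    sections = []
    i = 0
    while i + len(pattern) <= len(meanings):
        if meanings[i:i + len(pattern)] == pattern:
            sections.append((i, i + len(pattern) - 1))
            i += len(pattern)  # assumption: matches are not allowed to overlap
        else:
            i += 1
    return sections


# Hypothetical meaning string for one recorded program.
program = "caabbdcaabbc"
sections = find_pattern_sections(program, "aabb")
# Each tuple plays the role of (p1, p2) or (p3, p4) in FIG. 13B,
# expressed here as PU indices rather than time positions.
print(sections)
```

The first and last PU index of each match correspond to the set positions that would be stored as playlist position information or used as chapter points.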

チャプター設定など、所定時点設定（所定位置設定）処理の場合には、ｐ１、ｐ２、ｐ３、ｐ４の各時点、又は、それら各点の所定の近傍の位置をその設定位置として所定の処理を行うことが考えられる。 In the case of predetermined time point setting (predetermined position setting) processing such as chapter setting, it is conceivable to perform the predetermined processing using each of the time points p1, p2, p3, and p4, or a position in a predetermined vicinity of each of these points, as the set position.

このように、所定の特徴データから所定のＰＵの意味を持つと判定し、その所定の意味を持つとそのＰＵに設定し、それら意味が判定して設定されたＰＵから意味の接続関係を想定して、所定の意味に応じた所定数のＰＵの接続や所定数のＰＵの集合を想定して処理を行うことができる。 In this way, it is determined that the meaning of the predetermined PU is determined from the predetermined feature data, and if it has the predetermined meaning, it is set to the PU, and the meaning connection relation is assumed from the PU set by determining the meaning. Thus, processing can be performed assuming a connection of a predetermined number of PUs according to a predetermined meaning or a set of a predetermined number of PUs.

図６の（Ｂ）に示した規則２処理の記述方法の一例では、キーフレーム（重要フレーム）と想定し、検索したい文字系列を（ａａｂｂ）のようにし、その後に、重み付け係数として１００を設定している。その後のＰｓ（ｔｓ，ｓ４），Ｐ１（ｔ１，ｓ４），Ｐｅ（ｔｅ，ｓ３）は、先に説明した時間補正関数であって、この例の場合は、図１４に示すように、番組の後半部で徐々に重要度が減少するような関数となっている。 In the example of the rule 2 description format shown in FIG. 6B, the character sequence to be searched for, which is assumed to be a key frame (important frame), is written as (aabb), followed by a weighting coefficient of 100. The subsequent Ps (ts, s4), P1 (t1, s4), and Pe (te, s3) define the time correction function described earlier; in this example, as shown in FIG. 14, it is a function whose importance gradually decreases in the second half of the program.

この図１４のような時間補正関数の場合には、番組の前半部を重点的に視聴したいような場合に適していると考えられる。 In the case of the time correction function as shown in FIG. 14, it is considered that the time correction function is suitable for the case where it is desired to watch the first half of the program with priority.

ここで、図６の（Ｂ）に示した規則２処理の記述においては、記述するデータの出現パターン（意味）の表現方法の一例で、意味として、ａ，ｂ，ｃ・・・などとしたが、その否定としてＡ，Ｂ，Ｃ，・・・、また、ワイルドカードとして、＊などを用いることも考えられる。 Here, the description of the rule 2 process shown in FIG. 6B is an example of a method for expressing the appearance pattern (meaning) of the data to be described, and the meaning is a, b, c,. However, A, B, C,... Can be used as the negation, and * can be used as a wild card.

この図６の（Ｂ）に示した規則２処理の記述において、ニュース番組のジャンルの場合の一として、例えば、（Ａｂｂ）とした場合、には、Ａは、「アナウンサーのシーン」以外、ｂは、「現場のシーン」ということになり、「アナウンサーのシーン」以外に「現場のシーン」が２つ続く場合を検出することになる。 In the rule 2 description shown in FIG. 6B, taking the news program genre as an example, the pattern (Abb) means that A is any scene other than an “announcer scene” and b is an “on-site scene”, so the pattern detects a scene other than an “announcer scene” followed by two consecutive “on-site scenes”.
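
A pattern notation with negation and a wildcard could be matched as in the sketch below. It assumes, purely for illustration, that a lowercase letter matches exactly that meaning, that an uppercase letter matches any meaning except its lowercase counterpart, and that `*` matches exactly one PU of any meaning; the description does not state how many PUs a wildcard covers, and the names `pu_matches` and `find_matches` are hypothetical.

```python
from typing import List


def pu_matches(symbol: str, meaning: str) -> bool:
    """Check one pattern symbol against the meaning letter of one PU."""
    if symbol == "*":          # wildcard: any meaning (assumed to cover one PU)
        return True
    if symbol.isupper():       # negation: anything except the lowercase meaning
        return meaning != symbol.lower()
    return meaning == symbol   # plain symbol: exact meaning


def find_matches(meanings: str, pattern: str) -> List[int]:
    """Return the start index of every position where the pattern matches."""
    n = len(pattern)
    return [
        i
        for i in range(len(meanings) - n + 1)
        if all(pu_matches(p, m) for p, m in zip(pattern, meanings[i:i + n]))
    ]


# Hypothetical news program: a = announcer scene, b = on-site scene, c = other.
program = "acbbabb"
print(find_matches(program, "Abb"))  # non-announcer scene, then two on-site scenes
print(find_matches(program, "*bb"))  # any scene, then two on-site scenes
```

With the sample string, `Abb` matches only where the scene before the two on-site scenes is not an announcer scene, whereas `*bb` also accepts the announcer scene in that position.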

ここで、評価値の演算方法の一例は、以下のような処理であった。 Here, an example of the evaluation value calculation method is the following processing.

例えば、いま考えている再生ユニット群を（ａｂｃ）として、上記（１）式により、ａ、ｂ、ｃの各検出割合（ｖａｌｕｅ）と重み付け係数が以下の表５に示すような場合について考える。 For example, let the reproduction unit group under consideration be (abc), and consider the case where, according to equation (1) above, the detection ratios (value) of a, b, and c and the weighting coefficients are as shown in Table 5 below.

ここで、１００を掛けたのは、便宜上、割合（％）を考慮したためであるが、上記したように、評価値のスケールは、所定の評価処理が行えて、かつ所定の演算処理上問題なければ良いので、オーバーフローなど処理上問題なければ、割合を考慮しなくても良い場合も考えられる。 Here, the multiplication by 100 is for convenience, so that the value can be read as a percentage (%). As described above, however, the scale of the evaluation value only needs to allow the predetermined evaluation processing without causing problems in the predetermined arithmetic processing, so if there is no processing problem such as overflow, the percentage scaling may be omitted.

（規則２処理における再生ユニット群の変形例） (Modification of the playback unit group in rule 2 processing)

ここで、規則処理２における、ある「意味」の再生ユニットが複数接続した再生ユニット群を１つの意味群ユニットとし、意味群ユニットが複数接続する場合について考える。 Here, consider the case where a playback unit group in rule 2 processing, formed by connecting a plurality of playback units each having a certain “meaning”, is treated as one semantic group unit, and a plurality of such semantic group units are connected.

上記規則１処理では、１つの再生ユニットについて考えた。それは、特徴データから、検出する「意味」にもっと確からしいであろう再生ユニットを見つけるためであった。 In the rule 1 process, one playback unit was considered. It was to find out from the feature data a playback unit that would be more likely to detect "meaning".

この考えをさらに発展させて、再生ユニット群、すなわち、この規則２処理で考えた、意味のつながりの再生ユニットを１つのかたまりとして、そのかたまり同士を接続した区間を検出することも考えられる。 Further developing this idea, it is also possible to detect a section in which the units are connected by using the unit of playback units, that is, the units having the meaning connected in the rule 2 process as one unit.

例えば、上記（ａａｂｂ）をＧａ１とし、（Ｇａ１Ｇａ１）のような接続を考えることができる。この場合に、Ｇａ１の評価値を考えて、上記規則１と類似した処理を行うことが考えられる。 For example, assuming that (aabb) is Ga1, a connection such as (Ga1Ga1) can be considered. In this case, considering the evaluation value of Ga1, it is conceivable to perform processing similar to the above rule 1.

その場合の評価値の演算方法として、例えば、各意味の再生ユニットの評価値の和の平均や、各意味の再生ユニットの評価値の積の平均などを考えることができる。 As an evaluation value calculation method in that case, for example, an average of sums of evaluation values of the reproduction units of the respective meanings, an average of products of the evaluation values of the reproduction units of the respective meanings, or the like can be considered.

例えば、ａの評価値を８０、ｂの評価値を６０とした場合に、Ｇａ１の評価値は、加算の場合は、 For example, when the evaluation value of a is 80 and the evaluation value of b is 60, the evaluation value of Ga1 in the additive case is
（８０＋８０＋６０＋６０）／４＝７０ (80 + 80 + 60 + 60) / 4 = 70
で、７０を評価値として考えることができる。 so 70 can be taken as the evaluation value.
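
The group evaluation value can be worked through in a few lines. The additive case below reproduces the arithmetic given in the text. The description does not spell out what “average of products” means, so the second function uses a geometric mean as one possible reading; that interpretation, and the function names, are assumptions for this sketch only.

```python
from math import prod
from typing import Sequence


def group_value_additive(values: Sequence[float]) -> float:
    """Arithmetic mean of the evaluation values of the PUs in the group."""
    return sum(values) / len(values)


def group_value_geometric(values: Sequence[float]) -> float:
    """Geometric mean; an assumed reading of the 'product' variant."""
    return prod(values) ** (1.0 / len(values))


# Ga1 = (aabb) with a = 80 and b = 60, as in the text.
ga1 = [80, 80, 60, 60]
print(group_value_additive(ga1))   # (80 + 80 + 60 + 60) / 4
print(group_value_geometric(ga1))  # fourth root of 80 * 80 * 60 * 60
```

Under the geometric reading a single low-valued PU pulls the group value down more strongly than in the additive case, which may or may not be the behavior intended by the description.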

規則３処理の場合 In the case of rule 3 processing

通常は、図１５の（Ａ）に示すように、上記規則２処理までで良いと考えられるが、複数の番組対して特徴データを設けた場合に、例えば、番組毎に時間的重み付け処理を行いたい場合は、さらに規則処理として、図１５の（Ｂ）に示すように、規則３処理を設けることも考えられる。 Normally, as shown in FIG. 15A, processing up to the rule 2 process described above is considered sufficient. However, when feature data is provided for a plurality of programs and, for example, temporal weighting is to be performed for each program, a rule 3 process may be provided as a further rule process, as shown in FIG. 15B.

その一例として、ニュース番組（ｎｅｗｓ）とスポーツ番組（ｓｐｏｒｔｓ）に対して重み付けと、時間補正を行う場合の例を図６の（Ｃ）に示す。 As an example, FIG. 6C shows an example in which weighting and time correction are performed on news programs (news) and sports programs (sports).

図６の（Ｃ）に示す例では、ニュース番組は、１００％の重み付けを行い、時間補正関数として開始点Ｐｓ（ｔｓ，ｓ４）、変化点Ｐ１（ｔ１，ｓ４）、終点Ｐｅ（ｔｅ，ｓ３）とする補正を行い、スポーツ番組に対しては、７０％の重み付けを行い、時間補正関数として開始点Ｐｓ（ｔｓ，ｓ４）、変化点Ｐ１（ｔ１，ｓ４）、終点Ｐｅ（ｔｅ，ｓ３）とする補正を行う。 In the example shown in FIG. 6C, the news program is weighted 100%, and the start point Ps (ts, s4), the change point P1 (t1, s4), and the end point Pe (te, s3) are used as time correction functions. ), Weighting 70% for sports programs, and starting points Ps (ts, s4), change points P1 (t1, s4), end points Pe (te, s3) as time correction functions To correct.
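
The per-genre weighting and time correction might be combined as in the sketch below. It assumes that the time correction function is a piecewise-linear curve through the points Ps, P1, and Pe, and that the genre weight and the correction are applied multiplicatively to an evaluation value; neither assumption is confirmed in this part of the description. The numeric levels standing in for s4 and s3, the time values, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (time, correction level)


def time_correction(t: float, points: List[Point]) -> float:
    """Piecewise-linear interpolation through points sorted by time."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]  # after the end point, hold the final level


def corrected_value(value: float, genre_weight: float, t: float,
                    points: List[Point]) -> float:
    """Assumed combination: evaluation value x genre weight x time correction."""
    return value * genre_weight * time_correction(t, points)


# Hypothetical curve: Ps=(0, 1.0), P1=(30, 1.0), Pe=(60, 0.5), times in minutes.
curve = [(0.0, 1.0), (30.0, 1.0), (60.0, 0.5)]
GENRE_WEIGHT = {"news": 1.00, "sports": 0.70}  # 100 % and 70 % from FIG. 6C

print(corrected_value(80, GENRE_WEIGHT["news"], 10.0, curve))
print(corrected_value(80, GENRE_WEIGHT["sports"], 45.0, curve))
```

With this curve the correction stays flat until the change point P1 and then falls linearly, which matches the description of a function whose importance decreases in the second half of the program.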

図３で説明した処理内容を図１６を参照して更に説明する。 The processing content described in FIG. 3 will be further described with reference to FIG.

図１６の（Ａ）のような、規則１処理により、各種所定の特徴データに基づいて、各シーンは幾つかの意味付け処理が行われる。 As shown in FIG. 16A, according to the rule 1 process, each scene is subjected to some meaning process based on various predetermined feature data.

ここで、規則２によって意味付けされた各シーンには、図１６の（Ｂ）のように評価値が所定の処理により設定される。 Here, an evaluation value is set to each scene given by rule 2 by a predetermined process as shown in FIG.

例えば、要約再生モードの場合では、ユーザーの所望する時間ｔ１で再生する場合に、上記評価値の一番高いシーン（画像）から選択していき、できるだけｔ１に近くなるように評価値の高いシーンから選択して、その選択した区間を再生するように、その位置情報を設定する。 For example, in the summary playback mode, when playback is performed at a time t1 desired by the user, a scene (image) having the highest evaluation value is selected from the above scenes, and a scene with a high evaluation value is as close to t1 as possible. The position information is set so that the selected section is reproduced.

設定した位置情報は所定のデータメモリーに記憶し、再生制御を行う際に、位置情報を読み出して、所定区間の再生を行っていく。 The set position information is stored in a predetermined data memory, and when performing reproduction control, the position information is read and reproduction of a predetermined section is performed.

そして、各区間を順次再生する（スキップ再生）することで、所定の要約再生（ダイジェスト再生）を行う。 Then, predetermined digest reproduction (digest reproduction) is performed by sequentially reproducing each section (skip reproduction).

図１６の（Ｃ）に示す例では、全記録時間を例えば６０分とし、要約再生を１５分で行いたいと仮定して、評価値が７０以上のＰＵを選択して、１５分にやや満たない場合に、評価値６０のＰＵｎ＋８の区間を選択して、所望の再生時間１５分にできるだけ近くなるように処理を行っている。 In the example shown in FIG. 16C, assuming that the total recording time is 60 minutes, for example, and that summary playback is to be performed in 15 minutes, a PU with an evaluation value of 70 or more is selected, and is slightly less than 15 minutes. If not, the section of PUn + 8 with an evaluation value of 60 is selected, and processing is performed as close as possible to the desired reproduction time of 15 minutes.

このように評価値の大きい所定ＰＵ区間を選択していき、所定の再生時間にできるだけ近くなるように、ＰＵ区間を選択していく。 In this way, the predetermined PU section having a large evaluation value is selected, and the PU section is selected so as to be as close as possible to the predetermined reproduction time.

所望の再生時間Ｔｍに対して所定の許容範囲ｔｃ内に再生時間Ｔがあるように、 So that the reproduction time T falls within a predetermined allowable range tc of the desired reproduction time Tm, that is, so that
Ｔｍ−ｔｃ＜Ｔ＜Ｔｍ＋ｔｃ Tm - tc < T < Tm + tc
となるように、評価値に基づいて所定のＰＵ区間を選択する。 holds, predetermined PU sections are selected based on the evaluation values.
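
The selection of PU sections toward a target digest length can be illustrated with a simple greedy procedure. This is a sketch under stated assumptions, not the method defined by the description: PUs are taken in descending order of evaluation value, a PU is skipped if adding it would reach or exceed Tm + tc, and selection stops once the total exceeds Tm - tc. The function name `select_pus` and the sample durations and values are hypothetical.

```python
from typing import List, Tuple


def select_pus(pus: List[Tuple[float, float]], tm: float,
               tc: float) -> Tuple[List[int], float]:
    """Greedily pick PUs by evaluation value so that Tm - tc < T < Tm + tc.

    `pus` is a list of (duration, evaluation_value) in recording order.
    Returns the selected PU indices in recording order and the total time T.
    """
    order = sorted(range(len(pus)), key=lambda i: -pus[i][1])
    selected: List[int] = []
    total = 0.0
    for i in order:
        duration = pus[i][0]
        if total + duration < tm + tc:   # stay below the upper bound
            selected.append(i)
            total += duration
        if total > tm - tc:              # inside the allowed window: stop
            break
    return sorted(selected), total


# Hypothetical program: (duration in minutes, evaluation value) per PU.
pus = [(4.0, 90), (3.0, 70), (6.0, 40), (3.0, 80), (2.0, 60), (3.0, 75)]
indices, total = select_pus(pus, tm=15.0, tc=1.0)
print(indices, total)
```

In this sample the PUs with evaluation values of 70 or more add up to 13 minutes, so the PU with value 60 is added to bring the total to 15 minutes, loosely mirroring the situation described for FIG. 16C. A greedy pass does not guarantee that the window can always be reached; a real implementation would need a fallback for that case.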

また、図１６の（Ｄ）に示すように、例えば、上記意味付けされた評価値の高い区間の最初（又はその近傍）、評価値の高い区間の最後（又はその近傍）に所定位置（チャプター）を設定することで、その区間を編集処理したり、スキップ再生の一時停止処理、繰り返し再生処理など、所定の操作を行うことに利用できる。 Further, as shown in FIG. 16D, for example, a predetermined position (chapter) is set at the beginning (or the vicinity thereof) of the above-signified high evaluation value section or at the end (or the vicinity thereof) of the high evaluation value section. ) Can be used to perform predetermined operations such as editing processing of the section, skip playback pause processing, repeated playback processing, and the like.

（２）ブロック構成例 (2) Block Configuration Example

ここでは、簡単のため、記録する画像音声情報信号は、放送番組のデータとし、ＭＰＥＧ（Moving Picture Experts Group）による所定の帯域圧縮処理が行なわれるものとする。なお、その他の帯域圧縮信号処理としてウェーブレット変換、フラクタル解析信号処理その他などを用いた場合も考えられる。例えば、下記の説明で画像データのＤＣＴ係数は、ウェーブレット変換の場合には多重解像度解析におけるか解析係数などに相当し同様の信号処理を行うことも考えられる。 Here, for simplicity, the video and audio information signal to be recorded is assumed to be broadcast program data that undergoes predetermined band compression processing by MPEG (Moving Picture Experts Group). Other band compression signal processing, such as wavelet transform or fractal analysis signal processing, may also be used. For example, the DCT coefficients of the image data in the following description correspond, in the case of the wavelet transform, to the analysis coefficients of the multi-resolution analysis, and similar signal processing may be performed on them.

２．１ブロック構成例１ 2.1 Block configuration example 1

本発明を適用した記録再生装置３０の全体ブロック構成例を図１７に示す。 FIG. 17 shows an example of the overall block configuration of a recording / reproducing apparatus 30 to which the present invention is applied.

ここでは、簡単のためテレビ放送を受信して、受信した放送番組を記録することを考える。 Here, for the sake of simplicity, it is assumed that a television broadcast is received and the received broadcast program is recorded.

２．１．１記録信号処理系 2.1.1 Recording signal processing system

この記録再生装置３０では、受信アンテナ系１と受信系２により所定の放送番組が受信され、音声信号は音声Ａ／Ｄ変換処理系３で所定のサンプリング周波数、所定の量子化ビット数で所定のＡ／Ｄ変換信号処理が行われ、その後音声エンコーダー処理系４に入力される。 In this recording / reproducing apparatus 30, a predetermined broadcast program is received by the receiving antenna system 1 and the receiving system 2. The audio signal undergoes predetermined A / D conversion signal processing in the audio A / D conversion processing system 3 at a predetermined sampling frequency and a predetermined number of quantization bits, and is then input to the audio encoder processing system 4.

音声エンコーダー処理系４では、例えばＭＰＥＧオーディオやＡＣ３オーディオ（ドルビーＡＣ３、又はAudio Code number 3）などの所定の帯域圧縮方式で信号処理が行われる。 In the audio encoder processing system 4, signal processing is performed by a predetermined band compression method such as MPEG audio or AC3 audio (Dolby AC3 or Audio Code number 3).

同様に、映像信号は映像Ａ／Ｄ変換処理系８で所定のサンプリング周波数、所定の量子化ビット数で所定のＡ／Ｄ変換信号処理が行われ、その後、画像エンコーダー処理系９に入力される。 Similarly, the video signal is subjected to predetermined A / D conversion signal processing at a predetermined sampling frequency and a predetermined number of quantization bits in the video A / D conversion processing system 8, and then input to the image encoder processing system 9. .

画像エンコーダー処理系９は、ＭＰＥＧビデオやウェーブレット変換などの所定の帯域圧縮方式で信号処理が行われる。 The image encoder processing system 9 performs signal processing using a predetermined band compression method such as MPEG video or wavelet transform.

音声エンコーダー処理系４及び画像エンコーダー処理系９で処理された音声データ及び画像データは、多重化処理系５を介して記録処理系６に入力される。 Audio data and image data processed by the audio encoder processing system 4 and the image encoder processing system 9 are input to the recording processing system 6 via the multiplexing processing system 5.

音声信号の特徴抽出を行うため、音声エンコーダー処理系４に入力する信号の一部又は上記所定エンコーダー信号処理における信号処理過程の途中の信号の一部は特徴抽出系１０に入力される。 In order to perform feature extraction of the audio signal, a part of the signal input to the audio encoder processing system 4 or a part of the signal in the process of the signal processing in the predetermined encoder signal processing is input to the feature extraction system 10.

ここで、この図１７に示した記録再生装置３０では、音声エンコーダー処理系４に入力される信号の一部として、音声エンコーダー処理系４から特徴抽出系１０に信号が入力されているが、音声エンコーダー処理系４に入力されると共に特徴抽出系１０に入力されるようにしてもよい。 Here, in the recording / reproducing apparatus 30 shown in FIG. 17, a signal is input from the audio encoder processing system 4 to the feature extraction system 10 as a part of a signal input to the audio encoder processing system 4. It may be inputted to the feature extraction system 10 as well as being inputted to the encoder processing system 4.

同様に映像（画像）信号の特徴抽出を行うため、映像エンコーダー処理系９に入力される信号の一部又は上記所定エンコーダー信号処理における信号処理過程の途中の信号の一部が特徴抽出系１０に入力される。 Similarly, in order to perform the feature extraction of the video (image) signal, a part of the signal input to the video encoder processing system 9 or a part of the signal in the signal processing process in the predetermined encoder signal processing is transferred to the feature extraction system 10. Entered.

ここで、この図１７に示した記録再生装置３０では、映像エンコーダー処理系９に入力される信号の一部として、映像エンコーダー処理系９から特徴抽出系１０に信号が入力されているが、映像エンコーダー処理系９に入力されると共に特徴抽出系１０に入力されるようにしてもよい。 In the recording / reproducing apparatus 30 shown in FIG. 17, the signal is supplied from the video encoder processing system 9 to the feature extraction system 10 as a part of the signal input to the video encoder processing system 9; alternatively, the signal may be input to the feature extraction system 10 in parallel with its input to the video encoder processing system 9.

記録モードにおいて所定区間毎に逐次特徴データは検出され、上記所定のエンコーダー処理がなされた画像音声データと共に所定の記録媒体７の所定の記録領域に記録される。 In the recording mode, the feature data is sequentially detected for each predetermined section, and is recorded in a predetermined recording area of the predetermined recording medium 7 together with the image / sound data subjected to the predetermined encoder processing.

上記特徴抽出データから所定の要約再生（ダイジェスト再生）を行うためのプレイリストデータ生成（９）又はチャプターデータ生成（１１）をプレイリスト・チャプター生成系１９で所定の信号処理を行う。 Playlist data generation (9) or chapter data generation (11) for performing predetermined summary playback (digest playback) from the feature extraction data is performed by the playlist / chapter generation system 19 with predetermined signal processing.

ここで、プレイリストデータ、チャプターデータ生成は、以下のような信号処理プロセス（処理ａ又は処理ｂ）で行うことが考えられる。 Here, it is conceivable that playlist data and chapter data are generated by the following signal processing process (processing a or processing b).

（処理ａ）特徴抽出データを所定メモリー系又はシステムコントローラー系の所定メモリー領域に所定データ量蓄積した後、所定のプレイリストデータの生成処理、所定チャプターデータの生成処理を行う。 (Processing a) After the feature extraction data is accumulated in a predetermined memory area of a predetermined memory system or system controller system, a predetermined playlist data generation process and a predetermined chapter data generation process are performed.

（処理ｂ）画像音声データを記録する記録媒体７に所定の特徴抽出処理を行う毎に逐次特徴データを記録し、所定データ量記録した後、そのデータを再生して、所定プレイリストデータ、所定チャプターデータ生成を行う。 (Process b) Each time the predetermined feature extraction process is performed on the recording medium 7 for recording the image / audio data, the characteristic data is sequentially recorded, and after the predetermined amount of data is recorded, the data is reproduced to obtain the predetermined playlist data, the predetermined Generate chapter data.

上記（処理ａ）の場合、例えば、所定時間長ｔの放送番組を記録することを考えると、その時間長ｔの記録が終了したら、その放送番組におけるすべての所定特徴抽出データが集積されるので、この時点で、時間長ｔのなかで所定要約再生時間ｔｄに対応するキーフレームがどこになるかを決めるプレイリストデータ生成処理を行うことができる。すなわち、この時間長ｔに処理される特徴データを上記メモリー系又は、システムコントローラー系の所定メモリー領域に蓄積（記憶又は記録）しておくことになる。 In the case of the above (Processing a), for example, when recording a broadcast program having a predetermined time length t, when the recording of the time length t is completed, all the predetermined feature extraction data in the broadcast program are accumulated. At this point in time, it is possible to perform playlist data generation processing that determines where the key frame corresponding to the predetermined summary playback time td is within the time length t. That is, the feature data processed for this time length t is accumulated (stored or recorded) in a predetermined memory area of the memory system or system controller system.

上記（処理ｂ）の場合は、上記（処理ａ）の場合と同様で所定時間長ｔ記録した後、所定時間ｔ記録終了したことを検出（検知）して、所定特徴抽出データ再生して所定要約再生時間ｔｄに応じたプレイリストデータ生成処理を開始することになる。 In the case of (Processing b), as in the case of (Processing a), after recording for a predetermined time length t, it is detected (detected) that the recording for a predetermined time t has been completed, and predetermined feature extraction data is reproduced and stored in a predetermined manner. The playlist data generation process corresponding to the summary playback time td is started.

プレイリストデータ生成処理が終了したら、所定の要約再生の動作を行う準備ができたことになり、このプレイリストデータを用いて所定の要約再生（ダイジェスト再生）が行える。 When the playlist data generation processing is completed, it is ready to perform a predetermined summary playback operation, and a predetermined summary playback (digest playback) can be performed using this playlist data.

上記所定の特徴抽出データは、プレイリストデータがすでに生成されているので、もうプレイリストデータを生成しないとういう場合には消去するように信号処理を行うことも考えられるが、プレイリストデータを修正するなど、データの生成を再度行う場合も考えられるので、特徴抽出データはそのまま記録して残しても良い。 Since the playlist data has already been generated for the predetermined feature extraction data, it may be possible to perform signal processing to delete the playlist data if it is no longer generated. For example, the data generation may be performed again, so that the feature extraction data may be recorded and left as it is.

上記特徴抽出データは、システムコントローラー系２０を介して、所定区間の特徴データ蓄積の後、プレイリスト・チャプター生成系１９で所定のダイジェスト再生（要約再生）用プレイリストデータを生成する。 As for the feature extraction data, after the feature data is accumulated in a predetermined section via the system controller system 20, the playlist / chapter generation system 19 generates predetermined playlist data for digest playback (summary playback).

上記生成されたプレイリストデータは、記録処理系６において所定の記録処理がなされた後、記録媒体７の所定の記録領域に記録される。 The generated playlist data is recorded in a predetermined recording area of the recording medium 7 after being subjected to a predetermined recording process in the recording processing system 6.

ここで、プレイリストデータは、所定の記録された区間をスキップ再生するための、所定再生区間毎の再生開始点情報と再生終了点情報の対となるデータから構成され、例えば、所定区間毎の再生開始フレーム番号と再生終了フレーム番号のデータ対などからなる。 Here, the playlist data is composed of data that is a pair of playback start point information and playback end point information for each predetermined playback section for skip playback of a predetermined recorded section. It consists of a data pair of a playback start frame number and a playback end frame number.

上記プレイリストデータは、その記録したプログラムにおける、所定の必要な区間をスキップ再生することでダイジェスト再生（要約再生）を行う処理のために使用するので、上記のようにフレームデータの他に、タイムコードデータやＭＰＥＧにおけるＰＴＳ（Presentation Time Stamp）、ＤＴＳ（Decode Time Stamp）などのタイムスタンプデータでも良い。 The playlist data is used for the process of performing digest playback (summary playback) by skip-playing a predetermined required section in the recorded program. Therefore, in addition to the frame data as described above, Code data or time stamp data such as PTS (Presentation Time Stamp) or DTS (Decode Time Stamp) in MPEG may be used.
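
A playlist of this kind can be represented as a list of start and end pairs. The sketch below is only one possible layout: it assumes inclusive frame numbers, merges sections that touch or overlap so that skip playback does not jump needlessly, and converts a frame number to a PTS value assuming a 90 kHz PTS clock and a frame rate of 30000/1001 frames per second (3003 ticks per frame). The frame rate, the class name `PlaylistEntry`, and the helper names are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PlaylistEntry:
    start_frame: int  # first frame of the section to play
    end_frame: int    # last frame of the section to play (inclusive)


def build_playlist(sections: Iterable[Tuple[int, int]]) -> List[PlaylistEntry]:
    """Sort sections by start frame and merge those that touch or overlap."""
    entries: List[PlaylistEntry] = []
    for start, end in sorted(sections):
        if entries and start <= entries[-1].end_frame + 1:
            last = entries[-1]
            entries[-1] = PlaylistEntry(last.start_frame, max(last.end_frame, end))
        else:
            entries.append(PlaylistEntry(start, end))
    return entries


def frame_to_pts(frame: int, ticks_per_frame: int = 3003) -> int:
    """Frame number to 90 kHz PTS ticks, assuming 30000/1001 frames per second."""
    return frame * ticks_per_frame


# Hypothetical selected sections, given out of order.
playlist = build_playlist([(300, 599), (0, 149), (600, 899)])
for entry in playlist:
    print(entry, frame_to_pts(entry.start_frame), frame_to_pts(entry.end_frame))
```

Storing frame numbers and deriving time stamps on demand is one design choice; the description equally allows storing time codes or PTS / DTS values directly.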

上記プレイリストデータは、上記のように放送番組のような画像音声情報データを記録する記録モード時で所定プログラム記録終了後に所定の生成処理を行う他に、後で説明する再生モードにおいて、特徴抽出データを用いて所定の処理を行うようにしても良い。 In the recording mode for recording image / audio information data such as a broadcast program as described above, the playlist data is subjected to a feature generation process in a playback mode, which will be described later, in addition to performing a predetermined generation process after recording a predetermined program. Predetermined processing may be performed using data.

図１７において、例えば、すでにＭＰＥＧなど所定のエンコード処理がなされた画像、音声データを記録することを想定すると、音声エンコード処理系４、画像エンコード処理系９でエンコード処理を行う必要はなく、直接、多重化処理系５に入力し、記録処理系６で記録処理を行い記録媒体に記録することが考えられる。 In FIG. 17, for example, assuming that an image and audio data that have been subjected to a predetermined encoding process such as MPEG are recorded, it is not necessary to perform the encoding process in the audio encoding processing system 4 and the image encoding processing system 9; It is conceivable that the data is input to the multiplexing processing system 5, recorded by the recording processing system 6 and recorded on a recording medium.

ここで、直接デジタル画像、音声データが入力して記録されるか、受信系２によりアナログ信号が入力し所定のエンコード処理の後に記録されるかは、システムコントロール系２０で検出することができ、このように入力系統違いに応じて、上記所定の画像、音声特徴データ抽出処理を記録モードの時に自動的に行うか、記録終了後に行うかを決めるようにする、又はデジタル画像、音声データが入力する場合には、所定のエンコード処理系をデータが通らないことから所定のデータ構造解析処理が行われないことを考え、記録終了後に行うようにすることも考えられる。 Here, the system control system 20 can detect whether digital image and audio data are input directly and recorded, or whether an analog signal is input through the receiving system 2 and recorded after the predetermined encoding process. Depending on this difference in input system, it can be decided whether the predetermined image and audio feature data extraction processing is performed automatically during the recording mode or after recording is completed. Alternatively, when digital image and audio data are input, the data does not pass through the predetermined encoding processing system and the predetermined data structure analysis processing is therefore not performed, so the extraction may be carried out after recording is completed.

記録モードにおいて、上記アナログ入力系かデジタル入力系は、ユーザー入力Ｉ／Ｆ系２１を介してユーザーの所定操作によって設定することもできる。 In the recording mode, the analog input system or the digital input system can be set by a user's predetermined operation via the user input I / F system 21.

また、図１７で、音声エンコーダー処理系４又は音声Ａ／Ｄ変換処理系３、映像エンコーダー処理系９又は画像Ａ／Ｄ変換処理系８からの信号と、所定エンコード処理されたデジタル画像、音声データを直接システムコントローラー系２０で検出することで自動的に検出することもできる。 In FIG. 17, a signal from the audio encoder processing system 4 or the audio A / D conversion processing system 3, the video encoder processing system 9 or the image A / D conversion processing system 8, and a digital image and audio data subjected to predetermined encoding processing. Can be automatically detected by directly detecting by the system controller system 20.

所定エンコードされたデジタルデータが検出され、音声エンコーダー系４又は音声Ａ／Ｄ変換処理系３、映像エンコーダー系９又は画像Ａ／Ｄ変換処理系８でデータが検出されない場合は、所定エンコード処理されたデジタル画像、音声データが入力していると判定できる。 If predetermined encoded digital data is detected and no data is detected by the audio encoder system 4 or the audio A / D conversion processing system 3, the video encoder system 9 or the image A / D conversion processing system 8, the predetermined encoding process is performed. It can be determined that digital image and audio data are input.

所定エンコードされたデジタルデータが検出されないで、音声エンコーダー系４又は音声Ａ／Ｄ変換処理系３、映像エンコーダー系９又は画像Ａ／Ｄ変換処理系８からのデータがシステムコントローラー系２０で検出される場合は、アナログ入力と判定できる。 When no predetermined encoded digital data is detected and data from the audio encoder system 4 or the audio A / D conversion processing system 3, and from the video encoder system 9 or the image A / D conversion processing system 8, is detected by the system controller system 20, the input can be determined to be analog.

アナログ入力とエンコード処理されたデジタルデータが両方検出される場合は、例えば、受信系２からのアナログ入力信号を初期設定（デフォルト設定）として所定の記録処理を行うことが考えられる。 When both analog input and encoded digital data are detected, for example, it is conceivable to perform a predetermined recording process with an analog input signal from the receiving system 2 as an initial setting (default setting).
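
The three detection cases above amount to a small decision table, sketched below. The two boolean inputs stand for what the system controller system 20 observes; how those observations are obtained is not modeled here, and the enum, the function name, and the handling of the case where nothing is detected are assumptions for illustration.

```python
from enum import Enum


class InputSource(Enum):
    ANALOG = "analog"    # signal arriving through the receiving system 2
    DIGITAL = "digital"  # already-encoded digital image and audio data
    NONE = "none"        # nothing detected (case not covered by the description)


def detect_input(encoded_digital_detected: bool,
                 analog_path_detected: bool) -> InputSource:
    """Decide the input source from the two detection results."""
    if encoded_digital_detected and analog_path_detected:
        return InputSource.ANALOG  # both present: analog is the default setting
    if encoded_digital_detected:
        return InputSource.DIGITAL
    if analog_path_detected:
        return InputSource.ANALOG
    return InputSource.NONE


print(detect_input(True, False))
print(detect_input(True, True))
```

The result could then drive the choice between extracting features during recording (analog input, where the encoder's DCT or subband data is available) and extracting them after recording ends (digital input).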

上記特徴抽出処理は、例えば画像のＤＣＴデータなどを用いるので、所定のエンコード処理がなされる場合には、通常の記録処理のために行うＤＣＴ処理を特徴抽出処理として兼用することができる。音声の場合には、所定エンコード処理におけるサブバンド処理データを用いることを考えると、所定のエンコード処理がなされる場合には、通常の記録処理のために行うサブバンド処理を特徴抽出処理として兼用することができる。 Since the feature extraction process uses, for example, DCT data of an image or the like, when a predetermined encoding process is performed, the DCT process performed for a normal recording process can be used as the feature extraction process. In the case of audio, considering the use of subband processing data in the predetermined encoding process, when the predetermined encoding process is performed, the subband process performed for the normal recording process is also used as the feature extraction process. be able to.

しかし、上記のように、エンコード処理されたデジタルデータが直接入力する場合は、エンコード処理を行う必要がないので、このデータを解析してＤＣＴなどデータを取り出すことが必要になり、処理の負荷が生じることになる。 However, as described above, when the encoded digital data is directly input, it is not necessary to perform the encoding process. Therefore, it is necessary to analyze the data and extract data such as DCT, which increases the processing load. Will occur.

そこで、必要に応じて記録終了後に特徴抽出処理を行うようにすることも考えられる。その他、記録終了後に、特徴抽出処理を行う場合として、上記アナログ入力の場合でも、信号処理系の負荷の具合に応じて、所定の記録が終了したら自動的に行うようにすることも考えられる。 Therefore, it may be considered to perform the feature extraction process after the recording is completed as necessary. In addition, as a case where the feature extraction processing is performed after the recording is finished, even in the case of the analog input, it may be automatically performed after the predetermined recording is finished depending on the load of the signal processing system.

例えば、図２０に示すように、特徴抽出処理はソフトウェア処理で行うこともできるので、システムコントローラー系の性能によっては記録モードの各所定信号処理と同時に行えず、所定の記録処理が終了してから行うことも考えられる。また、システムコントローラー系２０はＣＰＵ、ＤＳＰ（デジタルシグナルプロセッサ）、その他各種プロセッサなどで構成することが考えられるが、性能が高いほど高価なので上記のように処理能力に応じて、特徴抽出処理を記録処理と同時に行うか、終了後に行うかを決めることも考えられる。 For example, as shown in FIG. 20, feature extraction processing can also be performed by software processing. Depending on the performance of the system controller system, it cannot be performed simultaneously with each predetermined signal processing in the recording mode, and after the predetermined recording processing ends. It is possible to do it. The system controller system 20 may be composed of a CPU, a DSP (digital signal processor), and other various processors. However, the higher the performance, the higher the cost, so the feature extraction processing is recorded according to the processing capability as described above. It is also possible to decide whether to perform it simultaneously with processing or after processing.

上記特徴抽出処理を行う所定記録モード終了後としては、例えば、所定のタイマー記録動作終了後や、通常、ユーザーがその装置を動作させていないと想定できる夜中に、上記所定の特徴抽出処理を行うなどが考えられる。その場合に、例えば、装置が動作している時刻をシステムコントローラー系２０内の所定メモリ手段により記憶して、所定の学習処理により、適宜、特徴抽出処理する時刻を自動設定したりすることも考えられる。 After completion of the predetermined recording mode in which the feature extraction processing is performed, for example, the predetermined feature extraction processing is performed after the end of a predetermined timer recording operation or at night when it can be assumed that the user is not normally operating the device. And so on. In that case, for example, it is also conceivable that the time at which the apparatus is operating is stored in a predetermined memory means in the system controller system 20, and the time for feature extraction processing is automatically set appropriately by a predetermined learning process. It is done.

また、記録再生など通常の動作させていない時間がシステムコントローラー系２０で検出される場合には、その動作させていない間に上記所定の特徴抽出処理を行うことも考えられる。その場合に所定のデータすべてが処理されない場合も想定されるが、処理途中の場所をシステムコントローラー系２０内の所定メモリー手段に記憶しておき、装置が記録再生など通常動作していないことを検出して、処理できる時間があると判定されたら、途中の続きから所定の信号処理を行うことも考えられる。 In addition, when the system controller system 20 detects a time during which normal operation such as recording / reproduction is not performed, the predetermined feature extraction process may be performed while the system controller 20 is not operating. In this case, it is assumed that not all of the predetermined data is processed, but the location in the middle of processing is stored in a predetermined memory means in the system controller system 20 to detect that the apparatus is not operating normally such as recording and reproduction. Then, if it is determined that there is a time that can be processed, it may be possible to perform predetermined signal processing from the middle of the process.
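
Resuming feature extraction across idle periods can be sketched as a loop that records how far it has progressed. The example assumes that the recorded data can be divided into independently processable units and that an idle check is available; the dictionary used for the saved state, the callback names, and the toy extraction function are all hypothetical stand-ins for the memory means and signal processing in the system controller system 20.

```python
from typing import Any, Callable, Dict, Sequence


def extract_while_idle(units: Sequence[Any], state: Dict[str, Any],
                       is_idle: Callable[[], bool],
                       extract: Callable[[Any], Any]) -> bool:
    """Process units from the saved position for as long as the device is idle.

    `state` keeps 'next' (index of the next unprocessed unit) and 'results',
    so a later call continues where the previous one stopped.
    Returns True once every unit has been processed.
    """
    while state["next"] < len(units):
        if not is_idle():
            return False  # interrupted: progress stays saved in `state`
        state["results"].append(extract(units[state["next"]]))
        state["next"] += 1
    return True


# Hypothetical run: the device stays idle for two units, then is used again.
units = [1, 2, 3, 4]
state = {"next": 0, "results": []}
idle_flags = iter([True, True, False])
first = extract_while_idle(units, state, lambda: next(idle_flags), lambda u: u * 10)
print(first, state)

# Later idle period: processing continues from the stored position.
second = extract_while_idle(units, state, lambda: True, lambda u: u * 10)
print(second, state)
```

Checking the idle condition before each unit keeps the interruption latency to one unit of work; the appropriate unit size would depend on the processor performance discussed above.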

2.1.2 Playback-side processing
(Normal playback mode operation)
Next, the playback signal processing in the recording/playback apparatus 30 shown in FIG. 17 will be described.

First, the operation in the normal playback mode will be described.

When the normal playback mode is selected through the user input I/F system 21, the predetermined recorded data, such as the image/audio information data and the feature extraction data, is read from the recording medium 7 and subjected to predetermined playback processing in the playback processing system 12.

The played-back data is separated into its component data in the playback data separation processing system 13. The audio data is input to the audio decoding processing system 14, where it is decoded by the method corresponding to the band compression applied at recording time; it is then input to the audio D/A processing system 15, D/A converted, and output as an audio signal.

The image (video) data separated out in the same way is decoded in the video decoding processing system 16 by the method corresponding to the band compression applied at recording time, then input to the video D/A processing system 17, D/A converted, and output as a video signal.

(Summary playback (digest playback) mode)
In the summary playback mode, the signal processing differs depending on whether feature extraction data and playlist data are recorded on the recording medium together with the image/audio data.

Whether feature extraction data and playlist data are recorded on the recording medium can be organized as shown in FIG. 18.

First, the case corresponding to (a) and (b) in FIG. 18 will be described, in which the playlist data (playlist data file) and the chapter data can be played back: that is, the playlist data and chapter data are recorded on the predetermined recording medium (data recording medium), so that summary playback is possible in the summary playback mode and predetermined chapter images can be displayed as thumbnails in the chapter display mode.

In other words, this is the case in which the user has selected the summary playback mode or the predetermined chapter mode.

When a command to operate in the predetermined summary playback (digest playback) mode is input by the user to the system controller system 20 through the user input I/F system 21, the playback data separation processing system 13 separates the data, and if feature extraction data, parameter data, playlist data, chapter data, and the like are recorded, each of these separated data is input to the system controller system 20.

If the playback data separation processing system 13 cannot separate out the feature extraction data, parameter data, playlist data, or chapter data, that data is not input to the system controller system 20. The playback data separation processing system 13 and the system controller system 20 together therefore serve to determine whether the feature extraction data, playlist data, chapter data, parameter data, and the like are recorded on the recording medium 7.

The playlist data consists of playback start information data and playback end information data for a number of predetermined playback sections, so that the predetermined summary playback can be performed.

The chapter data consists of position information for the start of a predetermined feature section or its vicinity, the end of that feature section or its vicinity, the start of a non-feature section adjoining that feature section or its vicinity, or the end of that non-feature section or its vicinity.

The system controller system 20 performs digest playback by skip playback according to the skip playback start data information and skip playback end data information of the playlist data detected on playback.
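As an illustration of how playlist data made of start/end pairs can drive skip playback, the following sketch walks the sections in order and plays only those ranges. The representation of a section as a `(start_sec, end_sec)` tuple and the `play` callback are assumptions made for this example; the specification does not define a concrete data format here.

```python
from typing import Callable, List, Tuple

# One playback section: (playback start position, playback end position), in seconds.
Section = Tuple[float, float]


def skip_playback(playlist: List[Section],
                  play: Callable[[float, float], None]) -> float:
    """Play each listed section in order, skipping everything in between.

    Returns the total duration that was played, i.e. the summary length.
    """
    total = 0.0
    for start, end in playlist:
        if end <= start:
            continue              # ignore empty or malformed sections
        play(start, end)          # hypothetical hook into the playback control
        total += end - start
    return total
```

For example, a playlist of `[(10, 20), (50, 65)]` would play 10 s and then 15 s of material and skip the rest of the recording.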

In addition, based on the predetermined chapter data, the display processing system 27 performs predetermined display processing that uses the image at or near each chapter point as a thumbnail image, and the resulting images are displayed.

Next, the case corresponding to (c) and (d) in FIG. 18 will be described, in which the playlist data (playlist data file) and chapter data cannot be played back: that is, the playlist data and chapter data are not recorded (stored) on the recording medium or storage medium, so that summary playback is not possible in the summary playback mode and the series of chapter-related processing, such as thumbnail display of chapter points and chapter playback, is not possible in the predetermined chapter mode.

The processing described here applies not to image/audio data received from a broadcast program as described above but, for example, to playing back image/audio data from another recording medium, as when the recording medium 25 is DVD software played back through the recording medium processing system 26 and the playback processing system 12, or to playing back image/audio data for which feature extraction has not been performed.

When playlist data or chapter data has not been generated and so cannot be detected on playback, or when playlist data or chapter data that was detected is to be regenerated, playlist data for summary playback and chapter data for the predetermined chapter-related modes can be generated from the feature extraction data and parameter data detected on playback.
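The branching between the cases can be summarized in a small decision function. This is only a sketch of the logic as read from the text: the function name, the returned labels, and the `regenerate` flag are hypothetical, and the last branch corresponds to the situation in which no feature extraction data is available either, so that feature extraction has to be run on the played-back data first.

```python
def choose_summary_action(has_playlist: bool,
                          has_features: bool,
                          regenerate: bool = False) -> str:
    """Pick the processing path for summary playback (labels are illustrative)."""
    if has_playlist and not regenerate:
        # Playlist/chapter data was detected on playback: use it directly.
        return "play_from_playlist"
    if has_features:
        # No usable playlist, but feature extraction data and parameter data exist.
        return "generate_playlist_from_features"
    # Neither playlist nor feature data: extract features, then generate.
    return "extract_features_then_generate"
```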

In the case shown in (c) of FIG. 26, that is, when feature extraction was performed at recording time and the feature data can be played back, the playlist data or the predetermined feature extraction data is input from the playback processing system 12 or the playback data separation processing system 13 of the recording/playback apparatus 30 shown in FIG. 17 to the playlist/chapter generation processing system 19, which generates the predetermined playlist data or chapter data.

In this case, when the user issues a summary playback mode command, the display processing system 27 may show a predetermined indication that no playlist data exists, as illustrated in FIG. 19.

The generated playlist data is input to the system controller system 20.

The system controller system 20 controls the playback control system 18 so that the predetermined playback sections based on the playlist data are played back in sequence (skip playback) in accordance with the summary playback time entered by the user, and playback from the recording medium 7 is controlled accordingly.

The generated chapter data is likewise input to the system controller system 20.

In accordance with the chapter-related operation mode selected by user input, the system controller system 20 controls the playback control system 18 so that predetermined chapter-related operations based on the chapter data can be performed, such as thumbnail display of the images at chapter points, editing operations such as cutting and joining at chapter points, and skip playback to chapter points selected by the user. Under this control, playback from the recording medium 7 is controlled, and the display processing system 27 is controlled via the system controller system 20.

As noted above, summary playback of an external recording medium such as a DVD, used as the recording medium 25, can be performed by the same signal processing: the playback control system 18 controls the recording medium processing system 26, and the predetermined summary playback processing described above is carried out.

Similarly, the series of chapter-related operations using the chapter data, such as editing processing (editing operations), skip playback between predetermined chapter points (or their vicinity), and thumbnail display of chapter points (or their vicinity), can be performed by the same signal processing: the playback control system 18 controls the recording medium processing system 26, and the predetermined signal processing described above is carried out.

Next, the case shown in (d) of FIG. 16, in which the feature extraction data cannot be played back, will be described.

The example above generated playlist data and chapter data from feature extraction data. However, feature extraction data may also be unavailable, for example when an external recording medium 25 recorded by another user has been copied to the recording medium A26.

Consider the case in which image/audio data such as a broadcast program is recorded on the recording medium 7 but feature extraction data is not, and so cannot be played back.

In this case, when the user issues a command for the summary playback mode or the predetermined chapter-related operation mode, the display processing system 27 may show a predetermined indication that no feature data exists, as illustrated in FIG. 19.

When image/audio data is played back from the recording medium A7 in the summary playback mode in this case, the data played back by the playback processing system 12 is input to the playback data separation processing system 13. The separated image data and audio data, which were processed with a predetermined band compression method at recording time, are input to the feature extraction processing system 10, where various characteristic data are detected: image characteristic data such as DCT DC coefficients, AC coefficients, and motion vectors, and audio characteristic data such as audio power.

The feature extraction processing system 10 then uses these image and audio characteristic data together with predetermined parameter data to extract various features: telop feature data (telop section determination data), person feature data, and other image feature data (image feature section determination data), as well as speaker voice feature data (speaker voice determination data), applause/cheering feature data (applause/cheering determination data), and other audio feature data (audio feature section determination data).

The various image feature extraction data (image feature data) and audio feature extraction data (audio feature data) are input to the system controller system 20. The feature extraction processing is judged complete once it has finished for the whole of the predetermined program or the predetermined image/audio section.

When the feature extraction processing has finished, a signal indicating that the predetermined signal processing is complete is input from the system controller system 20 to the display processing system 27, and a predetermined indication such as that shown in FIG. 19 may be displayed.

Next, the processing that generates the predetermined playlist data and chapter data from the feature extraction data will be described.

The various feature extraction data are stored in the memory system 11 for each predetermined feature extraction section. When processing of all the predetermined feature data has finished, they are input to the playlist/chapter generation processing system 19, which generates the predetermined playlist data or chapter data.

Alternatively, the feature extraction data for each predetermined section may be input sequentially from the feature extraction processing system 10 directly to the playlist/chapter generation system 19. Or, as described above, once feature extraction has finished for all the predetermined sections or for the predetermined broadcast program, the playlist/chapter generation system 19 may perform the playlist data or chapter data generation in response to a predetermined signal from the system controller system 20.

The processed feature data from the feature extraction processing system may also be routed to the playlist/chapter generation processing system 19 via the system controller system 20.

When the playlist/chapter generation processing system 19 has generated the predetermined playlist data or chapter data, a signal indicating that the processing is complete is input to the system controller system 20, and summary playback for the desired summary time, or chapter-related operations using the chapter data, can then be performed.

In this case, as shown in FIG. 19, the display processing system 27 may show a predetermined indication that the playlist data or chapter data has been generated, or an indication that the apparatus is in the summary playback mode or a chapter-related operation mode.

The summary playback time the user will want is not known in advance: a recorded one-hour broadcast program might be wanted in 30 minutes or in 20 minutes, for example. Playlist data for several summary times can therefore be generated beforehand, according to the total length of all the sections of the recorded image/audio data, such as a broadcast program, from which features were extracted.

For example, if the broadcast program subjected to feature extraction was recorded for one hour, playlist data is generated for summary playback in 40 minutes, 30 minutes, and 20 minutes. With playlist data generated in this way, when a summary time is selected by user input from the remote control 22 or the like, summary playback for that time can begin immediately.
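Pre-generating one playlist per candidate summary time could look like the sketch below. The selection rule is an assumption made purely for illustration: each candidate section is given a hypothetical importance score, and sections are picked greedily by score until the target length is reached. The passage above does not state how sections are chosen, so this is not the method of the specification.

```python
def build_playlist(sections, target_sec):
    """Choose sections (start, end, score) whose total length fits target_sec.

    Greedy by descending score (an assumed rule); result is in time order.
    """
    chosen = []
    used = 0
    for start, end, _score in sorted(sections, key=lambda s: s[2], reverse=True):
        length = end - start
        if used + length <= target_sec:
            chosen.append((start, end))
            used += length
    return sorted(chosen)


def build_playlists(sections, targets_sec=(2400, 1800, 1200)):
    """Prepare one playlist per summary time (40, 30, 20 minutes by default)."""
    return {t: build_playlist(sections, t) for t in targets_sec}
```

With the playlists held in a dictionary keyed by summary time, a user selection such as "20 minutes" becomes a lookup rather than a new computation, which is the benefit the text describes.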

Playback of the recording medium 25 is the same as playback of the recording medium A7 described above: the recording medium processing system 26 detects the recording medium 25, the playback processing system 12 processes the playback signal, and the playback data separation processing system 13 separates the predetermined image/audio data. The subsequent signal processing is the same as for the recording medium 7, and its description is omitted.

The control program that executes the series of processing described above is installed from a recording medium onto a computer built into dedicated hardware, or onto, for example, a general-purpose personal computer that can execute various functions when various programs are installed.

This recording medium is not limited to the hard disk on which the control program is recorded; it may also be package media, distributed separately from the computer to provide the program to users, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory on which the program is recorded.

2.2 Block configuration example 2
A modification of the recording/playback apparatus 30 shown in FIG. 17 will now be described with reference to FIG. 20.

The signal processing for recording and playing back broadcast programs is the same as in the recording/playback apparatus 30 shown in FIG. 17, so only the parts whose signal processing differs from that apparatus are described.

2.2.1 Recording-side signal processing
The recording/playback apparatus 30A shown in FIG. 20 differs from the recording/playback apparatus 30 described above in that the series of signal processing for feature extraction in the recording mode is performed in software in the system controller system 20.

In the recording/playback apparatus 30A, predetermined software may also be downloaded through the network system 24 so that feature extraction processing, playlist processing (chapter generation processing (generation of playback sections and playback position information)), and so on are performed by the software processing described here.

Software download has the advantage that an apparatus initially shipped without the processing of the present invention can have the invention applied later in software. For example, if the invention cannot be ready in time for manufacture and sale, the design and manufacturing side can first offer users a simpler system without the invention and later a system to which the invention is applied.

For users, the advantage is that after purchasing a simpler system without the invention, the invention can be applied by software, so functions can be added later.

Corrections and improvements to the processing can likewise be handled by downloading software and upgrading.

To install the invention by software download, the user connects to a predetermined Internet site through the network system 24 using a predetermined operation system (such as the remote control 22) and downloads the software of the invention by operating that operation system.

The downloaded software undergoes predetermined decompression, installation, and similar processing in the system controller system 20, which equips the apparatus with the predetermined processing functions of the invention, including the feature extraction, playlist, and chapter processing described later.

By using a microprocessor (MPU or CPU) of sufficient performance as the system controller system 20, the predetermined feature extraction processing can be performed concurrently with the predetermined recording processing.

The memory system 11 described above may also be implemented with predetermined data storage memory provided in the system controller system 20.

When band compression of the image and audio is performed as the predetermined recording processing, as described above, an MPU, CPU, or DSP (digital signal processor) of such performance may be used, and the same MPU, CPU, or DSP that performs the band compression may also perform the feature extraction processing, playlist generation processing, and so on.

2.2.2 Playback-side signal processing
The recording/playback apparatus 30A shown in FIG. 20 differs from the recording/playback apparatus 30 described above in that, when feature data cannot be detected in the playback mode and feature extraction must be performed, as in the case described above, the series of signal processing is performed in software in the system controller system 20.

(3) Feature extraction processing
Next, the signal processing for audio feature extraction and for video (image) feature extraction will be described.
3.1 Audio feature extraction processing
In the audio feature extraction processing system, as shown in FIG. 21, MPEG image/audio stream data is input to the stream separation system 100, and the separated audio data is input to the audio data decoding system 101, where predetermined decoding is performed.

The decoded audio data (audio signal) is input to the level processing system 102, the data counter system 103, and the data buffer system 104. To compute the average power (or average level) Pav of a predetermined section of the audio data, the level processing system 102 takes the absolute value of the data, and the audio data integration processing system 105 accumulates it until the data counter system 103 has counted a predetermined number of samples.

Here, the average power Pav can be obtained by the calculation of equation (32), where Ad(n) is the value (level) of the audio data.

The section over which the average level is computed may be, for example, about 0.01 s (10 ms) to 1 s. With a sampling frequency Fs = 48 kHz, for example, 480 to 48,000 samples are accumulated and averaged over the sample count Sm to obtain the average level (average power) Pav.
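Equation (32) itself is not reproduced in this text. A form consistent with the description (absolute value of each sample, accumulation over the section, division by the sample count Sm) would be the following; the summation limits are an assumption, and the original equation may differ in notation.

```latex
% Assumed form of equation (32), reconstructed from the surrounding description
P_{av} = \frac{1}{S_m} \sum_{n=1}^{S_m} \lvert A_d(n) \rvert

% Sample counts quoted in the text, with S_m = F_s \cdot T and F_s = 48\,\mathrm{kHz}:
%   T = 0.01\,\mathrm{s}: \; S_m = 48000 \times 0.01 = 480
%   T = 1\,\mathrm{s}:    \; S_m = 48000 \times 1 = 48000
```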

The value Pav output from the audio data integration processing system 105 is input to the determination processing system 106, where it is compared with the predetermined threshold Ath set by the threshold setting system 107 to perform silence determination.

The threshold setting system 107 may set Ath to a fixed value Ath0; besides the fixed value Ath0, a variable threshold Athm that depends on the average level of predetermined audio sections may also be set.

As the variable threshold Athm, for example, letting n be the section currently being processed and considering the average levels Pav(n-k) of the preceding sections (n-k), equation (33) may be used.

For example, with t = 2:
Athm = (Pav(n-1) + Pav(n-2)) / m   (34)
Here, m may be set, for example, in a range of roughly 2 to 20.
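The silence determination with a variable threshold can be sketched as follows. Several points are assumptions for illustration: equation (33) is not reproduced in this text, so the code assumes it sums the t preceding section levels and divides by m, which reduces to equation (34) for t = 2; a section is treated as silent when Pav is below the threshold; and the fixed threshold is used until t earlier sections are available. The function names are hypothetical.

```python
def average_level(samples):
    """Average of absolute sample values over one section (samples must be non-empty)."""
    return sum(abs(x) for x in samples) / len(samples)


def variable_threshold(prev_levels, m, t=2):
    """Athm from the t most recent section levels; matches equation (34) when t = 2."""
    return sum(prev_levels[-t:]) / m


def detect_silence(sections, fixed_threshold, m=4, t=2):
    """Return one boolean per section: True where the section is judged silent."""
    levels = []   # Pav of the sections processed so far
    flags = []
    for samples in sections:
        pav = average_level(samples)
        if len(levels) >= t:
            ath = variable_threshold(levels, m, t)   # variable threshold Athm
        else:
            ath = fixed_threshold                    # fall back to fixed Ath0
        flags.append(pav < ath)                      # assumed comparison direction
        levels.append(pav)
    return flags
```

Because Athm follows the recent signal level, a quiet passage in a loud program can be flagged even when its level exceeds a fixed threshold chosen for quiet material.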

(Other audio feature extraction processing)
The predetermined audio data accumulated in the data buffer system 104 is input to the frequency analysis processing system 108, where predetermined frequency analysis is performed.

An FFT (fast Fourier transform) or the like may be used for the frequency analysis. The analysis is performed on a predetermined number of samples from the data buffer system 104 that is a power of two, for example 512, 1024, or 2048.
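The power-of-two sample counts fit the radix-2 FFT, which repeatedly halves the input. The sketch below is a plain recursive radix-2 implementation using only the standard library, given for illustration; it is not taken from the specification, and `peak_bin` is a hypothetical helper that reports the strongest non-DC frequency bin of one analysis frame.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor times odd term
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def peak_bin(samples):
    """Index of the largest-magnitude bin below Nyquist, excluding DC (bin 0)."""
    spectrum = fft(samples)
    half = len(samples) // 2
    mags = [abs(c) for c in spectrum[:half]]
    return max(range(1, half), key=lambda k: mags[k])
```

For a frame of N samples at sampling frequency Fs, bin k corresponds to the frequency k × Fs / N, so a 512-sample frame at 48 kHz has a bin spacing of 93.75 Hz.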

The signal (data) from the frequency analysis processing system 108 is input to the determination processing system 109, where predetermined determination processing is performed.

Music (musical sound) can be discriminated from the continuity of spectral peaks in a predetermined frequency band.

Such techniques are disclosed in, for example, Japanese Unexamined Patent Application Publication No. 2002-116784.

For speaker voice, the waveform of human conversational speech contains breathing pauses, so sections with a steep rise or fall appear in the waveform; the determination can be made by detecting these rising or falling sections.

Rising and falling sections of this kind are generally considered less likely to appear in a music (musical sound) waveform than in speaker voice, so this characteristic of music waveforms is also taken into account and the attribute of the audio signal is determined comprehensively.

上記のような、話者音声信号の波形特徴（波形特性）、音楽（楽音）信号の波形特徴（波形特性）の相違から音声信号の属性判定を行う場合に、波形における時間的な物理特性を検出することになるので、上記で説明したような周波数解析を行ってから所定の判定信号処理を行う方法（周波数領域での信号解析、判定処理）の他に、ベースバンド領域で所定の判定処理を行う方法（時間領域での信号解析、判定処理）も考えられる。 When performing speech signal attribute determination based on the difference between the waveform characteristics (waveform characteristics) of the speaker voice signal and the waveform characteristics (waveform characteristics) of the music (musical sound) signal, the temporal physical characteristics of the waveform In addition to a method of performing predetermined determination signal processing after performing frequency analysis as described above (signal analysis and determination processing in the frequency domain), predetermined determination processing in the baseband region is performed. A method of performing (signal analysis and determination processing in the time domain) is also conceivable.

ここで、音声信号（音声データ）をデコード処理しないで、圧縮帯域のままで信号の属性解析を行う場合の音声系特徴抽出処理系の構成例を図２２に示す。 Here, FIG. 22 shows a configuration example of an audio system feature extraction processing system in the case where signal attribute analysis is performed in the compression band without decoding audio signals (audio data).

図２２に示す音声系特徴抽出処理系では、所定の帯域圧縮信号処理が施されたデータストリーム、例えば、ＭＰＥＧなどの画像音声データがストリーム分離系１００に入力されて画像データと音声データに分離され、音声データはストリームデータ解析系１１０に入力され、所定のサンプリング周波数、量子化ビット数その他などの信号解析処理が行われ、所定の音声データはサブバンド解析処理系１１１に入力される。 In the audio system feature extraction processing system shown in FIG. 22, a data stream that has undergone predetermined band compression signal processing, for example, image audio data such as MPEG, is input to the stream separation system 100 and separated into image data and audio data. The audio data is input to the stream data analysis system 110, signal analysis processing such as a predetermined sampling frequency, the number of quantization bits, and the like is performed, and the predetermined audio data is input to the subband analysis processing system 111.

サブバンド解析処理系１１１で所定のサブバンド解析処理が行われ所定サブバンド帯域のデータは上記（３２）式〜（３４）式で説明したのと同様の所定信号処理が行われる。 Predetermined subband analysis processing is performed in the subband analysis processing system 111, and predetermined signal processing similar to that described in the above equations (32) to (34) is performed on data in the predetermined subband band.

すなわち、音声データ積算処理系１０５に入力され、データカウント系１０３で所定のサンプリングデータ数が検出されるまで所定の積算処理が行われ、その後、しきい値設定系１０７で設定される所定しきい値に基づいて判定処理系１０６で所定の無音判定処理が行われる。 That is, a predetermined integration process is performed until a predetermined number of sampling data is detected by the data count system 103 after being input to the audio data integration processing system 105, and then a predetermined threshold set by the threshold setting system 107. Based on the value, a predetermined silence determination process is performed in the determination processing system 106.

In this silence determination processing, considering the spectrum of the audio data, it is conceivable to use as the subband a predetermined data band of roughly 3 kHz or below, where most of the energy is concentrated.

また、上記周波数解析により楽音（音楽）、話者音声の判定処理が行えることを述べたが、サブバンド解析系１１１の処理により、この信号処理系で所定の周波数解析が行われたのと同様と考えられるので、上記での述べたような所定スペクトルピークの継続性判定処理を行うことで属性判定の信号処理を行うことが考えられる。 In addition, although it has been described that determination processing of musical sound (music) and speaker voice can be performed by the above frequency analysis, the processing by the subband analysis system 111 is the same as when predetermined frequency analysis is performed by this signal processing system. Therefore, it is conceivable to perform signal processing for attribute determination by performing continuity determination processing for a predetermined spectrum peak as described above.

この場合、スペクトルピークは、各所定サブバンド帯域の中の最大データ帯域と考えることができ、ＦＦＴ解析処理の場合と同様の信号処理が行えると考えられる。 In this case, the spectrum peak can be considered as the maximum data band in each predetermined subband band, and it is considered that the same signal processing as in the FFT analysis processing can be performed.

3.2 Image system features
Next, the video (image) feature extraction processing is described.

In the video feature extraction processing system, as shown in FIG. 23, image data separated by the stream separation system is input to the stream data analysis system 200, where predetermined data analysis such as rate detection and pixel-count detection is performed. The DCT coefficient processing system 201 then performs predetermined DCT arithmetic processing (inverse DCT arithmetic processing), such as detection of the DC and AC coefficients of the DCT. Based on the output of the DCT coefficient processing system 201, the scene change detection system 202, the color feature detection processing system 203, the similar image detection processing system 204, the person detection processing system 205, and the telop detection determination processing system 206 carry out their respective processing, and the motion vector system 208 performs predetermined motion vector detection processing.

3.2.1 Scene change features
In the scene change detection system 202, for example, the screen is divided into predetermined regions; for each region the average values of Y (luminance data) and Cb, Cr (color difference data) of the DCT DC coefficient data are calculated, the inter-frame or inter-field difference is calculated for each region, and the result is compared with a predetermined threshold to detect a scene change.

シーンチェンジが無い場合は、各領域のフレーム間（又はフィールド）差分データは所定しきい値より小さく、シーンチェンジがあるとしきい値より差分データが大きくなる場合が検出できる。 When there is no scene change, the difference data between frames (or fields) in each region is smaller than a predetermined threshold value, and when there is a scene change, a case where the difference data becomes larger than the threshold value can be detected.
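The region-wise comparison described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the 4 × 4 grid, the summing of the Y, Cb, and Cr differences, and the `min_regions` rule for combining the per-region results are all assumptions made for the example.

```python
def region_means(plane, rows=4, cols=4):
    """Average a 2-D list of DC coefficient values over a rows x cols grid."""
    h, w = len(plane), len(plane[0])
    means = []
    for r in range(rows):
        for c in range(cols):
            ys = range(r * h // rows, (r + 1) * h // rows)
            xs = range(c * w // cols, (c + 1) * w // cols)
            vals = [plane[y][x] for y in ys for x in xs]
            means.append(sum(vals) / len(vals))
    return means


def is_scene_change(prev, curr, threshold, min_regions=8):
    """prev and curr are dicts holding 'Y', 'Cb', 'Cr' planes of DC values.

    A region counts as changed when the summed absolute difference of its
    Y, Cb, Cr means exceeds `threshold`. Requiring `min_regions` changed
    regions is a hypothetical decision rule for this sketch.
    """
    keys = ("Y", "Cb", "Cr")
    prev_m = {k: region_means(prev[k]) for k in keys}
    curr_m = {k: region_means(curr[k]) for k in keys}
    changed = 0
    for i in range(len(prev_m["Y"])):
        diff = sum(abs(curr_m[k][i] - prev_m[k][i]) for k in keys)
        if diff > threshold:
            changed += 1
    return changed >= min_regions
```

Both the threshold and the number of regions that must change would need tuning against real material.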

Here, the screen can be divided, for example, into 16 regions of the effective screen as shown in FIG. 24.

The screen division used for the calculation is not limited to that of FIG. 24; the number of divisions may be larger or smaller. With too few divisions scene change detection becomes insensitive, and with too many it becomes oversensitive, so an appropriate number of divisions is set within a range of roughly 256 (16 × 16) or fewer.

3.2.2 Color features
The color feature detection processing system 203 can detect color features from the average values of the Y, Cb, and Cr data of the DCT DC coefficients in predetermined regions.

所定領域としては、例えば、図２５に示すような領域を考えることができる。 As the predetermined area, for example, an area as shown in FIG. 25 can be considered.

この図２５では、有効画面を横方向に４分割し検出領域１〜検出領域４、縦方向に４分割して検出領域５〜検出領域８を設けている。各検出領域には領域ＩＤが付され、各検出領域のデータは領域ＩＤによって識別される。 In FIG. 25, the effective screen is divided into four in the horizontal direction, and detection areas 1 to 4 are provided, and detection areas 5 to 8 are divided into four in the vertical direction. Each detection area is given an area ID, and the data of each detection area is identified by the area ID.

ここで、場合に応じて横方向だけの検出領域１〜４、又は縦方向だけの検出領域５〜８を設けることも考えられる。 Here, depending on the case, it is conceivable to provide detection regions 1 to 4 only in the horizontal direction or detection regions 5 to 8 only in the vertical direction.

また、図２５のような領域分割以外にも、５×５や、６×６といった碁盤状の分割方法なども考えられる。 In addition to the area division as shown in FIG. 25, a grid-like division method such as 5 × 5 or 6 × 6 is also conceivable.

For example, when the program genre of a broadcast program is sumo, a scene in which brown is detected in detection area 3 of FIG. 25 can be assumed to have a high probability of being a scene of the ring (dohyo).

Combining this color feature with, for example, an audio attribute feature, "ring scene" + "audio attribute: other (or speaker voice)" indicates a high probability of a scene in which a bout starts, so such a scene section can be set as a key frame section.

In this case, the audio level may also rise at the start of a bout because of cheering from the audience, or data in an audio frequency band different from the normal state may be detected, so the audio level and predetermined frequency-domain data can also be used as feature data.

3.2.3 Similar scene (similar image) features
In the similar image detection processing system 204, a predetermined ID (identification number or identification symbol) is assigned to each image (scene) so that similar images (scenes) receive the same ID. The technique is disclosed, for example, in Japanese Unexamined Patent Application Publication No. 2002-344872.

This assignment records the ID in memory means (data recording means) in one-to-one correspondence with the image (scene) or with its position information (frame number, PTS, recording time, and so on). Because the position information and the ID of an image (scene) correspond one to one, and the image (scene) itself naturally corresponds one to one with its position information, various ID-based operations become possible when an image is displayed or playback starts from it: for example, classifying similar images by displaying the images that share an ID, or skip playback to the points of scenes that share an ID.

この特徴データとしては、上記のシーンＩＤについて言及したように、検出頻度の１位、２位などの検出出現順位を考えることができる。 As the feature data, the detection appearance ranks such as the first and second detection frequencies can be considered as mentioned for the scene ID.

また、図７に示すような、そのＰＵ区間長に対するその出現順位１位、２位など、検出したＩＤの検出長の割合を考えることができる。 Further, as shown in FIG. 7, the ratio of the detection length of the detected ID such as the first rank and the second rank in the appearance order with respect to the PU section length can be considered.

この特徴抽出処理は、例えば、画面を複数分割（例えば、２５分割）して、その各分割した画面領域に対応する領域のＤＣＴの平均ＤＣ係数を演算して、その演算した平均ＤＣ係数をベクトル成分として、所定ベクトル距離が所定しきい値より小さい所に対応する画像（シーン）を類似画像（類似シーン）とし、類似画像（類似シーン）には、同一の所定ＩＤ（シーンＩＤ）を割り当てる処理である。 In this feature extraction process, for example, the screen is divided into a plurality of parts (for example, 25 parts), the DCT average DC coefficient of the area corresponding to each divided screen area is calculated, and the calculated average DC coefficient is a vector. As a component, an image (scene) corresponding to a place where a predetermined vector distance is smaller than a predetermined threshold is set as a similar image (similar scene), and the same predetermined ID (scene ID) is assigned to the similar image (similar scene). It is.

For example, the initial ID value is 1. When no existing image (scene) lies within the predetermined threshold distance, the current maximum ID plus 1 is assigned to the image (scene) as a new ID.
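The ID assignment just described can be sketched as below. The text does not say which distance measure is used or which earlier image a new image is compared against, so Euclidean distance and comparison against the first image that received each ID are assumptions of this sketch, as is the name `assign_scene_ids`.

```python
import math


def assign_scene_ids(features, threshold):
    """Assign scene IDs to a sequence of feature vectors.

    Each vector holds the average DC coefficients of the divided screen
    regions (for example 25 values). A vector closer than `threshold` to
    a stored representative reuses that ID; otherwise it receives
    max(existing IDs) + 1, starting from 1.
    """
    representatives = []  # (scene_id, vector) pairs, one per ID
    ids = []
    for vec in features:
        best_id, best_dist = None, threshold
        for scene_id, ref in representatives:
            d = math.dist(vec, ref)  # Euclidean distance (assumed metric)
            if d < best_dist:
                best_id, best_dist = scene_id, d
        if best_id is None:
            best_id = max((sid for sid, _ in representatives), default=0) + 1
            representatives.append((best_id, vec))
        ids.append(best_id)
    return ids
```

Called as `assign_scene_ids(vectors, threshold=1.0)`, it returns one ID per picture in input order.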

本発明における、この特徴データの利用方法として、上記図５を参照して説明したように、所定区間でのＩＤの出現頻度を演算して、頻度１位〜２位の検出を行うなどの処理方法が考えられる。 In the present invention, as described above with reference to FIG. 5, as a method of using the feature data, a process of calculating the appearance frequency of the ID in a predetermined section and detecting the first to second frequencies is performed. A method is conceivable.

This can be effective, for example, for a news program, in which announcer scenes appear frequently, or for program genres such as sumo and baseball, in which many similar scenes can be expected.

That is, in a news program the IDs ranked first or second in appearance frequency are likely to correspond to announcer scenes, which can be assumed to appear often.

FIG. 26 outlines how the appearance frequency of an ID is calculated. In this example, the same ID, ID1, is detected in four sections: f1 to f2, f3 to f4, f5 to f6, and f7 to f8.

すなわちこの区間では、類似したシーンが出現したと考えられる。 That is, it is considered that a similar scene appears in this section.

図２６のように、所定区間同じＩＤが連続している区間を一つとして数え、そのような区間がいくつあるかを演算する。 As shown in FIG. 26, a section having the same ID in a predetermined section is counted as one, and the number of such sections is calculated.

類似シーンが出現しなくなると同一ＩＤではなくなるので、ＩＤの連続性、不連続性の数を演算することで所定の頻度を算出することも考えられる。 When similar scenes do not appear, they do not have the same ID. Therefore, it may be possible to calculate a predetermined frequency by calculating the number of ID continuity and discontinuity.
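Counting each unbroken run of the same ID once, as described above, can be sketched as follows. The function names and the `min_len` parameter (a minimum run length before a section is counted) are hypothetical.

```python
from collections import Counter
from itertools import groupby


def id_run_frequency(ids, min_len=1):
    """Count, per ID, how many separate runs of that ID occur in `ids`."""
    counts = Counter()
    for scene_id, run in groupby(ids):
        if sum(1 for _ in run) >= min_len:
            counts[scene_id] += 1
    return counts


def top_ranked_ids(ids, n=2):
    """Return the n IDs with the highest run counts (ranks 1 to n)."""
    return [scene_id for scene_id, _ in id_run_frequency(ids).most_common(n)]
```

For a per-picture ID sequence in which ID 1 appears in four separate sections, `id_run_frequency` reports a frequency of 4 for ID 1, matching the situation of FIG. 26.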

3.2.4 Person features
In the person detection processing system 205, the screen is divided into regions as shown in FIG. 27, and whether a person appears on the screen can be determined by detecting a predetermined specific color in each region.

図２７に示した例では、有効画面を２×２に分割した領域１〜４の４つの領域と、画面中央付近の領域５の５つの領域を想定している。 In the example shown in FIG. 27, four areas of areas 1 to 4 obtained by dividing the effective screen into 2 × 2 and five areas of the area 5 near the center of the screen are assumed.

例えば、ニュース番組では、領域５にアナウンサーの顔が出現する場合の確率が高いことが考えられる。 For example, in a news program, the probability that an announcer's face appears in the area 5 may be high.

また、フリップ又はテロップとアナウンサーシーンが出現する場合を想定すると、領域１又は領域２にアナウンサーの顔が出現する場合も考えられる。その場合に領域２又は領域１にフリップ又はテロップが出現すると想定できる。 Further, assuming that a flip or telop and an announcer scene appear, the announcer's face may appear in the region 1 or the region 2. In that case, it can be assumed that a flip or a telop appears in the region 2 or the region 1.

For example, when Caucasian skin color is assumed as the specific color, experiments show that the specific color can be detected with the following conditional expressions.

$$0.6 < C_b / C_r < 0.9 \text{ to } 0.97 \tag{35}$$
$$0 \le C_b \le 255,\quad 0 \le C_r \le 255 \tag{36}$$
As described below, a method other than detection of a specific color in the regions shown in FIG. 27 is also conceivable.

ここでは、簡単のため画面サイズを７２０×４８０として考える。 Here, for simplicity, the screen size is assumed to be 720 × 480.

(Process 1)
Detection condition from the color signals (Cb, Cr) (Caucasian skin color condition). For the DC components of the Cb and Cr DCT coefficients, a macroblock is 16 × 16, so there are 720/16 = 45 positions (0 to 44) in the x direction and 480/16 = 30 positions (0 to 29) in the y direction. A data point is detected at each position that satisfies the determination condition of equation (37) below. In some cases, the x and y directions are each compressed to 1/2 and processed as 0 to 22 in the x direction and 0 to 14 in the y direction. Here, 0 ≤ Cb ≤ 255 and 0 ≤ Cr ≤ 255.

$$0.6 < C_b / C_r < 0.9 \text{ to } 0.97 \tag{37}$$
Here, for example, the ratio can be scaled by 128 with a 7-bit left shift so that the comparison uses integers, giving a determination condition such as equation (38) below (77, 115, and 124 correspond to 0.6, 0.9, and 0.97 multiplied by 128).

$$77 < (C_b \ll 7) / C_r < 115 \text{ to } 124 \tag{38}$$
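The integer form of the skin color test can be sketched as below. The thresholds 77 and 115 equal 0.6 and 0.9 multiplied by 128, so this sketch scales by 128 (a left shift by 7 bits). The function name, the default upper limit of 115, and the handling of Cr = 0 are assumptions made for the example.

```python
def is_skin_tone(cb, cr, upper=115):
    """Integer test equivalent to roughly 0.6 < Cb/Cr < 0.9.

    cb and cr are DC values in the range 0..255. `upper` may be set
    anywhere from 115 to 124 (0.9 to 0.97 after scaling by 128).
    """
    if cr == 0:
        return False  # ratio undefined; treated as "not detected" here
    scaled = (cb << 7) // cr  # Cb/Cr scaled by 128, integer arithmetic
    return 77 < scaled < upper
```

Applying this to each of the 45 × 30 macroblock positions would give the map of detected data points used by the later steps.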

(Process 2)
Detection condition from the luminance-signal AC coefficients (contour detection condition for a person, face, and so on). Among the data satisfying the determination conditions of equations (37) and (38) above, data larger than a predetermined threshold Ath is detected in each of the x and y directions.

$$\mathrm{xh}(x) > \mathrm{Ath} \tag{39}$$
$$\mathrm{yh}(y) > \mathrm{Ath} \tag{40}$$
In some cases, covariance processing is performed on the detected data.

For example, as shown in FIG. 28, the points marked ● are detection points, giving values such as the following.

xh(0) = 0, yh(0) = 0
xh(1) = 2, yh(1) = 0
xh(2) = 2, yh(2) = 3
...

(Process 3)
Consider a detection condition for the validity of the size of the detected object. From xh(x) and yh(y), the data points larger than a predetermined threshold Hth are taken,
$$\mathrm{xh}(x) > \mathrm{Hth} \tag{41}$$
$$\mathrm{yh}(y) > \mathrm{Hth} \tag{42}$$
and, in each of the x and y directions, the continuation lengths of such data that exceed a predetermined threshold count Lth,
$$\mathrm{xl}(n) > \mathrm{Lth} \tag{43}$$
$$\mathrm{yl}(m) > \mathrm{Lth} \tag{44}$$
are detected.

For example, in the case shown in FIG. 28, with Hth = 2 the continuation lengths of the portions where detection points satisfying
$$\mathrm{xh}(x) > 2$$
$$\mathrm{yh}(y) > 2$$
continue are detected as
xl(0) = 6
xl(1) = 1
yl(0) = 7
yl(1) = 2
If, for example, Lth = 3, then xl(0) and yl(0) are the detection data of this process.
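Processes 2 and 3 can be read as building projection counts along each axis and then measuring how long the counts stay above a threshold. The sketch below follows that reading; the function names are hypothetical, and the actual data of FIG. 28 is not reproduced in the text, so any input histogram is illustrative only.

```python
def projections(mask):
    """mask[y][x] is 1 where a data point was detected, else 0.

    Returns (xh, yh): the number of detected points in each column x
    and in each row y.
    """
    xh = [sum(col) for col in zip(*mask)]
    yh = [sum(row) for row in mask]
    return xh, yh


def run_lengths(hist, hth):
    """Lengths of consecutive runs in which hist values exceed hth."""
    runs, current = [], 0
    for value in hist:
        if value > hth:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def valid_runs(hist, hth, lth):
    """Keep only the runs whose length exceeds lth (equations (43), (44))."""
    return [n for n in run_lengths(hist, hth) if n > lth]
```

With a column histogram that stays above Hth = 2 for six positions and then for one position, `run_lengths` returns `[6, 1]` and `valid_runs` with Lth = 3 keeps only the run of 6, mirroring xl(0) = 6 and xl(1) = 1 in the example above.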

(Process 4)
Consider a detection condition for the validity of the shape as a person's face. For each of the detected xl(n) and yl(m), data whose difference or ratio lies within a predetermined range (0 to Dth, or eth1 to eth2) is detected.

$$|\mathrm{xl}(n) - \mathrm{yl}(m)| < \mathrm{Dth} \tag{45}$$
or
$$\mathrm{eth1} < \mathrm{xl}(n) / \mathrm{yl}(m) < \mathrm{eth2} \tag{46}$$
In the example of FIG. 28, the calculation is performed for xl(0) and yl(0).

ここで、人物の顔の形状を考え、顔を４角形で近似すると仮定し、縦横比を演算する。 Here, considering the shape of the human face, it is assumed that the face is approximated by a quadrangle, and the aspect ratio is calculated.

For example, assuming the detection condition
$$0.8 < \mathrm{xl}(n) / \mathrm{yl}(m) < 1.5 \tag{47}$$
then, since
$$\mathrm{yl}(0) / \mathrm{xl}(0) = 1.2 \tag{48}$$
the object in the region of xl(0) and yl(0) in FIG. 28 can be determined to have a high probability of being a person's face.

Here, bit shift processing such as that in equation (38) above may also be applied.

上記（処理１）〜（処理４）の検出条件の他、以下の（処理５）のような検出データの継続性判定を行うことも考えられる。 In addition to the detection conditions of the above (Process 1) to (Process 4), it is also conceivable to perform detection data continuity determination as in the following (Process 5).

(Process 5)
Consider a condition on the temporal continuity of the detected shape.
(Continuity determination method 5.1)
In some cases, the temporal continuity (stability of detection) of the detections in (Process 1) to (Process 4) above may also be determined.

For example, from equation (48) above, the detection value S(N) in picture N is taken as
$$S(N) = \mathrm{yl}(0) / \mathrm{xl}(0) \tag{49}$$
and S(N+1), S(N+2), and so on are detected to determine continuity.

For example, detection may be judged to have occurred when the condition holds for three consecutive pictures:
$$0.8 < S(N) < 1.5 \tag{50}$$
$$0.8 < S(N+1) < 1.5 \tag{51}$$
$$0.8 < S(N+2) < 1.5 \tag{52}$$
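The three-picture continuity test of equations (50) to (52) can be sketched as a running count of consecutive pictures whose ratio S lies in range. The function name and the `needed` parameter are hypothetical; the bounds 0.8 and 1.5 are the example values from the text.

```python
def detected_over_pictures(ratios, low=0.8, high=1.5, needed=3):
    """Return True if `needed` consecutive values satisfy low < S < high.

    `ratios` holds S(N), S(N+1), ... computed per I picture as
    yl(0) / xl(0).
    """
    run = 0
    for s in ratios:
        run = run + 1 if low < s < high else 0
        if run >= needed:
            return True
    return False
```

A single out-of-range picture resets the count, so a brief false detection does not trigger the decision.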

ここで、検出処理を行うピクチャーは、Ｉピクチャーが考えられる。 Here, an I picture can be considered as a picture to be detected.

(Continuity determination method 5.2)
As another method, one or several of the detection values of (Process 1) to (Process 3) above may be taken as the detection data for picture N, and it may be determined whether they are detected continuously in pictures N+1, N+2, N+3, and so on.

For example, taking the detection value in frame N as
$$\mathrm{Col}(N) = (C_b \ll 7) / C_r \tag{53}$$
(a scaling by 128, consistent with the thresholds 77 and 115), it may be determined whether detection continued over three I pictures, as in
$$77 < \mathrm{Col}(N) < 115 \tag{54}$$
$$77 < \mathrm{Col}(N+1) < 115 \tag{55}$$
$$77 < \mathrm{Col}(N+2) < 115 \tag{56}$$
before moving on to the next detection process.

また、検出されたＮ〜（Ｎ＋２）ピクチャーのデータの平均値を演算し、条件判定することも考えられる。 It is also conceivable to determine the condition by calculating the average value of the detected N to (N + 2) picture data.

That is, with the average of the three detected picture data denoted AvCol,
$$\mathrm{AvCol} = (\mathrm{Col}(N) + \mathrm{Col}(N+1) + \mathrm{Col}(N+2)) / 3 \tag{57}$$
$$77 < \mathrm{AvCol} < 115 \tag{58}$$
may be evaluated.

(Continuity determination method 5.3)
Using equations (39) and (40) above, the detection values in picture N may be written xh(N)(x) and yh(N)(y), and the continuity of detection over pictures N+1, N+2, and so on may be examined.

That is, it may be determined whether detection continued over three I pictures, as in
$$\mathrm{xh}(N)(x) > \mathrm{Ath} \tag{59}$$
$$\mathrm{xh}(N+1)(x) > \mathrm{Ath} \tag{60}$$
$$\mathrm{xh}(N+2)(x) > \mathrm{Ath} \tag{61}$$
$$\mathrm{yh}(N)(y) > \mathrm{Ath} \tag{62}$$
$$\mathrm{yh}(N+1)(y) > \mathrm{Ath} \tag{63}$$
$$\mathrm{yh}(N+2)(y) > \mathrm{Ath} \tag{64}$$
before moving on to the next detection process.

Alternatively, with the averages of the three detected picture data denoted Avxh and Avyh,
$$\mathrm{Avxh} = (\mathrm{xh}(N)(x) + \mathrm{xh}(N+1)(x) + \mathrm{xh}(N+2)(x)) / 3 \tag{65}$$
$$\mathrm{Avyh} = (\mathrm{yh}(N)(y) + \mathrm{yh}(N+1)(y) + \mathrm{yh}(N+2)(y)) / 3 \tag{66}$$
$$\mathrm{Avxh} > \mathrm{Ath} \tag{67}$$
$$\mathrm{Avyh} > \mathrm{Ath} \tag{68}$$
may be evaluated.

(Continuity determination method 5.4)
Using equations (43) and (44) above, the detection values in picture N may be written xl(N)(x) and yl(N)(y), and the continuity of detection over pictures N+1, N+2, and so on may be examined.

That is, it may be determined whether detection continued over three I pictures, as in
$$\mathrm{xl}(N)(x) > \mathrm{Lth} \tag{69}$$
$$\mathrm{xl}(N+1)(x) > \mathrm{Lth} \tag{70}$$
$$\mathrm{xl}(N+2)(x) > \mathrm{Lth} \tag{71}$$
$$\mathrm{yl}(N)(y) > \mathrm{Lth} \tag{72}$$
$$\mathrm{yl}(N+1)(y) > \mathrm{Lth} \tag{73}$$
$$\mathrm{yl}(N+2)(y) > \mathrm{Lth} \tag{74}$$
before moving on to the next detection process.

Alternatively, with the averages of the three detected picture data denoted Avxl and Avyl,
$$\mathrm{Avxl} = (\mathrm{xl}(N)(x) + \mathrm{xl}(N+1)(x) + \mathrm{xl}(N+2)(x)) / 3 \tag{75}$$
$$\mathrm{Avyl} = (\mathrm{yl}(N)(y) + \mathrm{yl}(N+1)(y) + \mathrm{yl}(N+2)(y)) / 3 \tag{76}$$
$$\mathrm{Avxl} > \mathrm{Lth} \tag{77}$$
$$\mathrm{Avyl} > \mathrm{Lth} \tag{78}$$
may be evaluated.

(Outline of the basic processing method for detecting the number of persons)
Here, determination of the number of persons is considered.
(Number-of-persons determination method 5.1B)
For example, in the case of FIG. 29, assume that two runs, xl(0) and xl(1), at or above the predetermined threshold are detected in the x direction and one run, yl(0), is detected in the y direction.

ここで、ｘｌ（０）とｙｌ（０）とで特定される領域１と、ｘｌ（１）とｙｌ（０）とで特定される領域２のデータ密度について考える。 Here, consider the data density of region 1 specified by xl (0) and yl (0) and region 2 specified by xl (1) and yl (0).

For region 1, the total number of data points S1 in the region is
$$S_1 = \mathrm{xl}(0) \times \mathrm{yl}(0) = 20 \tag{79}$$
the number of data larger than the predetermined threshold is
$$\sum \mathrm{xh}(x) = 17 \tag{80}$$
and the data density Δ1, that is, the number of data per unit data point, is
$$\Delta_1 = 17 / 20 = 0.85 \tag{81}$$
If data larger than the threshold were detected at every point of region 1, the data density would be Δ1 = 1. A predetermined threshold Mth is therefore set, and
$$\Delta_1 > \mathrm{Mth} \tag{82}$$
is evaluated.

Similarly, for region 2, the total number of data points S2 in the region is
$$S_2 = \mathrm{xl}(1) \times \mathrm{yl}(0) = 25 \tag{83}$$
the number of data larger than the predetermined threshold is
$$\sum \mathrm{xh}(x) = 21 \tag{84}$$
and the data density Δ2 is
$$\Delta_2 = 21 / 25 = 0.84 \tag{85}$$

Here, assuming for example that the threshold Mth is
$$\mathrm{Mth} = 0.80 \tag{86}$$
then from equations (81) and (85) both region 1 and region 2 satisfy the condition, and it is determined that the probability that a person has been detected is high.

Here, in the x direction, consider the region St specified by xl(0) + xl(1) and yl(0). The total number of data points is
$$(\mathrm{xl}(0) + \mathrm{xl}(1)) \times \mathrm{yl}(0) = 45 \tag{87}$$
the number of detected data is
$$\sum \mathrm{xh}(x) = 17 + 21 = 38 \tag{88}$$
and the data density Δ is
$$\Delta = 38 / 45 = 0.84 \tag{89}$$

Here, for region St as well,
$$\Delta > \mathrm{Mth} \tag{90}$$
so it is determined that persons are detected in region 1 and region 2 at the same position in the y direction.
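The data density test used in this example reduces to a ratio and a threshold comparison. The sketch below reproduces the numbers from the text (17 of 20, 21 of 25, and 38 of 45 points, with Mth = 0.80); the function names are hypothetical.

```python
def data_density(detected, total_points):
    """Fraction of data points in a region that exceeded the threshold."""
    return detected / total_points


def person_likely(detected, total_points, mth=0.80):
    """True when the region's data density exceeds Mth."""
    return data_density(detected, total_points) > mth


# Regions from the FIG. 29 example: (detected count, total data points)
regions = {"region 1": (17, 20), "region 2": (21, 25), "region St": (38, 45)}
results = {name: person_likely(d, t) for name, (d, t) in regions.items()}
```

All three regions pass the test, which is why regions 1 and 2 are judged to contain persons at the same vertical position.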

(Other number-of-persons detection example 1 (when the regions overlap))
Considering FIG. 30, one run xl(0) is detected in the x direction and one run yl(0) is detected in the y direction.

For the region R specified by xl(0) and yl(0), the total number of data points Sr is
$$S_r = \mathrm{xl}(0) \times \mathrm{yl}(0) = 90 \tag{91}$$
the number of detected data is
$$\sum \mathrm{xh}(x) = 44 \tag{92}$$
and the data density Δr is
$$\Delta_r = 44 / 90 = 0.49 \tag{93}$$

Here,
$$\Delta_r < \mathrm{Mth}$$
so it cannot be determined that a single person is detected in region R.

Considering the reciprocal of the data density,
$$1 / \Delta_r = 2.0$$
so there may be two objects; however, the data density can be the same when the data are sparsely distributed, as in FIG. 31.

図３０について、ｙ方向の分散σを考える。 For FIG. 30, consider the variance σ in the y direction.

With yhav the average of yh(y) and m the number of data,
$$m = \mathrm{yl}(0) \tag{94}$$
$$\sigma_y = \Bigl(\sum (\mathrm{yh}(y) - \mathrm{yhav})^2\Bigr) / m = 2.32 \tag{95}$$
For the x direction, with xhav the average and n the number of data,
$$n = \mathrm{xl}(0) \tag{96}$$
$$\sigma_x = \Bigl(\sum (\mathrm{xh}(x) - \mathrm{xhav})^2\Bigr) / n = 1.04 \tag{97}$$

Next, for FIG. 31, the variances in the y and x directions are similarly
$$\sigma_y = 0.99 \tag{98}$$
$$\sigma_x = 0.64 \tag{99}$$

From these results, the variance of the data is larger for FIG. 30.

そこで、分散値に対して所定しきい値Ｂｔｈ、検出物数に応じたしきい値ｄ１，ｄ２を設定し、以下のような条件を判定し、検出物数を検出することも考えられる。 Therefore, it is conceivable to set the predetermined threshold value Bth and the threshold values d1 and d2 corresponding to the number of detected objects for the variance value, determine the following conditions, and detect the number of detected objects.

$$\sigma_y > \mathrm{Bth} \tag{100}$$
$$\sigma_x > \mathrm{Bth} \tag{101}$$
$$d_1 < 1 / \Delta < d_2 \tag{102}$$
For example, in the example of FIG. 30, the thresholds may be set as
$$\mathrm{Bth} = 2.0 \tag{103}$$
$$d_1 = 1.8 \tag{104}$$
$$d_2 = 2.3 \tag{105}$$
for the determination.
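The variance-based check can be sketched as below. The text does not state how conditions (100) and (101) are combined; with the example values (σy = 2.32, σx = 1.04, Bth = 2.0) only the y direction exceeds Bth, so this sketch assumes that exceeding Bth in either direction is enough. That rule and the function names are assumptions.

```python
def variance(values):
    """Population variance, as in equations (95) and (97)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def two_objects_likely(sigma_x, sigma_y, density, bth=2.0, d1=1.8, d2=2.3):
    """Assumed rule: large spread in x or y, and 1/density within (d1, d2)."""
    spread = sigma_x > bth or sigma_y > bth
    return spread and d1 < 1 / density < d2
```

With the values quoted for FIG. 30 (σx = 1.04, σy = 2.32, Δ = 0.49) the check passes, while the sparse case of FIG. 31 (σx = 0.64, σy = 0.99) fails it at the same density.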

(Other detection example 2 (when the regions are diagonally separated))
Considering FIG. 32, two runs, xl(0) and xl(1), are detected in the x direction and two runs, yl(0) and yl(1), are detected in the y direction.

For the region R00 specified by xl(0) and yl(0), the total number of data points S00 is
$$S_{00} = \mathrm{xl}(0) \times \mathrm{yl}(0) = 20 \tag{106}$$
the number of detected data is
$$\sum \mathrm{xh}(x) = 17 \tag{107}$$
and the data density Δ00 is
$$\Delta_{00} = 0.85 \tag{108}$$
Since Mth = 0.80 was set in equation (86) above,
$$\Delta_{00} > \mathrm{Mth} \tag{109}$$
and it is determined that the probability that one person is detected in region R00 is high.

Next, for the region Ra specified by xl(0) and (yl(0) + yl(1)), the total number of data points Sa is
$$S_a = \mathrm{xl}(0) \times (\mathrm{yl}(0) + \mathrm{yl}(1)) = 40 \tag{110}$$
The total number of detected data is, from equation (107),
$$\sum \mathrm{xh}(x) = 17 \tag{111}$$
and the data density Δa is
$$\Delta_a = 17 / 40 = 0.43 \tag{112}$$
This does not satisfy the threshold condition.

That is, since the region spanned by xl(0) and (yl(0) + yl(1)) is being considered, if Δa were larger than the predetermined threshold, it would be determined that two persons are detected with high probability.

However, equation (112) shows that Δa is at or below the predetermined threshold, so it cannot be determined that two persons are detected in the region specified by xl(0) and (yl(0) + yl(1)); it is more plausible that one person is detected.

That is, it is determined that the probability of a person being detected in the region specified by xl(0) and yl(1) is low.

Similarly, for the region Rb specified by xl(1) and (yl(0) + yl(1)), the total number of detected data is
Σxh(x) = 17 (113)
The total number of data points Sb is
Sb = xl(1) × (yl(0) + yl(1))
= 40 (114)
and the data density Δb is
Δb = 17/40
= 0.43 (115)

From equation (115), the probability that two persons are detected in region Rb is low.

Here, for the region specified by xl(1) and yl(0), the number of detected data is
Σxh(x) = 17
and the total number of data points is
xl(1) × yl(0) = 20
so the data density Δ10 is
Δ10 = 17/20
= 0.85 (116)
Likewise, the data density Δ11 of the region specified by xl(1) and yl(1) is
Δ11 = 0.85 (117)

From equations (115) to (117), the probability that a person is detected is low in one of region 10 and region 11.

Next, consider the data density of the region Rc specified by (xl(0) + xl(1)) and yl(0). The number of detected data is
Σyh(y) = 17
and the total number of data points is
(xl(0) + xl(1)) × yl(0) = 40
Therefore, the data density Δc is
Δc = 17/40
= 0.43 (118)
Since this is at or below the predetermined threshold Mth, the probability that two persons are detected in region Rc is low. From equation (109) and equations (115) to (117), persons are ultimately detected in two regions: the region specified by xl(0) and yl(0), and the region specified by xl(1) and yl(1).
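The density arithmetic used for regions R00, Ra, Rb, and Rc can be reproduced with a few lines of code. This is an illustrative sketch only; the function names are hypothetical, and the region is described by its area (total number of data points) because the text gives only the products 20 and 40, not the individual lengths.

```python
M_TH = 0.80  # threshold Mth quoted in the text


def data_density(detected, area):
    """Data density = number of detected data / total number of data points."""
    if area <= 0:
        raise ValueError("area must be positive")
    return detected / area


def exceeds_threshold(detected, area, m_th=M_TH):
    """True when the density is strictly greater than the threshold Mth."""
    return data_density(detected, area) > m_th


# Region R00: 17 detected data in 20 data points (equations (106)-(109)).
density_r00 = data_density(17, 20)
# Region Ra: the same 17 detected data spread over 40 data points (equation (112)).
density_ra = data_density(17, 40)
```

Here `density_r00` exceeds Mth while `density_ra` (0.425, which the text rounds to 0.43) does not, matching the conclusions drawn from equations (109) and (112).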

With the determination process described above, detecting the number of persons is also considered feasible.

(Other person detection processing method (method 2))
As another method, it is also conceivable to step through the x direction (0 to 44) and the y direction (0 to 29) in order, determining at each position whether the predetermined threshold conditions are satisfied.

Taking the data series as d(x)(y), data that satisfy the conditions of equations (37), (41), and (42) above are detected in turn. If, for example,
d(x1)(y1), d(x2)(y1)
d(x1)(y2), d(x2)(y2)
are detected consecutively in the x direction and the y direction in this way, the size and the position of the detected object are obtained at the same time.

However, because this method examines every data point one by one and then tests the continuity of the series data, it is expected to take more computation time than (method 1) above.

When this method is used, it is also conceivable to first compress the data to 1/2 in the x direction and the y direction, for example, to reduce the number of data items to be processed.

(Other person detection processing method (method 3))
As another technique similar to (method 2) above, it is also conceivable to approximate a person by a quadrangle, change the size of the quadrangle step by step, and determine whether the data within the quadrangular region satisfy a predetermined condition.

For example, consider quadrangles of (2 × 2), (3 × 3), and (4 × 4) as shown in FIG. 33.

Starting with the smallest, each quadrangular region is moved one data position at a time, and the data within the region are tested against the condition. Once all positions have been tested, the same process is repeated for the quadrangle of the next size.

When processing has finished for quadrangles of all sizes, the detected regions and the number of detections are known, but as with (method 2) above the processing time is expected to be long.
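The scan described for method 3 can be sketched as nested loops over window size and position. The text does not state the "predetermined condition" for this method; the sketch below assumes, purely for illustration, that it is a data density above a threshold, and the function name and grid representation (a list of rows of 0/1 values) are hypothetical.

```python
def scan_squares(grid, sizes=(2, 3, 4), m_th=0.80):
    """Slide square windows over a 0/1 grid, smallest size first.

    Returns a list of (size, top, left) for every window whose density of
    detected data (assumed condition) is strictly greater than m_th.
    """
    rows, cols = len(grid), len(grid[0])
    hits = []
    for size in sizes:  # smallest quadrangle first, then the next size
        for top in range(rows - size + 1):  # move one data position at a time
            for left in range(cols - size + 1):
                count = sum(
                    grid[r][c]
                    for r in range(top, top + size)
                    for c in range(left, left + size)
                )
                if count / (size * size) > m_th:
                    hits.append((size, top, left))
    return hits


example_grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
example_hits = scan_squares(example_grid)
```

The triple loop makes the cost visible: every window size visits nearly every position and sums every cell inside the window, which is consistent with the remark that this method takes a long time.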

3.2.5 Telop feature
The telop detection determination processing system 206 detects the average value of the DCT AC coefficients in the screen regions shown in FIG. 25.

A telop containing character information of a predetermined size has relatively sharp outlines within the screen. When a telop image appears in any of the regions of FIG. 25, AC coefficients at or above a predetermined threshold can be detected, and this is considered to make telop detection possible.
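As a rough illustration of the region-wise test described above, the following sketch averages the magnitudes of the AC coefficients collected for each screen region and reports the regions at or above a threshold. The region names, the use of absolute values, and the threshold value are assumptions for the example, not details taken from the patent.

```python
def detect_telop_regions(ac_by_region, threshold):
    """Return the names of regions whose mean |AC coefficient| >= threshold.

    ac_by_region: dict mapping a (hypothetical) region name to a list of
                  DCT AC coefficients gathered from that region.
    """
    detected = []
    for name, coeffs in ac_by_region.items():
        if not coeffs:
            continue  # no data for this region, nothing to judge
        mean_ac = sum(abs(c) for c in coeffs) / len(coeffs)
        if mean_ac >= threshold:
            detected.append(name)
    return detected


example_regions = {
    "bottom": [30, -40, 50],  # sharp character edges give large AC values
    "top": [1, -2, 3],
}
telop_regions = detect_telop_regions(example_regions, threshold=20)
```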

Besides detecting the DCT AC coefficients as described above, edge detection in the baseband domain (on the time-domain signal) is also conceivable; for example, edges could be detected from the inter-frame difference of the image luminance data.

It is also conceivable to perform multi-resolution analysis by wavelet transform and, using the data in a predetermined resolution band that contains the predetermined high-frequency components, to compute the average value for the regions corresponding to FIG. 25, so that the same signal processing as with the AC coefficients is carried out.

Unlike a flip, a telop is not necessarily a light-colored region; it is, for example, character information that appears at the bottom of a news video. Although it depends on the program genre, a telop is most likely to appear at the bottom, the top, the left edge, or the right edge of the screen.

The telop feature and the flip feature may also be combined into a single character feature.

3.2.6 Camera feature
The camera feature determination processing system 209 handles features related to camera operations such as zoom and pan. In the case of MPEG, for example, these can be determined using the motion vectors of P pictures, as disclosed in Japanese Unexamined Patent Application Publication No. 2003-298981.

In addition, techniques related to camera features are disclosed in, for example, Japanese Translation of PCT Application Publication No. 2002-535894.

(4) Playback unit (play unit) processing
Digest playback (summary playback) is considered here as follows: using the audio feature data and video feature data (feature extraction data) obtained by predetermined signal processing, several important playback sections (key frame sections) are selected within a predetermined section, and those sections are played back one after another by skip playback.

In skip playback, if a skip occurs in the middle of a speaker's voice section, for example, some users may find the interrupted audio jarring even when the picture itself looks natural. It is therefore conceivable to treat a section at or below a predetermined audio level (volume) as a silent section and to use a predetermined time point within that section as a candidate skip point.

Also, in broadcast programs, movies, and other video, a scene change can be regarded as a break between topics, so a scene change point or its vicinity can also be used as a candidate skip point.

From the above, skip playback points and skip playback sections can be determined according to predetermined silent sections of the audio signal and scene change points of the video signal or predetermined time points near them.

From this viewpoint, the interval between skip playback points is, for convenience, treated here as a predetermined playback unit (hereinafter, playback unit, play unit, PlayUnit, or PU), and processing is performed on that basis.

The predetermined image feature data and audio feature data in each playback unit (PU) set in this way are subjected to predetermined processing, predetermined summary playback (digest playback) sections are set according to the video and audio feature data and the summary playback time, and the summary playback is executed by performing skip playback in a predetermined summary playback mode.

Besides summary playback, it is also possible to set a chapter (or an edit point, or a playback break point) at the first time point (or its vicinity) or the last time point (or its vicinity) of a PU set by the predetermined signal processing.

That is, by setting chapters in this way, thumbnails of the chapter points can be displayed by predetermined signal processing, and the user can perform operations such as editing while viewing the thumbnail display.

Next, an example of playback unit (play unit, PU) processing is described with reference to FIGS. 34 and 35.

(Sound section (when the audio signal is at or above a predetermined level))
As in processing method 1 shown in FIG. 34(A), when a sound section at or above the predetermined average level has an audio section length in the range of 10 to 20 seconds, the break (the predetermined silence detection point) at which the audio segment is closest to 15 seconds is taken as the playback unit boundary, regardless of scene changes.

As in processing method 2 shown in FIG. 34(B), when the audio section continues for longer than 20 seconds and the scene change interval is 20 seconds or less, the scene change detection point closest to 15 seconds is taken as the playback unit boundary.

As in processing method 3 shown in FIG. 34(C), when the audio continues for longer than 20 seconds and the scene change interval is also longer than 20 seconds, the playback unit is cut at the point where it reaches 20 seconds, regardless of audio segments and scene changes.

As in processing method 4 shown in FIG. 34(D), when the attribute of the audio feature changes within the range of 10 to 20 seconds, the attribute change point is taken as the playback unit boundary.

As in processing method 5 shown in FIG. 34(E), when a CM (commercial) is detected, the CM detection point is taken as the playback unit boundary.

The CM detection method is described here with reference to FIG. 35.

In general, the CM sections of a broadcast program have a predetermined length (typically 15, 30, or 60 seconds), and a scene change occurs at each CM boundary (start and end point). A CM can therefore be detected, as shown in FIG. 36, by detecting the predetermined length together with the scene changes.
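The idea of pairing scene changes that are a standard CM length apart can be sketched as follows. This is a simplified illustration under stated assumptions: the function name and the matching tolerance are hypothetical, only scene change times are used, and the silence and channel information that the CM detection system also receives is left out.

```python
CM_LENGTHS = (15.0, 30.0, 60.0)  # typical CM lengths in seconds, from the text


def detect_cm_sections(scene_changes, tolerance=0.5):
    """Return (start, end) pairs of scene change times whose spacing matches
    a typical CM length within the given tolerance (an assumed parameter)."""
    times = sorted(scene_changes)
    sections = []
    for i, start in enumerate(times):
        for end in times[i + 1:]:
            length = end - start
            if any(abs(length - cm) <= tolerance for cm in CM_LENGTHS):
                sections.append((start, end))
    return sections


# Scene changes at 0 s, 15 s, 45 s, and 52 s.
cm_candidates = detect_cm_sections([0.0, 15.0, 45.0, 52.0])
```

In this example the pairs (0, 15) and (15, 45) are reported because they are 15 and 30 seconds long, while the 7-second gap between 45 and 52 is not.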

(Silent section (when the average audio level is at or below a predetermined level))
As in processing method 6 shown in FIG. 35(A), when a silent section (a section whose average audio level is at or below the predetermined level) is longer than 20 seconds and the scene change detection interval is 20 seconds or less, the scene change detection point closest to 15 seconds is taken as the playback unit boundary.

As in processing method 7 shown in FIG. 35(B), when the silent section is longer than 20 seconds and the scene change detection interval is also longer than 20 seconds, the boundary is placed 20 seconds after the start point of the playback unit, regardless of scene change detection points.
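Processing methods 1 to 3, 6, and 7 share one pattern: prefer a natural break near 15 seconds, and otherwise cut at 20 seconds. The sketch below is one possible reading of those rules, not the patent's implementation. It assumes all times are offsets in seconds from the start of the current playback unit, interprets method 1 as "a silence detection point between 10 and 20 seconds exists", and omits methods 4 and 5 (attribute change and CM detection). All names are hypothetical.

```python
TARGET = 15.0   # preferred playback unit length in seconds
MIN_LEN = 10.0  # lower bound of the range used in method 1
MAX_LEN = 20.0  # forced cut length used in methods 3 and 7


def closest_to_target(points, target=TARGET):
    """Pick the candidate point closest to the target length."""
    return min(points, key=lambda t: abs(t - target))


def play_unit_break(silence_points, scene_changes):
    """Choose the end of the current playback unit.

    silence_points: offsets of silence detection points (audio breaks).
    scene_changes:  offsets of scene change detection points.
    """
    audio_breaks = [t for t in silence_points if MIN_LEN <= t <= MAX_LEN]
    if audio_breaks:
        # Method 1: an audio break in the 10-20 s range wins over scene changes.
        return closest_to_target(audio_breaks)
    scene_breaks = [t for t in scene_changes if 0 < t <= MAX_LEN]
    if scene_breaks:
        # Methods 2 and 6: use the scene change closest to 15 s.
        return closest_to_target(scene_breaks)
    # Methods 3 and 7: no usable break, so cut at 20 s.
    return MAX_LEN
```

For instance, `play_unit_break([12.0, 16.0], [5.0])` selects the audio break at 16 seconds, and `play_unit_break([], [30.0])` falls back to the 20-second cut.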

In all of the playback unit processing described above, the initial value of the playback unit start point is the time at which recording of the program (broadcast program) started.

With the playback unit processing described above, predetermined playback units that follow the predetermined audio features and video features (scene change features) can be played back.

(Block configuration example of the playback unit generation processing system)
FIG. 37 shows a block configuration example of the processing system that generates the playback units described above and of the unitized feature data processing system, described later, that inserts feature data into these playback units.

Predetermined time points for summary playback, chapter points, and the like are set at the start and end points of playback units, so processing is performed with the feature extraction data associated with each playback unit described above.

That is, the feature extraction data extracted for each predetermined section, namely the audio feature extraction data and the video feature extraction data, are mapped onto the playback unit sections.

The block configuration example of FIG. 37 is described next.

In the block configuration example shown in FIG. 37, silence determination information data is input to the time measurement system 301, which measures the predetermined intervals (time lengths) required by the playback unit processing described above and passes its output to the playback unit processing system 302.

The playback unit processing system 302 also receives scene change determination information data and CM detection determination information data, and generates the predetermined playback units by performing the signal processing described for each of the processing methods above.

The CM detection system 304 receives silence feature detection information data, scene change feature information data, and channel information for determining whether the program's channel is one on which CMs are broadcast, and performs CM detection by the predetermined signal processing method described with reference to FIG. 36.

The playback unit feature data processing system 303 receives audio feature data such as audio attribute information and silence information, together with feature data such as the scene change feature, color feature, similar image feature, person feature, and telop feature, and inserts each piece of feature extraction data into the playback units as described later.

(5) Playback-unitized feature data processing
Feature data file processing is described next.

The data to be feature-extracted consist of audio feature extraction data and video (image) feature extraction data.

The result of this feature data processing is data (a data file) in which the extracted audio and video feature data have been inserted into the playback units described above, and the various feature data are recorded on a predetermined recording medium for each playback unit.

Although recording the feature data per playback unit is considered here, it is also conceivable to first record each piece of feature data on the recording medium as detected in its own predetermined detection section, and afterwards to process it into feature data aligned with the playback unit sections.

Strictly speaking, feature extraction data are obtained by extracting predetermined characteristic signals (characteristic data) from the audio signal (audio data) and the image (video) signal (image (video) data) and then applying predetermined processing to the extracted signals (data), which yields feature data indicating the features of the audio and images. Here, unless specially noted, signals (data) that indicate features after predetermined processing of the characteristic data (characteristic signals) are also referred to as feature data (feature signals) or feature extraction data (feature extraction signals).

For the video (image) signal, the DCT DC coefficients of the luminance signal (Y signal) and color difference signals (Cb and Cr signals) of I pictures, the motion vector data of B or P pictures, and the DCT AC coefficients are extracted from the MPEG stream as characteristic data. From these, together with the screen position information of the extracted data, predetermined thresholds, correlation calculations, and so on, the following features can be derived: the scene change feature (scn feature), camera operation feature (camera feature, cam feature), similar image feature (similar scene feature or scene ID feature, sid feature), telop feature (tlp feature), color feature (col feature), and person feature (Person feature).

For the audio signal, as characteristic data processing, the average level is computed about every 20 ms, for example, and from these computed data and predetermined thresholds, audio features (seg features) such as the attribute (type) and average power (average level) of the audio signal in a predetermined section can be derived.
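The per-frame level computation for the audio signal might look like the sketch below. The text says only that an average level is computed about every 20 ms; using the mean absolute sample value as the "average level", and the function names, are assumptions made for illustration.

```python
def frame_levels(samples, sample_rate, frame_ms=20):
    """Split PCM samples into frames of about frame_ms milliseconds and
    return the mean absolute amplitude of each frame (assumed level measure)."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    levels = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        levels.append(sum(abs(s) for s in frame) / len(frame))
    return levels


def is_silent(level, threshold):
    """A frame counts as silent when its level is at or below the threshold."""
    return level <= threshold


# 20 ms of silence followed by 20 ms of constant amplitude at 1 kHz sampling.
example_levels = frame_levels([0] * 20 + [100] * 20, sample_rate=1000)
```

Comparing each level with a threshold, as in `is_silent`, gives the silent and sound sections on which the playback unit processing relies.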

Here, speaker voice, music (musical sound), and other sound (for example, cheering in a sports program) are considered as the audio attributes.

5.1 Configuration of the feature extraction data file
Configuration example 1 of the feature extraction data file, shown in FIG. 38, is an example in which the audio feature data and each of the video feature data described above (scn feature, cam feature, sid feature, tlp feature, col feature, Person feature, and so on) are stored as separate feature data files.

Each feature data file is written as text format data or binary format data.

Besides being recorded as file data on a predetermined recording medium, these feature data may also be held temporarily as ordinary data on a predetermined recording medium (semiconductor memory or the like) and read out for predetermined processing such as the summary list data generation and predetermined time point (chapter point) generation described later. The same applies to FIGS. 39 and 40 described below.

Example 2, shown in FIG. 39, is an example in which all of the audio feature data described above are collected into one file in text or binary format, and all of the video feature data described above are collected into another single file in text or binary format.

Example 3, shown in FIG. 40, is an example in which all of the audio feature data and all of the video feature data described above are collected into a single file in text or binary format.

Collecting the data into one file in this way leaves only one file, compared with example 1 of FIG. 38, which makes file handling simpler; using binary format additionally reduces the data size (file size, file capacity) and is more efficient.

The following description covers the case where the feature data file is organized as in example 3 of FIG. 40 and the feature data are written in binary format.
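To make the size advantage of a binary format concrete, the sketch below packs one feature record into a fixed 16-byte layout with Python's standard `struct` module. The record layout (sequence number, start position, end position, feature value) and the function names are entirely hypothetical; the patent does not specify a byte layout here.

```python
import struct

# Hypothetical layout: three little-endian unsigned 32-bit integers
# (sequence number, start position, end position) and one 32-bit float.
RECORD_FORMAT = "<IIIf"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_record(sequence, start, end, value):
    """Serialize one feature record into RECORD_SIZE bytes."""
    return struct.pack(RECORD_FORMAT, sequence, start, end, value)


def unpack_record(blob):
    """Inverse of pack_record; returns (sequence, start, end, value)."""
    return struct.unpack(RECORD_FORMAT, blob)


example_blob = pack_record(3, 100, 250, 0.5)
```

The same record written as text, for example `3,100,250,0.5` plus a line ending, varies in length with the values, whereas the binary form always occupies `RECORD_SIZE` bytes and can be located by simple offset arithmetic.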

Example 3 of FIG. 40 can also be regarded as example 2 of FIG. 39 with the data describing all of the audio feature data in binary format and the data describing all of the video feature data in binary format joined together.

Accordingly, the processing method (description method) given below for the audio feature data in the feature data file can be applied to the audio feature data in example 2 of FIG. 39, and the processing method (description method) for the video feature data can be applied to the video feature data in example 2 of FIG. 39.

5.2 Hierarchical structure of the feature data
FIG. 41 shows the hierarchical structure of the feature data, with the playback unit as the unit.

The following description considers predetermined feature data processing in the predetermined processing unit (playback unit) of the present invention.

As shown in FIG. 41, the feature data consist of feature data header information, program 1 feature data, program 2 feature data, and so on.

As shown in FIG. 42, the feature data header information consists of predetermined data such as the total recording time of all programs (program 1, program 2, and so on), the recording start and end times, the number of programs, and other information.

Next, the feature data of a program are described using program 1 feature data as an example.

As shown in FIG. 41, program 1 feature data consist of program 1 information, playback unit 1 information, playback unit 2 information, and so on.

As shown in FIG. 42, program 1 information consists of predetermined data such as the program recording time, the program start and end times, the program genre, and other information.

Next, the data structure of a playback unit is described using playback unit 1 information as an example.

As shown in FIG. 41, playback unit 1 information consists of audio feature data and video feature data.

(Configuration of the audio feature data)
As shown in FIG. 41, the audio feature data consist of sequence number information, start/end position information, audio attribute information, feature data, other information data, and so on.

(Configuration of the video feature data)
As shown in FIG. 41, the video feature data consist of predetermined feature information data for the scene change feature, color feature, similar image feature, person feature, telop feature, camera feature, and so on.
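The hierarchy described above (header, programs, playback units, audio and video feature data) maps naturally onto nested record types. The following is a sketch of one possible in-memory representation using Python dataclasses; every class and field name is a hypothetical choice for illustration, and fields such as "other information" are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AudioFeature:
    sequence_number: int
    start: int           # start position, e.g. a frame number
    end: int             # end position
    attribute: str       # e.g. "speech", "music", "other"
    value: float         # feature data such as average level


@dataclass
class VideoFeature:
    kind: str            # "scn", "col", "sid", "person", "tlp", or "cam"
    sequence_number: int
    start: int
    end: int
    value: float


@dataclass
class PlayUnit:
    audio: List[AudioFeature] = field(default_factory=list)
    video: List[VideoFeature] = field(default_factory=list)


@dataclass
class Program:
    recording_time: float
    genre: str
    play_units: List[PlayUnit] = field(default_factory=list)


@dataclass
class FeatureDataFile:
    total_recording_time: float          # part of the header information
    programs: List[Program] = field(default_factory=list)

    @property
    def program_count(self) -> int:
        """Number of programs, which the header also records."""
        return len(self.programs)


unit = PlayUnit(
    audio=[AudioFeature(0, 0, 450, "speech", 0.7)],
    video=[VideoFeature("scn", 0, 448, 450, 1.0)],
)
example_file = FeatureDataFile(
    total_recording_time=3600.0,
    programs=[Program(recording_time=3600.0, genre="news", play_units=[unit])],
)
```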

In the descriptions below of the scene change, color, similar image, person, telop, and camera feature data, besides recording (writing) the feature data of every item for every predetermined section on the recording medium, it is also conceivable to record (write) the data only when, for example, feature data at or above a predetermined threshold are detected.

When data are processed only on detection at or above the threshold in this way, nothing is written for values below the threshold. Which detection a recorded item is, counted from the beginning, can then be found from the sequence number information described below.
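Threshold-gated recording with sequence numbers can be sketched as follows. This is an illustrative reading of the paragraph above, with a hypothetical function name and record layout: values below the threshold produce no record, and each recorded item carries a sequence number counting detections from the beginning.

```python
def record_above_threshold(values, threshold):
    """Keep only values at or above the threshold.

    values: one feature value per detection section, in time order.
    Returns a list of records; 'sequence' counts recorded detections from 0,
    and 'position' is the index of the section the value came from.
    """
    records = []
    for position, value in enumerate(values):
        if value >= threshold:
            records.append(
                {"sequence": len(records), "position": position, "value": value}
            )
    return records


example_records = record_above_threshold([0.1, 0.9, 0.2, 0.8], threshold=0.5)
```

In the example, the sections at positions 1 and 3 are recorded with sequence numbers 0 and 1, so the sequence number alone tells a reader that the second record is the second detection even though two sections were skipped.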

(Scene change feature)
As shown in FIG. 43, this consists of sequence number information, start/end position information, feature data, and other data.

The sequence number information is 0, 1, 2, 3, ... and indicates the order in which scene changes occurred from the beginning of the program (broadcast program).

The start/end position information indicates the start and end positions of each scene change in that order; a frame (field) number, PTS, DTS, time, or similar information data may be used.

(Color feature)
As shown in FIG. 43, this consists of sequence number information, information data identifying the detection region, start/end position information data, feature data, other data, and so on.

The sequence number information is 0, 1, 2, 3, ... and indicates the order of color feature detection from the beginning of the program (broadcast program).

The start/end position information indicates the start and end positions at which the feature of each region was detected in each color feature detection of that order; a frame (field) number, PTS, DTS, time, or similar information data may be used.

The feature data may be, for example, RGB, Y, Cb, or Cr data.

（類似画像特徴） (Similar image features)
図４３に示すように、系列番号情報、頻度情報開始終了位置情報、特徴データ、その他データなどからなる。 As shown in FIG. 43, it consists of sequence number information, frequency information, start/end position information, feature data, and other data.

ここで、系列番号情報は、０、１、２、３、・・・とそのプログラム（方法番組）の始めからの類似画像特徴検出の順番を示す情報である。 Here, the sequence number information takes the values 0, 1, 2, 3, ... and indicates the order of similar image feature detection from the beginning of the program (broadcast program).

特徴データとしては、上記したような有効画面を所定数の領域に分割（例えば２５分割）した各分割領域のＤＣＴの平均ＤＣ係数などが考えられる。 As the feature data, the DCT average DC coefficient of each divided area obtained by dividing the effective screen as described above into a predetermined number of areas (for example, 25 divisions) can be considered.

（人物特徴） (Person features)
図４３に示すように、系列番号情報、検出領域を識別する情報データ、開始終了位置情報データ、特徴データ、その他データなどからなる。 As shown in FIG. 43, it consists of sequence number information, information data identifying the detection region, start/end position information data, feature data, and other data.

ここで、系列番号情報は、０、１、２、３、・・・とそのプログラム（方法番組）の始めからの類似画像特徴検出の順番を示す情報である。 Here, the sequence number information takes the values 0, 1, 2, 3, ... and indicates the order of similar image feature detection from the beginning of the program (broadcast program).

（テロップ特徴） (Telop features)
図４３に示すように、系列番号情報、検出領域を識別する情報データ、開始終了位置情報データ、特徴データ、その他データなどからなる。 As shown in FIG. 43, it consists of sequence number information, information data identifying the detection region, start/end position information data, feature data, and other data.

ここで、系列番号情報は、０、１、２、３、・・・とそのプログラム（方法番組）の始めからのテロップ特徴検出の順番を示す情報である。 Here, the sequence number information takes the values 0, 1, 2, 3, ... and indicates the order of telop feature detection from the beginning of the program (broadcast program).

（カメラ特徴） (Camera features)
図４３に示すように、系列番号情報、検出領域を識別する情報データ、開始終了位置情報データ、特徴データ、その他データなどからなる。 As shown in FIG. 43, it consists of sequence number information, information data identifying the detection region, start/end position information data, feature data, and other data.

ここで、系列番号情報は、０、１、２、３、・・・とそのプログラム（方法番組）の始めからのカメラ特徴検出の順番を示す情報である。 Here, the sequence number information takes the values 0, 1, 2, 3, ... and indicates the order of camera feature detection from the beginning of the program (broadcast program).
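
As a rough illustration of the record layouts listed above, the following Python sketch models one feature data entry and the threshold-gated writing described earlier. The names `FeatureRecord`, `region_id`, `other`, and `append_if_above` are hypothetical; the text only names the kinds of information stored and does not define a concrete layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeatureRecord:
    """One entry of a feature data file (field names are assumptions)."""
    sequence_number: int               # 0, 1, 2, ... in detection order from the program start
    start: int                         # start position, here a frame number (PTS, DTS, or time also possible)
    end: int                           # end position, same unit as start
    feature: tuple                     # feature data, e.g. (Y, Cb, Cr) for a color feature
    region_id: Optional[int] = None    # detection-region identifier; not used by scene-change records
    other: dict = field(default_factory=dict)  # "other data"


def append_if_above(records, value, threshold, start, end, feature, region_id=None):
    """Append a record only when the detected value is at or above the threshold."""
    if value < threshold:
        return None  # below threshold: nothing is written
    record = FeatureRecord(len(records), start, end, feature, region_id)
    records.append(record)
    return record
```

Because below-threshold detections are never written, the sequence number is what tells a reader which written detection, counted from the start of the program, a given record is.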

ここで、放送番組を記録する場合に、放送番組の所定の記録処理と同時に、ここで説明する特徴抽出処理、特徴データの書き込み処理（記録処理）を行うことを考えることができるが、すでに、記録済みの放送番組や、その他映画、ドラマその他画像音声ソフトについて、所定の特徴抽出処理を行い、特徴データファイルを生成することも考えることができる。 Here, when a broadcast program is recorded, the feature extraction processing and the feature data writing (recording) processing described here can be performed simultaneously with the predetermined recording processing of the broadcast program. Alternatively, predetermined feature extraction processing can be performed on already recorded broadcast programs, or on other movies, dramas, and other video/audio content, to generate a feature data file.

プログラム１について上記のようにＰＵと特徴データを考えることができ、そのほかのプログラム２、プログラム３などを記録する場合にも、上記で説明したプログラム１の場合と同様にＰＵと特徴データを考えることができる。 As described above, PUs and feature data can be considered for program 1; when other programs such as program 2 and program 3 are recorded, PUs and feature data can be considered in the same way as for program 1.

（６）プレイリスト処理（要約再生リスト生成処理） (6) Playlist processing (summary playlist generation processing)
次に、上記した特徴抽出処理が生成したＰＵファイル（ＰＵ特徴データファイル）から、要約再生（ダイジェスト再生）を行うための要約データ処理に関する説明を行う。 Next, the summary data processing for performing summary playback (digest playback) from the PU file (PU feature data file) generated by the above feature extraction processing will be described.

６．１要約ルール処理 6.1 Summary Rule Processing
本願でのべる特徴データを用いる要約再生（ダイジェスト再生）では、上記したＰＵを単位とする所定再生区間をスキップ再生処理することで所望のダイジェスト再生を行う。 In summary playback (digest playback) using the feature data described in the present application, the desired digest playback is performed by skip playback processing of predetermined playback sections in units of the PUs described above.

６．２所定時点設定処理（プレイリストファイル）処理 6.2 Predetermined Time Point Setting Process (Playlist File) Process
次にプレイリストファイルについて説明する。 Next, the playlist file will be described.

このファイルは、上記した特徴データに応じて意味付けされたＰＵ、又はＰＵの接合体（ＰＵの集合体、又はＰＵの連結体）の内どれを選択して再生処理を行うかの所定データの情報が所定の書式に応じて記述されているデータである。 This file contains predetermined data indicating which one of the PUs or PU joints (PU aggregates or PU concatenations) that have been given meaning according to the above-described feature data is to be selected for playback processing. It is data in which information is described according to a predetermined format.

ここで、このデータは特徴抽出の基となった画像音声データが記録された所定の記録媒体に記録する（書き込み処理）場合の他に、所定のメモリー手段に一時的に記憶する場合も考えられる。 Here, in addition to being recorded (written) on the predetermined recording medium on which the video/audio data that was the basis of the feature extraction is recorded, this data may be temporarily stored in predetermined memory means.

プレイリストファイルの一例を図４４の（Ａ）、（Ｂ）に示す。 An example of the playlist file is shown in FIGS. 44 (A) and 44 (B).

図４４の（Ａ）に示す例１における（ａ）の縦のデータ系列は、再生区間の開始位置情報のデータで、フレーム番号、時間（時刻）、ストリーム（圧縮された画像音声データ）からのＰＴＳ（プレゼンテーション・タイム・スタンプ）あるいは、ＤＴＳ（デコード・タイム・スタンプ）などの所定の情報データなども考えられる。 The vertical data series (a) in Example 1 shown in FIG. 44(A) is the start position information of each playback section; a frame number, a time, or predetermined information data such as a PTS (presentation time stamp) or DTS (decode time stamp) from the stream (compressed video/audio data) may be used.

図４４の（Ａ）に示す例１における（ｂ）の縦のデータ系列は再生区間の終了位置情報のデータで、上記例１の（ａ）のデータと対応して、フレーム番号、時間（時刻）、ストリーム（圧縮された画像音声データ）からのＰＴＳ（プレゼンテーション・タイム・スタンプ）あるいは、ＤＴＳ（デコード・タイム・スタンプ）などの所定の情報データなども考えられる。 The vertical data series (b) in Example 1 shown in FIG. 44(A) is the end position information of each playback section; corresponding to the data in (a) of Example 1, a frame number, a time, or predetermined information data such as a PTS (presentation time stamp) or DTS (decode time stamp) from the stream (compressed video/audio data) may be used.

図４４の（Ａ）に示す例１における（ｃ）の縦のデータ系列は、その再生ユニット（ＰＵ）、又は再生ユニット群（ＰＵ群）の重要度である。 The vertical data series of (c) in Example 1 shown in FIG. 44A is the importance of the playback unit (PU) or playback unit group (PU group).

図４４の（Ａ）に示す例１における（ｄ）縦のデータ系列は、要約ルールで規定された、又は設定された意味の文字データである。 The vertical data series (d) in Example 1 shown in FIG. 44A is character data having the meaning defined or set by the summary rule.

図４４の（Ｂ）に示す例２は、全てのＰＵ区間について意味文字と評価値（重要度）を記述し、再生区間、チャプター設定などの所定時点を示すために「１」、「０」の識別データを設けた場合の例である。 Example 2 shown in FIG. 44B describes semantic characters and evaluation values (importance) for all PU sections, and “1” and “0” are used to indicate predetermined points in time such as playback sections and chapter settings. This is an example when the identification data is provided.

図４４の（Ｂ）に示す例２の（ａ）（ｂ）で示される開始点、終了点は、次の段のデータと連続的になっているのが分かる。 It can be seen that the start point and end point shown in (a) and (b) of Example 2 shown in (B) of FIG. 44 are continuous with the data of the next stage.

例えば、図４４の（Ｂ）に示す例２において、最初の開始点０終了点２２９で、次の開始点２３０に連続的につながっている。 For example, in Example 2 shown in FIG. 44B, the first start point 0 and the end point 229 are continuously connected to the next start point 230.

図４４の（Ｂ）に示す例２における（ｅ）の縦のデータ系列は、要約再生を行うかどうかのフラグ情報データで、「１」の場合は再生を行う場合で、「０」の場合は再生を行わない場合である。 The vertical data series (e) in Example 2 shown in (B) of FIG. 44 is flag information data indicating whether or not summary playback is performed. When “1”, playback is performed and when “0” is performed. Is a case where no reproduction is performed.
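
To make the Example 2 layout concrete, here is a minimal Python sketch that holds the five columns (a) to (e) as rows, checks that consecutive sections are contiguous, and picks the sections flagged for digest playback. The type `PlaylistRow`, the function names, and every sample value other than the 0/229/230 positions quoted above are assumptions for illustration, not the contents of FIG. 44.

```python
from typing import NamedTuple


class PlaylistRow(NamedTuple):
    start: int       # (a) start position, e.g. a frame number
    end: int         # (b) end position
    importance: int  # (c) evaluation value (importance)
    meaning: str     # (d) meaning characters set by the summary rule
    play: int        # (e) 1 = play in digest playback, 0 = do not play


def is_contiguous(rows):
    """True when every section starts right after the previous one ends."""
    return all(b.start == a.end + 1 for a, b in zip(rows, rows[1:]))


def digest_sections(rows):
    """Return (start, end) of the sections whose flag (e) is 1."""
    return [(r.start, r.end) for r in rows if r.play == 1]


# Hypothetical rows; only the 0..229 / 230 boundary comes from the text.
rows = [
    PlaylistRow(0, 229, 70, "a", 1),
    PlaylistRow(230, 444, 30, "b", 0),
    PlaylistRow(445, 600, 80, "a", 1),
]
```

With these rows, `digest_sections(rows)` keeps the first and third sections and the second is skipped during digest playback.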

また、「１」の最初の時点、「０」の最初の時点を所定時点設定点（チャプター点）と見ることも考えられる。 It is also conceivable to regard the first time point of a "1" and the first time point of a "0" as a predetermined time point setting point (chapter point).

（７）動作フローチャート (7) Operation Flowchart
図４５は、本発明の動作フローチャートの一例であり、これについて説明する。 FIG. 45 is an example of an operation flowchart of the present invention, which will be described.

処理を開始すると、まず、最初のステップＳ１で記録モードか再生モードか判定され、記録モードの場合は記録処理（Ｒ）に、また、再生モードの場合はステップＳ２の処理に移行する。 When the process is started, it is first determined in step S1 whether the recording mode or the reproduction mode is selected. If the recording mode is selected, the process proceeds to the recording process (R), and if the reproduction mode is selected, the process proceeds to step S2.

７．１再生処理関係動作フローチャート 7.1 Reproduction Process Related Operation Flowchart
（再生処理動作フローチャートの一例） (Example of Reproduction Process Operation Flowchart)
再生モードの場合は、ステップＳ２で要約再生（ダイジェスト再生）モードか通常再生モードか判定され、通常再生モードの場合は通常再生処理（Ｐ）に移行する。 In the case of the reproduction mode, it is determined in step S2 whether the summary reproduction (digest reproduction) mode or the normal reproduction mode is selected, and in the case of the normal reproduction mode, the process proceeds to the normal reproduction process (P).

要約再生モードの場合は、ステップＳ３で所定の特徴抽出データが所定記録媒体に記録されているかの検出処理、又は所定ファイルデータとして記録媒体の所定記録領域に記録されているかの検出処理が判定処理される。 In the case of the digest playback mode, a determination process is performed in step S3 to determine whether or not the predetermined feature extraction data is recorded on the predetermined recording medium or whether or not the predetermined feature extraction data is recorded in the predetermined recording area of the recording medium as the predetermined file data. Is done.

ステップＳ３で所定の特徴抽出データが検出される場合には、ステップＳ４で所定のプレイリストデータ（データファイル）が所定記録媒体の所定記録領域に記録されているかが検出され、プレイリストデータ（プレイリストファイル）が検出される場合は、ステップＳ５で所定プレイリストデータを読出し処理する。 If predetermined feature extraction data is detected in step S3, it is detected in step S4 whether predetermined playlist data (data file) is recorded in a predetermined recording area of a predetermined recording medium. If a (list file) is detected, predetermined playlist data is read and processed in step S5.

ステップＳ３で所定の特徴抽出データが検出されないと判定される場合には、ステップＳ８でいま要約再生しようとする画像音声データ（プログラム、放送番組）を読み込んで所定の特徴抽出処理を行い、ステップＳ９で処理が終了したかが判定され終了しない場合はステップＳ８に戻り終了するまで処理を行う。 If it is determined in step S3 that the predetermined feature extraction data is not detected, the video/audio data (program, broadcast program) to be summary-played is read in step S8 and predetermined feature extraction processing is performed. In step S9 it is determined whether the processing has finished; if not, the process returns to step S8 and continues until it finishes.

ステップＳ９で所定の特徴抽出処理が終了したと判定された場合には、ステップＳ６に移行して所定のプレイリストデータ生成処理が行われる。 If it is determined in step S9 that the predetermined feature extraction process has been completed, the process proceeds to step S6, and a predetermined playlist data generation process is performed.

ステップＳ４で所定のプレイリストデータ（ファイル）が検出されないと判定される場合は、ステップＳ６において所定の記録媒体の所定記録領域に記録されている（又は記憶されている）所定の特徴抽出データを読み込み処理して所定のプレイリストデータ（ファイル）を生成処理して所定の記録媒体の所定領域に逐次、あるいは、処理が終了後データを書き込み、ステップＳ７で全てのプレイリスト生成処理が終了したかが判定され、終了しない場合はステップＳ６に戻り処理を繰り返し、Ｓ７で所定のプレイリストデータが全て生成されたと判定された場合は、ステップＳ５で書き込んだプレイリストデータを読み込み処理する。 If it is determined in step S4 that the predetermined playlist data (file) is not detected, then in step S6 the predetermined feature extraction data recorded (or stored) in the predetermined recording area of the predetermined recording medium is read, predetermined playlist data (file) is generated, and the data is written to a predetermined area of the predetermined recording medium either sequentially or after the processing is completed. In step S7 it is determined whether all playlist generation processing has finished; if not, the process returns to step S6 and repeats. If it is determined in S7 that all the predetermined playlist data has been generated, the written playlist data is read in step S5.
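
The branching in steps S3 to S9 can be summarized by the following Python sketch. It assumes a dict-like `store` standing in for the recording medium and two caller-supplied callables, `extract_features` and `build_playlist`; all of these names are hypothetical, and the per-step loops (S8/S9 and S6/S7) are collapsed into single calls.

```python
def prepare_playlist(store, extract_features, build_playlist):
    """Return playlist data, generating feature data and/or the playlist if missing.

    store: dict-like stand-in for the recording medium, with optional
           keys "features" and "playlist" (an assumption of this sketch).
    """
    if "features" not in store:
        # S3: no feature data found -> S8/S9: extract features from the program,
        # then S6/S7: generate the playlist from them.
        store["features"] = extract_features()
        store["playlist"] = build_playlist(store["features"])
    elif "playlist" not in store:
        # S4: feature data exists but no playlist -> S6/S7: generate it.
        store["playlist"] = build_playlist(store["features"])
    # S5: read the playlist data (existing or just written).
    return store["playlist"]
```

Once both items exist on the medium, a later call goes straight to the read in S5 without repeating extraction or generation.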

ここで、ステップＳ６において、逐次生成されたプレイリストデータは上記放送番組などの画像音声情報データが記録されている同じ記録媒体上の所定記録領域に、逐次記録するようにしても良いし、又は画像音声データが記録されたのとは別の記録媒体、例えば、装着、着脱可能な所定メモリー手段などに情報を書き込むようにしても良い。 Here, in step S6, the playlist data generated sequentially may be sequentially recorded in a predetermined recording area on the same recording medium in which the image / audio information data such as the broadcast program is recorded, or Information may be written on a recording medium other than the one on which the image / audio data is recorded, for example, a predetermined memory means that can be attached and detached.

この場合にも、所定プレイリストデータが逐次生成処理されると共に、逐次データを書き込む（記憶処理する）ようにしても良いし、所定プレイリストデータが全て生成処理され、プレイリスト処理が終了してから、生成された全てのプレイリストデータをまとめて記録（記憶）処理するようにしても良い。 Also in this case, the predetermined playlist data may be sequentially generated and written (stored) sequentially, or all the predetermined playlist data may be generated and the playlist processing is completed. Then, all the generated playlist data may be recorded (stored) together.

また、プレイリストデータは、後に図４６、図４７を参照して説明するように、記録時間に応じて、ユーザーが複数の要約再生時間を選択できるように、記録時間に応じて、複数のプレイリストデータを生成するようにしても良い。 In addition, as described later with reference to FIGS. 46 and 47, a plurality of playlist data may be generated according to the recording time so that the user can select from a plurality of summary playback times.

ここでは、上記したように、所定ＰＵ区間、又は複数のＰＵ区間の接合された所定区間毎に、所定評価値も設定処理されるので、評価値に応じて要約再生時間を操作することができる。 Here, as described above, since the predetermined evaluation value is also set for each predetermined PU section or each predetermined section obtained by joining a plurality of PU sections, the summary playback time can be manipulated according to the evaluation value. .

ステップＳ１０で再生時間選択モードになり、ステップＳ１１で、ユーザーがすぐ再生時間を選択したか、又は要約再生モード選択した後プレイリストデータの検出処理終了後から所定時間ｔｍｏｄ内にユーザーが再生時間を選択処理したかが判定され、選択されない場合は、Ｓ１２でユーザーにより再生ストップが選択されたかが判定処理される。 In step S10, the playback time selection mode is set. In step S11, the user selects the playback time immediately, or after selecting the summary playback mode, the user sets the playback time within a predetermined time tmod after the completion of the playlist data detection process. It is determined whether or not the selection process has been performed. If the selection process has not been selected, it is determined whether or not the reproduction stop has been selected by the user in S12.

ステップＳ１２でユーザーにより再生ストップが選択された場合は処理を終了し、再生ストップでない場合はステップＳ１０に戻り上記所定の処理を繰り返す。 If the reproduction stop is selected by the user in step S12, the process ends. If not, the process returns to step S10 to repeat the predetermined process.

ステップＳ１１で、ユーザーが再生時間をすぐ選択した場合、又は上記所定時間のｔｍｏｄ内で再生時間を選択しない場合はステップＳ１３で要約再生動作処理に移行する。 If it is determined in step S11 that the user has immediately selected a playback time, or if no playback time is selected within the predetermined time tmod, the process proceeds to a summary playback operation process in step S13.

ここで、ユーザーが再生時間を選択した場合はその要約再生時間で、再生時間を選択しないで上記所定時間ｔｍｏｄ経過した場合は、所定のデフォルト設定再生時間（所期設定再生時間）ｔｐｂ０が設定される。 Here, if the user selected a playback time, that summary playback time is set; if the predetermined time tmod elapsed without a selection, a predetermined default playback time (initial setting playback time) tpb0 is set.

ここで、ユーザーによりダイジェスト再生時間を任意に選択できるようにしても良いし、記録したプログラム記録時間とプレイリストデータに基づいた、あらかじめ設定された再生時間から選択処理できるようにしても良い。 Here, the digest playback time may be arbitrarily selected by the user, or may be selected from a preset playback time based on the recorded program recording time and playlist data.

この場合、例えば、５分、１０分、１５分、２０分、３０分など時間が考えられ、デフォルトの要約再生時間は、記録時間に応じて、例えば、図４６のように設定することも考えられる。 In this case, for example, a time of 5 minutes, 10 minutes, 15 minutes, 20 minutes, 30 minutes, etc. can be considered, and the default summary playback time may be set as shown in FIG. 46 according to the recording time, for example. It is done.

図４６の例では、所定記録時間以上（Ｔｒｅｃｍｉｎ）の場合にのみ要約再生モードが設定できるようにして、この所定記録時間Ｔｒｅｃｍｉｎとして、記録時間Ｔｒｅｃが１０分未満の場合は、時間が短いので、要約再生は設定されず通常再生のみとしている。 In the example of FIG. 46, the summary playback mode can be set only when the recording time is at or above a predetermined recording time (Trecmin). With Trecmin set to 10 minutes, a recording time Trec of less than 10 minutes is considered too short, so summary playback is not offered and only normal playback is available.

一例として例えば、図４６から記録時間Ｔｒｅｃが６０分の場合は、ユーザーによる選択可能な要約再生時間は、１０分、１５分、３０分、４０分となり、デフォルトの設定時間は、３０分となる。 As an example, for example, when the recording time Trec is 60 minutes from FIG. 46, the summary playback time that can be selected by the user is 10, 15, 30, 40 minutes, and the default setting time is 30 minutes. .
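
The selection rules stated so far can be sketched in Python as follows. Only the 10-minute lower limit and the 60-minute row (choices 10, 15, 30, 40 minutes, default 30 minutes) are taken from the text; the table structure, the function names, and the use of `None` to mean "tmod elapsed without a choice" are assumptions, and the other rows of FIG. 46 are deliberately not reproduced.

```python
TREC_MIN = 10  # minutes; below this, only normal playback is offered

# recording time (minutes) -> (selectable digest times, default time tpb0)
# Only the 60-minute row is quoted in the text; other rows are omitted.
OPTIONS = {60: ([10, 15, 30, 40], 30)}


def digest_options(trec_minutes, table=OPTIONS):
    """Return (choices, default) for a recording time, or None if digest is unavailable."""
    if trec_minutes < TREC_MIN:
        return None
    return table[trec_minutes]  # raises KeyError for rows this sketch does not know


def resolve_playback_time(user_choice, options):
    """Pick the digest time: the user's choice, or the default when tmod elapsed."""
    choices, default = options
    if user_choice is None:      # no selection within tmod
        return default
    if user_choice not in choices:
        raise ValueError("not one of the preset digest times")
    return user_choice
```

This models the variant in which the user picks from preset times; the text also allows an arbitrary user-selected time, which would simply bypass the membership check.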

図４６に示す例では、記録時間Ｔｒｅｃが長くなるほど、ユーザーによる選択可能な要約再生時間の選択数が多くなっているが、上記したように、記録時間が短い場合は、スキップ再生処理による要約再生でスキップ処理される総区間が多くなると、それだけ情報が欠落することになり、再生内容が把握できなくなることが考えられるので選択数を少なくし、適切な要約時間の選択が行えるようにし、それに比較して記録時間が長い場合は、情報量が多いので選択数を多くしてユーザーによる効果的、有効な動作が行えるようにしている。 In the example shown in FIG. 46, as the recording time Trec increases, the number of summary playback times that can be selected by the user increases. As described above, when the recording time is short, summary playback by skip playback processing is performed. If the total number of skipped sections increases, information will be lost, and it may be impossible to grasp the playback content, so the number of selections can be reduced, and an appropriate summary time can be selected and compared. When the recording time is long, the amount of information is large, so the number of selections is increased so that the user can perform an effective and effective operation.

このようなユーザーによる選択可能な要約再生時間の一覧、デフォルトの再生時間などの情報は、本発明を適用した記録再生装置における所定表示手段又は、その装置に接続された所定の表示手段、又は装置のリモコン上における液晶などの所定表示画面などに表示することが考えられる。 Information such as the list of summary playback times selectable by the user and the default playback time may be displayed on predetermined display means of the recording/playback apparatus to which the present invention is applied, on predetermined display means connected to that apparatus, or on a predetermined display screen such as a liquid crystal display on the apparatus's remote control.

ここで、プレイリスト生成処理と同時に、チャプター設定処理を行うこともでき、記録時間に応じて図４４に示すように、設定可能なチャプター数に応じて自動的に所定のチャプター設定処理が行われる。 Here, the chapter setting process can also be performed simultaneously with the playlist generation process. As shown in FIG. 44 according to the recording time, a predetermined chapter setting process is automatically performed according to the number of settable chapters. .

例えば、図４４から記録時間が１時間の場合は、５〜４０個のチャプターが設定されるように所定の信号処理が行われる。 For example, from FIG. 44, when the recording time is 1 hour, predetermined signal processing is performed so that 5 to 40 chapters are set.

ステップＳ１３では要約再生動作が行われるが、上記したように、所定ＰＵ区間又は複数のＰＵ区間の接合区間毎に所定評価値が設定されているので、設定時間と評価値に応じてスキップ再生処理が行われ、それにより要約再生が行われる。 In step S13, the summary playback operation is performed. As described above, since the predetermined evaluation value is set for each of the predetermined PU sections or the joint sections of the plurality of PU sections, the skip playback processing is performed according to the set time and the evaluation value. Thus, summary reproduction is performed.

すなわち、評価値が高いＰＵ区間から最優先して順次選択され、選択した要約再生時間にできるだけ近くなるように、順次、上記最優先評価値に比較して評価値の小さい区間を選択処理していく。 That is, PU sections are selected in order starting from the one with the highest evaluation value, and sections with progressively smaller evaluation values are then selected so that the total comes as close as possible to the selected summary playback time.
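
One simple way to realize this selection is a greedy pass over the sections in descending order of evaluation value, sketched below in Python. The rule used here (add a section only if it brings the running total closer to the target) is an assumption of this sketch; the text states the goal but not the exact stopping rule, and the name `select_sections` is hypothetical.

```python
def select_sections(sections, target_length):
    """Greedy digest selection.

    sections: list of (start, end, evaluation) with inclusive frame positions.
    target_length: desired total digest length, in the same unit (frames).
    Returns (chosen sections in playback order, total selected length).
    """
    chosen = []
    total = 0
    # Highest evaluation value first.
    for start, end, _score in sorted(sections, key=lambda s: s[2], reverse=True):
        length = end - start + 1
        # Add the section only if the total moves closer to the target.
        if abs(total + length - target_length) < abs(total - target_length):
            chosen.append((start, end))
            total += length
    return sorted(chosen), total
```

The chosen sections are returned sorted by start position so that skip playback can run through them in program order.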

ステップＳ１４では再生動作を終了するか判定され、終了の場合は処理を終了し、終了しない場合はステップＳ１５で再生している所定プログラム（番組）が終了したか判定され、終了の場合は処理を終了し終了しない場合は、ステップＳ１６に移行し再生時間を変更するか判定する。 In step S14, it is determined whether or not the reproduction operation is to be terminated. If it is terminated, the process is terminated. If not, it is determined whether or not the predetermined program (program) being reproduced is terminated in step S15. If it is finished but not finished, the process proceeds to step S16 to determine whether to change the playback time.

ステップＳ１６で再生時間を変更する場合はステップＳ１０に戻り上記処理を繰り返し、変更しない場合はステップＳ１３に戻り、要約再生動作を繰り返す。 If the playback time is changed in step S16, the process returns to step S10 and the above processing is repeated. If not changed, the process returns to step S13 and the summary playback operation is repeated.

７．２記録処理関係動作フローチャート 7.2 Recording Process Related Operation Flowchart
（記録処理動作フローチャートの一例） (Example of Recording Process Operation Flowchart)
記録モードの場合における動作フローチャートの例を図４８に示す。 FIG. 48 shows an example of an operation flowchart in the case of the recording mode.

図４５に示したフローチャートのステップＳ１で記録モードが選択された場合は、図４８に示すフローチャートのステップＲ１でタイマー記録モードか通常記録モードかが判定され、通常記録モードの場合は、ステップＲ９に移行し通常記録動作を行う。 When the recording mode is selected in step S1 of the flowchart shown in FIG. 45, it is determined in step R1 of the flowchart shown in FIG. 48 whether the timer recording mode or the normal recording mode is selected. Transition to normal recording operation.

ステップＲ９の通常記録動作で所定の記録信号処理に移行して、ステップＲ１０においてＭＰＥＧなどの所定エンコード処理される画像音声データ、又はエンコード処理された画像音声データから所定の特徴抽出処理が行われる。 The routine proceeds to predetermined recording signal processing in the normal recording operation in step R9, and predetermined feature extraction processing is performed from the image / audio data subjected to predetermined encoding processing such as MPEG or the encoded audio / video data in step R10.

ここで、記録信号処理と特徴抽出信号処理は、同時に行うことができる。 Here, the recording signal processing and the feature extraction signal processing can be performed simultaneously.

所定エンコード処理される画像音声データについては、所定エンコード処理される途中の画像音声データを用いて所定の特徴抽出処理を行うもので、例えば、画像のＤＣＴ処理系からＤＣＴ信号処理のＤＣ係数データ、ＡＣ係数データなどを取り出すことができ、それら所定のデータを用いて所定信号処理を行うことでシーンチェンジ（特徴）検出（カット点（特徴）検出）、テロップ（特徴）検出など上記した各所定の特徴抽出信号処理を行う。 The image / audio data to be subjected to the predetermined encoding process is to perform a predetermined feature extraction process using the image / audio data in the middle of the predetermined encoding process. For example, from the DCT processing system of the image, the DC coefficient data of the DCT signal processing, AC coefficient data and the like can be taken out, and predetermined signal processing is performed using the predetermined data to detect each scene change (feature) detection (cut point (feature) detection), telop (feature) detection, etc. Performs feature extraction signal processing.

音声データは、所定の帯域圧縮信号処理における所定サブバンド信号処理において、所定サブバンド帯域におけるデータを用いることで、話者音声、音楽（楽音）判定検出などの信号処理を行うことが考えられる。 The audio data may be subjected to signal processing such as speaker voice and music (musical tone) determination detection by using data in a predetermined subband band in predetermined subband signal processing in predetermined band compression signal processing.

楽音判定信号処理については、例えば、所定サブバンド帯域におけるデータの継続性を判定することで判定処理を行うことが考えられる。 As for the musical tone determination signal processing, for example, it is conceivable to perform the determination processing by determining continuity of data in a predetermined subband band.

また、ベースバンド帯域の画像音声データを用いることも考えられ、例えば、画像のベースバンド信号を用いて、フレーム（又はフィールド）間差分信号処理によりシーンチェンジ検出処理や、その差分信号によるエッジ検出によりテロップ特徴信号処理など、その他所定の特徴抽出信号処理を行うことが考えられる。 It is also conceivable to use video / audio data in the baseband band. For example, by using the baseband signal of the image, scene change detection processing by frame (or field) differential signal processing, or edge detection by the differential signal. It is conceivable to perform other predetermined feature extraction signal processing such as telop feature signal processing.
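
As an illustration of the baseband variant, the Python sketch below flags a scene change wherever the mean absolute difference between consecutive frames reaches a threshold. The difference measure, the threshold value, and the representation of a frame as a flat list of luminance samples are assumptions of this sketch, not details given in the text.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(frame_a, frame_b)) / len(frame_a)


def detect_scene_changes(frames, threshold):
    """Return indices i where frame i differs strongly from frame i - 1."""
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) >= threshold
    ]


# Tiny hypothetical frames of four luminance samples each.
frames = [
    [10, 10, 10, 10],
    [12, 9, 10, 11],
    [200, 190, 210, 205],
    [198, 192, 209, 204],
]
```

Here `detect_scene_changes(frames, 50)` reports a change only at index 2, where the picture content jumps.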

ここで、各画像、音声特徴抽出信号処理された特徴データ（特徴抽出データ）は、画像音声データが記録される同じ所定記録媒体、又は所定のバッファメモリーなどの所定データ記憶手段（データ記録手段）に記録する。 Here, each image and the feature data (feature extraction data) subjected to the audio feature extraction signal processing are the same predetermined recording medium on which the image audio data is recorded, or predetermined data storage means (data recording means) such as a predetermined buffer memory. To record.

ステップＲ１１で通常記録モード終了か判定され、終了ではない場合はステップＲ９に戻り、上記動作を繰り返し、終了の場合は、ステップＲ１２に移行しプレイリストデータ生成処理（又はチャプターデータ生成処理）に移行する。 In step R11, it is determined whether or not the normal recording mode is ended. If not, the process returns to step R9, and the above operation is repeated. If it is ended, the process proceeds to step R12 and shifts to playlist data generation processing (or chapter data generation processing). To do.

ステップＲ１でタイマー記録モードの場合は、ステップＲ２で記録開始、記録終了時刻設定を行い、ステップＲ３で所定の動作時刻か判定され、所定時刻ではない場合は、ステップＲ７で動作待機し、ステップＲ８でユーザーによりタイマー動作解除の割り込み処理が行われたか判定され、タイマー動作を継続する場合は、ステップＲ３に戻り上記動作を繰り返す。 If the timer recording mode is set at step R1, the recording start time and the recording end time are set at step R2, and it is determined at step R3 whether the predetermined operation time is reached. If it is not the predetermined time, the operation waits at step R7, and step R8 In step S3, it is determined whether the timer operation cancellation interrupt process has been performed by the user. If the timer operation is to be continued, the process returns to step R3 to repeat the above operation.

ステップＲ８でタイマー動作が解除された場合は、図４５のステップＳ１に戻り、最初の動作モード選択処理を行う。 If the timer operation is canceled in step R8, the process returns to step S1 in FIG. 45 and the first operation mode selection process is performed.

ステップＲ３で所定の記録動作時刻になったと判定されたら、記録動作を開始し、上記したステップＲ９〜ステップＲ１１と同様の動作をステップＲ４〜ステップＲ６で行う。 If it is determined in step R3 that the predetermined recording operation time has come, the recording operation is started, and the same operations as in steps R9 to R11 described above are performed in steps R4 to R6.

特徴データは、上記したように、各画像、音声特徴抽出信号処理された特徴データ（特徴抽出データ）は画像音声データが記録される同じ所定記録媒体、又は所定のバッファメモリーなどの所定データ記憶手段（データ記録手段）に記録する。ステップＲ６で記録終了時刻と判定された場合は、ステップＲ１２に移行してプレイリストデータ生成処理又はチャプターデータ生成処理を行う。 As described above, the feature data (feature extraction data) obtained by each image and audio feature extraction signal processing is recorded on the same predetermined recording medium on which the video/audio data is recorded, or in predetermined data storage means (data recording means) such as a predetermined buffer memory. If it is determined in step R6 that the recording end time has been reached, the process proceeds to step R12 to perform playlist data generation processing or chapter data generation processing.

ステップＲ１２では、上記各種の所定特徴抽出処理された特徴データ（特徴抽出処理された所定特徴データを所定の加工処理、所定の信号処理を施したデータ、それらデータを用いて所定判定処理を行ったデータなども含む）を所定記録媒体から読出し処理を行い、所定のプレイリストデータ（ファイル）生成処理、チャプターデータ生成処理を行う。 In step R12, the feature data obtained by the various predetermined feature extraction processes (including data obtained by applying predetermined processing or predetermined signal processing to the extracted feature data, and data obtained by performing predetermined determination processing using those data) is read from the predetermined recording medium, and predetermined playlist data (file) generation processing and chapter data generation processing are performed.

生成されたプレイリストデータ、チャプターデータは、所定記録媒体に記録され、ステップＲ１３で生成処理が終了したか判定処理され、終了しない場合は、ステップＲ１２に戻り上記処理動作を繰り返し、ステップＲ１３で終了したと判定された場合は動作を終了する。 The generated playlist data and chapter data are recorded on a predetermined recording medium, and it is determined whether or not the generation process is completed in step R13. If not, the process returns to step R12 and repeats the above processing operations, and the process ends in step R13. If it is determined that the operation has been performed, the operation is terminated.

ここで、上記プレイリストデータ、チャプターデータは、逐次、データの生成処理と同時に所定記録媒体に記録する場合の他に、上記処理対象にしている所定の放送番組、プログラム、又は所定記録区間に対する所定プレイリストデータ、チャプターデータの全ての生成処理が終了した後に、まとめて所定記録媒体に記録するようにしても良い。 Here, in addition to being recorded on the predetermined recording medium sequentially as the data is generated, the playlist data and chapter data may be recorded on the predetermined recording medium all together after all generation processing of the predetermined playlist data and chapter data for the predetermined broadcast program, program, or predetermined recording section being processed has finished.

（特徴抽出処理と平行してプレイリストデータ（チャプター）処理を行う場合） (When performing playlist data (chapter) processing in parallel with feature extraction processing)

ここで、上記説明では、所定の放送番組、プログラムなど画像音声情報データの記録処理と同時に所定の特徴抽出処理を行い、特徴抽出処理した各種の特徴データ（特徴抽出データ、又は特徴抽出データを用いて所定の加工、所定の信号処理を施した信号を含む）を所定の記録媒体に記録して、上記所定の放送番組、プログラムが終了した後、記録した特徴データを読み出して、プレイリストデータ（ファイル）、チャプターデータなどを生成処理する場合を述べたが、上記特徴抽出処理と同時に、又は特徴抽出処理と平行してプレイリストデータ（ファイル）、チャプターデータ生成処理を行うことも考えられる。 Here, in the above description, predetermined feature extraction processing is performed simultaneously with recording processing of image / audio information data such as predetermined broadcast programs and programs, and various feature data (feature extraction data or feature extraction data) subjected to feature extraction processing are used. (Including a signal subjected to predetermined processing and predetermined signal processing) on a predetermined recording medium, and after the predetermined broadcast program or program ends, the recorded feature data is read out and playlist data ( Although the case of generating a file), chapter data, etc. has been described, it is also conceivable to perform playlist data (file), chapter data generation processing simultaneously with the feature extraction processing or in parallel with the feature extraction processing.

７．３再生ユニット処理関係動作フローチャート 7.3 Playback Unit Processing Related Operation Flowchart
（ＰＵ処理で所定データ区間毎に音声セグメント処理とシーンチェンジ処理を行う場合の動作フローチャート） (Operation flowchart when performing audio segment processing and scene change processing for each predetermined data section in PU processing)
上記したＰＵ信号処理の場合で、音声セグメント検出点とシーンチェンジ検出点から所定信号処理を行う動作フローチャートの一例を図４９に示す。 FIG. 49 shows an example of an operation flowchart for performing predetermined signal processing from the audio segment detection points and the scene change detection points in the case of the PU signal processing described above.

処理を開始するとステップＰ１で画像音声情報データが記録されている所定記録媒体から音声データ、及び後で説明するシーンチェンジ検出処理のために画像データの所定サンプルデータ数を読出し処理して、ステップＰ２で読み出したデータをメモリーなど所定の記録手段であるデータバッファに記憶処理（書き込み処理、記録処理）を行っていく。 When the processing starts, in step P1 a predetermined number of samples of audio data, and of image data for the scene change detection processing described later, are read from the predetermined recording medium on which the video/audio information data is recorded, and in step P2 the read data is stored (written, recorded) in a data buffer, which is predetermined recording means such as a memory.

ステップＰ３で所定サンプル数のデータがバッファに記録されたと判定された場合はステップＰ４に移行し、まだ所定サンプルデータが記録されないと判定された場合はステップＰ２に戻り動作を繰り返す。 If it is determined in step P3 that data of a predetermined number of samples has been recorded in the buffer, the process proceeds to step P4. If it is determined that predetermined sample data has not yet been recorded, the process returns to step P2 and the operation is repeated.

ここで、ステップＰ２〜ステップＰ７ではＰＵ処理のために、所定、音声信号の有音、無音判定処理を考えるので、ステップＰ２の所定サンプルデータ数としては、大よそ０．１秒くらい〜１秒くらいの所定区間の間に相当するデータ数のバッファ処理を行う。 Here, since steps P2 to P7 consider sound/silence determination of the audio signal for PU processing, the predetermined number of samples buffered in step P2 corresponds to a predetermined interval of roughly 0.1 second to 1 second.

例えば、サンプリング周波数４８ＫＨｚの場合は、１秒間で４８０００サンプルデータなので、０．１秒の場合は４８００サンプルのデータをバッファに記録する。 For example, when the sampling frequency is 48 KHz, the data is 48000 samples per second, and when 0.1 seconds, the data of 4800 samples is recorded in the buffer.

ステップＰ４でバッファから音声データを読出し処理し、ステップＰ５で、上記で述べたような所定区間の音声レベルの演算処理を行い、ステップＰ６で所定レベルと比較処理を行い、所定レベル以上か所定レベル以下かの判定処理を行って、無音検出（無音判定）処理が行われる。 In step P4 the audio data is read from the buffer; in step P5 the audio level of the predetermined section is calculated as described above; in step P6 it is compared with a predetermined level to determine whether it is at or above, or below, that level, thereby performing silence detection (silence determination).
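
Steps P2 to P6 can be sketched in Python as below: the buffer length follows from the sampling frequency (48000 Hz × 0.1 s = 4800 samples), and each buffered block is classified as silent when its level falls below a threshold. Using the mean absolute amplitude as the "audio level" and treating strictly-below-threshold as silent are assumptions of this sketch; the function names and the threshold value are hypothetical.

```python
def samples_per_block(sample_rate_hz, seconds):
    """Number of samples covering the given interval."""
    return round(sample_rate_hz * seconds)


def block_level(samples):
    """Audio level of one block, here the mean absolute amplitude (an assumption)."""
    return sum(abs(s) for s in samples) / len(samples)


def silence_flags(samples, block_size, threshold):
    """Return one flag per full block: True = silent, False = sound present."""
    flags = []
    for i in range(0, len(samples) - block_size + 1, block_size):
        flags.append(block_level(samples[i:i + block_size]) < threshold)
    return flags


n = samples_per_block(48000, 0.1)  # 4800 samples for 0.1 s at 48 kHz
# Hypothetical signal: silence, a loud block, then near-silence.
signal = [0] * n + [1000, -1000] * (n // 2) + [3] * n
```

For this signal, `silence_flags(signal, n, 100)` marks the first and third blocks as silent and the second as sound, which is the per-section information the later segment processing consumes.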

ステップＰ６でその区間が無音区間と判定された場合は、ステップＰ７でその情報を所定メモリー（バッファ）に記憶（記録）し、無音でなく有音と判定された場合はステップＰ８に移行し、ステップＰ１で読み込んだバッファのデータの音声バッファ処理が終了したか判定処理され、終了しない場合はステップＰ２に戻り上記の処理を繰り返し、終了した場合はステップＰ９に移行する。 If it is determined in step P6 that the section is a silent section, the information is stored (recorded) in a predetermined memory (buffer) in step P7. If it is determined that the section is not silent but there is a voice, the process proceeds to step P8. It is determined whether or not the audio buffer processing of the buffer data read in step P1 has been completed. If not, the processing returns to step P2, and the above processing is repeated. If the processing has ended, the processing moves to step P9.

In step P9, the audio segment information data processed in step P8 is read, and in step P10 the segment processing described above is performed for the short silent sections, voiced sections, long silent sections, and voiced sections.
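One way to picture the segment processing of step P10 is to group consecutive per-section silence flags into runs and label each run by its duration. This is only a sketch: the label names and the `long_silence_seconds` cutoff are hypothetical, and the actual classification rule is the one given earlier in the specification.

```python
from itertools import groupby


def label_segments(silent_flags, section_seconds, long_silence_seconds):
    """Group per-section silence flags into labelled (label, start, count) runs."""
    segments = []
    start = 0
    for silent, run in groupby(silent_flags):
        count = len(list(run))
        if not silent:
            label = "voiced"
        elif count * section_seconds >= long_silence_seconds:
            label = "long_silence"
        else:
            label = "short_silence"
        segments.append((label, start, count))
        start += count
    return segments


flags = [True, False, False, True, True, True, False]
segments = label_segments(flags, section_seconds=0.5, long_silence_seconds=1.5)
```

Here `start` and `count` are expressed in analysis sections, so multiplying by the section length converts them to time positions.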

In step P11, DCT-processed data of the image data, for a predetermined number of data samples, is recorded in a predetermined buffer memory (predetermined data recording means). Step P12 determines whether the predetermined amount of data has been recorded. If not, the process returns to step P11 and repeats the writing to the buffer memory system; if step P12 determines that the writing of the predetermined amount of data has finished, the process proceeds to step P13.

In step P13, the predetermined DCT data recorded (written) in the predetermined buffer memory system is read out. In step P14, predetermined signal processing such as inter-frame difference calculation is performed, and the predetermined scene change detection processing is carried out.

Step P15 determines whether a predetermined scene change has occurred. If so, step P16 stores (writes) the position information data of the point at which the scene change occurred in predetermined memory means (data recording means, data buffer means, or the like), and the process proceeds to step P17. If step P15 determines that there is no scene change, the process proceeds directly to step P17.
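Steps P13 to P16 amount to comparing consecutive frames and recording the positions where the difference is large. The sketch below assumes each frame is represented by a short list of DCT coefficients and that a scene change is declared when the summed absolute inter-frame difference exceeds a threshold; the representation, the difference measure, and the threshold are assumptions for illustration.

```python
def detect_scene_changes(frames, threshold):
    """Return frame indices whose difference from the previous frame exceeds threshold."""
    positions = []
    for index in range(1, len(frames)):
        previous, current = frames[index - 1], frames[index]
        # Inter-frame difference: sum of absolute coefficient differences.
        difference = sum(abs(a - b) for a, b in zip(previous, current))
        if difference > threshold:
            positions.append(index)  # corresponds to storing the position in step P16
    return positions


frames = [[10, 10], [11, 10], [90, 80], [91, 80]]
changes = detect_scene_changes(frames, threshold=50)
```

In this sample the only large jump is between the second and third frames, so a single position is recorded.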

Step P17 determines whether the scene change detection processing has finished for the predetermined amount of data in the predetermined data buffer. If not, the process returns to step P11 and repeats the above signal processing; if so, the process proceeds to step P18.

In step P18, the scene change position information recorded (stored) in the predetermined buffer memory means is read. In step P19, the scene change detection sections are corrected, for example by joining a section that is too short, such as one shorter than a predetermined section length, to the preceding or following section.
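The correction in step P19 can be sketched as a pass over the stored scene change positions that drops any boundary closing a section shorter than a minimum length, which joins that short section to the one that follows it. The function name, the `min_length` parameter, and the choice to merge forward rather than backward are assumptions; the text allows joining to either neighbour.

```python
def merge_short_sections(change_positions, min_length):
    """Drop scene change boundaries that would close a section shorter than min_length."""
    kept = []
    last = 0  # start of the section currently being built
    for position in sorted(change_positions):
        if position - last >= min_length:
            kept.append(position)
            last = position
        # Otherwise the boundary is discarded and the short section is
        # absorbed into the section that follows it.
    return kept


corrected = merge_short_sections([100, 105, 300], min_length=30)
```

In the sample, the 5-unit section between positions 100 and 105 is too short, so the boundary at 105 is removed and the section runs from 100 to 300.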

In step P20, the audio segment position information data and scene change position information data generated for the predetermined section are read. In step P21, predetermined PU information data, such as the position information and section information of each predetermined PU, is generated from predetermined information data such as the audio segment positions, audio segment section lengths, scene change positions, and scene change section lengths.
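As a rough illustration of step P21, PU position and section information can be derived by combining the two sets of boundaries. The sketch below simply cuts the timeline at every audio segment boundary and every scene change position; this union rule is a simplifying assumption and not the PU rule of the specification, and the function name and dictionary keys are hypothetical.

```python
def build_pu_info(audio_boundaries, scene_boundaries, total_length):
    """Cut the range [0, total_length) at every boundary and list each PU's start and length."""
    cuts = sorted(set(audio_boundaries) | set(scene_boundaries))
    cuts = [c for c in cuts if 0 < c < total_length]
    edges = [0] + cuts + [total_length]
    return [{"start": a, "length": b - a} for a, b in zip(edges, edges[1:])]


pu_list = build_pu_info([40, 100], [100, 250], total_length=300)
```

A boundary that appears in both inputs, such as position 100 here, produces a single cut, so audio and scene boundaries that coincide do not create zero-length units.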

In step P22, based on the PU information processed in step P21, the feature data corresponding to each PU section (or feature extraction data, or a signal obtained by applying predetermined signal processing to the feature extraction data) is written to a predetermined recording medium or a predetermined data buffer.

As described above, this data may be recorded (stored, written) in a predetermined recording area on the same recording medium that holds the video/audio information data of the predetermined section of the broadcast program or other program being processed, or it may be recorded on a separate predetermined recording medium.

Step P23 determines whether the series of signal processing for the predetermined amount of data, namely the audio segment processing, scene change processing, and PU processing, has finished. If so, the processing ends; if not, the process returns to step P1 and repeats the above processing.

(Operation flowchart when scene change processing is performed after all audio segment processing has been performed in PU processing)
In the description above, the audio data segment processing and then the image scene change detection processing are performed sequentially for each predetermined section of the video/audio data of the recorded broadcast program or other program under consideration. Instead of processing section by section in this way, it is also possible to finish the audio segment processing for all predetermined sections of the program being processed, then perform all of the scene change detection processing, and perform the predetermined PU processing only after all of the scene change detection processing has finished.

FIG. 50 shows another example of an operation flowchart for the PU signal processing described above, in which the predetermined signal processing is performed from the audio segment detection points and the scene change detection points.

When the processing starts, the first step T1 performs the predetermined audio segment processing described for steps P1 to P9 in the flowchart of FIG. 49.

Here, the audio data is processed by sequentially reading a predetermined number of data samples into a predetermined buffer memory.

In step T2, the segment position information data produced by the audio segment processing is recorded in predetermined memory means (data storage means, data recording means). Step T3 determines whether the predetermined segment processing has finished for all of the audio data in the predetermined section of the broadcast program or other program being processed. If not, the process returns to step T1 and repeats the above processing; if so, the process proceeds to step T4.

In step T4, the predetermined scene change processing described for steps P11 to P18 in the flowchart of FIG. 49 is performed. Here, the DCT data of the image is processed by sequentially reading a predetermined number of data samples into a predetermined buffer memory.

In step T5, the scene change position information data produced by the predetermined scene change processing is recorded in predetermined memory means (data storage means, data recording means). Step T6 determines whether the predetermined scene change processing has finished for the DCT data of all of the images in the predetermined section of the broadcast program or other program being processed. If not, the process returns to step T4 and repeats the above processing; if so, the process proceeds to step T7.

In step T7, the predetermined audio segment position information data and the predetermined scene change position information data are read from the predetermined memory means, and in step T8 the predetermined PU processing is performed. Step T9 determines whether the predetermined PU processing has finished over all of the predetermined sections of the broadcast program or other program being processed. If so, the processing ends; if not, the process returns to step T7 and repeats the above operation.
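The ordering of FIG. 50 can be summarized as three passes that run one after another: all audio segment processing, then all scene change processing, then PU processing over the stored results. The sketch below shows only that control flow; the three callables are placeholders supplied by the caller, and the loop of steps T7 to T9 is collapsed into a single call for brevity.

```python
def run_batch_pu_processing(audio_chunks, video_chunks,
                            segment_audio, detect_changes, make_units):
    """Run audio segmentation, scene change detection, and PU generation in three passes."""
    audio_positions = []
    for chunk in audio_chunks:                 # steps T1 to T3
        audio_positions.extend(segment_audio(chunk))

    scene_positions = []
    for chunk in video_chunks:                 # steps T4 to T6
        scene_positions.extend(detect_changes(chunk))

    # Steps T7 to T9: PU processing over everything stored so far.
    return make_units(audio_positions, scene_positions)


result = run_batch_pu_processing(
    audio_chunks=[[1], [2]],
    video_chunks=[[3]],
    segment_audio=lambda chunk: chunk,         # placeholder: pass positions through
    detect_changes=lambda chunk: chunk,        # placeholder: pass positions through
    make_units=lambda audio, scene: (audio, scene),
)
```

Compared with the section-by-section order of FIG. 49, this arrangement needs memory for all segment and scene change positions of the program before any PU is generated.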

Explanatory diagram of the summary playback and chapter processing operations in a recording/reproducing apparatus to which the present invention is applied.
Diagram schematically showing an example of the display produced by the chapter processing.
Diagram schematically showing an example of the processing flow in the recording/reproducing apparatus.
Explanatory diagram of the rule processing in the recording/reproducing apparatus.
Explanatory diagram of an example of the relationship between the semantic assignment processing and the feature data in the recording/reproducing apparatus.
Explanatory diagram of an example of the rule file format in the recording/reproducing apparatus.
Explanatory diagram of an example of the evaluation value calculation method in the recording/reproducing apparatus.
Explanatory diagram of an example of the time correction function in the recording/reproducing apparatus.
Explanatory diagram of an example of the general form of the time correction function in the recording/reproducing apparatus.
Explanatory diagram of an example of the structure of video data in the recording/reproducing apparatus.
Explanatory diagram of an example of the connection relationships between playback units in the recording/reproducing apparatus.
Explanatory diagram of an example of the semantic assignment processing between playback units in the recording/reproducing apparatus.
Explanatory diagram of an example of the rule 2 processing in the recording/reproducing apparatus.
Explanatory diagram of an example of the time correction function in the recording/reproducing apparatus.
Explanatory diagram of an example of the structure of the rule file in the recording/reproducing apparatus.
Explanatory diagram of an example of the processing flow of the present invention in the recording/reproducing apparatus.
Block diagram showing a configuration example of a recording/reproducing apparatus to which the present invention is applied.
Explanatory diagram of an example of the recording states of various predetermined data in the recording/reproducing apparatus.
Diagram showing an example of the display in the recording/reproducing apparatus.
Block diagram showing another configuration example of a recording/reproducing apparatus to which the present invention is applied.
Block diagram showing an example of the configuration of the audio feature extraction processing system in the recording/reproducing apparatus.
Block diagram showing another example of the configuration of the audio feature extraction processing system in the recording/reproducing apparatus.
Block diagram showing an example of the configuration of the video feature extraction processing system in the recording/reproducing apparatus.
Explanatory diagram of the scene change processing in the recording/reproducing apparatus.
Explanatory diagram of an example of the telop and color feature detection areas in the recording/reproducing apparatus.
Explanatory diagram of an example of similar image features in the recording/reproducing apparatus.
Explanatory diagram of an example of the person feature detection areas in the recording/reproducing apparatus.
Explanatory diagram of an example of the person detection processing in the recording/reproducing apparatus.
Explanatory diagram of an example of the person detection (number-of-persons determination) processing in the recording/reproducing apparatus.
Explanatory diagram of an example of the number-of-persons detection processing in the recording/reproducing apparatus.
Explanatory diagram of an example of the number-of-persons detection processing in the recording/reproducing apparatus.
Explanatory diagram of an example of the number-of-persons detection processing in the recording/reproducing apparatus.
Explanatory diagram of an example of the number-of-persons detection processing in the recording/reproducing apparatus.
Explanatory diagram of an example of the playback unit processing in the recording/reproducing apparatus.
Explanatory diagram of an example of the playback unit processing in the recording/reproducing apparatus.
Explanatory diagram of an example of the CM (commercial) detection processing in the recording/reproducing apparatus.
Block diagram showing a configuration example of the playback unit processing system in the recording/reproducing apparatus.
Explanatory diagram of an example of the structure of the feature data file in the recording/reproducing apparatus.
Explanatory diagram of an example of the structure of the feature data file in the recording/reproducing apparatus.
Explanatory diagram of an example of the structure of the feature data file in the recording/reproducing apparatus.
Explanatory diagram of an example of the hierarchical structure of playback unit data in the recording/reproducing apparatus.
Explanatory diagram of an example of the hierarchical structure of playback unit data in the recording/reproducing apparatus.
Explanatory diagram of an example of the structure of playback unit video feature data in the recording/reproducing apparatus.
Explanatory diagram of an example of playlist (summary) data in the recording/reproducing apparatus.
Flowchart showing an operation example of the recording/reproducing apparatus.
Explanatory diagram of an example of the relationship between recording time and selectable summary playback time in the recording/reproducing apparatus.
Explanatory diagram of an example of recording time and the number of automatically set chapters in the recording/reproducing apparatus.
Flowchart showing a recording operation example of the recording/reproducing apparatus.
Flowchart showing a playback operation example of the recording/reproducing apparatus.
Flowchart showing another example of the playback operation of the recording/reproducing apparatus.

Explanation of symbols

1 receiving antenna system; 2 broadcast reception signal processing system; 3 audio A/D conversion processing system; 4 audio encoder processing system; 5 multiplexing processing system; 6 recording processing system; 7 recording medium system; 8 video A/D conversion processing system; 9 video encoder processing system; 10 feature extraction processing system; 11 memory system; 12 playback processing system; 13 playback data separation processing system; 14 audio decode processing system; 15 audio D/A processing system; 16 video decode processing system; 17 video D/A processing system; 18 playback control system; 19 playlist data generation processing system; 20 system controller system; 21 user input I/F system; 22 remote control system; 23 network I/F system; 24 network system; 25 recording medium system; 26 recording medium processing system; 27 display processing system; 30, 30A recording/reproducing apparatus; 100 stream separation system; 101 audio data decode system; 102 level processing system; 103 data count system; 104 data buffer system; 105 audio data integration processing system; 106 determination processing system; 107 threshold setting system; 108 frequency analysis processing system; 109 determination processing system; 110 stream data analysis system; 111 subband analysis processing system; 112 data buffer system; 200 stream data analysis system; 201 DCT coefficient processing system; 202 scene change detection processing system; 203 color feature detection processing system; 204 similar image detection processing system; 205 person detection determination processing system; 206 telop detection determination processing system; 207 flip detection processing system; 208 motion vector processing system; 209 camera feature determination processing system; 210 object detection processing system; 301 section length measurement system; 302 playback unit processing system; 303 playback unit feature data processing system; 900 rule switching processing system; 901 rule switching processing system

Claims

1. An information signal processing method for recording or reproducing a predetermined video/audio information signal by predetermined band compression signal processing using a predetermined recording medium, wherein,
when the video/audio information signal is recorded,
for a video/audio information signal of an analog input system, characteristic data extraction processing for extracting a predetermined characteristic signal in the image or audio signal is performed automatically while the video/audio information signal is being recorded, and
for a video/audio information signal of a digital input system, the information signal processing is performed by selecting, through a predetermined selection operation, whether the characteristic data extraction processing for extracting a predetermined characteristic signal in the image or audio signal is performed automatically after recording of the video/audio information signal is completed, or manually at a desired time after recording of the video/audio information signal is completed.

2. The information signal processing method according to claim 1, wherein
software for executing the information signal processing is captured through a predetermined data input system by a predetermined operation,
the information signal processing is set to an executable state, and
the information signal processing is executed when a predetermined operation mode is set through a predetermined operation system.

3. An information signal processing apparatus comprising:
recording means for recording a predetermined video/audio information signal on a predetermined recording medium by predetermined band compression signal processing;
characteristic data extraction means for performing characteristic data extraction processing that extracts predetermined characteristic data for each predetermined section of the video/audio information signal; and
information processing means which, when the video/audio information signal is recorded by the recording means, causes the characteristic data extraction means to perform the characteristic data extraction processing automatically while the signal is being recorded in the case of a video/audio information signal of an analog input system, and, in the case of a video/audio information signal of a digital input system, selects through predetermined selection operation means whether the characteristic data extraction processing by the characteristic data extraction means is performed automatically after recording of the video/audio information signal is completed, or manually at a desired time after recording of the video/audio information signal is completed.

4. The information signal processing apparatus according to claim 3, comprising:
a data input system for capturing, by a predetermined operation, software for executing predetermined information signal processing; and
signal processing setting means for setting a state in which the predetermined information signal processing can be executed by the software captured through the data input system,
wherein the information processing means causes the characteristic data extraction means to perform the characteristic data extraction processing automatically while the signal is being recorded in the case of a video/audio information signal of the analog input system, and, in the case of a video/audio information signal of the digital input system, selects whether the characteristic data extraction processing by the characteristic data extraction means is performed automatically after recording of the video/audio information signal is completed, or manually at a desired time after recording of the video/audio information signal is completed.

5. A program recording medium on which a control program for recording or reproducing a predetermined video/audio information signal by predetermined band compression signal processing using a predetermined recording medium is recorded so as to be readable and executable by a computer, wherein,
when the video/audio information signal is recorded,
for a video/audio information signal of an analog input system, characteristic data extraction processing for extracting a predetermined characteristic signal in the image or audio signal is performed automatically while the video/audio information signal is being recorded, and
for a video/audio information signal of a digital input system, the control program selects, through a predetermined selection operation, whether the characteristic data extraction processing for extracting a predetermined characteristic signal in the image or audio signal is performed automatically after recording of the video/audio information signal is completed, or manually at a desired time after recording of the video/audio information signal is completed.