JP2008104038A

JP2008104038A - Information processor, information processing method, and program

Info

Publication number: JP2008104038A
Application number: JP2006285829A
Authority: JP
Inventor: Taka Murakoshi (村越 象)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2006-10-20
Filing date: 2006-10-20
Publication date: 2008-05-01

Abstract

PROBLEM TO BE SOLVED: To detect a region that is considered to represent the features of a video well. SOLUTION: The same processing is not applied to every region of every frame of a program. Instead, when a frame is determined to contain an important region, that is, a region considered important for extracting the features of the video, the video in that important region is analyzed intensively and its features are extracted. A region containing the display area of a telop that appears by pushing the subtitles aside was inserted by the program's producer so that viewers are sure to see it. Such a region is therefore likely to represent the features of the scene well, and it is designated as an important region. The present invention is applicable to devices that analyze video. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an information processing apparatus, an information processing method, and a program, and more particularly to an information processing apparatus, an information processing method, and a program capable of detecting a region considered to represent the features of a video well.

Some recording devices, such as hard disk recorders, can analyze the video of a recorded television program to extract features. Based on the extracted features, these devices detect the positions where scenes change and set chapters automatically, or automatically select the scenes to be shown in digest playback.

For example, a position where the video features change significantly is detected as a position where the content changes, and a chapter is set there automatically. Likewise, a section in which telops appear frequently is detected as a highlight section of the program and is automatically selected as a section to play when the user chooses digest playback.

Feature extraction covers not only the program's main video, which shows people and scenery, but also the telops superimposed on it. Besides extracting the presence or absence of a telop as a feature, the telop's content (its characters) is recognized, and the recognized characters are also extracted as features. For example, when the recognized characters include a person's name, that name is managed as the name of the performer appearing in the main video.

Patent Document 1 discloses a technique that determines, based on color information, whether a still image is an artificial image, extracts character information contained in an image determined to be artificial, and determines whether the image is important from, for example, the background color of the image from which the character information was extracted.

Patent Document 1: JP 2004-70427 A

Recent television programs tend to use telops heavily, and some of these telops are not very important from the standpoint of extracting video features. To extract video features efficiently, it is preferable to concentrate the analysis on important telops rather than on unimportant ones.

The present invention has been made in view of these circumstances, and makes it possible to detect a region considered to represent the features of a video well.

An information processing apparatus according to one aspect of the present invention includes: acquisition means for acquiring the position of the subtitle display area contained in each frame; and designation means for designating, in a frame that contains a subtitle display area at a position separated by a predetermined distance from the average position of the subtitle display areas, a predetermined region containing that average position as an important region.

Feature extraction means may further be provided that performs more processing for extracting the video features of the important region designated by the designation means than for extracting the video features of other regions.

The feature extraction means may spend more processing on extracting the video features of the important region designated by the designation means, in terms of time, amount of processing, or number of times.

An information processing method or program according to one aspect of the present invention includes the steps of acquiring the position of the subtitle display area contained in each frame, and designating, in a frame that contains a subtitle display area at a position separated by a predetermined distance from the average position of the subtitle display areas, a predetermined region containing that average position as an important region.

In one aspect of the present invention, the position of the subtitle display area contained in each frame is acquired, and in a frame that contains a subtitle display area at a position separated by a predetermined distance from the average position of the subtitle display areas, a predetermined region containing that average position is designated as an important region.

According to one aspect of the present invention, a region considered to represent the features of a video well can be detected.

Embodiments of the present invention are described below. The correspondence between the constituent features of the present invention and the embodiments described in the specification or drawings is illustrated as follows. This description is intended to confirm that embodiments supporting the present invention are described in the specification or drawings. Therefore, even if an embodiment described in the specification or drawings is not listed here as corresponding to a constituent feature of the present invention, that does not mean the embodiment fails to correspond to that feature. Conversely, even if an embodiment is listed here as corresponding to a constituent feature, that does not mean the embodiment fails to correspond to other constituent features.

An information processing apparatus according to one aspect of the present invention (for example, the information processing apparatus 1 in FIG. 1) includes acquisition means (for example, the subtitle position determination unit 33 in FIG. 5) for acquiring the position of the subtitle display area contained in each frame, and designation means (for example, the region designation unit 34 in FIG. 5) for designating, in a frame that contains a subtitle display area at a position separated by a predetermined distance from the average position of the subtitle display areas, a predetermined region containing that average position as an important region.

This information processing apparatus may further include feature extraction means (for example, the video feature extraction unit 35 in FIG. 5) that performs more processing for extracting the video features of the important region designated by the designation means than for extracting the video features of other regions.

An information processing method or program according to one aspect of the present invention includes a step (for example, step S4 in FIG. 6) of acquiring the position of the subtitle display area contained in each frame and designating, in a frame that contains a subtitle display area at a position separated by a predetermined distance from the average position of the subtitle display areas, a predetermined region containing that average position as an important region.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a diagram showing an information processing apparatus 1 according to an embodiment of the present invention.

As shown in FIG. 1, a television receiver 2 is connected to the information processing apparatus 1 via a cable. The information processing apparatus 1 incorporates a recording medium such as a hard disk and records onto it programs provided by, for example, digital television broadcasting or broadcasting over the Internet. To this end, the information processing apparatus 1 is supplied with signals from an antenna and the like. The information processing apparatus 1 plays back recorded programs in response to user operations performed with a remote controller or the like, and outputs the program's video and audio to the television receiver 2.

The information processing apparatus 1 also analyzes the video of a program being received for recording, or of a recorded program stored on the hard disk, to extract features. Based on the extracted features, it divides the whole program into scenes, each consisting of a section of video, and sets chapters at the scene boundaries. Without setting any chapters themselves, users can play back only their favorite scenes by selecting chapters that the information processing apparatus 1 has set automatically, or perform digest playback of only the scenes the information processing apparatus 1 has selected.

In the information processing apparatus 1, features are not extracted by applying analysis of the same kind and the same depth to the video of every region of every frame of a program. Instead, when a frame contains an important region, that is, a region considered important from the standpoint of extracting video features, the analysis concentrates on the video of that important region and features are extracted from it.

The video of an important region is analyzed more intensively than the video of other regions of the same frame or of other frames. More processing may be spent in terms of time (the analysis takes longer), in terms of amount (the analysis uses more data), or in terms of count (the analysis is repeated more times). Alternatively, more complex analysis may be applied to the video of the important region to extract more detailed features, or the video of the important region may be analyzed first, ahead of other regions.
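The paragraph above lists several ways of spending more effort on an important region. As a minimal sketch, assuming a frame has already been split into rectangular regions and that the actual pixel analysis happens elsewhere, a scheduler could express "more passes", "denser sampling", and "analyze first" as follows. `Rect`, `AnalysisTask`, `plan_analysis`, and the `boost` and `base_step` parameters are hypothetical names chosen for illustration, not part of the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int


@dataclass
class AnalysisTask:
    region: Rect
    passes: int       # how many times the analysis is repeated ("number of times")
    sample_step: int  # pixel stride; a smaller stride uses more data ("amount")
    important: bool


def plan_analysis(regions, important_regions, base_passes=1, boost=3, base_step=4):
    """Hypothetical scheduler: important regions get more passes, denser
    sampling, and are placed at the front of the queue (priority)."""
    tasks = []
    for r in regions:
        imp = r in important_regions
        tasks.append(AnalysisTask(
            region=r,
            passes=base_passes * boost if imp else base_passes,
            sample_step=1 if imp else base_step,
            important=imp,
        ))
    # list.sort is stable, so regions within each group keep their original order
    tasks.sort(key=lambda t: not t.important)
    return tasks
```

Keeping the plan separate from the analysis lets the same extractor be reused, with only the budget changing per region.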

FIG. 2 is a diagram showing examples of important regions.

Videos P₁ to P₇ in FIG. 2 are representative images of scenes of a program. The boxes labeled "subtitle" inside videos P₁ to P₇ indicate that a subtitle is displayed in the area each box encloses. For example, text representing what a performer appearing in the main video says is displayed as a subtitle superimposed on the main video.

Program data delivered by digital television broadcasting includes subtitle data in addition to video and audio data, and the user can turn the subtitle display on or off. Besides the text to be displayed as subtitles (text data), the subtitle data also includes data specifying the display timing and the display position within the frame.
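To make the later steps concrete, the sketch below models one subtitle entry as text plus display timing plus a display rectangle, and derives the centroid of that rectangle. The field names (`start_ms`, `end_ms`, `x`, `y`, `width`, `height`) and the millisecond/pixel units are assumptions for illustration only. Real broadcast caption streams encode timing and position in their own formats, which this sketch does not parse.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SubtitleEntry:
    text: str
    start_ms: int  # display timing (hypothetical representation)
    end_ms: int
    x: int         # display area: top-left corner and size, in frame pixels
    y: int
    width: int
    height: int

    def is_shown_at(self, t_ms: int) -> bool:
        # half-open interval: shown from start_ms up to, not including, end_ms
        return self.start_ms <= t_ms < self.end_ms

    def centroid(self) -> Tuple[float, float]:
        # centre of the display rectangle, used later as the subtitle position
        return (self.x + self.width / 2, self.y + self.height / 2)


def active_subtitles(entries: List[SubtitleEntry], t_ms: int) -> List[SubtitleEntry]:
    """Subtitle entries visible at time t_ms."""
    return [e for e in entries if e.is_shown_at(t_ms)]
```

For a 720×480 frame, an entry at `x=160, y=400` with size 320×60 has its centroid at (320.0, 430.0), near the bottom of the frame where subtitles normally sit.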

In the example of FIG. 2, the subtitle in video P₁ is displayed in the lower part of the frame. Normally, when no telop or the like is displayed, subtitles appear in the lower part of the frame in this way.

In video P₂, the subtitle is displayed near the center of the frame because the telop "Happy Birthday!!", an open caption, occupies the lower part of the frame. When the "Happy Birthday!!" telop starts to appear, the subtitle, which until then was displayed in the lower part of the frame, moves to near the center of the frame, and it stays there as long as the telop is displayed.

Similarly, in the example of FIG. 2, the subtitle in video P₃ is displayed in the lower part of the frame, while the subtitle in video P₄ is displayed near the center of the frame because the open-caption telop "失敗→失敗→禁止" ("failure → failure → prohibited") occupies the lower part of the frame.

The subtitle in video P₅ is displayed in the lower part of the frame, and the subtitle in video P₆ is displayed near the center of the frame because the open-caption telop "さようなら！" ("Goodbye!") occupies the lower part of the frame. The subtitle in video P₇ is displayed in the lower part of the frame.

When subtitles are displayed in this way, the following are designated as important regions: region A₁ of video P₂, around the display area of the "Happy Birthday!!" telop, which appeared by, so to speak, pushing the subtitle aside; region A₂ of video P₄, around the display area of the "failure → failure → prohibited" telop; and region A₃ of video P₆, around the display area of the "Goodbye!" telop. Each of regions A₁ to A₃ is where a telop is displayed, but it is also where the subtitle is normally displayed. In this way, a fixed region containing the centroid of the area where subtitles are normally displayed is designated as an important region.

A telop that is displayed even at the cost of pushing the subtitles aside was inserted by the program's producer so that viewers would be sure to see it, so it is highly likely to represent the features of the scene well. By concentrating the analysis on the area where such a telop is displayed and on the surrounding video, it is therefore possible to extract features that are meaningful as criteria for setting chapters and for choosing scenes for digest playback.

Telops that push the subtitles aside include those containing a person's name, a telephone number, or a place name. By analyzing such telops intensively and recognizing their characters, a person's name and similar information can be extracted as features.

FIG. 3 is a diagram showing the concept of important-region detection.

For example, the centroid of the subtitle area (the area in which a subtitle is displayed) is detected in each frame, and an important region is designated in each frame whose subtitle-area centroid is found at a position deviating from the average position. The important region is designated so that it contains the average position of the subtitle-area centroids.

In the example of FIG. 3, the vertical (y-axis) position of the centroid of the subtitle area in video P₁ is position a. Because many frames likewise display their subtitles in an area whose centroid is at vertical position a, position a is taken as the average position of the subtitle-area centroids.

Important regions A₁ to A₃ are designated in videos P₂, P₄, and P₆, respectively, because the subtitle-area centroid in each of these frames is detected at a position away from position a.
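The concept of FIG. 3 can be sketched as a batch computation over the vertical centroids of all frames. This is a minimal sketch, not the patent's procedure: it swaps in the median as the reference "average" position, on the reasoning in FIG. 3 that position a is typical because most frames share it. The function name and the 60-pixel threshold are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def flag_deviating_frames(centroid_ys: List[float], threshold: float) -> Tuple[float, List[int]]:
    """Return the reference position and the indices of frames whose subtitle
    centroid lies more than `threshold` pixels away from it."""
    # Assumption: the median stands in for the "average" position; unlike the
    # arithmetic mean, it is not pulled toward the displaced subtitle positions.
    ref = median(centroid_ys)
    flagged = [i for i, y in enumerate(centroid_ys) if abs(y - ref) > threshold]
    return ref, flagged
```

With centroids `[430, 240, 430, 240, 430, 240, 430]` standing in for P₁ to P₇ and a threshold of 60, the reference is 430 and frames 1, 3, and 5 (P₂, P₄, P₆) are flagged. The arithmetic mean of the same values is about 348.6, which would put even the normal frames more than 60 pixels away; this is why the sketch uses the median.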

The series of processes by which the information processing apparatus 1 designates an important region is described later with reference to a flowchart.

FIG. 4 is a block diagram showing an example hardware configuration of the information processing apparatus 1.

A CPU (Central Processing Unit) 11 executes various processes according to programs recorded in a ROM (Read Only Memory) 12 or a recording unit 19. A RAM (Random Access Memory) 13 stores, as appropriate, programs executed by the CPU 11 and data. The CPU 11, ROM 12, and RAM 13 are interconnected by a bus 14.

An input/output interface 15 is also connected to the CPU 11 via the bus 14. A receiving unit 16, an input unit 17, an output unit 18, the recording unit 19, a communication unit 20, and a drive 21 are connected to the input/output interface 15.

The receiving unit 16 receives and demodulates broadcast wave signals from an antenna 16A and obtains an MPEG-TS (Moving Picture Experts Group Transport Stream). The receiving unit 16 extracts the data of the program to be recorded (the program's video, audio, and subtitle data) from the MPEG-TS and outputs the extracted data to the recording unit 19 via the input/output interface 15.

The input unit 17 receives signals from the remote controller and outputs information representing the user's operation to the CPU 11 via the input/output interface 15 and the bus 14. The CPU 11 performs various processes, such as playing back recorded programs, in accordance with the information supplied from the input unit 17.

The output unit 18 causes the television receiver 2 to display, for example, the video of a program received by the receiving unit 16 and decoded in software by the CPU 11.

The recording unit 19 consists of, for example, a hard disk and records various data, such as programs executed by the CPU 11 and program data supplied from the receiving unit 16 via the input/output interface 15.

The communication unit 20 communicates with external devices over a network such as the Internet or a local area network. Programs broadcast over the Internet may be received by the communication unit 20.

When a removable medium 22 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory is loaded, the drive 21 drives it and reads the programs and data recorded on it. The programs and data read in this way are transferred to and recorded in the recording unit 19 as necessary.

FIG. 5 is a block diagram showing an example functional configuration of the information processing apparatus 1. At least some of the functional units shown in FIG. 5 are implemented by the CPU 11 in FIG. 4 executing a predetermined program.

As shown in FIG. 5, the information processing apparatus 1 implements a subtitle/video acquisition unit 31, a video decoding unit 32, a subtitle position determination unit 33, a region designation unit 34, and a video feature extraction unit 35.

The subtitle/video acquisition unit 31 acquires a program's video data and subtitle data. It outputs the acquired video data to the video decoding unit 32, and outputs the subtitle position information (the part of the subtitle data that represents the subtitle display position) to the subtitle position determination unit 33. For example, the subtitle/video acquisition unit 31 acquires the video and subtitle data of a program received by the receiving unit 16, or the video and subtitle data of a recorded program stored in the recording unit 19.

The video decoding unit 32 decodes the video of each frame, as shown in FIG. 2, from the data supplied by the subtitle/video acquisition unit 31, and outputs the baseband video information obtained by decoding to the video feature extraction unit 35.

Based on the subtitle position information supplied by the subtitle/video acquisition unit 31, the subtitle position determination unit 33 determines whether the subtitle display position lies within a predetermined range of the normal display position, and outputs the result, as subtitle position determination information, to the region designation unit 34.

For example, the subtitle position determination unit 33 obtains the centroid of the subtitle area contained in the frame of interest from the subtitle position information, and calculates the difference between that position and the average position of the subtitle area. The subtitle position determination unit 33 is given in advance, as the average position of the subtitle area, a moving average of the vertical position of the subtitle-area centroid measured from a predetermined reference position in the frame; this moving average is used to calculate the difference from the centroid of the subtitle area in the frame of interest. The subtitle position determination unit 33 performs this determination on each frame in turn.
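The determination described above can be sketched as a small stateful class that keeps a fixed-length window of recent vertical centroids. The patent only says that a moving average is "given in advance". The window length, the threshold, the seed value, and the rule that displaced frames do not update the average are all assumptions in this sketch; the last one keeps a long-running telop from dragging the average toward the displaced position.

```python
from collections import deque
from typing import Tuple


class SubtitlePositionJudge:
    """Hypothetical sketch of the subtitle position determination unit 33."""

    def __init__(self, window: int, threshold: float, initial_y: float):
        # fixed-length history of vertical centroids; the oldest value drops out
        self._history = deque([initial_y], maxlen=window)
        self.threshold = threshold

    @property
    def average_y(self) -> float:
        return sum(self._history) / len(self._history)

    def judge(self, centroid_y: float) -> Tuple[float, bool]:
        """Return (difference from the moving average, whether it exceeds the threshold)."""
        diff = centroid_y - self.average_y
        deviates = abs(diff) > self.threshold
        if not deviates:
            # Assumption: only frames at the normal position update the average.
            self._history.append(centroid_y)
        return diff, deviates
```

Feeding centroids 430, 432, 240, 428 into a judge seeded at 430 with a threshold of 60 flags only the third frame, and the moving average stays at 430.0.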

The region designation unit 34 designates an important region based on the subtitle position determination information supplied by the subtitle position determination unit 33, and outputs important region information, which indicates the position, extent, and so on of the designated important region, to the video feature extraction unit 35.

For example, when the subtitle position determination unit 33 determines that the difference from the average subtitle-area position exceeds a threshold, meaning that the frame contains a subtitle area at a position separated by a predetermined distance, the region designation unit 34 designates, within that whole frame, a fixed region containing the average subtitle-area position as an important region. It then outputs important region information representing the region's position, extent, and so on to the video feature extraction unit 35.
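The patent says only that the important region is a fixed region containing the average subtitle position. One plausible shape, assumed here for illustration, is a full-width horizontal band of fixed height centered on the average vertical position and clamped so that it stays inside the frame. `Rect`, `designate_important_region`, and `band_h` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int


def designate_important_region(avg_y: float, frame_w: int, frame_h: int, band_h: int) -> Rect:
    """Full-width band of height band_h centred on avg_y, clamped to the frame."""
    band_h = min(band_h, frame_h)            # the band cannot be taller than the frame
    top = round(avg_y - band_h / 2)          # centre the band on the average position
    top = max(0, min(top, frame_h - band_h)) # shift it back inside the frame if needed
    return Rect(0, top, frame_w, band_h)
```

With an average position of 430 in a 720×480 frame and a 120-pixel band, the centered band would run past the bottom edge, so it is shifted up to `Rect(0, 360, 720, 120)`, which still contains y = 430.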

Based on the video information supplied by the video decoding unit 32, the video feature extraction unit 35 analyzes the pixel values of the pixels making up the video, the motion of subjects in the video, and so on, and extracts the results as features. In doing so, the video feature extraction unit 35 analyzes the video of the important region specified by the important region information from the region designation unit 34 more intensively than the video of other regions, and extracts its features.

For example, telop detection may be performed, and the presence or absence of telops, their content, and so on may be extracted as features. Telops are detected by, for example, detecting edges and recognizing characters. When telop detection that also covers other regions has found several telops, either only the telops inside the important region are used for feature extraction, or the telops are used in order, starting with those overlapping the important region. Alternatively, processing may proceed on the assumption that a telop is present in the important region.
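The two selection policies above (only telops inside the important region, or telops in order of their overlap with it) can be sketched with axis-aligned rectangles. This assumes telop detection has already produced bounding boxes; ranking by overlap area is one reasonable interpretation of "in order, starting with those overlapping the important region", not the patent's stated rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int


def overlap_area(a: Rect, b: Rect) -> int:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(0, dx) * max(0, dy)


def select_telops(telops: List[Rect], important: Rect, only_inside: bool = False) -> List[Rect]:
    if only_inside:
        # keep telops lying entirely within the important region
        return [t for t in telops if overlap_area(t, important) == t.w * t.h]
    # otherwise rank all telops by overlap area, largest first; the sort is
    # stable, so ties (including non-overlapping telops) keep detection order
    return sorted(telops, key=lambda t: overlap_area(t, important), reverse=True)
```

For an important band `Rect(0, 360, 720, 120)`, a telop fully inside the band comes first, a telop straddling its upper edge comes second, and a telop at the top of the frame comes last.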

The features extracted by the video feature extraction unit 35 are output as extraction results to a downstream component and, at an appropriate time, are used as criteria for setting chapters and selecting scenes for digest playback.

Next, the process by which the information processing apparatus 1 designates an important region is described with reference to the flowchart in FIG. 6.

This process starts when the subtitle/video acquisition unit 31 has acquired a program's video and subtitle data and has output the video data to the video decoding unit 32 and the subtitle position information to the subtitle position determination unit 33. The video decoding unit 32 decodes the video and outputs the video information to the video feature extraction unit 35.

In step S1, the subtitle position determination unit 33 obtains the centroid of the subtitle area contained in the frame of interest from the subtitle position information supplied by the subtitle/video acquisition unit 31.

In step S2, the subtitle position determination unit 33 calculates the difference between the position obtained in step S1 and the moving-average position of the subtitle area.

In step S3, the subtitle position determination unit 33 determines whether the difference between the centroid of the subtitle area in the frame of interest and the moving-average position of the subtitle area exceeds a threshold. If it determines that the difference does not exceed the threshold, the processing from step S1 onward is repeated. The determination result of the subtitle position determination unit 33 is supplied to the region designation unit 34.

If, on the other hand, it is determined in step S3 that the difference between the centroid position of the caption area in the frame of interest and the moving-average position of the caption area exceeds the threshold, processing moves to step S4. In step S4, the area designation unit 34 designates a fixed area that contains the moving-average position of the caption area as the important area. This area is taken from within the frame that the caption position determination unit 33 has judged to contain a caption area whose difference from the moving-average position exceeds the threshold, that is, a caption area located a predetermined distance away. Important area information indicating the position, extent, and so on of the designated important area is supplied to the video feature extraction unit 35.
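Step S4 only requires "a fixed area containing the moving-average position". One simple reading, sketched below under stated assumptions, is a rectangle of hypothetical fixed size `area_w` by `area_h`, centred on the average and shifted as needed to stay inside the frame. As long as the average itself lies inside the frame, the shifted rectangle still contains it. The function name is illustrative.

```python
from typing import Tuple


def important_area(
    avg: Tuple[float, float],
    frame_w: int,
    frame_h: int,
    area_w: int,
    area_h: int,
) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of a fixed-size area around avg.

    The area is centred on the moving-average caption position and then
    clamped so it lies entirely within the frame.
    """
    area_w = min(area_w, frame_w)  # never larger than the frame
    area_h = min(area_h, frame_h)
    x = round(avg[0] - area_w / 2)
    y = round(avg[1] - area_h / 2)
    # Clamp to the frame; avg stays inside because it lies within the frame.
    x = max(0, min(x, frame_w - area_w))
    y = max(0, min(y, frame_h - area_h))
    return x, y, area_w, area_h
```

For a 640x480 frame with the normal caption position at `(320, 440)` and a 200x100 area, the result is `(220, 380, 200, 100)`. The area is pushed up by 10 pixels so that it does not overhang the bottom edge.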

In step S5, the video feature extraction unit 35 extracts features using the important area designated by the important area information supplied from the area designation unit 34. It either processes only the video of the important area, or processes the important area first and the other areas after it. The feature extraction result of the video feature extraction unit 35 is supplied to the downstream stage, and the process ends.
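The two options in step S5, "important area only" and "important area first", can be illustrated with the sketch below. The patent does not say which video features are extracted. Mean luminance is used purely as a hypothetical stand-in feature, and a frame is assumed to be a list of rows of 8-bit luma values. The names `mean_luma` and `extract_features` are illustrative.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]        # (x, y, width, height)
Frame = Sequence[Sequence[int]]         # rows of 8-bit luma values (assumed)


def mean_luma(frame: Frame, rect: Rect) -> float:
    """Hypothetical stand-in feature: mean luminance inside rect."""
    x, y, w, h = rect
    rows = [row[x:x + w] for row in frame[y:y + h]]
    count = sum(len(r) for r in rows)
    return sum(sum(r) for r in rows) / count if count else 0.0


def extract_features(
    frame: Frame,
    important: Rect,
    others: Sequence[Rect],
    important_only: bool = True,
) -> List[Tuple[Rect, float]]:
    """Analyse only the important area, or the important area first."""
    order = [important] if important_only else [important, *others]
    return [(rect, mean_luma(frame, rect)) for rect in order]
```

Putting the important area first means that a caller working under a time budget can stop early and still have analysed the most informative region.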

Through the above processing, an area considered to represent the features of the video well can be designated as the important area. Features can then be extracted efficiently by concentrating the analysis on the video of the designated important area.

In the above description, when the caption display area moves, the area containing the caption's normal display position is designated as the important area. The same approach can be applied to other on-screen information. For example, a clock display normally shown in a corner of the frame or a logo display may move. In that case, the area containing the normal display position of that information may be designated as the important area.

Also, in the above description, features are extracted by analyzing the video of the important area more intensively. Alternatively, the important area may be analyzed with the same content and to the same extent as the other areas. In that case, the features extracted from the important area are given a higher priority than those extracted from the other areas, so that they are used preferentially for chapter setting and the like.
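The priority-based variant can be sketched as a ranking step. Every area is analysed identically, and only the ordering of the results changes. The `Feature` record, its `score` field (for example, a scene-change strength), and the strict "important area first, then score" ordering are all hypothetical choices. A weighted score would be an equally valid reading of "higher priority".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    frame_index: int
    score: float               # hypothetical, e.g. scene-change strength
    from_important_area: bool


def rank_for_chapters(features: List[Feature], k: int) -> List[int]:
    """Pick k chapter candidates, preferring important-area features.

    Important-area features sort ahead of all others; within each group
    the higher score wins. Returns the chosen frame indices in time order.
    """
    ranked = sorted(features,
                    key=lambda f: (f.from_important_area, f.score),
                    reverse=True)
    return sorted(f.frame_index for f in ranked[:k])
```

With two strong features from ordinary areas and two weaker ones from important areas, asking for two chapter points returns the two important-area frames. This is the preferential use described above.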

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a program recording medium. It may be installed onto a computer built into dedicated hardware. It may also be installed onto a computer, such as a general-purpose personal computer, that can execute various functions once various programs are installed on it.

As shown in FIG. 4, the program recording medium that stores the program to be installed on and executed by the computer may take one of three forms:
- the removable medium 22, a package medium consisting of a magnetic disk (including a flexible disk), an optical disc (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)), a magneto-optical disk, or a semiconductor memory;
- the ROM 12, in which the program is stored temporarily or permanently;
- the hard disk constituting the recording unit 19.

Where necessary, the program is stored on the program recording medium via the communication unit 20, an interface such as a router or modem. The transfer uses a wired or wireless communication medium such as a local area network, the Internet, or digital satellite broadcasting.

In this specification, the steps describing the program include processes performed in time series in the order described. They also include processes that are executed in parallel or individually rather than in time series.

Embodiments of the present invention are not limited to those described above, and various modifications can be made without departing from the gist of the present invention.

FIG. 1 is a diagram showing an information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of an important area.
FIG. 3 is a diagram showing the concept of important area detection.
FIG. 4 is a block diagram showing an example hardware configuration of the information processing apparatus.
FIG. 5 is a block diagram showing an example functional configuration of the information processing apparatus.
FIG. 6 is a flowchart explaining the processing of the information processing apparatus.

Explanation of symbols

1 Information processing apparatus, 31 Caption/video acquisition unit, 32 Video decoding unit, 33 Caption position determination unit, 34 Area designation unit, 35 Video feature extraction unit

Claims

An information processing apparatus comprising:
acquisition means for acquiring the position of the caption display area contained in each frame; and
designation means for designating a predetermined area containing the average position of the caption display area as an important area. The area is designated within a frame that contains the caption display area at a position a predetermined distance away from that average position.

The information processing apparatus according to claim 1, further comprising feature extraction means that performs more processing to extract the video features of the important area designated by the designation means than it performs to extract the video features of other areas.

The information processing apparatus according to claim 2, wherein the feature extraction means spends more time, a larger processing amount, or a greater number of passes on extracting the video features of the important area designated by the designation means.

An information processing method comprising the steps of:
acquiring the position of the caption display area contained in each frame; and
designating a predetermined area containing the average position of the caption display area as an important area. The area is designated within a frame that contains the caption display area at a position a predetermined distance away from that average position.

A program causing a computer to execute processing comprising the steps of:
acquiring the position of the caption display area contained in each frame; and
designating a predetermined area containing the average position of the caption display area as an important area. The area is designated within a frame that contains the caption display area at a position a predetermined distance away from that average position.