JP2005115607A - Video retrieving device - Google Patents

Video retrieving device

Info

Publication number
JP2005115607A
Authority
JP
Japan
Prior art keywords
video
scene
subtitle
analysis
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2003348187A
Other languages
Japanese (ja)
Inventor
Shingo Miyauchi
進吾 宮内
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd filed Critical Matsushita Electric Industrial Co Ltd
Priority to JP2003348187A priority Critical patent/JP2005115607A/en
Publication of JP2005115607A publication Critical patent/JP2005115607A/en
Pending legal-status Critical Current

Abstract

PROBLEM TO BE SOLVED: To provide a means for extracting a desired scene from stored video content, so as to automate video retrieval. SOLUTION: Subtitle (teletext) information multiplexed with the video data is extracted and stored together with the video, then used as a clue for retrieval. First, a string search or the like over the stored subtitle information finds subtitles matching the scene the user requires, and the video corresponding to each subtitle's presentation timing becomes a candidate for the requested scene. Image analysis and audio analysis are then applied to the candidate scenes, and the scenes judged to meet the user's request are extracted. COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an apparatus for retrieving a desired video scene from stored video content.

Conventionally, to retrieve a user-desired scene from stored video content, one method applied some form of image analysis or audio analysis to the video and extracted the scene based on the result. Alternatively, metadata describing the content was created by hand and used to search the video.
JP 2001-69437 A JP 2001-143451 A

The conventional methods using the image analysis described in Patent Document 1 or the audio analysis described in Patent Document 2 impose a heavy processing load, so retrieval takes a long time. Creating content metadata by hand, meanwhile, requires enormous effort.

To solve the above problems, the device of the present invention is characterized in that it extracts the subtitle (teletext) information multiplexed into the video data, stores it together with the video, and uses it as a clue for retrieving video.

First, a string search or the like is performed on the stored subtitle information to find subtitles that appear to match the scene the user requests; the video corresponding to each such subtitle's presentation timing becomes a candidate for the requested scene. Image analysis and audio analysis are then applied to these candidate scenes, and the scenes judged to satisfy the user's request are extracted.

According to the present invention, using subtitle information makes it possible to extract a desired scene faster than conventional video retrieval that relies on image or audio analysis alone. It also eliminates the effort of creating information about the video content by hand.

(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of one embodiment of the video retrieval device of the present invention. An embodiment of the present invention is described below with reference to this figure. Video data in this embodiment is defined as moving images, audio, and the accompanying subtitles (utterances and the like rendered as text), encoded and multiplexed together. A scene to be extracted here means a video segment with a coherent meaning, such as a scoring scene in a sports broadcast, the appearance of a particular person in a drama or variety show, or a particular topic in a news program.

First, when video data is input to the device, the demultiplexing processing unit 100 extracts the subtitle data. The subtitle data is passed to the subtitle data processing unit 110, decoded, and associated with the time information of the input video. This produces subtitle information containing the utterance text and its presentation timing (the time information needed to synchronize with the video). The resulting subtitle information is stored in the video storage unit 200 together with the video.
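The step above — decode captions and attach the video's time information — can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `SubtitleEntry` record and `build_subtitle_track` helper are hypothetical names, and the actual teletext decoding is omitted.

```python
from dataclasses import dataclass

@dataclass
class SubtitleEntry:
    text: str     # decoded caption text (utterance, speaker name, narration, ...)
    start: float  # presentation start time in seconds, relative to the video
    end: float    # presentation end time in seconds

def build_subtitle_track(decoded_captions):
    """Associate each decoded caption string with the video time information
    at which it is presented, producing the subtitle information that is
    stored alongside the video."""
    return [SubtitleEntry(text, start, end)
            for (text, start, end) in decoded_captions]
```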

When the user request accepting unit 400 receives a request from the user to search the stored video for a particular scene, the scene extraction unit 300 begins extracting scenes that match the request. First, when subtitle-based features related to the target scene are set in the scene extraction unit, the subtitle analysis unit 310 detects locations in the subtitle information held in the video storage unit that match those settings. The scene extraction unit then refers to the time information of the detected subtitles and selects the video corresponding to that timing from the video storage unit as candidate scenes.

Because subtitle information contains utterance text, speaker names, scene narration, and the like, the subtitle-based features here are assumed to be, for example, the presence of scene-related keywords or person names, and the subtitle analysis to be string search, natural language processing, and so on. As an example of the processing, illustrated in FIG. 2: to retrieve soccer goal scenes, the string "goal" could be searched for in the stored subtitle information, and the video surrounding the presentation timing of the matching subtitles selected as scene candidates.
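As a rough sketch of this keyword search — assuming, hypothetically, that the subtitle information is held as `(text, start, end)` tuples, since the patent does not specify a data layout — each matching subtitle's presentation interval can be widened into a candidate time window:

```python
def find_candidate_scenes(subtitles, keyword, pad=10.0):
    """Search stored subtitle text for a keyword; each match's presentation
    interval, widened by `pad` seconds on both sides to cover the surrounding
    video, becomes a candidate scene window (start_sec, end_sec)."""
    candidates = []
    for text, start, end in subtitles:
        if keyword in text:
            candidates.append((max(0.0, start - pad), end + pad))
    return candidates
```

Here `pad` stands in for "the surrounding video corresponding to the presentation timing"; any real value would be tuned per content genre.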

Next, the candidate scenes obtained above are examined more closely using image analysis, audio analysis, or both. When image-based features related to the target scene are set in the scene extraction unit, the image analysis unit 320 determines whether each candidate scene matches those settings. The image analysis here may use, as in the prior art above, matching against a model image, object/face recognition, decisions based on luminance or color information, cut segmentation, and so on.

To retrieve the soccer goal scenes of this example, the candidate video obtained by the subtitle analysis is first divided into cuts using cut segmentation. The head frame of each cut is then matched against a model image of a goal scene featuring the goalposts. If the similarity exceeds a threshold, that cut (and several cuts before and after it) can be judged to correspond to the desired scene.
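A toy version of this step might look like the following, with each frame reduced to a color histogram. This is a deliberate simplification under stated assumptions — real cut segmentation and model matching would operate on decoded frames, and the function names and thresholds are hypothetical:

```python
def hist_distance(h1, h2):
    """Mean absolute difference between two equal-length histograms (0 = identical)."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / len(h1)

def split_into_cuts(frames, cut_threshold=0.3):
    """Cut segmentation: mark a cut boundary wherever consecutive frame
    histograms differ sharply; return the index of each cut's head frame."""
    cuts = [0]
    for i in range(1, len(frames)):
        if hist_distance(frames[i - 1], frames[i]) > cut_threshold:
            cuts.append(i)
    return cuts

def matching_cuts(frames, model_hist, cut_threshold=0.3, sim_threshold=0.8):
    """Keep the cuts whose head frame is similar enough to the model image
    (e.g. a goal-scene model featuring the goalposts)."""
    return [i for i in split_into_cuts(frames, cut_threshold)
            if 1.0 - hist_distance(frames[i], model_hist) >= sim_threshold]
```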

Similarly, when audio-based features related to the target scene are set, the audio analysis unit 330 determines whether each candidate scene matches them. The audio analysis here may use, as in the prior art above, frequency analysis, cheer or specific-sound detection, spectrum analysis, speaker identification, and so on. In the soccer goal-scene example, the power level and frequency of the audio in the candidate video obtained by the subtitle analysis are analyzed first; the portions before and after an interval containing loud cheering can then be judged to correspond to the desired scene.
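The power-level part of this audio analysis can be sketched as follows — a minimal illustration over plain sample lists, in which `loud_sections`, its window size, and its threshold are hypothetical, and the frequency analysis is omitted:

```python
import math

def rms_power(samples):
    """Root-mean-square power of a block of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def loud_sections(audio, window=4, threshold=0.5):
    """Slide a fixed window over the audio; windows whose RMS power exceeds
    the threshold are flagged as cheer-like intervals (start, end), whose
    surroundings can then be judged to correspond to the desired scene."""
    sections = []
    for i in range(0, len(audio) - window + 1, window):
        if rms_power(audio[i:i + window]) > threshold:
            sections.append((i, i + window))
    return sections
```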

The scene extraction unit makes an overall judgment from the above analysis results and extracts the scenes matching the user's request from the video storage unit. The extracted video is then presented to the user by the video presentation unit 500.

As described above, according to the present invention, using subtitle information makes it possible to extract a desired scene faster than conventional video retrieval that relies on image or audio analysis alone, and the effort of creating information about the video content by hand can be eliminated.

The video retrieval device of the present invention provides a scene retrieval function in environments where video content with subtitle information is stored, and is therefore useful in video equipment such as digital broadcast receiving terminals, video recording/playback devices, and media servers.

Block diagram showing the configuration of an embodiment of the present invention
Overview of an example of subtitle analysis

Explanation of symbols

100 Demultiplexing processing unit
110 Subtitle data processing unit
200 Video storage unit
300 Scene extraction unit
310 Subtitle analysis unit
320 Image analysis unit
330 Audio analysis unit
400 User request accepting unit
500 Video presentation unit

Claims (1)

A video retrieval device characterized in that subtitle information multiplexed into video data is extracted and stored together with the video; locations matching a scene desired by the user are detected in the stored subtitle information; and scenes are extracted by applying image analysis or audio analysis to the video corresponding to the presentation timing of those subtitles.
JP2003348187A 2003-10-07 2003-10-07 Video retrieving device Pending JP2005115607A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003348187A JP2005115607A (en) 2003-10-07 2003-10-07 Video retrieving device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003348187A JP2005115607A (en) 2003-10-07 2003-10-07 Video retrieving device

Publications (1)

Publication Number Publication Date
JP2005115607A true JP2005115607A (en) 2005-04-28

Family

ID=34540456

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003348187A Pending JP2005115607A (en) 2003-10-07 2003-10-07 Video retrieving device

Country Status (1)

Country Link
JP (1) JP2005115607A (en)


Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006332765A (en) * 2005-05-23 2006-12-07 Sharp Corp Contents searching/reproducing method, contents searching/reproducing apparatus, and program and recording medium
JP2006350477A (en) * 2005-06-13 2006-12-28 Canon Inc File management device, its control method, computer program, and computer readable storage medium
JP2007052626A (en) * 2005-08-18 2007-03-01 Matsushita Electric Ind Co Ltd Metadata input device and content processor
JP2008006214A (en) * 2006-06-30 2008-01-17 Toshiba Corp Operation control device of electronic apparatus
JP4668875B2 (en) * 2006-09-20 2011-04-13 株式会社日立製作所 Program recording / playback apparatus, program playback position control method, and program information providing apparatus
JP2008078876A (en) * 2006-09-20 2008-04-03 Hitachi Ltd Program recording reproducing device, program reproducing position control method and program information providing device
JP2008130050A (en) * 2006-11-24 2008-06-05 Canon Inc Image retrieval device and method therefor
US8265397B2 (en) 2006-11-24 2012-09-11 Canon Kabushiki Kaisha Image retrieval apparatus and method thereof
JP4695582B2 (en) * 2006-12-04 2011-06-08 日本放送協会 Video extraction apparatus and video extraction program
JP2008141621A (en) * 2006-12-04 2008-06-19 Nippon Hoso Kyokai <Nhk> Device and program for extracting video-image
JP2008172324A (en) * 2007-01-09 2008-07-24 Sony Corp Information processor, processing method and program
US8879885B2 (en) 2007-01-09 2014-11-04 Sony Corporation Information processing apparatus, information processing method, and program
JP2008219285A (en) * 2007-03-01 2008-09-18 Nintendo Co Ltd Video content display program and video content display device
WO2009032639A1 (en) * 2007-09-04 2009-03-12 Apple Inc. Display of video subtitles
US9602757B2 (en) 2007-09-04 2017-03-21 Apple Inc. Display of video subtitles
US10652500B2 (en) 2007-09-04 2020-05-12 Apple Inc. Display of video subtitles
US10003764B2 (en) 2007-09-04 2018-06-19 Apple Inc. Display of video subtitles
WO2009110491A1 (en) * 2008-03-07 2009-09-11 シャープ株式会社 Content display apparatus, content display method, program, and recording medium
JP2010161722A (en) * 2009-01-09 2010-07-22 Sony Corp Data processing apparatus and method, and program
US8693847B2 (en) 2009-02-06 2014-04-08 Sony Corporation Contents processing apparatus and method
JP2010218385A (en) * 2009-03-18 2010-09-30 Nippon Hoso Kyokai <Nhk> Content retrieval device and computer program
JP2011109292A (en) * 2009-11-16 2011-06-02 Canon Inc Imaging apparatus, control method and program thereof, and storage medium
JP2012043422A (en) * 2010-08-16 2012-03-01 Nhn Corp Retrieval result providing method and system using subtitle information
WO2013037082A1 (en) * 2011-09-12 2013-03-21 Intel Corporation Using gestures to capture multimedia clips
JP2015177470A (en) * 2014-03-17 2015-10-05 富士通株式会社 Extraction program, extraction method, and extraction device
US9892320B2 (en) 2014-03-17 2018-02-13 Fujitsu Limited Method of extracting attack scene from sports footage
JP2019003604A (en) * 2017-06-09 2019-01-10 富士ゼロックス株式会社 Methods, systems and programs for content curation in video-based communications
JP7069778B2 (en) 2017-06-09 2022-05-18 富士フイルムビジネスイノベーション株式会社 Methods, systems and programs for content curation in video-based communications
CN110035326A (en) * 2019-04-04 2019-07-19 北京字节跳动网络技术有限公司 Subtitle generation, the video retrieval method based on subtitle, device and electronic equipment

Similar Documents

Publication Publication Date Title
JP2005115607A (en) Video retrieving device
CN101650958B (en) Extraction method and index establishment method of movie video scene fragment
US6580437B1 (en) System for organizing videos based on closed-caption information
TWI233026B (en) Multi-lingual transcription system
KR101644789B1 (en) Apparatus and Method for providing information related to broadcasting program
JP4127668B2 (en) Information processing apparatus, information processing method, and program
JP2004152063A (en) Structuring method, structuring device and structuring program of multimedia contents, and providing method thereof
US20060136226A1 (en) System and method for creating artificial TV news programs
CN110781328A (en) Video generation method, system, device and storage medium based on voice recognition
WO2005055196A3 (en) System &amp; method for integrative analysis of intrinsic and extrinsic audio-visual data
US7349477B2 (en) Audio-assisted video segmentation and summarization
CN110691271A (en) News video generation method, system, device and storage medium
JP2012181358A (en) Text display time determination device, text display system, method, and program
JP2005025413A (en) Content processing device, content processing method, and program
JP2006115052A (en) Content retrieval device and its input device, content retrieval system, content retrieval method, program and recording medium
CN110992984B (en) Audio processing method and device and storage medium
US8538244B2 (en) Recording/reproduction apparatus and recording/reproduction method
KR20110080712A (en) Method and system for searching moving picture by voice recognition of mobile communication terminal and apparatus for converting text of voice in moving picture
JP2008022292A (en) Performer information search system, performer information obtaining apparatus, performer information searcher, method thereof and program
CN116017088A (en) Video subtitle processing method, device, electronic equipment and storage medium
JP2007519321A (en) Method and circuit for creating a multimedia summary of an audiovisual data stream
KR100348901B1 (en) Segmentation of acoustic scences in audio/video materials
JP2008059343A (en) Information processing system and program
WO2011161820A1 (en) Video processing device, video processing method and video processing program
JP4323937B2 (en) Video comment generating apparatus and program thereof