JP2007293602A - System and method for retrieving image and program

Info

Publication number: JP2007293602A
Application number: JP2006120939A
Authority: JP
Inventors: Takehiro Yamamoto; 武洋山本; Norihiro Maki; 紀宏牧; Shinji Nakano; 信司中野
Original assignee: NEC Corp; NEC System Technologies Ltd
Current assignee: NEC Corp; NEC Solution Innovators Ltd
Priority date: 2006-04-25
Filing date: 2006-04-25
Publication date: 2007-11-08

Abstract

PROBLEM TO BE SOLVED: To objectively retrieve classifications of images and to shorten the time for selecting a plurality of images according to a common keyword.

SOLUTION: An image distribution server registration device 14 transmits, to an image distribution server 30, correspondence data information produced by an image text relating means 12 that associates image data with image associated data. The image distribution server 30 is provided with an image data storage area 31, an image associated data storage area 32, a correspondence data storage area 33, a mining means 35 that performs weighting during retrieval, a classification dictionary 34 used for the weighting, and the like. A terminal 40 transmits information such as a keyword to the image distribution server 30.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a technique for performing fuzzy video search linked with text.

Conventionally, when video data is searched over a network, information about a video is converted into text, indexed, and embedded in the moving-image or video file, so that videos matching an entered keyword are delivered as the result.

In this approach, no result is obtained unless the text matches the entered keyword, and a video can be played back only from its beginning.

For the case where the search target is image data, a technique has been proposed that generates information related to the image data through object extraction, feature extraction, word estimation, phrase estimation, and synonym estimation, thereby improving search accuracy (the accuracy with which the image the user intends is found) when image data is searched by keyword (see, for example, Patent Document 1).
Patent Document 1: JP 2004-362314 A

However, the conventional example described above has the following problems.

With the conventional method, it is difficult, when searching video (moving image) data, to achieve search accuracy based on the correspondence between the video and the data related to it (that is, search accuracy based on objective rather than subjective classification).

In addition, even when summary data is associated with video data and the summary data is searched by keyword or the like, the results the user wants may not be obtained.

The present invention has been made in view of the circumstances described above. Its object is to search video classifications objectively and to shorten the time needed to select a plurality of videos using a general keyword.

To solve the above problems, the present invention provides a video search system that identifies a video from text data obtained from the audio accompanying the video.

The present invention also provides a video search system in which the following are connected to one another via a network:
video-text correspondence means for associating video data with video accompanying data;
a video distribution server registration device that registers, in a video distribution server, the correspondence data information produced by the video-text correspondence means;
the video distribution server, comprising a video data storage area, a video accompanying data storage area, a correspondence data storage area, mining means that performs weighting at search time, a classification dictionary used for the weighting, search result generation means that generates search results from the output of the mining means, and a video distribution device; and
a terminal device that sends information such as keywords to the video distribution server.

The present invention also provides a video search method that identifies a video from text data obtained from the audio accompanying the video.

The present invention also provides a video search method that uses:
video-text correspondence means for associating video data with video accompanying data;
a video distribution server registration device that registers, in a video distribution server, the correspondence data information produced by the video-text correspondence means;
the video distribution server, comprising a video data storage area, a video accompanying data storage area, a correspondence data storage area, mining means that performs weighting at search time, a classification dictionary used for the weighting, search result generation means that generates search results from the output of the mining means, and a video distribution device; and
a terminal device that sends information such as keywords to the video distribution server.

According to the present invention, video classifications can be searched objectively, and the time needed to select a plurality of videos using a general keyword can be shortened.

Hereinafter, a first embodiment of the present invention will be described in detail with reference to the drawings.

Referring to FIG. 1, in the video search system of this embodiment, video-text correspondence means 12 associates video data 10 with video accompanying data 11 (for example, a text transcript of all the audio in the video data, a text summary of the video data, or a title) at specified times in the video data, and generates correspondence data 13.

A video distribution server registration device 14, which handles the video data 10, the video accompanying data 11, and the correspondence data 13, is connected to a video distribution server 30 via a network. The video distribution server 30 comprises a video data storage area 31, a video accompanying data storage area 32, a correspondence data storage area 33, a classification dictionary 34, mining means 35, search result generation means 36 that converts search results into video playback positions, and a video distribution device 37.

The system further includes a terminal 40 connected to the video distribution server 30; the terminal 40 comprises an input device 41 and a display/playback device 42.

Next, the text-linked fuzzy video search operation of this embodiment will be described.

For the video data 10, text data such as the video's title and summary, or a text transcript of all the audio in the video, is created with a text editor or by speech recognition and is used as the video accompanying data 11.

The video data 10 and the video accompanying data 11 are passed as input to the video-text correspondence means 12, which matches segment boundary times in the video data 10 to the video accompanying data 11 and creates the correspondence data 13 relating playback times to text.
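
As a rough illustration of what the correspondence data 13 might hold, the sketch below pairs each text segment with a playback interval. The record layout, field names, and sample values are assumptions made for illustration; the publication does not specify the format shown in FIG. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrespondenceEntry:
    """One hypothetical row of correspondence data: a text segment and its playback interval."""
    video_id: str       # identifier of the video (assumed field)
    paragraph_no: int   # position of the segment within the accompanying text
    start_sec: float    # playback time at which the segment begins
    end_sec: float      # playback time at which the segment ends
    text: str           # accompanying text for this segment


def build_correspondence(video_id, boundaries_sec, paragraphs):
    """Pair each paragraph with the interval between consecutive boundary times.

    boundaries_sec must contain exactly one more value than there are paragraphs.
    """
    if len(boundaries_sec) != len(paragraphs) + 1:
        raise ValueError("need exactly len(paragraphs) + 1 boundary times")
    return [
        CorrespondenceEntry(video_id, i, boundaries_sec[i], boundaries_sec[i + 1], text)
        for i, text in enumerate(paragraphs)
    ]


# Hypothetical sample: a two-segment self-introduction video.
entries = build_correspondence(
    "intro-001",
    [0.0, 42.0, 95.5],
    ["Self-introduction segment", "Strengths segment"],
)
```

Keeping the start time in each entry is what later lets a search hit be turned into a playback start position rather than a whole video.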

This may be realized, for example, by the technique disclosed in JP 2004-152063 A. In that technique, time code generation means outputs word time codes, which give the start time and end time of each word in the audio information, to speech recognition means. The speech recognition means attaches the word time codes to the speech recognition text information and outputs it, together with the video information, to mapping means. In other words, the time code generation means generates the word time codes as the information from which time codes are later derived; each such time code indicates the start time and end time of one divided portion (for example, one paragraph) when the speech recognition text information is divided into a plurality of predetermined portions.

Text input means receives text media from an operator. If the text medium is digital, the text input means outputs the text information recorded on it to the mapping means. If the text medium is analog, the text input means is configured to include, for example, an OCR (optical character recognition) device; it digitizes the text information recorded on the medium and outputs the digitized text information to the mapping means.

The mapping means divides the text information into portions at suitable intervals. Here, for example, it detects paragraphs (blocks of sentences) by detecting line breaks, indents, and the like, and divides the text information at those positions. The mapping means then compares the text information with the speech recognition text information, divides the speech recognition text information according to the division positions in the text information, and generates paragraph speech recognition text information. It also generates correspondence information indicating the one-to-one correspondence between each paragraph of the text information (each delimited piece of text) and each piece of paragraph speech recognition text information. Finally, it outputs the correspondence information, together with the paragraph speech recognition text information and the text information, to structuring means.
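
The mapping step can be sketched as follows. This is only an illustration under stated assumptions: both texts are assumed to be already tokenized into whitespace-separated words (Japanese text would need a tokenizer first), and Python's `difflib.SequenceMatcher` stands in for whatever comparison the cited publication actually uses. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def split_paragraphs(text):
    """Split reference text into paragraphs at line breaks; each paragraph is a word list."""
    return [line.split() for line in text.split("\n") if line.strip()]


def align_paragraphs(paragraphs, recognized_words):
    """Cut the recognized word sequence at positions matching the reference paragraph breaks."""
    ref_words = [w for para in paragraphs for w in para]
    matcher = SequenceMatcher(a=ref_words, b=recognized_words, autojunk=False)

    # Map reference word positions to recognized word positions for exactly matching words.
    ref_to_rec = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            ref_to_rec[block.a + k] = block.b + k

    segments = []
    ref_end = 0
    rec_start = 0
    for i, para in enumerate(paragraphs):
        ref_end += len(para)
        if i == len(paragraphs) - 1:
            rec_end = len(recognized_words)
        else:
            # Boundary: the first matched word at or after the start of the next paragraph.
            rec_end = next(
                (ref_to_rec[j] for j in range(ref_end, len(ref_words)) if j in ref_to_rec),
                len(recognized_words),
            )
        segments.append(recognized_words[rec_start:rec_end])
        rec_start = rec_end
    return segments


reference = "my name is taro\nmy strength is patience"
recognized = "my name is tarou my strength is patients".split()  # with recognition errors
segments = align_paragraphs(split_paragraphs(reference), recognized)
```

Because the boundaries are taken from the clean reference text, misrecognized words such as "tarou" still end up in the correct paragraph.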

The structuring means calculates the time code of each paragraph, that is, its start time and end time, from the word time codes attached to the words in the paragraph speech recognition text information. Based on the time codes and the correspondence information, the structuring means then generates structured information indicating the one-to-one correspondence between each paragraph of the text information and each time code.
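
A minimal sketch of this calculation, assuming each word carries a `(word, start_sec, end_sec)` tuple and the words of a paragraph are in spoken order; the tuple layout and function name are illustrative, not taken from the cited publication.

```python
def paragraph_time_codes(paragraphs):
    """Derive one (start_sec, end_sec) time code per paragraph from its word time codes.

    paragraphs: list of paragraphs, each a list of (word, start_sec, end_sec) tuples.
    """
    codes = []
    for words in paragraphs:
        if not words:
            raise ValueError("a paragraph needs at least one timed word")
        start = words[0][1]   # start time of the first word
        end = words[-1][2]    # end time of the last word
        codes.append((start, end))
    return codes


# Hypothetical word time codes for two paragraphs.
timed_paragraphs = [
    [("my", 0.0, 0.3), ("name", 0.3, 0.7), ("is", 0.7, 0.9), ("tarou", 0.9, 1.5)],
    [("my", 2.0, 2.2), ("strength", 2.2, 2.9), ("is", 2.9, 3.1), ("patients", 3.1, 3.8)],
]
codes = paragraph_time_codes(timed_paragraphs)
```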

Based on the structured information, the structuring means also generates a text media file that stores the text information, a time code file that stores the time codes, and a video media file that stores the video information.

For example, the text media file stores the text information of each paragraph in paragraph order, and the time code file stores the time code for each paragraph in the same order. Alternatively, a plurality of text media files, each corresponding to one paragraph, may be generated together with the time code file.

Next, the structuring means outputs the video media file, the text media file, and the time code file to data storage means. Within the data storage means, video media storage means stores the video media file, text media storage means stores the text media file, and time code storage means stores the time code file. In this example, the storage means that stores the text information and the video information corresponds to the video media storage means and the text media storage means, and the storage means that stores the time code of each divided portion corresponds to the time code storage means.

The video information may be supplied from video input means to the data storage means via the speech recognition means, the mapping means, and the structuring means, or it may be supplied directly from the video input means to the data storage means.

The structuring means may also generate, as management information, the start and end addresses of each paragraph of text information in the text media file and the start and end addresses of each paragraph's time code in the time code file. In that case, the multimedia content apparatus is configured so that the data storage means includes management file storage means for storing the management information. The structuring means outputs the management information to the management file storage means, which stores it.
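
One way to picture the management information is as byte offsets into a single text media file. The sketch below is an assumption-laden illustration: it treats an "address" as a byte offset, uses UTF-8, separates paragraphs with a newline, and treats end addresses as exclusive; none of these details come from the cited publication.

```python
def build_management_info(paragraphs, encoding="utf-8"):
    """Concatenate paragraphs into one byte string and record each paragraph's byte range.

    Returns (file_bytes, info), where info is a list of (start, end) offsets
    with `end` exclusive.
    """
    data = bytearray()
    info = []
    for text in paragraphs:
        start = len(data)
        data.extend(text.encode(encoding))
        info.append((start, len(data)))
        data.extend(b"\n")  # paragraph separator (assumed)
    return bytes(data), info


file_bytes, info = build_management_info(["abc", "de"])
second_paragraph = file_bytes[info[1][0]:info[1][1]].decode("utf-8")
```

With such offsets, a single paragraph can be read back without scanning the whole file.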

The structuring means may also generate structured text media that contains the time codes, by combining the text information with the time codes.

There is also a method in which the structuring means generates an XML (Extensible Markup Language) file containing a structural description in MPEG-7 (Moving Picture Experts Group 7) format. In that case, the multimedia content apparatus is configured so that the data storage means includes XML file storage means instead of the text media storage means and the time code storage means. The structuring means outputs the XML file to the XML file storage means, which stores it. In this example, the storage means in which the video information is stored corresponds to the video media storage means, and the storage means that holds the XML description of the time codes, the text information, and the storage location of the video information corresponds to the XML file storage means.
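
The idea of holding time codes, text, and the video's storage location in one XML file can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names below (`StructuredContent`, `MediaLocation`, `Segment`, `start`, `end`) are invented for illustration and are not the MPEG-7 schema; a real MPEG-7 description would follow the standard's own description schemes.

```python
import xml.etree.ElementTree as ET


def build_structured_xml(video_location, segments):
    """Serialize (start_sec, end_sec, text) segments plus the video location as XML text."""
    root = ET.Element("StructuredContent")
    ET.SubElement(root, "MediaLocation").text = video_location
    for start, end, text in segments:
        segment = ET.SubElement(root, "Segment", start=f"{start:.1f}", end=f"{end:.1f}")
        segment.text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical file location and segments.
xml_text = build_structured_xml(
    "videos/intro-001.mpg",
    [(0.0, 42.0, "Self-introduction segment"), (42.0, 95.5, "Strengths segment")],
)
```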

To request desired video information and text information, a user enters a word or phrase as a keyword into synchronous data utilization means. Input/output means then outputs the phrase entered by the user to search control means. The search control means searches the text media storage means or the XML file storage means for paragraphs of text information containing the phrase and outputs the matching paragraphs to the input/output means. When the user then selects one of those paragraphs, the input/output means requests the search control means to output the video information synchronized with the selected text information.

The search control means retrieves the video information from the video media storage means, extracts the time code of the paragraph selected by the user from the time code storage means, the text media storage means, or the XML file storage means, and outputs them to synchronization means. The synchronization means outputs the video information to the input/output means, using the start time indicated by the time code as the beginning of the output and the end time indicated by the time code as the end of the output. The synchronization means also processes the text information based on the time code and outputs it to the input/output means; one example of such processing is scrolling the text.
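
The two-step lookup described above (find paragraphs containing the phrase, then fetch the time code of the one the user picks) might look like the following. Plain substring matching and the function names are assumptions for illustration.

```python
def find_paragraphs(paragraphs, phrase):
    """Return the indices of paragraphs whose text contains the phrase."""
    return [i for i, text in enumerate(paragraphs) if phrase in text]


def playback_range(time_codes, paragraph_index):
    """Return the playback interval for the selected paragraph."""
    start, end = time_codes[paragraph_index]
    return {"start_sec": start, "end_sec": end}


# Hypothetical stored paragraphs and their time codes, in the same order.
paragraphs = ["I was born in Osaka.", "My hobby is hiking.", "Hiking taught me patience."]
time_codes = [(0.0, 12.5), (12.5, 30.0), (30.0, 51.0)]

hits = find_paragraphs(paragraphs, "iking")
selected = playback_range(time_codes, hits[0])  # the user picks the first hit
```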

The synchronous data generation means and the synchronous data utilization means can be realized by a computer system, except that the input/output means corresponds to input/output devices such as the keyboard and display of a microcomputer or the like on the user side. When the synchronous data generation means and the synchronous data utilization means (excluding the input/output means) are realized by a computer system, the speech recognition means, time code generation means, mapping means, structuring means, search control means, and synchronization means are realized by software, and the data storage means is realized by a storage medium such as a magnetic disk in the computer system.

Specifically, the software installed in the computer system includes a program that executes the following processes: performing speech recognition on the audio information corresponding to the text information to generate speech recognition text information; comparing the speech recognition text information with the text information and dividing the speech recognition text information into portions according to the division positions in the text information; generating time codes indicating the start time and end time of each divided portion; structuring the text information and the video information by associating them for each predetermined divided portion using the generated time codes; and storing the text information, the video information, and the time codes. The program also executes: identifying the divided portion of the text information that matches a search condition entered by the user; extracting the time code corresponding to the identified divided portion; and identifying the video information corresponding to the extracted time code and providing it to the user together with the text information that matches the search condition.

The video distribution server registration device 14 then registers the video data 10, the video accompanying data 11, and the correspondence data 13 over the network in the video data storage area 31, the video accompanying data storage area 32, and the correspondence data storage area 33 of the video distribution server 30, respectively.

The mining means 35 takes the data sent from the input device 41 of the terminal 40 as input and, with the aid of the classification dictionary 34, searches the video accompanying data 11, scores the frequency of occurrence and the relationships among the data, and passes the search result to the search result generation means 36.

Based on the result from the mining means 35, the search result generation means 36 determines the video and the playback start position from the correspondence data and sends them to the display/playback device 42 of the terminal 40.

At the terminal 40, video playback starts when a result displayed on the display/playback device 42 is selected.

Next, a specific example of this embodiment will be described with reference to FIGS. 2, 3, and 4.

First, for self-introduction video data 10 of about five minutes, all the utterances are converted into text to create the video accompanying data 11. The video data 10 and the video accompanying data 11 are passed to the video-text correspondence means 12, which generates correspondence data 13 such as that shown in FIG. 2.

Next, the video distribution server registration device 14 registers the video data 10, the video accompanying data 11, and the correspondence data 13 in the video data storage area 31, the video accompanying data storage area 32, and the correspondence data storage area 33, respectively, of the designated video distribution server 30.

Next, the keyword "長所" ("strengths") is entered using the input device 41 of the terminal 40.

The entered data is sent to the video distribution server 30 and passed to the mining means 35.

The mining means 35 receives the keyword "長所" ("strengths") and obtains the related keywords "性格" ("personality") and "短所" ("weaknesses") from the classification dictionary 34.

Using the received keyword "長所" and the keywords "性格" and "短所" obtained from the classification dictionary 34, the mining means 35 performs a full-text search of the video accompanying data 11 in the video accompanying data storage area 32, weights each paragraph according to the number of keyword occurrences, and computes a score for each paragraph.
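
The keyword expansion and per-paragraph scoring in this example can be sketched as below. The dictionary layout, the function names, and the scoring rule (a plain sum of occurrence counts, with every keyword weighted equally) are assumptions; the publication only states that paragraphs are weighted by the number of keyword occurrences.

```python
def expand_keywords(keyword, classification_dictionary):
    """Return the entered keyword followed by its related keywords from the dictionary."""
    related = classification_dictionary.get(keyword, [])
    return [keyword] + [k for k in related if k != keyword]


def score_paragraphs(paragraphs, keywords):
    """Score each paragraph by the total number of keyword occurrences it contains."""
    return [sum(text.count(k) for k in keywords) for text in paragraphs]


# Hypothetical classification dictionary entry matching the example in the text.
classification_dictionary = {"長所": ["性格", "短所"]}

# Hypothetical transcript paragraphs from a self-introduction video.
transcript = [
    "私の長所は粘り強いところです。長所を活かします。",  # mentions "strengths" twice
    "短所は心配性な性格です。",                          # "weaknesses" and "personality" once each
    "趣味は登山です。",                                  # no keyword
]

keywords = expand_keywords("長所", classification_dictionary)
scores = score_paragraphs(transcript, keywords)
```

Note that the second paragraph scores as highly as the first even though it never contains the entered keyword itself; this is the effect of the dictionary expansion.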

The mining means 35 passes this result to the search result generation means 36.

The search result generation means 36 matches the paragraphs, in descending order of score, against the correspondence data 13 in the correspondence data storage area 33 to generate the result shown in FIG. 3, and returns the result to the display/playback device 42 of the terminal 40.
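
A minimal sketch of this step, assuming scores and correspondence data are both keyed by a hypothetical `(video_id, paragraph_no)` pair and that zero-score paragraphs are dropped; the actual layout of the result in FIG. 3 is not reproduced here.

```python
def generate_results(scores, correspondence):
    """Turn paragraph scores into a ranked list of playback start positions.

    scores:         {(video_id, paragraph_no): score}
    correspondence: {(video_id, paragraph_no): start_sec}
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    results = []
    for key, score in ranked:
        if score <= 0 or key not in correspondence:
            continue  # skip paragraphs with no hit or no known playback position
        video_id, paragraph_no = key
        results.append(
            {
                "video_id": video_id,
                "paragraph_no": paragraph_no,
                "start_sec": correspondence[key],
                "score": score,
            }
        )
    return results


# Hypothetical scores and correspondence data for two videos.
scores = {("intro-001", 0): 1, ("intro-001", 1): 3, ("intro-002", 0): 0, ("intro-002", 2): 2}
correspondence = {
    ("intro-001", 0): 0.0,
    ("intro-001", 1): 42.0,
    ("intro-002", 0): 0.0,
    ("intro-002", 2): 120.0,
}
results = generate_results(scores, correspondence)
```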

When the user selects one of the results displayed on the display/playback device 42 of the terminal 40, the video distribution device 37 of the video distribution server 30 starts playback of the video data 10 in the video data storage area 31 from the specified playback position and distributes the video.

According to this embodiment, the mining means and the classification dictionary make it possible to search video classifications objectively, and the time needed to select a plurality of videos using a general keyword is shortened.

Furthermore, by also outputting, as information, the relationship between the mining engine's results and other videos, further new videos can be offered to the user.

Next, a second embodiment of the present invention will be described.

With a plurality of videos stored in the video distribution server 30, a keyword can also be passed to the mining means 35 from a word in the text of an ordinary Web page, instead of being entered through the input device 41 of the terminal 40.

For example, on an FAQ (frequently asked questions and answers) page, when an explanation by video rather than by words alone is desired, a word serving as a keyword is selected and passed to the mining means 35. A list of related video data can then be returned to the user through the same flow as in the specific example above, and an answer can be given by video.

Each of the embodiments described above is a preferred embodiment of the present invention, and various modifications are possible without departing from the gist of the invention. For example, the functions of the video distribution server registration device 14, the video distribution server 30, and the terminal 40 may be realized by having each device read and execute a program that implements those functions. The program may also be transferred to another computer system via a computer-readable recording medium such as a CD-ROM or magneto-optical disk, or by a transmission wave via a transmission medium such as the Internet or a telephone line.

The embodiments above describe a configuration in which the video distribution server registration device 14, the video distribution server 30, and the terminal 40 are realized as a single computer system; the invention is of course also applicable to configurations in which the devices are separate and connected to one another, or in which a plurality of devices are added for each function.

The present invention is applicable, at establishments that hold large amounts of video content or that provide video-based education, to the task of finding a needed video among many videos with a simple keyword, without browsing by category. In addition, because educational and similar videos can be played back from the point needed, an improved learning effect can be expected.

FIG. 1 is a block diagram showing the configuration of the video search system in the first embodiment of the present invention.
FIG. 2 is a diagram showing an example of the correspondence data generated by the video-text correspondence means in the first embodiment of the present invention.
FIG. 3 is a diagram showing an example of the data that the search result generation means generates by matching against the correspondence data in the first embodiment of the present invention.

Explanation of symbols

10 Video data
11 Video accompanying data
12 Video-text correspondence means
13 Correspondence data
14 Video distribution server registration device
30 Video distribution server
31 Video data storage area
32 Video accompanying data storage area
33 Correspondence data storage area
34 Classification dictionary
35 Mining means
36 Search result generation means
37 Video distribution device
40 Terminal
41 Input device
42 Display/playback device

Claims

1. A video search system characterized in that a video is specified from text data acquired from audio accompanying the video.

2. The video search system according to claim 1, wherein the video data is weighted at the time of search by a keyword.

3. The video search system according to claim 1, wherein a playback start point is specified in the search result.

4. A video search system in which the following are connected to one another via a network:
video text correspondence means for associating video data and video accompanying data;
a video distribution server registration device for registering, in a video distribution server, correspondence data information produced by the video text correspondence means;
the video distribution server, comprising a video data storage area, a video accompanying data storage area, a correspondence data storage area, mining means for weighting at the time of search, a classification dictionary for the weighting, search result generating means for generating search results from the results of the mining means, and a video distribution device; and
a terminal device that sends information such as keywords to the video distribution server.

5. A video search method characterized in that a video is specified from text data acquired from audio accompanying the video.

6. The video search method according to claim 5, wherein the video data is weighted when searching by a keyword.

7. The video search method according to claim 5, wherein a playback start point is specified in the search result.

8. A video search method using:
video text correspondence means for associating video data and video accompanying data;
a video distribution server registration device for registering, in a video distribution server, correspondence data information produced by the video text correspondence means;
the video distribution server, comprising a video data storage area, a video accompanying data storage area, a correspondence data storage area, mining means for weighting at the time of search, a classification dictionary for the weighting, search result generating means for generating search results from the results of the mining means, and a video distribution device; and
a terminal device that sends information such as a keyword to the video distribution server.

9. A program for causing a computer to realize the function according to any one of claims 1 to 4.