JP2011049707A

JP2011049707A - Moving image playback device, moving image playback method, and program

Info

Publication number: JP2011049707A
Application number: JP2009194901A
Authority: JP
Inventors: Atsunori Sakai; 敦典坂井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-08-26
Filing date: 2009-08-26
Publication date: 2011-03-10
Anticipated expiration: 2029-08-26
Also published as: JP5499566B2

Abstract

PROBLEM TO BE SOLVED: To provide a moving image playback device, a moving image playback method, and a program that can retrieve a moving image file, or a scene within a moving image file, from among a large number of moving image files based on the speech content of a speaker in the moving image.

SOLUTION: The moving image playback device 1 is provided with: a voice recognition means 2 which performs voice recognition on a moving image file; a text file creation means 3 which creates a text file from the output of the voice recognition means 2; a keyword detection means 5 which detects, in the text file, a keyword stored in advance in a memory means 4; a moving image order decision means 6 which decides a priority order of the moving images to be played back with reference to a keyword weight attached to the keyword; a moving image playback means 7 which plays back the moving image that the user selects with reference to the moving image order; and a weight updating means 8 which updates the keyword weight based on the moving image selected by the user and the manner in which the keyword occurred.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a moving image playback device, a moving image playback method, and a program, and more particularly to a moving image playback device, a moving image playback method, and a program that perform voice recognition on moving image files and play back moving images in a priority order desired by the user.

In recent years, a large number of moving image files have been uploaded to video sharing websites such as YouTube (registered trademark). Metadata is added to each uploaded moving image file by its producer (the poster) or by users (viewers). Metadata refers to additional data such as title information, category information, and explanatory text.

When looking for a target moving image file, the user runs a category search, keyword search, or similar search over the metadata to find the content he or she wishes to play back. However, because metadata carries only limited information, it is difficult to find the target moving image file by searching the metadata alone. Finding a specific scene (section) within a moving image file has been even more difficult.

Patent Document 1 describes a moving image search system that performs keyword searches over a large number of moving images registered on video sites on the Internet and the like. This system performs voice recognition on the moving image files in a moving image file database to generate text data, and registers that text as new search words, thereby preventing the search keywords from becoming obsolete.

Patent Document 2 describes a playback device that extracts or displays keywords contained in a selected moving image file and lets the user select one of them, so that the desired scene is played back immediately.

Patent Document 3 describes a technique in which moving image content is divided into scenes by image recognition and a representative still image for each scene is stored in story order. The representative still images are played back, and the user looks through them to find the desired scene.

Patent Document 1: JP 2008-134979 A
Patent Document 2: JP 2008-148077 A
Patent Document 3: JP 2002-335473 A
Patent Document 4: JP H11-25271 A

The technique described in Patent Document 1 merely prevents search keywords from becoming obsolete. The techniques described in Patent Documents 1 and 2 merely present keywords or representative still images to the user, who must then find the desired scene; they cannot search for moving image files based on the utterance content of a speaker in the moving image. It is therefore difficult to search a large number of moving image files for a target file, and to locate a scene within that file while progressively narrowing the conditions.

Patent Document 4 describes a technique for retrieving similar images by image recognition. Even if this technique is used to search for a scene in a moving image file, the search still relies solely on image recognition and, as with Patent Documents 1 to 3, cannot be based on the utterance content of a speaker in the moving image.

An object of the present invention is to provide a moving image playback device, a moving image playback method, and a program that can search a large number of moving image files for a moving image file, or for a scene within a moving image file, based on the utterance content of a speaker in the moving image.

To achieve the above object, the present invention provides a moving image playback device comprising:
voice recognition means for performing voice recognition on a moving image file;
text file generation means for generating a text file from the output of the voice recognition means;
keyword detection means for detecting, in the text file, a keyword stored in advance in a storage device;
moving image ranking determination means for determining a priority order of the moving images to be played back with reference to a keyword weight attached to the keyword;
moving image playback means for playing back a moving image that a user selects with reference to the moving image ranking; and
weight update means for updating the keyword weight based on the moving image selected by the user and the manner in which the keyword occurred.

The present invention also provides a moving image playback method comprising:
a voice recognition step of performing voice recognition on a moving image file;
a text file generation step of generating a text file from the output of the voice recognition step;
a keyword detection step of detecting, in the text file, a keyword stored in advance in a storage device;
a moving image ranking determination step of determining a priority order of the moving images to be played back with reference to a keyword weight attached to the keyword;
a moving image playback step of playing back a moving image that a user selects with reference to the moving image ranking; and
a weight update step of updating the keyword weight based on the moving image selected by the user and the manner in which the keyword occurred.

The present invention further provides a program for a moving image playback device that includes a computer and plays back moving images, the program causing the computer to execute:
a voice recognition process of performing voice recognition on a moving image file;
a text file generation process of generating a text file from the output of the voice recognition process;
a keyword detection process of detecting, in the text file, a keyword stored in advance in a storage device;
a moving image ranking determination process of determining a priority order of the moving images to be played back with reference to a keyword weight attached to the keyword;
a moving image playback process of playing back a moving image that a user selects with reference to the moving image ranking; and
a weight update process of updating the keyword weight based on the moving image selected by the user and the manner in which the keyword occurred.

With the moving image playback device, moving image playback method, and program of the present invention, a desired moving image file, or a scene within a moving image file, can be retrieved from a large number of moving image files based on the utterance content of a speaker in the moving image.

FIG. 1 is a block diagram showing the minimum configuration of the moving image playback device of the present invention.
FIG. 2 is an overall view showing a moving image playback device according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the moving image playback device shown in FIG. 2.
FIG. 4 is a sequence diagram showing the operation of the moving image playback device shown in FIG. 2.
FIG. 5 is a flowchart showing the procedure for determining the moving image ranking and the scene ranking.
FIG. 6 is a diagram illustrating keyword weights.
FIG. 7(a) and FIG. 7(b) are diagrams illustrating keyword information.
FIG. 8 is a diagram illustrating a search result list that includes the moving image ranking and the scene ranking.
FIG. 9 is a diagram showing the search result list screen in a Web browser.
FIG. 10 is a diagram illustrating updated keyword weights.
FIG. 11 is a diagram illustrating the keyword weights of other keywords.

FIG. 1 is a block diagram showing the minimum configuration of the moving image playback device of the present invention. In its minimum configuration, the moving image playback device 1 includes voice recognition means 2, text file generation means 3, a storage device 4, keyword detection means 5, moving image ranking determination means 6, moving image playback means 7, and weight update means 8. The voice recognition means 2 performs voice recognition on a moving image file. The text file generation means 3 generates a text file from the output of the voice recognition means 2. The keyword detection means 5 detects, in the text file, a keyword stored in advance in the storage device 4. The moving image ranking determination means 6 determines the priority order of the moving images to be played back with reference to the keyword weight attached to the keyword. The moving image playback means 7 plays back the moving image that the user selects with reference to the moving image ranking. The weight update means 8 updates the keyword weight based on the moving image selected by the user and the manner in which the keyword occurred.

The moving image playback device 1 detects the keywords contained in the text file generated by voice recognition of the moving image file, and acquires the keyword weights stored in advance in the storage device 4. It then determines the priority order of the moving images to be played back with reference to the acquired keyword weights. The moving image that the user selects with reference to this ranking is played back. Next, the keyword weight is updated based on the moving image selected by the user and the manner in which the keyword occurred in that moving image. The updated keyword weight is used the next time the priority order of the moving images to be played back is determined.

In other words, when the user enters a keyword related to the target moving image file, a provisional priority order is determined for the moving images containing that keyword. When the user then selects a moving image with reference to this ranking, the keyword weight, which is one of the parameters that determine the ranking, is updated. From the next search onward, the ranking is determined using keyword weights that change dynamically as the user's selections are fed back, so the more the device is used, the more accurately the target moving image is retrieved from a large number of moving image files. Furthermore, if the user enters, as keywords related to the target moving image file, a plurality of keywords reflecting the utterance content of a speaker in the moving image, a scene (section) within the target moving image file can be retrieved.

The moving image playback method and program of the present invention have configurations corresponding to the minimum configuration of the moving image playback device 1 and, in the same way, can search a large number of moving image files for a moving image file, or for a scene within a moving image file, based on the utterance content of a speaker in the moving image.

Exemplary embodiments of the present invention are described in detail below with reference to FIGS. 2 to 11. FIG. 2 is an overall view showing a moving image playback device according to an embodiment of the present invention. The moving image playback device 10 is, for example, a device for searching for and playing back a moving image that the user 12 of the user terminal 11 is looking for or that suits the user's purpose (hereinafter, the target moving image), and a scene within the target moving image (hereinafter, a section or scene).

The moving image playback device 10 includes a Web server 20, a moving image search server 30, and a file server 40, which are connected to one another by a local area network (LAN) 50. A plurality of user terminals 11 and 13 are connected to the moving image playback device 10. The user terminals 11 and 13 connect to the Web server 20 via the Internet 60 and can access various websites. In the following, the website is assumed to be a video sharing website such as YouTube (registered trademark), the user 12 of the user terminal 11 is a user who searches for a target moving image, and the user 14 of the user terminal 13 is a poster who posts moving images to the video sharing website.

FIG. 3 is a block diagram showing the configuration of the moving image playback device 10. The user terminal 13 of the user 14, who is a poster, has moving image posting means 15. The user terminal 11 of the user 12, who is a searching user, has keyword setting means 16 and moving image selection means 17. Each of the means 15, 16, and 17 can access the Web server 20. The moving image posting means 15 accesses the video sharing site on the Web and posts a moving image file in accordance with the operation of the user 14. The keyword setting means 16 is used to enter keywords that the user 12 considers related to the target moving image. The moving image selection means 17 is used by the user 12 to select, from the moving image file search result list displayed on the website screen, the moving images that match (or do not match) the user's purpose.

The Web server 20 includes moving image posting screen display means 21, keyword input screen display means 22, and moving image search result display means 23. The moving image search server 30 includes voice recognition means 31, moving image file search means 32, a keyword information temporary search result database 33, a moving image file temporary search result output database 34, and keyword weight change means 35. The file server 40 includes a moving image file database 41, a language model database 42, a recognized text database 43, and a keyword information database 44.

The moving image posting screen display means 21 stores the moving image file posted by the moving image posting means 15, together with the metadata added to it, in the moving image file database 41. The metadata is additional data about the moving image file, such as title information, category information, and explanatory text. A moving image file is considered uploaded once it has been stored in the moving image file database 41.

The voice recognition means 31 acquires an uploaded moving image file from the moving image file database 41 and performs voice recognition with reference to the language model (dictionary data) in the language model database 42. Through voice recognition, it converts the speech into text and generates a recognized text (text file). In doing so, the voice recognition means 31 does more than transcribe: it records each word in the recognized text together with that word's utterance start time, measured from the beginning of the moving image file during playback.
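The association between recognized words and their utterance start times can be pictured with a small data structure. The following is only a minimal sketch: the `TimedWord` type and the helper names are hypothetical and do not come from the patent, and the speech recognizer itself is assumed to have already produced `(word, start time)` pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    """One recognized word and its utterance start time (hypothetical type)."""
    text: str
    start_sec: float  # seconds from the beginning of the moving image file


def build_recognized_text(pairs):
    """Turn (word, start_sec) pairs into a time-ordered recognized text."""
    return sorted((TimedWord(t, s) for t, s in pairs), key=lambda w: w.start_sec)


def utterance_times(recognized, keyword):
    """Return every start time at which the keyword is uttered."""
    return [w.start_sec for w in recognized if w.text == keyword]


recognized = build_recognized_text([("goal", 12.5), ("soccer", 3.0), ("goal", 48.0)])
print(utterance_times(recognized, "goal"))
```

Keeping the start time next to each word is what later allows a keyword hit to be mapped back to a playback position.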

If the metadata added to a moving image file contains a word that is not registered in the language model of the language model database 42, the voice recognition means 31 additionally registers that word in the language model. In this way, the set of words registered in the language model grows automatically. Such words include words contained in the title or explanatory text that the poster added to the moving image file, keywords specified by users, and the like. The voice recognition means 31 also stores the recognized text in the recognized text database 43, where it is indexed. An index here means a lookup structure that records which data is stored where in the database, in order to speed up data retrieval.
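The two bookkeeping steps in this paragraph, registering unknown metadata words and indexing the recognized text, might look roughly as follows. This is a sketch under simplifying assumptions: the language model is reduced to a plain set of words, the index is a simple inverted index from word to file names, and the function names are hypothetical.

```python
from collections import defaultdict


def register_new_words(vocabulary, metadata_words):
    """Add metadata words missing from the vocabulary; return the words added."""
    # dict.fromkeys removes duplicates while preserving the original order.
    added = [w for w in dict.fromkeys(metadata_words) if w not in vocabulary]
    vocabulary.update(added)
    return added


def build_index(recognized_texts):
    """Map each word to the set of moving image files whose text contains it."""
    index = defaultdict(set)
    for file_name, words in recognized_texts.items():
        for word in words:
            index[word].add(file_name)
    return dict(index)


vocabulary = {"soccer"}
print(register_new_words(vocabulary, ["soccer", "derby", "derby"]))
index = build_index({"a.mp4": ["goal", "soccer"], "b.mp4": ["goal"]})
print(sorted(index["goal"]))
```

With such an index, finding the files that contain a keyword becomes a single lookup instead of a scan over every recognized text.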

Next, the case where the user 12 accesses the video sharing site using the user terminal 11 is described. When the user 12 accesses the video sharing site from the user terminal 11, the keyword input screen display means 22 in the Web server 20 displays a keyword input screen on the user terminal 11. When a keyword related to a scene of the target moving image (the moving image to be found) is then entered on the keyword input screen using the keyword setting means 16, the keyword input screen display means 22 passes the entered keyword to the moving image file search means 32.

The moving image file search means 32 searches the keyword information database 44 for the input keyword. The keyword information database 44 stores the weighting information attached to each keyword (the keyword weight), and the moving image file search means 32 acquires the keyword weight of the input keyword from it.

The moving image file search means 32 also searches the recognized text database 43 for the input keyword and acquires, from the moving image file database 41, the moving image files whose recognized text contains that keyword. From the recognized text of the acquired files and the keyword weights, the moving image file search means 32 calculates, for example, the ranking of the moving images and of the scenes to be played back, and stores a moving image file search result containing these rankings in the moving image file temporary search result database 34. At the same time, the moving image file search means 32 obtains, for example from the moving image file search result, the elapsed time (in seconds) from the beginning of the moving image file at which the input keyword is uttered, and stores it as keyword information in the keyword information temporary search result database 33. This elapsed time can be obtained because the voice recognition means 31 records each word in the recognized text together with its utterance start time.
The moving image file search means 32 associates the content of the recognized text with the elapsed time from the beginning of playback of the moving image file, and also functions as elapsed time calculation means that divides the recognized text into a plurality of sections (scenes) according to the playback time of the moving image file. The keyword information includes, for example, a scene ID, the keyword name, the name of the moving image file containing the keyword, the utterance start time, and the keyword weight.
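A keyword information record with the fields listed above can be sketched as follows. The text does not state how sections (scenes) are delimited, so the fixed 60-second scene length used here is purely an assumption for illustration, as are the type and function names.

```python
from dataclasses import dataclass


@dataclass
class KeywordInfo:
    """Fields named in the text; the class itself is hypothetical."""
    scene_id: int
    keyword: str
    file_name: str
    start_sec: float
    weight: float


def scene_id_for(start_sec, scene_len_sec=60.0):
    """Assumed rule: scenes are consecutive fixed-length sections."""
    return int(start_sec // scene_len_sec)


def collect_keyword_info(file_name, timed_words, keyword, weight, scene_len_sec=60.0):
    """Build one record per utterance of the keyword in a moving image file."""
    return [
        KeywordInfo(scene_id_for(sec, scene_len_sec), keyword, file_name, sec, weight)
        for text, sec in timed_words
        if text == keyword
    ]


words = [("goal", 12.5), ("pass", 30.0), ("goal", 125.0)]
for info in collect_keyword_info("a.mp4", words, "goal", 1.0):
    print(info)
```

Any other segmentation rule (for example, scene boundaries taken from pauses in speech) could be substituted for `scene_id_for` without changing the record layout.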

The moving image search result display means 23 acquires the moving image file search result from the moving image file temporary search result database 34 and the keyword information from the keyword information temporary search result database 33. It then displays the information acquired from these databases 33 and 34 as a moving image file search result list on the website screen shown on the user terminal 11.

In accordance with the operation of the user 12, the moving image selection means 17 of the user terminal 11 selects, from the moving image file search result list displayed on the website screen, the moving images that match (or do not match) the user's purpose. The moving image search result display means 23 sends the selection result from the moving image selection means 17 to the keyword weight change means 35.

The keyword weight change means 35 acquires, from the keyword information temporary search result database 33, the keyword information for the keywords contained in the moving image files that the moving image selection means 17 marked as matching (or not matching) the user's purpose. It updates the keyword weights of those keywords based on the selection result and stores the updated keyword weights in the keyword information database 44. The updated keyword weights are used when the user 12 searches again from the user terminal 11. That is, when a keyword related to a scene of the target moving image is entered again through the keyword setting means 16, the moving image file search means 32 refers to the updated keyword weights when determining the ranking of the moving images and scenes to be played back.
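The feedback step can be illustrated with a small update function. The passage does not give the update rule, so the additive step of 0.1, the default weight of 1.0, and the function name below are assumptions chosen only to show the direction of the adjustment.

```python
def update_weights(weights, keywords_in_video, matched, step=0.1, default=1.0):
    """Return a new weight table after one user selection.

    Assumed rule (not from the source): keywords of a moving image marked as
    matching gain `step`; keywords of one marked as not matching lose `step`.
    """
    updated = dict(weights)  # leave the caller's table untouched
    delta = step if matched else -step
    for keyword in set(keywords_in_video):
        # round() keeps repeated float additions from accumulating noise
        updated[keyword] = round(updated.get(keyword, default) + delta, 6)
    return updated


weights = {"goal": 1.0}
weights = update_weights(weights, ["goal", "derby"], matched=True)
weights = update_weights(weights, ["goal"], matched=False)
print(weights)
```

Storing the returned table back into the keyword information database is what makes the next search reflect the user's earlier choices.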

The operation of the moving image playback device 10 is described concretely below with reference to FIGS. 3 to 10. FIG. 4 is a sequence diagram showing the operation of the moving image playback device 10. In accordance with the operation of the user 14, who is a poster, the moving image posting means 15 accesses the video sharing site on the Web and posts a moving image file to the Web server 20 (step S11). The moving image posting screen display means 21 of the Web server 20 stores the posted moving image file (moving image data) and the metadata added to it in the moving image file database 41 of the file server 40 (step S12).

Next, when a moving image file is stored in the moving image file database 41, the voice recognition means 31 of the moving image search server 30 automatically acquires it from the moving image file database 41 (step S13). The voice recognition means 31 then acquires a language model from the language model database 42 (step S14), performs voice recognition, converts the speech in the moving image file into text, and creates a recognized text (step S15). In step S15, each word in the recognized text is recorded together with the number of seconds from the beginning of the file at which that word is uttered.

The voice recognition means 31 then additionally registers, in the language model database 42, any words that are not yet registered in the language model (step S16). Next, it stores the recognized text in the recognized text database 43 of the file server 40 (step S17). The stored recognized text is automatically indexed within the recognized text database 43 on the file server 40 (step S18).

The case where the user 12 searches for a target moving image after steps S11 to S18 have been performed is described next. First, when the user terminal 11 accesses the video sharing site in accordance with the operation of the user 12 (step S19), the keyword input screen display means 22 of the Web server 20 displays a search keyword input screen on the user terminal 11 (step S20). When a keyword considered related to the target moving image is entered on the search keyword input screen through the keyword setting means 16 (step S21), the keyword input screen display means 22 sends the entered keyword to the moving image file search means 32 of the moving image search server 30 (step S22).

Next, the moving image file search means 32 accesses, for example, the moving image file database 41 of the file server 40 and searches for moving image files whose metadata contains the keyword entered in step S21 (step S23). It then accesses the recognized text database 43 and searches for moving image files whose recognized text contains the entered keyword (step S24).

Subsequently, based on the search results of steps S23 and S24, the moving image file search means 32 acquires from the recognized text database 43 a list of the names of the moving image files that contain the entered keywords (the search result moving image list) (step S25). Next, the moving image file search means 32 acquires from the moving image file database 41 the metadata (title, category information, and so on) of the moving image files included in the search result moving image list acquired in step S25 (step S26). The moving image file search means 32 then calculates the display order of the moving image files included in the search result moving image list (step S30). The display order in step S30 covers both the priority (rank) of the moving images to be played back and the priority (rank) of the scenes included in each moving image. The processing of step S30 by the moving image file search means 32 is described below with reference to FIG. 5.

First, for each moving image file included in the search result moving image list, the moving image file search means 32 acquires from the recognized text database 43 the names of the keywords contained in the recognized text and the number of times each occurs (step S31). Next, it acquires from the keyword information database 44 the keyword weights of the keywords contained in the recognized text (step S32).

Keywords and keyword weights are now described. Keywords are classified into plus keywords and minus keywords. The classification depends on whether the user 12, using the moving image selection means 17 of the user terminal 11 and referring, for example, to the moving image rank, has selected a moving image file included in the moving image file search result list 70 shown in FIG. 9. A moving image file is selected by checking the "target" area 71 shown in FIG. 9, and is marked as not selected by checking the "non-target" area 72. For example, even if a moving image file has a high moving image rank, the keywords contained in it are defined as minus keywords if the user 12 does not select it because it does not match the purpose (it does not contain the target scene). Conversely, even if a moving image file has a low moving image rank, the keywords contained in it are defined as plus keywords if the user selects it as matching the purpose (it contains the target scene).

The keyword weight of a plus keyword is the ratio of the number of occurrences of that plus keyword in all moving image files selected by the user 12 to the number of all moving image files selected by the user 12. In other words, it is the average number of occurrences of the plus keyword per moving image selected by the user 12.

The keyword weight of a minus keyword is the ratio of the number of occurrences of that minus keyword in all moving image files not selected by the user 12 to the number of all moving image files not selected by the user 12. In other words, it is the average number of occurrences of the minus keyword per moving image not selected by the user 12.

FIG. 6 shows specific examples of the plus keywords, the minus keywords, and their keyword weights stored in the keyword information database 44. Assume that the user 12 wants to hear Prime Minister A's views on the American economy at a press conference and uses the keyword setting means 16 to enter "A", "Prime Minister", "America", and "Economy" as search keywords. When moving images containing these keywords are displayed in the moving image file search result list 70 shown in FIG. 9, the user 12 marks each moving image as matching or not matching the purpose. For example, moving images of press conferences and the like are selected by the user 12 as matching the purpose, so all keywords contained in them are treated as plus keywords 44a, as shown in FIG. 6. In contrast, moving images of news programs and the like consist mainly of remarks by, for example, a newscaster rather than Prime Minister A's press conference, so the user 12 marks them as not matching the purpose. All keywords contained in such moving images are therefore treated as minus keywords 44b.

The keyword weights of the plus keywords "A", "Prime Minister", "America", and "Economy" are "0.30", "0.42", "3.17", and "2.50", respectively. The weights of "A" and "Prime Minister" are smaller than those of "America" and "Economy" because, in moving images of press conferences and the like, Prime Minister A is the one speaking, so plus keywords such as "A" and "Prime Minister" occur less often on average.

On the other hand, the keyword weights of the minus keywords "A", "Prime Minister", "America", and "Economy" are "2.50", "3.26", "0.60", and "0.24", respectively. Here the weights of "A" and "Prime Minister" are larger than those of "America" and "Economy". This is because, in moving images of news programs and the like, the newscaster frequently utters minus keywords such as "A" and "Prime Minister", so these minus keywords occur more often on average.

Consider the keyword "A". The keyword weight of the plus keyword "A" is "0.30" when, for example, the user 12 has selected 10 moving images and "A" occurs 3 times in those 10 moving images. Likewise, the keyword weight of the minus keyword "A" is "2.50" when, for example, 12 moving images were not selected by the user 12 and "A" occurs 30 times in those 12 moving images. The numbers of moving images and occurrences used to calculate these keyword weights are held, for example, in the keyword information database 44.
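The ratio described above can be sketched as follows. The function name and the handling of a zero video count are assumptions for this sketch; the two calls use the figures given for the keyword "A".

```python
def keyword_weight(occurrences, num_videos):
    """Average number of occurrences of a keyword per video."""
    if num_videos == 0:
        # Assumption: with no videos counted yet, treat the weight as 0.
        return 0.0
    return occurrences / num_videos


plus_weight_a = keyword_weight(3, 10)    # 3 occurrences in 10 selected videos
minus_weight_a = keyword_weight(30, 12)  # 30 occurrences in 12 non-selected videos
```

The same function serves both plus and minus keywords; only the set of videos being counted (selected or not selected) differs.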

Returning to FIG. 5, the moving image file search means 32 calculates, from the keyword weights acquired in step S32, the difference between the keyword weight of the plus keyword and the keyword weight of the minus keyword (the keyword weight difference) (step S33). From the keyword weights shown in FIG. 6, the weight difference is "-2.20" for the keyword "A", "-2.84" for the keyword "Prime Minister", "2.57" for the keyword "America", and "2.26" for the keyword "Economy".
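The subtraction in step S33 can be checked with a short sketch. The dictionary layout and the function name are assumptions; the weight values are those given for FIG. 6, and the results are rounded to two decimals to match the figures in the text.

```python
PLUS_WEIGHTS = {"A": 0.30, "Prime Minister": 0.42, "America": 3.17, "Economy": 2.50}
MINUS_WEIGHTS = {"A": 2.50, "Prime Minister": 3.26, "America": 0.60, "Economy": 0.24}


def weight_differences(plus, minus):
    """Plus-keyword weight minus minus-keyword weight, per keyword."""
    return {keyword: round(plus[keyword] - minus[keyword], 2) for keyword in plus}


diffs = weight_differences(PLUS_WEIGHTS, MINUS_WEIGHTS)
```

A negative difference marks a keyword that is more typical of non-selected videos, and a positive one marks a keyword more typical of selected videos.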

Next, the moving image file search means 32 calculates the time intervals between keywords in the moving image file (step S34). These intervals can be calculated because the moving image file search means 32, acting as the elapsed time calculation means, associates each keyword in the recognized text with the elapsed time from the beginning of playback of the moving image file, as described above, and divides the recognized text into a plurality of sections (scenes) according to the playback time of the moving image file. Specifically, as shown in FIG. 7(a), the moving image file search means 32 generates keyword information containing a scene ID 33a, a keyword name 33b, the name 33c of the moving image file containing the keyword, and an utterance start time 33d, and stores this keyword information in the keyword information temporary search result database 33. The keyword information also includes the keyword weights that the moving image file search means 32 acquired from the keyword information database 44. The following description focuses on the moving image whose file name is "Movie 1".

When the keyword information of "Movie 1" is laid out in time series as shown in FIG. 7(b), then between the start and the end of playback the time interval from the keyword "Prime Minister" to the keyword "America" is 45 seconds, the interval from the keyword "America" to the keyword "Economy" is 2 seconds, and the interval from the keyword "Economy" to the keyword "America" is 3 seconds.
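A minimal sketch of step S34, assuming each keyword utterance is stored as a (keyword, start time in seconds) pair. The absolute start times below are hypothetical, chosen only so that the gaps reproduce the 45, 2, and 3 second intervals stated for "Movie 1"; the text does not give the actual start times of FIG. 7.

```python
def keyword_intervals(utterances):
    """Return (previous keyword, next keyword, gap in seconds) for consecutive utterances."""
    ordered = sorted(utterances, key=lambda item: item[1])
    return [
        (prev[0], nxt[0], nxt[1] - prev[1])
        for prev, nxt in zip(ordered, ordered[1:])
    ]


# Hypothetical utterance start times (seconds from the start of playback).
movie1 = [("Prime Minister", 10), ("America", 55), ("Economy", 57), ("America", 60)]
intervals = keyword_intervals(movie1)
```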

Subsequently, for "Movie 1" for example, the moving image file search means 32 determines the moving image rank based on three elements: the "number of keywords contained in the recognized text" acquired in step S31, the "keyword weight difference" obtained in step S33, and the "time interval between keywords" obtained in step S34 (step S35).

An example of the calculation formula used in step S35 is the following Formula (1).
{(weight difference of keyword "A") × (number of occurrences of keyword "A") + (weight difference of keyword "Prime Minister") × (number of occurrences of keyword "Prime Minister") + ...} + {(number of keywords whose time interval between keywords is within 30 seconds) / (average number of seconds between keywords that are within 30 seconds)}
Formula (1)

Referring to FIG. 7(b), in "Movie 1" the keyword "A" occurs 0 times, the keyword "Prime Minister" occurs once, the keyword "America" occurs twice, and the keyword "Economy" occurs once. Also from FIG. 7(b), the number of keywords whose time interval between keywords is within 30 seconds is 3, and the average number of seconds between these three keywords is (2 + 3) / 2 = 2.5 seconds.

Substituting these values into Formula (1) gives
{(-2.20 × 0) + (-2.84 × 1) + (2.57 × 2) + (2.26 × 1)} + (3 / 2.5) = 5.76
so that the value "5.76" used to determine the moving image rank is obtained. By performing the same calculation for the other moving image files, the moving image rank of each file can be determined. In other words, the moving image file search means 32 ranks a moving image higher as the keyword weight differences are larger, as the keywords occur more often, and as the keywords occur more often per unit time.
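The calculation of Formula (1) for "Movie 1" can be reproduced with the sketch below. The function name, the 30-second default, and the way the "number of keywords within 30 seconds" is counted (every keyword that touches at least one interval of 30 seconds or less) are assumptions made to match the worked example; they are not stated in this form in the description.

```python
def video_score(weight_diffs, keyword_counts, intervals, window=30):
    """Score a video following Formula (1): weighted keyword counts plus a proximity term."""
    weighted = sum(
        weight_diffs[keyword] * keyword_counts.get(keyword, 0)
        for keyword in weight_diffs
    )
    # Interval i lies between keyword i and keyword i + 1.
    close = [i for i, gap in enumerate(intervals) if gap <= window]
    if not close:
        return weighted  # Assumption: no proximity bonus without close keywords.
    close_keywords = set(close) | {i + 1 for i in close}
    average_gap = sum(intervals[i] for i in close) / len(close)
    if average_gap == 0:
        return weighted  # Assumption: avoid division by zero for simultaneous keywords.
    return weighted + len(close_keywords) / average_gap


diffs = {"A": -2.20, "Prime Minister": -2.84, "America": 2.57, "Economy": 2.26}
counts = {"A": 0, "Prime Minister": 1, "America": 2, "Economy": 1}
score = round(video_score(diffs, counts, [45, 2, 3]), 2)
```

Videos would then be sorted by this score in descending order to obtain the moving image rank.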

Next, based on the "time interval between keywords" and the "keyword weight of the plus keyword", the moving image file search means 32 determines the rank of each section (scene) delimited by the keywords in the moving image file (step S36). The case of ranking the four scenes with scene IDs 001 to 004 in "Movie 1" is described with reference to FIG. 7(b).

An example of the calculation formula used in step S36 is the following Formula (2).
{(keyword weight of the plus keyword) / (total number of seconds between adjacent keywords)}
Formula (2)

However, for a keyword that has only one adjacent keyword, the total is twice the number of seconds between that keyword and its neighbor. Referring to FIG. 7(b), the total number of seconds between adjacent keywords is therefore "45 × 2 = 90 seconds" for scene ID 001, "45 + 2 = 47 seconds" for scene ID 002, "2 + 3 = 5 seconds" for scene ID 003, and "3 × 2 = 6 seconds" for scene ID 004.

Substituting these totals and the plus keyword weights shown in FIG. 6 into Formula (2) gives "0.42 / 90 = 0.0047" for scene ID 001, "3.17 / 47 = 0.067" for scene ID 002, "2.50 / 5 = 0.50" for scene ID 003, and "3.17 / 6 = 0.53" for scene ID 004. The larger the value, the higher the scene rank. The scene ranking within "Movie 1" is therefore scene ID 004 > scene ID 003 > scene ID 002 > scene ID 001. In other words, the moving image file search means 32 judges a scene to be more important, and ranks it higher, as the keyword weight of its plus keyword is larger and as the interval between keyword occurrences is shorter.
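The scene ranking of Formula (2) can be sketched as follows. The function name is hypothetical, and the sketch assumes a video with at least two keyword utterances and non-zero intervals; the inputs are the values given for "Movie 1".

```python
def scene_scores(plus_weights, scene_keywords, intervals):
    """Score each scene following Formula (2).

    scene_keywords[i] is the keyword that opens scene i; intervals[i] is the
    gap in seconds between keyword i and keyword i + 1.
    """
    scores = []
    last = len(scene_keywords) - 1
    for i, keyword in enumerate(scene_keywords):
        if i == 0:
            total = intervals[0] * 2      # only a following neighbor: double the gap
        elif i == last:
            total = intervals[-1] * 2     # only a preceding neighbor: double the gap
        else:
            total = intervals[i - 1] + intervals[i]
        scores.append(plus_weights[keyword] / total)
    return scores


plus_weights = {"A": 0.30, "Prime Minister": 0.42, "America": 3.17, "Economy": 2.50}
scene_ids = ["001", "002", "003", "004"]
keywords = ["Prime Minister", "America", "Economy", "America"]
scores = scene_scores(plus_weights, keywords, [45, 2, 3])
ranking = [sid for _, sid in sorted(zip(scores, scene_ids), reverse=True)]
```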

Subsequently, the moving image file search means 32 stores the moving image rank determined in step S35 and the scene ranks within the moving image file determined in step S36, together with information such as the moving image file name, in the moving image file temporary search result database 34 (step S37). In step S37, as shown in FIG. 8, the moving image file temporary search result database 34 stores, as a search result list, the moving image rank 34a, the scene rank 34b, the moving image file name 34c, the scene ID 34d, and also the title, category, moving image size, and so on.

Returning to FIG. 4, the moving image file search means 32 transmits the keyword information stored in the keyword information temporary search result database 33 with the contents shown in FIG. 7(a), and the search result list stored in the moving image file temporary search result database 34 shown in FIG. 8, to the moving image search result display means 23 of the Web server 20 (step S40).

Next, based on the keyword information and the search result list acquired in step S40, the moving image search result display means 23 displays the search result list screen (moving image file search result list) 70 shown in FIG. 9 on the Web screen (step S41). As illustrated, the moving image file search result list 70 shows the moving image rank, the scene rank, and the title, category, file name, and search words (entered keywords) of each moving image. It also shows the "target" area 71 and the "non-target" area 72, with which the user 12 marks a moving image as matching or not matching the purpose, as well as a moving image playback screen 73, a video time axis 74, and scene positions 75 within the moving image. Clicking a scene position 75 cues playback to the beginning of that scene.

Subsequently, when the "target" area 71 or the "non-target" area 72 is checked by an operation of the user 12 in the moving image file search result list 70 displayed in step S30, the moving image selection means 17 of the user terminal 11 selects the moving images that match or do not match the purpose (step S42) and transmits the selection result to the moving image search result display means 23. The moving image search result display means 23 transmits the selection result to the keyword weight changing means 35 of the moving image search server 30 (step S43).

Based on the selection result, the keyword weight changing means 35 acquires from the keyword information temporary search result database 33 the keywords contained in the moving image files selected as matching the purpose (that is, the plus keywords), the keywords contained in the moving image files selected as not matching the purpose (that is, the minus keywords), and their keyword weights (step S44).

Next, the keyword weight changing means 35 changes (updates) the acquired keyword weights of the plus keywords and the minus keywords (step S45). The process of updating the keyword weights in step S45 is described below with reference to FIG. 10, which shows the keyword weights of the plus keywords and the minus keywords after the update. The keyword "A" is taken as an example.

The keyword weight changing means 35 accesses the keyword information temporary search result database 33. It acquires the pre-update keyword weight "0.3" of the plus keyword "A" held in the keyword information database 44 in the file server 40, together with the values used to calculate it: the number of moving images selected by the user 12, "10", and the number of occurrences of "A" in those 10 moving images, "3". As an example, suppose that the selection result of step S42 is that 5 moving images are newly selected as matching the purpose and that "A" occurs once in those 5 moving images. In this case, the keyword weight changing means 35 sets the new keyword weight of the plus keyword "A" to
{(3 + 1) / (10 + 5)} ≈ 0.27.

That is, as shown in FIG. 10, the keyword weight changing means 35 updates the keyword weight 44c of the plus keyword "A" from "0.3" shown in FIG. 6 to "0.27" according to the selection result of step S42.

The keyword weight changing means 35 also accesses the keyword information temporary search result database 33 and acquires the pre-update keyword weight "2.50" of the minus keyword "A" held in the keyword information database 44, together with the values used to calculate it: the number of moving images that the user 12 marked as not matching the purpose, "12", and the number of occurrences of "A" in those 12 moving images, "30". As an example, suppose that the selection result of step S42 is that 3 moving images are newly marked as not matching the purpose and that "A" occurs 8 times in those 3 moving images. In this case, the keyword weight changing means 35 sets the new keyword weight of the minus keyword "A" to
{(30 + 8) / (12 + 3)} ≈ 2.53.

That is, as shown in FIG. 10, the keyword weight changing means 35 updates the keyword weight 44d of the minus keyword "A" from "2.5" shown in FIG. 6 to "2.53" according to the selection result of step S42. Applying the same calculation to the other keywords yields the updated keyword weights of the plus keywords and the minus keywords illustrated in FIG. 10.
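The update in step S45 amounts to adding the new counts to the stored counts and recomputing the ratio. The sketch below illustrates this with a hypothetical record type; the class and field names are assumptions, and the numbers are those of the example for the keyword "A".

```python
from dataclasses import dataclass


@dataclass
class WeightRecord:
    """Running totals behind one keyword weight (hypothetical layout)."""
    occurrences: int
    num_videos: int

    @property
    def weight(self):
        # Assumption: a record with no videos has weight 0.
        return self.occurrences / self.num_videos if self.num_videos else 0.0

    def update(self, new_occurrences, new_videos):
        """Fold a new selection result into the totals and return the new weight."""
        self.occurrences += new_occurrences
        self.num_videos += new_videos
        return self.weight


plus_a = WeightRecord(occurrences=3, num_videos=10)
minus_a = WeightRecord(occurrences=30, num_videos=12)
new_plus = round(plus_a.update(1, 5), 2)    # (3 + 1) / (10 + 5)
new_minus = round(minus_a.update(8, 3), 2)  # (30 + 8) / (12 + 3)
```

Keeping the raw counts rather than only the weight is what makes this incremental update possible, which is consistent with the counts being held in the keyword information database 44.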

Next, the keyword weight changing means 35 assigns the updated keyword weights to the keywords already stored in the keyword information database 44 (step S46). As shown in FIG. 10, the keyword information database 44 stores keyword weights for each set of keywords that were searched together (for example, the set consisting of "A", "Prime Minister", "America", and "Economy"). In step S46, keywords not yet stored in the keyword information database 44 are newly registered there together with their calculated keyword weights.

Until the target moving image is found, the user 12 enters keywords using the keyword setting means 16 of the user terminal 11 and selects target moving images from the moving image file search result list 70 using the moving image selection means 17. The keyword weight changing means 35 updates the keyword weights according to the moving images selected by the user 12, and the moving image file search means 32 refers to the updated keyword weights when determining the moving image rank and the scene rank. In other words, in the moving image playback device 10, the updated keyword weights are applied when the user 12 searches again with the same search keywords, so moving images and scenes that match the purpose are displayed at higher ranks, and moving image files that better match the purpose can be retrieved.

In the present embodiment, each keyword has a keyword weight that is dynamically updated to reflect the user's operations, so the more the system is used, the more reliably moving images containing the target scene appear at the top of the search results. Focusing on the time intervals between keywords makes it possible to rank the scenes within a moving image file. Furthermore, because each keyword is associated with its playback position (elapsed time) from the beginning of the moving image file, playback can be cued instantly to the target scene in which the keyword is uttered. Combining a search of the utterance content obtained by speech recognition of the moving image file with a search of the metadata of the moving image file enables a highly accurate search. The present embodiment can therefore efficiently retrieve, from a large number of moving image files on the Web or elsewhere, scenes in which a speaker in the moving image actually utters a keyword. In addition, accumulating (learning) the entered keywords makes it possible to narrow down searches efficiently.

The above embodiment illustrated the case in which the user 12 wants to hear Prime Minister A's views on the American economy at a press conference as the moving image containing the target scene, but the invention is not limited to this. As another example, as shown in FIG. 11, suppose that the user 12 wants to hear soccer player B's impressions of a goal in an interview and enters "Soccer", "B", "Interview", and "Goal" as search keywords.

When moving images containing these search keywords are displayed in the moving image file search result list 70 shown in FIG. 9, the user 12 marks each moving image as matching or not matching the purpose. For example, interview videos are selected by the user 12 as matching the purpose, so all keywords contained in interview videos are treated as plus keywords 44e. The keyword weights of the plus keywords "Soccer", "B", "Interview", and "Goal" are "0.24", "0.81", "1.05", and "2.76", respectively. In interview videos, plus keywords such as "Soccer" and "B" occur less often on average than the other plus keywords, so their keyword weights are smaller.

On the other hand, news videos and videos of soccer play are not interviews with player B but consist mainly of remarks by, for example, a newscaster, so the user 12 marks them as not matching the purpose. All keywords contained in news videos and videos of soccer play are therefore treated as minus keywords 44f. The keyword weights of the minus keywords "Soccer", "B", "Interview", and "Goal" are "2.20", "2.89", "1.55", and "1.10", respectively. In news videos and videos of soccer play, minus keywords such as "Soccer" and "B" occur more often on average than the other minus keywords, so their keyword weights are larger. The plus keywords 44e and the minus keywords 44f are stored in the keyword information database 44. Even in this case, the configuration of the above embodiment makes it possible to retrieve with high accuracy the scenes in which a speaker in the moving image actually utters the keywords.

The above embodiment described an example of searching for a moving image containing the target scene based on the audio data of moving image files uploaded to the Web, but the invention is not limited to this. As another example, a target scene can also be retrieved from moving image files stored on a hard disk recorder or the like.

The present invention has been described above based on its preferred embodiment. However, the moving image playback device, moving image playback method, and program of the present invention are not limited to the configuration of the above embodiment, and configurations obtained by making various modifications and changes to the above embodiment are also included in the scope of the present invention.

Description of Symbols
1, 10: Moving image playback device
2, 31: Speech recognition means
3: Text file generation means
4: Storage device
5: Keyword detection means
6: Moving image rank determination means
7: Moving image playback means
8: Weight updating means
11, 13: User terminal
12: User (viewer)
14: User (video contributor)
15: Moving image posting means
16: Keyword setting means
17: Moving image selection means
20: Web server
21: Moving image posting screen display means
22: Keyword input screen display means
23: Moving image search result display means
30: Moving image search server
32: Moving image file search means
33: Keyword information temporary search result database
34: Moving image file temporary search result database
35: Keyword weight changing means
40: File server
41: Moving image file database
42: Language model database
43: Recognized text database
44: Keyword information database
50: LAN
60: Internet
70: Search result list screen

Claims

A moving image playback device comprising:
voice recognition means for performing voice recognition on a moving image file;
text file generation means for generating a text file from the output of the voice recognition means;
keyword detection means for detecting, in the text file, a keyword stored in advance in a storage device;
moving image ranking determination means for determining the priority order of moving images to be played back with reference to a keyword weight attached to the keyword;
moving image playback means for playing back the moving image that the user selects with reference to the moving image ranking; and
weight updating means for updating the keyword weight based on the moving image selected by the user and the manner in which the keyword occurs.

The moving image playback device according to claim 1, further comprising elapsed time calculation means for associating each description in the text file with the elapsed time from the start of playback of the moving image file.

The moving image playback device according to claim 2, wherein the elapsed time calculation means divides the text file into a plurality of sections according to the playback time of the moving image file, the moving image ranking determination means determines the priority order of the sections to be played back, and the moving image playback means selects a section to be played back from among the sections.
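As an illustration only, dividing a recognized transcript into sections by playback time might look like the sketch below. Fixed-length sections, the `(elapsed_seconds, word)` input shape, and the name `split_into_sections` are all assumptions; the claim does not specify how section boundaries are chosen.

```python
from collections import defaultdict

def split_into_sections(timed_words, section_seconds):
    """Group (elapsed_seconds, word) pairs into fixed-length playback sections.

    Returns a dict mapping section index (0 = first section_seconds of
    playback) to the list of words recognized in that section.
    """
    if section_seconds <= 0:
        raise ValueError("section_seconds must be positive")
    sections = defaultdict(list)
    for elapsed, word in timed_words:
        # Integer section index from the elapsed playback time.
        sections[int(elapsed // section_seconds)].append(word)
    return dict(sorted(sections.items()))

# Hypothetical transcript with elapsed times in seconds.
transcript = [(0.0, "soccer"), (59.9, "goal"), (60.0, "interview"), (125.0, "B")]
sections = split_into_sections(transcript, 60)
```

Each section can then be scored and played back independently of the rest of the file.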

The moving image playback device according to claim 2, wherein the manner of keyword occurrence includes at least one of the number of occurrences of the keyword per unit time, the number of occurrences of the keyword in the moving image file, and the keyword occurrence interval.
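The three occurrence measures named above can be computed from the elapsed times at which a keyword was recognized. The following is a minimal sketch under stated assumptions: the function name `occurrence_stats`, the one-minute default unit, and the use of the mean gap as the "occurrence interval" are hypothetical choices, not details given in the claim.

```python
def occurrence_stats(timestamps, duration_seconds, unit_seconds=60.0):
    """Summarize how one keyword occurs in one moving image file.

    timestamps: elapsed seconds at which the keyword was recognized.
    duration_seconds: total playback time of the file.
    """
    times = sorted(timestamps)
    count = len(times)
    # Occurrences per unit time (per minute by default).
    per_unit = count / (duration_seconds / unit_seconds) if duration_seconds > 0 else 0.0
    # Gaps between consecutive occurrences; undefined with fewer than two.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_interval = sum(gaps) / len(gaps) if gaps else None
    return {"count": count, "per_unit_time": per_unit, "mean_interval": mean_interval}

# Hypothetical keyword hits at 10 s, 40 s, and 100 s in a 120 s file.
stats = occurrence_stats([10, 40, 100], 120)
```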

The moving image playback device according to claim 1, wherein the weight updating means defines a keyword included in a moving image selected by the user as a plus keyword and corrects the keyword weight of the plus keyword so that it becomes larger as the number of occurrences of the plus keyword is larger and as the ratio of the number of selected moving images to the number of moving images subject to selection is smaller, and
defines a keyword included in a moving image not selected by the user as a minus keyword and corrects the keyword weight of the minus keyword so that it becomes larger as the number of occurrences of the minus keyword is larger and as the ratio of the number of non-selected moving images to the number of moving images subject to selection is smaller.

The moving image playback device according to claim 5, wherein the moving image ranking determination means sets the priority of a moving image to be played back higher as the difference between the keyword weight of the plus keyword and the keyword weight of the minus keyword is larger, as the number of occurrences of the keyword is larger, and as the number of occurrences of the keyword per unit time is larger.
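One way to picture this ranking is the sketch below. The scoring formula (weight difference multiplied by the sum of the occurrence count and the per-minute rate, summed over keywords) is purely an assumption for illustration; the claim states only the direction of each dependency. The names `video_score` and `rank_videos` and all sample weights are hypothetical.

```python
def video_score(stats, plus_weights, minus_weights):
    """Assumed score for one video.

    stats: {keyword: (occurrence_count, occurrences_per_minute)}.
    """
    score = 0.0
    for keyword, (count, rate) in stats.items():
        # Difference between the plus weight and the minus weight of the keyword.
        diff = plus_weights.get(keyword, 0.0) - minus_weights.get(keyword, 0.0)
        score += diff * (count + rate)
    return score

def rank_videos(videos, plus_weights, minus_weights):
    """Return video names ordered from highest to lowest assumed score."""
    return sorted(
        videos,
        key=lambda name: video_score(videos[name], plus_weights, minus_weights),
        reverse=True,
    )

# Illustrative weights and per-video statistics (not from the source).
plus = {"interview": 3.0, "goal": 1.0}
minus = {"interview": 1.5, "goal": 1.1}
videos = {
    "news": {"interview": (1, 0.2), "goal": (4, 0.8)},
    "interview_b": {"interview": (5, 1.0)},
}
ranking = rank_videos(videos, plus, minus)
```

Under this assumed formula, a keyword whose minus weight exceeds its plus weight lowers the score the more often it occurs, so videos dominated by such keywords sink in the ranking.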

The moving image playback device according to claim 6, wherein the moving image ranking determination means sets the priority of a section to be played back higher as the keyword weight of the plus keyword in that section of the moving image is larger and as the keyword occurrence interval is shorter.

A moving image playback method comprising:
a voice recognition step of performing voice recognition on a moving image file;
a text file generation step of generating a text file from the output of the voice recognition step;
a keyword detection step of detecting, in the text file, a keyword stored in advance in a storage device;
a moving image ranking determination step of determining the priority order of moving images to be played back with reference to a keyword weight attached to the keyword;
a moving image playback step of playing back the moving image that the user selects with reference to the moving image ranking; and
a weight update step of updating the keyword weight based on the moving image selected by the user and the manner in which the keyword occurs.

A program for a moving image playback device that includes a computer and plays back moving images, the program causing the computer to execute:
voice recognition processing of performing voice recognition on a moving image file;
text file generation processing of generating a text file from the output of the voice recognition processing;
keyword detection processing of detecting, in the text file, a keyword stored in advance in a storage device;
moving image ranking determination processing of determining the priority order of moving images to be played back with reference to a keyword weight attached to the keyword;
moving image playback processing of playing back the moving image that the user selects with reference to the moving image ranking; and
weight update processing of updating the keyword weight based on the moving image selected by the user and the manner in which the keyword occurs.