JP2010287974A

JP2010287974A - Mobile phone and program

Info

Publication number: JP2010287974A
Application number: JP2009138829A
Authority: JP
Inventors: Takanori Yamada; 貴則山田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2009-06-10
Filing date: 2009-06-10
Publication date: 2010-12-24

Abstract

PROBLEM TO BE SOLVED: To provide a mobile phone or the like that attaches tags automatically on the basis of reference images, in real time during shooting and without placing a burden on the user, and can thereby provide a moving image that is easy for the user to view.

SOLUTION: The mobile phone, which includes a moving image data storage section and a moving image data expanding section, further includes: a reference image feature storage section in which reference images, feature quantities of the reference images, and keywords are stored in association with each other; a feature quantity calculating section which calculates the feature quantity of an image included in moving image data; a feature quantity comparing section which compares the feature quantity calculated by the feature quantity calculating section with the feature quantities stored in the reference image feature storage section; and a tag information preserving section which generates and preserves tag information including the keyword of a reference image resembling the image contained in the moving image data, on the basis of the comparison result of the feature quantity comparing section.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a mobile phone or the like comprising a moving image data storage unit that creates moving image data from an image input from an image input unit and audio input from an audio input unit and stores the moving image data in a moving image storage, and a moving image data expansion unit that reads moving image data from the moving image storage, expands the image and audio, and outputs the image to an image output unit and the audio to an audio output unit.

Various image processing apparatuses that capture and play back still images and moving images are known. Mobile phones and similar devices have also become able to shoot high-quality moving images for long periods, because memory media have become inexpensive and high-pixel-count cameras are now built in.

When a long recording is made, finding a particular scene requires operations such as fast-forwarding, which takes time. Video software sold on DVD and similar media is divided into chapters in advance, so jumping to a specified chapter brings up the desired scene immediately. For a moving image shot with a mobile phone or the like, however, the data had to be moved to a personal computer after shooting, and scene boundaries had to be marked with dedicated editing software or by hand. Consequently, even though the device is able to record for a long time, the inconvenience of not being able to view a desired scene immediately has been one reason why users do not make use of the video shooting function.

Some recording devices for TV automatically split recordings at commercials and the like, but they rely on detecting silence or on detecting a switch between monaural and stereo audio, and such methods are difficult to use for shooting with a mobile phone.

A metadata assigning apparatus and a metadata assigning method that are applicable even when the images to be classified are moving images have been disclosed (see, for example, Patent Document 1). For example, they provide means for extracting feature amounts from still image data collected from the Internet, matching them against feature amounts extracted from each frame (image) of a moving image, and assigning keywords associated with the closest still image data as metadata.

Patent Document 1: JP 2008-217701 A

With the method described above, a similar-image search can be performed on each frame of a moving image and metadata can be assigned to each frame. In general, however, calculating the feature vector of each image and searching for similar images on the basis of feature vectors are computationally expensive and time-consuming, and the time required grows as the image size increases, so the question of how to apply the method during recording remains. Dedicated recording devices such as video cameras use dedicated hardware for handling large moving images, which makes this comparatively easy to achieve. It has been difficult to achieve under the constraints of a mobile phone or the like, which must use a small general-purpose CPU shared with other functions such as telephony.

Furthermore, even if metadata could be assigned to every frame of a moving image, presenting the user at playback time with a list of scene changes, that is, scenes in which something distinctive first appears, requires reading all of the metadata once. As long as the same object keeps appearing in the moving image, metadata is attached for every one of those frames, so reading it takes time.

In view of the problems described above, an object of the present invention is to provide a mobile phone or the like that, even when a high-resolution moving image is shot with a mobile phone of relatively low processing capability, attaches tags automatically and in real time during shooting on the basis of reference images, without placing a burden on the user, and can thereby provide a moving image that is easy for the user to view.

In view of the problems described above, the mobile phone of the present invention is a mobile phone comprising: a moving image data storage unit that creates moving image data from an image input from an image input unit and audio input from an audio input unit and stores the moving image data in a moving image storage; and a moving image data expansion unit that reads moving image data from the moving image storage, expands the image and audio, and outputs the image to an image output unit and the audio to an audio output unit, the mobile phone further comprising: a reference image feature storage unit that stores a reference image, a feature amount of the reference image, and a keyword in association with each other; a feature amount calculation unit that calculates a feature amount of an image included in the moving image data; a feature amount comparison unit that compares the feature amount calculated by the feature amount calculation unit with the feature amount stored in the reference image feature storage unit; and a tag information storage unit that generates and stores, on the basis of the comparison result of the feature amount comparison unit, tag information including the keyword of a reference image similar to the image included in the moving image data.

In the mobile phone of the present invention, the tag information storage unit further includes in the tag information, and stores, the start time within the moving image data of the image similar to the reference image.

In the mobile phone of the present invention, the tag information storage unit further includes in the tag information, and stores, the end time within the moving image data of the image similar to the reference image.

In the mobile phone of the present invention, the tag information storage unit includes in the tag information, and stores, information on the scene that corresponds to the keyword associated with the reference image.

The mobile phone of the present invention further comprises a reproduction instruction unit that instructs the moving image data expansion unit to perform reproduction on the basis of the tag information generated from the moving image data.

In the mobile phone of the present invention, the reproduction instruction unit identifies, from the tag information, the scenes that include a keyword, and instructs the moving image data expansion unit to reproduce those scenes consecutively.

In the mobile phone of the present invention, when the reproduction instruction unit issues a fast-forward instruction, the moving image data expansion unit expands and outputs the moving image data, on the basis of the tag information and in chronological order, for only the first frame of each tag or for only the frames in the first few seconds of each tag.

The program of the present invention is a program that causes a computer provided with an image output device and an audio output device to realize: a moving image data storage function of creating moving image data from image data and audio data and storing the moving image data in a moving image storage; and a moving image data expansion function of reading moving image data from the moving image storage, expanding the image and audio, and outputting the image to the image output device and the audio to the audio output device, the program further causing the computer to realize: a reference image feature storage function of storing a reference image, a feature amount of the reference image, and a keyword in association with each other; a feature amount calculation function of calculating a feature amount of an image included in the moving image data; a feature amount comparison function of comparing the feature amount calculated by the feature amount calculation function with the feature amount stored by the reference image feature storage function; and a tag information storage function of generating and storing, on the basis of the comparison result of the feature amount comparison function, tag information including the keyword of a reference image similar to the image included in the moving image data.

According to the mobile phone of the present invention, a reference image feature storage unit stores a reference image, a feature amount of the reference image, and a keyword in association with each other; the feature amount of an image included in the moving image data is calculated and compared with the feature amounts stored in the reference image feature storage unit. On the basis of the comparison result, tag information including the keyword of a reference image similar to the image included in the moving image data is generated and stored. Tag information is therefore generated from the reference images stored in the reference image feature storage unit without the photographer having to be aware of it, and moving image data that the user can play back using the tag information can be provided.

In addition, the mobile phone of the present invention makes it possible, for example, for the user to specify a keyword and have only the scenes that include that keyword played back from the moving image data.
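The keyword-based playback described above can be sketched as a filter over stored tags. This is a minimal illustration under assumptions, not the patented implementation: the `Tag` record, its field names, and the use of seconds for times are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    """Hypothetical in-memory form of one tag (field names are assumptions)."""
    keyword: str
    start_time: float  # seconds from the start of the moving image
    end_time: float


def scenes_for_keyword(tags, keyword):
    """Return (start, end) pairs of the scenes tagged with `keyword`,
    sorted chronologically so they can be played back one after another."""
    return sorted((t.start_time, t.end_time) for t in tags if t.keyword == keyword)


tags = [Tag("dog", 55.0, 63.0), Tag("cake", 30.0, 41.0), Tag("dog", 12.0, 20.5)]
playlist = scenes_for_keyword(tags, "dog")
print(playlist)
```

A player would then seek to each start time in turn and stop at the matching end time.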

FIG. 1 is a diagram for explaining the functional configuration of the mobile phone in the present embodiment.
FIG. 2 is a diagram for explaining the functional units used when the mobile phone in the present embodiment shoots moving image data.
FIG. 3 is a diagram for explaining the functional units used when the mobile phone in the present embodiment plays back moving image data.
FIG. 4 is a diagram for explaining an example of the data structure of the reference image feature DB in the present embodiment.
FIG. 5 is a diagram for explaining an example of the data structure of the moving image storage in the present embodiment.
FIG. 6 is a diagram for explaining an example of the tag information in the present embodiment.
FIG. 7 is a diagram for explaining an example of the tag information generation process in the present embodiment.
FIG. 8 is a diagram for explaining an example of the tag storage process in the present embodiment.
FIG. 9 is a diagram showing a time sequence of moving image data in the present embodiment.
FIG. 10 is a diagram for explaining an example of the screen display in the present embodiment.
FIG. 11 is a diagram showing a time sequence of moving image data in the present embodiment.

The best mode for carrying out the present invention will be described below with reference to the drawings. In the present embodiment, as an example, the invention is applied to a mobile phone 1 capable of shooting high-resolution moving images.

[Functional configuration]
First, the functional configuration of the mobile phone 1 will be described. FIG. 1 shows the overall functional configuration of the mobile phone 1, FIG. 2 focuses on the functional units used when the mobile phone 1 records a moving image, and FIG. 3 focuses on the functional units used when a moving image is played back.

As shown in FIG. 1, in the mobile phone 1 a communication unit 20, an image input/output unit 22, a camera unit 24, a display unit 26, an image processing unit 30, an audio input/output unit 40, a microphone 42, a speaker 44, an audio processing unit 50, a storage unit 60, a moving image processing unit 70, an image analysis unit 80, and a reproduction instruction unit 90 are connected to a control unit 10 via a bus.

The control unit 10 is a functional unit that controls the mobile phone 1 and is configured by a processing device such as a CPU. The control unit 10 realizes each function by reading and executing the various programs stored in the storage unit 60.

The communication unit 20 is a functional unit through which the mobile phone 1 communicates with a base station. Voice data and other data can be exchanged over a network (for example, a mobile phone network) via the communication unit 20.

The image input/output unit 22 is a functional unit that processes input images and outputs them to other functional units. For example, it processes images input from the camera unit 24 and outputs images in a display form suited to the display unit 26.

The image input/output unit 22 has an image conversion function that converts an image to a size appropriate for the destination device or functional unit before outputting it. For example, when input image data (for example, image data at XGA resolution) is output to the display unit 26 (for example, a display unit with VGA display capability), the data is converted to the display size of the display unit 26 (image data at VGA resolution) and then output.

In the present embodiment, the image data output from the image input/output unit 22 to the feature amount calculation unit 82 is the same image data that is output to the display unit 26. Feature extraction is the most processing-intensive step after image encoding, and using the display image data (that is, image data of lower resolution than the captured data) for it reduces the processing load.
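As a rough illustration of why feeding the display-sized image to feature extraction helps, the sketch below compares pixel counts for the XGA and VGA examples mentioned above and shows a nearest-neighbour downscale. The standard sizes 1024x768 and 640x480 are assumed, the function names are hypothetical, and the document does not say which scaling method the image input/output unit 22 actually uses.

```python
XGA = (1024, 768)  # assumed capture size (width, height)
VGA = (640, 480)   # assumed display size (width, height)


def pixel_count(size):
    """Number of pixels in an image of the given (width, height)."""
    width, height = size
    return width * height


def downscale_nearest(pixels, out_w, out_h):
    """Nearest-neighbour downscale of an image stored as a list of rows."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# How many times fewer pixels the feature extractor has to visit.
ratio = pixel_count(XGA) / pixel_count(VGA)

# Tiny 4x4 example reduced to 2x2.
small = downscale_nearest(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]], 2, 2
)
print(ratio, small)
```

Under these assumed sizes, a per-pixel feature such as a histogram touches about 2.56 times fewer pixels on the display image than on the captured image.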

The camera unit 24 is a functional unit for photographing subjects, scenery, and the like, and is configured by, for example, a CCD. Images captured by the camera unit 24 are output to the image input/output unit 22.

The display unit 26 is a functional unit that displays various information on the mobile phone 1 as well as still images and moving images. It is configured by, for example, a liquid crystal display or an organic EL display.

The image processing unit 30 is a functional unit of the mobile phone 1 that encodes and decodes image data in a predetermined format. The image processing unit 30 includes an image encoder unit 32 that encodes image data and an image decoder unit 34 that decodes image data.

The audio input/output unit 40 is a functional unit that processes input audio and outputs it to other functional units. For example, it processes audio input from the microphone 42 and outputs audio to the speaker 44. In the present embodiment the speaker 44 is used as the audio output unit, but another audio output device such as headphones may of course be used instead.

The audio processing unit 50 is a functional unit of the mobile phone 1 that encodes and decodes audio data in a predetermined format. The audio processing unit 50 includes an audio encoder unit 52 that encodes audio data and an audio decoder unit 54 that decodes audio data.

The storage unit 60 is a functional unit for storing the settings of the mobile phone 1 and for temporarily holding image data. It is configured by a storage device such as a semiconductor memory, a hard disk drive, or an optical disk drive, and also stores the various data and programs needed to operate the mobile phone 1. The control unit 10 executes various control processes by reading and executing the control program stored in the storage unit 60.

The storage unit 60 also reserves areas for a reference image feature DB 62 and a moving image storage 64.

FIG. 4 shows an example of the data structure of the reference image feature DB 62. The reference image feature DB 62 stores, in association with one another, the file path of each reference image, the keyword corresponding to the reference image, and the feature amount data of the reference image.

Even for the same subject, the feature amounts of images differ depending on shooting conditions and the like, so registering several images under the same keyword yields a higher identification rate. Reference images may be registered in advance by the manufacturer of the mobile phone, or the user may be allowed to register them later. Mobile phones also often have a phone book function in which an image can be registered against a person's name, and these images may be used as reference images. Furthermore, still images shot with a mobile phone can have position information embedded using GPS, and this may also be used.
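The record layout described for the reference image feature DB 62 (file path, keyword, feature amount data), with several reference images allowed per keyword, could be modelled as follows. This is only a sketch: the class name, field names, file paths, and feature values are hypothetical, and the actual layout is the one shown in FIG. 4.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceEntry:
    """One row of a hypothetical in-memory reference image feature DB."""
    file_path: str   # where the reference image is stored
    keyword: str     # keyword associated with the reference image
    feature: tuple   # feature amount data, e.g. a normalised histogram


reference_db = [
    ReferenceEntry("/ref/dog_indoor.jpg", "dog", (0.5, 0.3, 0.2)),
    ReferenceEntry("/ref/dog_outdoor.jpg", "dog", (0.3, 0.3, 0.4)),
    ReferenceEntry("/ref/cake.jpg", "cake", (0.1, 0.2, 0.7)),
]


def entries_by_keyword(db):
    """Group entries so that every keyword maps to all of its reference images."""
    grouped = defaultdict(list)
    for entry in db:
        grouped[entry.keyword].append(entry)
    return dict(grouped)


grouped = entries_by_keyword(reference_db)
print({keyword: len(entries) for keyword, entries in grouped.items()})
```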

The moving image storage 64 is a storage area that stores moving image data and the tag information corresponding to it. For example, as shown in FIG. 5, tag information (for example, "<taglist>…") is stored in association with each item of moving image data (for example, "recorded100.mp4").

The tag information will now be described with reference to FIG. 6. First, the moving image data is specified by the filename tag (T1), and each tag within that moving image data is specified by a tagid tag (T100). The range enclosed by a tagid tag (for example, T10 or T12) constitutes one tag.

The details of each item of tag information are as follows.
ref-img tag (T102): path information for accessing the reference image.
thumbnail tag (T104): path information for the thumbnail image.
keyword tag (T106): the keyword for the content.
start-time tag (T108): the start time of the content.
end-time tag (T110): the end time of the content.

In the present embodiment the tag information is in XML format as an example, but it may of course be stored in another format.
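To make the XML form of the tag information concrete, the following sketch builds a small tag list with Python's standard `xml.etree.ElementTree` module. The element names follow the tags named above (taglist, filename, tagid, ref-img, thumbnail, keyword, start-time, end-time), but the exact nesting, the "HH:MM:SS" time format, and the example paths are assumptions, since FIG. 6 is not reproduced here.

```python
import xml.etree.ElementTree as ET

TAG_FIELDS = ("ref-img", "thumbnail", "keyword", "start-time", "end-time")


def build_taglist(filename, tags):
    """Build a <taglist> element holding one <tagid> child per tag.

    `tags` is a list of dicts keyed by the names in TAG_FIELDS.
    """
    root = ET.Element("taglist")
    ET.SubElement(root, "filename").text = filename
    for tag in tags:
        entry = ET.SubElement(root, "tagid")
        for name in TAG_FIELDS:
            ET.SubElement(entry, name).text = tag[name]
    return root


root = build_taglist(
    "recorded100.mp4",
    [
        {
            "ref-img": "/ref/dog_indoor.jpg",       # hypothetical path
            "thumbnail": "/thumb/rec100_0001.jpg",  # hypothetical path
            "keyword": "dog",
            "start-time": "00:01:12",               # assumed time format
            "end-time": "00:01:40",
        }
    ],
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Reading the tags back is the reverse operation: parse the text with `ET.fromstring` and iterate over the `tagid` elements.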

The moving image processing unit 70 is a functional unit that combines input image data and audio data and encodes them as moving image data in a predetermined format such as MPEG-4, and conversely separates moving image data that has been read out into image data and audio data and outputs them. The moving image processing unit 70 includes a moving image data storage unit 72 that encodes and stores moving image data, and a moving image data expansion unit 74 that expands moving image data and separates it into image data and audio data.

The image analysis unit 80 is a functional unit that analyzes images and that reads tag information from, and saves it to, the moving image storage 64. The image analysis unit 80 includes a feature amount calculation unit 82, a feature amount comparison unit 84, a tag storage unit 86, and a tag reading unit 88.

The feature amount calculation unit 82 is a functional unit that calculates the feature information of one frame of input image data. The present embodiment does not prescribe a particular method of calculating the feature information, but well-suited methods include, for example, using the histograms of the R, G, and B colors in the RGB color system as the feature information, or using the histogram of only Y in the YUV color system.
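The two histogram-based features mentioned above can be sketched in a few lines. This is an illustrative sketch only: the bin count, the normalisation, the function names, and the use of the ITU-R BT.601 luma weights for Y are assumptions, not details given in the document.

```python
def rgb_histogram(pixels, bins=8):
    """Concatenated R, G and B histograms of an iterable of (r, g, b)
    pixels with 0..255 components; each channel is normalised to sum to 1."""
    hist = [0] * (3 * bins)
    count = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + value * bins // 256] += 1
        count += 1
    if count == 0:
        return [0.0] * (3 * bins)
    return [h / count for h in hist]


def luma_histogram(pixels, bins=8):
    """Histogram of Y only, with Y derived from RGB using BT.601 weights
    (an assumption; a camera pipeline may already deliver YUV data)."""
    hist = [0] * bins
    count = 0
    for r, g, b in pixels:
        y = int(0.299 * r + 0.587 * g + 0.114 * b)
        hist[min(y * bins // 256, bins - 1)] += 1
        count += 1
    if count == 0:
        return [0.0] * bins
    return [h / count for h in hist]


frame = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
print(rgb_histogram(frame, bins=2))
print(luma_histogram([(0, 0, 0), (255, 255, 255)], bins=2))
```

The Y-only variant produces a feature a third of the size of the RGB one, which may matter on a phone-class CPU.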

The data input to the feature amount calculation unit 82 is the display image data output from the image input/output unit 22. Depending on the processing performance of the feature amount calculation unit 82, the captured image data (that is, the high-resolution image data) may be used instead.

The feature amount comparison unit 84 is a functional unit that compares the feature information calculated by the feature amount calculation unit 82 with the feature information acquired from the reference image feature DB 62 and calculates a degree of similarity.
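The document does not name a similarity measure, so the sketch below uses histogram intersection as one plausible choice for histogram features. The measure, the threshold value, and the function names are all assumptions made for illustration.

```python
def histogram_intersection(a, b):
    """Similarity in [0, 1] between two histograms of equal length that
    are normalised to the same total (1.0 means identical)."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    total = sum(a)
    if total == 0:
        return 0.0
    return sum(min(x, y) for x, y in zip(a, b)) / total


def best_match(frame_feature, references, threshold=0.8):
    """Return (keyword, score) for the most similar reference whose score
    reaches `threshold`, or None if no reference is similar enough.

    `references` is a list of (keyword, feature) pairs.
    """
    best = None
    for keyword, feature in references:
        score = histogram_intersection(frame_feature, feature)
        if score >= threshold and (best is None or score > best[1]):
            best = (keyword, score)
    return best


references = [("dog", [0.5, 0.5]), ("cake", [1.0, 0.0])]
match = best_match([0.6, 0.4], references)
print(match)
```

With n registered reference features, each frame costs n comparisons, so the number of bins and the number of references both bound the per-frame work.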

The tag storage unit 86 creates tag information for the moving image data being recorded, based on the information of the reference image that the feature amount comparison unit 84 has judged to have at least a certain degree of similarity and on the time stamp information of the frame data processed by the feature amount calculation unit 82, and stores the tag information in the moving image storage 64 in association with the moving image data.
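One way to turn per-frame match results and time stamps into compact tags is to merge consecutive frames that match the same keyword into a single tag with a start time and an end time. The sketch below illustrates that idea under assumptions; the actual tag storage process is the one the document associates with FIG. 8, and the function name, data layout, and time units here are hypothetical.

```python
def build_tags(frame_matches):
    """Collapse per-frame matches into tags.

    `frame_matches` is a chronological list of (timestamp, keyword) pairs,
    where keyword is None for frames that matched no reference image.
    Returns a list of dicts with keys keyword, start_time and end_time.
    """
    tags = []
    current = None
    for timestamp, keyword in frame_matches:
        if current is not None and keyword == current["keyword"]:
            # Same subject still in view: extend the open tag.
            current["end_time"] = timestamp
            continue
        if current is not None:
            tags.append(current)  # the subject changed or disappeared
        if keyword is None:
            current = None
        else:
            current = {"keyword": keyword, "start_time": timestamp, "end_time": timestamp}
    if current is not None:
        tags.append(current)
    return tags


matches = [(0.0, None), (1.0, "dog"), (2.0, "dog"), (3.0, None), (4.0, "cake"), (5.0, "dog")]
result = build_tags(matches)
print(result)
```

Storing one tag per continuous appearance, rather than one metadata record per frame, keeps the tag list short enough to read quickly at playback time.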

The tag reading unit 88 reads the tag information held in the moving image storage 64 and outputs it to the moving image data expansion unit 74 and the image input/output unit 22.

The reproduction instruction unit 90 consists of the hardware and software with which a user who has looked at the tag information displayed on the display unit 26 specifies the position from which the moving image is to be played back. As hardware, a touch panel integrated with the display unit 26 may be used, or the keys of the mobile phone, a keyboard connected to the mobile phone, or the like may be used.

[Moving image recording]
Next, the functional units will be described, focusing on those used when recording a moving image and those used when playing one back. First, the case of recording a moving image is described.

The camera unit 24 and the microphone 42 are, respectively, an image capture device and an audio input device attached to the mobile phone 1.

Images (moving images) input from the camera unit 24 are passed to the image input/output unit 22, which outputs them as image data to the display unit 26, the image encoder unit 32, and the feature amount calculation unit 82. The feature amount calculation unit 82 calculates a feature amount from the image input from the image input/output unit 22. The feature amounts of the reference images stored in the reference image feature DB 62 are then compared with the feature amount calculated by the feature amount calculation unit 82, and a similar reference image (that is, a tag) is recognized.

The image data from the image input/output unit 22 is encoded into a predetermined format by the image encoder unit 32 and then transmitted.

Audio input from the microphone 42 is converted into audio data by the audio input/output unit 40. It is then encoded into a predetermined format by the audio encoder unit 52 and output to the moving image data storage unit 72.

The moving image data combining unit combines the image data output by the image encoder unit 32 and the audio data output by the audio encoder unit 52 into a moving image file. The moving image file is stored in the moving image storage 64 together with the tag information stored by the tag storage unit 86.

[Playback function]
Next, the case of playing back a moving image is described. FIG. 3 is a system diagram showing an example embodiment for realizing the function of playing back moving images on the mobile phone 1 in the present embodiment.

When the user instructs playback of a moving image via the reproduction instruction unit 90, the moving image data expansion unit 74 reads the corresponding moving image data and tag information from the moving image storage 64.

The moving image data read from the moving image storage 64 is output to the moving image data expansion unit 74, and the tag information is output, via the tag reading unit 88, to both the moving image data expansion unit 74 and the image input/output unit 22.

The moving image data expansion unit 74 separates the input moving image data into image data and audio data and outputs each. Specifically, it outputs the image data to the image decoder unit 34 and the audio data to the audio decoder unit 54.

The image decoder unit 34 then decodes the image, which is output to the display unit 26 via the image input/output unit 22. Likewise, the audio decoder unit 54 decodes the audio, which is output to the speaker 44 via the audio input/output unit 40.

[Tag information generation processing]
Next, the procedure for generating tag information during moving image shooting will be described with reference to FIG. 7. In the mobile phone 1, images that serve as reference images and their corresponding keywords are registered in advance in the reference image feature DB 62. A plurality of reference images can be registered; before shooting starts, the feature amount calculation unit 82 calculates the feature amount of each reference image and registers it in the reference image feature DB 62.

When shooting starts, all feature amount data is first read from the reference image feature DB 62 into the feature amount comparison unit 84 (step S10). Specifically, if n pieces of feature amount data are registered, they are read as variables F_r1 to F_rn. Once reading is complete, image input from the camera unit 24 starts (step S12).

The image input/output unit 22 outputs images to the image encoder unit 32 according to the shooting parameters. It also outputs the images to the display unit 26 so that the user can easily see what is being supplied to the image encoder unit 32. In addition, the image input/output unit 22 has a buffer from which the latest image data can always be obtained, and it notifies the feature amount calculation unit 82 whenever one frame of data has accumulated.

On a portable terminal, the resolution of the display unit 26 is generally lower than the resolution at which images can be captured, so the image output to the display unit 26 is coarser than the image output to the image encoder unit 32. In this embodiment, therefore, feature extraction uses the image prepared for display on the display unit 26 rather than the image that is actually encoded.

That is, the buffer of the image input/output unit 22 holds the same data as the image to be displayed on the display unit 26. The feature amount calculation unit 82 can fetch this data at any time once it is ready to do so; however, if a new image arrives from the camera before the data is fetched, the data in the image input/output unit 22 is overwritten. When notified that data has accumulated in the buffer, the feature amount calculation unit 82 takes in one frame of data (step S14) and calculates the feature amount data. The calculated feature amount is stored in F_i, which denotes the feature amount data of the i-th frame (step S16).
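The overwrite behavior of this buffer can be illustrated with a minimal single-slot sketch. The class and method names below are hypothetical and chosen only for illustration; the text does not specify how the buffer is implemented.

```python
class LatestFrameBuffer:
    """Single-slot buffer (hypothetical): an unread frame is overwritten by a newer one."""

    def __init__(self):
        self._frame = None

    def put(self, frame):
        # A new camera frame replaces whatever has not been fetched yet.
        self._frame = frame

    def take(self):
        # Fetch the latest frame and leave the slot empty.
        frame, self._frame = self._frame, None
        return frame


buf = LatestFrameBuffer()
buf.put("frame-1")
buf.put("frame-2")    # frame-1 was never fetched, so it is lost
latest = buf.take()   # "frame-2"
```

With this design the feature calculation never falls behind the camera: it always works on the most recent frame and simply skips frames it had no time to process.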

Next, the feature amount comparison unit 84 calculates the similarity between F_i and each of F_r1 to F_rn (step S18). How the similarity is calculated depends on how the feature amount data is calculated. For example, when a histogram of only the Y component in the YUV format is used as the feature amount data, the similarity can be defined as the sum, over all elements, of the differences between corresponding elements. That is, if the histograms F_i and F_r1 to F_rn each have m elements, the similarity S_rn between F_i and the feature amount data of the n-th reference image is obtained by summing the differences between their corresponding elements over all m elements.

With this calculation method, S_rn = 0 when all corresponding histogram elements have the same values, in which case the images are judged to be most similar; the larger S_rn is, the less similar the images are judged to be. A similarity is calculated for each piece of reference image feature amount data that has been read, so if n pieces have been read, the results are stored in S_r1 to S_rn.
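The similarity S_rn described above can be written out as follows. This is one form consistent with the description, under the assumption that each per-element difference is taken as an absolute value; the lowercase element symbols f and the index j are introduced here only for illustration.

```latex
S_{rn} = \sum_{j=1}^{m} \left| f_{i,j} - f_{rn,j} \right|
```

Here f_{i,j} is the j-th histogram element of F_i and f_{rn,j} is the j-th histogram element of F_rn. The sum is 0 exactly when every pair of corresponding elements is equal, which matches the statement that S_rn = 0 indicates the most similar image.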

Then, among the calculated similarities, those whose value is at or below a certain threshold, that is, those judged to be at least a certain degree similar, are extracted as tag information candidates (step S20). The threshold may be the same for all reference images or may be set individually for each reference image.
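Steps S18 and S20 could be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the bin count, the use of absolute differences, and keying the references by keyword are all assumptions made here.

```python
def y_histogram(y_values, bins=16):
    """Histogram of 8-bit Y (luma) samples; the bin count is an illustrative choice."""
    hist = [0] * bins
    for y in y_values:
        hist[y * bins // 256] += 1
    return hist


def dissimilarity(f_i, f_r):
    """Sum of absolute element-wise differences; 0 means identical histograms."""
    if len(f_i) != len(f_r):
        raise ValueError("histograms must have the same number of elements")
    return sum(abs(a - b) for a, b in zip(f_i, f_r))


def select_candidates(f_i, references, default_threshold, per_ref_threshold=None):
    """Return keywords of references whose score is at or below their threshold."""
    per_ref_threshold = per_ref_threshold or {}
    candidates = []
    for keyword, f_r in references.items():
        limit = per_ref_threshold.get(keyword, default_threshold)
        if dissimilarity(f_i, f_r) <= limit:
            candidates.append(keyword)
    return candidates


references = {"flower": [4, 0], "person": [0, 4]}   # hypothetical 2-bin features
frame_feature = [3, 1]
print(select_candidates(frame_feature, references, default_threshold=2))
```

Because a smaller score means a closer match, the comparison uses "at or below the threshold". The optional per-reference dictionary reflects the remark that thresholds may be set individually.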

Next, the tag storage processing (step S22) is executed. If recording is not to be ended, processing repeats from step S14; otherwise, this processing ends.

[Tag storage processing]
Next, the tag storage processing executed in step S22 of FIG. 7 will be described with reference to FIG. 8.

The tag storage unit 86 records the tag information of a frame by repeating steps S50 to S58 for each piece of image data judged in step S20 of FIG. 7 to be at least a certain degree similar.

First, one piece of similar reference image data is selected, and it is determined whether that reference image data was also judged similar in the previous frame (step S50). If it was, that tag information candidate is skipped (step S50; Yes → step S60).

On the other hand, if the reference image was not judged similar in the previous frame, it is added to the tag list as tag information (step S52). What is added here is the "tag" tag in FIG. 5 and its id (for example, T100). Next, path information for accessing the reference image and the keyword information of the reference image are recorded in the tag information added in step S52 (step S54). These correspond to the ref-img tag (for example, T102) and the keyword tag (for example, T106) in FIG. 5. Next, the time stamp information of the frame data input in S44 is recorded (step S56). This corresponds to the start-time tag (for example, T108) in FIG. 5.

Next, the frame data is recorded as a thumbnail image, together with the information needed to access it (step S58). This information is used to present a list of scenes to the viewer in an easily understandable way during playback. If there is no such use, this step is of course unnecessary; for example, the frame may instead be decoded based on the start time in the tag information and presented to the viewer.

By repeating steps S50 to S58 in this way, tag information based on the reference image data newly extracted as tag information candidates is added. After processing is complete for all tags (step S60; Yes), among the tag information that was not changed in the series of steps from S50 to S60, any tag information with no end time recorded has the time stamp of the frame data recorded as its end time (step S62). This corresponds to the end-time tag (for example, T110) in FIG. 5, and it means that the scene judged similar to a given reference image has ended. This completes the tag storage processing for one frame.
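The per-frame tag storage steps can be sketched as below. This is a hedged illustration: the class names, field names, and thumbnail string are hypothetical, and the sketch interprets "tag information that was not changed" as tags whose reference image is no longer among the current frame's candidates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tag:
    tag_id: int
    ref_img: str                     # path to the reference image (S54)
    keyword: str                     # keyword of the reference image (S54)
    start_time: int                  # time stamp of the first similar frame (S56)
    thumbnail: Optional[str] = None  # thumbnail access info (S58), hypothetical format
    end_time: Optional[int] = None   # filled in when the scene ends (S62)


class TagStore:
    def __init__(self):
        self.tags = []
        self._open = {}  # ref_img -> Tag for scenes still in progress

    def process_frame(self, timestamp, candidates):
        """candidates maps ref_img path -> keyword for references similar to this frame."""
        for ref_img, keyword in candidates.items():
            if ref_img in self._open:
                continue  # S50: already similar in the previous frame, skip
            tag = Tag(len(self.tags) + 1, ref_img, keyword, timestamp,
                      thumbnail=f"thumb_{timestamp}")
            self.tags.append(tag)            # S52 to S58
            self._open[ref_img] = tag
        for ref_img in list(self._open):     # S62: close scenes that did not continue
            if ref_img not in candidates:
                self._open.pop(ref_img).end_time = timestamp


store = TagStore()
store.process_frame(100, {"flower.jpg": "flower"})
store.process_frame(101, {"flower.jpg": "flower", "person.jpg": "person"})
store.process_frame(102, {"person.jpg": "person"})
```

After these three frames the flower tag spans 100 to 102, while the person tag has a start time of 101 and remains open, so two scenes can overlap in time.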

The range from the start time to the end time of a single tag is called a "scene". For example, the range of moving image data corresponding to the tag information (keyword) "mountain" is treated as one "scene".

FIG. 9 is a conceptual diagram showing, in time series, some of the scenes in the moving image data indicated by the tags stored as in FIG. 5. In this section, scenes of flowers, a person, and autumn leaves were shot, and no scene corresponds to the other keywords shown in FIG. 4 (night view, new installation, snow scene). It can also be seen that the flower and person scenes, and the person and autumn leaves scenes, partly overlap, with both subjects captured at the same time.

Specifically, a flower appears for tag id = 1 (frames with time stamps 12345 to 13345), a person for tag id = 2 (frames with time stamps 13000 to 16000), autumn leaves for tag id = 3 (frames with time stamps 18001 to 19123), and a person for tag id = 4 (frames with time stamps 18583 to 24210). For the frames with time stamps 13000 to 13345, both the flower and the person appear.
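The overlapping ranges in this example can be checked with a short sketch. The tuple layout and the function name are assumptions made for illustration; the time stamps are the ones given in the text.

```python
# (tag id, keyword, start time stamp, end time stamp), values from the example above
SCENES = [
    (1, "flower", 12345, 13345),
    (2, "person", 13000, 16000),
    (3, "autumn leaves", 18001, 19123),
    (4, "person", 18583, 24210),
]


def overlaps(scenes):
    """Return (id_a, id_b, start, end) for every pair of scenes that share frames."""
    result = []
    for index, a in enumerate(scenes):
        for b in scenes[index + 1:]:
            start = max(a[2], b[2])
            end = min(a[3], b[3])
            if start <= end:
                result.append((a[0], b[0], start, end))
    return result


print(overlaps(SCENES))
# [(1, 2, 13000, 13345), (3, 4, 18583, 19123)]
```

The first pair reproduces the flower and person overlap stated in the text; the second shows that the autumn leaves and person scenes overlap in the same way.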

Thus, this embodiment does not assign each moving image frame to exactly one kind of scene. Instead, provided that the similarity to a reference image meets the threshold, a frame is allowed to belong to scenes with several meanings. This prevents situations such as a frame being classified only as a flower scene because a flower appears in it even though a person also appears, which would make it impossible to start playback from the scene where the person appears.

FIG. 10 shows an example of a screen on which the viewer selects a scene to watch. In FIG. 10(a), a scene list is displayed, and a scene can be selected from it and played back. Here, the four example scenes "flower", "person", "autumn leaves", and "person" are arranged in chronological order. Each entry shows a thumbnail image for judging visually what the scene looks like, a keyword indicating what the scene is, and the start time.

Of course, scenes may be made selectable by displaying only thumbnail images, and when there are many scenes the screen may be made scrollable. The thumbnail image may be the one included in the tag information shown in FIG. 5, or the moving image may actually be played back from the time indicated by the tag and presented as moving video.

FIG. 10(b) shows an example of specifying a scene name and playing back together all scenes that have that scene name. If "person" is selected for the moving image used in the example above, the scenes with tag id = 2 and tag id = 4 can be played back consecutively. The same technique can also be used, for example, to extract only the scenes in which a person appears and save them as a separate moving image file.
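Selecting scenes by name, as in FIG. 10(b), could be sketched like this. The dictionary layout and the function name are hypothetical; the tag values are the ones from the example in the text.

```python
TAGS = [
    {"id": 1, "keyword": "flower", "start": 12345, "end": 13345},
    {"id": 2, "keyword": "person", "start": 13000, "end": 16000},
    {"id": 3, "keyword": "autumn leaves", "start": 18001, "end": 19123},
    {"id": 4, "keyword": "person", "start": 18583, "end": 24210},
]


def scenes_with_keyword(tags, keyword):
    """Return the scenes carrying the given keyword, ordered by start time."""
    matches = [tag for tag in tags if tag["keyword"] == keyword]
    return sorted(matches, key=lambda tag: tag["start"])


playlist = scenes_with_keyword(TAGS, "person")
print([(tag["id"], tag["start"], tag["end"]) for tag in playlist])
# [(2, 13000, 16000), (4, 18583, 24210)]
```

The resulting list of ranges could drive either consecutive playback or extraction into a separate file.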

Another possible use of the tag information is fast-forwarding a moving image. Conventional fast-forward methods include decoding and displaying all frames faster than normal (double-speed playback, for example), displaying only frames taken at fixed intervals, and displaying only intra-frame-coded frames in formats such as MPEG-4.

With these methods, however, fixed sections of the display are skipped regardless of the content of the moving image, which makes it difficult to stop exactly at the desired position. By instead displaying the thumbnail images of the recorded tag information one after another, playback can be resumed from the position where some scene begins, no matter where fast-forwarding is released.

During fast-forward, instead of only displaying thumbnail images, the first few seconds of each tag can be displayed in succession, so that the content is played back in a form that is easier to grasp in detail (digest playback). FIG. 11 shows, in time series, which frames are played back and which are not when a moving image recorded by the method of the above embodiment is fast-forwarded.

The moving image data expansion unit 74 has the playback time for each tag determined in advance (it may also be set by the user), and starts this playback operation when the user performs a fast-forward operation (or a digest playback operation).

First, the tag information whose time is closest to the current playback time is obtained, and data is played back from its start time for the preset playback time. After that, the next tag data is obtained and data is played back for the specified time in the same way; this operation is repeated.

When there are several reference images, several pieces of tag information may exist at the same time. If the time at which the fast-forward operation was performed, or the time at which playback of one piece of tag information finished during fast-forward, overlaps the times of the frames to be played back for the next piece of tag information, the overlapping frames are, for example, not played back.
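The digest playback described above can be sketched as a function that turns tag start times into playback segments. This is only one reading of the text: the function name, the half-open segment convention, and the choice to trim overlapping frames from the start of the later segment are assumptions made here.

```python
def digest_segments(tag_starts, current_time, play_len):
    """Return (start, end) segments to play, skipping frames already covered."""
    starts = sorted(tag_starts)
    if not starts:
        return []
    # Begin with the tag whose start time is closest to the current playback time.
    first = min(range(len(starts)), key=lambda k: abs(starts[k] - current_time))
    segments = []
    played_until = current_time
    for start in starts[first:]:
        seg_start = max(start, played_until)  # drop frames that overlap earlier playback
        seg_end = start + play_len
        if seg_start < seg_end:
            segments.append((seg_start, seg_end))
            played_until = seg_end
    return segments


print(digest_segments([12345, 13000, 18001, 18583], current_time=12000, play_len=1000))
# [(12345, 13345), (13345, 14000), (18001, 19001), (19001, 19583)]
```

In this example the second tag starts at 13000, inside the first segment, so only the part after 13345 is played. The time stamps are treated as being in the same unit as the playback length, which is also an assumption.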

[Modifications]
An embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration is not limited to this embodiment, and designs and the like that do not depart from the gist of the invention are also included in the scope of the claims.

The functions of the mobile phone described above, and its other functions, can of course be implemented in hardware; they can also be implemented by loading a computer program that provides those functions into the memory of a computer. This computer program is stored on a recording medium, such as a semiconductor memory, provided in the mobile phone. The functions described above are realized when the program is loaded from the recording medium into the memory of the computer built into the mobile phone and executed.

DESCRIPTION OF SYMBOLS
1 Mobile phone
10 Control unit
20 Communication unit
22 Image input/output unit
24 Camera unit
26 Display unit
30 Image processing unit
32 Image encoder unit
34 Image decoder unit
40 Audio input/output unit
42 Microphone
44 Speaker
50 Audio processing unit
52 Audio encoder unit
54 Audio decoder unit
60 Storage unit
62 Reference image feature DB
64 Moving image storage
70 Moving image processing unit
72 Moving image data storage unit
74 Moving image data expansion unit
80 Image analysis unit
82 Feature amount calculation unit
84 Feature amount comparison unit
86 Tag storage unit
88 Tag reading unit
90 Playback instruction unit

Claims

1. A mobile phone comprising a moving image data storage unit that creates moving image data from an image input from an image input unit and sound input from an audio input unit and stores the moving image data in a moving image storage, and a moving image data expansion unit that reads the moving image data from the moving image storage, expands the image and sound, and outputs the image to an image output unit and the sound to an audio output unit, the mobile phone further comprising:
a reference image feature storage unit that stores a reference image, a feature amount of the reference image, and a keyword in association with each other;
a feature amount calculation unit that calculates a feature amount of an image included in the moving image data;
a feature amount comparison unit that compares the feature amount calculated by the feature amount calculation unit with the feature amount stored in the reference image feature storage unit; and
a tag information storage unit that, based on a comparison result of the feature amount comparison unit, generates and stores tag information including the keyword of a reference image similar to the image included in the moving image data.

2. The mobile phone according to claim 1, wherein the tag information storage unit stores the tag information further including a start time, in the moving image data, of the image similar to the reference image.

3. The mobile phone according to claim 2, wherein the tag information storage unit stores the tag information further including an end time, in the moving image data, of the image similar to the reference image.

4. The mobile phone according to any one of claims 1 to 3, wherein the tag information storage unit stores, in the tag information, scene information corresponding to the keyword of the reference image.

5. The mobile phone according to claim 1, further comprising a playback instruction unit that instructs the moving image data expansion unit to play back the moving image data based on the tag information.

6. The mobile phone according to claim 5, wherein the playback instruction unit identifies, from the tag information, scenes that include a keyword and instructs the moving image data expansion unit to play back those scenes consecutively.

7. The mobile phone according to claim 5, wherein, in response to a fast-forward instruction from the playback instruction unit, the moving image data expansion unit expands and outputs, based on the tag information and in chronological order, only the first frame of each tag or only the frames of the first few seconds of each tag.

8. A program for causing a computer equipped with an image output device and an audio output device to realize a moving image data storage function of creating moving image data from image data and audio data and storing the moving image data in a moving image storage, and a moving image data expansion function of reading the moving image data from the moving image storage, expanding the image and sound, and outputting the image to the image output device and the sound to the audio output device, the program further causing the computer to realize:
a reference image feature storage function of storing a reference image, a feature amount of the reference image, and a keyword in association with each other;
a feature amount calculation function of calculating a feature amount of an image included in the moving image data;
a feature amount comparison function of comparing the feature amount calculated by the feature amount calculation function with the feature amount stored by the reference image feature storage function; and
a tag information storage function of generating and storing, based on a comparison result of the feature amount comparison function, tag information including the keyword of a reference image similar to the image included in the moving image data.