JP2007323268A - Video providing device

Info

Publication number: JP2007323268A
Application number: JP2006151446A
Authority: JP
Inventors: Yusuke Suzuki; 雄介鈴木
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2006-05-31
Filing date: 2006-05-31
Publication date: 2007-12-13
Anticipated expiration: 2026-05-31
Also published as: JP5114871B2

Abstract

PROBLEM TO BE SOLVED: To provide a video providing device for retrieving and providing a video based on the motion of a user.

SOLUTION: A video providing device 300 provides a motion video corresponding to a physical motion showing a specific concept. The video providing device 300 is provided with a video storage part for storing a plurality of motion videos and characteristic data characterizing each motion video; a user characteristic data acquisition part for acquiring the characteristic data from the physical motion of a user; and a video acquisition part for comparing the characteristic data acquired by the user characteristic data acquisition part with the characteristic data stored in the video storage part, and for acquiring a motion video similar to the physical motion of the user from the video storage part. The user characteristic data acquisition part may use, as characteristic data, an averaged image 360Im generated by averaging the respective pixel values of each of a plurality of static images extracted from one motion video.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a video providing apparatus and, more particularly, to a video providing apparatus that retrieves a video from a database and provides it.

Search devices that retrieve, from a database storing a plurality of videos, a video containing a motion with a specific meaning associated with a word, sentence, or symbol, such as sign language, are used for purposes such as learning. Conventionally, such a search device has required the user to narrow down classifications such as hand position and hand shape by selecting ranges from the displayed candidates with buttons, a mouse, or the like as needed (see, for example, Patent Document 1).

Patent Document 1: JP 2000-3127 A

However, a search method that relies on selecting motion classifications, as described above, requires the user to know the information about the target video fairly accurately. It also makes so-called dynamic search, in which the search key and the search results change dynamically, difficult.

The present invention has been made in view of the above problems, and its object is to provide a new and improved video providing apparatus capable of retrieving and providing a video based on a user's motion.

To solve the above problems, according to one aspect of the present invention, there is provided a video providing apparatus that provides a motion video corresponding to a body motion representing a specific concept. The video providing apparatus includes a video storage unit that stores a plurality of motion videos and feature data characterizing each motion video, a user feature data acquisition unit that acquires feature data from a user's body motion, and a video search unit that compares the feature data acquired by the user feature data acquisition unit with the feature data stored in the video storage unit and acquires, from the video storage unit, a motion video similar to the user's body motion.

According to the present invention, the video providing apparatus includes a video storage unit that stores a plurality of motion videos and feature data characterizing those motion videos. The video providing apparatus also acquires, from a motion performed by the user, feature data characterizing that motion in the same format as the feature data of the motion videos stored in the video storage unit. The video providing apparatus then compares the feature data stored in the video storage unit with the feature data acquired from the user's motion, and acquires the motion video corresponding to the feature data determined to be similar. This makes it possible to obtain a desired motion video from the user's motion without using buttons, a mouse, or the like.

Here, the user feature data acquisition unit may acquire a plurality of still images from one motion video at predetermined time intervals and extract, from each of the acquired still images, the position of a specific part of the body (for example, a hand) in that still image as feature data. The user feature data acquisition unit can also extract the shape of a specific part of the body (for example, a hand) from each of the acquired still images as feature data. In this case, the position and shape of the specific body part change over time, and acquiring feature data at predetermined time intervals captures the change in motion as data.

The user feature data acquisition unit may include a video acquisition unit that acquires the user's body motion as a video, and a feature data extraction unit that extracts, from the acquired video, feature data characterizing the user's body motion. That is, the video acquisition unit films the user in motion, and the feature data is extracted from the resulting video. Alternatively, the user feature data acquisition unit may include a wearable input/output device, such as a data glove, that acquires feature data characterizing the user's body motion when worn by the user. With such a device, feature data can be acquired directly without processing the data.

The video search unit can judge the similarity between the feature data acquired by the user feature data acquisition unit and the feature data stored in the video storage unit using, for example, DP matching.

The user feature data acquisition unit can also create one still image from one motion video and acquire it as the feature data. In this case, the user feature data acquisition unit includes a video acquisition unit that acquires the user's body motion as a video, and a video processing unit that creates a still image from the video acquired by the video acquisition unit. The video search unit compares the still image created by the video processing unit with the still images stored in the video storage unit, and acquires a motion video similar to the user's body motion from the video storage unit.

Here, the video processing unit can extract a plurality of still images from one video acquired by the video acquisition unit, average the pixel values of each pixel across the extracted still images to calculate an average pixel value, and create one averaged image from the calculated average pixel values. In this case, the change over time in the position and shape of a specific body part is collapsed into a single still image, which captures the change in motion as data.
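The averaging step can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: it assumes grayscale frames represented as nested lists of equal size, and the name `average_frames` is hypothetical.

```python
from typing import List

# Assumed representation: a grayscale frame as rows of pixel values.
Frame = List[List[float]]


def average_frames(frames: List[Frame]) -> Frame:
    """Average pixel values position by position across equally sized frames."""
    if not frames:
        raise ValueError("at least one frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must have the same size")
    count = len(frames)
    # Each output pixel is the mean of the pixels at the same (x, y) in every frame.
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(width)]
        for y in range(height)
    ]
```

A pixel that is bright in only some frames ends up with an intermediate value, so the path swept by a moving hand appears as a faint trail in the single averaged image.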

The video providing apparatus may further include an image recognition unit that recognizes, from the video acquired by the video acquisition unit, the region in which a specific part of the user's body is located. When the video storage unit stores the motion videos classified according to the region in which the specific body part of the person in each motion video is located, the image recognition unit determines, based on the recognized region, that only the motion videos belonging to a specific class among those stored in the video storage unit are to be searched. Roughly classifying the stored information in this way excludes motion videos that differ greatly from the user's motion from the search, which speeds up the search processing.
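One way to picture this coarse pre-filtering is shown below. The specification does not say how the regions are defined, so the uniform grid, the function names `region_of` and `candidates_for`, and the dictionary index are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # (row, column) of a coarse grid cell; an assumed classification


def region_of(hand_xy: Tuple[float, float], frame_size: Tuple[int, int],
              rows: int = 2, cols: int = 2) -> Region:
    """Map a hand position to the grid cell that contains it."""
    x, y = hand_xy
    width, height = frame_size
    col = min(int(x / width * cols), cols - 1)
    row = min(int(y / height * rows), rows - 1)
    return row, col


def candidates_for(hand_xy: Tuple[float, float], frame_size: Tuple[int, int],
                   index: Dict[Region, List[str]]) -> List[str]:
    """Return only the video IDs classified under the region of the user's hand."""
    return index.get(region_of(hand_xy, frame_size), [])
```

Only the returned candidates would then go through the more expensive similarity comparison.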

The video search unit can also judge the similarity between the feature data acquired by the user feature data acquisition unit and the feature data stored in the video storage unit by, for example, pattern matching. Specifically, methods such as the sum-of-differences method or the normalized correlation method can be used.
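The two measures named above can be sketched as follows. The specification gives no formulas, so this sketch assumes the sum of absolute differences for the first and the zero-mean form of normalized correlation for the second; the function names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]  # assumed grayscale image as rows of pixel values


def _flatten(image: Image) -> List[float]:
    return [pixel for row in image for pixel in row]


def sum_abs_diff(a: Image, b: Image) -> float:
    """Sum of absolute pixel differences; 0 means identical, larger means less similar."""
    fa, fb = _flatten(a), _flatten(b)
    if len(fa) != len(fb):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(p - q) for p, q in zip(fa, fb))


def normalized_correlation(a: Image, b: Image) -> float:
    """Zero-mean normalized correlation in [-1, 1]; 1 means the same pattern."""
    fa, fb = _flatten(a), _flatten(b)
    if len(fa) != len(fb):
        raise ValueError("images must have the same number of pixels")
    mean_a, mean_b = sum(fa) / len(fa), sum(fb) / len(fb)
    da = [p - mean_a for p in fa]
    db = [q - mean_b for q in fb]
    denom = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    if denom == 0:
        return 0.0  # a flat image has no pattern to correlate with
    return sum(p * q for p, q in zip(da, db)) / denom
```

The difference sum is cheap but sensitive to overall brightness, whereas normalized correlation ignores uniform changes in brightness and contrast.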

Furthermore, the video providing apparatus can include a video display unit capable of displaying at least one of the user's body motion acquired by the user feature data acquisition unit and the motion video acquired by the video acquisition unit. The video display unit may be provided with a motion time display unit that indicates the time during which the user performs the motion.

The video providing apparatus according to the present invention may further include a control unit that controls the display of video on the video display unit in accordance with the feature data acquired by the user feature data acquisition unit. For example, when the position of the user's hand is acquired as the feature data and the control unit determines that the hand is located in a specific area of the video display unit, the control unit sends the video display unit a control command to, for example, enlarge the displayed video. In this way, the control unit controls the display of the video display unit according to the user's feature data.

The motion videos handled by such a video providing apparatus can be, for example, sign language expressing specific meanings. In this case, the display unit may display video representing one or more sign language words. Because sign language conveys meaning through body motion, it is well suited to the video providing apparatus of the present invention: searching, screen operations, and the like can be performed without interrupting the motion.

As described above, the present invention can provide a video providing apparatus capable of retrieving and providing a video based on a user's motion.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numerals, and redundant description is omitted.

The embodiments below describe examples applied to an apparatus, intended for use in learning sign language, that searches a database for sign language motions similar to a motion performed by the user.

(First embodiment)
First, a video providing apparatus according to a first embodiment of the present invention is described with reference to FIGS. 1 to 4. FIG. 1 is a block diagram showing the configuration of the video providing apparatus 100 according to the present embodiment. FIG. 2 is an explanatory diagram of hand position information. FIG. 3 is an explanatory diagram showing an example of the information stored in the search information database 160. FIG. 4 is an explanatory diagram of hand shapes.

As shown in FIG. 1, the video providing apparatus 100 according to the present embodiment includes a video acquisition unit 110, an image processing unit 120, an image recognition unit 130, an information search unit 140, a video display unit 150, a search information database 160, and a video database 170. Reference numeral 10 denotes the user who performs the video search.

The video acquisition unit 110 is a functional unit that films the motion of the user 10 to acquire a motion video; a camera can be used, for example. The video acquisition unit 110 can operate according to externally supplied instructions and can change shooting conditions such as orientation, zoom, and pan/tilt. The video acquired by the video acquisition unit 110 is sent to the image processing unit 120. Depending on its configuration, the video providing apparatus 100 can include one or more video acquisition units 110.

The image processing unit 120 is a functional unit that acquires still images from the video acquired by the video acquisition unit 110 and performs image processing on them. The still images acquired by the image processing unit 120 are sent to the image recognition unit 130.

The image recognition unit 130 is a functional unit that extracts feature data from the still images acquired by the image processing unit 120. The feature data in the present embodiment consists of the center-of-gravity position of the user 10 in the still image, the hand positions, the hand shapes, and the like. For example, the center-of-gravity position and hand positions of the user 10 can be expressed as two-dimensional coordinates in an xy coordinate system laid over the screen on which the user 10 is displayed, with the lower left of FIG. 2 as the origin (0, 0). As shown in FIG. 2, for example, the center-of-gravity position of the user 10 may be the center of gravity G (X_G, Y_G) of the body region of the user 10, and the left and right hand positions may be the centers of gravity of the right-hand and left-hand regions, P_R (X_R, Y_R) and P_L (X_L, Y_L). The feature data acquired by the image recognition unit 130 is sent to the information search unit 140.
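Computing a region's center of gravity in this coordinate system can be sketched as follows. The sketch assumes the body or hand region has already been segmented into a binary mask whose first row is the top of the image; the segmentation itself and the name `centroid` are not from the specification.

```python
from typing import List, Tuple

# Assumed input: 1 where the region (body, right hand, or left hand) is present.
# Row 0 is the TOP image row, as in most image formats.
Mask = List[List[int]]


def centroid(mask: Mask) -> Tuple[float, float]:
    """Center of gravity (x, y) of the region, with the origin at the lower left."""
    height = len(mask)
    xs, ys = [], []
    for row_index, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(height - 1 - row_index)  # flip so that y grows upward
    if not xs:
        raise ValueError("mask contains no region pixels")
    return sum(xs) / len(xs), sum(ys) / len(ys)
```

Applying the same function to the body mask and to each hand mask would yield G, P_R, and P_L respectively.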

The information search unit 140 is a functional unit that searches for motion videos similar to the motion of the user 10. The information search unit 140 accumulates the feature data received from the image recognition unit 130 for a predetermined time corresponding to the duration of one motion video, and acquires from the search information database 160 the video ID of the motion video corresponding to the accumulated feature data. The search information database 160, described later, stores the unique video ID assigned to each motion video, feature data extracted from the motion video in advance for searching, and the like. The video ID of a motion video similar to the motion of the user 10 can be found by comparing the feature data of the user 10 acquired by the image recognition unit 130 with the feature data stored in the search information database 160, using an existing method such as DP (dynamic programming) matching. One or more video IDs are acquired from the search information database 160 at this point. When a plurality of video IDs are acquired, the motion videos can also be ordered for presentation to the user 10 based on, for example, a degree of similarity set to a predetermined value.
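The comparison and ranking described above can be sketched with a basic DP matching routine in the style of dynamic time warping. This is an illustration under assumptions: the per-sample cost (sum of absolute differences), the unconstrained warping path, and the names `dp_distance` and `rank_videos` are not specified in the source.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed sample layout, e.g. (x_right, y_right, x_left, y_left) at one time step.
Feature = Tuple[float, ...]


def frame_cost(a: Feature, b: Feature) -> float:
    """Local distance between two feature samples (assumed: sum of absolute differences)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def dp_distance(query: Sequence[Feature], ref: Sequence[Feature]) -> float:
    """Accumulated cost of the cheapest monotonic alignment of the two sequences."""
    if not query or not ref:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    n, m = len(query), len(ref)
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_cost(query[i - 1], ref[j - 1])
            # Extend the best of: advance the query, advance the reference, or advance both.
            table[i][j] = cost + min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
    return table[n][m]


def rank_videos(query: Sequence[Feature],
                database: Dict[str, Sequence[Feature]]) -> List[Tuple[str, float]]:
    """Return (video ID, distance) pairs ordered from most to least similar."""
    scored = [(video_id, dp_distance(query, feats)) for video_id, feats in database.items()]
    return sorted(scored, key=lambda item: item[1])
```

Because the alignment may stretch or compress time, a user who performs the same sign more slowly than the stored video can still obtain a small distance.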

The video display unit 150 is a functional unit that displays video; a display or the like can be used, for example. The video display unit 150 displays the video of the motion of the user 10 acquired by the video acquisition unit 110, the motion video corresponding to the video ID acquired by the information search unit 140, and the like.

The search information database 160 is a storage unit that stores the data used by the information search unit 140 to search for video IDs, and includes memory such as RAM or a hard disk. The search information database 160 stores at least the video IDs associated with the motion videos stored in the video database 170, and data that the information search unit 140 uses for the search and that can be compared with the feature data representing the motion of the user 10. Here, "data that can be compared with the feature data representing the motion of the user 10" means feature data extracted from the videos stored in the video database 170 in the same format as the feature data of the video of the motion of the user 10. In the case of sign language, for example, the hand positions, hand shapes, and the like at fixed time intervals in the motion video are stored as numerical values.

As shown in FIG. 3, for example, the search information database 160 stores a video ID 161, a Japanese label 162, a file name 163, feature data 164, and the like. The video ID 161 is a unique symbol associated with a motion video stored in the video database 170. The Japanese label 162 is a character string provided to make the content of the motion video easier to understand; it can also be displayed as part of the search result when the retrieved motion video is shown on the video display unit 150. The file name 163 is the file name of the motion video.

The feature data 164 is data covering a fixed period extracted from the motion video; for example, the hand position and hand shape of each of the left and right hands are stored at fixed time intervals. Here, the "hand position" is the position of the hand expressed as a position in two-dimensional coordinates, as described above with reference to FIG. 2. The "hand shape" is the shape of the hand formed by extending or bending the fingers. FIG. 4 shows an example in which hand shapes are associated with symbols. In ordinary sign language, for example, about 80 hand shapes are distinguished. A clenched fist can be assigned 0, the shape with the thumb extended from state 0 can be assigned 10, the shape with the index and middle fingers extended and tilted forward at roughly a right angle can be assigned 65, and so on.
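A record in the search information database could be modeled along the following lines. The field names and class names are hypothetical; only the four kinds of stored items (video ID, Japanese label, file name, feature data) and the example shape codes come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HandSample:
    position: Tuple[float, float]  # (x, y) in the lower-left-origin coordinate system
    shape_code: int                # e.g. 0 = clenched fist, 10 = thumb extended


@dataclass
class FeatureSample:
    """Feature data for both hands at one sampling instant."""
    right: HandSample
    left: HandSample


@dataclass
class SearchRecord:
    video_id: str        # corresponds to video ID 161
    japanese_label: str  # corresponds to Japanese label 162
    file_name: str       # corresponds to file name 163
    features: List[FeatureSample] = field(default_factory=list)  # feature data 164
```

Keeping the samples in time order lets the whole `features` list be passed directly to a sequence comparison.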

The video database 170 is a storage unit that stores the motion videos associated with the video IDs acquired by the information search unit 140, and includes memory such as RAM or a hard disk. The video database 170 stores, for example, video IDs and motion videos. The motion videos stored in the video database 170 are the videos presented to the user 10 through the video display unit 150 as search results.

In a search using the video providing apparatus 100 configured in this way, the search result can be presented to the user 10 by displaying on the video display unit 150, as shown in FIG. 5 for example, an image 155 showing the motion of the user 10 acquired by the video acquisition unit 110 and a motion video 157 obtained by the search of the information search unit 140. Next, the motion video search processing performed by the video providing apparatus 100 is described with reference to FIGS. 6 and 7. FIG. 6 is a flowchart showing the initial setting processing. FIG. 7 is a flowchart showing the motion video search processing performed by the video providing apparatus 100.

<1. Initial setting processing>
The initial setting of the video providing apparatus 100 is processing performed to optimize the quality of the video output from the video acquisition unit 110, and it improves search accuracy. In the initial setting, as shown in FIG. 6, the image recognition unit 130 first extracts and holds feature data for an initial setting template image (hereinafter, "template image") (S101). The template image is a still image of the same posture that the user 10 will be instructed to take, captured under optimal shooting conditions. That is, in step S101, the hand positions and hand shapes in the template image, for example, are recognized.

Next, the user 10 is instructed to take the same posture as the template image, and the video acquisition unit 110 films the posture of the user 10 (S103). In step S103, the user 10 can be prompted to act by displaying the template image on the video display unit 150. The user 10 should preferably hold the posture until the initial setting is completed.

The image processing unit 120 then acquires one image from the video acquired by the video acquisition unit 110 (S105). The acquired image can be, for example, the image at the midpoint of the video's capture time. The image acquired by the image processing unit 120 is sent to the image recognition unit 130.

Feature data is then extracted from the still image acquired by the image processing unit 120 (S107). Like the feature data of the template image extracted in step S101, the feature data extracted in step S107 can be, for example, information on hand positions and hand shapes. A hand position can be expressed numerically as coordinate values in the xy coordinates of the video, and a hand shape as the symbol associated with that shape. The feature data on the user's posture is then compared with the feature data of the template image, and the settings of the video providing apparatus 100 are adjusted.

First, it is determined whether the user is at the center of the image obtained from the video acquired by the video acquisition unit 110 (S109). If the user is determined to be at the center of the image in step S109, the processing proceeds to step S111. If the user is determined to be off-center, the video acquisition unit 110 is adjusted (S113). In this case, an adjustment is made such as rotating the video acquisition unit 110 in the direction opposite to the direction in which the user is displaced from the center of the image. When the adjustment in step S113 is finished, the processing from step S105 is executed again.

Next, it is determined whether the size of the user in the video acquired by the video acquisition unit 110 is appropriate (S111). If the size of the user is determined to be appropriate in step S111, the initial setting processing ends. If the user appears too large or too small and the size is determined to be inappropriate, the video acquisition unit 110 is adjusted (S113). In this case, an adjustment such as a zoom adjustment of the video acquisition unit 110 is made. When the adjustment in step S113 is finished, the processing from step S105 is executed again.

Through the above processing, the video acquisition unit 110 is ultimately set so that it acquires video in which the filmed user appears at the center of the screen at an appropriate size.
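The decision made on each pass through steps S109 and S111 can be sketched as follows. The tolerance and size-ratio thresholds, the returned action names, and the function name `next_adjustment` are assumptions; the specification states only that centering is checked first and size second, with an adjustment (S113) and a retry from S105 when either check fails.

```python
from typing import Optional


def next_adjustment(user_x: float, user_height: float,
                    frame_width: int, frame_height: int,
                    center_tol: float = 0.1,
                    min_ratio: float = 0.5, max_ratio: float = 0.9) -> Optional[str]:
    """Return the adjustment needed before retrying from S105, or None when setup is done."""
    # S109: is the user horizontally centered (within an assumed tolerance)?
    offset = user_x - frame_width / 2
    if abs(offset) > center_tol * frame_width:
        return "recenter"
    # S111: does the user occupy an appropriate share of the frame height (assumed range)?
    ratio = user_height / frame_height
    if ratio > max_ratio:
        return "zoom_out"
    if ratio < min_ratio:
        return "zoom_in"
    return None
```

A caller would apply the returned adjustment to the camera, capture a new still image, and call the function again until it returns `None`.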

The initial setting processing of the video providing apparatus 100 has been described above. In this processing, a still image is acquired from the video acquired by the video acquisition unit 110, and recognition processing extracts feature data from that still image. Sending the video acquisition unit 110 an instruction to correct the shooting conditions based on the recognition result then improves the quality of the video output from the video acquisition unit 110.

The initial setting method is not limited to the one described above. For example, the feature data of the template image and of the video acquired by the video acquisition unit 110 need not cover the whole video and may relate to only part of it. The feature data of the video acquired by the video acquisition unit 110 may also be data expanded into the frequency domain, data in which the number of pixels of the image has been reduced by mosaic processing or by taking intermediate values, or the result of further processing such data.

<2. Motion video search processing>
Next, the motion video search processing in the video providing apparatus 100 according to the present embodiment is described with reference to FIG. 7. First, the video acquisition unit 110 acquires the user's motion as a video (S115). The video acquired by the video acquisition unit 110 is sent to the image processing unit 120.

Next, the image processing unit 120 acquires still images from the video received from the video acquisition unit 110 (S117). For example, the image processing unit 120 acquires a plurality of still images from one video at predetermined time intervals. The still images acquired in this way are transmitted to the image recognition unit 130.
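As a non-limiting illustration, the fixed-interval sampling of step S117 can be sketched as follows. The sketch assumes that the video is available as an in-memory sequence of frames with a known frame rate; the function name `sample_stills` and its parameters are hypothetical and do not appear in the specification.

```python
def sample_stills(frames, fps, interval_s):
    """Return the frames taken at every interval_s seconds.

    frames     -- sequence of frames, assumed to be ordered in time
    fps        -- frame rate of the source video (assumed to be known)
    interval_s -- sampling interval in seconds (the "predetermined time")
    """
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    # Number of frames between two consecutive samples (at least 1).
    step = max(1, round(fps * interval_s))
    return list(frames[::step])
```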

The image recognition unit 130 then acquires feature data for each of the plurality of images received from the image processing unit 120 (S119). The information acquired in step S119 is the information that is compared with the feature data of the motion videos stored in the search information database 160 in order to search for a motion video similar to the user's motion. In the present embodiment, information such as the positions of the user's left and right hands and the hand shapes is acquired. The acquired feature data is transmitted from the image recognition unit 130 to the information search unit 140.

Thereafter, based on the feature data acquired by the image recognition unit 130, the information search unit 140 acquires from the search information database 160 the video ID of a motion video whose content is similar to the user's motion (S121). The information search unit 140 first accumulates the feature data acquired by the image recognition unit 130 for a certain period of time. It then compares the accumulated feature data with the feature data stored in the search information database 160.

Specifically, DP matching can be used, for example. DP matching is a technique widely used in pattern recognition fields such as speech recognition. It can, for example, determine the similarity between two time-series patterns of different lengths. When it is used for recognition, the input time-series pattern is matched against each comparison time-series pattern to be recognized in every way permitted by constraints such as time, and the similarity is calculated for each matching. The value obtained by accumulating the similarity at each time point over a predetermined time range is taken as the distance between the input pattern and the comparison pattern. This distance is calculated between the input pattern and the patterns of all data to be recognized, and the data giving the minimum distance is taken as the recognition result. Because this calculation can be solved efficiently by dynamic programming, the technique is called DP matching.
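As a non-limiting illustration, DP matching of the kind described above can be sketched with the classic dynamic time warping recurrence. The sketch assumes that each time point is a numeric feature vector (for example, hand coordinates) and that the local distance is Euclidean; it applies no time-window constraint. The names `dp_matching_distance` and `recognize` are hypothetical, and the specification does not prescribe this particular recurrence.

```python
import math


def dp_matching_distance(seq_a, seq_b):
    """Cumulative distance between two sequences of feature vectors.

    The sequences may differ in length; the time axis is allowed to
    stretch so that each element is matched to at least one element
    of the other sequence.
    """
    if not seq_a or not seq_b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: minimal cumulative distance aligning the first i
    # elements of seq_a with the first j elements of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_b element repeated
                cost[i][j - 1],      # seq_a element repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognize(query, references):
    """Return the key of the reference sequence with the minimum distance.

    references -- dict mapping a video ID to its feature sequence
    """
    return min(references, key=lambda vid: dp_matching_distance(query, references[vid]))
```

In this sketch a reference that performs the same motion more slowly still yields a distance of zero, which is the property that makes the technique suitable for comparing sequences of different lengths.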

In this way, the video ID of a motion video whose data is highly similar to the feature data representing the user's motion is acquired. One or more video IDs may be acquired; the number may be set in advance, or all videos at or above a predetermined similarity may be returned. The acquired video IDs are transmitted to the video display unit 150. At this time, a Japanese label and a video file name may be transmitted together with each video ID.
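As a non-limiting illustration, the two selection rules mentioned above (a preset number of results, or a similarity threshold) can be sketched as follows. The sketch assumes that similarity is expressed as a distance, so that a smaller value means a closer match; the function name `select_video_ids` and its parameters are hypothetical.

```python
def select_video_ids(distances, max_results=None, max_distance=None):
    """Pick video IDs from a mapping of video ID -> distance.

    max_results  -- keep at most this many of the closest videos
    max_distance -- keep only videos whose distance does not exceed this
    Either limit may be None, in which case it is not applied.
    """
    ranked = sorted(distances.items(), key=lambda item: item[1])
    if max_distance is not None:
        ranked = [(vid, d) for vid, d in ranked if d <= max_distance]
    if max_results is not None:
        ranked = ranked[:max_results]
    return [vid for vid, _ in ranked]
```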

Next, based on the video ID acquired by the information search unit 140, the video display unit 150 acquires the motion video associated with that video ID from the video database 170 (S123). The acquired video is then displayed on the video display unit 150 (S125). If the video display unit 150 has also received a Japanese label from the information search unit 140, the Japanese label may be displayed on the video display unit 150 as well. In this way, the search result can be presented to the user.

The video providing apparatus 100 according to the first embodiment has been described above. According to the present embodiment, the motion of the user 10 is input as search information through the video acquisition unit 110, such as a camera, without the hand position or hand shape having to be selected explicitly with an input device such as a mouse. A motion video whose content is similar to the motion of the user 10 can therefore be searched for easily and provided to the user 10.

(Second Embodiment)
Next, a video providing apparatus according to the second embodiment of the present invention will be described with reference to FIGS. 8 and 9. FIG. 8 is a block diagram showing the configuration of the video providing apparatus 200 according to the present embodiment. FIG. 9 is a flowchart showing the video search processing according to the present embodiment.

The video providing apparatus 200 according to the present embodiment includes a user information acquisition unit 210, an information search unit 220, a video display unit 230, a search information database 240, and a video database 250. It differs from the first embodiment in that it includes, instead of the video acquisition unit 110, the user information acquisition unit 210 for acquiring the motion of the user 10. Reference numeral 20 in FIG. 8 denotes the hand of the user 10.

The user information acquisition unit 210 is a functional unit that acquires information representing the motion of the user 10 (feature data) by a method other than video; for example, a data glove (a glove-shaped input/output device) can be used. When the user 10 moves with the data glove worn on the hand 20, hand-related information used as feature data, such as the hand position and hand shape, can be acquired directly. The image processing by the image processing unit 120 and the feature data acquisition by the image recognition unit 130 of the first embodiment are therefore unnecessary. The feature data acquired by the user information acquisition unit 210 is transmitted to the information search unit 220.

The information search unit 220, the video display unit 230, the search information database 240, and the video database 250 correspond respectively to the information search unit 140, the video display unit 150, the search information database 160, and the video database 170 of the first embodiment and have the same functions, so their description is omitted here.

Next, the motion video search processing performed by the video providing apparatus 200 according to the present embodiment will be described. Detailed description of processing that is the same as in the first embodiment is omitted.

<Motion video search processing>
First, as shown in FIG. 9, the user's motion is acquired by the user information acquisition unit 210 (S201). When the user 10 moves while wearing the user information acquisition unit 210, for example a data glove, information on the hand position and hand shape of the user 10 is acquired as feature data at regular intervals. The feature data acquired by the user information acquisition unit 210 is transmitted to the information search unit 220. Initial setting processing for the user information acquisition unit 210 may be performed as necessary.

Next, based on the feature data acquired by the user information acquisition unit 210, the information search unit 220 acquires from the search information database 240 the video ID of a motion video whose content is similar to the user's motion (S203). The information search unit 220 first accumulates the feature data acquired by the user information acquisition unit 210 for a certain period of time. It then compares the accumulated feature data with the feature data stored in the search information database 240. Specifically, DP matching can be used, for example. One or more video IDs may be acquired; the number may be set in advance, or all videos at or above a predetermined similarity may be returned. The acquired video IDs are transmitted to the video display unit 230. At this time, a Japanese label and a video file name may be transmitted together with each video ID.

Further, based on the video ID acquired by the information search unit 220, the video display unit 230 acquires the motion video associated with that video ID from the video database 250 (S205). The acquired video is then displayed on the video display unit 230 (S207). If the video display unit 230 has also received a Japanese label from the information search unit 220, the Japanese label may be displayed on the video display unit 230 as well. In this way, the search result can be presented to the user.

The video providing apparatus 200 according to the second embodiment has been described above. According to the present embodiment, feature data representing the motion of the user 10 is input as search information through the user information acquisition unit 210, without the hand position or hand shape having to be selected explicitly with an input device such as a mouse. A motion video whose content is similar to the motion of the user 10 can therefore be searched for easily and provided to the user 10.

Furthermore, unlike the first embodiment, the video providing apparatus 200 according to the present embodiment acquires the feature data representing the user's motion directly through the user information acquisition unit 210 rather than from video. In the first embodiment, the video retrieved by way of the search information database and provided to the user may vary with the accuracy of the image processing and of the hand position and hand shape extraction performed during image recognition. In the present embodiment, by contrast, feature data representing the user's motion can be acquired with consistent accuracy, independently of the accuracy of image processing and image recognition. Consequently, even when the background behind the user is so complex that the user's position information and the like are difficult to extract accurately from video, the feature data can be acquired without image processing, and the search can be performed accurately.

(Third Embodiment)
Next, a video providing apparatus 300 according to the third embodiment will be described with reference to FIGS. 10 to 13. FIG. 10 is a block diagram showing the configuration of the video providing apparatus 300 according to the present embodiment. FIG. 11 is a flowchart showing a method for generating an averaged image. FIG. 12 is an explanatory diagram for explaining the method for generating an averaged image. FIG. 13 is a flowchart showing the motion video search processing performed by the video providing apparatus 300 according to the present embodiment.

As shown in FIG. 10, the video providing apparatus 300 according to the present embodiment includes a video acquisition unit 310, a video processing unit 320, an image search unit 330, a video display unit 340, an averaged image database 350, and a video database 360. The video acquisition unit 310, the image search unit 330, the video display unit 340, and the video database 360 correspond respectively to the video acquisition unit 110, the information search unit 140, the video display unit 150, and the video database 170 of the first embodiment. Detailed description of the functions they share is therefore omitted.

The video processing unit 320 is a functional unit that processes the video acquired by the video acquisition unit 310 and creates an image in the format used by the search processing of the image search unit 330. The video processing unit 320 of the present embodiment creates an averaged image, described later, from the video acquired by the video acquisition unit 310 and transmits it to the image search unit 330 as the feature data used to search the averaged images.

The video display unit 340 is a functional unit that displays video; for example, a display can be used. As shown in FIG. 5, the video display unit 340 displays the video of the motion of the user 10 acquired by the video acquisition unit 310, the motion video corresponding to the video ID acquired by the image search unit 330, and the like. It can also be provided with an operation time instruction unit 153 that indicates to the user 10 the period during which the motion should be performed. The operation time instruction unit 153 can be realized, for example, as a bar that fills or moves across the area of the operation time instruction unit 153 as time passes. Instructing the user 10 in this way makes it possible to define the start and end of the motion captured by the video acquisition unit 310.

The averaged image database 350 is a storage unit that stores averaged images created from the motion videos stored in the video database 360 by the averaged image creation method described later. Like the motion videos stored in the video database 360, each averaged image is stored in association with the video ID that identifies its motion video. A Japanese label may also be stored in association with each averaged image.

The video providing apparatus 300 is characterized in that it compares the averaged image that the video processing unit 320 creates from the video acquired by the video acquisition unit 310 with the averaged images created for the motion videos in the video database 360 and, based on the result, acquires from the video database 360 a motion video whose content is similar to the motion of the user 10. The averaged image creation processing and the motion video search processing performed by the video providing apparatus 300 are described in detail below.

<1. Averaged image creation processing>
In the present embodiment, an averaged image is a still image created by averaging, at each coordinate, the values of the still images that make up a video over time. In the averaged image creation processing, the video processing unit 320 first acquires a plurality of still images from the video acquired by the video acquisition unit 310 (S3091). For example, suppose that there is a video 350 in which one motion has been captured, as shown in FIG. 12. The video processing unit 320 extracts the still image at each time from the video 350 and removes the background from each extracted still image. In FIG. 12, the still images from which the background has been removed are denoted 350Im (_1 to _N, where N is a positive integer).

Next, the pixel values at the same coordinates in the still images 350Im are averaged (S3093). If the pixel value at a given coordinate in each still image 350Im of FIG. 12 is denoted P(x, y) (_1 to _N, where N is a positive integer), the average of these pixel values (hereinafter referred to as the "average pixel value") is expressed by Equation 1 below.
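A form of Equation 1 consistent with the definitions above is the following reconstruction. The symbol for the average pixel value and the index i, which runs over the N still images, are assumed notation.

```latex
\bar{P}(x, y) = \frac{1}{N} \sum_{i=1}^{N} P_i(x, y)
```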

In this way, the average pixel value is calculated for each pixel constituting the still images 350Im.

Thereafter, an averaged image 360Im is created from the average pixel values calculated for the individual pixels in step S3093 (S3095). In other words, the averaged image 360Im is an image created by folding the time axis of the video into a single frame, and it contains information such as changes in hand position, changes in hand shape, and the pace of those changes. In this way, a single still image (the averaged image) that incorporates the change in motion over time can be created.
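As a non-limiting illustration, steps S3093 and S3095 can be sketched as follows. The sketch assumes grayscale still images of equal size, represented as nested lists of numeric pixel values, from which the background has already been removed; the function name `make_averaged_image` is hypothetical.

```python
def make_averaged_image(stills):
    """Average N equally sized grayscale images pixel by pixel.

    stills -- non-empty list of images; each image is a list of rows,
              and each row is a list of numeric pixel values
    """
    if not stills:
        raise ValueError("at least one still image is required")
    height, width = len(stills[0]), len(stills[0][0])
    for img in stills:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all still images must have the same size")
    n = len(stills)
    # For each coordinate (x, y), take the mean over all still images.
    return [
        [sum(img[y][x] for img in stills) / n for x in range(width)]
        for y in range(height)
    ]
```

Under these assumptions, a pixel that the hand covers in every still image keeps its full value, whereas a pixel the hand crosses only briefly receives a proportionally smaller value, so the result reflects how long the hand stayed at each position.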

<2. Motion video search processing>
Next, the motion video search processing in the video providing apparatus 300 according to the present embodiment will be described with reference to FIG. 13. The initial setting processing described in the first embodiment may be performed before the search processing of the present embodiment.

First, the video providing apparatus 300 instructs the user 10 to start the motion (S301). As shown in FIG. 5, for example, this instruction can be given by providing the operation time instruction unit 153 in the video display unit 340 and starting to display the bar in the operation time instruction unit 153. After giving the instruction to start, the video providing apparatus 300 starts capturing the user 10 with the video acquisition unit 310 (S303).

Having received the instruction to start, the user 10 begins the motion (S305). After a predetermined time has elapsed, the video providing apparatus 300 instructs the user 10 to end the motion (S307). This instruction can be given, for example, by the bar having filled the area of the operation time instruction unit 153 provided in the video display unit 340. After giving the instruction to end, the video providing apparatus 300 stops capturing the user 10 with the video acquisition unit 310. The video of the motion of the user 10 acquired in this way is transmitted from the video acquisition unit 310 to the video processing unit 320.

Next, the video processing unit 320 creates an averaged image from the received video (S309). In step S309, the averaged image used in the search processing by the image search unit 330 is created, for example, by the averaged image creation processing described above. The algorithm for obtaining the averaged image is not limited to this example; any method that provides a similar effect may be used, such as one based on a logical OR. The averaged image created by the video processing unit 320 is transmitted to the image search unit 330.
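As a non-limiting illustration, one way a logical OR could be used in place of averaging is sketched below. The specification gives no details of this variant, so the sketch assumes that each still image has been reduced to a binary foreground mask (1 where the hand is present, 0 elsewhere); the function name `make_or_image` is hypothetical.

```python
def make_or_image(masks):
    """Combine equally sized binary masks with a per-pixel logical OR.

    masks -- non-empty list of masks; each mask is a list of rows of 0/1
    """
    if not masks:
        raise ValueError("at least one mask is required")
    height, width = len(masks[0]), len(masks[0][0])
    for mask in masks:
        if len(mask) != height or any(len(row) != width for row in mask):
            raise ValueError("all masks must have the same size")
    # A pixel is set if the foreground covered it in any still image.
    return [
        [int(any(mask[y][x] for mask in masks)) for x in range(width)]
        for y in range(height)
    ]
```

An image built this way records every position the hand passed through but, unlike the average, does not record how long the hand stayed there.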

Further, the image search unit 330 searches the averaged image database 350 based on the received averaged image and acquires the video ID corresponding to an image similar to it (S311). The image search unit 330 compares the averaged image created from the video of the motion of the user 10 with the averaged images of the motion videos stored in the averaged image database 350. For each averaged image in the averaged image database 350 that is determined to be highly similar, it acquires the video ID associated with that averaged image.

The image search in step S311 can use existing methods commonly employed in pattern matching, such as the sum-of-differences method or the normalized correlation method. Similarity between images can be judged, for example, by calculating the sum of the differences between pixel values at each coordinate of the two images and treating a smaller sum as indicating a higher similarity. The image search unit 330 then selects one or more highly similar images, acquires the video IDs associated with the selected images, and transmits them to the video display unit 340. The images may be selected, for example, by taking a predetermined number in descending order of similarity, or by taking those whose similarity is at or above a predetermined value.
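As a non-limiting illustration, the difference-sum comparison can be sketched as follows. The sketch assumes that the absolute value of each pixel difference is summed and that the averaged images are nested lists of equal size keyed by video ID; the names `sum_abs_diff` and `rank_by_similarity` are hypothetical.

```python
def sum_abs_diff(img_a, img_b):
    """Sum of absolute per-pixel differences between two images."""
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same size")
    return sum(
        abs(a - b)
        for row_a, row_b in zip(img_a, img_b)
        for a, b in zip(row_a, row_b)
    )


def rank_by_similarity(query, database):
    """Return video IDs ordered from most to least similar.

    database -- dict mapping a video ID to its averaged image
    A smaller difference sum is treated as a higher similarity.
    """
    return sorted(database, key=lambda vid: sum_abs_diff(query, database[vid]))
```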

Thereafter, the video display unit 340 acquires the motion video corresponding to the received video ID from the video database 360 and displays it (S315). If the video display unit 340 has also received a Japanese label from the image search unit 330, the Japanese label may be displayed on the video display unit 340 as well. In this way, the search result can be presented to the user.

The video providing apparatus 300 according to the third embodiment has been described above. The present embodiment is characterized in that, when searching for a video, feature data is not obtained by recognition processing from the video acquired by the video acquisition unit. Instead, the video processing unit 320 creates an averaged image, which is compared with the averaged images created for the motion videos stored in the video database 360, so that the similarity search is performed between averaged images.

According to the present embodiment, the motion of the user 10 is input as search information through the video acquisition unit 310, such as a camera, without the hand position or hand shape having to be selected explicitly with an input device such as a mouse. A motion video whose content is similar to the motion of the user 10 can therefore be searched for easily and provided to the user 10.

Furthermore, because the video search uses averaged images, the amount of data to be compared is smaller than in the first embodiment, so the search processing is simple and easy to implement in hardware and to parallelize, which allows the search to be sped up. The search is also less susceptible to errors, so the accuracy of the search results can be improved. Compared with the second embodiment, no special device such as a data glove needs to be prepared to acquire information on the user 10, so the apparatus can be configured easily.

(Fourth Embodiment)
Next, a video providing apparatus 400 according to the fourth embodiment will be described with reference to FIGS. 14 and 15. FIG. 14 is a block diagram showing the configuration of the video providing apparatus 400 according to the present embodiment. FIG. 15 is a flowchart showing the motion video search processing performed by the video providing apparatus 400 according to the present embodiment.

As shown in FIG. 14, the video providing apparatus 400 according to the present embodiment includes a video acquisition unit 410, a video processing unit 420, an image processing unit 430, an image recognition unit 440, an image search unit 450, a video display unit 460, an averaged image database 470, and a video database 480. The video acquisition unit 410, the video processing unit 420, the image search unit 450, the video display unit 460, the averaged image database 470, and the video database 480 correspond respectively to the video acquisition unit 310, the video processing unit 320, the image search unit 330, the video display unit 340, the averaged image database 350, and the video database 360 of the third embodiment. Detailed description of the functions they share is therefore omitted.

Compared with the third embodiment, the video providing apparatus 400 according to the present embodiment is characterized in that the averaged image database 470 is divided into a plurality of databases and the averaged images stored in them are broadly classified based on, for example, the position of the hand of the person performing the motion. The averaged image databases 470 can be classified, for example, by the position at which the hand of the person performing the motion stays most often. Each of the averaged image databases 470 is assigned a database ID that distinguishes it from the others.

The image processing unit 430 is a functional unit that acquires, from the video acquired by the video acquisition unit 410, a still image used to identify the averaged image database 470 to be searched. In the present embodiment, for example, it acquires the image at the midpoint between the instruction to start the motion and the instruction to end the motion given to the user 10 by the video providing apparatus 400. The still image acquired by the image processing unit 430 is transmitted to the image recognition unit 440.

The image recognition unit 440 is a functional unit that performs image recognition processing on the image acquired by the image processing unit 430 in order to identify the averaged image database 470 to be searched. The image recognition unit 440 recognizes the image according to the classification criterion of the averaged image databases 470. For example, when the averaged image databases 470 are classified by the position at which a person's hand stays, the image recognition unit 440 recognizes the hand position in the image acquired by the image processing unit 430 and selects the averaged image database 470 whose classification condition is closest to that hand position. The image recognition unit 440 transmits the database ID of the selected averaged image database 470 to the image search unit 450.
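As a non-limiting illustration, the selection of the averaged image database to be searched can be sketched as follows. The sketch assumes that each database is characterized by a single representative hand position and that the closest one, by Euclidean distance, is chosen; the function name `choose_database` and the example database IDs are hypothetical.

```python
import math


def choose_database(hand_xy, dwell_positions):
    """Return the ID of the database whose dwell position is nearest.

    hand_xy         -- (x, y) hand position recognized in the still image
    dwell_positions -- dict mapping a database ID to its representative
                       (x, y) dwell position (assumed classification rule)
    """
    if not dwell_positions:
        raise ValueError("no databases defined")
    return min(
        dwell_positions,
        key=lambda db_id: math.dist(hand_xy, dwell_positions[db_id]),
    )


# Example with three hypothetical databases split by vertical hand position.
databases = {"face": (160, 60), "chest": (160, 160), "waist": (160, 240)}
selected = choose_database((150, 170), databases)
```

Restricting the subsequent averaged image comparison to the selected database reduces the number of images that have to be compared.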

When searching for an averaged image similar to the averaged image that the video processing unit 420 created by processing the video acquired by the video acquisition unit 410, the video providing apparatus 400 configured in this way searches only one of the plurality of averaged image databases 470, which have been broadly classified in advance according to predetermined conditions. The motion video search process performed by the video providing apparatus 400 according to the present embodiment is described below. Detailed description of processing that is the same as in the third embodiment is omitted.

<Motion video search process>
First, the video providing apparatus 400 instructs the user 10 to start the motion (S401). After giving the start instruction, the video providing apparatus 400 starts capturing the user 10 with the video acquisition unit 410 (S403).

On receiving the start instruction, the user 10 starts the motion (S405). After a predetermined time has elapsed, the video providing apparatus 400 instructs the user 10 to end the motion (S407). After giving the end instruction, the video providing apparatus 400 stops capturing the user 10 with the video acquisition unit 410. The video of the motion of the user 10 acquired in this way is transmitted from the video acquisition unit 410 to the video processing unit 420 and the image processing unit 430.

Next, the video processing unit 420 creates an averaged image from the received video (S409). In step S409, the averaged image used in the search process by the image search unit 450 is created by the averaged image creation process, for example in the same way as in the third embodiment. The algorithm for obtaining the averaged image is not limited to this example; any method that achieves the same effect may be used. The averaged image created by the video processing unit 420 is transmitted to the image search unit 450.
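As a rough illustration of step S409, the sketch below averages the pixel values of several still images taken from one video, which is how the claims and abstract describe the averaged image. The grayscale list-of-lists frame representation, the even sampling interval, and the function names `sample_frames` and `averaged_image` are assumptions made for this example, not details given in the text.

```python
from typing import List, Sequence

# Assumed representation: a grayscale frame is a list of rows of pixel values.
Frame = List[List[float]]


def sample_frames(frames: Sequence[Frame], count: int) -> List[Frame]:
    """Pick up to `count` frames at roughly even intervals (hypothetical helper)."""
    if count <= 0 or not frames:
        raise ValueError("need at least one frame and a positive count")
    count = min(count, len(frames))
    step = len(frames) / count
    return [frames[int(i * step)] for i in range(count)]


def averaged_image(frames: Sequence[Frame]) -> Frame:
    """Average the value of each pixel position over all given frames."""
    if not frames:
        raise ValueError("no frames to average")
    height, width = len(frames[0]), len(frames[0][0])
    totals = [[0.0] * width for _ in range(height)]
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must have the same size")
        for y in range(height):
            for x in range(width):
                totals[y][x] += frame[y][x]
    n = len(frames)
    return [[value / n for value in row] for row in totals]
```

Because each output pixel depends only on the same pixel position in the input frames, the computation is easy to split across pixels, which is consistent with the remark that averaged-image processing lends itself to parallelization.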

Meanwhile, the image processing unit 430 acquires, from the received video, a still image for identifying the averaged image database 470 to be searched (S411). In the present embodiment, for example, it acquires the image at the midpoint between the motion start instruction and the motion end instruction that the video providing apparatus 400 gives to the user 10. The image processing unit 430 then applies preprocessing to this image, such as noise removal with a smoothing filter, in order to improve recognition accuracy (S413), and transmits the result to the image recognition unit 440.
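Steps S411 and S413 can be sketched as follows. This is only an illustration under stated assumptions: the frames are assumed to be captured at a constant rate between the start and end instructions, so the middle list element approximates the midpoint image, and a 3x3 mean filter stands in for the unspecified smoothing filter. The names `middle_frame` and `box_smooth` are hypothetical.

```python
from typing import List, Sequence

Image = List[List[float]]  # assumed grayscale image: rows of pixel values


def middle_frame(frames: Sequence[Image]) -> Image:
    """Return the frame closest to the midpoint of the capture (S411)."""
    if not frames:
        raise ValueError("no frames captured")
    return frames[len(frames) // 2]


def box_smooth(image: Image) -> Image:
    """Apply a 3x3 mean filter; border pixels use only the neighbours that exist (S413)."""
    height, width = len(image), len(image[0])
    smoothed: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(neighbours) / len(neighbours))
        smoothed.append(row)
    return smoothed
```

A mean filter is one of the simplest smoothing filters; a Gaussian or median filter could be substituted without changing the surrounding flow.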

The image recognition unit 440 obtains information such as the position and shape of the hand from the received image at the intermediate time point (S415). For example, in the sign language expression with the Japanese label "skirt", the hands move mostly in the lower part of the body (around the abdomen). The image recognition unit 440 therefore recognizes, from the video of the user 10 performing the motion meaning "skirt", that the hand stays in the lower part of the body. The hand position can be recognized from information such as the color of the hand, the center of gravity of the hand, and the shape of the hand. The image recognition unit 440 then identifies the averaged image database 470 that stores averaged images in which the hand stays mainly in the lower part of the body, and transmits the database ID assigned to that averaged image database 470 to the image search unit 450 (S417).
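One possible reading of steps S415 and S417 is sketched below. It assumes that a binary hand mask has already been extracted (for example by skin-color detection), takes the center of gravity of the mask as the hand position, and maps the vertical position to one of three horizontal bands. The band layout and the database IDs `DB_UPPER`, `DB_MIDDLE`, and `DB_LOWER` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Mask = List[List[int]]  # assumed binary mask: 1 where a hand pixel was detected


def hand_centroid(mask: Mask) -> Optional[Tuple[float, float]]:
    """Return the (x, y) center of gravity of the hand pixels, or None if there are none."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return cx, cy


def select_database_id(
    mask: Mask,
    band_ids: Sequence[str] = ("DB_UPPER", "DB_MIDDLE", "DB_LOWER"),
) -> Optional[str]:
    """Map the vertical hand position to the ID of the matching database (hypothetical IDs)."""
    centroid = hand_centroid(mask)
    if centroid is None:
        return None
    _, cy = centroid
    band = min(int(cy * len(band_ids) / len(mask)), len(band_ids) - 1)
    return band_ids[band]
```

With this layout, a motion such as "skirt", where the hands stay near the abdomen, would fall in the lowest band, while a motion performed above the head would fall in the top band.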

The image search unit 450 then searches the averaged image database 470 based on the received averaged image and database ID, and obtains the video ID corresponding to an image similar to the averaged image (S419). The image search unit 450 searches only the averaged image database 470 associated with the received database ID. That is, if step S417 identified the averaged image database 470 that stores averaged images in which the hand stays mainly in the lower part of the body, only that averaged image database 470 is searched in step S419. Consequently, the motion video of the sign language expression with the Japanese label "sunny", in which the hands move mostly in the upper part of the body (for example, above the head), is stored in a different averaged image database 470 and is excluded from the search. In this way, the search target can be narrowed down.

The averaged image is then compared with the averaged images of the motion videos stored in the averaged image database 470, and the video ID of an averaged image with a high degree of similarity is obtained. The search process in step S419 is the same as step S313 in the third embodiment, so its details are omitted. The image search unit 450 then transmits the obtained video ID to the video display unit 460.
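The restricted search of step S419 might look like the following sketch. The similarity measure of step S313 is not restated in this section, so mean absolute pixel difference is used here purely as a stand-in, and the in-memory dictionary keyed by database ID, the function names, and the video IDs in the usage example are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]
Entry = Tuple[str, Image]  # (video ID, stored averaged image)


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute pixel difference; smaller means more similar (assumed measure)."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    if count == 0:
        raise ValueError("images are empty")
    return total / count


def search(
    query: Image,
    databases: Dict[str, Sequence[Entry]],
    database_id: str,
    top_k: int = 1,
) -> List[str]:
    """Search only the database selected by `database_id` and return the best video IDs."""
    entries = databases[database_id]
    ranked = sorted(entries, key=lambda entry: mean_abs_diff(query, entry[1]))
    return [video_id for video_id, _ in ranked[:top_k]]
```

For example, with `databases = {"lower": [("v_skirt", [[10, 10]])], "upper": [("v_sunny", [[12, 12]])]}`, calling `search([[12, 12]], databases, "lower")` compares the query only against the entries under `"lower"`, so `"v_sunny"` is never examined even though its stored image is closer.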

Thereafter, the video display unit 460 acquires the motion video corresponding to the received video ID from the video database 480 (S421) and displays it (S423). If the video display unit 460 has also received a Japanese label from the image search unit 450, it may display the Japanese label as well. In this way, the search results can be presented to the user.

The video providing apparatus 400 according to the fourth embodiment has been described above. The present embodiment is characterized in that the averaged image database 470 is divided into a plurality of databases according to predetermined classification conditions, and the averaged image database 470 to be searched is identified among them, thereby limiting the range of databases that are searched.

According to the present embodiment, the motion of the user 10 is input as search information through the video acquisition unit 410, such as a camera, without the hand position or hand shape being explicitly selected with an input device such as a mouse. A motion video whose content is similar to the motion of the user 10 can thus be searched for easily and provided to the user 10.

Furthermore, because the video search process uses averaged images, the search process is simpler than in the first embodiment and is easy to implement in hardware and to parallelize, so the search can be made faster. It is also less susceptible to errors, which can improve the accuracy of the search results. Compared with the second embodiment, no special device such as a data glove is needed to acquire information about the user 10, so the apparatus can be configured easily. In addition, limiting the range of averaged image databases 470 to be searched allows high-speed processing even when a large number of videos are searched.

(Fifth embodiment)
Next, a video providing apparatus 500 according to the fifth embodiment will be described with reference to FIGS. 16 and 17. FIG. 16 is a block diagram showing the configuration of the video providing apparatus 500 according to the present embodiment. FIG. 17 is a flowchart showing the screen operation process performed by the video providing apparatus 500 according to the present embodiment.

The video providing apparatus 500 according to the present embodiment allows operations such as selecting a video displayed on the video display unit 560 to be performed by the motion of the user 10. That is, it is characterized by including a control unit 570 so that the user 10 can interactively operate on the results found by the video search process. The following describes the video providing apparatus 500 as the video providing apparatus 400 according to the fourth embodiment with this function added, but the function can also be provided in the video providing apparatuses 100, 200, and 300 according to the first to third embodiments.

As shown in FIG. 16, the video providing apparatus 500 according to the present embodiment includes a video acquisition unit 510, a video processing unit 520, an image processing unit 530, an image recognition unit 540, an image search unit 550, a video display unit 560, a control unit 570, an averaged image database 580, and a video database 590. The video acquisition unit 510, video processing unit 520, image processing unit 530, image recognition unit 540, image search unit 550, video display unit 560, averaged image database 580, and video database 590 correspond respectively to the video acquisition unit 410, video processing unit 420, image processing unit 430, image recognition unit 440, image search unit 450, video display unit 460, averaged image database 470, and video database 480 of the fourth embodiment. Detailed description of functions that are the same is therefore omitted.

The control unit 570 is a functional unit that controls the video displayed on the video display unit 560 according to the position of the hand of the user 10 recognized by the image recognition unit 540. For example, while the hand of the user 10 is located within a specific region, it performs processing such as enlarging the image in that region, displaying it again, or stopping its display.

The screen operation process of the video providing apparatus 500 having this function is performed as shown in FIG. 17. The screen operation process shown in FIG. 17 is performed after the motion video search process shown in FIG. 15.

<Screen operation process>
First, the image processing unit 530 preprocesses the video acquired by the video acquisition unit 510 and creates an image (S501). The image processing unit 530 applies preprocessing, such as noise removal with a smoothing filter, to the acquired image at the intermediate time point in order to improve recognition accuracy, and transmits the result to the image recognition unit 540.

Next, the image recognition unit 540 recognizes the position of the hand in the received image (S503). The recognized hand position information is transmitted from the image recognition unit 540 to the control unit 570.

On receiving the hand position information, the control unit 570 determines whether to change the display of the video display unit 560 (S505). For example, it determines whether the hand position lies within a specific region of the xy coordinate plane. Here, the specific region can be, for example, the region containing the object to be operated on; specifically, it can be the region of an image to be enlarged or the region where a display button for replaying a video is located.

If it is determined in step S505 that the hand position is within a specific region, a control command corresponding to the hand position is transmitted to the video display unit 560 (S507). For example, when the hand is located in the region of an image to be enlarged, a command to enlarge that image is transmitted to the video display unit 560. On receiving the control command, the video display unit 560 changes its video display method according to the command and redisplays the video (S509). In this way, the display method of the video shown on the video display unit 560 can be changed.
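The determination in step S505 and the command selection in step S507 amount to a hit test of the hand position against screen regions. The sketch below assumes axis-aligned rectangular regions in the xy coordinate plane; the `Region` class, the `command_for_hand` function, and the command names used in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    """An assumed rectangular screen region bound to one control command."""
    left: float
    top: float
    right: float
    bottom: float
    command: str

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def command_for_hand(x: float, y: float, regions: Sequence[Region]) -> Optional[str]:
    """Return the command of the first region containing the hand, or None (no display change)."""
    for region in regions:
        if region.contains(x, y):
            return region.command
    return None
```

For instance, with `regions = [Region(0, 0, 100, 100, "enlarge"), Region(200, 0, 300, 50, "replay")]`, a hand at `(50, 50)` selects `"enlarge"`, while a hand outside every region yields `None`, so the display is left unchanged.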

The video providing apparatus 500 according to the fifth embodiment has been described above. The present embodiment is characterized in that a video providing apparatus with a video search function is provided with a control unit 570 so that the screen display can be operated according to the position of the user's hand. The user 10 can thus operate the video providing apparatus 500 interactively, and can perform video search and display changes without interruption.

Preferred embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is of course not limited to these examples. It is clear that a person skilled in the art could conceive of various changes and modifications within the scope of the claims, and these are naturally understood to belong to the technical scope of the present invention.

For example, the above embodiments were described using sign language videos, but the present invention is not limited to this example and can be applied to any video that carries a meaning and can be associated with a symbol. Examples of such videos include videos of ballet, dance, yoga, and flag semaphore.

In the above embodiments, the video of the user's motion used for the search does not have to be acquired by the video acquisition unit as complete information; it may be a video in which the number of pixels or the number of bits per pixel value has been reduced by methods such as mosaic processing or median-value processing.
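As an illustration of such reduced input, the sketch below shows block-average mosaic processing, which lowers the pixel count, and a simple bit shift, which lowers the number of bits per pixel value. The text does not specify either method in detail, so the block size, the 8-bit source depth, and the function names `mosaic` and `reduce_bits` are assumptions.

```python
from typing import List

def mosaic(image: List[List[float]], block: int) -> List[List[float]]:
    """Replace each block x block tile with its mean value, reducing the pixel count."""
    if block <= 0:
        raise ValueError("block size must be positive")
    height, width = len(image), len(image[0])
    reduced = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            tile = [
                image[y][x]
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            row.append(sum(tile) / len(tile))
        reduced.append(row)
    return reduced


def reduce_bits(image: List[List[int]], bits: int) -> List[List[int]]:
    """Keep only the top `bits` bits of each pixel, assuming 8-bit input values."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    shift = 8 - bits
    return [[value >> shift for value in row] for row in image]
```

Reducing the input in this way shrinks the data that the later averaging and comparison steps have to process, at the cost of some spatial or tonal detail.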

Furthermore, in the above embodiments, the video processing unit does not have to actually create still images from the video acquired by the video acquisition unit; it may instead acquire information only for the pixels that are needed.

In the fourth embodiment, the image recognition unit 440 performed image recognition processing on the still image acquired by the image processing unit 430, but the present invention is not limited to this example. For example, image recognition processing may be performed on the averaged image generated by the video processing unit 420.

The present invention is applicable to video providing apparatuses, and in particular to a video providing apparatus that searches a database for a video and provides it.

FIG. 1 is a block diagram showing the configuration of a video providing apparatus according to the first embodiment of the present invention.
FIG. 2 is an explanatory diagram illustrating hand position information.
FIG. 3 is an explanatory diagram showing an example of the information stored in the search information database.
FIG. 4 is an explanatory diagram illustrating hand shapes.
FIG. 5 is an explanatory diagram showing an example of video display on the video display unit.
FIG. 6 is a flowchart showing the initial setting process.
FIG. 7 is a flowchart showing the motion video search process performed by the video providing apparatus according to the first embodiment.
FIG. 8 is a block diagram showing the configuration of a video providing apparatus according to the second embodiment of the present invention.
FIG. 9 is a flowchart showing the motion video search process performed by the video providing apparatus according to the second embodiment.
FIG. 10 is a block diagram showing the configuration of a video providing apparatus according to the third embodiment of the present invention.
FIG. 11 is a flowchart showing the method of generating an averaged image.
FIG. 12 is an explanatory diagram illustrating the method of generating an averaged image.
FIG. 13 is a flowchart showing the motion video search process performed by the video providing apparatus according to the third embodiment.
FIG. 14 is a block diagram showing the configuration of a video providing apparatus according to the fourth embodiment of the present invention.
FIG. 15 is a flowchart showing the motion video search process performed by the video providing apparatus according to the fourth embodiment.
FIG. 16 is a block diagram showing the configuration of a video providing apparatus according to the fifth embodiment of the present invention.
FIG. 17 is a flowchart showing the screen operation process performed by the video providing apparatus according to the fifth embodiment.

Explanation of symbols

100, 200, 300, 400, 500 Video providing apparatus
110, 310, 410, 510 Video acquisition unit
120, 430, 530 Image processing unit
130, 440, 540 Image recognition unit
140, 220 Information search unit
150, 230, 340, 460, 560 Video display unit
153 Motion time instruction unit
160, 240 Search information database
170, 250, 360, 480, 590 Video database
210 User information acquisition unit
320, 420, 520 Video processing unit
330, 450, 550 Image search unit
350, 470, 580 Averaged image database
360Im Averaged image
570 Control unit

Claims

A video providing apparatus that provides a motion video corresponding to a body motion representing a specific concept, comprising:
a video storage unit that stores a plurality of motion videos and feature data characterizing each motion video;
a user feature data acquisition unit that acquires feature data from the motion of a user's body; and
a video search unit that compares the feature data acquired by the user feature data acquisition unit with the feature data stored in the video storage unit, and searches the video storage unit for a motion video similar to the motion of the user's body.

The user feature data acquisition unit acquires a plurality of still images at a predetermined time from one motion video,
The video providing apparatus according to claim 1, wherein a position of a specific part of the body in the still image is extracted as feature data from the acquired plurality of still images.

The video providing apparatus according to claim 1, wherein the user feature data acquisition unit extracts the shape of a specific part of the body as feature data from the plurality of acquired still images.

The video providing apparatus according to claim 1, wherein the user feature data acquisition unit comprises:
a video acquisition unit that acquires the motion of the user's body as a video; and
a feature data extraction unit that extracts, from the acquired video, feature data characterizing the motion of the user's body.

The video providing apparatus according to any one of claims 1 to 3, wherein the user feature data acquisition unit is a data glove that, when worn by the user, can acquire feature data characterizing the motion of the user's body.

The video providing apparatus according to any one of claims 1 to 5, wherein the video search unit uses DP matching to determine the similarity between the feature data acquired by the user feature data acquisition unit and the feature data stored in the video storage unit.

The video providing apparatus according to claim 1, wherein the user feature data acquisition unit generates and acquires one still image from one motion video as the feature data.

The video providing apparatus according to claim 7, wherein the user feature data acquisition unit comprises:
a video acquisition unit that acquires the motion of the user's body as a video; and
a video processing unit that creates the still image from the video acquired by the video acquisition unit,
and wherein the video search unit compares the still image created by the video processing unit with the still images stored in the video storage unit, and acquires from the video storage unit a motion video similar to the motion of the user's body.

The video processing unit extracts a plurality of still images from one video acquired by the video acquisition unit,
For each pixel of the extracted still images, the pixel value is averaged to calculate an average pixel value,
9. The video providing apparatus according to claim 7, wherein one averaged image is created from the calculated average pixel value of each pixel.

An image recognition unit for recognizing a region where a specific part of the user's body is located from the video acquired by the video acquisition unit;
The video storage unit classifies and stores the motion video according to a region where a specific part of a person's body is located in the motion video,
Based on the recognized region, the image recognition unit determines, as the search target, only the motion videos belonging to a specific classification among the motion videos stored in the video storage unit. 10. The video providing apparatus according to any one of 9 above.

8. The video search unit according to claim 7, wherein the similarity between the feature data acquired by the user feature data acquisition unit and the feature data stored in the video storage unit is determined by a pattern matching process. 10. The video providing device according to any one of 10.

The video providing apparatus according to claim 1, further comprising a video display unit capable of displaying at least one of the motion of the user's body acquired by the user feature data acquisition unit and a motion video acquired by the video acquisition unit.

The video providing apparatus according to claim 12, wherein the video display unit includes an operation time display unit indicating a time during which the user performs an operation.

The video providing apparatus according to claim 12, further comprising a control unit that controls the display of video on the video display unit in accordance with the feature data acquired by the user feature data acquisition unit.

The video providing apparatus according to claim 1, wherein the motion video is a sign language expressing a specific meaning.

The video providing apparatus according to claim 15, wherein the video display unit displays a video representing one or more sign language words.