JP2004153765A

JP2004153765A - Meta-data production apparatus and production method

Info

Publication number: JP2004153765A
Application number: JP2002319757A
Authority: JP
Inventors: Mitsuru Yasukata; 満安方; Masaaki Kobayashi; 正明小林; Kenji Matsui; 謙二松井; Hiroyasu Kuwano; 裕康桑野; Mitsuru Endo; 充遠藤; Hiroyuki Sakai; 啓行酒井; Masafumi Shimotashiro; 雅文下田代
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-11-01
Filing date: 2002-11-01
Publication date: 2004-05-27

Abstract

PROBLEM TO BE SOLVED: To provide a meta-data production apparatus and production method that reduce the labor of attaching meta-data to still picture contents when the contents are produced.
SOLUTION: The meta-data production apparatus and production method are provided with an audio input means, an audio recognizing means and a meta-data creating means. Information associated with the contents is input by the audio input means, and the input audio signal is recognized by the audio recognizing means and converted to meta-data by the meta-data creating means. The apparatus is also equipped with an address imparting means, which receives the still picture address information imparted to the still picture contents together with the meta-data and produces meta-data with address information, and a still picture contents / meta-data recording means, which records the produced still picture contents and the meta-data with address information.
COPYRIGHT: (C)2004,JPO

Description

[0001]
[Technical field of the invention]
The present invention relates to a metadata production system and method for use in the production of still image content.
[0002]
[Prior art]
In recent years, it has become common in the production of still image content to attach metadata related to that content.
[0003]
However, such metadata is generally produced by manually typing it into a computer on the basis of the scenario or narration script of the produced still image content, a method that requires considerable labor.
[0004]
[Patent Document 1]
JP-A-9-130736
[0005]
[Problems to be solved by the invention]
An object of the present invention is to solve the above-described problems of the conventional method by providing a system and method for producing metadata in which the information to be used as metadata is entered into a computer by voice input. The entry can take place while the still image content is being produced, immediately after it has been produced, or at a later time unrelated to production, in which case the produced still image content is reproduced for reference.
[0006]
[Means for Solving the Problems]
To solve the above problems, the present invention comprises: reproducing means for the produced still image content; video monitor means for displaying the still image video signal reproduced by the reproducing means; voice input means for picking up with a microphone the operator's utterance of the metadata content to be produced, as confirmed by the operator on the video monitor means; voice recognition means for recognizing the voice signal input by the voice input means; metadata generating means for generating metadata by converting the voice information recognized by the voice recognition means; address assigning means which, in order to associate the still image content with the metadata, receives the still image address information assigned to the still image content together with the metadata and produces metadata with address information; and still image content / metadata recording means for recording the produced still image content and the metadata with address information.
[0007]
As a result, metadata that was conventionally produced by keyboard input can be produced very easily by voice input using voice recognition.
[0008]
[Embodiments of the invention]
The invention according to claim 1 is an apparatus for producing metadata related to still image content, comprising voice input means, voice recognition means, and metadata production means, wherein information related to the still image content is input by the voice input means, the input voice signal is recognized by the voice recognition means, the recognized data is converted into metadata by the metadata production means, and the metadata is recorded together with the still image content using recording means.
[0009]
The invention according to claim 2 is an apparatus for producing metadata related to content, comprising voice input means, voice recognition means, metadata production means, and a dictionary related to the still image content, wherein information related to the still image content is input by the voice input means, the input voice signal is recognized by the voice recognition means with reference to the dictionary related to the content, and the metadata is recorded together with the still image content using recording means.
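The effect of a content-related dictionary can be illustrated with a small sketch. The patent supplies the dictionary to the recognizer itself; the code below instead approximates the idea as a post-processing step that snaps a recognized word to the closest dictionary entry, using Python's standard `difflib`. The dictionary entries, the function name `snap_to_dictionary`, and the cutoff value are all assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical content-related dictionary: words expected for this set of photos.
CONTENT_DICTIONARY = ["Kyoto", "Osaka", "Kinkakuji", "Tanaka", "Suzuki"]


def snap_to_dictionary(recognized_word, dictionary=CONTENT_DICTIONARY, cutoff=0.7):
    """Return the closest dictionary entry, or the word unchanged if none is close.

    This is a post-hoc approximation of dictionary-assisted recognition,
    not the recognizer-internal use of the dictionary described in the text.
    """
    matches = get_close_matches(recognized_word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else recognized_word
```

With this sketch, a slightly misrecognized word such as "Kyotoo" would be mapped to the dictionary entry "Kyoto", while a word with no close entry is passed through unchanged.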
[0010]
The invention according to claim 3 is the metadata production apparatus according to claim 1 or claim 2, further comprising address assigning means that receives the address information assigned to the still image content together with the metadata and generates metadata with address information, whereby the still image content is associated with the generated metadata.
[0011]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of a metadata production apparatus according to Embodiment 1 of the present invention. In FIG. 1, 1 is a camera, 2 is still image content recording and address assigning / recording means, 3 is still image content / metadata recording means, 4 is still image content / metadata reproducing means, 5 is still image content / metadata display means, 6 is a microphone, 7 is voice recognition means, 8 is metadata generating means, 9 is address assigning means, and 10 is a dictionary.
The still image content captured by the camera 1 is supplied to the still image content recording and address assigning / recording means 2. There, the still image content is recorded on a recording medium (not shown) and assigned an address, and the address is also recorded on the recording medium. The recording medium is typically a semiconductor memory but is not limited to one; various recording media, such as magnetic memory, optical recording media, and magneto-optical recording media, can be used. The recorded still image content is supplied to the still image content / metadata recording means 3 via the output terminal 201 and the input terminal 301. Likewise, the address information is supplied to the still image content / metadata recording means 3 via the output terminal 202 and the input terminal 302. The address information is also supplied to the metadata address assigning means 9 (described later) via the output terminal 202 and the input terminal 902.
Meanwhile, information related to the still image captured by the camera 1 is input to the voice recognition means 7 via the microphone 6. This information relates to the captured content and includes, for example, the title, the date and time of shooting, the photographer, the shooting location (where), the person photographed (who), and the object photographed (what).
A dictionary 10 for voice recognition is also supplied to the voice recognition means 7 as needed. The voice data recognized by the voice recognition means 7 is supplied to the metadata generating means 8 and converted into metadata or tags. In general, metadata means a collection of such tags (title, date and time of shooting, photographer, shooting location (where), person photographed (who), object photographed (what), and so on). The metadata or tags generated in this way are supplied to the metadata address assigning means 9 so that they correspond to the content or scene of the still image content itself. The metadata address assigning means 9 attaches to the metadata the address information supplied via the output terminal 202 and the input terminal 902. The metadata with its address attached is then supplied to the still image content / metadata recording means 3 via the output terminal 903 and the input terminal 303.
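The two steps described above, converting recognized speech into tags and attaching the still image address, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the tag names come from the description, but the spoken format ("tag name, then value"), the function names, and the record layout are assumptions.

```python
# Tag names taken from the description; the "name value" phrase format is an assumption.
TAG_NAMES = ("title", "date", "photographer", "place", "person", "object")


def generate_metadata(recognized_phrases):
    """Sketch of metadata generating means 8: build a tag dict from recognized phrases."""
    metadata = {}
    for phrase in recognized_phrases:
        # The first word names the tag; the rest of the phrase is its value.
        name, _, value = phrase.strip().partition(" ")
        if name.lower() in TAG_NAMES and value:
            metadata[name.lower()] = value.strip()
    return metadata


def assign_address(address, metadata):
    """Sketch of address assigning means 9: attach the still image address to the metadata."""
    return {"address": address, "tags": dict(metadata)}
```

Phrases that do not start with a known tag name are ignored in this sketch; a real system would need a policy for such input.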
The still image content / metadata recording means 3 records the still image content and the metadata that share the same address in association with each other.
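Association by a shared address can be sketched as a store keyed on the address. The class name `ContentMetadataStore`, its method names, and the in-memory dictionary are illustrative assumptions; the patent only requires that content and metadata with the same address be recorded together.

```python
class ContentMetadataStore:
    """Sketch of recording means 3: content and metadata are linked by address."""

    def __init__(self):
        self._records = {}  # address -> {"content": ..., "tags": {...}}

    def _slot(self, address):
        # Create an empty record the first time an address is seen.
        return self._records.setdefault(address, {"content": None, "tags": {}})

    def record_content(self, address, content):
        self._slot(address)["content"] = content

    def record_metadata(self, address, tags):
        self._slot(address)["tags"].update(tags)

    def reproduce(self, address):
        """Return the content and metadata recorded under one address."""
        return self._records[address]
```

Because both inputs are keyed on the address, the content and the metadata may arrive in either order.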
To explain more concretely, FIG. 2 shows an example of the result when the still image content and metadata recorded by the still image content / metadata recording means 3 are reproduced by the still image content / metadata reproducing means 4 and displayed on the still image content / metadata display means 5.
In FIG. 2, the screen of the still image content / metadata display means 5 consists of, for example, a still image content display section 501, an address display section 502, and a metadata display area 510. The metadata display area 510 consists of, for example, 1) a title description section 511, 2) a date and time description section 512, 3) a photographer description section 513, 4) a shooting location description section 514, and so on. The contents of these description sections 511 to 514 are the metadata generated by the voice recognition described above.
The description so far covers the case in which metadata is generated without necessarily checking the captured still image content, for example before shooting, at roughly the same time as shooting, or immediately after shooting.
Next, the case in which metadata is generated for still image content that is reproduced and viewed on monitor means, for example when metadata is added to still image content after the fact, is described with reference to FIG. 3. Functions that are the same as in FIG. 1 are not described again.
The still image content captured by the camera 1 is supplied to the still image content recording and address assigning / recording means 2. There, the still image content is recorded on a recording medium (not shown) and assigned an address, and the address is also recorded on the recording medium. This recording medium is then supplied to the still image content / address reproducing means 11. The still image content reproduced by the still image content / address reproducing means 11 is supplied to the monitor means 12. The reproduced address information is likewise supplied to the metadata address assigning means 9 via the output terminal 112 and the input terminal 902. The person in charge of voice input (not shown) checks the still image content shown on the monitor means 12 and speaks the words needed to generate the metadata into the microphone 6. In this way, information related to the still image captured by the camera 1 is input to the voice recognition means 7 via the microphone 6. This information relates to the captured content and includes, for example, the title, the date and time of shooting, the photographer, the shooting location (where), the person photographed (who), and the object photographed (what). The subsequent processing is the same as described for FIG. 1.
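The playback-based flow of FIG. 3 can be sketched as a loop over the recorded items. The function `annotate_on_playback` and its `show` and `listen` parameters are hypothetical stand-ins for the monitor means 12 and for the microphone 6 plus voice recognition means 7; the "name value" phrase format is also an assumption.

```python
def annotate_on_playback(recorded_items, show, listen):
    """Sketch of the FIG. 3 flow.

    recorded_items: iterable of (address, content) pairs from the recording medium.
    show: callable that displays one content item (stands in for monitor means 12).
    listen: callable returning recognized phrases for the item currently shown
            (stands in for microphone 6 and voice recognition means 7).
    """
    results = []
    for address, content in recorded_items:
        show(content)          # the operator checks the image
        phrases = listen()     # the operator speaks the metadata
        tags = {}
        for phrase in phrases:
            name, _, value = phrase.partition(" ")
            if value:
                tags[name] = value
        # The reproduced address is attached to the metadata for this image.
        results.append({"address": address, "tags": tags})
    return results
```

Passing the display and recognition steps in as callables keeps the sketch independent of any particular display or speech recognition library.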
[0012]
In general, voice recognition may produce some recognition errors. When a misrecognition occurs, the produced metadata and tags can be corrected using information processing means such as a computer.
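A correction step of this kind could look like the sketch below. The record layout (`{"address": ..., "tags": {...}}`) and the function name `correct_tag` are assumptions for illustration; the patent does not specify how corrections are made.

```python
def correct_tag(record, tag_name, corrected_value):
    """Return a copy of an address-tagged metadata record with one tag value replaced.

    The original record is left untouched so the misrecognized value can be kept for review.
    """
    if tag_name not in record["tags"]:
        raise KeyError(f"no such tag: {tag_name}")
    fixed_tags = {**record["tags"], tag_name: corrected_value}
    return {**record, "tags": fixed_tags}
```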
[0013]
[Effects of the invention]
As described above, the present invention uses voice recognition of voice input to create metadata or tags related to still image content, and associates the metadata or tags with the address or scene of the still image content. Metadata creation and tagging can therefore be carried out more efficiently than with conventional keyboard input.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the configuration of a metadata production apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing an example of the still image content / metadata display means of the present invention.
FIG. 3 is a block diagram showing the configuration of a metadata production apparatus according to Embodiment 2 of the present invention.
[Description of symbols]
1 Camera
2 Still image content recording and address assigning means
3 Still image content / metadata recording means
4 Still image content / metadata reproducing means
5 Still image content / metadata display means
6 Microphone
7 Voice recognition means
8 Metadata generating means
9 Metadata address assigning means
10 Dictionary
11 Still image content / address reproducing means
12 Monitor means
111 Still image content output terminal
112 Address output terminal
201 Still image content output terminal
202 Address output terminal
301 Video input terminal
302 Address input terminal
303 Metadata / address input terminal
501 Still image content display means
502 Address display means
510 Metadata display area
511 Title display area
512 Date and time display area
513 Photographer display area
514 Location display area
902 Address input terminal
903 Metadata / address output terminal

Claims

An apparatus for producing metadata related to still image content,
comprising voice input means, voice recognition means, and metadata production means,
wherein information related to the still image content is input by the voice input means, the input voice signal is recognized by the voice recognition means, the recognized data is converted into metadata by the metadata production means, and each address of the still image content is associated with the metadata.

An apparatus for producing metadata related to still image content,
comprising voice input means, voice recognition means, metadata production means, and a dictionary related to the still image content,
wherein information related to the still image content is input by the voice input means, the input voice signal is recognized by the voice recognition means with reference to the dictionary related to the still image content, the recognized data is converted into metadata by the metadata production means, and each address of the still image content is associated with the metadata.

A method for producing metadata related to still image content,
using voice input means, voice recognition means, and metadata production means,
wherein information related to the still image content is input by the voice input means, the input voice signal is recognized by the voice recognition means and converted into metadata by the metadata production means, and each address of the still image content is associated with the metadata.

A method for producing metadata related to still image content,
using voice input means, voice recognition means, metadata production means, and a dictionary related to the content,
wherein information related to the content is input by the voice input means, the input voice signal is recognized by the voice recognition means with reference to the dictionary related to the content and converted into metadata by the metadata production means, and each address of the still image content is associated with the metadata.