JP6857983B2

JP6857983B2 - Metadata generation system

Info

Publication number: JP6857983B2
Application number: JP2016165100A
Authority: JP
Inventors: 孝利石井
Original assignee: Jcc株式会社
Priority date: 2016-08-25
Filing date: 2016-08-25
Publication date: 2021-04-14
Anticipated expiration: 2036-08-25
Also published as: JP2018033048A

Description

The present invention relates to a system that generates metadata, and more particularly to a system that generates metadata for television broadcast programs or Internet-distributed videos.

Metadata for television broadcast programs and Internet-distributed videos has become increasingly important. Metadata is not the data itself but information associated with that data, such as the creation date and time, the creator, the data format, the title, and annotations. It is important for managing and searching data efficiently.
For example, the present applicant previously filed for and obtained a patent (Patent Document 1) on a video system comprising recording means for recording television programs broadcast by television stations, metadata storage means for storing metadata that summarizes the program content in association with the video recorded by the recording means, and display means capable of displaying the metadata on a screen. In that system, the user views the metadata displayed on the screen and selects an item as appropriate, whereupon the video corresponding to that metadata is displayed on the screen for viewing.

However, metadata for television broadcast programs has generally been created by hand, which takes time and money. Moreover, metadata, once created, is generally used only for the program in question, so similar information is difficult to reuse and the process is inefficient.
The same circumstances apply not only to television broadcast programs but also to Internet-distributed videos, which have rapidly come into practical use, so metadata for Internet-distributed videos suffers from the same problems.

Patent Document 1: Japanese Patent No. 4227866

The present invention is intended to solve the conventional problems described above, and its object is to provide a system that can create metadata for television broadcast programs or Internet-distributed videos in a short time and reduce personnel costs.

To solve the above problem, the invention according to claim 1 comprises: recording means having a recording file in which video is recorded; character information acquisition means for acquiring character information displayed in the video recorded in the recording file; character information documenting means for aggregating the character information acquired by the character information acquisition means and composing it into sentences; and metadata storage means for storing the character information composed by the character information documenting means in a metadata storage file as metadata of the video recorded in the recording file.
The character information acquisition means has: character information extraction means for performing image analysis on the video recorded in the recording file and extracting character information from the video; video recognition information extraction means for collating a person, a logo, the person's belongings, or the person's facial expression contained in the video against person information, logo information, object information, or facial expression information, and extracting the person, logo, belongings, or facial expression as character information; voice information extraction means for performing voice analysis on the audio recorded together with the video in the recording file and extracting character information from the audio; and composite information collation means for collating with one another the character information extracted by the character information extraction means, the video recognition information extraction means, and the voice information extraction means, respectively.

Here, character information is information on words and sentences that are displayed in the video and relate to the video; the concept includes, for example, the character strings of on-screen captions (telops) displayed in the video.
Accordingly, when video is recorded in the recording file by the recording means, the character information acquisition means acquires the character information displayed in the recorded video, the character information documenting means composes the acquired character information into sentences, and the metadata storage means stores the composed character information in the metadata storage file as metadata of the video.
In addition, when video is recorded in the recording file by the recording means, the character information extraction means extracts character information from the recorded video by image analysis; the video recognition information extraction means collates a person, a logo, the person's belongings, or the person's facial expression contained in the video against person information, logo information, object information, or facial expression information and extracts the person, logo, belongings, or facial expression as character information; the voice information extraction means extracts character information from the audio recorded together with the video by voice analysis; and the composite information collation means collates with one another the character information extracted by each of these three means.
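The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `MetadataStore`, `cross_check`, `to_sentence`, and `generate_metadata`, the similarity threshold, and the use of Python's `difflib` for string similarity are all introduced here for illustration, and the three extraction means are represented only by their outputs.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class MetadataStore:
    """Hypothetical stand-in for the metadata storage file."""
    records: dict = field(default_factory=dict)

    def save(self, recording_id: str, text: str) -> None:
        self.records[recording_id] = text


def cross_check(ocr_words, recognized_labels, speech_words, threshold=0.6):
    """Collate the three extraction results with one another (assumed logic).

    An image-analysis word that does not occur in the speech transcript is
    replaced by the most similar transcript word when the similarity reaches
    `threshold`; otherwise it is kept unchanged.
    """
    checked = []
    for word in ocr_words:
        if not speech_words or word in speech_words:
            checked.append(word)
            continue
        best = max(speech_words,
                   key=lambda s: SequenceMatcher(None, word, s).ratio())
        score = SequenceMatcher(None, word, best).ratio()
        checked.append(best if score >= threshold else word)
    # Recognized persons, logos, objects, etc. are added as further terms.
    for label in recognized_labels:
        if label not in checked:
            checked.append(label)
    return checked


def to_sentence(terms):
    """Very simple stand-in for composing the terms into a sentence."""
    return "Topics shown: " + ", ".join(terms) + "."


def generate_metadata(recording_id, ocr_words, recognized_labels,
                      speech_words, store):
    """Acquire (collate), compose, and store metadata for one recording."""
    terms = cross_check(ocr_words, recognized_labels, speech_words)
    text = to_sentence(terms)
    store.save(recording_id, text)
    return text
```

In this sketch a caption misread as "typh0on" would be replaced by "typhoon" if that word occurs in the speech transcript, which mirrors the correction across extraction results described in the text.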

In the invention according to claim 2, the character information acquisition means has character information extraction means for performing image analysis on the video recorded in the recording file and extracting character information from the video, and dictionary collation means for collating the character information extracted by the character information extraction means against a dictionary file.
Here, the dictionary file contains, in collatable form, dictionary data comprising characters and idioms of the languages of various countries.
Accordingly, the character information extraction means extracts character information from the video recorded in the recording file by image analysis, and the dictionary collation means collates the extracted character information against the dictionary file.
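As a rough illustration of dictionary collation, the sketch below replaces each extracted word that is not in the dictionary with its closest dictionary entry. The function name `collate_with_dictionary`, the cutoff value, and the choice of `difflib.get_close_matches` as the similarity measure are assumptions made for this example; the source does not specify how the collation is performed.

```python
from difflib import get_close_matches


def collate_with_dictionary(extracted_words, dictionary, cutoff=0.75):
    """Collate extracted words against a dictionary (assumed fuzzy matching).

    Words found in the dictionary are kept. Other words are replaced by the
    closest dictionary entry if its similarity is at least `cutoff`;
    otherwise they are left unchanged.
    """
    corrected = []
    for word in extracted_words:
        if word in dictionary:
            corrected.append(word)
            continue
        candidates = get_close_matches(word, dictionary, n=1, cutoff=cutoff)
        corrected.append(candidates[0] if candidates else word)
    return corrected
```

With a dictionary containing "election", a misread "e1ection" would be corrected, while a string with no similar entry would pass through unchanged.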

In the invention according to claim 3, the character information extraction means has image analysis means that performs image analysis by collating the video against an image analysis storage file containing previously analyzed video and the character information extracted from that video.
Here, previously analyzed video means video on which image analysis has been performed in the past, and the character information extracted from it means character information that was correctly extracted from that video as a result of the image analysis.
Accordingly, the image analysis means analyzes the video recorded in the recording file by collating it against the image analysis storage file, which contains previously analyzed video and the character information extracted from that video.

In the invention according to claim 4, the character information extraction means further has image analysis learning means for modifying the image analysis storage file on the basis of the video analyzed by the image analysis means and the character information extracted from that video.
Here, modification is a concept that includes addition and deletion.
Accordingly, the image analysis learning means modifies the image analysis storage file on the basis of the video analyzed by the image analysis means and the character information extracted from that video.
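One way to picture the image analysis storage file and its learning step is as a lookup table of previously analyzed frames that can be added to or deleted from. The sketch below is an assumption-laden simplification: the class `AnalysisStore`, the function `analyze`, and the `ocr` callback are hypothetical, and frames are matched by an exact SHA-256 hash, whereas a real system would presumably match visually similar images.

```python
import hashlib


class AnalysisStore:
    """Hypothetical image analysis storage file: frame -> extracted text."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(frame: bytes) -> str:
        # Exact-match fingerprint; a simplification for illustration only.
        return hashlib.sha256(frame).hexdigest()

    def lookup(self, frame: bytes):
        """Return the stored text for this frame, or None if unknown."""
        return self._entries.get(self._key(frame))

    def learn(self, frame: bytes, text):
        """Modify the store: add or overwrite an entry, or delete it
        when `text` is None."""
        key = self._key(frame)
        if text is None:
            self._entries.pop(key, None)
        else:
            self._entries[key] = text


def analyze(frame: bytes, store: AnalysisStore, ocr):
    """Collate the frame against the store first; fall back to `ocr`
    (any callable taking bytes and returning text) and record the result."""
    cached = store.lookup(frame)
    if cached is not None:
        return cached
    text = ocr(frame)
    store.learn(frame, text)
    return text
```

Calling `store.learn(frame, None)` corresponds to deleting an entry that turned out to be wrong, which is the deletion case of "modification" mentioned above.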

In the invention according to claim 5, the character information acquisition means has character information extraction means for performing image analysis on the video recorded in the recording file and extracting character information from the video, and Internet collation means for collating the character information extracted by the character information extraction means against information retrieved by searching the Internet.
Here, information retrieved by searching the Internet is a concept that includes information obtained from the sites of major newspapers, local newspapers, news distribution companies, television companies, and the like, from dedicated news sites, news aggregation sites, and other general websites, as well as glossary information obtained from online dictionaries and the like.
Accordingly, the character information extraction means extracts character information from the video recorded in the recording file by image analysis, and the Internet collation means collates the extracted character information against information retrieved by searching the Internet.

In the invention according to claim 6, the character information acquisition means further has dictionary update means for modifying the dictionary file on the basis of the character information extracted by the character information extraction means.
Here, modification is a concept that includes addition and deletion.
Accordingly, the dictionary update means modifies the dictionary file on the basis of the character information extracted by the character information extraction means.

In the invention according to claim 7, the dictionary file has dictionary data and a frequency parameter for the dictionary data, and the dictionary collation means preferentially selects dictionary data with a large frequency parameter as the collation target.
Here, the frequency parameter indicates how often the words, idioms, and the like contained in the dictionary data appear in video. Specifically, the dictionary collation means updates the frequency parameter each time it collates character information extracted from video by the character information extraction means against the dictionary file.
Accordingly, the dictionary collation means preferentially selects dictionary data with a large frequency parameter as the collation target and collates the selected dictionary data against the character information extracted by the character information extraction means.
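The frequency-based prioritization might look something like the following sketch. It is one possible reading, not the method defined by the claims: the class `FrequencyDictionary`, the similarity cutoff, the rule "among sufficiently similar entries, pick the one with the largest frequency parameter", and incrementing the parameter by one on each successful collation are all assumptions.

```python
from difflib import SequenceMatcher


class FrequencyDictionary:
    """Hypothetical dictionary file: entry -> frequency parameter."""

    def __init__(self, entries):
        self.freq = dict(entries)

    def collate(self, word, cutoff=0.7):
        """Return the dictionary entry chosen for `word`.

        Entries whose similarity to `word` is at least `cutoff` are
        candidates. The candidate with the largest frequency parameter is
        preferred (similarity breaks ties), and its parameter is then
        incremented. If there is no candidate, `word` is returned unchanged.
        """
        candidates = []
        for entry in self.freq:
            score = SequenceMatcher(None, word, entry).ratio()
            if score >= cutoff:
                candidates.append((entry, score))
        if not candidates:
            return word
        best, _ = max(candidates, key=lambda c: (self.freq[c[0]], c[1]))
        self.freq[best] += 1  # update the frequency parameter on each collation
        return best
```

Preferring frequent entries means that when a misread word resembles several dictionary entries, the one that has appeared most often in past video is chosen, at the cost of occasionally overriding a closer but rarer match.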

In the invention according to claim 8, the character information documenting means refers to the metadata storage file and uses metadata related to the character information acquired by the character information acquisition means in composing the character information into sentences.
Accordingly, when aggregating the character information acquired by the character information acquisition means and composing it into sentences, the character information documenting means can refer to the metadata storage file and use previously created metadata related to that character information.

In the invention according to claim 9, the character information documenting means acquires electronic program guide data for the video recorded in the recording file and uses it in composing the character information into sentences.
Here, electronic program guide data is data containing information such as the broadcast date and time, distribution date and time, genre, title, and performers of broadcast program video broadcast by television stations or of video distributed via the Internet.
Accordingly, when aggregating the character information acquired by the character information acquisition means and composing it into sentences, the character information documenting means can acquire the electronic program guide data for the video and use it in the composition.
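As a small illustration of how electronic program guide data could feed into the composed text, the sketch below merges a few guide fields with the extracted terms. The `EpgEntry` fields mirror the items listed above (date and time, genre, title, performers), but the class, the function `compose_metadata`, and the output sentence format are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EpgEntry:
    """Hypothetical subset of electronic program guide data."""
    title: str
    genre: str
    start: datetime
    performers: list


def compose_metadata(epg: EpgEntry, terms) -> str:
    """Combine program guide fields with extracted terms into one sentence."""
    when = epg.start.strftime("%Y-%m-%d %H:%M")
    cast = ", ".join(epg.performers) if epg.performers else "unknown performers"
    topics = ", ".join(terms)
    return f"{epg.title} ({epg.genre}, {when}) with {cast}: {topics}."
```

The guide data supplies context such as the title and air time that on-screen text alone may not contain.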

In the invention according to claim 10, the character information acquisition means has video recognition information extraction means for collating a person, a logo, the person's belongings, or the person's facial expression contained in the video against person information, logo information, object information, or facial expression information, and extracting the person, logo, belongings, or facial expression contained in the video as character information.
Accordingly, the video recognition information extraction means collates the person, logo, belongings, or facial expression contained in the video against the person information, logo information, object information, or facial expression information, and extracts the person, logo, belongings, or facial expression as character information.

In the invention according to claim 11, the person information, the logo information, the object information, or the facial expression information consists of previously analyzed video and the character information extracted from that video.
Here, previously analyzed video means video on which image analysis has been performed in the past, and the character information extracted from it means character information that was correctly extracted from that video as a result of the image analysis.
Accordingly, the video recognition information extraction means collates the video recorded in the recording file against the person information, logo information, object information, or facial expression information, which contains previously analyzed video and the character information extracted from it, and thereby extracts the person, logo, belongings, or facial expression contained in the video as character information.

In the invention according to claim 12, the character information acquisition means further has video recognition learning means for modifying the person information, the logo information, the object information, or the facial expression information on the basis of the video analyzed by the video recognition information extraction means and the character information extracted from that video.
Here, modification is a concept that includes addition and deletion.
Accordingly, the video recognition learning means modifies the person information, the logo information, the object information, or the facial expression information on the basis of the video analyzed by the video recognition information extraction means and the character information extracted from that video.

In the invention according to claim 13, the character information acquisition means has voice information extraction means for performing voice analysis on the audio recorded together with the video in the recording file and extracting character information from the audio, and dictionary collation means for collating the character information extracted by the voice information extraction means against a dictionary file.
Accordingly, the voice information extraction means extracts character information from the audio recorded together with the video in the recording file by voice analysis, and the dictionary collation means collates the extracted character information against the dictionary file.

In the invention according to claim 14, the voice information extraction means has voice analysis means that performs voice analysis by collating the audio against a voice analysis storage file containing previously analyzed audio and the character information extracted from that audio.
Here, previously analyzed audio means audio on which voice analysis has been performed in the past, and the character information extracted from it means character information that was correctly extracted from that audio as a result of the voice analysis.
Accordingly, the voice analysis means analyzes the audio recorded together with the video in the recording file by collating it against the voice analysis storage file, which contains previously analyzed audio and the character information extracted from that audio.

In the invention according to claim 15, the character information acquisition means further has voice analysis learning means for modifying the voice analysis storage file on the basis of the audio analyzed by the voice analysis means and the character information extracted from that audio.
Here, modification is a concept that includes addition and deletion.
Accordingly, the voice analysis learning means modifies the voice analysis storage file on the basis of the audio analyzed by the voice analysis means and the character information extracted from that audio.

In the invention according to claim 16, the video is broadcast program video broadcast by a television station.

In the invention according to claim 17, the video is video distributed via the Internet.

In the metadata generation system according to claims 1 to 17, when video is recorded in the recording file by the recording means, the character information acquisition means acquires the character information displayed in the recorded video, the character information documenting means composes the acquired character information into sentences, and the metadata storage means stores the composed character information in the metadata storage file as metadata of the video. Metadata for the video can therefore be created automatically and accurately from the character information, which is information on words and sentences that are displayed in the video and relate to it.
As a result, it is possible to provide a system that can create metadata for television broadcast programs or Internet-distributed videos in a short time and reduce personnel costs.

Further, in the metadata generation system according to claim 1, when video is recorded in the recording file by the recording means, the character information extraction means extracts character information from the recorded video by image analysis; the video recognition information extraction means collates a person, a logo, the person's belongings, or the person's facial expression contained in the video against person information, logo information, object information, or facial expression information and extracts the person, logo, belongings, or facial expression as character information; the voice information extraction means extracts character information from the audio recorded together with the video by voice analysis; and the composite information collation means collates with one another the character information extracted by each of these three means.
Accordingly, character information can be extracted efficiently through image analysis, voice analysis, and recognition of the persons, logos, belongings, or facial expressions contained in the video.
Moreover, because the composite information collation means collates with one another the character information extracted by the character information extraction means, the video recognition information extraction means, and the voice information extraction means, characters or words that the character information extraction means misrecognized or could not fully recognize can, for example, be corrected on the basis of the character information extracted by the voice information extraction means.
As a result, it is possible to provide a system that can automatically generate metadata for television broadcast programs or Internet-distributed videos more accurately and efficiently.

In the metadata generation system according to claim 2, the character information extraction means extracts character information from the video recorded in the recording file by image analysis, and the dictionary collation means collates the extracted character information against the dictionary file.
Accordingly, character information can be extracted efficiently from the video by image analysis, and because it is collated against the dictionary file, characters or words that were misrecognized or could not be fully recognized by image analysis can, for example, be corrected on the basis of the dictionary file, improving the accuracy of the character information.

In the metadata generation system according to claim 3, the image analysis means analyzes the video recorded in the recording file by collating it against the image analysis storage file, which contains previously analyzed video and the character information extracted from that video.
Accordingly, image analysis can be performed effectively using the image analysis results accumulated in the past, and as a result the metadata of the video can be created accurately and in a short time.

In the metadata generation system according to claim 4, the image analysis learning means modifies the image analysis storage file on the basis of the video analyzed by the image analysis means and the character information extracted from that video.
Accordingly, the results of the current image analysis can be added to the image analysis storage file, and erroneous information contained in the image analysis storage file can be deleted on the basis of the current results. As a result, the image analysis storage file can be kept up to date at all times.

Further, in the metadata generation system according to claim 5, the character information extraction means extracts character information from the video recorded in the recording file by image analysis, and the Internet collation means collates the extracted character information against information retrieved by searching the Internet.
Accordingly, character information can be extracted efficiently from the video by image analysis, and because it is collated against information retrieved by searching the Internet, characters or words that were misrecognized or could not be fully recognized by image analysis can, for example, be corrected on the basis of that information, improving the accuracy of the character information.

In the metadata generation system according to claim 6, the dictionary update means corrects the dictionary file based on the character information extracted by the character information extraction means. New words, sentences, and other information obtained from the character information can therefore be added to the dictionary file, and erroneous information contained in the dictionary file can be deleted based on the character information. As a result, the dictionary file is updated and can always be used in its latest state.
In the metadata generation system according to claim 7, the dictionary collation means preferentially selects dictionary data with a large frequency parameter as the collation target and collates the selected dictionary data with the character information extracted by the character information extraction means. Thus, for example, when the dictionary file contains several characters or words that resemble one another, the dictionary data with the larger frequency parameter is selected first and becomes the collation target.
As a result, corrections can be made based on the dictionary data with a large frequency parameter, and the accuracy of the character information can be improved more efficiently.
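The frequency-based preference can be sketched as follows. This is only an illustration under stated assumptions: the function name `collate`, the similarity threshold of 0.5, the use of `difflib.SequenceMatcher` as the similarity measure, and the sample frequencies are all hypothetical, since the description does not say how similarity or frequency is computed.

```python
from difflib import SequenceMatcher
from typing import Dict


def collate(word: str, dictionary: Dict[str, int], min_similarity: float = 0.5) -> str:
    """Return the dictionary entry that should replace `word`.

    `dictionary` maps each entry to its frequency parameter. Entries that
    resemble `word` closely enough are candidates, and the candidate with the
    largest frequency parameter wins (similarity only breaks ties).
    """
    candidates = []
    for entry, frequency in dictionary.items():
        similarity = SequenceMatcher(None, word, entry).ratio()
        if similarity >= min_similarity:
            candidates.append((frequency, similarity, entry))
    if not candidates:
        return word  # nothing similar is registered, so leave the text alone
    return max(candidates)[2]


# Illustrative entries: two look-alike words with very different frequencies.
dictionary = {"速報": 950, "連報": 3, "決勝": 400}

corrected = collate("連報", dictionary)   # the frequent look-alike is preferred
untouched = collate("決勝", dictionary)   # the only similar entry is the word itself
unknown = collate("テニス", dictionary)   # no similar entry, returned unchanged
```

Note that the rare word is replaced even though it is itself registered, because the ranking looks at frequency before similarity. Whether that trade-off is appropriate depends on how the frequency parameter is maintained, which this sketch does not model.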

In the metadata generation system according to claim 8, when the character information documenting means aggregates the character information acquired by the character information acquisition means and composes it into sentences, it can refer to the metadata storage file and use previously created metadata related to that character information. As a result, metadata on a television broadcast program or an Internet-distributed video can be generated automatically with high accuracy and greater efficiency.

In the metadata generation system according to claim 9, when the character information documenting means aggregates the character information acquired by the character information acquisition means and composes it into sentences, it can acquire the electronic program guide data of the video, which includes information such as the broadcast date and time, distribution date and time, genre, title, and performers, and use that data in composing the sentences. As a result, metadata on a television broadcast program or an Internet-distributed video can be generated automatically with high accuracy and greater efficiency.

In the metadata generation system according to claim 10, the video recognition information extraction means collates a person, a logo, the person's belongings, or the person's facial expression contained in the video with person information, logo information, object information, or facial expression information, and extracts the person, logo, belongings, or facial expression as character information. Metadata for the video can therefore be created from the person, logo, belongings, or facial expression contained in the video.

In the metadata generation system according to claim 11, the video recognition information extraction means collates the video recorded in the recording file with the person information, logo information, object information, or facial expression information, each of which holds previously analyzed video and the character information extracted from that video, and thereby extracts the person, logo, belongings, or facial expression contained in the video as character information.
Therefore, image analysis can be performed effectively using the image analysis results accumulated so far, and as a result the metadata of the video can be created accurately and in a short time.

In the metadata generation system according to claim 12, the video recognition learning means corrects the person information, logo information, object information, or facial expression information based on the video analyzed by the video recognition information extraction means and the character information extracted from that video.
Therefore, the result of the current image analysis can be added to the person information, logo information, object information, or facial expression information, and erroneous information contained in them can be deleted based on the result of the current image analysis. As a result, the person information, logo information, object information, or facial expression information is updated and can always be used in its latest state.

In the metadata generation system according to claim 13, the voice information extraction means extracts character information from the audio recorded together with the video in the recording file by performing voice analysis on that audio, and the dictionary collation means collates the extracted character information with the dictionary file.
Therefore, the character information can be extracted efficiently, by voice analysis, from the audio recorded together with the video. In addition, because the character information is collated with the dictionary file, characters and words that voice analysis misrecognized or could not fully recognize can, for example, be corrected based on the dictionary file, which improves the accuracy of the character information.

In the invention according to claim 14, the voice analysis means analyzes the audio recorded together with the video in the recording file by collating it with a voice analysis storage file that holds previously analyzed audio and the character information extracted from that audio.
Therefore, voice analysis can be performed effectively using the voice analysis results accumulated so far, and as a result the metadata of the video can be created accurately and in a short time.

In the invention according to claim 15, the voice analysis learning means corrects the voice analysis storage file based on the audio analyzed by the voice analysis means and the character information extracted from that audio.
Therefore, the result of the current voice analysis can be added to the voice analysis storage file, and erroneous information contained in the voice analysis storage file can be deleted based on the result of the current voice analysis. As a result, the voice analysis storage file is updated and can always be used in its latest state.

FIG. 1 is a block diagram showing an embodiment of the metadata generation system according to the present invention.
FIG. 2 is a flowchart showing the flow of processing in an embodiment of the metadata generation system according to the present invention.
FIG. 3 relates to an embodiment of the metadata generation system according to the present invention: (a) is a schematic diagram of a broadcast program video, and (b) shows the metadata generated from the broadcast program video in (a).

Hereinafter, the present invention will be described in detail based on the embodiment shown in the accompanying drawings.
(1) Configuration of the metadata generation system 10 according to the present embodiment
As shown in FIGS. 1 and 3, the metadata generation system 10 according to an embodiment of the present invention includes: a recording means 12 having a recording file 11 in which a broadcast program video V broadcast by a television broadcasting station 30 is recorded; a character information acquisition means 13 that acquires character information C from the video V recorded in the recording file 11; a character information documenting means 14 that aggregates the character information C acquired by the character information acquisition means 13 and composes it into sentences; and a metadata storage means 16 that stores the character information composed by the character information documenting means 14 in a metadata storage file 15 as metadata M of the video V recorded in the recording file 11.

As shown in FIGS. 1 and 3, the character information acquisition means 13 according to the present embodiment has a character information extraction means 17 that performs image analysis on the video V recorded in the recording file 11 and extracts character information C from the video V, and a dictionary collation means 19 that collates the character information C extracted by the character information extraction means 17 with a dictionary file 18.
The character information extraction means 17 according to the present embodiment has an image analysis means 31 that extracts character strings by performing image analysis on the video V recorded in the recording file 11, and a word analysis means 32 that extracts the words contained in those character strings by performing morphological analysis on them.
Here, morphological analysis is the task of dividing natural-language text (sentences) that carries no grammatical annotation into a sequence of morphemes (roughly, the smallest units that carry meaning in a language) and determining the part of speech and similar attributes of each morpheme, based on the grammar of the target language and on a dictionary that holds information such as the parts of speech of words. Specifically, from the character string "○×オープン決勝進出" ("advance to the ○× Open final"), words such as "○×" (tournament name), "○× Open", "final", "advance", and "advance to the final" can be extracted.
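The word extraction in the example above can be imitated with a toy lookup. This is not a real morphological analyzer, which would also resolve ambiguity and assign parts of speech; it only enumerates every entry of a small lexicon that occurs in the caption. The lexicon contents and the function name `extract_words` are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical lexicon: surface form -> a note on what the word is.
LEXICON: Dict[str, str] = {
    "○×": "tournament name",
    "○×オープン": "compound",
    "決勝": "noun",
    "進出": "noun",
    "決勝進出": "compound",
}


def extract_words(text: str, lexicon: Dict[str, str]) -> List[str]:
    """List every lexicon entry found in `text`, ordered by start position
    and then by length, so that overlapping words are all reported."""
    found: List[str] = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in lexicon:
                found.append(text[start:end])
    return found


words = extract_words("○×オープン決勝進出", LEXICON)
# words == ["○×", "○×オープン", "決勝", "決勝進出", "進出"]
```

Reporting overlapping entries, rather than a single best segmentation, mirrors the way the example lists both the short words and the compounds built from them.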

As shown in FIG. 1, the image analysis means 31 according to the present embodiment is configured to perform image analysis by collating the video with an image analysis storage file 35 that holds previously analyzed video and the character information extracted from that video.
Here, previously analyzed video means video on which image analysis has been performed so far, and the character information extracted from that video means character information that was correctly extracted from the video as a result of the image analysis.

As shown in FIGS. 1 and 3, the character information extraction means 17 according to the present embodiment further has an image analysis learning means 36 that corrects the image analysis storage file 35 based on the video V analyzed by the image analysis means 31 and the character information C extracted from the video V.

As shown in FIGS. 1 and 3, the character information acquisition means 13 according to the present embodiment also has an Internet collation means 21 that collates the character information C extracted by the character information extraction means 17 with information retrieved by searching the Internet 20.

As shown in FIGS. 1 and 3, the character information acquisition means 13 according to the present embodiment further has a dictionary update means 33 that corrects the dictionary file 18 based on the character information C extracted by the character information extraction means 17.

As shown in FIGS. 1 and 3, the dictionary file 18 according to the present embodiment has dictionary data D, which contains characters and idioms of the languages of various countries in a form that can be collated, and a frequency parameter 34 for the dictionary data D. The dictionary collation means 19 is configured to preferentially select dictionary data D with a large frequency parameter 34 as the collation target.

As shown in FIGS. 1 and 3, the character information documenting means 14 according to the present embodiment is configured to refer to the metadata storage file 15 and to use metadata M related to the character information C acquired by the character information acquisition means 13 when composing the character information C into sentences.

As shown in FIGS. 1 and 3, the character information documenting means 14 according to the present embodiment is also configured to acquire electronic program guide data E for the video V recorded in the recording file 11 and to use it when composing the character information C into sentences. The electronic program guide data E according to the present embodiment includes information such as the broadcast date and time, distribution date and time, genre, title, and performers of the broadcast program video V broadcast by the television broadcasting station 30.

As shown in FIGS. 1 and 3, the character information acquisition means 13 according to the present embodiment has a video recognition information extraction means 22 that collates a person P, a logo L, belongings B of the person P, or a facial expression F of the person P contained in the video V with person information, logo information, object information, or facial expression information, and extracts the person P, the logo L, the belongings B, or the facial expression F as character information C.

The person information, logo information, object information, and facial expression information according to the present embodiment each consist of previously analyzed video and the character information extracted from that video.
Here, previously analyzed video means video on which image analysis has been performed so far, and the character information extracted from that video means character information that was correctly extracted from the video as a result of the image analysis.

As shown in FIGS. 1 and 3, the character information acquisition means 13 according to the present embodiment further has a video recognition learning means 37 that corrects the person information, logo information, object information, or facial expression information based on the video V analyzed by the video recognition information extraction means 22 and the character information C extracted from the video V.

As shown in FIGS. 1 and 3, the character information acquisition means 13 according to the present embodiment has a voice information extraction means 23 that performs voice analysis on the audio recorded together with the video V in the recording file 11 and extracts character information C from that audio.

As shown in FIG. 1, the voice information extraction means 23 according to the present embodiment has a voice analysis means 39 that performs voice analysis by collating the audio with a voice analysis storage file 38 that holds previously analyzed audio and the character information extracted from that audio.
Here, previously analyzed audio means audio on which voice analysis has been performed so far, and the character information extracted from that audio means character information that was correctly extracted from the audio as a result of the voice analysis.

As shown in FIGS. 1 and 3, the voice information extraction means 23 according to the present embodiment further has a voice analysis learning means 40 that corrects the voice analysis storage file 38 based on the audio analyzed by the voice analysis means 39 and the character information extracted from that audio.

As shown in FIGS. 1 and 3, the character information acquisition means 13 according to the present embodiment includes a composite information collation means 24 that collates with one another the pieces of character information C extracted respectively by the character information extraction means 17, the video recognition information extraction means 22, and the voice information extraction means 23.

As shown in FIGS. 1 and 3, the recording means 12 according to the present embodiment is a large recording device with a hard-disk storage device of a predetermined capacity, so that the video of all broadcast programs from all broadcasting stations, for example all terrestrial and satellite broadcasting stations in Japan, can be recorded over a predetermined period, for example one month.
In the present embodiment, the recording file 11 on the hard disk installed in the recording means 12 holds program content 25 consisting of the video V broadcast by the television broadcasting station 30, the channel name 26 on which the program content 25 was broadcast, and information on the time code 27 of the program content 25.
In this case, the program content 25 is organized by broadcast program, by the segments that make up a broadcast program, or by the individual items (articles) that make up a broadcast program.

As shown in FIGS. 1 and 3, in the present embodiment the metadata storage file 15 of the metadata storage means 16 records program content summary text data 28, the channel name 29 on which the program content 25 was broadcast, and the time code 27 of the program content 25, all of which constitute the metadata M in the present embodiment.
The program content summary text data 28 is a textual summary of the content of a television program broadcast by the television broadcasting station 30. Like the program content 25, the program content summary text data 28 is organized by broadcast program, by the segments that make up a broadcast program, or by the individual items (articles) that make up a broadcast program.
The program content summary text data 28 can also include a nuance parameter. Here, the "nuance parameter" is a numerical value that expresses the nuance (impression) of the site information in which a phrase corresponding to the search keyword appears, as judged by an automatic system such as artificial intelligence or by a human.
For example, the value can be set high (a positive rating) if the program content is favorable (good), low (a negative rating) if it is unfavorable (bad), and 0 (a zero rating) if it is neutral content that merely states facts (neutral).
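The items recorded in the metadata storage file, together with the nuance parameter, can be pictured as a small record type. This is a sketch under assumptions: the class and field names are hypothetical, and the concrete scores of +1 and -1 are chosen only for illustration, since the description fixes only the sign of the rating and the value 0 for neutral content.

```python
from dataclasses import dataclass

# Assumed scores: the description only fixes the sign (and 0 for neutral).
NUANCE_SCORES = {"good": 1, "neutral": 0, "bad": -1}


@dataclass(frozen=True)
class MetadataRecord:
    summary_text: str  # program content summary text data 28
    channel_name: str  # channel name 29
    time_code: str     # time code 27
    nuance: int = 0    # optional nuance parameter


def make_record(summary_text: str, channel_name: str, time_code: str,
                judgement: str = "neutral") -> MetadataRecord:
    """Build a record, turning a good/neutral/bad judgement into a score."""
    return MetadataRecord(summary_text, channel_name, time_code,
                          NUANCE_SCORES[judgement])


record = make_record("○△ player advances to the ○× Open final",
                     "×× Channel", "00:12:34", judgement="good")
```

How the judgement itself is produced, whether by an automatic system or by a person, is outside the scope of this sketch.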

(2) Processing flow of the metadata generation system 10 according to the present embodiment
As shown in FIG. 2, the metadata generation system 10 according to the present embodiment performs processing in the following steps. First, the recording means 12 records the broadcast program video V broadcast by the television broadcasting station 30 in the recording file 11 (S1).
At this time, the recording means 12 can also record the video of all broadcast programs from all broadcasting stations, for example all terrestrial and satellite broadcasting stations in Japan, over a predetermined period, for example one month.

Next, as shown in FIG. 2, the character information acquisition means 13 acquires the character information C displayed in the video V recorded in the recording file 11.
At this time, the character information extraction means 17 performs image analysis on the video V recorded in the recording file 11 and extracts the character information C from the video V (S2a).
In particular, as shown in FIGS. 1 and 3, in the character information extraction means 17 according to the present embodiment, the image analysis means 31 extracts character strings by performing image analysis on the video V recorded in the recording file 11, and the word analysis means 32 extracts the words contained in those character strings by performing morphological analysis on them.
Specifically, as shown in FIG. 3(a), the image analysis means 31 performs image analysis on the video V of the program content 25 and can thereby extract the character strings "×× News" and "Breaking news: ○△ player advances to the ○× Open final".
The word analysis means 32 then performs morphological analysis on these character strings and can thereby extract words such as "××" (program name, channel name), "news", "×× News", "breaking news", "○△" (player name), "player", "○△ player", "○×" (region name, tournament name), "Open", "○× Open", "final", and "advance".
As shown in FIGS. 1 and 3, in the character information extraction means 17 according to the present embodiment, the image analysis means 31 performs the image analysis by collating the video V recorded in the recording file 11 with the image analysis storage file 35, which holds previously analyzed video and the character information extracted from that video.

As shown in FIG. 2, the video recognition information extraction means 22 collates the person P, the logo L, the belongings B of the person P, or the facial expression F of the person P contained in the video V with the person information, logo information, object information, or facial expression information, and extracts the person P, the logo L, the belongings B, or the facial expression F as character information C (S2b).
Specifically, as shown in FIG. 3(a), the video recognition information extraction means 22 collates the person P, the logo L, the belongings B, and the facial expression F contained in the video V of the program content 25 with the person information, logo information, object information, and facial expression information. The collation establishes that the person P is "○△ player", the logo L is "○× Open", the belongings B are "tennis (racket)", and the facial expression F is "an expression of all-out effort", and each of these can be extracted as character information C.
As shown in FIGS. 1 and 3, in the present embodiment the video recognition information extraction means 22 extracts the person P, the logo L, the belongings B, or the facial expression F contained in the video V as character information C by collating the video V recorded in the recording file 11 with the person information, logo information, object information, or facial expression information, each of which holds previously analyzed video and the character information extracted from that video.

As shown in FIG. 2, the voice information extraction means 23 performs voice analysis on the audio recorded together with the video V in the recording file 11 and extracts character information C from that audio (S2c).
As shown in FIGS. 1 and 3, in the voice information extraction means 23 according to the present embodiment, the voice analysis means 39 performs the voice analysis by collating the audio recorded together with the video V in the recording file 11 with the voice analysis storage file 38, which holds previously analyzed audio and the character information extracted from that audio.

Subsequently, as shown in FIG. 2, the composite information collation means 24 collates with one another the pieces of character information extracted by the character information extraction means 17, the video recognition information extraction means 22, and the voice information extraction means 23 (S3).
Specifically, as shown in FIGS. 1 and 3, "Player ○△" and "○× Open" extracted by the character information extraction means 17 are collated with "Player ○△" (extracted from the person P) and "○× Open" (extracted from the logo L) extracted by the video recognition information extraction means 22. This confirms that the character information C was extracted correctly and improves its accuracy.
If processing speed is prioritized, the collation step S3 by the composite information collation means 24 may be omitted.
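The mutual collation of step S3 can be pictured as a simple vote across the three extraction sources. The following is only a minimal sketch under stated assumptions: the function names, the two-source threshold, and the placeholder terms ("Player A", "X Open") are hypothetical and do not come from the embodiment, which does not specify how agreement is measured.

```python
from collections import Counter

def cross_check(*sources):
    """Count, for each term, how many extraction sources produced it."""
    counts = Counter()
    for terms in sources:
        counts.update(set(terms))  # each source votes at most once per term
    return counts

def confirmed_terms(counts, min_sources=2):
    """Return the terms reported by at least `min_sources` sources.

    The threshold of two sources is an assumption of this sketch.
    """
    return {term for term, n in counts.items() if n >= min_sources}

# Hypothetical outputs of means 17 (on-screen text), 22 (video recognition)
# and 23 (voice analysis).
ocr_terms = ["Player A", "X Open", "breaking news"]
vision_terms = ["Player A", "X Open", "tennis (racket)"]
speech_terms = ["final", "Player A"]

counts = cross_check(ocr_terms, vision_terms, speech_terms)
confirmed = confirmed_terms(counts)
```

Terms seen by only one source are not discarded in this sketch; they are simply left unconfirmed, which matches the description of S3 as a check on extraction accuracy rather than a filter.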

Here, as shown in FIGS. 1 and 3, the image analysis learning means 36 can modify the image analysis storage file 35 based on the video V analyzed by the image analysis means 31 and the character information C extracted from the video V.

Likewise, as shown in FIGS. 1 and 3, the video recognition learning means 37 can modify the person information, logo information, object information, or facial expression information based on the video V analyzed by the video recognition information extraction means 22 and the character information C extracted from the video V.

Furthermore, as shown in FIG. 1, the voice analysis learning means 40 can modify the voice analysis storage file 38 based on the voice analyzed by the voice analysis means 39 and the character information extracted from that voice.

Subsequently, as shown in FIG. 2, the dictionary collation means 19 collates the character information C extracted by the character information extraction means 17, the video recognition information extraction means 22, or the voice information extraction means 23 with the dictionary file 18 (S4a). If the character information C does not match the dictionary file 18, it is corrected based on the dictionary file 18. If it matches, the collation process ends without any change.
Here, as shown in FIG. 1, the dictionary collation means 19 can preferentially select dictionary data D having a large frequency parameter 34 as the collation target. For example, suppose the character information extraction means 17 misreads 速報 ("breaking news") as the similar-looking 連報. Even if the word 連報 happens to be registered in the dictionary file 18, the word 速報 is used far more often and therefore has a larger frequency parameter, so the character information C is judged to be 速報 and is corrected accordingly.
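The frequency-based preference in step S4a can be illustrated as follows. This is a sketch under assumptions, not the embodiment's algorithm: the description does not say how "similar" entries are found, so the use of Python's `difflib.SequenceMatcher`, the 0.5 threshold, the function name `correct_term`, and the frequency values are all hypothetical.

```python
from difflib import SequenceMatcher

def correct_term(term, dictionary, min_similarity=0.5):
    """Pick the most frequent dictionary entry that resembles `term`.

    `dictionary` maps each registered word to its frequency parameter.
    The similarity measure and threshold are assumptions of this sketch.
    """
    candidates = [
        word for word in dictionary
        if SequenceMatcher(None, term, word).ratio() >= min_similarity
    ]
    if not candidates:
        return term  # nothing similar is registered; keep the term as extracted
    # Among similar entries, the larger frequency parameter wins,
    # even when the extracted term itself is registered.
    return max(candidates, key=lambda word: dictionary[word])

# Hypothetical frequency parameters.
dictionary = {"速報": 980, "連報": 3, "決勝": 410}
corrected = correct_term("連報", dictionary)
```

Here 連報 is registered, but 速報 shares one of its two characters and has a much larger frequency parameter, so it is chosen as the corrected term.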

Further, as shown in FIGS. 1 and 3, the dictionary update means 33 can modify the dictionary file 18 based on the character information C extracted by the character information extraction means 17.
For example, when 独壇場 (dokudanjō) is extracted by the character information extraction means 17, this expression, which was originally an error but has come into common use, can be added to the dictionary file 18 alongside the correct expression 独擅場 (dokusenjō) already present there.

Further, as shown in FIG. 2, the Internet collation means 21 can also collate the character information C extracted by the character information extraction means 17, the video recognition information extraction means 22, or the voice information extraction means 23 with information I retrieved by searching the Internet 20 (S4b).
If the character information C does not match the information I, it is corrected based on the information I. If it matches, the collation process ends without any change.
When processing speed is prioritized, only one of the collation step S4a by the dictionary collation means 19 and the collation step S4b by the Internet collation means 21 needs to be executed. When the accuracy of the character information C is prioritized, both steps can be executed, in either order.
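The choice between speed and accuracy described above amounts to running one collation step or both. The sketch below is illustrative only: `verify_term`, the `prioritize` argument, and the two stand-in check functions are hypothetical, and the lookup tables merely imitate the corrections that steps S4a and S4b might make.

```python
def verify_term(term, checks, prioritize="speed"):
    """Run one collation step for speed, or all of them for accuracy.

    `checks` is an ordered list of callables that each take a term and
    return the (possibly corrected) term.
    """
    if prioritize == "speed":
        selected = checks[:1]
    elif prioritize == "accuracy":
        selected = checks
    else:
        raise ValueError(f"unknown priority: {prioritize!r}")
    for check in selected:
        term = check(term)
    return term

# Hypothetical stand-ins for steps S4a (dictionary) and S4b (Internet).
def dictionary_check(term):
    return {"brekaing news": "breaking news"}.get(term, term)

def internet_check(term):
    return {"X Opn": "X Open"}.get(term, term)

checks = [dictionary_check, internet_check]
fast = verify_term("X Opn", checks, prioritize="speed")
thorough = verify_term("X Opn", checks, prioritize="accuracy")
```

In this example the fast path leaves the misread term untouched because only the dictionary stand-in runs, while the thorough path also applies the second check and corrects it.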

Next, as shown in FIG. 2, the character information documenting means 14 aggregates the acquired character information C and composes it into a sentence (S5).
Specifically, as shown in FIG. 3, the acquired character information C, namely "XX News", "breaking news", "Player ○△", "○× Open", "final", "advance", and "tennis (racket)", can be aggregated and composed into the text "[XX News] Japan's top tennis player ○△, who is competing in the ○× Open, has advanced to the final."
At this time, the character information documenting means 14 can refer to the metadata storage file 15 and use metadata M related to the character information C acquired by the character information acquisition means 13 when composing the sentence.
For example, if the metadata for the video broadcast the previous day reads "(02/28 12:00) [XX News] Japan's top tennis player ○△, who is competing in the ○× Open, has advanced to the semi-finals," the phrases "competing in the ○× Open," "Japan's top tennis player ○△," and "has advanced to the semi-finals" can be reused, so that the character information C is composed into a sentence quickly and with higher accuracy.
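One way to picture the lookup of related metadata in the metadata storage file 15 is a search for stored entries that mention the newly acquired terms. This is a rough sketch under assumptions: the embodiment does not define how relatedness is determined, so the substring matching, the function name `find_related`, and the sample entries (with "player A" and "X Open" as placeholders) are hypothetical.

```python
def find_related(terms, stored_metadata):
    """Return stored metadata texts that mention any of `terms`,
    ordered from most to fewest matching terms."""
    scored = []
    for text in stored_metadata:
        hits = sum(1 for term in terms if term in text)
        if hits:
            scored.append((hits, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]

# Hypothetical contents of the metadata storage file.
stored_metadata = [
    "(02/28 12:00) [XX News] Japan's top tennis player A, who is competing "
    "in the X Open, has advanced to the semi-finals",
    "(02/28 19:00) [YY Weather] Rain is expected across the region tomorrow",
]
terms = ["XX News", "player A", "X Open", "final"]
related = find_related(terms, stored_metadata)
```

The previous day's entry is returned because it shares several terms with the new extraction, and its phrasing could then serve as a template for the new sentence.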

Further, as shown in FIGS. 1 and 3, the character information documenting means 14 can also acquire the electronic program guide data E of the video V recorded in the recording file 11 and use it when composing the character information C into a sentence. For example, if the electronic program guide data E contains the information "March 1, 12:00, XX News", the information "(03/01 12:00) [XX News]" is added to the metadata M, so that the sentence can be composed quickly and with higher accuracy.
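The prefix derived from the electronic program guide data E can be produced with ordinary date formatting. The following is a minimal sketch: the function name `epg_prefix` is hypothetical, the year 2016 is an arbitrary assumption (the example in the text gives only month, day, and time), and "player A" is a placeholder.

```python
from datetime import datetime

def epg_prefix(start, title):
    """Format a broadcast start time and programme title as a metadata prefix."""
    return f"({start:%m/%d %H:%M}) [{title}]"

# The year is an arbitrary assumption; only month, day and time are used.
prefix = epg_prefix(datetime(2016, 3, 1, 12, 0), "XX News")
metadata = f"{prefix} Japan's top tennis player A has advanced to the final"
```

Because the broadcast time and title come directly from the program guide, they do not need to be recognized from the video itself.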

Next, as shown in FIG. 2, the metadata storage means 16 stores the character information composed into a sentence by the character information documenting means 14 in the metadata storage file 15 as the metadata M of the video V recorded in the recording file 11 (S6).
Specifically, as shown in FIG. 3(b), the metadata storage means 16 can store, as the metadata M of the video V of the program content 25, the metadata "(03/01 12:00) [XX News] Japan's top tennis player ○△, who is competing in the ○× Open, has advanced to the final" in the metadata storage file 15.
In this way, the metadata M of the video V can be created from the character information C, that is, the words and sentences that are displayed in the video V and relate to it.

(3) Effects of the metadata generation system 10 according to the present embodiment
As shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, when video is recorded in the recording file 11 by the recording means 12, the character information acquisition means 13 acquires the character information C displayed in the video V recorded in the recording file 11, the character information documenting means 14 composes the acquired character information C into a sentence, and the metadata storage means 16 stores the resulting text in the metadata storage file 15 as the metadata M of the video V. The metadata M of the video V can therefore be created automatically and accurately from the character information C, that is, the words and sentences that are displayed in the video V and relate to it.
As a result, it is possible to provide a system that creates metadata for television broadcast programs in a short time and reduces labor costs.

As shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, the character information extraction means 17 extracts the character information C from the video V by performing image analysis on the video V recorded in the recording file 11, and the dictionary collation means 19 collates the extracted character information C with the dictionary file 18.
Therefore, the character information C can be extracted efficiently from the video V by image analysis. In addition, because the character information C is collated with the dictionary file 18, characters and words that the image analysis misrecognized or could not fully recognize can be corrected based on the dictionary file 18, improving the accuracy of the character information C.

As shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, the image analysis means 31 performs image analysis by collating the video V recorded in the recording file 11 with the image analysis storage file 35, which holds previously image-analyzed video and the character information extracted from that video.
Therefore, image analysis can be performed effectively using the results accumulated from past analyses, and as a result the metadata M of the video V can be created accurately and in a short time.

As shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, the image analysis learning means 36 modifies the image analysis storage file 35 based on the video V analyzed by the image analysis means 31 and the character information C extracted from the video V.
Therefore, the result of the current image analysis can be added to the image analysis storage file 35, and erroneous information contained in the image analysis storage file 35 can be deleted based on that result. As a result, the image analysis storage file 35 is kept up to date for subsequent use.

Further, as shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, the character information extraction means 17 extracts the character information C from the video V by performing image analysis on the video V recorded in the recording file 11, and the Internet collation means 21 collates the extracted character information C with the information I retrieved by searching the Internet.
Therefore, the character information C can be extracted efficiently from the video V by image analysis. In addition, because the character information C is collated with the information I retrieved by searching the Internet, characters and words that the image analysis misrecognized or could not fully recognize can be corrected based on the information I, improving the accuracy of the character information C.

As shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, the dictionary update means 33 modifies the dictionary file 18 based on the character information C extracted by the character information extraction means 17. New words, sentences, and similar information obtained from the character information C can therefore be added to the dictionary file 18, and erroneous information contained in the dictionary file 18 can be deleted based on the character information C. As a result, the dictionary file 18 is kept up to date for subsequent use.

As shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, the dictionary collation means 19 preferentially selects dictionary data D having a large frequency parameter 34 as the collation target and collates the selected dictionary data D with the character information C extracted by the character information extraction means 17. Thus, for example, when the dictionary file 18 contains several characters or words that resemble one another, the dictionary data D with the larger frequency parameter 34 is selected first as the collation target.
As a result, corrections are made based on dictionary data having a large frequency parameter 34, and the accuracy of the character information C can be improved more efficiently.

As shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, when the character information documenting means 14 aggregates the character information C acquired by the character information acquisition means 13 and composes it into a sentence, it can refer to the metadata storage file 15 and use previously created metadata M related to the character information C. As a result, metadata for a television broadcast program or an Internet-distributed video can be generated automatically with high accuracy and greater efficiency.

As shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, when the character information documenting means 14 aggregates the character information C acquired by the character information acquisition means 13 and composes it into a sentence, it can acquire the electronic program guide data E of the video V, which contains information such as the broadcast date and time, distribution date and time, genre, title, and performers, and use that data in composing the sentence. As a result, metadata for a television broadcast program or an Internet-distributed video can be generated automatically with high accuracy and greater efficiency.

Further, as shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, the video recognition information extraction means 22 collates the person P, the logo L, the belongings B of the person P, or the facial expression F of the person P contained in the video V with the person information, logo information, object information, or facial expression information, and extracts them as character information C. The metadata M of the video V can therefore be created from the person P, the logo L, the belongings B of the person P, or the facial expression F of the person P contained in the video V.

Further, in the metadata generation system 10 according to the present embodiment, the video recognition information extraction means 22 collates the video V recorded in the recording file 11 with the person information, the logo information, the object information, or the facial expression information, each of which holds previously image-analyzed video and the character information extracted from that video. The person P, the logo L, the belongings B of the person P, or the facial expression F of the person P contained in the video V is thereby extracted as character information C.
Therefore, image analysis can be performed effectively using the results accumulated from past analyses, and as a result the metadata M of the video V can be created accurately and in a short time.

Further, in the metadata generation system 10 according to the present embodiment, the video recognition learning means 37 modifies the person information, the logo information, the object information, or the facial expression information based on the video V analyzed by the video recognition information extraction means 22 and the character information C extracted from the video V.
Therefore, the result of the current image analysis can be added to the person information, the logo information, the object information, or the facial expression information, and erroneous information contained in them can be deleted based on that result. As a result, the person information, the logo information, the object information, and the facial expression information are kept up to date for subsequent use.

Further, as shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, the voice information extraction means 23 extracts the character information C by performing voice analysis on the voice recorded together with the video V in the recording file 11, and the dictionary collation means 19 collates the extracted character information C with the dictionary file 18.
Therefore, the character information C can be extracted efficiently by voice analysis from the voice recorded together with the video V. In addition, because the character information C is collated with the dictionary file 18, characters and words that the voice analysis misrecognized or could not fully recognize can be corrected based on the dictionary file 18, improving the accuracy of the character information C.

In the metadata generation system 10 according to the present embodiment, the voice analysis means 39 performs voice analysis by collating the voice recorded together with the video V in the recording file 11 with the voice analysis storage file 38, which holds previously analyzed voice and the character information extracted from that voice.
Therefore, voice analysis can be performed effectively using the results accumulated from past analyses, and as a result the metadata M of the video V can be created accurately and in a short time.

Further, in the metadata generation system 10 according to the present embodiment, the voice analysis learning means 40 modifies the voice analysis storage file 38 based on the voice analyzed by the voice analysis means 39 and the character information C extracted from that voice.
Therefore, the result of the current voice analysis can be added to the voice analysis storage file 38, and erroneous information contained in the voice analysis storage file 38 can be deleted based on that result. As a result, the voice analysis storage file 38 is kept up to date for subsequent use.

Further, as shown in FIGS. 1 and 3, in the metadata generation system 10 according to the present embodiment, when the video V is recorded in the recording file 11 by the recording means 12, three extractions take place. The character information extraction means 17 extracts character information C from the video V by performing image analysis on the video V recorded in the recording file 11. The video recognition information extraction means 22 collates the person P, the logo L, the belongings B of the person P, or the facial expression F of the person P contained in the video V with the person information, logo information, object information, or facial expression information, and extracts them as character information C. The voice information extraction means 23 extracts character information C by performing voice analysis on the voice recorded together with the video V in the recording file 11. The composite information collation means 24 then collates with one another the pieces of character information C extracted by the character information extraction means 17, the video recognition information extraction means 22, and the voice information extraction means 23.
Therefore, the character information C can be extracted efficiently by image analysis, by voice analysis, and from the person P, the logo L, the belongings B of the person P, or the facial expression F of the person P contained in the video V.
In addition, because the composite information collation means 24 collates the pieces of character information C from the three extraction means with one another, characters and words that the character information extraction means 17 misrecognized or could not fully recognize can, for example, be corrected based on the character information C extracted by the voice information extraction means 23.
As a result, it is possible to provide a system that automatically generates metadata for a television broadcast program or an Internet-distributed video with higher accuracy and efficiency.

In the present embodiment, the video V has been described as broadcast program video broadcast by the television broadcasting station 30. The configuration is not limited to this, however, and the video V may instead be video distributed over the Internet.

The present invention is widely applicable to systems that generate metadata and has industrial applicability.

10: Metadata generation system
11: Recording file
12: Recording means
13: Character information acquisition means
14: Character information documenting means
15: Metadata storage file
16: Metadata storage means
17: Character information extraction means
18: Dictionary file
19: Dictionary collation means
20: Internet
21: Internet collation means
22: Video recognition information extraction means
23: Voice information extraction means
24: Composite information collation means
25: Program content
26: Channel name
27: Time code
28: Program content summary text data
29: Channel name
30: Television broadcasting station
31: Image analysis means
32: Word analysis means
33: Dictionary update means
34: Frequency parameter
35: Image analysis storage file
36: Image analysis learning means
37: Video recognition learning means
38: Voice analysis storage file
39: Voice analysis means
40: Voice analysis learning means
B: Belongings of the person
C: Character information
D: Dictionary data
E: Electronic program guide data
F: Facial expression of the person
L: Logo
M: Metadata
P: Person
V: Video

Claims

A metadata generation system comprising:
a recording means having a recording file for recording video;
a character information acquisition means for acquiring character information from the video recorded in the recording file;
a character information documenting means for aggregating the character information acquired by the character information acquisition means and composing it into a sentence; and
a metadata storage means for storing the character information composed into a sentence by the character information documenting means in a metadata storage file as metadata of the video recorded in the recording file,
wherein the character information acquisition means includes:
a character information extraction means that performs image analysis on the video recorded in the recording file and extracts character information from the video;
a video recognition information extraction means that collates a person, a logo, belongings of the person, or a facial expression of the person contained in the video with person information, logo information, object information, or facial expression information, and extracts the person, the logo, the belongings of the person, or the facial expression of the person contained in the video as character information;
a voice information extraction means that performs voice analysis on the voice recorded together with the video recorded in the recording file and extracts character information from the voice; and
a composite information collation means that collates with one another the character information extracted by the character information extraction means, the video recognition information extraction means, and the voice information extraction means.

The metadata generation system according to claim 1, wherein the character information acquisition means includes a character information extraction means that performs image analysis on the video recorded in the recording file and extracts character information from the video, and a dictionary collation means that collates the character information extracted by the character information extraction means with a dictionary file.

The metadata generation system according to claim 2, wherein the character information extraction means has an image analysis means that performs the image analysis by collating the video with an image analysis storage file holding previously image-analyzed video and the character information extracted from that video.

The metadata generation system according to claim 3, wherein the character information extraction means further includes an image analysis learning means for modifying the image analysis storage file based on the video analyzed by the image analysis means and the character information extracted from that video.

The metadata generation system according to claim 1, wherein the character information acquisition means further includes an Internet collation means that collates the character information extracted by the character information extraction means, which performs image analysis on the video recorded in the recording file and extracts character information from the video, with information searched for and acquired via the Internet.

The metadata generation system according to claim 2, wherein the character information acquisition means further includes a dictionary update means for modifying the dictionary file based on the character information extracted by the character information extraction means.
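
The claim leaves the format of the dictionary file and the nature of the modification open. The following is a minimal sketch under stated assumptions: the dictionary file is taken to be a JSON object mapping each term to an occurrence count, and "modifying" is taken to mean adding new terms and incrementing the counts of known ones. The function name `update_dictionary` and the JSON layout are hypothetical.

```python
import json
from pathlib import Path


def update_dictionary(path, extracted_terms):
    """Add extracted terms to a JSON dictionary file and return its contents.

    Assumption: the file stores {term: count}; the patent does not specify
    any file format.
    """
    path = Path(path)
    # Start from the existing entries if the file is already present.
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    for term in extracted_terms:
        entries[term] = entries.get(term, 0) + 1
    path.write_text(json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8")
    return entries
```

Calling `update_dictionary("dictionary.json", ["東京", "大阪"])` twice would leave both terms in the file with a count of 2.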

The metadata generation system according to claim 2, wherein the dictionary file has dictionary data and a frequency parameter for the dictionary data, and the dictionary collation means preferentially selects dictionary data having a large frequency parameter as a collation target.
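
Frequency-prioritised collation can be sketched as follows. This is an illustrative reading rather than the patented method: dictionary entries are tried in descending order of their frequency parameter, and the first entry that is sufficiently similar to the extracted string is returned. The similarity measure (Python's `difflib.SequenceMatcher`), the `threshold` value, and the function name `collate_with_dictionary` are assumptions introduced here.

```python
from difflib import SequenceMatcher


def collate_with_dictionary(extracted, dictionary, threshold=0.8):
    """Return the highest-frequency dictionary entry similar to `extracted`.

    `dictionary` maps each entry to its frequency parameter (assumed layout).
    Returns None when no entry reaches the similarity threshold.
    """
    # Visit entries with a larger frequency parameter first.
    by_frequency = sorted(dictionary.items(), key=lambda kv: kv[1], reverse=True)
    for entry, _frequency in by_frequency:
        if SequenceMatcher(None, extracted, entry).ratio() >= threshold:
            return entry
    return None


if __name__ == "__main__":
    dictionary = {"Shinjuku": 50, "Shinjuko": 2, "Shibuya": 30}
    # "Shinjukv" stands in for a misread on-screen caption.
    print(collate_with_dictionary("Shinjukv", dictionary))
```

Here both "Shinjuku" and "Shinjuko" are equally close to the misread string, so the larger frequency parameter decides the outcome.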

The metadata generation system according to claim 1, wherein the character information writing means refers to the metadata storage file and uses metadata related to the character information acquired by the character information acquisition means when writing the character information.

The metadata generation system according to claim 1, wherein the character information writing means acquires electronic program guide data for the video recorded in the recording file and uses it when writing the character information.

The metadata generation system according to claim 1, wherein the character information acquisition means further includes a video recognition information extraction means that collates a person, a logo, a person's belongings, or a person's facial expression included in the video with person information, logo information, object information, or facial expression information, and extracts that person, logo, belongings, or facial expression as character information.

The metadata generation system according to claim 10, wherein the person information, the logo information, the object information, or the facial expression information is composed of image-analyzed video and character information extracted from that image-analyzed video.

The metadata generation system according to claim 11, wherein the character information acquisition means further includes a video recognition learning means that modifies the person information, the logo information, the object information, or the facial expression information based on the video analyzed by the video recognition information extraction means and the character information extracted from that video.

The character information acquisition means has a voice information extraction means for extracting character information from the voice by performing voice analysis on the voice recorded together with the video recorded in the recording file. The metadata generation system described in.

The metadata generation system according to claim 13, wherein the voice information extraction means includes a voice analysis means that performs voice analysis by collating against a voice analysis storage file, which holds voice-analyzed voice and the character information extracted from that voice-analyzed voice.

The metadata generation system according to claim 14, wherein the character information acquisition means further includes a voice analysis learning means that modifies the voice analysis storage file based on the voice analyzed by the voice analysis means and the character information extracted from that voice.

The metadata generation system according to any one of claims 1 to 15, wherein the video is a broadcast program video broadcast by a television broadcasting station.

The metadata generation system according to any one of claims 1 to 16, wherein the video is a moving image distributed via the Internet.