JP2009272961A

JP2009272961A - Content evaluation method, device and program and computer-readable recording medium

Info

Publication number: JP2009272961A
Application number: JP2008122655A
Authority: JP
Inventors: Takeshi Irie (入江 豪); Kota Hidaka (日高 浩太); Hidenobu Osada (長田 秀信); Mitsuhiro Wagatsuma (我妻 光洋); Takashi Sato (佐藤 隆); Yukinobu Taniguchi (谷口 行信)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-05-08
Filing date: 2008-05-08
Publication date: 2009-11-19
Anticipated expiration: 2028-05-08
Also published as: JP5054608B2

Abstract

PROBLEM TO BE SOLVED: To provide more detailed quality information by analyzing at least one of the image, audio, and music signals in content and measuring its quality numerically and accurately for each segment of the content.

SOLUTION: The present invention includes: extracting image information, audio information, or both from the content as an analysis signal; storing the analysis signal in a storage means; acquiring the analysis signal from the storage means; referring to a rule storage means that stores rules, each of which is a conditional statement using an analysis signal whose frequency of occurrence varies with the quality of content; and calculating and outputting a quality value using a rule satisfaction measure calculated on the basis of the rules corresponding to the analysis signal.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a content evaluation method, apparatus, and program, and to a computer-readable recording medium, and in particular to those for automatically evaluating the quality of multimedia content such as moving images, speech, and music.

The amount of multimedia content is currently increasing. Multimedia content consists mainly of video, audio, and music content, all of which are time-based media, so grasping what a piece of content contains takes roughly as long as the content itself. To reduce this time cost, there is a demand for technology that makes it possible to grasp the content in advance without viewing it.

What a piece of content contains covers many kinds of information, but the quality of the content is recognized as particularly important. For example, commercial dramas and movies are produced by professional creators with professional equipment, and their image quality, sound quality, story, and so on are of relatively high quality. Home videos shot at home, on the other hand, are usually made by amateurs and are of relatively low quality.

If the quality of content can be known in this way, it also becomes possible to estimate in advance for what purpose the content was created and what kind of content it is.

As a technique for evaluating the quality of content using information in the content, there is one that focuses on camera motion at the time of shooting and judges segments that contain abrupt camera motion or camera shake to be of low quality (see, for example, Patent Document 1).

Related known techniques include a method for detecting speech segments and music segments (see, for example, Patent Document 2), a method for identifying the region where a telop (on-screen caption) appears (see, for example, Patent Document 3), a method for detecting the region in which a face has been captured (see, for example, Patent Document 4), video search methods (see, for example, Patent Documents 5 and 6), a method for extracting fundamental frequency and power (see, for example, Non-Patent Document 1), and a video structuring method (see, for example, Non-Patent Document 2).

Patent documents: JP 2000-261757 A; JP H10-187182 A; JP H11-328423 A; JP 2005-157911 A; JP 2002-245051 A; JP 2006-60796 A; JP 2005-311676 A
Non-Patent Document 1: Sadaoki Furui, "Digital Speech Processing", Chapter 4, Section 4.9 "Pitch Extraction", Tokai University Press, September 1985, pp. 57-59.
Non-Patent Document 2: Yukinobu Taniguchi, Akito Akutsu, Yoshinobu Tonomura, "Panorama Excerpts: Video listing by automatic generation and layout of panoramic images", IEICE Transactions D-II, Vol. J82-D-II, No. 3, pp. 390-398, 1999.

The conventional technique judges the quality of content using camera motion alone. However, camera motion is present in many kinds of video, including dramas, movies, sports footage, and ordinary home videos, so measuring quality on this basis alone yields low accuracy. In addition, the technique goes no further than a binary judgment of whether or not the quality is low.

For these reasons, the conventional technique alone cannot provide detailed information about the quality of content.

The present invention has been made in view of the above points, and its object is to provide a content evaluation method, apparatus, and program, and a computer-readable recording medium, that can provide more detailed quality information by analyzing at least one of the image, audio, and music signals in content and measuring the quality numerically and accurately for each segment of the content.

FIG. 1 is a diagram showing the principle configuration of the present invention.

The present invention (claim 1) is a content evaluation apparatus that evaluates content by analyzing at least one of the image, audio, and music signals contained in the content, comprising:
analysis signal extraction means 20 for extracting image information, audio information, or both from the content as an analysis signal and storing it in storage means 40;
rule storage means 70 storing rules, each of which is a conditional statement using an analysis signal whose frequency of occurrence varies with the quality of content; and
quality value calculation means 60 for acquiring the analysis signal from the storage means 40, referring to the rule storage means 70, and calculating and outputting a quality value using a rule satisfaction measure calculated on the basis of the rules corresponding to the analysis signal.

In the present invention (claim 2), the rule storage means 70 stores, as the conditional statements, statements that rate the quality of the content high when there is a cut point within a continuous conversation and rate the quality of the content low when there is camera work within a continuous conversation.

The present invention (claim 3) is a content evaluation apparatus that evaluates content by analyzing at least one of the image, audio, and music signals contained in the content, comprising:
analysis signal extraction means for extracting image information, audio information, or both from the content as an analysis signal and storing it in storage means; and
quality value calculation means for reading the analysis signal from the storage means and calculating and outputting a quality value using a feature measure calculated from at least one of the following in the analysis signal: the shot length of the image signal, the amount of motion, the color histogram, the pitch change, the power level, the pitch change of the audio signal, and the power level ratio.

The present invention (claim 4) is a content evaluation apparatus that evaluates content by analyzing at least one of the image, audio, and music signals contained in the content, comprising:
analysis signal extraction means for extracting image information, audio information, or both from the content as an analysis signal and storing it in storage means;
rule storage means storing conditional statements that rate the quality of the content high when there is a cut point within a continuous conversation and rate the quality of the content low when there is camera work within a continuous conversation;
first quality value calculation means for acquiring the analysis signal from the storage means, referring to the rule storage means, and calculating and outputting a quality value using a rule satisfaction measure calculated on the basis of the rules corresponding to the analysis signal; and
second quality value calculation means for reading the analysis signal from the storage means and calculating and outputting a quality value using a feature measure calculated from at least one of the following in the analysis signal: the shot length of the image signal, the amount of motion, the color histogram, the pitch change, the power level, the pitch change of the audio signal, and the power level ratio.

FIG. 2 is a diagram for explaining the principle of the present invention.

The present invention (claim 5) is a content evaluation method that evaluates content by analyzing at least one of the image, audio, and music signals contained in the content, in which:
in an analysis signal extraction step (step 1), analysis signal extraction means extracts image information, audio information, or both from the content as an analysis signal and stores it in storage means; and
in a quality value calculation step (step 2), quality value calculation means acquires the analysis signal from the storage means, refers to rule storage means storing rules, each of which is a conditional statement using an analysis signal whose frequency of occurrence varies with the quality of content, and calculates and outputs a quality value using a rule satisfaction measure calculated on the basis of the rules corresponding to the analysis signal.

In the present invention (claim 6), the quality value calculation step (step 2) refers to rule storage means storing conditional statements that rate the quality of the content high when there is a cut point within a continuous conversation and rate the quality of the content low when there is camera work within a continuous conversation.

The present invention (claim 7) is a content evaluation method that evaluates content by analyzing at least one of the image, audio, and music signals contained in the content, in which:
in an analysis signal extraction step, analysis signal extraction means extracts image information, audio information, or both from the content as an analysis signal and stores it in storage means; and
in a quality value calculation step, quality value calculation means reads the analysis signal from the storage means and calculates and outputs a quality value using a feature measure calculated from at least one of the following in the analysis signal: the shot length of the image signal, the amount of motion, the color histogram, the pitch change, the power level, the pitch change of the audio signal, and the power level ratio.

The present invention (claim 8) is a content evaluation method that evaluates content by analyzing at least one of the image, audio, and music signals contained in the content, in which:
in an analysis signal extraction step, analysis signal extraction means extracts image information, audio information, or both from the content as an analysis signal and stores it in storage means;
in a first quality value calculation step, first quality value calculation means acquires the analysis signal from the storage means, refers to rule storage means storing conditional statements that rate the quality of the content high when there is a cut point within a continuous conversation and rate it low when there is camera work within a continuous conversation, and calculates and outputs a quality value using a rule satisfaction measure calculated on the basis of the rules corresponding to the analysis signal; and
in a second quality value calculation step, second quality value calculation means reads the analysis signal from the storage means and calculates and outputs a quality value using a feature measure calculated from at least one of the following in the analysis signal: the shot length of the image signal, the amount of motion, the color histogram, the pitch change, the power level, the pitch change of the audio signal, and the power level ratio.

The present invention (claim 9) is a content evaluation program for causing a computer to function as each of the means constituting the content evaluation apparatus according to any one of claims 1 to 4.

The present invention (claim 10) is a computer-readable recording medium storing the content evaluation program according to claim 9.

As described above, the present invention extracts in advance, from image, audio, and music signals, features whose frequency of occurrence varies with the quality of content, and creates conditional statements or evaluation formulas for evaluating content quality. These features are then extracted from the content to be evaluated, and an evaluation value for the content is calculated using the conditional statements or evaluation formulas.

As for features whose frequency of occurrence varies with quality: content that has been edited generally has higher quality, so feature detection is performed to detect that editing work (for example, the insertion of telops) has been carried out, and when editing is detected, an evaluation value rating the quality of the content high is assigned. Similarly, high-quality content is frequently produced with multiple pieces of video equipment (for example, multiple cameras), so features indicating that multiple cameras were used are also used in the evaluation.

According to the present invention, rules whose conditions are features whose frequency of occurrence varies with the quality of content are created in advance, and the quality of video is evaluated by calculating a rule satisfaction measure, which improves the accuracy of the evaluation result.

Furthermore, because the quality is evaluated for each segment of the video, video that combines high-quality and low-quality material can also be evaluated correctly.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 3 shows the configuration of a content evaluation apparatus according to a first embodiment of the present invention.

The content evaluation apparatus shown in FIG. 3 comprises a content storage unit 10, an analysis signal extraction unit 20, a segment division unit 30, an analysis signal memory 40, a segment memory 50, a quality value calculation unit 60, and a rule storage unit 70.

In the example of FIG. 3, content is read from the content storage unit 10 when it is input to the analysis signal extraction unit 20 and the segment division unit 30, but the configuration is not limited to this example. An input device, such as a keyboard or a pointing device such as a mouse, and an output device are assumed to be connected. The output device has a monitor screen, such as a liquid crystal display, for displaying the quality values output from the quality value calculation unit 60, and can present information in response to operation of the input device or to the processing of the content evaluation apparatus itself. It outputs and displays various kinds of information, for example input data, the progress of processing, and the segments obtained as processing results.

The analysis signal memory 40 and the segment memory 50 are, for example, RAM (random access memory), ROM (read-only memory), or a hard disk device, and may where necessary include an optical disc drive for CDs (compact discs), DVDs (digital versatile discs), or the like. Storage devices other than the analysis signal memory 40 and the segment memory 50 may also be provided as needed. However, where external storage can be used instead, for example when the content evaluation apparatus according to the present invention is incorporated into a general-purpose PC (personal computer), such storage devices need not be included.

The analysis signal extraction unit 20 extracts the audio signal contained in the content data and stores it in the analysis signal memory 40.

The segment division unit 30 divides the content into segments on the basis of the audio signal.

The quality value calculation unit 60 calculates a quality value (Q value) for each segment and outputs the result to an output device (not shown).

FIG. 4 is a flowchart (part 1) of the evaluation method according to an embodiment of the present invention.

The content evaluation method of this embodiment consists broadly of three steps.

Step 10) Analysis signal extraction step:
The analysis signal extraction unit 20 analyzes the content read from the content storage unit 10, extracts the moving image, the audio signal, or both, and outputs the result to the analysis signal memory 40.

Step 11) Segment division step:
This step is performed as needed. The segment division unit 30 divides the input content into one or more segments and outputs the start time and length of each segment to the segment memory 50. Here, a segment means either the entire content or a partial interval within it. The step can be omitted when the start time and length of each segment are given in advance, either manually or by means outside the scope of the present invention, or when the entire content is treated as a single segment.

Step 12) Quality value calculation step:
The quality value calculation unit 60 calculates a quality value for each segment by referring to the rules in the rule storage unit 70, on the basis of the analysis signal acquired from the analysis signal memory 40 and the start time and length of each segment acquired from the segment memory 50, and outputs the result.

Each of the above steps is described in detail below.

First, the analysis signal extraction step (step 10) is described.

When content is input as digital data, the analysis signal extraction unit 20 separates it into an image signal and an audio signal. Of these, the image signal, the audio signal, or both are extracted as the analysis signal as needed and stored in the analysis signal memory 40.

The analysis signal may be the image signal alone or the audio signal alone. The subsequent processing can be carried out whether only the audio signal, only the image signal, or both are used.

Next, the segment division step (step 11) is described.

Segment division refers to processing that splits the content zero or more times on the basis of the image signal, the audio signal, or both contained in the content.

As shown in FIG. 5, this step need not be executed when segments are given in advance or when there is no need to define segments.

The signal used for this processing need not be the same as the analysis signal described above.

First, a method of generating segments using the image signal is described.

When the image signal is used, segments are generated using structuring information.

Structuring information includes cut points, camera work, and the like, and various methods exist for extracting it, for example those described in Non-Patent Document 2 mentioned above.

Any number of any of these kinds of information may be used, but preferably cut points are used directly as segment boundaries. In this case each segment is generated as a shot, that is, an interval bounded before and after by cut points.

Camera work may also be used in combination: when camera work with a relatively large amount of motion is detected within a shot, the start time or end time of that camera work may be used as a boundary.

Next, an example of a method of generating segments using only the audio signal is described.

An analysis window with a predetermined fixed width, for example 50 ms (milliseconds), and a shift interval, for example 30 ms, is set on the audio signal. Taking this analysis window as the unit, the audio signal is divided into correlated signals and uncorrelated signals. Here, a correlated signal is one with a high autocorrelation function value, such as human or animal utterances or music; an uncorrelated signal is any signal that is not a correlated signal, that is, one with a low autocorrelation function value, such as white noise.
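
The windowing described above can be sketched as follows. This is only an illustration: the function name `analysis_windows` and its parameters are hypothetical, and the handling of a trailing partial window (it is dropped here) is an assumption that the text does not specify.

```python
def analysis_windows(num_samples, sample_rate, width_ms=50, shift_ms=30):
    """Yield (start, end) sample indices of each full analysis window.

    width_ms and shift_ms default to the example values in the text
    (50 ms window, 30 ms shift). A trailing partial window is dropped,
    which is an assumption of this sketch.
    """
    width = int(sample_rate * width_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    if width <= 0 or shift <= 0:
        raise ValueError("window width and shift must be at least one sample")
    start = 0
    while start + width <= num_samples:
        yield start, start + width
        start += shift
```

At a 16 kHz sampling rate, for instance, each window covers 800 samples and consecutive windows start 480 samples apart, so neighbouring windows overlap by 20 ms.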

Whether the audio signal in each analysis window is correlated or uncorrelated can be decided, for example, as follows.

The autocorrelation function value of the audio signal is calculated; if it exceeds a threshold the signal is regarded as correlated, and otherwise as uncorrelated. The threshold may be given in advance as a constant, for example 0.7, or a reference ratio between the time occupied by correlated signals and the time occupied by uncorrelated signals may be fixed and the threshold chosen so that the resulting ratio is closest to it.
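
A minimal sketch of the fixed-threshold variant follows. The text does not say which autocorrelation value is compared with the threshold; this sketch assumes the peak of the energy-normalized autocorrelation over non-zero lags, and the function names and the lag range are hypothetical.

```python
def peak_normalized_autocorrelation(window, min_lag=1, max_lag=None):
    """Return the largest autocorrelation value over non-zero lags,
    normalized by the window energy (the lag-0 value).

    Which lags are examined is an assumption of this sketch.
    """
    n = len(window)
    if max_lag is None:
        max_lag = n // 2
    energy = sum(x * x for x in window)
    if energy == 0.0:
        return 0.0  # silent window: treated as uncorrelated
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(window[i] * window[i + lag] for i in range(n - lag))
        best = max(best, r / energy)
    return best


def is_correlated(window, threshold=0.7):
    """Classify one analysis window; 0.7 is the example constant from the text."""
    return peak_normalized_autocorrelation(window) > threshold
```

Under these assumptions a periodic signal such as a sampled sine wave is classified as correlated, while white noise and silence are classified as uncorrelated.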

Next, segments are formed from the intervals made up of consecutive correlated windows (analysis windows judged to contain a correlated signal). This allows continuous human or animal utterances, music, and the like to be handled as single intervals, so that segments whose meaning a viewer can understand are generated.

An example of a method of forming segments is described below.

Segments are formed by determining the boundaries between them.

Let the set of correlated windows in the content, ordered from earliest to latest, be $\{F_1, F_2, \ldots, F_N\}$, where $N$ is the total number of correlated windows.

Next, for correlated windows $F_i$ and $F_{i+1}$ that are adjacent on the time axis, the time interval between them is calculated, that is, the difference $F_{i+1}^{\mathrm{start}} - F_i^{\mathrm{end}}$ between the end time $F_i^{\mathrm{end}}$ of $F_i$ and the start time $F_{i+1}^{\mathrm{start}}$ of the next window $F_{i+1}$.

The result is then compared with a predetermined threshold. If it is larger than the threshold, $F_i$ and $F_{i+1}$ are considered to belong to different segments, and the point between them is taken as a segment boundary.

Repeating this for all windows places correlated windows separated by a time gap in different segments, and as a result gathers each run of correlated signals with no time gap into a single segment.

For example, in the case shown in FIG. 6, the gaps between adjacent correlated windows are $F_{j+1} - F_j = T_1$ and $F_{j+2} - F_{j+1} = T_2$ (shorthand for the start-minus-end difference defined above). If the threshold $T_{\mathrm{th}}$ is set so that $T_1 < T_{\mathrm{th}} < T_2$, two segments BA and BB are formed, with the gap $T_2$ as the boundary between them.
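
The boundary rule can be sketched as below. The function name `group_correlated_windows` and the representation of windows as `(start, end)` pairs are hypothetical choices for illustration; the output follows the text in giving each segment as a start time and a length.

```python
def group_correlated_windows(windows, gap_threshold):
    """Group correlated windows into segments.

    windows: (start, end) times of the correlated windows, sorted by start.
    A new segment begins whenever the gap between the end of the current
    segment and the start of the next window exceeds gap_threshold.
    Returns a list of (segment_start, segment_length) pairs.
    """
    segments = []
    seg_start = seg_end = None
    for start, end in windows:
        if seg_start is None:
            seg_start, seg_end = start, end
        elif start - seg_end > gap_threshold:
            # Gap larger than the threshold: close the segment, open a new one.
            segments.append((seg_start, seg_end - seg_start))
            seg_start, seg_end = start, end
        else:
            seg_end = max(seg_end, end)
    if seg_start is not None:
        segments.append((seg_start, seg_end - seg_start))
    return segments
```

With windows at `(0, 10)`, `(12, 22)`, and `(50, 60)` the gaps are 2 and 28, so a threshold of 5 splits the windows into two segments, mirroring the $T_1 < T_{\mathrm{th}} < T_2$ case. A threshold longer than the whole content leaves a single segment.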

The lower the threshold T_th, the more boundaries are detected and the more segments are generated; conversely, the higher the threshold, the fewer boundaries and the fewer segments.

As a special case, setting T_th to a very large value, for example a value equal to or greater than the total duration of the content, prevents the content from being divided at all.
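
The boundary rule above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `build_segments`, the representation of each correlated window as a `(start, end)` pair in seconds, and the sample values are all assumptions made for the example.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (start_time, end_time) in seconds; assumed layout


def build_segments(windows: List[Window], t_th: float) -> List[Tuple[float, float]]:
    """Group time-ordered correlated windows into segments.

    A boundary is placed between adjacent windows whose gap exceeds t_th.
    Returns a list of (start_time, duration) pairs.
    """
    if not windows:
        return []
    segments = []
    seg_start, seg_end = windows[0]
    for start, end in windows[1:]:
        if start - seg_end > t_th:  # gap larger than the threshold: new segment
            segments.append((seg_start, seg_end - seg_start))
            seg_start = start
        seg_end = end
    segments.append((seg_start, seg_end - seg_start))
    return segments


# Hypothetical windows with gaps of about 0.2 s and 3.0 s.
windows = [(0.0, 1.0), (1.2, 2.0), (5.0, 6.0)]
two_segments = build_segments(windows, t_th=1.0)
one_segment = build_segments(windows, t_th=100.0)
```

With `t_th=1.0` the short gap is bridged and the 3.0 s gap becomes a boundary; with a threshold longer than the content, a single segment results, which matches the special case described above.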

Accordingly, as noted above, a segment in the present invention may be the entire content. The subsequent processing can be executed even when T_th is set so that the content is not divided at all; in that case step 11 serves no purpose and may be skipped.

The correlated signals may also be classified in more detail, for example into speech signals uttered by humans or animals and music signals, and segments may be generated separately for each class. Spectral information can be used as the criterion for separating them.

For example, the method described in Patent Document 3 can be used to classify speech signals and music signals.

Such processing yields finer-grained segments.

Finally, the case where segment generation uses both the audio signal and the image signal is described.

Segment generation using the audio signal alone, as described above, cannot divide portions that contain no correlated windows. Segment generation based on the audio signal and segment generation based on image information, for example cut points, may therefore both be applied. This makes it possible to define finer sections than either method can produce on its own.

Moreover, even when one of the two cannot produce useful segments, for example when the content contains no correlated windows or when no structural information is available for the images, the other can complement it and the division can still be carried out.
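
One way to apply both kinds of boundary together is to take the union of the two boundary sets. The sketch below assumes boundaries are given as times in seconds; `merge_boundaries` and its arguments are hypothetical names, and the source does not prescribe this particular combination.

```python
from typing import Iterable, List, Tuple


def merge_boundaries(audio_bounds: Iterable[float],
                     cut_points: Iterable[float],
                     total_length: float) -> List[Tuple[float, float]]:
    """Union audio-based and image-based boundaries.

    Returns (start_time, duration) sections covering 0..total_length.
    """
    inner = {t for t in list(audio_bounds) + list(cut_points)
             if 0.0 < t < total_length}
    edges = [0.0] + sorted(inner) + [total_length]
    return [(a, b - a) for a, b in zip(edges, edges[1:])]


# Audio analysis found one boundary; cut detection found three (one shared).
sections = merge_boundaries([10.0], [4.0, 10.0, 25.0], total_length=30.0)
# No correlated windows at all: the cut points alone still divide the content.
image_only = merge_boundaries([], [15.0], total_length=30.0)
```

Because the result is a union, either source can be empty without preventing the division, which is the complementary behaviour described above.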

Through the above processing, one or more segments can be generated from the content.

Whichever method is used, whether one described here or another, the start time and duration of each segment are acquired and stored in the segment memory 50.

Next, the quality value calculation step (step 12) is described.

FIG. 7 shows the configuration of the quality calculation unit in one embodiment of the present invention.

The quality calculation unit 60 comprises a feature amount extraction unit 61, a feature amount scale calculation unit 62, a conditional statement determination feature amount extraction unit 63, a conditional statement determination unit 64, a rule sufficiency measure calculation unit 65, a quality value calculation unit 66, a feature amount storage unit 601, a feature amount scale (FS) storage unit 602, a conditional statement determination feature amount storage unit 603, and a rule sufficiency measure (RS) storage unit 604.

FIG. 8 is a flowchart of the quality value calculation process in one embodiment of the present invention.

In step 12, the quality calculation unit 60 calculates, for each segment and on the basis of that segment's analysis signal, a quality value (hereinafter, Q value) that serves as the criterion for classifying the segment, and outputs it.

The Q value is calculated from at least one of two criteria: the feature amount scale FS and the rule sufficiency measure RS.

The processes for calculating FS and RS for a single segment are described in detail below. When a segment is processed, its start time and duration are read from the segment memory 50, the corresponding section of the analysis information is identified, and the feature amount extraction process (S31) or the conditional statement determination process (S33) is applied to the analysis information of that section. The thresholds and rules needed to calculate FS and RS are those stored in advance in the rule storage unit 70.

First, the feature amount scale FS is described.

The feature amount scale FS is an index of how high the quality is, as judged from feature amounts obtained from the analysis information in the analysis signal memory 40. The feature amounts to be used are determined in advance. FS is calculated in step 31 (feature amount extraction) and step 32 (feature amount scale calculation).

Step 31) The feature amount extraction unit 61 acquires the analysis signal from the analysis signal memory 40, extracts the predetermined feature amounts from it, and stores the value of each in the feature amount storage unit 601.

When an image signal is used as the analysis signal, suitable feature amounts include, for example:
・Shot length (SB)
・Motion amount (MQ)
・Color histogram (CH)
When an audio signal is used as the analysis signal, suitable feature amounts include, for example:
・Pitch change (DP)
・Power level ratio (PR)

The methods for extracting these feature amounts are described below.

<Shot length SB>
The shot length SB is the duration of a shot as defined above. It can be obtained by detecting cut points, for example with the method described in Non-Patent Document 2, and taking the duration of the section between adjacent cut points. Professionally produced video is known to contain few long takes, with an average shot length of 6 to 7 seconds. With shot length (SB) extracted as a feature amount, content whose shot-length distribution resembles that of professional video, as illustrated in FIG. 9, can be regarded as high quality, and content whose distribution departs greatly from it as low quality.
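
As a small illustration, shot lengths follow directly from a list of cut-point times. The sketch below assumes cut points have already been detected by some other means; `shot_lengths` and the sample times are hypothetical, and using the population variance for SBV is an assumption, since the source does not say which variance is meant.

```python
from statistics import mean, pvariance


def shot_lengths(cut_points, start, end):
    """Durations of the sections between adjacent cut points within [start, end]."""
    edges = [start] + sorted(t for t in cut_points if start < t < end) + [end]
    return [b - a for a, b in zip(edges, edges[1:])]


# Hypothetical cut points (seconds) inside a 26-second segment.
lengths = shot_lengths([6.0, 13.0, 19.0], start=0.0, end=26.0)
sba = mean(lengths)       # mean shot length (SBA)
sbv = pvariance(lengths)  # shot length variance (SBV), population variance assumed
```

Here the four shots last 6, 7, 6, and 7 seconds, so the mean falls in the 6 to 7 second range cited for professionally produced video.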

<Motion amount MQ>
The motion amount MQ can be extracted from the camera parameters and the x- and y-direction displacements computed during the camerawork detection described in Non-Patent Document 2. Preferably, the norm is taken so that the result is a scalar. When the degree of camera shake and the smoothness of the camerawork are computed as the motion amount (MQ), high-quality content and other content differ, as shown in FIG. 9.

<Color histogram CH>
The color histogram CH can be computed by dividing each image into one or more regions, quantizing any color information for each region, such as the average hue, saturation, and lightness values, or RGB or luminance values, and counting the occurrences of each quantization level. In edited video, footage shot at several locations and of several subjects is carefully trimmed and assembled, so the color tone often changes markedly between shots. In unedited video, shots taken at the same place follow one another, so the color changes little (in a video of a school sports day, for example, the color of the ground dominates many shots). In addition, in content where sunlight and artificial lighting are precisely controlled, subjects are captured sharply and the images show a wide variety of color; otherwise the images tend to look washed out under the influence of the light, with uniformly high luminance values.
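
The region-average-then-quantize procedure can be sketched as follows. The image is assumed to be a nested list of `(r, g, b)` tuples with 0 to 255 channels, and the grid size and number of quantization levels are arbitrary choices for the example; none of these details come from the source.

```python
from collections import Counter


def region_means(image, rows, cols):
    """Average RGB colour of each cell in a rows x cols grid.

    The image must be at least rows x cols pixels so that no cell is empty.
    """
    h, w = len(image), len(image[0])
    means = []
    for i in range(rows):
        for j in range(cols):
            block = [image[y][x]
                     for y in range(i * h // rows, (i + 1) * h // rows)
                     for x in range(j * w // cols, (j + 1) * w // cols)]
            n = len(block)
            means.append(tuple(sum(p[c] for p in block) // n for c in range(3)))
    return means


def color_histogram(image, rows=2, cols=2, levels=4):
    """Quantize each region's mean colour and count how often each bin occurs."""
    step = 256 // levels
    return Counter(tuple(v // step for v in m)
                   for m in region_means(image, rows, cols))


# Tiny 2x2 test image: two reddish pixels on top, two bluish pixels below.
image = [[(255, 0, 0), (250, 10, 5)],
         [(0, 0, 255), (0, 0, 250)]]
hist = color_histogram(image)
```

Comparing such histograms between consecutive shots gives one possible measure of how much the color tone changes, which is the property the paragraph above relies on.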

<Pitch change DP>
The pitch change DP is obtained by extracting the pitch F0, for example with the pitch extraction method described in Non-Patent Document 1, and computing its change over time as a difference. As shown in FIG. 11, in high-quality recordings in which speech or music is captured with a dedicated sound-collecting microphone, the pitch tends to change relatively smoothly and the differences are small. Recordings made without such care contain many discontinuities (jumps) in pitch, and the differences tend to be large.
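
A minimal sketch of the difference calculation is shown below. It assumes the pitch track is a list of F0 values in Hz with `None` marking frames where no pitch was extracted, and it takes absolute frame-to-frame differences; both choices are assumptions for illustration, as are the sample values.

```python
from statistics import mean


def pitch_changes(f0):
    """Absolute differences between consecutive pitched frames.

    Pairs that include an unpitched frame (None) are skipped.
    """
    diffs = []
    for prev, cur in zip(f0, f0[1:]):
        if prev is not None and cur is not None:
            diffs.append(abs(cur - prev))
    return diffs


# Hypothetical pitch tracks (Hz): a smooth one and one with jumps.
smooth = [200.0, 202.0, 204.0, None, 210.0, 211.0]
jumpy = [200.0, 260.0, 190.0, None, 240.0, 180.0]
pda_smooth = mean(pitch_changes(smooth))
pda_jumpy = mean(pitch_changes(jumpy))
```

The smooth track yields a much smaller mean difference than the track with jumps, which is the tendency the text attributes to carefully made recordings.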

<Power level ratio PR>
The power level ratio PR is computed, for example, by taking the rms value of the amplitude of the audio waveform as the power and evaluating
PR = (average power of the portions where pitch is extracted) / (average power of the portions where pitch is not extracted)
As shown in FIG. 12, and as with the pitch change, recordings made with care tend to have low power in the portions where no pitch is extracted, whereas recordings made without care have higher power there. As a result, PR tends to be high for the former and low for the latter.
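
The ratio can be sketched as below. The frame layout (a list of sample lists with a parallel list of pitched/unpitched flags) and the function names are assumptions for the example; returning `None` when the ratio is undefined is likewise a design choice of this sketch, not something the source specifies.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def power_level_ratio(frames, pitched_flags):
    """PR = mean rms of pitched frames / mean rms of unpitched frames."""
    pitched = [rms(f) for f, p in zip(frames, pitched_flags) if p]
    unpitched = [rms(f) for f, p in zip(frames, pitched_flags) if not p]
    if not pitched or not unpitched:
        return None  # ratio undefined without both kinds of frame
    denom = sum(unpitched) / len(unpitched)
    if denom == 0.0:
        return None  # perfectly silent background: ratio undefined
    return (sum(pitched) / len(pitched)) / denom


# Hypothetical frames: two loud pitched frames and one quiet unpitched frame.
frames = [[0.5, -0.5], [0.5, 0.5], [0.05, -0.05]]
pr = power_level_ratio(frames, [True, True, False])
```

In this example the pitched frames are about ten times louder than the background, so PR is about 10; a noisy recording would raise the denominator and lower PR.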

The feature amounts described above may be selected freely according to the analysis information being used.

Step 32) The feature amount scale calculation unit 62 calculates FS from the feature amount values passed from the feature amount extraction unit 61 and stores it in the feature amount scale (FS) storage unit 602.

An example of the FS calculation is described below.

FS is determined from statistics, such as the mean and variance, computed for each feature amount used.

For example, the mean SBA and variance SBV may be computed as statistics of the shot length, and the mean PDA and variance PDV as statistics of the pitch change DP.

To calculate FS from these statistics, one or more functions that map each statistic to FS are used. Any function may be used, for example a linear function, a bounded function such as a sigmoid or Gaussian function, or a nonlinear model such as a multilayer perceptron or support vector machine. These functions may also be combined.

The following example takes the mean shot length SBA and the mean power level ratio PRA as feature amounts and calculates FS using a sigmoid function and a linear function.

For the mean shot length SBA, its contribution FS^SBA to the feature amount scale FS is calculated from SBA using a sigmoid function with predetermined constants α and β, which may take any values, for example α = 1 and β = 6. This example assumes that the average shot length of professionally produced video is β = 6, so FS^SBA becomes larger as the mean shot length SBA approaches 6.

For the mean power level ratio PRA, its contribution FS^PRA to FS is calculated from PRA using a linear mapping with predetermined constants γ and ε, which may take any values, for example γ = 1.0 and ε = −0.5. A higher power level ratio is assumed to indicate higher quality, so FS^PRA increases with PRA.

The feature amount scale FS is then calculated from FS^SBA and FS^PRA.

Here, an example using a linear function is described.

For example, FS is calculated as a linear function of FS^SBA and FS^PRA with predetermined constants η and λ, which may take any values, for example η = 0.6 and λ = 0.4.

An example of calculating FS has been described above; needless to say, the same processing can be carried out with any combination of feature amounts and any functions.

Preferably, the final FS is calculated so that it falls within the range 0 to 1, as in the example above.
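
The sketch below shows one way the FS example could be realized. The exact equations are not reproduced in this text, so every functional form here is an assumption chosen only to match the stated behaviour: a sigmoid-based term that peaks when SBA equals β, a linear term in PRA clipped to 0 to 1, and a weighted sum with η and λ. The function names are hypothetical.

```python
import math


def fs_sba(sba, alpha=1.0, beta=6.0):
    """Assumed form: sigmoid of the distance from beta, scaled to peak at 1."""
    d = min(alpha * abs(sba - beta), 700.0)  # cap the exponent to avoid overflow
    return 2.0 / (1.0 + math.exp(d))


def fs_pra(pra, gamma=1.0, epsilon=-0.5):
    """Assumed form: linear map gamma * PRA + epsilon, clipped to [0, 1]."""
    return min(1.0, max(0.0, gamma * pra + epsilon))


def feature_scale(sba, pra, eta=0.6, lam=0.4):
    """Assumed form: FS = eta * FS^SBA + lam * FS^PRA."""
    return eta * fs_sba(sba) + lam * fs_pra(pra)


fs_good = feature_scale(sba=6.0, pra=1.5)   # professional-like shot length, clean audio
fs_poor = feature_scale(sba=40.0, pra=0.2)  # very long takes, noisy audio
```

With η + λ = 1 and both contributions bounded by 0 and 1, FS stays within the 0 to 1 range that the text recommends.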

Next, the calculation of the rule sufficiency measure RS is described.

RS is a value calculated according to whether rules set in advance in the rule storage unit 70, which serve as criteria for judging quality, are satisfied. Each rule stored in the rule storage unit 70 has as its conditional statement a video or audio characteristic whose frequency of occurrence differs between high-quality and low-quality video. RS is calculated through step 30 (conditional statement determination feature amount extraction), step 33 (conditional statement determination), and step 34 (rule sufficiency measure calculation).

Step 30) In the conditional statement determination feature amount extraction process, the conditional statement determination feature amount extraction unit 63 extracts from the analysis information (analysis signal) the feature amounts used for the determination in step 33. The determination conditions of step 33 are defined in advance, so the feature amounts to be extracted in step 30 are also known in advance. Step 30 may use existing feature amount extraction processes needed for calculating RS as preprocessing. The conditional statement determination feature amount extraction unit 63 may also select, from the feature amount extraction processes described for the FS calculation (step 31), those that compute the feature amounts used in step 33.

RS is calculated using one or more rules set in advance in the rule storage unit 70. Each rule R that determines RS is written in if-then form. The rules are built on information obtained by analyzing the analysis information and are written as follows.

"if [conditional statement CS] then [RS points]"
When the conditional statement CS is satisfied, it is judged to be true, and the RS points given in the then clause are added to RS. The initial value of RS is 0.

Step 33) First, the conditional statement determination unit 64 determines whether the segment being processed satisfies each conditional statement CS.

Rules R such as the following are set in advance:
R1: "if a telop is present then +0.4"
R2: "if there is a close-up of a face then +0.2"
R3: "if a face is turned toward the camera during a continuous conversation then −0.3"
R4: "if there is a cut point during a continuous conversation then +1.0"
R5: "if there is camerawork during a continuous conversation then −0.6"

R4 and R5 codify how well equipped the shooting environment is. Specifically, R5 targets the camerawork needed to turn the camera toward each speaker, which occurs when a conversation is shot with a single camera. It therefore carries a negative score of −0.6 as an indicator that the shooting environment is poorly equipped (too few cameras). Video with the characteristic described by R4, on the other hand, can be produced with at least two cameras in an environment where no camerawork is needed when the speaker changes. It therefore carries a positive score of +1.0 as an indicator that the shooting environment is well equipped.

R1 rests on the assumption that segments that have been edited are of high quality: it extracts a characteristic indicating that editing has taken place and assigns a positive score to it. Specifically, because telops are generally inserted during video editing, telops are detected and a positive score of +0.4 is given when one is present.

R2 and R3 codify compositions that are statistically common in high-quality content.

To determine whether the conditional statements of the rules set in the rule storage unit 70 are satisfied, the analysis needed to evaluate each statement must be carried out.

An example of this analysis is described for each of the rules R1 to R5 listed above.

For rule R1, the conditional statement determination unit 64 checks whether the segment being processed contains a telop, for example by the method described in Patent Document 3, and judges the conditional statement of R1 to be satisfied (true) if one or more telops are detected. For R2, a face region is detected in the segment being processed, for example by the method described in Patent Document 4. If the number of face-region pixels is at or above a predetermined threshold (that is, if the face region occupies at least a predetermined fraction of the image), the segment is judged to contain a close-up of a face, the conditional statement of R2 is judged to be satisfied, and the result is stored in the RS storage unit 604.

For R3, R4, and R5, a continuous conversation section is first detected in the segment being processed, for example by the method described in Patent Document 2. For R3, the face orientation is determined for the video section corresponding to the detected conversation section, for example by the method described in Patent Document 4. If the angle between the face orientation and the optical axis of the camera falls within a predetermined angle at least once, the conditional statement of R3 is judged to be satisfied and the result is stored in the RS storage unit 604.

For R4, cut points are detected in the video section corresponding to the conversation section, for example by the method described in Non-Patent Document 2. If one or more cut points are detected, the conditional statement of R4 is judged to be satisfied and the result is stored in the RS storage unit 604.

For R5, camerawork is detected in the video section corresponding to the conversation section, for example by the method described in Non-Patent Document 2. If one or more instances of camerawork are detected, the conditional statement of R5 is judged to be satisfied.

Step 34) Next, the rule sufficiency measure calculation process in the rule sufficiency measure calculation unit 65 is described.

After the conditions have been evaluated, RS is calculated by adding up the RS points of the rules judged to be true. For example, if a segment is judged true for the two rules R1 and R4 above, its RS is 0.4 + 1.0 = 1.4. Rules other than those listed here may be added as appropriate, provided they concern information obtainable from the analysis information.
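
The accumulation of RS points can be sketched as follows. The point values are those of rules R1 to R5 above; the dictionary layout and the function name `rule_sufficiency` are assumptions for the example, and the truth values are taken as given rather than computed from video.

```python
# RS points for rules R1 to R5, as listed in the text.
RULES = {
    "R1": 0.4,   # telop present
    "R2": 0.2,   # close-up of a face
    "R3": -0.3,  # face turned toward the camera during a conversation
    "R4": 1.0,   # cut point during a conversation
    "R5": -0.6,  # camerawork during a conversation
}


def rule_sufficiency(truth):
    """Sum the RS points of every rule whose conditional statement is true.

    truth maps a rule id to the boolean result of step 33; missing rules
    are treated as false. RS starts from 0.
    """
    rs = 0.0
    for rule_id, points in RULES.items():
        if truth.get(rule_id, False):
            rs += points
    return rs


rs_example = rule_sufficiency({"R1": True, "R4": True})
```

For the case in the text, where only R1 and R4 are true, the function returns 0.4 + 1.0 = 1.4. New rules can be added by extending the table without changing the summation.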

The rules listed above apply to segments. When the content is not divided, however, the segment is the entire content, so rules about the content as a whole may also be set.

Examples include:
R6: "if the total length of the content is 30 seconds or less then −0.9"
R7: "if the image bit rate of the content is 9.8 Mbps or more then +0.9"
R8: "if the audio sampling rate of the content is less than 22.050 kHz then −0.8"

Many rules of this kind can be evaluated from content properties without any special analysis. When the quality value of the content is calculated from such rules alone, the analysis methods described so far need not be carried out.

Step 35) The quality value calculation process in the quality value calculation unit 66 is described.

The Q value of the segment is calculated from the FS and RS obtained above.

There are various ways to do this; here, an example using FS, RS, and a sigmoid function is described.

For example, the Q value is calculated with predetermined constants ψ and φ, for example ψ = 0.2 and φ = 0.8. In this example, the Q value lies in the range 0 to 1.

This example uses both FS and RS to determine the Q value, but only one of them may be used instead. The function used to calculate the Q value may also be a linear function or another nonlinear function.
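
As a rough illustration, the Q value could be obtained by passing a weighted sum of FS and RS through a logistic function. The equation itself is not reproduced in this text, so this form is an assumption that merely satisfies the stated properties (it uses ψ, φ, and a sigmoid, and it yields values between 0 and 1); `q_value` is a hypothetical name.

```python
import math


def q_value(fs, rs, psi=0.2, phi=0.8):
    """Assumed form: Q = sigmoid(psi * FS + phi * RS)."""
    x = psi * fs + phi * rs
    # Numerically stable logistic function for either sign of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


q_high = q_value(fs=0.8, rs=1.4)   # e.g. R1 and R4 satisfied
q_low = q_value(fs=0.8, rs=-0.6)   # e.g. only R5 satisfied
```

Under this assumed form, a segment with the same FS receives a higher Q value when its rules add up to a larger RS, and Q always stays strictly between 0 and 1.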

In this way, a Q value can be calculated for each segment, as shown in FIG. 13, and this Q value is output as the quality value.

Calculating the Q value per segment makes it possible to evaluate quality correctly even for video in which high-quality and low-quality material are mixed.

The output may, for example, take the form of a list that also gives the start time and duration of each segment, as shown in FIG. 14.

In the example above, a higher Q value indicates higher quality. To make a binary judgment of "high" or "low", for example, segments with a Q value of 0.5 or more may be judged high quality and segments below 0.5 low quality.

For a finer classification into "very high", "somewhat high", "somewhat low", and "very low", four levels may be used: a Q value of 0.75 or more, 0.5 or more and less than 0.75, 0.25 or more and less than 0.5, and less than 0.25, respectively.
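
The two threshold schemes translate directly into code. The sketch below uses the thresholds given in the text; the function names, the list layout for the output, and the sample segments are assumptions for illustration.

```python
def classify_binary(q):
    """Two-level judgment: Q >= 0.5 is high, otherwise low."""
    return "high" if q >= 0.5 else "low"


def classify_four(q):
    """Four-level judgment using the 0.75 / 0.5 / 0.25 thresholds."""
    if q >= 0.75:
        return "very high"
    if q >= 0.5:
        return "somewhat high"
    if q >= 0.25:
        return "somewhat low"
    return "very low"


# Hypothetical per-segment output: (start_time, duration, Q value).
segments = [(0.0, 12.5, 0.81), (12.5, 30.0, 0.50), (42.5, 8.0, 0.10)]
rows = [(start, length, q, classify_four(q)) for start, length, q in segments]
```

Each row combines the start time, duration, and Q value of a segment with its class, which is the kind of list output described above.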

Such classification makes it easy, for example, to request the segments of a particular quality class and to gather and display only those. When a single high-quality video is to be edited together from segments of several videos, for instance, only the segments classified as "very high" or "somewhat high" can be gathered and displayed.

Many other ways of classifying are possible, and needless to say any suitable form may be adopted.

Although the present invention is a method for evaluating quality, its use is not limited to presenting the evaluated quality directly. Some examples are described below.

＜コンテンツ検索技術としての利用＞
従来のコンテンツ検索技術では、例えば、特許文献５、特許文献６に記載の動画像検索方法に開示されているりょうに、動画像のセグメントや、代表画像の特徴量の類似度ＳＴに基づいてスコアを計算し、このスコア順にランキングした結果を提示するものが多い。しかしながら、このような手法では、コンテンツのクオリティについてまで評価することはできない。利用者によっては、クオリティの高いものを特に視聴したいと考えている利用者もおり、このような利用者にとっては、従来の動画像検索技術のみでは満足な結果を得ることができていない。 <Use as content search technology>
In the conventional content search technology, for example, as disclosed in the moving image search methods described in Patent Document 5 and Patent Document 6, the score is based on the similarity ST of the feature amount of the moving image segment and the representative image. In many cases, the results are calculated and the results of ranking in the order of the scores are presented. However, such a method cannot evaluate the quality of content. Some users are particularly interested in viewing high-quality ones. For such users, satisfactory results cannot be obtained only with the conventional moving image search technology.

Therefore, by taking into account the quality evaluated by the method of the present invention and preferentially presenting content of higher quality, rather than ranking by the similarity ST alone, a moving image search technology that is also useful for such users can be provided.

As one example of the processing method, content ranked by the similarity ST according to Patent Document 5, Patent Document 6, or the like may be divided into predetermined sections (for example, sections of 30 ranks each, starting from rank 1), and the content within each section may be re-sorted in descending order of Q value.
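The section-wise re-sorting can be sketched as below. This is an illustrative sketch only: the input is assumed to be a list of `(content_id, q_value)` pairs already ordered by descending similarity ST, and the name `resort_by_section` is hypothetical.

```python
def resort_by_section(ranked, section_size=30):
    """Re-sort each block of `section_size` ranks in descending order of Q value.

    `ranked` is a list of (content_id, q_value) pairs that is already in
    descending order of similarity ST (assumed to come from an existing search).
    """
    result = []
    for start in range(0, len(ranked), section_size):
        section = ranked[start:start + section_size]
        # sorted() is stable, so items with equal Q keep their similarity order.
        result.extend(sorted(section, key=lambda item: item[1], reverse=True))
    return result
```

Because items never leave their section, content with low similarity cannot overtake content from a higher section, however high its Q value.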

Alternatively, a new score SC may be calculated from the Q value and the similarity ST calculated according to Patent Document 5, Patent Document 6, or the like, and the ranking may be reconstructed by sorting on this score.

As one example of this method, the score may be calculated using a linear function, such as
SC = Ψ × ST + Φ × Q value

Here, Ψ and Φ are arbitrary constants; for example, Ψ = 0.5 and Φ = 0.5 may be used.
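The linear score and the resulting re-ranking can be sketched as follows. The function names and the tuple layout `(content_id, st, q)` are assumptions made for illustration; the defaults Ψ = 0.5 and Φ = 0.5 are the example constants given in the text.

```python
def combined_score(st: float, q: float, psi: float = 0.5, phi: float = 0.5) -> float:
    """SC = psi * ST + phi * Q, the linear score from the text."""
    return psi * st + phi * q


def rerank(items, psi: float = 0.5, phi: float = 0.5):
    """Sort (content_id, st, q) tuples in descending order of SC."""
    return sorted(
        items,
        key=lambda item: combined_score(item[1], item[2], psi, phi),
        reverse=True,
    )
```

With made-up values, an item with ST = 0.8 and Q = 0.6 (SC = 0.7) would be ranked above an item with ST = 0.9 and Q = 0.1 (SC = 0.5), even though the latter has the higher similarity.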

It is not always necessary to use a linear function; any function, such as a nonlinear function, may be used. Preferably, a function that increases monotonically with respect to both ST and the Q value is used.
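As one possible nonlinear choice, a weighted geometric mean could be used. This particular function is not taken from the specification; it is shown only as an example of a score that increases with both ST and the Q value when both are positive, and the name `geometric_score` is hypothetical.

```python
def geometric_score(st: float, q: float, psi: float = 0.5, phi: float = 0.5) -> float:
    """Weighted geometric mean SC = ST**psi * Q**phi.

    Assumes st >= 0, q >= 0 and psi, phi > 0. For positive inputs the score
    strictly increases in both st and q; if either input is 0 the score is 0.
    """
    if st < 0 or q < 0:
        raise ValueError("st and q must be non-negative")
    return (st ** psi) * (q ** phi)
```

Unlike the linear form, this score stays low whenever either the similarity or the quality is close to zero, which may or may not be desirable depending on the application.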

In the example shown in FIG. 15, the similarity, the Q value, and the score with Ψ = 0.5 and Φ = 0.5 are calculated for eight pieces of content. The table in FIG. 15(A) shows the result of the conventional method, ranked in descending order of similarity. In this example, content with a low Q value, that is, low quality, appears near the top of the ranking, so the result does not satisfy a user who wants to view high-quality content. FIG. 15(B) shows the result of ranking by the score, which considers both the similarity and the Q value. Here, among the content with high similarity, content with a high Q value appears near the top, which satisfies this user's intention.

Similarly, there are content recommendation technologies that recommend entirely new content similar to the content the user has viewed so far. In this case as well, exactly the same processing method can be used to recommend content that has high similarity and higher quality.

<Use as preliminary processing for detection technology>
Detection technologies that automatically detect specific sections in content have been invented. For example, Japanese Patent Laid-Open No. 2008-22142 discloses a technique for detecting only pitching scenes in baseball video. With such techniques, the intended effect may be difficult to obtain depending on the content being processed. In the above example, because the technique detects pitching scenes in baseball video, it has no effect on content other than baseball video.

Therefore, when the above detection technology is applied to a content database that also includes content other than baseball video, for example, it is beneficial to introduce preliminary processing that narrows the database down in advance to content likely to be baseball video, because this avoids both reduced effectiveness and wasted processing. Baseball video is used as the example here, but the same applies to movies, dramas, news, various sports videos, and the like.

Content of this kind is created mainly by professional creators, and its quality is high. Therefore, by narrowing the content down in advance to high-quality content using the technology of the present invention, detection technologies such as the above can be supported and effective processing can be realized.
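The preliminary narrowing can be sketched as a simple filter placed in front of an arbitrary detector. Everything named here is an assumption for illustration: `detector` stands for any existing detection routine, the input is a list of `(content_id, q_value)` pairs, and the threshold of 0.5 is an arbitrary example value rather than one prescribed by the specification.

```python
from typing import Callable, Iterable, List, Tuple


def detect_in_high_quality(
    contents: Iterable[Tuple[str, float]],
    detector: Callable[[str], object],
    q_threshold: float = 0.5,
) -> List[Tuple[str, object]]:
    """Run `detector` only on content whose Q value is at least `q_threshold`.

    Content below the threshold is skipped, so the (possibly expensive)
    detector is never called for it.
    """
    results = []
    for content_id, q in contents:
        if q < q_threshold:
            continue  # likely not professionally produced; skip detection
        results.append((content_id, detector(content_id)))
    return results
```

The detector itself is unchanged; only the set of content handed to it is reduced.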

Japanese Patent Laid-Open No. 2008-22142 is cited here only as an example; it goes without saying that the present invention is applicable to any detection technique that may suffer a similar reduction in effectiveness.

As described above, in the present embodiment, the quality of each segment can be calculated and presented as a numerical value by analyzing various information in the content based on the analysis signal in the content. This solves the problems of the prior art, in which the accuracy of the evaluated quality was low and the number of classification levels was small.

The operations of the components of the content evaluation apparatus shown in FIG. 3 can also be constructed as a program, which can be installed and executed on a computer used as the content evaluation apparatus, or distributed via a network.

The constructed program can also be stored on a hard disk or on a portable storage medium such as a flexible disk or a CD-ROM, and installed on a computer or distributed.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims. For example, XML (eXtensible Markup Language) data including information on the generated segments may be generated. Because XML is a general-purpose format, this can improve the usability of the output produced by the present invention.
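Generating XML from segment information can be sketched with Python's standard `xml.etree.ElementTree` module. The specification does not define a schema, so the element and attribute names used here (`content`, `segment`, `start`, `end`, `q_value`) and the dictionary keys are purely hypothetical.

```python
import xml.etree.ElementTree as ET


def segments_to_xml(content_id: str, segments: list) -> str:
    """Serialize segment information to an XML string.

    Each entry of `segments` is a dict with hypothetical keys
    "start", "end" (for example, seconds) and "q" (the Q value).
    """
    root = ET.Element("content", {"id": content_id})
    for seg in segments:
        element = ET.SubElement(
            root, "segment",
            {"start": str(seg["start"]), "end": str(seg["end"])},
        )
        # Store the Q value as text, rounded to two decimal places.
        ET.SubElement(element, "q_value").text = f"{seg['q']:.2f}"
    return ET.tostring(root, encoding="unicode")
```

Any XML-aware tool could then read the per-segment Q values without knowledge of the evaluation apparatus itself.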

The present invention is applicable to techniques for evaluating video quality in general.

FIG. 1 is a block diagram of the principle of the present invention.
FIG. 2 is a diagram for explaining the principle of the present invention.
FIG. 3 is a block diagram of a content evaluation apparatus according to an embodiment of the present invention.
FIG. 4 is a flowchart (part 1) of an evaluation method according to an embodiment of the present invention.
FIG. 5 is a flowchart (part 2) of an evaluation method according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of partial content generation processing according to an embodiment of the present invention.
FIG. 7 is a block diagram of a quality value calculation unit according to an embodiment of the present invention.
FIG. 8 is a flowchart of quality value calculation processing according to an embodiment of the present invention.
FIG. 9 is an example of the shot length SB according to an embodiment of the present invention.
FIG. 10 is an example of the motion amount MQ according to an embodiment of the present invention.
FIG. 11 is an example of the pitch change DP according to an embodiment of the present invention.
FIG. 12 is an example of the power level ratio PR according to an embodiment of the present invention.
FIG. 13 is an example of the Q value according to an embodiment of the present invention.
FIG. 14 is an example of an output result according to an embodiment of the present invention.
FIG. 15 is an example of a search result.

Explanation of symbols

10 Content storage unit
20 Analysis signal extraction means, analysis signal extraction unit
30 Segment division unit
40 Storage means, analysis signal memory
50 Segment memory
60 Quality value calculation means, quality value calculation unit
61 Feature amount extraction unit
62 Feature amount scale calculation unit
63 Feature amount extraction unit for condition sentence determination
64 Condition sentence determination unit
65 Rule satisfaction measure calculation unit
66 Quality value calculation unit
70 Rule storage means, rule storage unit
601 Feature amount storage unit
602 Feature amount scale (FS) storage unit
603 Feature amount storage unit for condition sentence determination
604 Rule satisfaction measure (RS) storage unit

Claims

A content evaluation apparatus that analyzes at least one of an image, a sound, and a music signal included in content and evaluates the content,
Analysis signal extraction means for extracting at least one of image information or audio information or image information and audio information in the content as an analysis signal, and storing the analysis signal in a storage means;
A rule storage means for storing a rule that is a condition determination sentence using an analysis signal whose frequency of occurrence varies depending on the quality of content;
Quality value calculation means for obtaining the analysis signal from the storage means, referring to the rule storage means, calculating a quality value using a rule satisfaction measure calculated based on a rule corresponding to the analysis signal, and outputting the quality value;
A content evaluation apparatus comprising:

The rule storage means includes
As the condition judgment sentence,
When there is a cut point during a series of conversations, a condition judgment sentence that evaluates the quality of the content highly, and evaluates the content quality low when there is camera work during a series of conversations, is stored.
The content evaluation apparatus according to claim 1.

A content evaluation apparatus that analyzes at least one of an image, a sound, and a music signal included in content and evaluates the content,
Analysis signal extracting means for extracting image information or audio information in the content, or at least one of image information and audio information as an analysis signal, and storing it in a storage means;
The analysis signal is read from the storage means, and at least one of the shot length of the image signal, the amount of motion, the color histogram, the pitch change, the power level, the pitch change of the audio signal, and the power level ratio among the analysis signals is read out. A quality value calculating means for calculating and outputting a quality value using a feature amount scale calculated using
A content evaluation apparatus comprising:

A content evaluation apparatus that analyzes at least one of an image, a sound, and a music signal included in content and evaluates the content,
Analysis signal extracting means for extracting image information or audio information in the content, or at least one of image information and audio information as an analysis signal, and storing it in a storage means;
A rule storage means for storing a condition judgment sentence that evaluates the quality of the content high when there is a cut point during a series of conversations and evaluates the quality of the content low when there is camera work during the series of conversations;
First quality value calculation means for obtaining the analysis signal from the storage means, referring to the rule storage means, and calculating and outputting a quality value using a rule satisfaction measure calculated based on a rule corresponding to the analysis signal;
The analysis signal is read from the storage means, and at least one of the shot length of the image signal, the amount of motion, the color histogram, the pitch change, the power level, the pitch change of the audio signal, and the power level ratio among the analysis signals is read out. Second quality value calculating means for calculating and outputting a quality value using a feature amount scale calculated using
A content evaluation apparatus comprising:

A content evaluation method for analyzing at least one of an image, a sound, and a music signal included in content and evaluating the content,
An analysis signal extraction step in which the analysis signal extraction unit extracts at least one of image information or audio information or image information and audio information in the content as an analysis signal, and stores the analysis signal in a storage unit;
A quality value calculating step in which the quality value calculation means acquires the analysis signal from the storage means, refers to a rule storage means storing a rule that is a condition determination sentence using an analysis signal whose occurrence frequency changes depending on the quality of content, and calculates and outputs a quality value using a rule satisfaction measure calculated based on a rule corresponding to the analysis signal;
The content evaluation method characterized by performing.

In the quality value calculating step,
The rule storage means for storing a condition judgment sentence that evaluates the quality of the content highly when there is a cut point during a series of conversations and evaluates the content quality low when there is camera work during a series of conversations The content evaluation method according to claim 5, which is referred to.

A content evaluation method for analyzing at least one of an image, a sound, and a music signal included in content and evaluating the content,
An analysis signal extraction step in which the analysis signal extraction unit extracts at least one of image information or audio information or image information and audio information in the content as an analysis signal, and stores the analysis signal in a storage unit;
Quality value calculation means reads the analysis signal from the storage means, and among the analysis signals, shot length of image signal, motion amount, color histogram, pitch change, power level, pitch change of audio signal, power level ratio A quality value calculating step of calculating and outputting a quality value using a feature amount scale calculated using at least one of
The content evaluation method characterized by performing.

A content evaluation method for analyzing at least one of an image, a sound, and a music signal included in content and evaluating the content,
An analysis signal extraction step in which the analysis signal extraction unit extracts at least one of image information or audio information or image information and audio information in the content as an analysis signal, and stores the analysis signal in a storage unit;
A first quality value calculating step in which the first quality value calculation means acquires the analysis signal from the storage means, refers to the rule storage means storing a condition judgment sentence that evaluates the quality of the content high when there is a cut point during a series of conversations and evaluates the quality of the content low when there is camera work during the series of conversations, and calculates and outputs a quality value using a rule satisfaction measure calculated based on a rule corresponding to the analysis signal;
Second quality value calculation means reads the analysis signal from the storage means, among the analysis signal, shot length of the image signal, motion amount, color histogram, pitch change, power level, pitch change of the audio signal, A second quality value calculating step of calculating and outputting a quality value using a feature amount scale calculated using at least one of the power level ratios;
The content evaluation method characterized by performing.

A content evaluation program for causing a computer to function as each of the means constituting the content evaluation apparatus according to any one of claims 1 to 4.

A computer-readable recording medium storing the content evaluation program according to claim 9.