JP7077624B2

JP7077624B2 - Content classification device, content classification method, and program

Info

Publication number: JP7077624B2
Application number: JP2018004552A
Authority: JP
Inventors: 裕一小林
Original assignee: Toppan Inc
Current assignee: Toppan Inc
Priority date: 2018-01-15
Filing date: 2018-01-15
Publication date: 2022-05-31
Anticipated expiration: 2038-01-15
Also published as: JP2019125114A

Description

The present invention relates to a content classification device, a content classification method, and a program that quantify the atmosphere and taste a person perceives in content such as images and sound.

In general, the atmosphere a person feels toward an object being viewed, and sensations such as its taste, tend to correlate strongly with the characteristics of the object itself and with the conditions under which the person views it.
For example, such sensations are known to correlate strongly with statistics (such as the mean and variance) of visual feature amounts, taken as characteristics of the viewed object itself (image feature amounts for an image, video feature amounts for a video) (see, for example, Non-Patent Document 1).
Such sensations are likewise known to correlate strongly with statistics of auditory feature amounts of the viewed object (acoustic feature amounts for sound, speech feature amounts for speech) (see, for example, Non-Patent Document 2).
A method is also known for obtaining the correlation between images from statistics of GIST image feature amounts, which are one way of representing an image (see, for example, Non-Patent Document 3).

In daily life, a person may also look for another object that matches, or resembles, the atmosphere they feel from a given object. For example, someone looking for new pants to go with a favorite shirt may want pants with an atmosphere similar to that shirt.
In such a case, it is difficult to have a system consult images that combine various shirts and pants, for example on web pages on the Internet, and automatically extract those with a similar atmosphere and taste. Extracting images that resemble the favorite shirt only requires judging whether a web-page image has features similar to the shirt's image features, but similar image features do not necessarily carry the elements that make a person feel the same atmosphere. Whether the atmosphere and taste are the same depends largely on individual subjectivity or sensation, so it is hard to quantify objectively and hard to replace with specific physical quantities or feature amounts. In practice, therefore, clothing combinations (coordinates) with a desired atmosphere are proposed on the basis of experts' knowledge and experience.

Japanese Unexamined Patent Publication No. 2005-250562

Javier Portilla and Eero P. Simoncelli, "A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients," International Journal of Computer Vision, Vol. 40, issue 1, pp. 49-71, 2000.
Josh H. McDermott and Eero P. Simoncelli, "Sound Texture Perception via Statistics of the Auditory Periphery: Evidence from Sound Synthesis," Neuron, Vol. 71, issue 5, pp. 926-940, September 8, 2011.
Aude Oliva and Antonio Torralba, "Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope," International Journal of Computer Vision, Vol. 42, issue 3, pp. 145-175, 2001.

The present invention has been made in view of these circumstances. Its object is to provide a content classification device, a content classification method, and a program that, by quantifying sensations such as the atmosphere and taste a person feels toward the content being viewed, can classify each image in a group of images to be classified according to whether it has an atmosphere and taste similar to that content.

To solve the above problem, a content classification device according to one aspect of the present invention classifies, from a target content group consisting of images to be classified, the target content that has the same atmosphere and taste as original content, which is an image serving as the classification reference. The device comprises: a content atmosphere amount calculation unit that, for each combination of two mutually different regions that can be taken from among the regions determined when a characteristic of the image, a scale, and a division according to that characteristic are specified, takes a relationship amount indicating the relationship between the two regions as a content atmosphere amount quantifying the atmosphere and taste a person perceives in the image, and that calculates the content atmosphere amount for each of the original content and the target content; a distance calculation unit that calculates the distance between the coordinate value of the content atmosphere amount of the original content and the coordinate value of the content atmosphere amount of the target content; and a content classification unit that classifies the target content based on the distance.

A content classification method according to one aspect of the present invention classifies, from a target content group consisting of images to be classified, the target content that has the same atmosphere and taste as original content, which is an image serving as the classification reference. The method comprises: a content atmosphere amount calculation step in which a content atmosphere amount calculation unit, for each combination of two mutually different regions that can be taken from among the regions determined when a characteristic of the image, a scale, and a division according to that characteristic are specified, takes a relationship amount indicating the relationship between the two regions as a content atmosphere amount quantifying the atmosphere and taste a person perceives in the image, and calculates the content atmosphere amount for each of the original content and the target content; a distance calculation step in which a distance calculation unit calculates the distance between the coordinate value of the content atmosphere amount of the original content calculated by the content atmosphere amount calculation unit and the coordinate value of the content atmosphere amount of the target content; and a content classification step in which a content classification unit classifies the target content based on the distance.

A program according to one aspect of the present invention causes a computer to function as the content classification device described above.

As described above, according to the present invention, quantifying sensations such as the atmosphere and taste that a person perceives in content makes it possible to classify content to be classified according to whether it has an atmosphere and taste similar to the original content.

Block diagram showing a configuration example of the content classification device 1 according to the first embodiment of the present invention.
Diagram showing a configuration example of the content feature amount table in the content feature amount database 16.
Diagram showing a configuration example of the content statistic table in the content statistic model database 17.
Flowchart showing an operation example of the processing performed by the content classification device 1 according to the first embodiment of the present invention.
Diagram explaining the effect of the processing performed by the content classification device 1 according to the first embodiment of the present invention.
Block diagram showing a configuration example of the content classification device 1A according to the second embodiment of the present invention.
Flowchart showing an operation example of the processing performed by the content classification device 1A according to the second embodiment of the present invention.
Flowchart showing an operation example of the process, performed by the content feature amount generation unit 110A according to the second embodiment of the present invention, of generating estimated content feature amounts.
Flowchart showing an operation example of the process, performed by the content statistic generation unit 111A according to the second embodiment of the present invention, of generating estimated content statistics.
Block diagram showing a configuration example of the content generation device 1B according to the third embodiment of the present invention.
Flowchart showing an operation example of process A1, which generates a deep learning image model.
Flowchart showing an operation example of process A2, which generates a deep learning image model.
Flowchart showing an operation example of the processing performed by the content generation device 1B according to the third embodiment of the present invention.

The content classification device, content classification method, and program of the embodiments are described below with reference to the drawings.

<First Embodiment>
The first embodiment is described first.
FIG. 1 is a block diagram showing a configuration example of the content classification device 1 according to the first embodiment of the present invention. The content classification device 1 in FIG. 1 includes a content selection unit 10, a content atmosphere amount calculation unit 11, a coordinate distance calculation unit 12, a content classification unit 13, a content output unit 14, a content database 15, a content feature amount database 16, a content statistic model database 17, and a classified content storage unit 18. The content atmosphere amount calculation unit 11 includes a content feature amount calculation unit 110 and a content statistic calculation unit 111. In this embodiment the content is described as a still image, but the configuration may also be applied to other content such as moving images, video, sound, and speech.

The content selection unit 10 receives, as image data, an original image having an atmosphere and taste that the user likes, from an external device, the content database 15, or an input means not shown (such as a scanner). The content selection unit 10 outputs the image data of the input original image to the content atmosphere amount calculation unit 11.
The content selection unit 10 also selects the group of images (classification target image group) to be classified according to whether they have the atmosphere and taste of the original image. For example, the content selection unit 10 selects the classification target image group from the content database 15. Alternatively, it may select, as the classification target image group, a group of images input from an external device, a scanner, or the like (not shown). The content selection unit 10 outputs the image data of the selected classification target image group to the content atmosphere amount calculation unit 11.

The content atmosphere amount calculation unit 11 includes a content feature amount calculation unit 110 and a content statistic calculation unit 111.
For the original image and for each image in the classification target image group (classification target image), the content atmosphere amount calculation unit 11 statistically calculates the degree (feature amount) to which the image has those features, among its physical characteristics (for example, density, gradation, color, spatial frequency, and resolution characteristics), that people tend to perceive as atmosphere and taste. It then compares the resulting statistical feature amounts (statistics) between one image and another and calculates their relationship, thereby obtaining the content atmosphere amount, an index of how similar the atmospheres of the images are.

The content feature amount calculation unit 110 calculates the content feature amount of the original image and of each classification target image in the classification target image group, based on those feature amounts (perceptual feature amounts), among the physical characteristics of the image (for example, density, gradation, color, spatial frequency, and resolution characteristics), that people tend to perceive as atmosphere and taste.

The content statistic calculation unit 111 calculates the content statistic of the original image and of each classification target image from the content feature amounts calculated by the content feature amount calculation unit 110, based on statistics (perceptual statistics) obtained by applying a statistical operation to the perceptual features, in a predetermined region, of a predetermined physical characteristic of the image.

From the content statistics, the content atmosphere amount calculation unit 11 calculates a content relationship amount that quantitatively indicates the relationship between the content statistic of one region and the content statistic of another region. For example, when the content relationship amount of the content statistics in mutually different regions of one image is close in distance to the content relationship amount of the content statistics in mutually different regions of another image, the predetermined regions of the two images have similar atmospheres.

This is explained in more detail below.
The content feature amount calculation unit 110 calculates the content feature amounts of the original image and the classification target images. Here, a content feature amount is a feature amount (perceptual feature amount), among the physical characteristics (for example, density, gradation, color, spatial frequency, and resolution characteristics), that people tend to perceive as atmosphere and taste. Content feature amounts (perceptual feature amounts) are, for example, luminance, chromaticity, contrast, gradient, edges, and optical flow, which represent the visual perceptual features of an image.

For example, with respect to the density (shading) characteristic, which is a physical characteristic of an image, people tend to perceive contrast and gradient in the image as atmosphere and taste, so the content feature amount calculation unit 110 calculates the shading intensity as a content feature amount (perceptual feature amount). Similarly, with respect to the color characteristic, people tend to readily perceive the luminance and chromaticity of an image, so the content feature amount calculation unit 110 calculates the luminance and chromaticity of the pixel values in the image as content feature amounts (perceptual feature amounts).
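As a rough illustration of the two perceptual feature amounts named above, the following sketch computes a luminance map and a shading-gradient map for a small RGB image. The function names, the Rec. 601 luma weights, and the forward-difference gradient are assumptions made for this example; the patent reads the actual arithmetic expressions from the content feature amount table and does not specify them here.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each in 0..255


def luminance(pixel: Pixel) -> float:
    """Approximate perceived brightness using Rec. 601 luma weights (an assumption)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def luminance_map(image: List[List[Pixel]]) -> List[List[float]]:
    """Per-pixel luminance for an image stored as rows of RGB tuples."""
    return [[luminance(p) for p in row] for row in image]


def gradient_magnitude(lum: List[List[float]]) -> List[List[float]]:
    """Shading-gradient strength by forward differences.

    Differences that would reach past the right or bottom border are taken as zero.
    """
    h, w = len(lum), len(lum[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = lum[y][x + 1] - lum[y][x] if x + 1 < w else 0.0
            dy = lum[y + 1][x] - lum[y][x] if y + 1 < h else 0.0
            out[y][x] = (dx * dx + dy * dy) ** 0.5
    return out
```

A flat image yields a gradient map of zeros, while a black-to-white step yields a large value at the step, which is the kind of contrast cue the paragraph describes.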

The content feature amount calculation unit 110 extracts the content feature amounts (perceptual feature amounts) to be used from the content feature amount table of the content feature amount database 16. It also reads, from the same table, the arithmetic expression used to calculate each selected content feature amount. Using the expressions it has read, the content feature amount calculation unit 110 then calculates the content feature amounts from the original image and from each classification target image.

The content statistic calculation unit 111 calculates content statistics (perceptual statistics) from the content feature amounts calculated by the content feature amount calculation unit 110. A content statistic is a statistic (perceptual statistic) obtained by applying a statistical operation to the perceptual features, in a predetermined region, of a predetermined physical characteristic of the image; it serves as an index of the nature of the perceptual features of that region. Content statistics (perceptual statistics) include, for example, the distribution of each visual perceptual feature amount by orientation and by resolution, the envelopes of those distributions, histogram shape characteristics, and the mean, variance, histogram, and power spectrum.

For example, from the shading intensity in each region obtained by dividing the image, the content statistic calculation unit 111 calculates, for each combination of two mutually different regions, the degree of correlation between the shading intensities of the two regions as a content statistic (perceptual statistic). Likewise, from the pixel values (luminance and chromaticity) in each region obtained by dividing the image, it calculates the degree of correlation of the pixel values between regions as a content statistic (perceptual statistic).
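The pairwise region correlation described above can be sketched as follows, under several assumptions: the image is a 2-D grid of intensity values, it is divided into an equal-sized `rows` x `cols` grid, and "degree of correlation" is taken to be the Pearson correlation of co-located pixels. None of these choices, nor the function names, come from the patent.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def split_regions(img: List[List[float]], rows: int, cols: int) -> List[List[float]]:
    """Divide a 2-D intensity grid into rows x cols equal blocks, each flattened."""
    bh, bw = len(img) // rows, len(img[0]) // cols
    regions = []
    for r in range(rows):
        for c in range(cols):
            regions.append([img[y][x]
                            for y in range(r * bh, (r + 1) * bh)
                            for x in range(c * bw, (c + 1) * bw)])
    return regions


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length value lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: a perfectly flat region is treated as uncorrelated
    return cov / sqrt(va * vb)


def region_correlations(img: List[List[float]], rows: int, cols: int
                        ) -> Dict[Tuple[int, int], float]:
    """Correlation for every combination of two different regions."""
    regions = split_regions(img, rows, cols)
    return {(i, j): pearson(regions[i], regions[j])
            for i, j in combinations(range(len(regions)), 2)}
```

A 2 x 2 division produces 6 region pairs; the values of the returned dictionary, taken in a fixed order, give one possible vector of relationships between regions.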

The content statistic calculation unit 111 extracts the content statistics (perceptual statistics) to be used from the content statistic table of the content statistic model database 17. It also reads, from the content statistic model table of the content statistic model database 17, the arithmetic expression used to calculate each extracted content statistic. Using the expressions it has read, the content statistic calculation unit 111 then calculates the content statistics from the content feature amounts obtained by the content feature amount calculation unit 110.

The content atmosphere amount calculation unit 11 calculates the relationship between two regions for every combination of two regions that can be taken from the content statistics of predetermined regions for a predetermined characteristic of the image and the content statistics of predetermined regions at a predetermined scale (resolution level) of the image. Here, the scale is the resolution of the image, and increasing or decreasing the resolution of an image is called scaling. For example, increasing the resolution to 2 (= 2^1) times its current fineness is a level-1 scale-up with radix 2, increasing it to 4 (= 2^2) times is a level-2 scale-up, and so on. Conversely, reducing the resolution to one half of its current value is a level-1 scale-down with radix 2, and reducing it to one quarter is a level-2 scale-down. The scale (resolution level) thus expresses the fineness of the image's pixels relative to the image's original resolution.
The content atmosphere amount calculation unit 11 represents the content atmosphere amount as a vector whose elements are the calculated relationships.
The content atmosphere amount is an index of the relationship between two regions, for example a simple correlation, a shift correlation, or a cross-correlation.
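To make the scale levels concrete, the sketch below performs a radix-2 scale-down: each level halves the resolution along both axes. Averaging 2 x 2 blocks is only one common way to do this and is an assumption of this example, as is the function name; the text above defines the levels but not the resampling method.

```python
from typing import List


def scale_down(img: List[List[float]], level: int) -> List[List[float]]:
    """Halve the resolution `level` times by averaging 2x2 blocks.

    Level 1 gives 1/2 resolution, level 2 gives 1/4, and so on.
    If a dimension is odd, the last row or column is dropped.
    """
    for _ in range(level):
        h, w = len(img) // 2, len(img[0]) // 2
        img = [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
                 + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
                for x in range(w)]
               for y in range(h)]
    return img
```

Region relationships computed at each of several levels can then be concatenated into one vector, so that both fine and coarse structure contribute to the comparison between images.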

The coordinate distance calculation unit 12 calculates, in the vector space in which content atmosphere amounts are expressed, the coordinate distance between the coordinate value (original content coordinate value) of the content atmosphere amount of the original image (original content atmosphere amount) and the coordinate value (target content coordinate value) of the content atmosphere amount of a classification target image (target content atmosphere amount).

Here, the distance is calculated using, for example, a simple distance measure (Euclidean distance, cosine distance, Hamming distance, Manhattan distance, etc.). A shorter distance is taken to mean a higher similarity, but the mutual information, entropy, or cross-correlation of pixel values within a region occupying all or part of the image may instead be adopted as the measure of similarity between the original image and each of the initial image and the classification target images. That is, a method that compares distances between content feature distributions (entropy, mutual information, cross-correlation, etc.) may be used.
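The four simple distance measures listed above have standard definitions, sketched here for two equal-length vectors such as two content atmosphere amounts. The function names are chosen for this example, and cosine distance is taken as one minus the cosine similarity.

```python
from math import sqrt
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine of the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (na * nb)


def hamming(a: Sequence, b: Sequence) -> int:
    """Number of positions at which the two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)
```

Euclidean and Manhattan distance are sensitive to vector magnitude, whereas cosine distance compares direction only; which behavior suits atmosphere vectors better is a design choice the text leaves open.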

Here, the mutual information indicates the degree of mutual dependence between two images (the original image and a classification target image). For example, it is 0 when the pixel value distributions of the original image and the classification target image are completely unrelated and independent, and it is at its maximum when the two pixel value distributions are equal. Pixel values are used as the image feature amount, and the pixel value distribution is used as the content statistic. If the image feature is chosen to be the RGB value, that is, the pixel values of the color components R, G, and B, it is expressed as the three-dimensional feature (r, g, b), and the pixel value distributions of the original image and the classification target image are the distributions of (r, g, b) over all pixels. In other words, if one considers the probability with which each pixel value appears in the image, the pixel value corresponds to a random variable, so the problem can be restated in information-theoretic terms and the distribution can be treated as a probability distribution.

Entropy is a measure of the uncertainty about what values the pixel values in an image actually take. For example, an image with large entropy has widely varying pixel values, whereas an image with small entropy has little variation in its pixel values (entropy is small for a single-color image). With entropy denoted H, the dependence between the original image (X) and the classification target image (Y) in terms of mutual information is the difference between the entropy H(X) of image X and the entropy H(X|Y) of image X given knowledge of image Y, that is, H(X) − H(X|Y) = H(X;Y); this entropy difference H(X;Y) expresses the distribution distance.
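The entropy difference H(X) − H(X|Y) can be computed from empirical pixel-value frequencies, as in the following sketch. It assumes that the two images have the same number of pixels and that pixels are paired by position to estimate the joint distribution; the patent does not state how the joint distribution is obtained, and the function names are illustrative. The conditional entropy is obtained through the identity H(X|Y) = H(X,Y) − H(Y).

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """H(X) - H(X|Y) for position-paired pixel values of two images."""
    if len(xs) != len(ys):
        raise ValueError("both images must have the same number of pixels")
    h_x = entropy(xs)
    h_y = entropy(ys)
    h_xy = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y)
    h_x_given_y = h_xy - h_y           # conditional entropy H(X|Y)
    return h_x - h_x_given_y
```

Pixel values may be single intensities or (r, g, b) tuples, since both are hashable. Identical images give a value equal to H(X), and images whose paired values are statistically independent give 0, matching the behavior described in the text.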

Image feature amounts can also be read as a probability distribution, in which case the distance between probability distributions is a distribution distance, typified by the KL (Kullback-Leibler) divergence and the more general I-divergence (generalized KL divergence). These distribution distances may be used in place of the above distance to judge similarity. Alternatively, the similarity between the content statistics of the original image and the classification target image, obtained by a general statistical classification method (the K-means method, hierarchical clustering methods such as Ward's method, etc.) or by a machine learning method (Support Vector Machine, Random Forest, Boosting, Neural Network, Deep Neural Network, etc.), may be used in place of the above distance.
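The two distribution distances named above can be written out directly from their standard definitions. In this sketch `p` and `q` are assumed to be histograms over the same bins (for example, pixel-value histograms of the original image and a classification target image); the function names are illustrative, and natural logarithms are used.

```python
from math import inf, log
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence sum(p_i * log(p_i / q_i)) for normalized distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a bin with p_i = 0 contributes nothing
        if qi == 0:
            return inf  # p has mass where q has none
        total += pi * log(pi / qi)
    return total


def i_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Generalized KL divergence sum(p_i * log(p_i / q_i) - p_i + q_i).

    Unlike the KL divergence, this does not require p and q to sum to 1.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return inf
            total += pi * log(pi / qi)
        total += qi - pi
    return total
```

Neither quantity is symmetric in `p` and `q`, so a system that uses one as a distance has to fix which image plays the role of `p`. For normalized histograms the two functions return the same value.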

コンテンツ分類部１３は、上記座標距離に基づいて、分類対象画像を元画像と同様の雰囲気及びテイストを有するかという観点で分類する。コンテンツ分類部１３は、例えば、座標距離を昇順にソートし、座標距離の最も小さい分類対象画像から順に元画像と同様の雰囲気等を有する画像であると分類する。この場合において、コンテンツ分類部１３は、上記座標距離と予め設定された閾値との比較を行い、座標距離がこの閾値以下である場合、元画像と分類対象画像とがユーザが同様の雰囲気及びテイストを感じる、すなわち分類対象画像が元画像と同様の雰囲気及びテイストを有すると分類するようにしてもよい。一方、コンテンツ分類部１３は、座標距離がこの閾値を超えている場合、元画像と分類対象画像とがユーザに同様の雰囲気及びテイストを感じさせない、すなわち分類対象画像が元画像と同様の雰囲気及びテイストを有していないと分類する。ここで、上記閾値は、例えば複数の人間により、複数の元画像と、この元画像に対する分類対象画像との比較を行うことで、分類対象画像が元画像と同様の雰囲気及びテイストを有していることが判定できる数値に設定されている。 The content classification unit 13 classifies each classification target image, based on the coordinate distance, according to whether it has the same atmosphere and taste as the original image. For example, the content classification unit 13 sorts the coordinate distances in ascending order and classifies the classification target images, starting from the one with the smallest coordinate distance, as images having the same atmosphere and the like as the original image. In this case, the content classification unit 13 may compare the coordinate distance with a preset threshold and, when the coordinate distance is equal to or less than the threshold, classify the classification target image as one that gives the user the same atmosphere and taste as the original image, that is, as having the same atmosphere and taste as the original image. Conversely, when the coordinate distance exceeds the threshold, the content classification unit 13 classifies the classification target image as one that does not give the user the same atmosphere and taste as the original image, that is, as not having the same atmosphere and taste as the original image.
The threshold is set, for example, by having a plurality of people compare a plurality of original images with their classification target images, to a value at which a classification target image can be judged to have the same atmosphere and taste as the original image.
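The sorting and threshold comparison performed by the content classification unit 13 can be sketched as follows. The function name, the dictionary input, and the numeric values are hypothetical; in the embodiment the threshold is a value determined in advance from human comparisons.

```python
def classify_by_distance(distances, threshold):
    """Rank images by ascending coordinate distance and flag those within the threshold.

    distances: dict mapping an image identifier to its coordinate distance
               from the original image.
    Returns a list of (image_id, distance, is_similar) tuples, nearest first.
    """
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return [(image_id, d, d <= threshold) for image_id, d in ranked]


# Assumed toy distances for three classification target images.
result = classify_by_distance({"G-1": 0.8, "G-2": 0.1, "G-3": 0.4}, threshold=0.4)
```

A distance exactly equal to the threshold counts as similar here, matching the "equal to or less than" wording of the text.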

コンテンツ出力部１４は、コンテンツ分類部１３により元画像と同様の雰囲気及びテイストを有する画像として分類された画像を出力する（例えば、図示しない表示手段の表示画面に表示する）。また、コンテンツ出力部１４は、コンテンツ分類部１３により元画像と同様の雰囲気及びテイストを有する画像として分類された画像を、分類画像識別情報を付与し、この分類画像識別情報と組として分類コンテンツ記憶部１８に対して書き込んで記憶させる。 The content output unit 14 outputs the images that the content classification unit 13 has classified as having the same atmosphere and taste as the original image (for example, by displaying them on the display screen of a display means, not shown). The content output unit 14 also assigns classified image identification information to each such image and writes the image, paired with that identification information, to the classified content storage unit 18 for storage.

コンテンツデータベース１５は、画像群識別情報と、画像群名称及び画像データとが示されたコンテンツ群テーブルが書き込まれて記憶されている。
コンテンツ特徴量データベース１６は、特徴量識別情報と、特徴量名称及び特徴量演算式（モデル）の各々とが示されたコンテンツ特徴量テーブルが書き込まれて記憶されている。
コンテンツ統計量モデルデータベース１７は、統計量識別情報と、統計量名称、特徴量識別情報及び統計量演算式（モデル）の各々とが示されたコンテンツ統計量テーブルが書き込まれて記憶されている。
分類コンテンツ記憶部１８は、ハードディスクなどの記憶媒体であり、分類された分類対象画像の分類画像識別情報と、座標距離に対応付けた画像データが書き込まれて記憶される。 In the content database 15, a content group table showing image group identification information, an image group name, and image data is written and stored.
In the content feature amount database 16, a content feature amount table showing each of the feature amount identification information, the feature amount name, and the feature amount calculation formula (model) is written and stored.
In the content statistic model database 17, a content statistic table showing each of the statistic identification information, the statistic name, the feature amount identification information, and the statistic calculation formula (model) is written and stored.
The classification content storage unit 18 is a storage medium such as a hard disk, and the classification image identification information of the classified image to be classified and the image data associated with the coordinate distance are written and stored.

図２は、コンテンツ特徴量データベース１６におけるコンテンツ特徴量テーブルの構成例を示す図である。図２のコンテンツ特徴量テーブルは、レコード毎に、特徴量識別情報に対応して、特徴量名称及び特徴量演算式（モデル）の各々の欄を有している。特徴量識別情報は、コンテンツ特徴量の各々を識別する識別情報である。特徴量名称は、コンテンツ特徴量の各々の名称を示し、例えば、画像の濃淡、輝度及び色度と、それらのコントラスト、勾配、エッジ、オプティカルフローなどである。また、画像特徴量は、上述した種類のみではなく、画像認識分野にて提案された他の特徴を用いても良い。特徴量演算式（モデル）は、画像（の各々のピクセルの画素値）から、上記特徴量を求めるために用いる演算式（あるいはモデル）である。 FIG. 2 is a diagram showing a configuration example of a content feature amount table in the content feature amount database 16. The content feature amount table of FIG. 2 has each column of the feature amount name and the feature amount calculation formula (model) corresponding to the feature amount identification information for each record. The feature amount identification information is identification information for identifying each of the content feature amounts. The feature amount name indicates each name of the content feature amount, and is, for example, the shading, luminance, and chromaticity of the image, and their contrast, gradient, edge, optical flow, and the like. Further, as the image feature amount, not only the above-mentioned type but also other features proposed in the image recognition field may be used. The feature amount calculation formula (model) is a calculation formula (or model) used to obtain the feature amount from the image (pixel value of each pixel).

図３は、コンテンツ統計量モデルデータベース１７におけるコンテンツ統計量テーブルの構成例を示す図である。図３のコンテンツ統計量テーブルは、レコード毎に、統計量識別情報に対応して、統計量名称、統計量演算式（モデル）の各々の欄を有している。統計量識別情報は、コンテンツ統計量の各々を識別する識別情報である。統計名称は、コンテンツ統計量の各々の名称を示し、例えば、平均、分散、ヒストグラム、歪度、尖度、最大値、最小値、中央値、最頻値、偏り、密度、スペクトル、エネルギースペクトル、確率分布、回帰分析、主成分分析、独立成分分析、クラスタリング等を含む。上記統計量演算式（モデル）は、同一のレコードに示された特徴量識別情報の各々のコンテンツ特徴量から、上記統計量を求めるために用いる演算式（あるいはモデル）である。 FIG. 3 is a diagram showing a configuration example of the content statistic table in the content statistic model database 17. For each record, the content statistic table of FIG. 3 has columns for a statistic name and a statistic calculation formula (model), associated with the statistic identification information. The statistic identification information identifies each content statistic. The statistic name is the name of each content statistic and includes, for example, mean, variance, histogram, skewness, kurtosis, maximum, minimum, median, mode, bias, density, spectrum, energy spectrum, probability distribution, regression analysis, principal component analysis, independent component analysis, and clustering. The statistic calculation formula (model) is the formula (or model) used to obtain the statistic from the content feature amounts identified by the feature amount identification information in the same record.

ここで、上記コンテンツ特徴量と上記コンテンツ統計量との各々の説明を補足する。コンテンツ特徴量は、画素単位の情報あるいは当該画素の近傍の他の情報から求められる低次の画像特徴量（例えば、輝度、色度、コントラスト、勾配などの）である。
コンテンツ統計量は、所定の画像の物理特性に関して、所定の画像領域（画像の比較的広い領域や画像全体に至る広い領域）において、領域内の多数の画素に統計的操作（例えば、平均、分散、最大値、最小値、中央値、最頻値、ヒストグラム、偏り、密度、スペクトル、エネルギースペクトル、確率分布、回帰分析、主成分分析、独立成分分析、クラスタリング等）を適用することで得られる。 The content feature amount and the content statistic are explained further here. The content feature amount is a low-order image feature amount (for example, luminance, chromaticity, contrast, or gradient) obtained from the information of an individual pixel or from other information in the neighborhood of that pixel.
The content statistic is obtained, for a given physical characteristic of an image, by applying a statistical operation (for example, mean, variance, maximum, minimum, median, mode, histogram, bias, density, spectrum, energy spectrum, probability distribution, regression analysis, principal component analysis, independent component analysis, or clustering) to the many pixels within a given image region (a relatively large region of the image, up to the entire image).
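The relationship between a low-order feature amount and a content statistic can be illustrated with a short sketch that applies a few of the listed statistical operations to the feature values of one region. The input is assumed to be a flat list of per-pixel feature values (for example, luminance); the function name and the population (not sample) definitions of variance, skewness, and kurtosis are choices made for this example.

```python
import statistics


def content_statistics(values):
    """Summary statistics over the per-pixel feature values of one image region."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)  # population variance
    std = variance ** 0.5
    if std == 0:
        skewness = kurtosis = 0.0  # constant region: standardized moments are undefined
    else:
        skewness = sum(((v - mean) / std) ** 3 for v in values) / n
        kurtosis = sum(((v - mean) / std) ** 4 for v in values) / n
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


# Assumed toy luminance values for a five-pixel region.
stats = content_statistics([1, 2, 3, 4, 5])
```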

また、上記コンテンツ雰囲気量は、例えば、所定の画像領域に対して、各々異なる物理特性に関して求めた同一の上記コンテンツ統計量の間での相関ベクトル、あるいは所定の物理特性に対して、各々異なる画像領域において求めた同一の上記コンテンツ統計量間での相関ベクトル、または異なる画像領域において異なる物理特性に関して求めた同一の上記コンテンツ統計量間での相関ベクトルを表し、例えば、同一の方向区分で異なる解像度に対して求めた空間周波数の間の相関である解像度別固定方向空間周波数相関ベクトル、あるいは同一の解像度で異なる方向区分に対して求めた空間周波数間の相関である方向別固定解像度空間周波数間の相関ベクトル、などを含む。 The content atmosphere amount represents, for example, a correlation vector between values of the same content statistic obtained for different physical characteristics in a given image region, a correlation vector between values of the same content statistic obtained in different image regions for a given physical characteristic, or a correlation vector between values of the same content statistic obtained for different physical characteristics in different image regions. Examples include a per-resolution, fixed-direction spatial frequency correlation vector, which is the correlation between spatial frequencies obtained at different resolutions in the same direction section, and a per-direction, fixed-resolution spatial frequency correlation vector, which is the correlation between spatial frequencies obtained for different direction sections at the same resolution.

また、相互相関は画像間の類似性を表すために用いられ、例えば画像におけるＲＧＢ（Red、Green、Blue）値における色成分Ｒの数値と色成分Ｇの数値との相互相関、色成分Ｒの数値と色成分Ｂの数値との相互相関及び色成分Ｇの数値と色成分Ｂの数値との相互相関の各々が、画像における色成分Ｒ及び色成分Ｇ、色成分Ｒ及び色成分Ｂ、色成分Ｇ及び色成分Ｂそれぞれの関係するコンテンツ統計量として用いることができる。ここで、ＲＧＢ値の各々が画像のコンテンツ特徴量である。 Cross-correlation is used to express similarity between images. For example, for the RGB (red, green, blue) values of an image, the cross-correlation between the R and G component values, the cross-correlation between the R and B component values, and the cross-correlation between the G and B component values can each be used as a content statistic describing the relationship between the corresponding pair of color components (R and G, R and B, G and B). Here, each of the RGB values is a content feature amount of the image.
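The three color-component cross-correlations can be sketched as below. The text does not specify the exact form of the cross-correlation, so this example assumes the zero-lag normalized (Pearson) correlation coefficient. The function names and the pixel format, a list of (R, G, B) tuples, are also assumptions.

```python
import math


def pearson(a, b):
    """Zero-lag normalized correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant channel carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def rgb_cross_correlations(pixels):
    """Correlation for each pair of color components of an image."""
    r, g, b = zip(*pixels)
    return {"RG": pearson(r, g), "RB": pearson(r, b), "GB": pearson(g, b)}


# Assumed toy image: G rises with R while B falls.
pixels = [(0, 0, 30), (10, 20, 20), (20, 40, 10), (30, 60, 0)]
correlations = rgb_cross_correlations(pixels)
```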

上記画像の物理特性は、画像の各々をどのような物理特性で見るか（すなわち、物理特性により評価するか）を表しており、濃度特性、階調特性、色彩特性、空間周波数特性、解像度特性などを含んでいる。また、画像の所定領域は、画像の所定の区分領域を示しており、周波数区分（画像をすでに述べた空間周波数毎に分類）、方向区分（すでに述べた方向別における画像の情報の分類）、色彩区分（すでに述べた画素の色成分による分類）等を含んでいる。
ここで、周波数区分について補足する。一般的なフーリエ変換を用いて、画像から空間周波数算出した場合、画像全体における周波数と強度との対応関係を示す空間周波数特性が算出される。このようにして算出された空間周波数特性について周波数区分ごとに分割しても、画像の位置情報と空間周波数特性との関係を得ることができない。このため、本実施形態では、ウェーブレット変換等の位置情報を保持する周波数変換方法を用いて、画像から空間周波数空間を算出する。そして、画像の位置に基づいて分画した各分画画像における空間周波数特性に基づいて、空間周波数成分毎の統計量（例えば、相関などの統計量）を算出する。これにより、分画した画像同士が、空間周波数特性の観点から強い相関を有しているか否か等を認識することが可能となる。 The physical characteristics of an image indicate the aspect in which each image is viewed (that is, the physical characteristic by which it is evaluated), and include density characteristics, gradation characteristics, color characteristics, spatial frequency characteristics, resolution characteristics, and the like. The predetermined region of an image refers to a predetermined section of the image, and includes frequency sections (the image classified by the spatial frequencies described above), direction sections (image information classified by the directions described above), color sections (classification by the pixel color components described above), and the like.
The frequency sections are explained further here. When spatial frequencies are computed from an image with an ordinary Fourier transform, the result is a spatial frequency characteristic that relates frequency to intensity over the image as a whole. Dividing this characteristic into frequency sections does not reveal how the spatial frequency characteristic relates to position in the image. In the present embodiment, therefore, the spatial frequency space is computed from the image with a frequency transform that retains position information, such as a wavelet transform. A statistic for each spatial frequency component (for example, a correlation) is then calculated from the spatial frequency characteristics of each sub-image obtained by dividing the image by position. This makes it possible to recognize, for example, whether the sub-images are strongly correlated in terms of their spatial frequency characteristics.
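To illustrate why a wavelet transform retains position information, the sketch below performs one level of a two-dimensional Haar decomposition. The Haar wavelet, the averaging normalization, the band names, and the requirement of even image dimensions are all assumptions made for this example. The text only says "a wavelet transform or the like".

```python
def haar2d_level(image):
    """One level of a 2D Haar decomposition of an image with even height and width.

    Each output coefficient at (i // 2, j // 2) is computed from the 2x2 block
    whose top-left pixel is (i, j), so every band keeps the spatial layout.
    """
    h, w = len(image), len(image[0])
    bands = {
        name: [[0.0] * (w // 2) for _ in range(h // 2)]
        for name in ("approx", "col_diff", "row_diff", "diag")
    }
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            bands["approx"][i // 2][j // 2] = (a + b + c + d) / 4    # local average
            bands["col_diff"][i // 2][j // 2] = (a - b + c - d) / 4  # change across columns
            bands["row_diff"][i // 2][j // 2] = (a + b - c - d) / 4  # change across rows
            bands["diag"][i // 2][j // 2] = (a - b - c + d) / 4      # diagonal change
    return bands


# Assumed toy image: a vertical edge confined to the top-left 2x2 block.
image = [
    [0, 8, 0, 0],
    [0, 8, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
bands = haar2d_level(image)
```

The only nonzero detail coefficient appears in the top-left cell of `col_diff`, which shows both what kind of variation is present and where it occurs. A global Fourier spectrum would report the variation without locating it.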

ここで、物理特性α１、スケール（解像度レベル）β１で求められた画像（特性画像）Ｉ_α１ ^β１と、物理特性α２、スケールβ２で求められた特性画像Ｉ_α２ ^β２との間の相関ベクトルを、コンテンツ統計量として算出する例について説明する。 Here, the correlation vector between the image (characteristic image) I _α1 ^β1 obtained by the physical characteristic α1 and the scale (resolution level) β1 and the characteristic image I _α2 ^β2 obtained by the physical characteristic α2 and the scale β2 is obtained. An example of calculating as a content statistic will be described.

まず、特性画像Ｉ_α１ ^β１と特性画像Ｉ_α２ ^β２とは、それぞれＮ個に分割される。ただし、Ｎは任意の自然数である。特性画像Ｉ_α１ ^β１と特性画像Ｉ_α２ ^β２とは、例えば、方向区分として、画像の中心から、角度（３６０°／Ｎ）毎に放射状に伸ばした境界線により分画されることにより、Ｎ個に分割される。ここで、特性画像が画像における濃淡の強度特性を示す強度画像である場合には、その強度画像が分割される。また、特性画像が画像における色度特性を示す色度画像である場合には、画素値（Ｒ、Ｇ、Ｂ）の各成分、（Ｒ－Ｇ、Ｙ－Ｂ）の各成分、或いは（Ｌ＊、ａ＊、ｂ＊）の各成分が示された色度画像が分割される。また、特性画像が空間周波数特性を示す画像である場合には、特性画像の元の画像が分割され、分割された各々の画像に対して、画像横方向の空間周波数、及び画像縦方向の空間周波数が示された空間周波数平面が生成される。或いは、特性画像の各々は、配置区分として、画像平面上を升目状にＮ個に分割されてもよい。この場合、特性画像は、例えば画像横方向にｎ１個、縦方向にｎ２個に等分画される。ただし、ｎ１、ｎ２の各々は、ｎ１×ｎ２＝Ｎの関係をみたす自然数である。 First, the characteristic image I _α1 ^β1 and the characteristic image I _α2 ^β2 are each divided into N pieces. However, N is an arbitrary natural number. The characteristic image I _α1 ^β1 and the characteristic image I _α2 ^β2 are, for example, N pieces by being separated by a boundary line extending radially from the center of the image at every angle (360 ° / N) as a direction division. It is divided into. Here, when the characteristic image is an intensity image showing the intensity characteristic of shading in the image, the intensity image is divided. When the characteristic image is a chromaticity image showing the chromaticity characteristic in the image, each component of the pixel value (R, G, B), each component of (RG, YB), or (L). The chromaticity image showing each component of *, a *, b *) is divided. When the characteristic image is an image showing spatial frequency characteristics, the original image of the characteristic image is divided, and for each divided image, the spatial frequency in the horizontal direction of the image and the space in the vertical direction of the image are obtained. A spatial frequency plane showing the frequency is generated. Alternatively, each of the characteristic images may be divided into N pieces in a square shape on the image plane as an arrangement division. 
In this case, the characteristic image is equally divided into n1 images in the horizontal direction and n2 images in the vertical direction, for example. However, each of n1 and n2 is a natural number satisfying the relationship of n1 × n2 = N.
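The two ways of dividing a characteristic image into N sections, a grid of n1 × n2 cells and radial sectors of 360°/N each, can be sketched as follows. The image is assumed to be a list of rows of pixel values. The function names, the choice of the geometric center as the origin of the sectors, and the sector numbering are assumptions made for this example.

```python
import math


def grid_sections(image, n1, n2):
    """Split an image into n1 columns x n2 rows (N = n1 * n2 sections)."""
    h, w = len(image), len(image[0])
    sections = [[] for _ in range(n1 * n2)]
    for y in range(h):
        for x in range(w):
            col = x * n1 // w
            row = y * n2 // h
            sections[row * n1 + col].append(image[y][x])
    return sections


def radial_sections(image, n):
    """Split an image into n angular sectors of 360/n degrees around its center."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    sector_angle = 2 * math.pi / n
    sections = [[] for _ in range(n)]
    for y in range(h):
        for x in range(w):
            angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
            index = min(int(angle / sector_angle), n - 1)  # guard against rounding at 2*pi
            sections[index].append(image[y][x])
    return sections


# Assumed toy image: a 4x4 image whose pixel values are 0..15 in row-major order.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
grid = grid_sections(image, 2, 2)
radial = radial_sections(image, 4)
```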

次に、特性画像Ｉ_α１ ^β１をＮ個に分画した各区分に対して、特性画像Ｉ_α２ ^β２をＮ個に分画した各区分との間の相関値が算出される。相関値は、例えば、以下の式（１）で示される。ここで、ＶＰｓｔａｔ１は相関値、α１は特性画像Ｉ１の特性、β１は特性画像Ｉ１のスケール、α２は特性画像Ｉ２の特性、β２は特性画像Ｉ２のスケール、Ｎは特性画像Ｉ１、Ｉ２の区分数、Ｓ１は特性画像Ｉ１の一つの区分、Ｓ２は特性画像Ｉ２の一つの区分、ν_ｋ ^１は区分Ｓ１内の各画素値、ν_ｋ ^２は区分Ｓ２内の各画素値、ｗは特性画像の幅（横）方向の画素数、ｈは特性画像の高さ（縦）方向の画素数をそれぞれ示す。 Next, for each of the N sections of the characteristic image I _α1 ^β1 , a correlation value with each of the N sections of the characteristic image I _α2 ^β2 is calculated. The correlation value is given, for example, by equation (1) below. Here, VPstat1 is the correlation value, α1 is the characteristic of the characteristic image I1, β1 is the scale of the characteristic image I1, α2 is the characteristic of the characteristic image I2, β2 is the scale of the characteristic image I2, N is the number of sections of the characteristic images I1 and I2, S1 is one section of the characteristic image I1, S2 is one section of the characteristic image I2, ν _k ¹ is each pixel value in section S1, ν _k ² is each pixel value in section S2, w is the number of pixels in the width (horizontal) direction of the characteristic image, and h is the number of pixels in the height (vertical) direction of the characteristic image.

式（１）に示される相関値が、特性画像Ｉ_α１ ^β１のＮ個の区分それぞれに対する、特性画像Ｉ_α２ ^β２のＮ個の区分との組み合わせ全てについて算出される。また相関値は、特性毎、スケール毎に算出される。つまり、特性数をＮ_α、スケール数をＮ_βとして、（Ｎ_α×Ｎ_β×Ｎ）^２個の相関値が算出され、結局、コンテンツ統計量（知覚統計量）としての相関値は、以下の式（２）に示すような、（Ｎ_α×Ｎ_β×Ｎ）^２次元のベクトルとなる。 The correlation value of equation (1) is calculated for every combination of each of the N sections of the characteristic image I _α1 ^β1 with each of the N sections of the characteristic image I _α2 ^β2 . The correlation value is also calculated for each characteristic and each scale. That is, with N_α characteristics and N_β scales, (N_α × N_β × N)^2 correlation values are calculated, so the correlation values as a content statistic (perceptual statistic) form a vector of (N_α × N_β × N)^2 dimensions, as shown in equation (2) below.
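The dimension count (N_α × N_β × N)^2 can be checked with a small sketch that correlates every section of every characteristic image with every other. Equation (1) is not reproduced in this text, so the Pearson correlation coefficient is used here as a stand-in. The dictionary layout keyed by (characteristic, scale), the requirement that all sections hold the same number of pixels, and the function names are further assumptions.

```python
import math
from itertools import product


def pearson(a, b):
    """Normalized correlation between two equal-length pixel sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlation_vector(sections_by_image):
    """Correlate all sections of all characteristic images pairwise.

    sections_by_image: dict mapping (characteristic, scale) to a list of N
    sections, each a list of pixel values of the same length.
    """
    flat = [
        section
        for key in sorted(sections_by_image)
        for section in sections_by_image[key]
    ]
    return [pearson(s1, s2) for s1, s2 in product(flat, repeat=2)]


# Assumed toy input: 2 characteristics x 1 scale x 2 sections each.
data = {
    ("intensity", 0): [[1, 2, 3, 4], [4, 3, 2, 1]],
    ("chroma", 0): [[1, 3, 2, 4], [2, 2, 5, 1]],
}
vector = correlation_vector(data)
```

With N_α = 2, N_β = 1, and N = 2, the vector has (2 × 1 × 2)^2 = 16 elements.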

あるいは、特性画像Ｉ_α１ ^β１をＮ個に分画した各区分に対して、特性画像Ｉ_α２ ^β２をＮ個に分画した各区分との間の関係について、以下の式（３）に示されるシフト相関値が算出されてもよい。ここで、ＶＰｓｔａｔ３はシフト相関値、α１は特性画像Ｉ１の特性、β１は特性画像Ｉ１のスケール、α２は特性画像Ｉ２の特性、β２は特性画像Ｉ２のスケール、Ｎは特性画像Ｉ１、Ｉ２の区分数、Ｓ１は特性画像Ｉ１の一つの区分、Ｓ２は特性画像Ｉ２の一つの区分、ｍはシフト数、ν_ｋ ^１は区分Ｓ１内の各画素値、ν_ｋ ^２は区分Ｓ２内の各画素値、ｗは特性画像の幅（横）方向の画素数、ｈは特性画像の高さ（縦）方向の画素数をそれぞれ示す。 Alternatively, the relationship between each division in which the characteristic image I _α1 ^β1 is fractionated into N pieces and the division in which the characteristic image I _α2 ^β2 is divided into N pieces is shown in the following equation (3). The shift correlation value may be calculated. Here, VPstat3 is a shift correlation value, α1 is a characteristic of the characteristic image I1, β1 is the scale of the characteristic image I1, α2 is the characteristic of the characteristic image I2, β2 is the scale of the characteristic image I2, and N is the classification of the characteristic images I1 and I2. Number, S1 is one division of the characteristic image I1, S2 is one division of the characteristic image I2, m is the number of shifts, ν _k ¹ is each pixel value in the division S1, and ν _k ² is each pixel value in the division S2. , W indicate the number of pixels in the width (horizontal) direction of the characteristic image, and h indicates the number of pixels in the height (vertical) direction of the characteristic image.

式（３）に示すシフト相関に基づくコンテンツ雰囲気量としての相関ベクトルは、シフト数ｍが取り得る値（０からＮ－１まで）に対応する相関値、つまり、Ｎ個の相関値が算出される。結局、コンテンツ統計量（知覚統計量）としてのシフト相関ベクトルは、（Ｎ_α×Ｎ_β×Ｎ）^２×Ｎ次元のベクトルとなる。 For the correlation vector serving as the content atmosphere amount based on the shift correlation of equation (3), a correlation value is calculated for each value the shift number m can take (0 to N−1), that is, N correlation values. Consequently, the shift correlation vector as a content statistic (perceptual statistic) is a vector of (N_α × N_β × N)^2 × N dimensions.

また、特性画像Ｉ_α１ ^β１をＮ個に分画した各区分におけるコンテンツ特徴量（知覚特徴量）としての「強度」は、一つの要素が、以下の式（４）で示される（Ｎ_α×Ｎ_β×Ｎ）次元のベクトルとなる。ここで、ＶＰｓｔａｔ４はエネルギー、α１は特性画像Ｉ１の特性、β１は特性画像Ｉ１のスケール、Ｎは特性画像Ｉ１の区分数、Ｓ１は特性画像Ｉ１の一つの区分、ν_ｋ ^１は区分Ｓ１内の各画素値、ｗは特性画像の幅（横）方向の画素数、ｈは特性画像の高さ（縦）方向の画素数をそれぞれ示す。 Further, one element of the "intensity" as the content feature amount (perceived feature amount) in each category obtained by fractionating the characteristic image I _α1 ^β1 into N pieces is represented by the following equation (4) (N _α ×). N _β × N) It becomes a dimensional vector. Here, VPstat4 is energy, α1 is the characteristic of the characteristic image I1, β1 is the scale of the characteristic image I1, N is the number of divisions of the characteristic image I1, S1 is one division of the characteristic image I1, and ν _k ¹ is in the division S1. Each pixel value, w indicates the number of pixels in the width (horizontal) direction of the characteristic image, and h indicates the number of pixels in the height (vertical) direction of the characteristic image.

式（４）のエネルギーに対応するコンテンツ統計量（知覚統計量）としての「エネルギー」は、一つの要素が、以下の式（５）に示される（Ｎ_α×Ｎ_β×Ｎ）^２次元のベクトルとなる。ここで、ＶＰｓｔａｔ５はエネルギー相関、α１は特性画像Ｉ１の特性、β１は特性画像Ｉ１のスケール、α２は特性画像Ｉ２の特性、β２は特性画像Ｉ２のスケール、Ｎは特性画像Ｉ１、Ｉ２の区分数、Ｓ１は特性画像Ｉ１の一つの区分、Ｓ２は特性画像Ｉ２の一つの区分、ν_ｋ ^１は区分Ｓ１内の各画素値、ν_ｋ ^２は区分Ｓ２内の各画素値、ｗは特性画像の幅（横）方向の画素数、ｈは特性画像の高さ（縦）方向の画素数をそれぞれ示す。 The "energy" as a content statistic (perceptual statistic) corresponding to the energy of equation (4) is a vector of (N_α × N_β × N)^2 dimensions, each element of which is given by equation (5) below. Here, VPstat5 is the energy correlation, α1 is the characteristic of the characteristic image I1, β1 is the scale of the characteristic image I1, α2 is the characteristic of the characteristic image I2, β2 is the scale of the characteristic image I2, N is the number of sections of the characteristic images I1 and I2, S1 is one section of the characteristic image I1, S2 is one section of the characteristic image I2, ν _k ¹ is each pixel value in section S1, ν _k ² is each pixel value in section S2, w is the number of pixels in the width (horizontal) direction of the characteristic image, and h is the number of pixels in the height (vertical) direction of the characteristic image.

図４は、本実施形態によるコンテンツ分類装置１が行なう分類対象画像群から元画像と同様の雰囲気及びテイストを有する画像を分類する処理（以下、単に分類処理という）の動作例を示すフローチャートである。以下に示すフローチャートにおいては、すでに、コンテンツ特徴量及びコンテンツ特徴量を求めるための演算式（モデル）がコンテンツ特徴量データベース１６に書き込まれて記憶され、またコンテンツ統計量及びコンテンツ統計量を求めるための演算式（モデル）がコンテンツ統計量モデルデータベース１７に書き込まれて記憶されていることを前提に説明する。 FIG. 4 is a flowchart showing an operation example of the process, performed by the content classification device 1 of the present embodiment, of classifying images that have the same atmosphere and taste as the original image out of the classification target image group (hereinafter simply called the classification process). The description of the flowchart below assumes that the content feature amounts and the calculation formulas (models) for obtaining them are already stored in the content feature amount database 16, and that the content statistics and the calculation formulas (models) for obtaining them are already stored in the content statistic model database 17.

ステップＳ１０：
コンテンツ選択部１０は、ユーザの操作により外部装置から入力された、画像を分類する際の基準となる元画像を、コンテンツ特徴量算出部１１０に対して出力する。 Step S10:
The content selection unit 10 outputs the original image, which is input from the external device by the user's operation and serves as a reference when classifying the images, to the content feature amount calculation unit 110.

ステップＳ１１：
コンテンツ選択部１０は、コンテンツデータベース１５から、分類対象画像群を抽出する。そして、コンテンツ選択部１０は、抽出した対象選択画像群のうちの一つの画像を分類対象画像としてコンテンツ特徴量算出部１１０に対して出力する。なお、コンテンツ選択部１０は、外部メモリ、または通信ネットワークを介してユーザの操作により入力された画像群を、分類対象画像群として選択してもよい。 Step S11:
The content selection unit 10 extracts a classification target image group from the content database 15. Then, the content selection unit 10 outputs one image of the extracted target selection image group as a classification target image to the content feature amount calculation unit 110. The content selection unit 10 may select an image group input by a user's operation via an external memory or a communication network as a classification target image group.

ステップＳ１２：
コンテンツ特徴量算出部１１０は、分類処理に用いる画像特徴量の種類を選択する。コンテンツ特徴量算出部１１０は、例えば、ユーザの操作により指定された画像特徴量の種類に対応する特徴量識別情報を、コンテンツ特徴量データベース１６のコンテンツ特徴量テーブルにおいて検索する。そして、コンテンツ特徴量算出部１１０は、特徴量識別情報に対応するコンテンツ特徴量を求めるための特徴量演算式（モデル）をコンテンツ特徴量テーブルから読み出す。 Step S12:
The content feature amount calculation unit 110 selects the type of image feature amount used for the classification process. The content feature amount calculation unit 110 searches, for example, the feature amount identification information corresponding to the type of the image feature amount designated by the user's operation in the content feature amount table of the content feature amount database 16. Then, the content feature amount calculation unit 110 reads out the feature amount calculation formula (model) for obtaining the content feature amount corresponding to the feature amount identification information from the content feature amount table.

ステップＳ１３：
コンテンツ統計量算出部１１１は、分類処理に用いる画像統計量の種類を選択する。コンテンツ統計量算出部１１１は、例えば、ユーザの操作により指定された画像統計量の種類に対応する、コンテンツ統計量モデルデータベース１７のコンテンツ統計量モデルテーブルに記憶された統計量名称を選択する。コンテンツ統計量算出部１１１は、選択した統計量名称に対応する特徴量識別情報を、コンテンツ統計量モデルデータベース１７のコンテンツ統計量モデルテーブルから読み出す。なお、ここでユーザにより指定されるコンテンツ統計量の種類は一つであってもよいし複数であってもよい。 Step S13:
The content statistic calculation unit 111 selects the type of image statistic used for the classification process. The content statistic calculation unit 111 selects, for example, a statistic name stored in the content statistic model table of the content statistic model database 17 corresponding to the type of image statistic specified by the user's operation. The content statistic calculation unit 111 reads the feature amount identification information corresponding to the selected statistic name from the content statistic model table of the content statistic model database 17. The type of content statistic specified by the user here may be one or a plurality.

ステップＳ１４：
コンテンツ特徴量算出部１１０は、読み出したコンテンツ特徴量を求めるための特徴量演算式（モデル）を用い、元画像の画像データ及び分類対象画像の画像データの各々から、それぞれコンテンツ特徴量を算出する。そして、コンテンツ特徴量算出部１１０は、算出した元画像及び分類対象画像の各々のコンテンツ特徴量をコンテンツ統計量算出部１１１に対して出力する。 Step S14:
The content feature amount calculation unit 110 uses a feature amount calculation formula (model) for obtaining the read content feature amount, and calculates the content feature amount from each of the image data of the original image and the image data of the image to be classified. .. Then, the content feature amount calculation unit 110 outputs the calculated content feature amounts of the original image and the classification target image to the content statistic calculation unit 111.

ステップＳ１５：
コンテンツ統計量算出部１１１は、コンテンツ特徴量算出部１１０から供給される元画像及び分類対象画像の各々のコンテンツ特徴量を用い、元画像、分類対象画像それぞれのコンテンツ統計量を算出する。 Step S15:
The content statistic calculation unit 111 calculates the content statistic of each of the original image and the classification target image by using the content feature amounts of the original image and the classification target image supplied from the content feature amount calculation unit 110.

ステップＳ１６：
コンテンツ雰囲気量算出部１１は、元画像および分類対象画像の各々について、領域毎にとり得る二つの領域の組合せの、全ての組合せについて、当該二つの領域におけるコンテンツ統計量の間の関係性（例えば、相関値）を算出し、算出した関係性コンテンツ統計量の各々を要素とするベクトルをコンテンツ雰囲気量として構成する。 Step S16:
The content atmosphere amount calculation unit 11 has a relationship (for example, for example) between the content statistics in the two regions for all combinations of the combinations of the two regions that can be taken for each region for each of the original image and the classification target image. Correlation value) is calculated, and a vector having each of the calculated relational content statistics as an element is constructed as the content atmosphere quantity.

ステップＳ１７：
座標距離算出部１２は、ユーザが設定したコンテンツ雰囲気量を軸とする雰囲気量空間において、元画像のコンテンツ雰囲気量の座標値と、分類対象画像のコンテンツ雰囲気量の座標値との間の距離である座標距離を算出する（求める）。例えば、コンテンツ雰囲気量が単数の場合、１次元雰囲気量空間における元画像のコンテンツ統計量の座標値と、分類対象画像のコンテンツ雰囲気量の座標値との間の距離が求められる。一方、コンテンツ統計量が複数（ｎ≧２）次元のベクトルの場合、ｎ次元雰囲気量空間における元画像のコンテンツ雰囲気量のベクトルの示す座標値と、分類対象画像のコンテンツ雰囲気量のベクトルの示す座標値との間の座標距離を求める。そして、座標距離算出部１２は、求めた元画像及び分類対象画像のコンテンツ雰囲気量の座標間の座標距離を、コンテンツ分類部１３に対して出力する。 Step S17:
The coordinate distance calculation unit 12 calculates the coordinate distance, that is, the distance between the coordinate value of the content atmosphere amount of the original image and the coordinate value of the content atmosphere amount of the classification target image, in the atmosphere amount space whose axes are the content atmosphere amounts set by the user. For example, when there is a single content atmosphere amount, the distance between the coordinate value of the content statistic of the original image and the coordinate value of the content atmosphere amount of the classification target image in a one-dimensional atmosphere amount space is obtained. When the content statistic is a vector of multiple (n ≧ 2) dimensions, the coordinate distance between the coordinate value indicated by the content atmosphere amount vector of the original image and that indicated by the content atmosphere amount vector of the classification target image in the n-dimensional atmosphere amount space is obtained. The coordinate distance calculation unit 12 then outputs the coordinate distance between the content atmosphere amounts of the original image and the classification target image to the content classification unit 13.
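The coordinate distance for both the one-dimensional and the n-dimensional case can be sketched with a single function. This example assumes the Euclidean distance, which this passage does not name explicitly, and the function name is hypothetical.

```python
import math


def coordinate_distance(original, target):
    """Euclidean distance between two content atmosphere amount vectors."""
    if len(original) != len(target):
        raise ValueError("both vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, target)))


# One-dimensional atmosphere amount space: a single content atmosphere amount.
d1 = coordinate_distance([2.0], [5.0])

# n-dimensional case (n = 2 here).
d2 = coordinate_distance([0.0, 0.0], [3.0, 4.0])
```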

ステップＳ１８：
コンテンツ分類部１３は、分類対象画像群に含まれる画像全てについて、上記座標距離が算出されたか否かの判定を行う。このとき、座標距離判定部１０６は、分類対象画像群に含まれる画像全てについて上記座標距離が算出されている場合、処理をステップＳ１９へ進める。一方、座標距離判定部１０６は、分類対象画像群に含まれる画像全てについて上記座標距離が算出されていない場合、処理をステップＳ２０へ進める。 Step S18:
The content classification unit 13 determines whether the coordinate distance has been calculated for all the images included in the classification target image group. If the coordinate distance has been calculated for all the images in the classification target image group, the coordinate distance determination unit 106 advances the process to step S19. If the coordinate distance has not yet been calculated for all the images in the classification target image group, the coordinate distance determination unit 106 advances the process to step S20.

ステップＳ１９：
コンテンツ分類部１３は、分類対象画像群に含まれる画像を、上記座標距離に基づいて分類する。コンテンツ分類部１３は、分類対象画像群に含まれる画像全てについて、各画像に対応する上記座標距離の小さい順から大きい順に昇順にソートする。コンテンツ分類部１３は、上記座標距離が最も小さい画像が最も元画像と同様の雰囲気及びテイストを有する画像として分類する。コンテンツ分類部１３は、上記座標距離が大きくなるに従い、元画像と同様の雰囲気及びテイストを有しない画像として分類する。
また、コンテンツ出力部１４は、分類コンテンツ記憶部１８に対し、分類対象画像を分類した結果を、所定の画像フォーマットにより書き込んで記憶させる。 Step S19:
The content classification unit 13 classifies the images included in the classification target image group based on the coordinate distance. The content classification unit 13 sorts all the images in the classification target image group in ascending order of the coordinate distance corresponding to each image. It classifies the image with the smallest coordinate distance as the image whose atmosphere and taste are closest to those of the original image, and treats images as progressively less similar to the original image in atmosphere and taste as the coordinate distance increases.
Further, the content output unit 14 writes and stores the result of classifying the classification target image in the classification content storage unit 18 in a predetermined image format.

ステップＳ２０：
コンテンツ選択部１０は、対象選択画像群に含まれる画像のうち、まだ分類していない画像を分類対象画像としてコンテンツ特徴量算出部１１０に対して出力する。そして、コンテンツ選択部１０は、処理をステップＳ１４へ進める。これにより、分類対象画像のコンテンツ特徴量、コンテンツ統計量、およびコンテンツ雰囲気量が求められ、再度、元画像との座標距離の判定が行われる。 Step S20:
The content selection unit 10 outputs an image that has not yet been classified among the images included in the target selection image group to the content feature amount calculation unit 110 as a classification target image. Then, the content selection unit 10 advances the process to step S14. As a result, the content feature amount, the content statistic amount, and the content atmosphere amount of the classification target image are obtained, and the coordinate distance from the original image is determined again.
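The overall flow of FIG. 4 can be summarized as a loop over the classification target images. The sketch below is a simplification under stated assumptions: `atmosphere_fn` stands in for the feature, statistic, and atmosphere amount calculations of steps S14 to S16, `distance_fn` stands in for step S17, and both the function names and the toy mean-based atmosphere amount are hypothetical.

```python
def classify_images(original, candidates, atmosphere_fn, distance_fn):
    """Return candidate image ids sorted by ascending distance from the original.

    candidates: dict mapping an image identifier to its image data.
    """
    reference = atmosphere_fn(original)  # atmosphere amount of the original image
    distances = {}
    for image_id, image in candidates.items():  # one classification target image per pass
        distances[image_id] = distance_fn(reference, atmosphere_fn(image))
    # Every image now has a coordinate distance, so the images can be ranked.
    return sorted(distances, key=distances.get)


# Toy stand-ins, assumed for illustration only.
def mean_atmosphere(image):
    return [sum(image) / len(image)]


def abs_distance(a, b):
    return abs(a[0] - b[0])


ranking = classify_images(
    original=[10, 10],
    candidates={"a": [0, 0], "b": [9, 11], "c": [20, 30]},
    atmosphere_fn=mean_atmosphere,
    distance_fn=abs_distance,
)
```

Computing the atmosphere amount of the original image once, outside the loop, is a design choice of this sketch. The flowchart as described returns to step S14 for each new classification target image.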

本実施形態においては、分類対象画像を分類する際、元画像と分類対象画像との各々のコンテンツ統計量それぞれの座標値間の座標距離をソートして分類対象画像を元画像と同様の雰囲気及びテイストを有する度合の高い順（座標距離の小さい順）に分類している。
しかしながら、上記座標距離が所定の閾値以下の分類対象画像を、元画像と同様の雰囲気及びテイストを有する画像として分類する構成としてもよい。また、座標距離の小さい順に、所定の個数（例えば１０個）の画像を元画像と同様の雰囲気及びテイストを有する画像として分類する構成としてもよい。 In the present embodiment, when the classification target image is classified, the coordinate distance between the coordinate values of each content statistic of the original image and the classification target image is sorted, and the classification target image has the same atmosphere and atmosphere as the original image. They are classified in descending order of degree of taste (smallest coordinate distance).
However, the classification target image whose coordinate distance is equal to or less than a predetermined threshold value may be classified as an image having the same atmosphere and taste as the original image. Further, a predetermined number (for example, 10) of images may be classified in ascending order of coordinate distance as images having the same atmosphere and taste as the original image.
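The three selection rules described above (full ranking, distance threshold, and a fixed number of nearest images) can be sketched as follows. This is a minimal illustration, not the patented implementation: the Euclidean metric, the function names, and the sample coordinate values are all assumptions introduced here.

```python
import math

def coordinate_distance(a, b):
    """Distance between two coordinate values (Euclidean metric is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_distance(original, candidates):
    """Return (name, distance) pairs sorted in ascending order of distance."""
    scored = [(name, coordinate_distance(original, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])

def select_by_threshold(ranked, threshold):
    """Keep images whose distance is at or below the threshold."""
    return [name for name, dist in ranked if dist <= threshold]

def select_top_k(ranked, k):
    """Keep the k images with the smallest distance."""
    return [name for name, _ in ranked[:k]]

# Hypothetical coordinate values in the statistic space.
original = [0.0, 0.0]
candidates = {"G-1": [3.0, 4.0], "G-2": [1.0, 0.0], "G-3": [0.0, 2.0]}

ranked = rank_by_distance(original, candidates)
similar_by_threshold = select_by_threshold(ranked, 2.0)
similar_top_1 = select_top_k(ranked, 1)
```

The ranking is computed once and both alternative rules reuse it, which mirrors the text: the threshold and fixed-count variants differ only in how the sorted list is cut.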

図５は、図４に示すフローチャートにおける分類処理の処理結果を示す図である。図５（ａ）は分類対象画像群（画像Ｇ－１～Ｇ－２３、…）、図５（ｂ）は分類の基準とした元画像Ｇ－３０、図５（ｃ）は図５（ａ）に示す分類対象画像群の中で元画像Ｇ－３０と同様な雰囲気及びテイストを有すると分類された上位１０の画像である画像Ｇ－４０～Ｇ－４９をそれぞれ示す。図５（ｄ）は分類の基準とした元画像Ｇ－５０、図５（ｅ）は図５（ａ）に示す分類対象画像群の中で元画像Ｇ－５０と同様な雰囲気及びテイストを有すると分類された上位１０の画像である画像Ｇ－６０～Ｇ－６９をそれぞれ示す。図５に示すように、実施形態のコンテンツ分類装置１により、分類対象画像群に含まれる画像の中から、元画像Ｇ－３０（又はＧ－５０）と同様な雰囲気及びテイストを有する画像が分類されたことが確認できる。 FIG. 5 is a diagram showing the result of the classification process in the flowchart shown in FIG. 4. FIG. 5(a) shows the classification target image group (images G-1 to G-23, ...), FIG. 5(b) shows the original image G-30 used as the classification reference, and FIG. 5(c) shows images G-40 to G-49, the top 10 images in the classification target image group of FIG. 5(a) that were classified as having the same atmosphere and taste as the original image G-30. FIG. 5(d) shows the original image G-50 used as the classification reference, and FIG. 5(e) shows images G-60 to G-69, the top 10 images in the classification target image group of FIG. 5(a) that were classified as having the same atmosphere and taste as the original image G-50. As shown in FIG. 5, it can be confirmed that the content classification device 1 of the embodiment classified, from among the images included in the classification target image group, images having the same atmosphere and taste as the original image G-30 (or G-50).

第１の実施形態によれば、鑑賞対象の元画像（元コンテンツ）に対して人間が抱く雰囲気やテイストなどの感覚に対応する画像特徴量（コンテンツ特徴量）、および画像統計量（コンテンツ統計量）を、人間の視覚（知覚）系の神経機構の処理過程モデルを反映した画像統計量（コンテンツ統計量）として数値化し、統計量空間における座標値として示し、元画像と分類対象画像（対象コンテンツ）との座標値間の座標距離を判定することにより、人間が抱く雰囲気やテイストなどの感覚（人間の脳の知覚処理）を、従来に比較してより正確に再現することができ、鑑賞対象の元画像と同様な感覚を有する他のコンテンツ（対象コンテンツ）を、不特定の画像群（分類対象画像群）の中から分類することが可能となる。 According to the first embodiment, the image feature amount (content feature amount) and the image statistic (content statistic) corresponding to sensations such as the atmosphere and taste that a person has toward the original image (original content) to be viewed are quantified as an image statistic (content statistic) that reflects a processing model of the neural mechanisms of the human visual (perceptual) system, and are expressed as coordinate values in the statistic space. By determining the coordinate distance between the coordinate values of the original image and of a classification target image (target content), sensations such as atmosphere and taste (perceptual processing of the human brain) can be reproduced more accurately than before, and other content (target content) that evokes the same sensation as the original image to be viewed can be classified from an unspecified image group (classification target image group).

なお、人の脳機能の解明が進むにしたがって、この他にも多くの視知覚特徴が発見されつつあり、それらを含めてもよい。 As the elucidation of human brain function progresses, many other visual perceptual features are being discovered, and they may be included.

＜第２の実施形態＞
以下、本発明の第２の実施形態について、図面を参照して説明する。
図６は、本発明の第２の実施形態によるコンテンツ分類装置１Ａの構成例を示すブロック図である。図６におけるコンテンツ分類装置１Ａは、コンテンツ特徴量算出部１１０Ａが機械学習により生成されたコンテンツ特徴量モデルを用いてコンテンツ特徴量を算出する点、及びコンテンツ統計量算出部１１１Ａが機械学習により生成されたコンテンツ統計量モデルを用いてコンテンツ統計量を算出する点において、上記第１の実施形態と相違する。また、本実施形態では、コンテンツ分類装置１Ａが学習済みモデルデータベース１９を備える。なお、本実施形態においては、第１の実施形態による図１の構成と同様の構成については同一の符号を付している。
以下、本実施形態においては、コンテンツを静止画像として説明するが、第１の実施形態と同様に、動画像、映像、音響及び音声等の他のコンテンツに対しても適用する構成としても良い。 <Second embodiment>
Hereinafter, the second embodiment of the present invention will be described with reference to the drawings.
FIG. 6 is a block diagram showing a configuration example of the content classification device 1A according to the second embodiment of the present invention. The content classification device 1A in FIG. 6 differs from the first embodiment in that the content feature amount calculation unit 110A calculates the content feature amount using a content feature amount model generated by machine learning, and in that the content statistic calculation unit 111A calculates the content statistic using a content statistic model generated by machine learning. Further, in the present embodiment, the content classification device 1A includes a trained model database 19. In this embodiment, configurations identical to those in FIG. 1 of the first embodiment are given the same reference numerals.
Hereinafter, in the present embodiment, the content will be described as a still image, but as in the first embodiment, the content may be configured to be applied to other content such as moving images, video, sound, and audio.

以下、第２の実施形態によるコンテンツ分類装置１Ａに対して、第１の実施形態のコンテンツ分類装置１と異なる構成及び動作のみの説明を行う。
コンテンツ特徴量算出部１１０Ａは、コンテンツ特徴量を出力するコンテンツ特徴量モデルを機械学習により生成する。すなわち、ユーザがニューラルネットワークなどの推定モデルに対して、入力層を形成して教師画像の画像データ（全てのピクセルの各々の情報量）を入力する。また、上記推定モデルに対して、ユーザは出力層を形成して教師画像の画像データを出力とする。この推定モデルとしてのニューラルネットは、例えば、入力層、中間層及び出力層の３層から構成されている。ここで、中間層は、入力層への入力数及び出力層から出力数よりも、少ない入力数及び出力数により構成する。 Hereinafter, only the configuration and operation different from the content classification device 1 of the first embodiment will be described for the content classification device 1A according to the second embodiment.
The content feature amount calculation unit 110A generates, by machine learning, a content feature amount model that outputs the content feature amount. That is, for an estimation model such as a neural network, the user forms an input layer and inputs the image data of a teacher image (the information of every pixel). The user also forms an output layer for the estimation model and sets the image data of the teacher image as its output. The neural network serving as this estimation model consists of, for example, three layers: an input layer, an intermediate layer, and an output layer. Here, the intermediate layer is configured with fewer inputs and outputs than the number of inputs to the input layer and the number of outputs from the output layer.

これにより、コンテンツ特徴量算出部１１０Ａは、複数の異なる教師データの各々の画像データを入力とし、それぞれの教師画像の画像データが出力される推定モデル（オートエンコーダ：自己符号化器）の学習を行う。すなわち、コンテンツ特徴量算出部１１０Ａは、教師画像の画像データが入力された場合、入力された教師画像と同様な画像が出力される推定モデルを、機械学習により生成する。そして、コンテンツ特徴量算出部１１０Ａは、推定モデルを構成する入力層及び中間層（出力層の前段の層）の各々を、推定コンテンツ特徴量モデルとして抽出する。コンテンツ特徴量算出部１１０Ａは、推定コンテンツ特徴量モデルの出力を推定コンテンツ特徴量（推定画像特徴量）として用いる。
また、コンテンツ特徴量算出部１１０Ａは、機械学習により生成した上記推定モデルを学習済みモデルデータベース１９に記憶させる。 As a result, the content feature amount calculation unit 110A trains an estimation model (autoencoder) that takes the image data of each of a plurality of different pieces of teacher data as input and outputs the image data of the corresponding teacher image. That is, the content feature amount calculation unit 110A generates, by machine learning, an estimation model that outputs an image similar to the input teacher image when the image data of a teacher image is input. Then, the content feature amount calculation unit 110A extracts the input layer and the intermediate layer (the layer preceding the output layer) of the estimation model as the estimated content feature amount model. The content feature amount calculation unit 110A uses the output of the estimated content feature amount model as the estimated content feature amount (estimated image feature amount).
Further, the content feature amount calculation unit 110A stores the estimation model generated by machine learning in the trained model database 19.

コンテンツ統計量算出部１１１Ａは、コンテンツ統計量を出力するコンテンツ統計量モデルを機械学習により生成する。すなわち、ユーザがニューラルネットワークなどの推定モデルに対して、入力層を形成して教師画像のコンテンツ特徴量を入力する。ここでのコンテンツ特徴量が、例えば、第１の実施形態で示したコンテンツ特徴量、或いは第２の実施形態で示したコンテンツ特徴量算出部１１０Ａが推定コンテンツ特徴量モデルから出力させた推定コンテンツ特徴量である。また、上記推定モデルに対して、ユーザは出力層を形成して教師画像の画像データを出力とする。この推定モデルとしてのニューラルネットは、例えば、入力層、中間層及び出力層の３層から構成されている。ここで、中間層は、入力層への入力数及び出力層から出力数よりも、少ない入力数及び出力数により構成する。 The content statistic calculation unit 111A generates, by machine learning, a content statistic model that outputs the content statistic. That is, for an estimation model such as a neural network, the user forms an input layer and inputs the content feature amount of a teacher image. The content feature amount here is, for example, the content feature amount described in the first embodiment, or the estimated content feature amount that the content feature amount calculation unit 110A of the second embodiment obtains from the estimated content feature amount model. The user also forms an output layer for the estimation model and sets the image data of the teacher image as its output. The neural network serving as this estimation model consists of, for example, three layers: an input layer, an intermediate layer, and an output layer. Here, the intermediate layer is configured with fewer inputs and outputs than the number of inputs to the input layer and the number of outputs from the output layer.

これにより、コンテンツ統計量算出部１１１Ａは、複数の異なる教師画像の各々のコンテンツ特徴量のデータを入力とし、それぞれの教師画像のコンテンツ特徴量が出力される推定モデルの学習を行う。すなわち、コンテンツ統計量算出部１１１Ａは、教師画像のコンテンツ特徴量が入力された場合、入力された教師画像のコンテンツ特徴量と同様なコンテンツ特徴量が出力される推定モデル（オートエンコーダ：自己符号化器）を、機械学習により生成する。そして、コンテンツ特徴量算出部１１０Ａは、推定モデルを構成する入力層及び中間層（出力層の前段の層）の各々を、推定コンテンツ統計量モデルとして抽出する。コンテンツ統計量算出部１１１Ａは、推定コンテンツ統計量モデルの出力を推定コンテンツ統計量（推定画像統計量）として用いる。
また、コンテンツ統計量算出部１１１Ａは、機械学習により生成した上記推定モデルを、学習済みモデルデータベース１９に記憶させる。 As a result, the content statistic calculation unit 111A trains an estimation model that takes the content feature amount data of each of a plurality of different teacher images as input and outputs the content feature amount of the corresponding teacher image. That is, the content statistic calculation unit 111A generates, by machine learning, an estimation model (autoencoder) that, when the content feature amount of a teacher image is input, outputs a content feature amount similar to the input one. Then, the content feature amount calculation unit 110A extracts the input layer and the intermediate layer (the layer preceding the output layer) of the estimation model as the estimated content statistic model. The content statistic calculation unit 111A uses the output of the estimated content statistic model as the estimated content statistic (estimated image statistic).
Further, the content statistic calculation unit 111A stores the estimation model generated by machine learning in the trained model database 19.

図７は、本実施形態によるコンテンツ分類装置１Ａが行なう分類処理の動作例を示すフローチャートである。以下に示すフローチャートにおいては、すでに、コンテンツ特徴量算出部１１０Ａにより生成され、教師画像の画像データが入力された場合、入力された教師画像と同様な画像が出力される推定モデルが、学習済みモデルデータベース１９に書き込まれて記憶されていること、及びコンテンツ統計量及びコンテンツ統計量を求めるための演算式（モデル）がコンテンツ統計量モデルデータベース１７に書き込まれて記憶されていることを前提に説明する。
なお、図７のステップＳ１０～Ｓ１１、及びＳ１６～Ｓ１９の各々に示す処理については、図４において同じ符号で示す処理と同様であるため、その説明を省略する。 FIG. 7 is a flowchart showing an operation example of the classification process performed by the content classification device 1A according to the present embodiment. The flowchart below is described on the premise that the estimation model, which was generated by the content feature amount calculation unit 110A and outputs an image similar to the input teacher image when the image data of a teacher image is input, has already been written to and stored in the trained model database 19, and that the content statistic and the calculation formula (model) for obtaining the content statistic have been written to and stored in the content statistic model database 17.
Since the processes shown in steps S10 to S11 and S16 to S19 in FIG. 7 are the same as the processes shown by the same reference numerals in FIG. 4, the description thereof will be omitted.

ステップＳ１２Ａ：
コンテンツ特徴量算出部１１０Ａは、画像特徴量モデルを抽出する推定モデルを、学習済みモデルデータベース１９から選択する。コンテンツ特徴量算出部１１０Ａは、例えば、元画像、及び分類対象画像のデータ量等に応じて上記推定モデルを選択するようにしてよい。 Step S12A:
The content feature amount calculation unit 110A selects an estimation model for extracting an image feature amount model from the trained model database 19. The content feature amount calculation unit 110A may select the estimation model according to, for example, the amount of data of the original image and the classification target image.

ステップＳ１３Ａ：
コンテンツ統計量算出部１１１Ａは、画像統計量モデルを抽出する推定モデルを、学習済みモデルデータベース１９から選択する。コンテンツ統計量算出部１１１Ａは、例えば、分類する画像の数やカテゴリに応じて上記推定モデルを選択するようにしてよい。 Step S13A:
The content statistic calculation unit 111A selects an estimation model for extracting an image statistic model from the trained model database 19. The content statistic calculation unit 111A may select the estimation model according to, for example, the number of images to be classified or the category.

ステップＳ１４Ａ：
コンテンツ特徴量算出部１１０Ａは、選択した推定モデルに、元画像、又は分類対象画像の画像データを入力することにより、元画像、又は分類対象画像の各々の画像における上記コンテンツ特徴量を算出する。 Step S14A:
The content feature amount calculation unit 110A calculates the content feature amount in each image of the original image or the classification target image by inputting the image data of the original image or the classification target image into the selected estimation model.

ステップＳ１５Ａ：
コンテンツ統計量算出部１１１Ａは、選択した推定モデルに、元画像、及び分類対象画像のコンテンツ特徴量を入力することにより、元画像と分類対象画像との間の画像統計量を算出する。 Step S15A:
The content statistic calculation unit 111A calculates the image statistic between the original image and the classification target image by inputting the content feature amount of the original image and the classification target image into the selected estimation model.
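Steps S12A to S15A amount to selecting two trained models and applying them in sequence, first to the image data and then to the resulting features. A minimal sketch of that chaining is shown below. The dictionary standing in for the trained model database 19, the weight values, the tanh activation, and the Euclidean distance are hypothetical choices made only for illustration.

```python
import math

def apply_encoder(model, vector):
    """Apply one encoder stage: tanh(W * vector + b). The activation is an assumption."""
    return [math.tanh(sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(model["W"], model["b"])]

def euclidean(a, b):
    """Coordinate distance between two statistic vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical stand-in for the trained model database 19.
trained_model_db = {
    "feature": {"W": [[0.5, -0.5, 0.0, 0.0],
                      [0.0, 0.0, 0.5, -0.5],
                      [0.25, 0.25, 0.25, 0.25]],
                "b": [0.0, 0.0, 0.0]},
    "statistic": {"W": [[1.0, 0.0, 0.0],
                        [0.0, 1.0, 1.0]],
                  "b": [0.0, 0.0]},
}

def content_statistic(image, db):
    feature_model = db["feature"]                    # step S12A: select feature model
    statistic_model = db["statistic"]                # step S13A: select statistic model
    features = apply_encoder(feature_model, image)   # step S14A: content feature amount
    return apply_encoder(statistic_model, features)  # step S15A: content statistic

# Hypothetical 4-pixel images.
original = [0.9, 0.1, 0.8, 0.2]
target = [0.2, 0.8, 0.1, 0.9]

distance = euclidean(content_statistic(original, trained_model_db),
                     content_statistic(target, trained_model_db))
```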

図８は、本実施形態によるコンテンツ分類装置１Ａにおけるコンテンツ特徴量算出部１１０Ａが行なう推定コンテンツ特徴量を算出する処理の動作例を示すフローチャートである。 FIG. 8 is a flowchart showing an operation example of a process for calculating an estimated content feature amount performed by the content feature amount calculation unit 110A in the content classification device 1A according to the present embodiment.

ステップＳ２１：
コンテンツ特徴量算出部１１０Ａは、ユーザが入力する複数の教師画像の各々を、一旦、図示しない記憶部に書き込んで記憶する。これらの教師画像の画像データは、縦×横の各々のピクセル数が同一に設定されている。 Step S21:
The content feature amount calculation unit 110A temporarily writes and stores each of the plurality of teacher images input by the user in a storage unit (not shown). In the image data of these teacher images, the number of pixels in each of the vertical and horizontal directions is set to be the same.

ステップＳ２２：
コンテンツ特徴量算出部１１０Ａは、上記教師画像の画像データの各ピクセルの情報を入力する入力層と、推定された推定画像の画像データのピクセルを出力する出力層の構成を設定する。そして、コンテンツ特徴量算出部１１０Ａは、入力層から供給される画像データ及び出力層から出力される画像データに比較し、より少ないデータの入力数及び出力数を有する中間層を設定する。コンテンツ特徴量算出部１１０Ａは、上記入力層、中間層及び出力層から構成されるニューラルネットの推定モデルを構成する。 Step S22:
The content feature amount calculation unit 110A sets the configuration of an input layer that receives the information of each pixel of the image data of the teacher image and an output layer that outputs the pixels of the image data of the estimated image. Then, the content feature amount calculation unit 110A sets an intermediate layer that has fewer inputs and outputs than the image data supplied from the input layer and the image data output from the output layer. The content feature amount calculation unit 110A thereby constructs a neural network estimation model composed of the input layer, the intermediate layer, and the output layer.

ステップＳ２３：
コンテンツ特徴量算出部１１０Ａは、上記推定モデルの入力層に対して、教師画像の画像データにおけるピクセルの情報を入力し、出力層から入力した教師画像の画像データと同様のピクセルの情報が出力されるように、中間層の機械学習を行う。コンテンツ特徴量算出部１１０Ａは、上記機械学習の処理を、全ての教師画像の各々の画像データを用いて行い、それぞれの出力される画像データのピクセルの情報が、入力される教師画像の画像データのピクセルの情報と所定の範囲で類似した場合、その時点の推定モデルを元画像推定モデルとする。 Step S23:
The content feature amount calculation unit 110A inputs the pixel information of the image data of a teacher image to the input layer of the estimation model, and performs machine learning of the intermediate layer so that pixel information similar to that of the input teacher image is output from the output layer. The content feature amount calculation unit 110A performs this machine learning process using the image data of every teacher image, and when the pixel information of each output image is similar, within a predetermined range, to the pixel information of the corresponding input teacher image, it adopts the estimation model at that point as the original image estimation model.

ステップＳ２４：
コンテンツ特徴量算出部１１０Ａは、学習により求めた元画像推定モデルにおける出力層を取り外し、入力層及び中間層からなる推定コンテンツ特徴量モデル（推定画像特徴量モデル）として抽出する。 Step S24:
The content feature amount calculation unit 110A removes the output layer in the original image estimation model obtained by learning, and extracts it as an estimated content feature amount model (estimated image feature amount model) including an input layer and an intermediate layer.

ステップＳ２５：
コンテンツ特徴量算出部１１０Ａは、中間層の出力である推定コンテンツ特徴量モデルを、コンテンツ特徴量を出力するモデルとして、学習済みモデルデータベース１９に書き込んで記憶させる（登録する）。 Step S25:
The content feature amount calculation unit 110A writes (registers) the estimated content feature amount model, whose output is that of the intermediate layer, in the trained model database 19 as a model that outputs the content feature amount.
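Steps S21 to S25 can be sketched as a small three-layer autoencoder whose output layer is discarded after training. This is an illustrative sketch under stated assumptions, not the patent's implementation: the tanh hidden activation, the linear output layer, plain stochastic gradient descent, the tolerance value, and the four tiny 4-pixel "teacher images" are all hypothetical.

```python
import math
import random

def make_autoencoder(n_in, n_hidden, seed=0):
    """Step S22: input layer, narrower intermediate layer, output layer."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W1": mat(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "W2": mat(n_in, n_hidden), "b2": [0.0] * n_in}

def encode(model, x):
    """Input layer to intermediate layer."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(model["W1"], model["b1"])]

def decode(model, h):
    """Intermediate layer to output layer (linear)."""
    return [sum(w * v for w, v in zip(row, h)) + b
            for row, b in zip(model["W2"], model["b2"])]

def reconstruction_error(model, data):
    """Mean squared reconstruction error per image."""
    total = 0.0
    for x in data:
        y = decode(model, encode(model, x))
        total += sum((a - b) ** 2 for a, b in zip(y, x))
    return total / len(data)

def train(model, data, lr=0.05, epochs=300, tolerance=1e-3):
    """Step S23: train until the output is within a tolerance of the input."""
    for _ in range(epochs):
        for x in data:
            h = encode(model, x)
            y = decode(model, h)
            dy = [a - b for a, b in zip(y, x)]
            # Backpropagate through the output layer before updating it.
            dh = [sum(dy[i] * model["W2"][i][j] for i in range(len(dy)))
                  for j in range(len(h))]
            dz = [dh[j] * (1.0 - h[j] ** 2) for j in range(len(h))]
            for i in range(len(dy)):
                for j in range(len(h)):
                    model["W2"][i][j] -= lr * dy[i] * h[j]
                model["b2"][i] -= lr * dy[i]
            for j in range(len(h)):
                for k in range(len(x)):
                    model["W1"][j][k] -= lr * dz[j] * x[k]
                model["b1"][j] -= lr * dz[j]
        if reconstruction_error(model, data) <= tolerance:
            break
    return model

def extract_feature_model(model):
    """Step S24: remove the output layer, keep input and intermediate layers."""
    return {"W1": model["W1"], "b1": model["b1"]}

# Step S21: hypothetical teacher images, all with the same pixel count.
teacher_images = [[0.9, 0.8, 0.1, 0.2], [0.8, 0.9, 0.2, 0.1],
                  [0.1, 0.2, 0.9, 0.8], [0.2, 0.1, 0.8, 0.9]]

autoencoder = make_autoencoder(n_in=4, n_hidden=2)
error_before = reconstruction_error(autoencoder, teacher_images)
train(autoencoder, teacher_images)
error_after = reconstruction_error(autoencoder, teacher_images)

# Step S25: register the encoder in a dict standing in for database 19.
trained_model_db = {"feature": extract_feature_model(autoencoder)}
estimated_features = encode(trained_model_db["feature"], teacher_images[0])
```

Because `encode` reads only the first-layer weights, the extracted model can be used directly to produce the estimated content feature amount once the output layer has been dropped.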

図９は、本実施形態によるコンテンツ分類装置１Ａにおけるコンテンツ統計量算出部１１１Ａが行なう推定コンテンツ統計量を算出する処理の動作例を示すフローチャートである。 FIG. 9 is a flowchart showing an operation example of a process for calculating an estimated content statistic performed by the content statistic calculation unit 111A in the content classification device 1A according to the present embodiment.

ステップＳ３０：
コンテンツ統計量算出部１１１Ａは、ユーザが入力する複数の教師画像の各々を、一旦、図示しない記憶部に書き込んで記憶する。これらの教師画像の画像データは、縦×横の各々のピクセル数が同一に設定されている。 Step S30:
The content statistic calculation unit 111A temporarily writes and stores each of the plurality of teacher images input by the user in a storage unit (not shown). In the image data of these teacher images, the number of pixels in each of the vertical and horizontal directions is set to be the same.

ステップＳ３１：
ユーザは推定コンテンツ統計量モデルを算出する際に用いるコンテンツ特徴量（画像特徴量）の組合せを、コンテンツ分類装置１Ａの入力手段を介して、コンテンツ統計量算出部１１１Ａに対して入力する。 Step S31:
The user inputs a combination of content features (image features) used when calculating the estimated content statistic model to the content statistic calculation unit 111A via the input means of the content classification device 1A.

ステップＳ３２：
コンテンツ統計量算出部１１１Ａは、コンテンツ特徴量算出部１１０Ａに対して、教師画像の各々から、選択した組合せにおけるコンテンツ特徴量の算出を行わせる。
コンテンツ特徴量算出部１１０は、教師画像の各々において、上記組合せにおける種類のコンテンツ特徴量それぞれを求める。 Step S32:
The content statistic calculation unit 111A causes the content feature amount calculation unit 110A to calculate the content feature amount in the selected combination from each of the teacher images.
The content feature amount calculation unit 110 obtains each of the types of content feature amounts in the above combination in each of the teacher images.

ステップＳ３３：
コンテンツ統計量算出部１１１Ａは、上記教師画像のコンテンツ特徴量のデータを入力する入力層と、推定されたコンテンツ特徴量のデータを出力する出力層の構成を設定する。そして、コンテンツ統計量算出部１１１Ａは、入力層から供給される画像データ及び出力層から出力される画像データに比較し、より少ないデータの入力数及び出力数を有する中間層を設定する。コンテンツ統計量算出部１１１Ａは、上記入力層、中間層及び出力層から構成されるニューラルネットの推定モデルを構成する。 Step S33:
The content statistic calculation unit 111A sets the configuration of an input layer for inputting the content feature amount data of the teacher image and an output layer for outputting the estimated content feature amount data. Then, the content statistic calculation unit 111A sets an intermediate layer having a smaller number of inputs and outputs of data as compared with the image data supplied from the input layer and the image data output from the output layer. The content statistic calculation unit 111A constitutes an estimation model of a neural network composed of the input layer, the intermediate layer, and the output layer.

ステップＳ３４：
コンテンツ統計量算出部１１１Ａは、上記推定モデルの入力層に対して、教師画像の特徴量のデータを入力し、出力層から入力した教師画像の特徴量のデータと同様のデータが出力されるように、中間層の機械学習を行う。コンテンツ統計量算出部１１１Ａは、上記機械学習の処理を、全ての教師画像の各々のコンテンツ特徴量の組合せを用いて行い、それぞれの出力されるコンテンツ特徴量のデータが、入力される教師画像のコンテンツ特徴量のデータと所定の範囲で類似した場合、その時点の推定モデルを元画像推定モデルとする。 Step S34:
The content statistic calculation unit 111A inputs the feature amount data of a teacher image to the input layer of the estimation model, and performs machine learning of the intermediate layer so that data similar to the input feature amount data is output from the output layer. The content statistic calculation unit 111A performs this machine learning process using the combination of content feature amounts of every teacher image, and when each output content feature amount is similar, within a predetermined range, to the content feature amount of the corresponding input teacher image, it adopts the estimation model at that point as the original image estimation model.

ステップＳ３５：
コンテンツ統計量算出部１１１Ａは、学習により求めた元画像推定モデルにおける出力層を取り外し、入力層及び中間層からなる推定コンテンツ統計量モデル（推定画像統計量モデル）として抽出する。 Step S35:
The content statistic calculation unit 111A removes the output layer in the original image estimation model obtained by learning, and extracts it as an estimated content statistic model (estimated image statistic model) including an input layer and an intermediate layer.

ステップＳ３６：
コンテンツ統計量算出部１１１Ａは、中間層の出力である推定コンテンツ統計量をコンテンツ統計量とし、推定コンテンツ統計量モデルを、コンテンツ統計量を出力するモデルとして、学習済みモデルデータベース１９に対して書き込んで記憶させる（登録する）。 Step S36:
The content statistic calculation unit 111A treats the estimated content statistic, which is the output of the intermediate layer, as the content statistic, and writes (registers) the estimated content statistic model in the trained model database 19 as a model that outputs the content statistic.

第２の実施形態によれば、鑑賞対象の元画像（元コンテンツ）に対して人間が抱く雰囲気やテイストなどの感覚に対応する画像特徴量（コンテンツ特徴量）および画像統計量（コンテンツ統計量）に基づいて、人間の視覚（知覚）系の神経機構の処理過程モデルを反映した画像雰囲気量（コンテンツ雰囲気量）として数値化し、雰囲気量空間における座標値として示し、元画像と分類対象画像（対象コンテンツ）との座標値間の座標距離を判定することにより、人間が抱く雰囲気やテイストなどの感覚（人間の脳の知覚処理）を、従来に比較してより正確に再現することができ、鑑賞対象の元画像と同様な感覚を有する他のコンテンツ（対象コンテンツ）を、不特定の画像群（分類対象画像群）の中から分類することが可能となる。
また、本実施形態においては、教師画像の画像データを入力するオートエンコーダ（３層のニューラルネットワーク）の中間層をコンテンツ特徴量として用いるため、より脳における神経機構の特徴抽出に対応したコンテンツ特徴量を得ることができ、鑑賞対象の元画像に対して、第１の実施形態に比較してより近い感覚を有する他のコンテンツ（対象コンテンツ）を選択して分類することが可能となる。 According to the second embodiment, sensations such as the atmosphere and taste that a person has toward the original image (original content) to be viewed are quantified, based on the corresponding image feature amount (content feature amount) and image statistic (content statistic), as an image atmosphere amount (content atmosphere amount) that reflects a processing model of the neural mechanisms of the human visual (perceptual) system, and are expressed as coordinate values in the atmosphere amount space. By determining the coordinate distance between the coordinate values of the original image and of a classification target image (target content), sensations such as atmosphere and taste (perceptual processing of the human brain) can be reproduced more accurately than before, and other content (target content) that evokes the same sensation as the original image to be viewed can be classified from an unspecified image group (classification target image group).
Further, in the present embodiment, since the intermediate layer of an autoencoder (a three-layer neural network) that receives the image data of teacher images is used as the content feature amount, a content feature amount that corresponds more closely to feature extraction by the neural mechanisms of the brain can be obtained, and other content (target content) that evokes a sensation closer to that of the original image to be viewed than in the first embodiment can be selected and classified.

＜第３の実施形態＞
図１０は、本発明の第３の実施形態によるコンテンツ分類装置１Ｂの構成例を示すブロック図である。図１０におけるコンテンツ分類装置１Ｂは、コンテンツ雰囲気量算出部１１Ｂが、コンテンツ特徴量算出部１１０（１１０Ａ）、及びコンテンツ統計量算出部１１１（１１１Ａ）を用いずに、深層学習により生成された深層学習画像モデルを用いて、画像データからコンテンツ統計量を算出する点において、上記実施形態と相違する。また、コンテンツ分類装置１Ｂは、深層学習済みモデルデータベース２０を備えている。第３の実施形態においては、第１の実施形態による図１の構成と同様の構成については同一の符号を付している。 <Third embodiment>
FIG. 10 is a block diagram showing a configuration example of the content classification device 1B according to the third embodiment of the present invention. The content classification device 1B in FIG. 10 differs from the above embodiments in that the content atmosphere amount calculation unit 11B calculates the content statistic from the image data using a deep learning image model generated by deep learning, without using the content feature amount calculation unit 110 (110A) or the content statistic calculation unit 111 (111A). Further, the content classification device 1B includes a deep-learned model database 20. In the third embodiment, configurations identical to those in FIG. 1 of the first embodiment are given the same reference numerals.

以下、本実施形態においては、コンテンツを静止画像として説明するが、第１の実施形態と同様に、動画像、映像、音響及び音声等の他のコンテンツに対しても適用する構成としても良い。
以下、第３の実施形態によるコンテンツ分類装置１Ｂに対して、第１の実施形態のコンテンツ分類装置１と異なる構成及び動作のみの説明を行う。 Hereinafter, in the present embodiment, the content will be described as a still image, but as in the first embodiment, the content may be configured to be applied to other content such as moving images, video, sound, and audio.
Hereinafter, only the configuration and operation different from the content classification device 1 of the first embodiment will be described for the content classification device 1B according to the third embodiment.

コンテンツ雰囲気量算出部１１Ｂは、深層学習により深層学習画像モデルを生成し、生成した深層学習画像モデルを深層学習済みモデルデータベース２０に記憶させる。
また、コンテンツ雰囲気量算出部１１Ｂは、深層学習済みモデルデータベース２０に記憶された深層学習画像モデルを用いて、元画像、及び分類対象画像の各々の画像データから、各画像に対する画像統計量を算出する。
深層学習済みモデルデータベース２０は、深層学習により生成された深層学習画像モデルを、コンテンツ統計量を算出するためのコンテンツ統計量モデルとして記憶する。 The content atmosphere amount calculation unit 11B generates a deep learning image model by deep learning, and stores the generated deep learning image model in the deep-learned model database 20.
Further, the content atmosphere amount calculation unit 11B calculates an image statistic for each image from the image data of the original image and of each classification target image, using the deep learning image model stored in the deep-learned model database 20.
The deep-learned model database 20 stores the deep learning image model generated by deep learning as a content statistic model for calculating the content statistic.

コンテンツ雰囲気量算出部１１Ｂが生成する深層学習画像モデルの生成処理としては、以下の処理Ａ１及び処理Ａ２の２通りがある。
図１１は、深層学習画像モデルの生成を行う処理Ａ１の動作例を示すフローチャートである。この処理Ａ１の場合、気にいった画像に対し、雰囲気が似ていると知覚する画像と似ていないと知覚する画像との各々を学習用画像と複数用意する。
ステップＳ４０：
コンテンツ雰囲気量算出部１１Ｂは、ユーザが入力する複数の学習用画像の各々を、一旦、図示しない記憶部に書き込んで記憶する。これらの学習用画像の画像データは、縦×横の各々のピクセル数が同一に設定されている。 There are two types of processing A1 and processing A2 below as processing for generating the deep learning image model generated by the content atmosphere amount calculation unit 11B.
FIG. 11 is a flowchart showing an operation example of process A1 for generating the deep learning image model. In process A1, for a favorite image, a plurality of images perceived as similar to it in atmosphere and a plurality of images perceived as dissimilar are prepared as learning images.
Step S40:
The content atmosphere amount calculation unit 11B temporarily writes and stores each of the plurality of learning images input by the user in a storage unit (not shown). In the image data of these learning images, the number of pixels in each of the vertical and horizontal directions is set to be the same.

ステップＳ４１：
コンテンツ雰囲気量算出部１１Ｂは、中間層（プーリング層及び畳み込み層）が多層構造の深層ニューラルネットワークモデルに対し、上記学習用画像の画像データの各ピクセルの情報を入力する入力層と、正規化する全結合層である出力層とを設定する。この出力層は、「１」あるいは「０」との間の小数点の数値を出力する構成となっている。 Step S41:
For a deep neural network model whose intermediate layers (pooling layers and convolution layers) form a multilayer structure, the content atmosphere amount calculation unit 11B sets an input layer that receives the information of each pixel of the image data of the learning images and an output layer that is a fully connected layer performing normalization. This output layer is configured to output a decimal value between "0" and "1".

ステップＳ４２：
コンテンツ雰囲気量算出部１１Ｂは、上記深層ニューラルネットワークモデルの入力層に対し、気にいった画像と雰囲気が似ていると知覚する画像を入力した場合、出力層から似ていることを示す「１」に近い数値が出力されるように、また気にいった画像と雰囲気が似ていないと知覚する画像を入力した場合、出力層から似ていることを示す「０」に近い数値が出力されるように、各ネットワークの層の重みパラメータの最適化処理を行う。すなわち、コンテンツ雰囲気量算出部１１Ｂは、深層ニューラルネットワークモデルに対し、クラス分類の機械学習を行い、学習結果として、深層学習画像モデルを生成する。 Step S42:
The content atmosphere amount calculation unit 11B optimizes the weight parameters of each layer of the network so that, when an image perceived as similar in atmosphere to the favorite image is input to the input layer of the deep neural network model, the output layer outputs a value close to "1", indicating similarity, and when an image perceived as dissimilar in atmosphere to the favorite image is input, the output layer outputs a value close to "0", indicating dissimilarity. That is, the content atmosphere amount calculation unit 11B performs classification machine learning on the deep neural network model and generates a deep learning image model as the learning result.
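The training target in step S42 (an output near 1 for similar images and near 0 for dissimilar ones) can be illustrated with a deliberately reduced stand-in. The sketch below trains a single fully connected sigmoid unit with the binary cross-entropy gradient; it is not the multilayer convolutional network the text describes, and the two-element input vectors, learning rate, and epoch count are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, x):
    """Output-layer value between 0 and 1 for one input vector."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def train_similarity_classifier(samples, labels, lr=0.5, epochs=500):
    """Fit weights so similar inputs score near 1 and dissimilar inputs near 0."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            # Gradient of binary cross-entropy with respect to the pre-activation.
            error = score(weights, bias, x) - target
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical learning data: label 1 = perceived as similar, 0 = dissimilar.
samples = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]

weights, bias = train_similarity_classifier(samples, labels)
similar_score = score(weights, bias, [0.9, 0.8])
dissimilar_score = score(weights, bias, [0.1, 0.2])
```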

このとき、コンテンツ雰囲気量算出部１１Ｂは、学習させた深層ニューラルネットワークモデルに対し、学習用画像とは異なる気にいった画像と雰囲気が似ていると知覚する画像と、雰囲気が似ていないと知覚する画像とを入力し、学習させた深層ニューラルネットワークモデルに対する学習テスト（クロス・バリデーション）を行う。
そして、コンテンツ雰囲気量算出部１１Ｂは、雰囲気が似ていると知覚する画像を深層ニューラルネットワークモデルに入力した際、出力層の出力する数値が予め設定した第１閾値以上となり、かつ雰囲気が似ていないと知覚する画像を深層ニューラルネットワークモデルに入力した際、出力層の出力する数値が予め設定した第２閾値以下となった場合、この深層ニューラルネットワークモデルを、気にいった画像に対する深層学習画像モデルとする。一方、コンテンツ雰囲気量算出部１１Ｂは、上記学習テストにおいて、雰囲気が似ていると知覚する画像に対して、深層ニューラルネットワークモデルの出力層の出力する数値が予め設定した第１閾値未満、あるいは雰囲気が似ていないと知覚する画像に対して、深層ニューラルネットワークモデルの出力層の出力する数値が予め設定した第２閾値以上である場合、深層ニューラルネットワークモデルの再学習を行う。 At this time, the content atmosphere amount calculation unit 11B inputs, to the trained deep neural network model, images different from the learning images, namely images perceived as similar in atmosphere to the favorite image and images perceived as dissimilar, and performs a learning test (cross-validation) on the trained deep neural network model.
Then, if the value output by the output layer is equal to or higher than a preset first threshold when an image perceived as similar in atmosphere is input to the deep neural network model, and the value output by the output layer is equal to or lower than a preset second threshold when an image perceived as dissimilar in atmosphere is input, the content atmosphere amount calculation unit 11B adopts this deep neural network model as the deep learning image model for the favorite image. On the other hand, if, in the learning test, the value output by the output layer is less than the preset first threshold for an image perceived as similar in atmosphere, or is equal to or higher than the preset second threshold for an image perceived as dissimilar in atmosphere, the content atmosphere amount calculation unit 11B retrains the deep neural network model.
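The accept-or-retrain decision of the learning test can be sketched as a simple check on held-out scores. The function name, the requirement that every test image satisfy its threshold, and the sample values are assumptions; the text does not say whether the criterion applies per image or in aggregate. A dissimilar-image score exactly equal to the second threshold is treated as passing here, following the acceptance condition as stated.

```python
def passes_learning_test(similar_scores, dissimilar_scores,
                         first_threshold, second_threshold):
    """Return True if the model can be adopted, False if it should be retrained."""
    similar_ok = all(s >= first_threshold for s in similar_scores)
    dissimilar_ok = all(s <= second_threshold for s in dissimilar_scores)
    return similar_ok and dissimilar_ok

# Hypothetical output-layer values for held-out test images.
adopt = passes_learning_test([0.9, 0.8], [0.1, 0.2],
                             first_threshold=0.7, second_threshold=0.3)
retrain = not passes_learning_test([0.9, 0.6], [0.1, 0.2],
                                   first_threshold=0.7, second_threshold=0.3)
```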

ステップＳ４３：
コンテンツ雰囲気量算出部１１Ｂは、生成した深層学習画像モデルから、多層構造の中間層におけるプーリング層及び畳み込み層の出力パラメータ、活性化関数の種類と出力されるパラメータなどの各々を、深層学習統計量（あるいは深層学習特徴量）それぞれとして抽出する。 Step S43:
From the generated deep learning image model, the content atmosphere amount calculation unit 11B extracts the output parameters of the pooling layers and convolution layers in the multilayer intermediate layers, the type of activation function and the parameters it outputs, and the like, each as a deep learning statistic (or deep learning feature amount).

ステップＳ４４：
コンテンツ雰囲気量算出部１１Ｂは、生成した深層学習画像モデルと、抽出した深層学習統計量（深層学習特徴量）とを深層学習済みモデルデータベース２０に対して書き込んで記憶させる（登録処理）。
上記処理Ａ１を気にいった画像毎に行い、それぞれに対応する深層学習画像モデルを生成する。 Step S44:
The content atmosphere amount calculation unit 11B writes and stores the generated deep learning image model and the extracted deep learning statistics (deep learning feature amounts) in the deep-learned model database 20 (registration process).
Process A1 above is performed for each favorite image, and a deep learning image model corresponding to each is generated.

図１２は、深層学習画像モデルの生成を行う処理Ａ２の動作例を示すフローチャートである。この処理Ａ２の場合、処理Ａ１において生成した深層学習画像モデルの転移学習を行い、別の深層学習画像モデル（他の気にいった画像に対応する深層学習画像モデル）を生成する。
ステップＳ５０：
コンテンツ雰囲気量算出部１１Ｂは、ユーザが入力する複数の学習用画像の各々を、一旦、図示しない記憶部に書き込んで記憶する。これらの学習用画像の画像データは、縦×横の各々のピクセル数が同一に設定されている。 FIG. 12 is a flowchart showing an operation example of the process A2 that generates a deep learning image model. In the case of this process A2, transfer learning of the deep learning image model generated in the process A1 is performed, and another deep learning image model (deep learning image model corresponding to another favorite image) is generated.
Step S50:
The content atmosphere amount calculation unit 11B temporarily writes and stores each of the plurality of learning images input by the user in a storage unit (not shown). In the image data of these learning images, the number of pixels in each of the vertical and horizontal directions is set to be the same.

ステップＳ５１：
コンテンツ雰囲気量算出部１１Ｂは、中間層（プーリング層及び畳み込み層）が多層構造の深層ニューラルネットワークモデルに対して接続する、上記学習用画像の画像データの各ピクセルの情報を入力する入力層と、正規化する全結合層である出力層とを設定する。この出力層は、「１」あるいは「０」との間の小数点の数値を出力する構成となっている。 Step S51:
For a deep neural network model whose intermediate layers (pooling layers and convolution layers) form a multi-layer structure, the content atmosphere amount calculation unit 11B sets an input layer, which receives the information of each pixel of the image data of the learning images, and an output layer, which is a fully connected layer that performs normalization. This output layer is configured to output a decimal value between "0" and "1".
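One way an output layer can be made to emit a decimal value between "0" and "1" is to follow a fully connected unit with a sigmoid function. The sketch below assumes this choice; the source only states that the fully connected output layer normalizes, so the sigmoid and the name `output_layer` are assumptions.

```python
import math


def output_layer(activations, weights, bias):
    """Fully connected unit followed by a sigmoid, so the result lies in (0, 1)."""
    z = sum(a * w for a, w in zip(activations, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical activations from the last intermediate layer and hypothetical weights.
score = output_layer([1.0, 2.0], [0.5, -0.25], 0.0)
print(score)
```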

ステップＳ５２：
ユーザがコンテンツ分類装置１Ｂに対して、所定の入力手段（不図示）により、気にいった画像に対応する深層学習画像モデルを生成する際に、深層ニューラルネットワークモデルを用いて新たな深層学習画像モデルを生成するか、あるいは他の気にいった画像の深層学習画像モデルを用いた転移学習により新たな深層学習画像モデルを生成するかの制御を行う。例えば、ユーザは、学習用画像が多量に用意できる気にいった画像に対する深層学習画像モデルを生成する場合、深層ニューラルネットワークモデルを機械学習により学習させ生成する制御を行う。一方、ユーザは、学習用画像が多量に用意できない気にいった画像に対する深層学習画像モデルを生成する場合、すでに学習により求められた他の気にいった画像に対応する深層学習画像モデルを転移学習させることにより、深層学習画像モデルを生成する制御を行う。 Step S52:
Through a predetermined input means (not shown) of the content classification device 1B, the user controls whether a new deep learning image model corresponding to a favorite image is generated by training a deep neural network model, or by transfer learning using the deep learning image model of another favorite image. For example, when generating a deep learning image model for a favorite image for which a large number of learning images can be prepared, the user selects generation by training a deep neural network model through machine learning. On the other hand, when generating a deep learning image model for a favorite image for which a large number of learning images cannot be prepared, the user selects generation by transfer learning from a deep learning image model that corresponds to another favorite image and has already been obtained by learning.

このとき、コンテンツ雰囲気量算出部１１Ｂは、ユーザが深層ニューラルネットワークモデルから、気にいった画像の深層学習画像モデルを新たに生成する処理を選択した場合、処理をステップＳ５５へ進める。一方、コンテンツ雰囲気量算出部１１Ｂは、ユーザが他の気にいった画像の深層学習画像モデルに対して転移学習を行い、気にいった画像の深層学習画像モデルを生成する処理を選択した場合、処理をステップＳ５３へ進める。 At this time, when the user has selected the process of newly generating the deep learning image model of the favorite image from a deep neural network model, the content atmosphere amount calculation unit 11B advances the process to step S55. On the other hand, when the user has selected the process of generating the deep learning image model of the favorite image by performing transfer learning on the deep learning image model of another favorite image, the content atmosphere amount calculation unit 11B advances the process to step S53.

ステップＳ５３：
コンテンツ雰囲気量算出部１１Ｂは、すでに深層学習済みモデルデータベース２０に記憶されている深層学習画像モデルのなかから、所定の深層学習モデルを生成する。例えば、ユーザが深層学習画像モデルを生成する対象の気にいった画像に対し、似ていないと知覚する他の気にいった画像の深層学習画像モデルを指定し、コンテンツ雰囲気量算出部１１Ｂがこの深層学習画像モデルを転移学習に用いる深層学習画像モデルとして選択する。ここで、コンテンツ雰囲気量算出部１１Ｂは、転移学習に用いる深層学習画像モデルを、深層学習済みモデルデータベース２０から読み出す。 Step S53:
The content atmosphere amount calculation unit 11B selects a predetermined deep learning image model from among the deep learning image models already stored in the deep-learned model database 20. For example, the user specifies the deep learning image model of another favorite image that the user perceives as dissimilar to the favorite image for which a deep learning image model is to be generated, and the content atmosphere amount calculation unit 11B selects this deep learning image model as the one to be used for transfer learning. Here, the content atmosphere amount calculation unit 11B reads out the deep learning image model used for the transfer learning from the deep-learned model database 20.

ステップＳ５４：
コンテンツ雰囲気量算出部１１Ｂは、転移学習に用いるため、読み出した深層学習画像モデルから、入力層からユーザが指定あるいは予め指定されている中間層（適合層）までを、転移学習モデルとして抽出する。
そして、コンテンツ雰囲気量算出部１１Ｂは、深層ニューラルネットワークモデルから、上記適合層以降の中間層を抽出し、上記転移学習モデルの適合層に接続し、かつ出力層を接続することにより、転移学習に用いる転移深層学習画像モデルを生成する。 Step S54:
For use in transfer learning, the content atmosphere amount calculation unit 11B extracts, from the read deep learning image model, the layers from the input layer up to an intermediate layer (adaptation layer) that is designated by the user or designated in advance, as a transfer learning model.
Then, the content atmosphere amount calculation unit 11B extracts the intermediate layers after the adaptation layer from the deep neural network model, connects them to the adaptation layer of the transfer learning model, and connects an output layer, thereby generating the transfer deep learning image model used for transfer learning.
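The assembly in step S54 can be pictured as splicing two layer lists at the adaptation layer. The sketch below is only a schematic: layers are represented by placeholder strings, and the function name `build_transfer_model` and the index-based designation of the adaptation layer are assumptions.

```python
def build_transfer_model(trained_layers, base_layers, adaptation_index, output_layer):
    """Assemble a model for transfer learning from two layer lists.

    trained_layers: layers of an already trained model, input layer first.
    base_layers: layers of the untrained base network, in the same order.
    adaptation_index: position of the adaptation layer in both lists.
    output_layer: the output layer to connect at the end.
    """
    reused = trained_layers[:adaptation_index + 1]  # input layer .. adaptation layer
    fresh = base_layers[adaptation_index + 1:]      # layers after the adaptation layer
    return reused + fresh + [output_layer]


# Placeholder layer names (hypothetical); real layers would carry weights.
trained = ["input", "trained_conv1", "trained_pool1", "trained_conv2"]
base = ["input", "base_conv1", "base_pool1", "base_conv2"]
model = build_transfer_model(trained, base, adaptation_index=2, output_layer="output")
print(model)
```

In this picture, the reused layers carry over what the other favorite image's model has already learned, while the layers after the adaptation layer are trained in step S56.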

ステップＳ５５：
コンテンツ雰囲気量算出部１１Ｂは、深層学習済みモデルデータベース２０に記憶されている深層ニューラルネットワークモデルから、所定の深層ニューラルネットワークモデルを選択して読み出す。 Step S55:
The content atmosphere amount calculation unit 11B selects and reads out a predetermined deep neural network model from among the deep neural network models stored in the deep-learned model database 20.

ステップＳ５６：
コンテンツ雰囲気量算出部１１Ｂは、学習対象モデル（上記転移深層学習画像モデルあるいは上記深層ニューラルネットワークモデル）の入力層に対し、気にいった画像と雰囲気が似ていると知覚する画像を入力した場合、出力層から似ていることを示す「１」に近い数値が出力されるように、また気にいった画像と雰囲気が似ていないと知覚する画像を入力した場合、出力層から似ていないことを示す「０」に近い数値が出力されるように、各ネットワークの層の重みパラメータの最適化処理を行う。すなわち、コンテンツ雰囲気量算出部１１Ｂは、学習対象モデルに対し、クラス分類の機械学習を行い、学習結果として、深層学習画像モデルを生成する。 Step S56:
The content atmosphere amount calculation unit 11B optimizes the weight parameters of each layer of the network of the learning target model (the transfer deep learning image model or the deep neural network model) so that, when an image perceived as having an atmosphere similar to the favorite image is input to the input layer, the output layer outputs a value close to "1", indicating similarity, and when an image perceived as not having an atmosphere similar to the favorite image is input, the output layer outputs a value close to "0", indicating dissimilarity. That is, the content atmosphere amount calculation unit 11B performs machine learning for class classification on the learning target model and generates a deep learning image model as the learning result.
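The optimization in step S56 can be illustrated, in heavily reduced form, by fitting a single sigmoid unit with stochastic gradient descent so that "similar" samples score near 1 and "dissimilar" samples score near 0. The source optimizes the weights of every layer of a deep network; the single unit, the two-element feature vectors, the learning rate, and the names `train` and `predict` are all assumptions made for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Score in (0, 1): near 1 means similar, near 0 means dissimilar."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, labels, epochs=500, learning_rate=0.5):
    """Fit the weights by stochastic gradient descent on the cross-entropy loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y  # gradient w.r.t. the pre-activation
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical feature vectors: label 1 = similar atmosphere, 0 = dissimilar.
samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [1, 1, 0, 0]
weights, bias = train(samples, labels)
print(predict(weights, bias, [1.0, 0.0]), predict(weights, bias, [0.0, 1.0]))
```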

このとき、コンテンツ雰囲気量算出部１１Ｂは、作成した学習対象モデルに対し、学習用画像とは異なる気にいった画像と雰囲気が似ていると知覚する画像と、雰囲気が似ていないと知覚する画像とを入力し、学習させた学習対象モデルに対する学習テストを行う。そして、コンテンツ雰囲気量算出部１１Ｂは、雰囲気が似ていると知覚する画像を学習対象モデルに入力した際、出力層の出力する数値が予め設定した第１閾値以上となり、かつ雰囲気が似ていないと知覚する画像を学習対象モデルに入力した際、出力層の出力する数値が予め設定した第２閾値以下となった場合、この学習対象モデルを、気にいった画像に対する深層学習画像モデルとする。一方、コンテンツ雰囲気量算出部１１Ｂは、上記学習テストにおいて、雰囲気が似ていると知覚する画像に対して学習対象モデルの出力層の出力する数値が予め設定した第１閾値未満、あるいは雰囲気が似ていないと知覚する画像に対して、学習対象モデルの出力層の出力する数値が予め設定した第２閾値以上である場合、学習対象モデルの再学習を行う。 At this time, the content atmosphere amount calculation unit 11B performs a learning test on the trained learning target model by inputting images that differ from the learning images, namely images perceived as having an atmosphere similar to the favorite image and images perceived as not having a similar atmosphere. Then, if the numerical value output by the output layer is equal to or higher than a preset first threshold value when an image perceived as having a similar atmosphere is input to the learning target model, and the numerical value output by the output layer is equal to or less than a preset second threshold value when an image perceived as not having a similar atmosphere is input to the learning target model, the content atmosphere amount calculation unit 11B adopts this learning target model as the deep learning image model for the favorite image. On the other hand, if, in the above learning test, the numerical value output by the output layer of the learning target model is less than the preset first threshold value for an image perceived as having a similar atmosphere, or is equal to or higher than the preset second threshold value for an image perceived as not having a similar atmosphere, the content atmosphere amount calculation unit 11B retrains the learning target model.
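The acceptance rule of the learning test can be written down compactly. The sketch below assumes that every test image must satisfy its threshold (the source does not say how results over several test images are combined) and treats a dissimilar-image score exactly equal to the second threshold as passing; the function name and the sample scores are hypothetical.

```python
def passes_learning_test(similar_scores, dissimilar_scores,
                         first_threshold, second_threshold):
    """Return True if the model is adopted, False if it should be retrained.

    similar_scores: output-layer values for test images perceived as similar.
    dissimilar_scores: output-layer values for test images perceived as dissimilar.
    """
    similar_ok = all(s >= first_threshold for s in similar_scores)
    dissimilar_ok = all(s <= second_threshold for s in dissimilar_scores)
    return similar_ok and dissimilar_ok


# Hypothetical scores and thresholds.
adopted = passes_learning_test([0.90, 0.95], [0.10, 0.05], 0.8, 0.2)
retrain = not passes_learning_test([0.90, 0.70], [0.10], 0.8, 0.2)
print(adopted, retrain)
```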

ステップＳ５７：
コンテンツ雰囲気量算出部１１Ｂは、生成した深層学習画像モデルから、多層構造の中間層におけるプーリング層及び畳み込み層の出力パラメータ、活性化関数の種類と出力されるパラメータなどの各々を、深層学習統計量（あるいは深層学習特徴量）それぞれとして抽出する。 Step S57:
From the generated deep learning image model, the content atmosphere amount calculation unit 11B extracts the output parameters of the pooling layers and convolution layers in the intermediate layers of the multi-layer structure, the types of activation functions and their output parameters, and the like, each as a deep learning statistic (or deep learning feature amount).

ステップＳ５８：
コンテンツ雰囲気量算出部１１Ｂは、生成した深層学習画像モデルと、抽出した深層学習統計量（深層学習特徴量）とを深層学習済みモデルデータベース２０の深層学習統計量モデルテーブルに対して書き込んで記憶させる（登録処理）。
上記処理Ａ２を気にいった画像毎に行い、それぞれに対応する深層学習画像モデルを生成する。 Step S58:
The content atmosphere amount calculation unit 11B writes and stores the generated deep learning image model and the extracted deep learning statistics (deep learning feature amounts) in the deep learning statistic model table of the deep-learned model database 20 (registration process).
The above process A2 is performed for each favorite image, and a deep learning image model corresponding to each favorite image is generated.

図１３は、本実施形態によるコンテンツ分類装置１Ｂが行なう、元画像と同様の雰囲気及びテイストを有する分類対象画像を分類する処理の動作例を示すフローチャートである。以下に示すフローチャートにおいては、すでに、コンテンツ統計量（あるいはコンテンツ特徴量）を求めるための演算式として深層学習画像モデルが深層学習済みモデルデータベース２０に書き込まれて記憶されていることを前提に説明する。
なお、図１３のステップＳ６４～Ｓ６８の各々に示す処理については、図４におけるステップＳ１６～Ｓ２０に示す処理と同様であるため、その説明を省略する。 FIG. 13 is a flowchart showing an operation example of the process, performed by the content classification device 1B according to the present embodiment, of classifying classification target images that have the same atmosphere and taste as the original image. The flowchart below is described on the assumption that a deep learning image model has already been written and stored in the deep-learned model database 20 as the arithmetic expression for obtaining the content statistics (or content feature amounts).
Since the processing shown in each of steps S64 to S68 in FIG. 13 is the same as the processing shown in steps S16 to S20 in FIG. 4, the description thereof will be omitted.

ステップＳ６０：
コンテンツ選択部１０は、ユーザの操作により、分類の基準となる元画像（気にいった画像）を、外部装置から入力して、コンテンツ雰囲気量算出部１１Ｂに対して出力する。 Step S60:
In response to the user's operation, the content selection unit 10 receives the original image (favorite image) serving as the classification reference from an external device and outputs it to the content atmosphere amount calculation unit 11B.

ステップＳ６１：
コンテンツ選択部１０は、コンテンツデータベース１５から、分類対象画像群を抽出する。そして、コンテンツ選択部１０は、抽出した対象選択画像群のうちの一つの画像を分類対象画像としてコンテンツ雰囲気量算出部１１Ｂに対して出力する。なお、コンテンツ選択部１０は、外部メモリ、または通信ネットワークを介してユーザの操作により入力された画像群を、分類対象画像群として選択してもよい。 Step S61:
The content selection unit 10 extracts a classification target image group from the content database 15. Then, the content selection unit 10 outputs one image of the extracted classification target image group to the content atmosphere amount calculation unit 11B as a classification target image. The content selection unit 10 may instead select, as the classification target image group, an image group input by the user's operation from an external memory or via a communication network.

ステップＳ６２：
コンテンツ雰囲気量算出部１１Ｂは、元画像である気にいった画像に対応する深層学習画像モデルを、深層学習済みモデルデータベース２０に記憶されている深層学習画像モデルのなかから選択する。 Step S62:
The content atmosphere amount calculation unit 11B selects the deep learning image model corresponding to the favorite image, which is the original image, from among the deep learning image models stored in the deep-learned model database 20.

ステップＳ６３：
コンテンツ雰囲気量算出部１１Ｂは、選択した深層学習画像モデルにより、元画像及び分類対象画像の各々の深層学習統計量（すなわち、コンテンツ統計量）を算出する。
すなわち、コンテンツ雰囲気量算出部１１Ｂは、深層学習画像モデルの入力層に対して、元画像における各ピクセルの画素値を入力する。そして、コンテンツ雰囲気量算出部１１Ｂは、深層学習画像モデルの多層構造の中間層におけるプーリング層及び畳み込み層の出力パラメータ、活性化関数の種類と出力されるパラメータなどの各々を、元画像の深層学習統計量として抽出する。 Step S63:
The content atmosphere amount calculation unit 11B calculates the deep learning statistics (that is, the content statistics) of each of the original image and the classification target image using the selected deep learning image model.
That is, the content atmosphere amount calculation unit 11B inputs the pixel value of each pixel in the original image to the input layer of the deep learning image model. Then, the content atmosphere amount calculation unit 11B extracts the output parameters of the pooling layers and convolution layers in the intermediate layers of the multi-layer structure of the deep learning image model, the types of activation functions and their output parameters, and the like, as the deep learning statistics of the original image.

また、同様に、コンテンツ雰囲気量算出部１１Ｂは、深層学習画像モデルの入力層に対して、分類対象画像における各ピクセルの画素値を入力する。そして、コンテンツ雰囲気量算出部１１Ｂは、深層学習画像モデルの多層構造の中間層におけるプーリング層及び畳み込み層の出力パラメータ、活性化関数の種類と出力されるパラメータなどの各々を、分類対象画像の深層学習統計量として抽出する。 Similarly, the content atmosphere amount calculation unit 11B inputs the pixel value of each pixel in the classification target image to the input layer of the deep learning image model. Then, the content atmosphere amount calculation unit 11B extracts the output parameters of the pooling layers and convolution layers in the intermediate layers of the multi-layer structure of the deep learning image model, the types of activation functions and their output parameters, and the like, as the deep learning statistics of the classification target image.

第３の実施形態によれば、鑑賞対象の元画像（元コンテンツ）に対して人間が抱く雰囲気やテイストなどの感覚に対応させ、人間の視覚（知覚）系の神経機構の処理過程を、深層ニューラルネットワークモデルを深層学習させて生成した深層学習画像モデルとして近似し、その出力から人間の視覚（知覚）系の神経機構の処理過程モデルを反映したコンテンツ雰囲気量を数値化し、コンテンツ雰囲気量空間における座標値として示し、元画像と分類対象画像（対象コンテンツ）との座標値間の座標距離を判定することにより、人間が抱く雰囲気やテイストなどの感覚（人間の脳の知覚処理）を、従来に比較してより正確に再現することができ、鑑賞対象の元画像と同様な感覚を有する他のコンテンツ（対象コンテンツ）を生成することが可能となる。
また、本実施形態においては、学習用画像の画像データを入力して、クラス分類した結果を得る深層ニューラルネットワークモデルを用いて深層学習画像モデルを生成し、この深層学習モデルの中間層をコンテンツ統計量モデルとしているため、より脳における神経機構の特徴抽出に対応したコンテンツ統計量を得ることができ、鑑賞対象の元画像に対して、第１の実施形態に比較してより近い感覚を有する他のコンテンツ（対象コンテンツ）を分類することが可能となる。 According to the third embodiment, the processing of the neural mechanisms of the human visual (perceptual) system is approximated, in correspondence with sensations such as the atmosphere and taste that a person feels toward the original image (original content) to be viewed, by a deep learning image model generated by deep learning of a deep neural network model. From the output of this model, a content atmosphere amount reflecting the processing model of the neural mechanisms of the human visual (perceptual) system is quantified and expressed as coordinate values in a content atmosphere amount space, and the coordinate distance between the coordinate values of the original image and those of a classification target image (target content) is determined. As a result, sensations such as the atmosphere and taste that a person feels (the perceptual processing of the human brain) can be reproduced more accurately than before, and other content (target content) that gives the same sensation as the original image to be viewed can be generated.
Further, in the present embodiment, a deep learning image model is generated using a deep neural network model that receives the image data of learning images and outputs a class classification result, and the intermediate layers of this deep learning image model are used as the content statistic model. Content statistics that correspond more closely to the feature extraction of the neural mechanisms in the brain can therefore be obtained, and other content (target content) that gives a sensation closer to that of the original image to be viewed than in the first embodiment can be classified.
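The coordinate-distance determination mentioned above, combined with the threshold-based and ascending-order classification recited in the claims, might look like the following sketch. Euclidean distance is used only as one of the distances named in the claims; the function names, the coordinate vectors, and the threshold value are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Distance between two coordinate values in the content atmosphere amount space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(original, targets, threshold):
    """Return the names of targets within the threshold, nearest first."""
    scored = [(euclidean_distance(original, vector), name)
              for name, vector in targets.items()]
    return [name for distance, name in sorted(scored) if distance <= threshold]


# Hypothetical content atmosphere amounts for an original image and three targets.
original = [0.0, 0.0]
targets = {"a": [3.0, 4.0], "b": [1.0, 0.0], "c": [0.0, 0.5]}
similar = classify(original, targets, threshold=2.0)
print(similar)
```

Target "a" lies at distance 5 and is excluded by the threshold, while "c" (distance 0.5) is listed ahead of "b" (distance 1).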

なお、本発明におけるコンテンツ分類装置１（１Ａ、１Ｂ）の全部または一部の機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませて実行することにより処理を行なってもよい。なお、ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。
また、「コンピュータシステム」は、ホームページ提供環境（あるいは表示環境）を備えたＷＷＷシステムも含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムが送信された場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリ（ＲＡＭ）のように、一定時間プログラムを保持しているものも含むものとする。 A program for realizing all or part of the functions of the content classification device 1 (1A, 1B) of the present invention may be recorded on a computer-readable recording medium, and the processing may be performed by causing a computer system to read and execute the program recorded on this recording medium. The term "computer system" as used herein includes an OS and hardware such as peripheral devices.
Further, the "computer system" also includes a WWW system provided with a homepage providing environment (or display environment). The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. Furthermore, the "computer-readable recording medium" also includes a medium that holds the program for a certain period of time, such as a volatile memory (RAM) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

また、上記プログラムは、このプログラムを記憶装置等に格納したコンピュータシステムから、伝送媒体を介して、あるいは、伝送媒体中の伝送波により他のコンピュータシステムに伝送されてもよい。ここで、プログラムを伝送する「伝送媒体」は、インターネット等のネットワーク（通信網）や電話回線等の通信回線（通信線）のように情報を伝送する機能を有する媒体のことをいう。また、上記プログラムは、前述した機能の一部を実現するためのものであってもよい。さらに、前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるもの、いわゆる差分ファイル（差分プログラム）であってもよい。 Further, the program may be transmitted from a computer system in which this program is stored in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. Here, the "transmission medium" for transmitting a program refers to a medium having a function of transmitting information, such as a network (communication network) such as the Internet or a communication line (communication line) such as a telephone line. Further, the above program may be for realizing a part of the above-mentioned functions. Further, a so-called difference file (difference program) may be used, which can realize the above-mentioned function in combination with a program already recorded in the computer system.

本発明のいくつかの実施形態を説明したが、これらの実施形態は、例として提示したものであり、発明の範囲を限定することは意図していない。これら実施形態は、その他の様々な形態で実施されることが可能であり、発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更を行うことができる。これら実施形態やその変形は、発明の範囲や要旨に含まれると同様に、特許請求の範囲に記載された発明とその均等の範囲に含まれるものである。 Although some embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These embodiments can be implemented in various other embodiments, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and variations thereof are included in the scope of the invention described in the claims and the equivalent scope thereof, as are included in the scope and gist of the invention.

１、１Ａ、１Ｂ…コンテンツ分類装置
１０…コンテンツ選択部
１１、１１Ａ、１１Ｂ…コンテンツ雰囲気量算出部
１１０、１１０Ａ…コンテンツ特徴量算出部
１１１、１１１Ａ…コンテンツ統計量算出部
１２…座標距離算出部
１３…コンテンツ分類部
１４…コンテンツ出力部
１５…コンテンツデータベース
１６…コンテンツ特徴量データベース
１７…コンテンツ統計量モデルデータベース
１８…分類コンテンツ記憶部
１９…学習済みモデルデータベース
２０…深層学習済みモデルデータベース
1, 1A, 1B ... Content classification device
10 ... Content selection unit
11, 11A, 11B ... Content atmosphere amount calculation unit
110, 110A ... Content feature amount calculation unit
111, 111A ... Content statistic calculation unit
12 ... Coordinate distance calculation unit
13 ... Content classification unit
14 ... Content output unit
15 ... Content database
16 ... Content feature amount database
17 ... Content statistic model database
18 ... Classified content storage unit
19 ... Trained model database
20 ... Deep-learned model database

Claims

A content classification device that classifies, from a target content group consisting of images to be classified, target content having the same atmosphere and taste as original content, which is an image serving as the classification reference, the device comprising:
a content atmosphere amount calculation unit that calculates a content atmosphere amount for each of the original content and the target content, using, as the content atmosphere amount that quantifies the atmosphere and taste perceived by a person for an image, a relationship amount indicating the relationship between two regions for each combination of two mutually different regions that can be taken among the regions determined when a characteristic, a scale, and a division according to the characteristic are specified in the image;
a distance calculation unit that calculates the distance between the coordinate value of the content atmosphere amount of the original content and the coordinate value of the content atmosphere amount of the target content; and
a content classification unit that classifies the target content based on the distance.

The content classification device according to claim 1, wherein the content atmosphere amount calculation unit has a plurality of mutually different characteristics as the characteristic, calculates the relationship amount for each combination of two mutually different regions that can be taken among the regions determined when the characteristics, the scale, and the divisions according to the plurality of mutually different characteristics are specified, and calculates, as the content atmosphere amount, the relationship amount corresponding to each combination of the plurality of mutually different characteristics.

The content classification device according to claim 2, wherein the content atmosphere amount calculation unit calculates at least one of a simple correlation, a shift correlation, and a cross-correlation as the relationship between the two regions.

The content classification device according to claim 2 or claim 3, further comprising:
a content feature amount storage unit that stores a plurality of types of arithmetic expressions corresponding to the content feature amounts of the original content and the target content; and
a content statistic storage unit that stores a plurality of types of arithmetic expressions corresponding to content statistics, which are statistics of the content feature amounts,
wherein the content atmosphere amount calculation unit has:
a content feature amount calculation unit that obtains the content feature amount from each of the original content and the target content by the arithmetic expression corresponding to the content feature amount; and
a content statistic calculation unit that obtains, using the content feature amount calculated by the content feature amount calculation unit, the content statistic for each of the original content and the target content by the arithmetic expression corresponding to the content statistic, and
wherein the content atmosphere amount calculation unit calculates the content atmosphere amount for each of the original content and the target content by an arithmetic expression corresponding to the relationship, using the content statistic calculated by the content statistic calculation unit.

The content classification device according to claim 4, wherein the content feature amount is an image feature amount obtained from the pixel information of the image and includes at least the shading, brightness, chromaticity, contrast, gradient, edge, and optical flow of the image.

The content classification device according to claim 5, wherein the content statistic is a statistic obtained by applying a certain statistical operation to the content feature amount obtained in a predetermined region of a predetermined physical characteristic in the image, the physical characteristic includes at least a density characteristic, a gradation characteristic, a color characteristic, a spatial frequency characteristic, and a resolution characteristic, the region includes at least a spatial frequency division, a direction division, and a color division, and the statistical operation includes at least an average, a variance, a histogram, and a power spectrum.

The content classification device according to claim 5 or 6, wherein the distance is any one of a Euclidean distance, a cosine distance, a Hamming distance, a Manhattan distance, and a Kullback-Leibler distance.

The content classification device according to any one of claims 1 to 7, wherein, when the distance is equal to or less than a preset threshold value, the content classification unit classifies the target content corresponding to the distance as an image having the same atmosphere and taste as the original content.

The content classification device according to any one of claims 1 to 7, wherein the content classification unit classifies the target content corresponding to the distance as images having the same atmosphere and taste as the original content in ascending order of the distance.

The content classification device according to claim 4, wherein the content feature amount calculation unit obtains the content feature amount as the output of an intermediate layer of an autoencoder trained with a plurality of learning images as both input and output, and
the content statistic calculation unit obtains the content statistic as the output of an intermediate layer of an autoencoder trained with the content feature amount as both input and output.

The content classification device according to claim 1, wherein the content atmosphere amount calculation unit obtains the content atmosphere amount by a deep learning image model obtained by training a neural network having multi-stage intermediate layers for class classification through deep learning using the pixel values of the pixels of a plurality of learning images, or by a deep learning image model trained for class classification through transfer learning using the intermediate layers of another deep learning image model.

A content classification method for classifying, from a target content group consisting of images to be classified, target content having the same atmosphere and taste as original content, which is an image serving as the classification reference, the method comprising:
a content atmosphere amount calculation step in which a content atmosphere amount calculation unit calculates a content atmosphere amount for each of the original content and the target content, using, as the content atmosphere amount that quantifies the atmosphere and taste perceived by a person for an image, a relationship amount indicating the relationship between two regions for each combination of two mutually different regions that can be taken among the regions determined when a characteristic, a scale, and a division according to the characteristic are specified in the image;
a distance calculation step in which a distance calculation unit calculates the distance between the coordinate value of the content atmosphere amount of the original content calculated by the content atmosphere amount calculation unit and the coordinate value of the content atmosphere amount of the target content; and
a content classification step in which a content classification unit classifies the target content based on the distance.

A program for causing a computer to function as the content classification device according to any one of claims 1 to 11.