JP2024083601A

JP2024083601A - Detection support device, learning device, detection support method, learning method, and program

Info

Publication number: JP2024083601A
Application number: JP2024066632A
Authority: JP
Inventors: 裕一小林
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-06-18
Filing date: 2024-04-17
Publication date: 2024-06-21
Also published as: JP7476487B2; JP2020204941A

Abstract

【課題】コンテンツ画像に対する、人間の視覚による処理プロセスを、プロセッサ上の処理として実行する。【解決手段】所定の柄を繰り返し配置することにより生成されるコンテンツ画像に生じる、前記コンテンツ画像における柄の連続性による想定外のパターンの有無の検出を支援する検出支援装置であって、前記コンテンツ画像を取得するコンテンツ画像取得部と、視線特徴学習モデルを用いて前記コンテンツ画像から前記パターンを検出する際の視線の特徴である視線特徴を推定し、推定した前記視線特徴を前記コンテンツ画像における画素ごとに示す視線特徴画像を生成する視線特徴画像生成部と、前記視線特徴画像を表示する解析結果出力部と、を備え、前記視線特徴学習モデルは、学習用の前記コンテンツ画像と、当該学習用の前記コンテンツ画像から前記パターンを検出した人間の前記視線特徴との対応関係を学習した学習済モデルである。【選択図】図６[Problem] A processing process based on human vision for a content image is executed as processing on a processor. [Solution] A detection support device that supports the detection of the presence or absence of an unexpected pattern due to the continuity of a pattern in a content image generated by repeatedly arranging a predetermined pattern includes a content image acquisition unit that acquires the content image, a gaze feature image generation unit that uses a gaze feature learning model to estimate gaze features that are characteristics of the gaze when detecting the pattern from the content image and generates a gaze feature image showing the estimated gaze features for each pixel in the content image, and an analysis result output unit that displays the gaze feature image, and the gaze feature learning model is a trained model that has learned the correspondence between the content image for learning and the gaze features of a person who detected the pattern from the content image for learning. [Selected Figure] Figure 6

Description

本発明は、人間がコンテンツ画像の不具合を検出し易くなるように支援する検出支援装置、検出支援方法、及びプログラムに関する。 The present invention relates to a detection support device, a detection support method, and a program that help people to more easily detect defects in content images.

建装材の分野においては、古くから、意匠性が重要な付加価値とされており、たとえば木目や抽象柄等の意匠が施された化粧シートが、建築の内外装および家具、調度品等に接着して使用されている。このような化粧シートの意匠には、所定の柄を単位として、その柄を繰り返し配置することにより、所定の柄を同調させたものがある。 In the field of building materials, design has long been considered an important added value, and decorative sheets with designs such as wood grain or abstract patterns are used by adhering them to the interior and exterior of buildings, furniture, and other furnishings. Some designs for such decorative sheets are created by repeating a certain pattern as a unit, resulting in a certain pattern being synchronized.

このような所定の柄を同調させた意匠（以下、コンテンツ画像、或いは視覚コンテンツなどと称する）においては、柄の連続性が想定外のパターンや影を作り出してしまい、意匠性が損なわれてしまう不具合が発生することがある。このような不具合は、単体の柄を設計する段階では検出することができず、単体の柄を繰り返し配置した画像が作成され、その画像を、ある距離だけ離れた位置から観察して初めて検出されることが多い。これは、画像を観察した人物が、柄が繰り返されたコンテンツ画像上に、なんらかの空間的な規則性（パターン）を、視覚的に感知するためと考えられる。 In designs where such predetermined patterns are synchronized (hereafter referred to as content images or visual content), the continuity of the patterns can create unexpected patterns or shadows, resulting in defects that impair the design. Such defects cannot be detected at the stage of designing a single pattern, and are often only detected when an image is created in which a single pattern is repeatedly arranged, and this image is observed from a certain distance away. This is thought to be because a person observing an image visually senses some kind of spatial regularity (pattern) in the content image where the pattern is repeated.

一般に、訓練をした人間（熟練者）と訓練をしていない人間（非熟練者）とでは、同じ意匠のコンテンツ画像に対して検出することができる視覚的な特徴に差異が生じる。これは、人間が検出することができる視覚的な特徴が、コンテンツ画像の物理的な特性だけでなく、観察する人間の視覚の特性が大きく影響するためと考えられる。 In general, there are differences in the visual features that trained humans (experts) and untrained humans (non-experts) can detect in content images of the same design. This is thought to be because the visual features that humans can detect are greatly influenced not only by the physical characteristics of the content image, but also by the visual characteristics of the human observer.

つまり、熟練者は、このようなコンテンツ画像の外観上の不具合を検出することが可能であるが、非熟練者は、係る不具合を検出できないことが少なくない。これは、熟練者が、訓練によってコンテンツ画像に対する不具合の検出方法を習得したためと考えられる。つまり、熟練者は、視覚情報処理過程において、コンテンツ画像に対する特有の見方や、特有の処理方法を確立していると考えられる。このような特有の見方を定量化することができれば、非熟練者であっても、負担の大きい訓練を経ずに、このようなコンテンツ画像の不具合を検出できるようになると考えられる。 In other words, while an expert is able to detect defects in the appearance of such content images, an unskilled person is often unable to detect such defects. This is thought to be because an expert has learned how to detect defects in content images through training. In other words, an expert is thought to have established a unique way of looking at content images and a unique method of processing them during the visual information processing process. If such unique ways of looking at content images could be quantified, even an unskilled person would be able to detect defects in such content images without undergoing burdensome training.

Non-Patent Document 1: L. Itti, C. Koch, E. Niebur: "A model of saliency-based visual attention for rapid scene analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 20, Issue 11, pp. 1254-1259, November 1998.
Non-Patent Document 2: D. Gao, V. Mahadevan, N. Vasconcelos: "On the plausibility of the discriminant center-surround hypothesis for visual saliency", Journal of Vision, Vol. 8, 13, June 2008.
Non-Patent Document 3: J. Harel, C. Koch, P. Perona: "Graph-Based Visual Saliency", Advances in Neural Information Processing Systems, 19: 545-552, January 2006.

人間の視覚の基本的な処理方法を真似た処理を、プロセッサ上に再現することでコンテンツ画像に対する人間の見方を、装置が行う処理として、ある程度再現することができる。例えば、人間の視覚は、眼から光の情報を入力して網膜に二次元状の明るさを示す情報と、色を示す情報とを抽出する。そして、人間の視覚は、抽出した情報を脳の視覚野に送信する。脳の視覚野においては、視覚から得た明るさ等の情報に基づいて、明るさの強度、空間的な不連続性（エッジ）、連続性（勾配）、色情報の色度表現（赤、緑、青の三原色表現や、赤－緑／黄－青などの反対色表現）などが、個別に処理される。 By reproducing on a processor a process that mimics the basic processing method of human vision, it is possible to reproduce to a certain extent the way humans see content images as processing performed by a device. For example, human vision receives light information from the eye and extracts two-dimensional information indicating brightness and color on the retina. Human vision then transmits the extracted information to the visual cortex of the brain. In the visual cortex of the brain, the intensity of brightness, spatial discontinuity (edges), continuity (gradients), chromaticity representation of color information (representation of the three primary colors red, green, and blue, and opponent color representations such as red-green/yellow-blue), and other factors are processed individually based on the brightness and other information obtained from vision.

さらに、脳の視覚野においては、処理したそれらの空間的な対比（中心部と周辺部間）や、方向の連続性／不連続性などが処理され、さらにはそれらの組み合わせが処理されて、…、というように、処理結果を用いて更に処理を繰返すことで、段階的に、より高次で複雑なパターンが処理される。これらの各処理を逐次プロセッサ上に実現できれば、人間の視覚と同じ種類の情報が処理できるとともに、ある回路（処理）は強く、別のある回路は弱く作用するように制御することが可能になる。 Furthermore, in the visual cortex of the brain, the spatial contrast (between the center and periphery) and directional continuity/discontinuity of the processed information are processed, and then combinations of these are processed, and so on. By repeating further processing using the results of the processing, increasingly higher-level and more complex patterns are processed in stages. If each of these processes could be realized sequentially on a processor, it would be possible to process the same type of information as human vision, and it would be possible to control certain circuits (processing) to act strongly and others to act weakly.
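The center-versus-periphery contrast mentioned above can be illustrated with a minimal sketch. It assumes a feature map stored as a list of rows of numbers and uses square windows with hypothetical radii; the function names and the window shapes are illustrative choices, not something the specification prescribes. For simplicity the surround window also contains the center pixels.

```python
def local_mean(image, y, x, radius):
    """Mean of the values in a square window around (y, x), clipped at the border."""
    height, width = len(image), len(image[0])
    values = [
        image[j][i]
        for j in range(max(0, y - radius), min(height, y + radius + 1))
        for i in range(max(0, x - radius), min(width, x + radius + 1))
    ]
    return sum(values) / len(values)


def center_surround(image, center_radius=0, surround_radius=2):
    """Absolute difference between a small center mean and a larger surround mean.

    The radii are assumed defaults chosen only for this illustration.
    """
    return [
        [
            abs(local_mean(image, y, x, center_radius)
                - local_mean(image, y, x, surround_radius))
            for x in range(len(image[0]))
        ]
        for y in range(len(image))
    ]


# A flat map has no center-surround contrast; a single bright spot does.
flat = [[5] * 4 for _ in range(4)]
spot = [[0] * 5 for _ in range(5)]
spot[2][2] = 10
flat_contrast = center_surround(flat)
spot_contrast = center_surround(spot)
```

Because the output is again a map of the same size, the function can be applied to its own result, which loosely mirrors the staged, repeated processing described in the text.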

一方、例えば、目利きに長けた経験豊かな人物（熟練者）が、ある対象物を見て、その不具合に気付くプロセスに着目して、その観察のプロセスを真似て、プロセッサ上に再現することを考える。すなわち、経験知などと呼はれるような一定の訓練を経て人が獲得していく知識や感覚を、プロセッサ上に再現できれば、非熟練者であっても、プロセッサによる処理結果を用いて熟練者と同様の処理を実現することが可能になる。 On the other hand, for example, one could focus on the process by which an experienced person with a good eye (an expert) looks at an object and notices a defect, and imagine replicating that observation process on a processor. In other words, if the knowledge and intuition that people acquire through a certain amount of training, known as experiential knowledge, could be reproduced on a processor, even an unskilled person would be able to use the results of the processor's processing to achieve the same results as an expert.

本発明は、このような事情に鑑みてなされたもので、その目的は、コンテンツ画像に対する、人間の視覚による処理プロセスを、プロセッサ上の処理として実行することができる検出支援装置、学習装置、検出支援方法、学習方法、及びプログラムを提供することである。 The present invention has been made in consideration of the above circumstances, and its purpose is to provide a detection support device, a learning device, a detection support method, a learning method, and a program that can execute the human visual processing process for content images as processing on a processor.

上述した課題を解決するために、本発明の一態様である検出支援装置は、所定の柄を繰り返し配置することにより生成されるコンテンツ画像に生じる、前記コンテンツ画像における柄の連続性による想定外のパターンの有無の検出を支援する検出支援装置であって、前記コンテンツ画像を取得するコンテンツ画像取得部と、視線特徴学習モデルを用いて前記コンテンツ画像から前記パターンを検出する際の視線の特徴である視線特徴を推定し、推定した前記視線特徴を前記コンテンツ画像における画素ごとに示す視線特徴画像を生成する視線特徴画像生成部と、前記視線特徴画像を表示する解析結果出力部と、を備え、前記視線特徴学習モデルは、学習用の前記コンテンツ画像と、当該学習用の前記コンテンツ画像から前記パターンを検出した人間の前記視線特徴との対応関係を学習した学習済モデルである。 In order to solve the above-mentioned problems, a detection support device according to one aspect of the present invention is a detection support device that supports the detection of the presence or absence of unexpected patterns that arise in a content image generated by repeatedly arranging a predetermined pattern, due to the continuity of the pattern in the content image, and includes a content image acquisition unit that acquires the content image, a gaze feature image generation unit that uses a gaze feature learning model to estimate gaze features that are characteristics of the gaze when detecting the pattern from the content image and generates a gaze feature image showing the estimated gaze features for each pixel in the content image, and an analysis result output unit that displays the gaze feature image, and the gaze feature learning model is a trained model that has learned the correspondence between the content image for learning and the gaze features of a person who detected the pattern from the content image for learning.

上述した課題を解決するために、本発明の一態様である学習装置は、所定の柄を繰り返し配置することにより生成されるコンテンツ画像に生じる、前記コンテンツ画像における柄の連続性による想定外のパターンの有無の検出を支援する検出支援装置が用いる視線特徴学習モデルを生成する学習装置であって、前記視線特徴学習モデルは、学習用の前記コンテンツ画像と、当該学習用の前記コンテンツ画像から前記パターンを検出した人間の視線の特徴である視線特徴との対応関係を学習した学習済モデルであり、前記視線特徴学習モデルを生成する手法として、深層学習、または、既存の学習モデルを用いた転移学習のいずれかを選択可能に構成され、選択された手法を用いて前記視線特徴学習モデルを生成する深層学習部、を備える。 In order to solve the above-mentioned problems, a learning device according to one aspect of the present invention is a learning device that generates a gaze feature learning model used by a detection support device that supports detection of the presence or absence of unexpected patterns due to the continuity of patterns in a content image generated by repeatedly arranging a predetermined pattern, the gaze feature learning model being a trained model that has learned the correspondence between the content image for learning and gaze features that are characteristics of the gaze of a person who detected the pattern from the content image for learning, and is configured to be able to select either deep learning or transfer learning using an existing learning model as a method for generating the gaze feature learning model, and includes a deep learning unit that generates the gaze feature learning model using the selected method.

上述した課題を解決するために、本発明の一態様である検出支援方法は、所定の柄を繰り返し配置することにより生成されるコンテンツ画像に生じる、前記コンテンツ画像における柄の連続性による想定外のパターンの有無の検出を支援する検出支援装置であるコンピュータが行う検出支援方法であって、コンテンツ画像取得部が、前記コンテンツ画像を取得し、視線特徴画像生成部が、視線特徴学習モデルを用いて前記コンテンツ画像から前記パターンを検出する際の視線の特徴である視線特徴を推定し、推定した前記視線特徴を前記コンテンツ画像における画素ごとに示す視線特徴画像を生成し、解析結果出力部が、前記視線特徴画像を表示し、前記視線特徴学習モデルは、学習用の前記コンテンツ画像と、当該学習用の前記コンテンツ画像から前記パターンを検出した人間の前記視線特徴との対応関係を学習した学習済モデルである。 In order to solve the above-mentioned problems, a detection support method that is one aspect of the present invention is a detection support method performed by a computer that is a detection support device that supports the detection of the presence or absence of unexpected patterns that arise in a content image generated by repeatedly arranging a predetermined pattern due to the continuity of the pattern in the content image, in which a content image acquisition unit acquires the content image, a gaze feature image generation unit estimates gaze features that are characteristics of the gaze when detecting the pattern from the content image using a gaze feature learning model, generates a gaze feature image that shows the estimated gaze features for each pixel in the content image, an analysis result output unit displays the gaze feature image, and the gaze feature learning model is a learned model that has learned the correspondence between the content image for learning and the gaze features of a person who detected the pattern from the content image for learning.

上述した課題を解決するために、本発明の一態様である学習方法は、所定の柄を繰り返し配置することにより生成されるコンテンツ画像に生じる、前記コンテンツ画像における柄の連続性による想定外のパターンの有無の検出を支援する検出支援装置が用いる視線特徴学習モデルを生成する学習装置であるコンピュータが行う学習方法であって、前記視線特徴学習モデルは、学習用の前記コンテンツ画像と、当該学習用の前記コンテンツ画像から前記パターンを検出した人間の視線の特徴である視線特徴との対応関係を学習した学習済モデルであり、深層学習部が、前記視線特徴学習モデルを生成する手法として、深層学習、または、既存の学習モデルを用いた転移学習のいずれかを選択可能に構成され、選択された手法を用いて前記視線特徴学習モデルを生成する。 In order to solve the above-mentioned problems, a learning method that is one aspect of the present invention is a learning method performed by a computer that is a learning device that generates a gaze feature learning model used by a detection support device that supports the detection of the presence or absence of unexpected patterns due to the continuity of patterns in a content image generated by repeatedly arranging a predetermined pattern, the gaze feature learning model is a trained model that has learned the correspondence between the content image for learning and gaze features that are characteristics of the gaze of a person who detected the pattern from the content image for learning, and a deep learning unit is configured to be able to select either deep learning or transfer learning using an existing learning model as a method for generating the gaze feature learning model, and generates the gaze feature learning model using the selected method.

One aspect of the present invention is a program for causing a computer to function as the above-mentioned detection support device.

本発明の一態様であるプログラムは、コンピュータを、上記学習装置として機能させるためのプログラムである。 One aspect of the present invention is a program for causing a computer to function as the above-mentioned learning device.

以上説明したように、本発明によれば、コンテンツ画像に対する、人間の視覚による処理プロセスを、プロセッサ上の処理として実行することができる。 As described above, according to the present invention, the processing of content images based on human vision can be executed as processing on a processor.

A block diagram showing an example of the configuration of the detection support device 100 according to the first embodiment of the present invention.
A flowchart showing the flow of processing performed by the detection support device 100 according to the first embodiment of the present invention.
A diagram showing an example of an image to be processed in the first embodiment of the present invention.
An example of a visual feature image according to the first embodiment of the present invention.
A diagram showing an analysis result of the first embodiment of the present invention.
A diagram showing an analysis result of the first embodiment of the present invention.
A diagram showing an analysis result of the first embodiment of the present invention.
A block diagram showing an example of the configuration of the detection support device 100A according to the second embodiment of the present invention.
A flowchart showing the flow of processing performed by the detection support device 100A according to the second embodiment of the present invention.
A block diagram showing an example of the configuration of the learning device 200 according to the embodiment.
A flowchart showing the flow of processing performed by the learning device 200 according to the embodiment.
A flowchart showing an example of the operation of the processing performed by the learning device 200 according to the present embodiment.
A block diagram showing an example of the configuration of the detection support device 100B according to the third embodiment of the present invention.
A flowchart showing the flow of processing performed by the detection support device 100B according to the third embodiment of the present invention.

以下、実施形態の検出支援装置を、図面を参照して説明する。 The detection support device according to the embodiment will be described below with reference to the drawings.

<First Embodiment>
First, the first embodiment will be described.
The ability of humans to make judgments, or to sense that something is wrong, based on visually obtained information relies on processing by the visual neural mechanisms of the human brain. The processing performed in the relatively early stages of the brain is now becoming understood. The detection support device 100 is therefore considered as a device that performs processing modeled on that process. By having the detection support device 100 execute the processing of the visual neural mechanisms, visual information processing in the human brain can be reproduced more accurately.

本実施形態の検出支援装置１００は、処理の対象として、例えば、基準画像と検査画像とを用いる。基準画像は、熟練者により不具合が検出された画像である。検査画像は、熟練者による加工が施されて当該不具合が解消された画像である。 The detection support device 100 of this embodiment uses, for example, a reference image and an inspection image as processing targets. The reference image is an image in which a defect has been detected by an expert. The inspection image is an image in which the defect has been eliminated through processing by the expert.

By targeting a reference image and an inspection image, the detection support device 100 can capture the difference in how the two images appear to experts and non-experts as a difference in content features, which model the processing of the neural mechanisms of human visual perception. In other words, the detection support device 100 presents the difference in appearance between the two images clearly, indicator by indicator, showing which properties of the inspection image differ from the reference image and by how much, so that even a non-expert can detect defects more easily.
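As a rough illustration of presenting the difference indicator by indicator, the sketch below compares two sets of content features that are assumed to have been computed already. The function name `compare_features`, the indicator names, and the numeric values are all hypothetical and exist only for this example.

```python
def compare_features(reference, inspection):
    """Per-indicator difference between reference and inspection content features.

    Both arguments map an indicator name to a numeric content feature.
    Only indicators present in both images are compared.
    """
    report = {}
    for name in sorted(reference.keys() & inspection.keys()):
        ref, ins = reference[name], inspection[name]
        difference = ins - ref
        # The relative change is undefined when the reference value is zero.
        relative = difference / abs(ref) if ref != 0 else None
        report[name] = {
            "reference": ref,
            "inspection": ins,
            "difference": difference,
            "relative": relative,
        }
    return report


# Hypothetical values: the inspection image is the reference after correction.
reference_features = {"luminance_contrast": 2.0, "red_green_contrast": 0.0}
inspection_features = {"luminance_contrast": 1.5, "red_green_contrast": 0.5}
report = compare_features(reference_features, inspection_features)
```

A report of this shape could then be passed to whatever presentation step is chosen, for example one graph per indicator.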

なお、以下では、処理の対象とする画像（コンテンツ画像）が、静止画像である場合を例に説明するが、これに限定されることはない。コンテンツ画像は、動画像や、映像等であってもよい。 Note that, in the following, an example will be described in which the image to be processed (content image) is a still image, but this is not limited to this. The content image may also be a moving image, video, etc.

図１は、本発明の第１の実施形態の検出支援装置１００の構成例を示すブロック図である。検出支援装置１００は、例えば、コンテンツ画像選択部１０１と、視覚特徴選択部１０２と、視覚特徴画像生成部１０３と、画像特徴選択部１０４と、コンテンツ特徴量算出部１０５と、解析方法選択部１０６と、解析部１０７と、コンテンツ画像ＤＢ（データベース）１０８と、視覚特徴ＤＢ１０９と、視覚特徴画像記憶部１１０と、画像特徴ＤＢ１１１と、コンテンツ特徴量記憶部１１２と、解析方法ＤＢ１１３と、解析結果記憶部１１４と、解析結果出力部１１５とを備える。コンテンツ画像選択部１０１は、「コンテンツ取得部」の一例である。 FIG. 1 is a block diagram showing an example of the configuration of a detection support device 100 according to a first embodiment of the present invention. The detection support device 100 includes, for example, a content image selection unit 101, a visual feature selection unit 102, a visual feature image generation unit 103, an image feature selection unit 104, a content feature amount calculation unit 105, an analysis method selection unit 106, an analysis unit 107, a content image DB (database) 108, a visual feature DB 109, a visual feature image storage unit 110, an image feature DB 111, a content feature amount storage unit 112, an analysis method DB 113, an analysis result storage unit 114, and an analysis result output unit 115. The content image selection unit 101 is an example of a "content acquisition unit".

コンテンツ画像選択部１０１は、コンテンツ画像を取得する。コンテンツ画像は、所定の柄が繰り返し配置されることにより生成された意匠が表現されている画像である。コンテンツ画像は、例えば、建装材として用いられる壁紙などの化粧シートの意匠を示す画像である。 The content image selection unit 101 acquires a content image. A content image is an image that expresses a design that is generated by repeatedly arranging a predetermined pattern. For example, a content image is an image that shows the design of a decorative sheet such as wallpaper that is used as a building material.

コンテンツ画像選択部１０１は、コンテンツ画像ＤＢ１０８に記憶された複数のコンテンツ画像の中から、ユーザ等により選択された画像を、コンテンツ画像として取得する。ユーザ等による選択の方法は、任意の方法であってよい。例えば、コンテンツ画像選択部１０１は、コンテンツ画像ＤＢ１０８を参照してコンテンツ画像を表示部（不図示）に表示させる。コンテンツ画像選択部１０１は、マウスやキーボード等の外部入力装置がユーザ等により操作されることにより選択された画像を、コンテンツ画像として取得する。 The content image selection unit 101 acquires an image selected by a user or the like from among a plurality of content images stored in the content image DB 108 as a content image. The method of selection by the user or the like may be any method. For example, the content image selection unit 101 refers to the content image DB 108 and causes the content image to be displayed on a display unit (not shown). The content image selection unit 101 acquires an image selected by the user or the like operating an external input device such as a mouse or keyboard as a content image.

The content images are not limited to those stored in the content image DB 108, and may be images acquired by the detection support device 100 via any input means, such as a portable memory, a scanner, or a communication network.

The visual feature selection unit 102 selects visual features.
Visual features are features that can be recognized by vision at a relatively early stage of processing in the human brain, such as luminance, chromaticity, contrast, gradient, edges, and optical flow.
The visual features include luminance, chromaticity, red-green chromaticity, yellow-blue chromaticity, orientation, luminance gradient, chromaticity gradient, red-green gradient, yellow-blue gradient, orientation gradient, luminance contrast, chromaticity contrast, red-green contrast, yellow-blue contrast, orientation contrast, and the like.
A visual feature may also be an index representing how readily something attracts the human eye. Such indices are known as visual attention models, gaze prediction models, or saliency models. For example, the methods described in Non-Patent Documents 1, 2, and 3 can be used.
The visual feature selection unit 102 selects, for example, the visual feature chosen by a user or the like from among a plurality of visual features stored in the visual feature DB 109.

視覚特徴画像生成部１０３は、コンテンツ画像に視覚特徴を適用することにより、視覚特徴画像を生成する。視覚特徴画像は、コンテンツ画像における視覚特徴を示す画像であり、例えば、コンテンツ画像における画素ごとに算出した視覚特徴の度合い（視覚特徴量）を、当該画素の位置座標に対応させた画像である。ここで用いられるコンテンツ画像は、コンテンツ画像選択部１０１により選択された画像である。ここで用いられる視覚特徴は、視覚特徴選択部１０２により選択された視覚特徴である。視覚特徴画像生成部１０３は、生成した視覚特徴画像を、視覚特徴画像記憶部１１０に記憶させる。 The visual feature image generating unit 103 generates a visual feature image by applying visual features to the content image. The visual feature image is an image that shows the visual features in the content image, and is, for example, an image in which the degree of visual feature (visual feature amount) calculated for each pixel in the content image corresponds to the position coordinates of the pixel. The content image used here is the image selected by the content image selecting unit 101. The visual features used here are the visual features selected by the visual feature selecting unit 102. The visual feature image generating unit 103 stores the generated visual feature image in the visual feature image storage unit 110.
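The idea of a visual feature image, in which a visual feature amount is computed for every pixel and stored at that pixel's coordinates, can be sketched as follows. The luminance weights (the common Rec. 601 luma coefficients) and the simple opponent-color formulas are assumptions made for this illustration, since the text does not specify formulas, and the names `VISUAL_FEATURES` and `visual_feature_image` are hypothetical.

```python
# Assumed Rec. 601 luma weights; the specification does not name a formula.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def luminance(r, g, b):
    """Weighted sum of the RGB components of one pixel."""
    wr, wg, wb = LUMA_WEIGHTS
    return wr * r + wg * g + wb * b


def red_green(r, g, b):
    """Simplified red-green opponent value (positive means reddish)."""
    return r - g


def yellow_blue(r, g, b):
    """Simplified yellow-blue opponent value (positive means yellowish)."""
    return (r + g) / 2 - b


# Stand-in for the selectable entries of a visual feature database.
VISUAL_FEATURES = {
    "luminance": luminance,
    "red_green": red_green,
    "yellow_blue": yellow_blue,
}


def visual_feature_image(content_image, feature_name):
    """Apply the selected visual feature to every pixel, keeping pixel positions."""
    feature = VISUAL_FEATURES[feature_name]
    return [[feature(*pixel) for pixel in row] for row in content_image]


# A 2x2 content image given as rows of (R, G, B) tuples.
content_image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
lum = visual_feature_image(content_image, "luminance")
```

Selecting a different key from `VISUAL_FEATURES` produces a different visual feature image from the same content image, which corresponds to the roles of the selection unit and the generation unit described above.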

The image feature selection unit 104 selects image features. An image feature is a feature of an image extracted using known image processing techniques. An example is a texture feature, which can capture spatial regularity (a pattern) in a design formed by arranging multiple copies of the same motif. Texture features include, for example, contrast, correlation, angular second moment, and uniformity.
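The texture features named above match the statistics commonly derived from a gray-level co-occurrence matrix (GLCM). The text does not say which technique is used, so the following is only a sketch under that assumption. The image is assumed to be already quantized to a few integer levels, and "uniformity" is read here as homogeneity (inverse difference moment), which is also an assumption.

```python
from collections import Counter
from math import sqrt


def glcm(image, dx=1, dy=0):
    """Normalized, symmetric co-occurrence probabilities for pixel pairs offset by (dx, dy)."""
    counts = Counter()
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                a, b = image[y][x], image[ny][nx]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # count both directions to make it symmetric
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def texture_features(p):
    """Contrast, correlation, angular second moment, and homogeneity of a GLCM."""
    mean_i = sum(i * v for (i, _), v in p.items())
    mean_j = sum(j * v for (_, j), v in p.items())
    var_i = sum((i - mean_i) ** 2 * v for (i, _), v in p.items())
    var_j = sum((j - mean_j) ** 2 * v for (_, j), v in p.items())
    cov = sum((i - mean_i) * (j - mean_j) * v for (i, j), v in p.items())
    return {
        "contrast": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        # Correlation is undefined for a constant image; 1.0 is used by convention here.
        "correlation": cov / sqrt(var_i * var_j) if var_i > 0 and var_j > 0 else 1.0,
        "angular_second_moment": sum(v * v for v in p.values()),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
    }


# A two-level checkerboard: horizontally adjacent pixels always differ.
checkerboard = [[(x + y) % 2 for x in range(4)] for y in range(4)]
features = texture_features(glcm(checkerboard, dx=1, dy=0))
```

For a repeated motif, evaluating the same statistics at several offsets `(dx, dy)` is one way to expose regularity along different directions.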

画像特徴選択部１０４は、例えば、画像特徴ＤＢ１１１に記憶された複数の画像特徴の中から、ユーザ等による選択操作により選択された画像特徴を選択する。画像特徴選択部１０４は、取得した画像特徴をコンテンツ特徴量算出部１０５に出力する。 The image feature selection unit 104 selects, for example, an image feature selected by a selection operation by a user or the like from among a plurality of image features stored in the image feature DB 111. The image feature selection unit 104 outputs the acquired image feature to the content feature amount calculation unit 105.

コンテンツ特徴量算出部１０５は、コンテンツ特徴量を算出する。コンテンツ特徴量は、コンテンツ画像の見え方に関する特徴の度合いを示す指標であって、例えば、視覚特徴画像に画像特徴を適用することにより算出される、視覚特徴画像における画像上の特徴を統計的に示す統計量である。 The content feature calculation unit 105 calculates the content feature. The content feature is an index indicating the degree of the feature related to the appearance of the content image, and is, for example, a statistical quantity that statistically indicates the image features in the visual feature image, calculated by applying image features to the visual feature image.

For example, if luminance is selected as the visual feature and contrast is selected as the image feature, the content feature is a value indicating what kind of contrast is formed by the luminance distribution that human vision can perceive in the content image. The visual feature image used here is the image generated by the visual feature image generation unit 103. The image feature used here is the image feature selected by the image feature selection unit 104.

コンテンツ特徴量算出部１０５は、算出したコンテンツ特徴量を、解析部１０７に出力する。また、コンテンツ特徴量算出部１０５は、算出したコンテンツ特徴量を、コンテンツ特徴量記憶部１１２に記憶させる。 The content feature amount calculation unit 105 outputs the calculated content feature amount to the analysis unit 107. The content feature amount calculation unit 105 also stores the calculated content feature amount in the content feature amount storage unit 112.

なお、コンテンツ特徴量は、視覚特徴の度合い（視覚特徴量）であってもよい。この場合、コンテンツ特徴量算出部１０５は、例えば、（画像特徴を用いることなく）視覚特徴画像を用いて、コンテンツ特徴量を算出する。 The content feature amount may be the degree of visual features (visual feature amount). In this case, the content feature amount calculation unit 105 calculates the content feature amount, for example, by using a visual feature image (without using image features).
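When no image feature is applied, a content feature can be taken directly from the visual feature image. The sketch below summarizes the per-pixel visual feature amounts by their mean and standard deviation. The choice of these two statistics and the function name are assumptions for illustration only.

```python
from statistics import fmean, pstdev


def content_feature_from_visual_image(visual_feature_image):
    """Summarize per-pixel visual feature amounts without applying an image feature."""
    values = [value for row in visual_feature_image for value in row]
    return {"mean": fmean(values), "std": pstdev(values)}


# A small visual feature image, for example per-pixel luminance.
summary = content_feature_from_visual_image([[0, 2], [2, 4]])
```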

解析方法選択部１０６は、解析方法を選択する。解析方法は、コンテンツ特徴量を提示する方法であり、例えば、コンテンツ特徴量を示すグラフの種別を示す情報である。グラフの種別としては、例えば、折れ線グラフ、棒線グラフ、円グラフ、レーダチャート等がある。 The analysis method selection unit 106 selects an analysis method. The analysis method is a method for presenting content features, and is, for example, information indicating the type of graph showing the content features. Graph types include, for example, line graphs, bar graphs, pie charts, radar charts, etc.

The analysis method selection unit 106 selects, for example, an analysis method chosen through a selection operation by a user or the like from among a plurality of analysis methods stored in the analysis method DB 113. The analysis method selection unit 106 outputs the selected analysis method to the analysis unit 107.

解析部１０７は、コンテンツ特徴量に解析方法を適用することにより、コンテンツ画像におけるコンテンツ特徴量を提示するための情報を生成する。解析部１０７は、生成した情報を解析結果記憶部１１４に記憶させる。また、解析部１０７は、生成した情報を、解析結果出力部１１５に出力する。 The analysis unit 107 applies an analysis method to the content features to generate information for presenting the content features in the content image. The analysis unit 107 stores the generated information in the analysis result storage unit 114. The analysis unit 107 also outputs the generated information to the analysis result output unit 115.

The content image DB 108 stores content images. For example, the content image DB 108 stores each content image in association with identification information that uniquely identifies it. A content image is acquired by the detection support device 100 via any input means, such as an external input device (a mouse, a keyboard, or the like), a portable memory, a scanner, or a communication network, and is stored in the content image DB 108.

コンテンツ画像ＤＢ１０８には、基準画像とその基準画像を加工した検査画像とが対応付けられて記憶されていてもよいし、コンテンツ画像の種別に応じて分類された状態で、コンテンツ画像が記憶されていてもよい。コンテンツ画像の種別とは、例えば、柄の組み合わせ方法や、化粧シートとして作成される場合にシート表面に凹凸が有るか否かなどにより区分される。 In the content image DB 108, a reference image and an inspection image obtained by processing the reference image may be stored in association with each other, or the content images may be stored in a state of being classified according to the type of content image. The type of content image may be classified, for example, according to the method of combining patterns, or whether or not there are irregularities on the sheet surface when it is made into a decorative sheet.

The visual feature DB 109 stores visual features. For example, the visual feature DB 109 stores each visual feature in association with identification information that uniquely identifies it. The visual features are acquired by the detection support device 100 via, for example, an external input device or an input means, and are stored in the visual feature DB 109.

視覚特徴画像記憶部１１０は、視覚特徴画像生成部１０３により生成された視覚特徴画像を記憶する。視覚特徴画像記憶部１１０には、例えば、視覚特徴画像を一意に示す識別情報に対応付けられた視覚特徴画像、当該視覚特徴画像の生成に用いられたコンテンツ画像の識別情報、及び当該視覚特徴画像の生成に用いられた視覚特徴の識別情報などが記憶される。 The visual feature image storage unit 110 stores the visual feature image generated by the visual feature image generation unit 103. The visual feature image storage unit 110 stores, for example, a visual feature image associated with identification information that uniquely indicates the visual feature image, identification information of the content image used to generate the visual feature image, and identification information of the visual feature used to generate the visual feature image.

画像特徴ＤＢ１１１は、画像特徴を記憶する。画像特徴ＤＢ１１１には、例えば、画像特徴を一意に示す識別情報に対応付けられた画像特徴が記憶される。画像特徴は、例えば、外部入力装置、或いは、入力手段を介して検出支援装置１００により取得され、画像特徴ＤＢ１１１に記憶される。 The image feature DB 111 stores image features. For example, the image feature DB 111 stores each image feature in association with identification information that uniquely indicates it. The image features are acquired by the detection support device 100 via, for example, an external input device or other input means, and are stored in the image feature DB 111.

コンテンツ特徴量記憶部１１２は、コンテンツ特徴量算出部１０５により算出されたコンテンツ特徴量を記憶する。コンテンツ特徴量記憶部１１２には、例えば、コンテンツ特徴量を一意に識別する識別情報に対応付けられたコンテンツ特徴量、当該コンテンツ特徴量の算出に用いられた視覚特徴画像の識別情報、及び当該コンテンツ特徴量の算出に用いられた画像特徴の識別情報などが記憶される。 The content feature storage unit 112 stores the content feature calculated by the content feature calculation unit 105. The content feature storage unit 112 stores, for example, a content feature associated with identification information that uniquely identifies the content feature, identification information of a visual feature image used to calculate the content feature, and identification information of an image feature used to calculate the content feature.

解析方法ＤＢ１１３は、解析方法を記憶する。解析方法ＤＢ１１３には、例えば、解析方法を一意に示す識別情報に対応付けられた解析方法が記憶される。解析方法は、例えば、外部入力装置、或いは、入力手段を介して検出支援装置１００により取得され、解析方法ＤＢ１１３に記憶される。 The analysis method DB 113 stores analysis methods. For example, the analysis method DB 113 stores each analysis method in association with identification information that uniquely indicates it. The analysis methods are acquired by the detection support device 100 via, for example, an external input device or other input means, and are stored in the analysis method DB 113.

解析結果記憶部１１４は、解析部１０７による解析結果（コンテンツ特徴量を提示するための情報）を記憶する。解析結果記憶部１１４には、例えば、解析結果を一意に示す識別情報に対応付けられた解析結果、その解析に用いられたコンテンツ特徴量の識別情報、及びその解析に用いられた解析方法などが記憶される。 The analysis result storage unit 114 stores the analysis results (information for presenting content features) by the analysis unit 107. The analysis result storage unit 114 stores, for example, the analysis results associated with identification information that uniquely indicates the analysis results, the identification information of the content features used in the analysis, and the analysis method used in the analysis.

解析結果出力部１１５は、解析部１０７による解析結果（コンテンツ特徴量を提示するための情報）を出力する。解析結果出力部１１５は、例えば、表示部（不図示）に解析結果を出力し、解析結果を表示部に表示させる。 The analysis result output unit 115 outputs the analysis result (information for presenting content features) by the analysis unit 107. The analysis result output unit 115 outputs the analysis result to a display unit (not shown), for example, and causes the analysis result to be displayed on the display unit.

図２は、本発明の第１の実施形態の検出支援装置１００が行う処理の流れを示すフローチャートである。
ステップＳ１０：
検出支援装置１００は、コンテンツ画像選択部１０１によりコンテンツ画像を選択する。コンテンツ画像選択部１０１は、コンテンツ画像ＤＢ１０８を参照することによりコンテンツ画像を選択し、選択したコンテンツ画像を視覚特徴画像生成部１０３に出力する。
ステップＳ１１：
検出支援装置１００は、視覚特徴選択部１０２により視覚特徴を選択する。視覚特徴選択部１０２は、視覚特徴ＤＢ１０９を参照することにより視覚特徴を選択し、選択した視覚特徴を、視覚特徴画像生成部１０３に出力する。
ステップＳ１２：
検出支援装置１００は、視覚特徴画像生成部１０３により視覚特徴画像を生成する。視覚特徴画像生成部１０３は、ステップＳ１０にて選択されたコンテンツ画像における、ステップＳ１１にて選択された視覚特徴を算出することにより視覚特徴画像を生成する。視覚特徴画像生成部１０３は、生成した視覚特徴画像を、コンテンツ特徴量算出部１０５に出力する。 FIG. 2 is a flowchart showing the flow of processing performed by the detection support device 100 according to the first embodiment of the present invention.
Step S10:
The detection support device 100 selects a content image by the content image selection unit 101. The content image selection unit 101 selects a content image by referring to the content image DB 108, and outputs the selected content image to the visual feature image generation unit 103.
Step S11:
The detection support device 100 selects a visual feature by the visual feature selection unit 102. The visual feature selection unit 102 selects a visual feature by referring to a visual feature DB 109, and outputs the selected visual feature to the visual feature image generation unit 103.
Step S12:
The detection support device 100 generates a visual feature image by the visual feature image generation unit 103. The visual feature image generation unit 103 generates the visual feature image by calculating, for the content image selected in step S10, the visual feature selected in step S11. The visual feature image generation unit 103 outputs the generated visual feature image to the content feature amount calculation unit 105.

ステップＳ１３：
検出支援装置１００は、画像特徴選択部１０４により画像特徴を選択する。画像特徴選択部１０４は、画像特徴ＤＢ１１１を参照することにより画像特徴を選択し、選択した画像特徴を、コンテンツ特徴量算出部１０５に出力する。
ステップＳ１４：
検出支援装置１００は、コンテンツ特徴量算出部１０５により、コンテンツ画像におけるコンテンツ特徴量を算出する。コンテンツ特徴量算出部１０５は、ステップＳ１２にて生成された視覚特徴画像における、ステップＳ１３にて選択された画像特徴を算出することによりコンテンツ特徴量を算出する。
ステップＳ１５：
検出支援装置１００は、解析方法選択部１０６により解析方法を選択する。解析方法選択部１０６は、解析方法ＤＢ１１３を参照することにより解析方法を選択し、選択した解析方法を、解析部１０７に出力する。
ステップＳ１６：
検出支援装置１００は、解析部１０７により解析（コンテンツ画像の特徴量を提示するための情報の生成）を行う。解析部１０７は、ステップＳ１４にて算出されたコンテンツ画像の特徴量を、ステップＳ１５にて選択された解析方法にて示す情報を生成する。解析部１０７は、生成した情報を、解析結果記憶部１１４、及び解析結果出力部１１５に出力する。 Step S13:
The detection support device 100 selects an image feature by the image feature selection unit 104. The image feature selection unit 104 selects an image feature by referring to the image feature DB 111, and outputs the selected image feature to the content feature amount calculation unit 105.
Step S14:
The detection support device 100 calculates a content feature amount in the content image by the content feature amount calculation unit 105. The content feature amount calculation unit 105 calculates the content feature amount by calculating the image feature selected in step S13 in the visual feature image generated in step S12.
Step S15:
The detection support device 100 selects an analysis method by the analysis method selection unit 106. The analysis method selection unit 106 selects an analysis method by referring to the analysis method DB 113, and outputs the selected analysis method to the analysis unit 107.
Step S16:
The detection support device 100 performs analysis (generation of information for presenting feature quantities of a content image) by the analysis unit 107. The analysis unit 107 generates information indicating the feature quantities of the content image calculated in step S14 by the analysis method selected in step S15. The analysis unit 107 outputs the generated information to the analysis result storage unit 114 and the analysis result output unit 115.

ステップＳ１７：
検出支援装置１００は、解析結果記憶部１１４に、ステップＳ１６にて解析された解析結果（コンテンツ画像の特徴量を提示するための情報）を記憶させる。
ステップＳ１８：
検出支援装置１００は、解析結果出力部１１５により、ステップＳ１６にて解析された解析結果（コンテンツ画像の特徴量を提示するための情報）を、表示部（不図示）などに出力する。 Step S17:
The detection support device 100 stores the analysis results (information for presenting the feature amounts of the content image) obtained in step S16 in the analysis result storage unit 114.
Step S18:
The detection support device 100 outputs the analysis result (information for presenting the feature amount of the content image) analyzed in step S16 by the analysis result output unit 115 to a display unit (not shown) or the like.
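The flow of steps S10 to S18 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the dictionaries standing in for the databases, the identifiers such as "img-001", the deviation-from-mean map used as a contrast-like visual feature, and the text formatter used as the analysis method are all assumptions introduced for the example.

```python
import numpy as np

# Hypothetical stand-ins for content image DB 108, visual feature DB 109,
# image feature DB 111 and analysis method DB 113 (all contents are assumptions).
content_image_db = {"img-001": np.arange(16, dtype=float).reshape(4, 4)}
visual_feature_db = {
    # A contrast-like map: absolute deviation of each pixel from the image mean.
    "contrast": lambda img: np.abs(img - img.mean()),
}
image_feature_db = {"mean": np.mean, "std": np.std}
analysis_method_db = {"text": lambda name, value: f"{name}={value:.3f}"}
analysis_result_store = {}  # stand-in for analysis result storage unit 114


def run_flow(image_id, visual_id, feature_id, method_id):
    image = content_image_db[image_id]                   # S10: select content image
    visual_feature = visual_feature_db[visual_id]        # S11: select visual feature
    visual_feature_image = visual_feature(image)         # S12: visual feature image
    image_feature = image_feature_db[feature_id]         # S13: select image feature
    content_feature = float(image_feature(visual_feature_image))  # S14
    method = analysis_method_db[method_id]               # S15: select analysis method
    result = method(feature_id, content_feature)         # S16: analysis
    key = (image_id, visual_id, feature_id, method_id)
    analysis_result_store[key] = result                  # S17: store the result
    return result                                        # S18: caller outputs it


result = run_flow("img-001", "contrast", "mean", "text")
print(result)
```

Keeping each selection as a lookup by identifier mirrors the way the storage units in the description associate every result with the identification information of the inputs used to produce it.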

図３は、本発明の第１の実施形態の基準画像及び検査画像の例を示す図である。図３では、左側に基準画像、右側に検査画像が示されている。
図３に示す通り、例えば、基準画像と検査画像とは、同一の意匠と思われるほどによく類似して見える。基準画像と検査画像とは、ほとんど差異がないようにも思われる。しかしながら、熟練者であれば、基準画像に不具合を検出し、検出した内容に基づいて基準画像を検査画像のように加工する。
本実施形態では、検出支援装置１００により基準画像と検査画像との各々を、人間の視知覚による見え方の差異を、コンテンツ特徴量の差異として可視化して提示することが可能である。つまり、検出支援装置１００は、基準画像に比べて検査画像のどのような性質がどの程度異なるのかを、数値で示すことができる。こうすることで、一見ほとんど差異がないようにも思われる両画像の差異を、熟練者でない者が認識できるように支援する。 FIG. 3 is a diagram showing an example of a reference image and an inspection image according to the first embodiment of the present invention, in which the reference image is shown on the left and the inspection image is shown on the right.
As shown in FIG. 3, for example, the reference image and the inspection image look so similar that they appear to be the same design, and there seems to be almost no difference between them. However, an expert can detect a defect in the reference image and, based on what was detected, process the reference image into something like the inspection image.
In this embodiment, the detection support device 100 can visualize the difference in how the reference image and the inspection image appear to human visual perception and present it as a difference in content feature amounts. In other words, the detection support device 100 can indicate numerically which properties of the inspection image differ from the reference image, and by how much. This helps a non-expert recognize differences between two images that at first glance appear almost identical.

図４は、本発明の第１の実施形態の視覚特徴画像の例を示す図である。図４では、視覚特徴としてコントラストが選択された場合の例を示している。 Figure 4 is a diagram showing an example of a visual feature image in the first embodiment of the present invention. Figure 4 shows an example in which contrast is selected as the visual feature.

図４では、左側に基準画像における視覚特徴画像の例、右側に検査画像における視覚特徴画像の例を示しており、基準画像及び検査画像の各々についてスケール毎に三つの視覚特徴画像を示している。スケールは、視覚特徴（この例では、コントラスト）を算出する空間の大きさ（画像サイズ）を示す指標であって、上方向に細かい（コントラストを算出する画像サイズが小さい）スケール値、下方向に粗い（同画像サイズが大きい）スケール値を示している。三つの視覚特徴画像は、基準画像及び検査画像に対して、１×１は基準画像を縦横に１枚ずつ連結した（１枚分）場合、２×２は縦横に２枚ずつ連結した（４枚分）場合、３×３は縦横に３枚ずつ連結した（９枚分）場合の視覚特徴画像を、それぞれ示している。 In FIG. 4, examples of visual feature images of the reference image are shown on the left and examples of visual feature images of the inspection image are shown on the right, with three visual feature images shown per scale for each of the reference image and the inspection image. The scale is an index indicating the size of the region (image size) over which the visual feature (contrast in this example) is calculated; finer scale values (a smaller region for calculating contrast) are shown toward the top and coarser scale values (a larger region) toward the bottom. Of the three visual feature images, 1×1 is the visual feature image for a single copy of the image (one image), 2×2 is for two copies connected in each of the vertical and horizontal directions (four images), and 3×3 is for three copies connected in each direction (nine images).
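The combination of tiling (1×1, 2×2, 3×3) and scale can be illustrated with a short sketch. The source does not state how contrast is calculated at a given scale, so the example below assumes, purely for illustration, that contrast is the standard deviation of pixel values within non-overlapping scale × scale blocks; the function names `tile_image` and `local_contrast` are hypothetical.

```python
import numpy as np


def tile_image(image, n):
    """Connect n copies of the image in each of the vertical and horizontal directions."""
    return np.tile(image, (n, n))


def local_contrast(image, scale):
    """Contrast map, assumed here to be the standard deviation within each
    non-overlapping scale x scale block, expanded back to pixel resolution."""
    h, w = image.shape
    rows, cols = h // scale, w // scale
    blocks = image[:rows * scale, :cols * scale].reshape(rows, scale, cols, scale)
    block_std = blocks.std(axis=(1, 3))
    return np.repeat(np.repeat(block_std, scale, axis=0), scale, axis=1)


rng = np.random.default_rng(0)
reference = rng.random((32, 32))  # synthetic stand-in for a reference image

# One contrast map per (tiling, scale) pair, as in the 3 x 3 layout of FIG. 4.
maps = {
    (n, s): local_contrast(tile_image(reference, n), s)
    for n in (1, 2, 3)
    for s in (8, 16, 32)
}
```

With a block-aligned measure like this one, the map of a tiled image is simply the tiled map of the single image. A measure computed over windows that straddle tile boundaries would instead respond to discontinuities where copies of the pattern meet, which is the kind of effect the tiled views are meant to expose.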

つまり、図４の上段には、スケール値を８（「ｓｃｌ８」と記載）として算出した場合における、基準画像及び検査画像の各々の視覚特徴画像を、左から順に（１×１）、（２×２）、（３×３）の画像サイズで示している。同様に、図４の中段には、スケール値を１６（「ｓｃｌ１６」と記載）として算出した場合における、基準画像及び検査画像の各々の視覚特徴画像を、左から順に（１×１）、（２×２）、（３×３）の画像サイズで示している。図４の下段には、スケール値を３２（「ｓｃｌ３２」と記載）として算出した場合における、基準画像及び検査画像の各々の視覚特徴画像を、左から順に（１×１）、（２×２）、（３×３）の画像サイズで示している。 That is, the top row of FIG. 4 shows the visual feature images of the reference image and the inspection image calculated with a scale value of 8 (denoted "scl8"), at image sizes of (1×1), (2×2), and (3×3) from left to right. Similarly, the middle row of FIG. 4 shows the visual feature images of the reference image and the inspection image calculated with a scale value of 16 (denoted "scl16"), at image sizes of (1×1), (2×2), and (3×3) from left to right. The bottom row of FIG. 4 shows the visual feature images of the reference image and the inspection image calculated with a scale value of 32 (denoted "scl32"), at image sizes of (1×1), (2×2), and (3×3) from left to right.

図４に示すように、基準画像から生成した視覚特徴画像と、検査画像から生成した視覚特徴画像とは、例えば、スケール８における（１×１）に対応する両画像や、スケール１６における（１×１）に対応する視覚特徴画像を見比べれば、図３の基準画像及び検査画像を見比べた場合と比較して、差異があるように思われる。 As shown in FIG. 4, when the visual feature image generated from the reference image is compared with the one generated from the inspection image, for example the pair corresponding to (1×1) at scale 8 or the pair corresponding to (1×1) at scale 16, the difference appears more evident than when the reference image and the inspection image of FIG. 3 are compared directly.

視覚特徴画像に差異が認められる場合、基準画像及び検査画像の両画像において、視覚特徴（この例では、コントラスト）に、差異があることを示している。つまり、両画像を視覚特徴画像に変換することで、両画像における視覚特徴に起因する見え方の差異を強調することが可能である。 A difference between the visual feature images indicates that the reference image and the inspection image differ in the visual feature (contrast in this example). In other words, converting both images into visual feature images makes it possible to emphasize the difference in appearance caused by the visual feature in the two images.

図５Ａは、本発明の第１の実施形態の解析結果を示す図である。図５Ａでは、解析方法として折れ線グラフが選択された場合の例を示している。 Figure 5A is a diagram showing the analysis results of the first embodiment of the present invention. Figure 5A shows an example in which a line graph is selected as the analysis method.

図５Ａでは、スケール値ごとに三つの折れ線グラフを示している。それぞれの折れ線グラフは、基準画像及び検査画像における視覚特徴画像の相関差分を４方向について示している。四方向は、画像に設定した所定の基準軸から、それぞれ０［ｄｅｇ］、４５［ｄｅｇ］、９０［ｄｅｇ］、及び１３５［ｄｅｇ］の方向である。三つの折れ線グラフは、左から順に（１×１）、（２×２）、（３×３）の画像サイズにおける相関差分を示している。 FIG. 5A shows three line graphs for each scale value. Each line graph shows the correlation difference between the visual feature images of the reference image and the inspection image in four directions. The four directions are 0 [deg], 45 [deg], 90 [deg], and 135 [deg] from a predetermined reference axis set in the image. From left to right, the three line graphs show the correlation differences for image sizes of (1×1), (2×2), and (3×3).
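The source does not define how the correlation difference is calculated. One plausible reading, sketched below strictly as an assumption, is to measure for each direction the correlation between a visual feature image and a copy of itself shifted by one pixel in that direction, and then take the difference of that value between the reference image and the inspection image. The names `directional_correlation` and `correlation_difference`, the one-pixel distance, and the choice of the horizontal axis as the reference axis are all hypothetical.

```python
import numpy as np

# Pixel offsets (dy, dx) for one step in each direction; rows grow downward,
# and 0 deg is assumed to point along the horizontal axis.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def directional_correlation(image, angle, distance=1):
    """Pearson correlation between the image and itself shifted along `angle`."""
    dy, dx = (distance * c for c in DIRECTIONS[angle])
    h, w = image.shape
    # Pixels (y, x) that still have a neighbour at (y + dy, x + dx) ...
    a = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    # ... and those neighbours themselves.
    b = image[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


def correlation_difference(reference_map, inspection_map):
    """Per-direction difference (reference minus inspection) of the correlations."""
    return {
        angle: directional_correlation(reference_map, angle)
        - directional_correlation(inspection_map, angle)
        for angle in DIRECTIONS
    }


# Synthetic example: horizontal stripes versus the same stripes rotated by 90 deg.
stripes = np.tile(np.array([[0.0], [1.0]]), (4, 8))  # 8 x 8, rows alternate 0 and 1
diff = correlation_difference(stripes, stripes.T)
same = correlation_difference(stripes, stripes)
```

In this synthetic case the two images differ most along 0 [deg] and 90 [deg], where the stripe orientation matters, and not at all along the diagonals. The function assumes non-constant inputs, since the correlation coefficient is undefined for a constant image.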

図５Ｂは、図５Ａに示す複数の相関差分のうち、スケール８における（２×２）の相関差分を示している。
図５Ｂに示すように、例えば、スケール８における（２×２）における、０［ｄｅｇ］、及び９０［ｄｅｇ］の方向の相関差分が、他の方向の相関差分と比較して大きな値を示す傾向にある。 FIG. 5B shows a (2×2) correlation difference at scale 8 out of the multiple correlation differences shown in FIG. 5A.
As shown in FIG. 5B, for example, for (2×2) at scale 8, the correlation differences in the 0 [deg] and 90 [deg] directions tend to be larger than those in the other directions.

視覚特徴に比較的大きな差異が示される箇所には、両画像に、比較的大きな差異があることが示されている。つまり、両画像を視覚特徴画像に変換して、その視覚特徴を示すことで、両画像における視覚特徴に起因する差異を定量的に示すことが可能である。 In areas where relatively large differences in visual features are shown, it is shown that there are relatively large differences between the two images. In other words, by converting both images into visual feature images and showing the visual features, it is possible to quantitatively show the differences caused by the visual features in both images.

図５Ｃは、本発明の第１の実施形態の解析結果を示す図である。図５Ｃでは、解析方法としてレーダチャートが選択された場合の例を示している。 Figure 5C is a diagram showing the analysis results of the first embodiment of the present invention. Figure 5C shows an example in which a radar chart is selected as the analysis method.

図５Ｃでは、スケール値ごとに三つのレーダチャートを示している。それぞれのレーダチャートは、基準画像及び検査画像におけるコンテンツ特徴量（この例では、視覚特徴画像のコントラスト）を八方向について示している。八方向は、画像の中心から上下左右、及び、右上、左上、左下、右下のそれぞれの方向である。三つのレーダチャートは、左から順に（１×１）、（２×２）、（３×３）の画像サイズにおける相関差分を示している。 FIG. 5C shows three radar charts for each scale value. Each radar chart shows the content feature amounts (in this example, the contrast of the visual feature image) of the reference image and the inspection image in eight directions. The eight directions are up, down, left, and right from the center of the image, plus upper right, upper left, lower left, and lower right. From left to right, the three radar charts show the correlation differences for image sizes of (1×1), (2×2), and (3×3).

図５Ｃに示すように、例えば、スケール３２における（１×１）における、基準画像の全方向のコントラストが、検査画像のコントラストと比較して大きな値を示す傾向にある。 As shown in FIG. 5C, for example, for (1×1) at scale 32, the contrast of the reference image tends to be larger in all directions than the contrast of the inspection image.

コンテンツ特徴量に比較的大きな差異が示される箇所には、両画像に、比較的大きな差異があることが示されている。つまり、両画像におけるコンテンツ特徴量を示すことで、両画像における見え方の差異を定量的に示すことが可能である。 Where relatively large differences in content features are shown, it is shown that there are relatively large differences between the two images. In other words, by showing the content features of both images, it is possible to quantitatively show the differences in appearance between the two images.

以上説明したように、第１の実施形態の検出支援装置１００は、コンテンツ画像選択部１０１（「コンテンツ画像取得部」の一例）と、コンテンツ特徴量算出部１０５とを備える。コンテンツ画像選択部１０１は、コンテンツ画像に関する情報を取得する。コンテンツ特徴量算出部１０５は、コンテンツ画像に、視覚特徴を適用することにより、コンテンツ特徴量を算出する。これにより、第１の実施形態の検出支援装置１００によれば、コンテンツ画像に、人間の視覚により認識され得る特徴である視覚特徴を適用することができるため、人間の視覚による処理プロセスを、プロセッサ上の処理として実行することができる。 As described above, the detection support device 100 of the first embodiment includes a content image selection unit 101 (an example of a "content image acquisition unit") and a content feature amount calculation unit 105. The content image selection unit 101 acquires information about the content image. The content feature amount calculation unit 105 calculates the content feature amount by applying visual features to the content image. As a result, according to the detection support device 100 of the first embodiment, visual features that are features that can be recognized by human vision can be applied to the content image, and therefore a processing process based on human vision can be executed as processing on a processor.

また、第１の実施形態の検出支援装置１００は、視覚特徴画像生成部１０３をさらに備えてもよい。視覚特徴画像生成部１０３は、視覚特徴画像を生成する。視覚特徴画像は、コンテンツ画像における画素ごとの視覚特徴の度合いである視覚特徴量を、前記画素に対応づけた画像である。これにより、第１の実施形態の検出支援装置１００によれば、視覚特徴をコンテンツ画像の画素に対応付けて示すことができ、コンテンツ画像におけるどの箇所がどのような視覚特徴量であるのかを、判りやすく示すことができる。 The detection support device 100 of the first embodiment may further include a visual feature image generation unit 103. The visual feature image generation unit 103 generates a visual feature image. The visual feature image is an image in which a visual feature amount, that is, the degree of the visual feature at each pixel of the content image, is associated with that pixel. As a result, the detection support device 100 of the first embodiment can show visual features in association with the pixels of the content image, and can show in an easily understood way which parts of the content image have which visual feature amounts.

また、第１の実施形態の検出支援装置１００では、視覚特徴画像生成部１０３は、輝度、色度、コントラスト、エッジ、オプティカルフロー、及び歪度の中から選択された少なくとも一つを、視覚特徴として用いて、視覚特徴画像を生成する。これにより、第１の実施形態の検出支援装置１００によれば、既存の画像処理の技法のうち、視覚による見え方に類似する技法を利用してより精度よく、視覚特徴画像を生成することができる。 In addition, in the detection support device 100 of the first embodiment, the visual feature image generation unit 103 generates a visual feature image using at least one selected from luminance, chromaticity, contrast, edge, optical flow, and skewness as a visual feature. As a result, according to the detection support device 100 of the first embodiment, it is possible to generate a visual feature image with higher accuracy by using a technique similar to the visual appearance among existing image processing techniques.
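Two of the listed visual features can be illustrated with standard image processing operations. The sketch below is an assumption about how such per-pixel maps might be computed, not the method of the embodiment: luminance uses the Rec. 601 luma weights, and edge strength uses the gradient magnitude from central differences. The function names are hypothetical.

```python
import numpy as np


def luminance(rgb):
    """Per-pixel luminance of an (H, W, 3) RGB array, using Rec. 601 luma
    weights (an assumed choice; the source does not specify one)."""
    return rgb @ np.array([0.299, 0.587, 0.114])


def edge_strength(gray):
    """Per-pixel edge strength as the gradient magnitude of a grayscale image."""
    gy, gx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    return np.hypot(gx, gy)


white = np.ones((4, 6, 3))
white_luminance = luminance(white)

# A vertical step edge: left half 0, right half 1.
step = np.zeros((4, 6))
step[:, 3:] = 1.0
edges = edge_strength(step)
```

Both functions return an array with the same height and width as the input, so the result can be treated directly as a visual feature image in which each value is associated with one pixel of the content image.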

また、第１の実施形態の検出支援装置１００では、視覚特徴画像生成部１０３は、コンテンツ画像に対する人間の目の認識し易さを表す認識指標を、視覚特徴として用いて、視覚特徴画像を生成する。これにより、第１の実施形態の検出支援装置１００によれば、人間の目の認識に、より近づくように視覚特徴画像を生成することができる。 In addition, in the detection support device 100 of the first embodiment, the visual feature image generation unit 103 generates a visual feature image by using a recognition index that indicates the ease with which the human eye can recognize a content image as a visual feature. As a result, the detection support device 100 of the first embodiment can generate a visual feature image that is closer to the recognition by the human eye.

また、第１の実施形態の検出支援装置１００では、認識指標には、視覚的注意モデル、視線予測モデル、顕著性モデル、及びサリエンシーモデルのうち、少なくとも一つが含まれる。これにより、第１の実施形態の検出支援装置１００によれば、既存のモデルを用いて、より精度よく視覚特徴画像を生成することができる。 In addition, in the detection support device 100 of the first embodiment, the recognition index includes at least one of a visual attention model, a gaze prediction model, a conspicuity model (顕著性モデル), and a saliency model (サリエンシーモデル). As a result, the detection support device 100 of the first embodiment can generate visual feature images with higher accuracy using existing models.

また、第１の実施形態の検出支援装置１００では、コンテンツ特徴量算出部１０５は、視覚特徴画像に、画像特徴を適用することにより、コンテンツ特徴量を算出する。これにより、第１の実施形態の検出支援装置１００によれば、視覚特徴画像を画像処理の技法を用いて、視覚特徴により示される特徴の度合いを統計的に処理することができ、より定量的にコンテンツ特徴量を示すことができる。 In addition, in the detection support device 100 of the first embodiment, the content feature amount calculation unit 105 calculates the content feature amount by applying image features to the visual feature image. As a result, according to the detection support device 100 of the first embodiment, the visual feature image can be subjected to image processing techniques to statistically process the degree of the features indicated by the visual features, and the content feature amount can be indicated more quantitatively.

なお、人の脳機能の解明が進むにしたがって、視覚により認識され得る特徴が数多く発見されつつあり、それらの特徴を視覚特徴に含めてもよい。 As our understanding of human brain functions progresses, many features that can be recognized visually are being discovered, and these features may also be included in visual features.

＜第２の実施形態＞
以下、第２の実施形態について、図面を参照して説明する。本実施形態の検出支援装置１００Ａは、熟練者がコンテンツ画像の不具合を検出する際の視線を疑似的に提示する点において、上述した実施形態と相違する。検出支援装置１００Ａは、熟練者の視線を提示することにより、不具合の検出を支援し、非熟練者であっても不具合を検出し易くなるようにすることができる。本実施形態においては、第１の実施形態と異なる構成についてのみ説明し、第１の実施形態による図１の構成と同様の構成については同一の符号を付し、特に必要な場合を除いてその説明を省略する。 Second Embodiment
The second embodiment will be described below with reference to the drawings. The detection support device 100A of this embodiment differs from the embodiment described above in that it presents a simulated version of the line of sight an expert uses when detecting a defect in a content image. By presenting the expert's line of sight, the detection support device 100A supports defect detection so that even a non-expert can detect defects more easily. In this embodiment, only the configurations that differ from the first embodiment are described; configurations that are the same as those in FIG. 1 of the first embodiment are given the same reference numerals, and their description is omitted unless particularly necessary.

図６は、第２の実施形態による検出支援装置１００Ａの構成例を示すブロック図である。検出支援装置１００Ａは、例えば、視線特徴学習モデル選択部１１６と、視線特徴画像生成部１１７と、視線特徴学習モデルＤＢ１１８と、コンテンツ特徴量算出部１０５Ａとを備える。検出支援装置１００Ａは、視覚特徴選択部１０２、及び視覚特徴画像生成部１０３を備えない。 FIG. 6 is a block diagram showing an example configuration of a detection support device 100A according to the second embodiment. The detection support device 100A includes, for example, a gaze feature learning model selection unit 116, a gaze feature image generation unit 117, a gaze feature learning model DB 118, and a content feature amount calculation unit 105A. The detection support device 100A includes neither the visual feature selection unit 102 nor the visual feature image generation unit 103.

以下、本実施形態においては、コンテンツ画像を静止画像として説明するが、第１の実施形態と同様に、動画像、映像等の他のコンテンツ画像に適用されてもよい。 In the following, in this embodiment, the content image is described as a still image, but similar to the first embodiment, this may be applied to other content images such as moving images and video.

視線特徴学習モデル選択部１１６は、視線特徴学習モデルを選択する。視線特徴学習モデルは、機械学習の手法により生成された、コンテンツ画像における視線特徴を推定するモデルである。視線特徴は、熟練者がコンテンツ画像の不具合を検出する際の視線に関する特徴を示す情報であって、例えば、後述する視線特徴画像、或いはコンテンツ画像の領域ごとに視認される度合いを統計的に示す情報である。 The gaze feature learning model selection unit 116 selects a gaze feature learning model. The gaze feature learning model is a model, generated by a machine learning technique, that estimates gaze features in a content image. A gaze feature is information indicating characteristics of the line of sight of an expert detecting a defect in a content image; it is, for example, a gaze feature image described below, or information that statistically indicates the degree to which each region of the content image is viewed.

視線特徴学習モデルは、例えば、互いに異なる複数の学習用のコンテンツ画像（学習用コンテンツ画像）の各々に、それぞれの学習用コンテンツ画像を視認した熟練者の視線特徴の実績を対応付けた学習データを用いて機械学習を実行することにより生成される。 The gaze feature learning model is generated, for example, by performing machine learning using learning data in which each of a number of different learning content images (learning content images) is associated with the gaze feature records of an expert who viewed each learning content image.

視線特徴を取得する手段としては、市販の専用の視線計測機（例えば、身体装着型計測機であるＴｏｂｉｉ社のＴｏｂｉｉＰｒｏＧｌａｓｓｅｓ２や、据え置き型計測機であるＴｏｂｉｉＰｒｏＸ２，Ｘ３等）を用いても良く、もしくは民生カメラと視線推定手法の組み合わせで計算する方法を用いても良い。視線特徴は、一般的に一定のサンプリングタイムで計測した視点の座標が時系列で格納されており、これを画像上の座標に変換して使用する。 Gaze features may be acquired with a commercially available dedicated gaze measurement device (for example, the Tobii Pro Glasses 2 from Tobii, a wearable measurement device, or the Tobii Pro X2 or X3, stationary measurement devices), or may be calculated by combining a consumer camera with a gaze estimation method. Gaze features are generally stored as a time series of viewpoint coordinates measured at a fixed sampling interval, and these are converted into coordinates on the image before use.
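The conversion of a gaze time series into image coordinates can be sketched as follows. The sample layout is an assumption made for illustration: each sample is taken to be (time, x, y) with x and y normalized to the range [0, 1) relative to the displayed content image, and samples that fall outside the image or are invalid are discarded. Actual measurement devices use their own data formats, which are not reproduced here; the function name `gaze_to_pixels` is hypothetical.

```python
import numpy as np


def gaze_to_pixels(samples, image_width, image_height):
    """Convert (time, x_norm, y_norm) gaze samples to integer (x, y) pixel
    coordinates, dropping invalid samples and samples outside the image."""
    samples = np.asarray(samples, dtype=float)
    # Discard samples with missing coordinates (for example, during blinks).
    samples = samples[np.isfinite(samples[:, 1:]).all(axis=1)]
    x = np.floor(samples[:, 1] * image_width).astype(int)
    y = np.floor(samples[:, 2] * image_height).astype(int)
    inside = (x >= 0) & (x < image_width) & (y >= 0) & (y < image_height)
    return np.column_stack([x[inside], y[inside]])


# Three samples at a fixed sampling interval; the second one lies off the image.
samples = [
    (0.00, 0.5, 0.5),
    (0.01, 1.2, 0.3),
    (0.02, 0.0, 0.999),
]
pixels = gaze_to_pixels(samples, image_width=100, image_height=50)
print(pixels.tolist())
```

The time column is not used in this conversion, but it is kept in the input because later processing, such as limiting the analysis to a measurement period, depends on it.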

ここで用いられる機械学習の手法は、任意の手法であってよいが、例えば、深層ニューラルネットワークなどの推定モデルを用いて行われる。深層ニューラルネットは、例えば、入力層、出力層、及びその中間を多層の畳み込み層とプーリング層により接続された構成を備える。そして、多層ニューラルネットワークの入力層に学習用コンテンツ画像を入力した場合における、当該多層ニューラルネットワークの出力層から出力される情報が、その学習用コンテンツ画像に対応付けられた視線特徴となるように学習が繰返されることにより、各層を結合する結合係数やバイアス値が決定される。推定モデルの結合係数やバイアス値が決定されることにより、視線特徴学習モデルが生成される。 The machine learning method used here may be any method; for example, it is performed using an estimation model such as a deep neural network. The deep neural network has, for example, a configuration in which an input layer and an output layer are connected through multiple convolutional layers and pooling layers between them. Learning is repeated so that, when a learning content image is input to the input layer of the multilayer neural network, the information output from its output layer becomes the gaze feature associated with that learning content image; in this way, the coupling coefficients and bias values that connect the layers are determined. Determining the coupling coefficients and bias values of the estimation model yields the gaze feature learning model.

視線特徴学習モデル選択部１１６は、例えば、視線特徴学習モデルＤＢ１１８に記憶された複数の視線特徴学習モデルの中から、ユーザ等による選択操作により選択された視線特徴学習モデルを選択する。視線特徴学習モデル選択部１１６は、取得した視線特徴学習モデルを視線特徴画像生成部１１７に出力する。 The gaze feature learning model selection unit 116 selects, for example, a gaze feature learning model selected by a selection operation by a user or the like from among a plurality of gaze feature learning models stored in the gaze feature learning model DB 118. The gaze feature learning model selection unit 116 outputs the acquired gaze feature learning model to the gaze feature image generation unit 117.

視線特徴画像生成部１１７は、コンテンツ画像に、視線特徴学習モデルにより推定された視線特徴を適用することにより、視線特徴画像を生成する。視線特徴画像は、コンテンツ画像における視線特徴の度合いを示す画像である。視線特徴から視線特徴画像を得る方法としては、例えば、計測時間内の画像上の視点の蓄積を確率分布として近似してヒートマップを出力する方法が用いられる。 The gaze feature image generating unit 117 generates a gaze feature image by applying the gaze features estimated by the gaze feature learning model to the content image. The gaze feature image is an image that indicates the degree of gaze features in the content image. As a method for obtaining the gaze feature image from the gaze features, for example, a method is used in which the accumulation of viewpoints on the image within the measurement time is approximated as a probability distribution and a heat map is output.

視線特徴画像は、視線特徴を、１枚の画像に情報を縮約させた画像である。ここで、情報の縮約方法としては、例えば、コンテンツ画像における画素毎の視線分布を計数して２次元ヒストグラムを構成し、ヒストグラムの山を、２次元正規分布を用いて近似表現して、強度［０，１］の範囲の実数値で表現する方法（一般にヒートマップと呼ばれる）等がある。 A gaze feature image is an image in which the gaze features are condensed into a single image. One way of condensing the information is, for example, to count the gaze distribution at each pixel of the content image to form a two-dimensional histogram, approximate the peaks of the histogram with two-dimensional normal distributions, and express the result as real-valued intensities in the range [0, 1] (commonly called a heat map).
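The heat map construction described above can be sketched as follows. The sketch counts gaze points per pixel to form the two-dimensional histogram and then smooths it with a separable Gaussian kernel, which places a two-dimensional normal distribution at every counted point; this smoothing is a commonly used stand-in for fitting normal distributions to the histogram peaks and is an assumption, as are the function name `gaze_heatmap` and the default `sigma`.

```python
import numpy as np


def gaze_heatmap(points, width, height, sigma=2.0):
    """Build a heat map in [0, 1] from integer (x, y) gaze pixel coordinates.
    Assumes width and height are at least the kernel length 2 * int(3 * sigma) + 1."""
    points = np.asarray(points)
    # Two-dimensional histogram: number of gaze samples at each pixel.
    hist = np.zeros((height, width), dtype=float)
    np.add.at(hist, (points[:, 1], points[:, 0]), 1.0)

    # One-dimensional Gaussian kernel, applied along rows and then columns.
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    smooth = np.apply_along_axis(np.convolve, 0, hist, kernel, mode="same")
    smooth = np.apply_along_axis(np.convolve, 1, smooth, kernel, mode="same")

    # Normalize so that the strongest location has intensity 1.
    peak = smooth.max()
    return smooth / peak if peak > 0 else smooth


# Two samples on the same pixel and one elsewhere (synthetic data).
heat = gaze_heatmap([(10, 5), (10, 5), (30, 20)], width=40, height=30)
peak_row, peak_col = np.unravel_index(heat.argmax(), heat.shape)
```

The value of `sigma` controls how far each gaze sample spreads over neighbouring pixels; a value tied to the visual angle covered by the fovea is a common choice, but the source gives no figure.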

視線特徴画像生成部１１７により用いられるコンテンツ画像は、コンテンツ画像選択部１０１により選択された画像である。視線特徴画像生成部１１７により用いられる視線特徴は、視線特徴学習モデル選択部１１６により選択された視線特徴学習モデルに、コンテンツ画像選択部１０１により選択されたコンテンツ画像を入力させることにより推定されたものである。視線特徴画像生成部１１７は、生成した視線特徴画像を、コンテンツ特徴量算出部１０５Ａに出力する。 The content image used by the gaze feature image generation unit 117 is an image selected by the content image selection unit 101. The gaze feature used by the gaze feature image generation unit 117 is estimated by inputting the content image selected by the content image selection unit 101 into the gaze feature learning model selected by the gaze feature learning model selection unit 116. The gaze feature image generation unit 117 outputs the generated gaze feature image to the content feature amount calculation unit 105A.

視線特徴学習モデルＤＢ１１８は、視線特徴学習モデルを記憶する。視線特徴学習モデルＤＢ１１８には、例えば、視線特徴学習モデルを一意に示す識別情報に対応付けられた視線特徴学習モデルが記憶される。視線特徴学習モデルは、例えば、外部の学習サーバなどにより生成され、外部入力装置、或いは、入力手段を介して検出支援装置１００Ａにより取得され、視線特徴学習モデルＤＢ１１８に記憶される。視線特徴学習モデルＤＢ１１８には、コンテンツ画像の種別に応じたモデルが記憶されていてもよい。これにより、コンテンツ画像の種別により、熟練者の見方が異なる場合であっても、その種別に応じたモデルを選択することができ、より精度よく視線特徴を推定させることが可能となる。 The gaze feature learning model DB 118 stores gaze feature learning models. For example, the gaze feature learning model DB 118 stores each gaze feature learning model in association with identification information that uniquely indicates it. A gaze feature learning model is generated, for example, by an external learning server, acquired by the detection support device 100A via an external input device or other input means, and stored in the gaze feature learning model DB 118. The gaze feature learning model DB 118 may store models corresponding to the types of content image. In that case, even if the way an expert looks at a content image differs depending on its type, a model suited to that type can be selected, and gaze features can be estimated with greater accuracy.

コンテンツ特徴量算出部１０５Ａは、視線特徴画像に画像特徴を適用することによりコンテンツ特徴量を算出する。本実施形態のコンテンツ特徴量は、例えば、視線特徴画像における画像上の特徴を統計的に示す統計量である。 The content feature amount calculation unit 105A calculates the content feature amount by applying image features to the gaze feature image. The content feature amount in this embodiment is, for example, a statistical amount that statistically indicates the image features in the gaze feature image.

コンテンツ特徴量は、例えば、画像特徴としてコントラストが選択された場合、コンテンツ画像において、熟練者の視線がいかなるコントラストを形成しているかを示す値となる。ここで用いられる視線特徴画像は、視線特徴画像生成部１１７により生成された画像である。ここで用いられる画像特徴は、画像特徴選択部１０４により選択された画像特徴である。 For example, when contrast is selected as an image feature, the content feature amount is a value indicating what kind of contrast the expert's gaze forms in the content image. The gaze feature image used here is an image generated by the gaze feature image generation unit 117. The image feature used here is an image feature selected by the image feature selection unit 104.
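Statistics of this kind can be computed directly from the gaze feature image. The sketch below assumes, for illustration only, that the content feature amounts are the mean, the standard deviation (used here as a simple global contrast measure), and the skewness of the heat map intensities; the source does not specify which statistics are used, and the function name `content_feature_amounts` is hypothetical.

```python
import numpy as np


def content_feature_amounts(feature_image):
    """Summary statistics of a gaze (or visual) feature image."""
    values = np.asarray(feature_image, dtype=float).ravel()
    mean = values.mean()
    std = values.std()
    # Skewness is undefined for a constant image; report 0.0 in that case.
    skewness = ((values - mean) ** 3).mean() / std**3 if std > 0 else 0.0
    return {
        "mean": float(mean),
        "contrast": float(std),  # standard deviation as an assumed contrast measure
        "skewness": float(skewness),
    }


# Gaze concentrated on one pixel versus gaze spread evenly over two pixels.
concentrated = content_feature_amounts([[0.0, 0.0], [0.0, 1.0]])
balanced = content_feature_amounts([[0.0, 1.0], [1.0, 0.0]])
```

A strongly positive skewness indicates that the estimated gaze is concentrated on a small part of the image, whereas a value near zero indicates a more even spread, so comparing these values between a reference image and an inspection image gives one numerical view of how the expert's attention would differ.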

なお、コンテンツ特徴量は、視線特徴の度合いであってもよく、この場合、コンテンツ特徴量は、視線特徴画像そのものである。 The content feature amount may be the degree of gaze feature, in which case the content feature amount is the gaze feature image itself.

図７は、本実施形態による検出支援装置１００Ａが行なう処理の動作例を示すフローチャートである。図７のステップＳ２３、及びＳ２５～Ｓ２８の各々に示す処理については、図２のステップＳ１３、及びＳ１５～Ｓ１８の各々に示す処理と同様であるため、その説明を省略する。 Figure 7 is a flowchart showing an example of the processing performed by the detection support device 100A according to this embodiment. The processing shown in steps S23 and S25 to S28 in Figure 7 is similar to the processing shown in steps S13 and S15 to S18 in Figure 2, and therefore a description thereof will be omitted.

ステップＳ２０：
検出支援装置１００Ａは、コンテンツ画像選択部１０１により選択したコンテンツ画像を視線特徴画像生成部１１７に出力する。
ステップＳ２１：
検出支援装置１００Ａは、視線特徴学習モデル選択部１１６により視線特徴学習モデルを選択する。視線特徴学習モデル選択部１１６は、視線特徴学習モデルＤＢ１１８を参照することにより視線特徴学習モデルを選択し、選択した視線特徴学習モデルを、視線特徴画像生成部１１７に出力する。
ステップＳ２２：
検出支援装置１００Ａは、視線特徴画像生成部１１７により視線特徴画像を生成する。視線特徴画像生成部１１７は、ステップＳ２０にて選択されたコンテンツ画像における視線特徴を、ステップＳ２１にて選択された視線特徴学習モデルを用いて推定することにより視線特徴画像を生成する。視線特徴画像生成部１１７は、生成した視線特徴画像を、コンテンツ特徴量算出部１０５Ａに出力する。
ステップＳ２４：
検出支援装置１００Ａは、コンテンツ特徴量算出部１０５Ａにより、コンテンツ画像におけるコンテンツ特徴量を算出する。コンテンツ特徴量算出部１０５Ａは、ステップＳ２２にて生成された視線特徴画像における、ステップＳ２３にて選択された画像特徴を算出することによりコンテンツ特徴量を算出する。 Step S20:
The detection support device 100A outputs the content image selected by the content image selection unit 101 to the gaze feature image generation unit 117.
Step S21:
The detection support device 100A selects a gaze feature learning model by the gaze feature learning model selection unit 116. The gaze feature learning model selection unit 116 selects a gaze feature learning model by referring to the gaze feature learning model DB 118, and outputs the selected gaze feature learning model to the gaze feature image generation unit 117.
Step S22:
The detection support device 100A generates a gaze feature image by the gaze feature image generation unit 117. The gaze feature image generation unit 117 generates the gaze feature image by estimating the gaze features in the content image selected in step S20 using the gaze feature learning model selected in step S21. The gaze feature image generation unit 117 outputs the generated gaze feature image to the content feature amount calculation unit 105A.
Step S24:
The detection support device 100A calculates a content feature amount in the content image by the content feature amount calculation unit 105A. The content feature amount calculation unit 105A calculates the content feature amount by calculating the image feature selected in step S23 in the gaze feature image generated in step S22.

以上説明したように、第２の実施形態の検出支援装置１００Ａでは、コンテンツ特徴量算出部１０５Ａが、コンテンツ画像に、前記コンテンツ画像を視認する視線の特徴を示す視線特徴を適用することにより、コンテンツ特徴量を算出する。これにより、第２の実施形態の検出支援装置１００Ａによれば、上述した効果と同様の効果を奏する。 As described above, in the detection support device 100A of the second embodiment, the content feature amount calculation unit 105A calculates the content feature amount by applying, to the content image, gaze features that indicate the characteristics of the gaze of the person viewing the content image. As a result, the detection support device 100A of the second embodiment achieves the same effects as those described above.

また、第２の実施形態の検出支援装置１００Ａでは、視線特徴画像生成部１１７を更に備える。視線特徴画像生成部１１７は、コンテンツ画像における画素ごとの視線特徴を、前記画素の位置座標に対応させた視線特徴画像を生成する。コンテンツ特徴量算出部１０５Ａは、視線特徴画像を用いてコンテンツ特徴量を算出する。これにより、第２の実施形態の検出支援装置１００Ａによれば、上述した効果と同様の効果を奏する。 The detection support device 100A of the second embodiment further includes a gaze feature image generation unit 117. The gaze feature image generation unit 117 generates a gaze feature image in which the gaze feature of each pixel in the content image corresponds to the position coordinates of the pixel. The content feature amount calculation unit 105A calculates the content feature amount using the gaze feature image. As a result, the detection support device 100A of the second embodiment achieves the same effects as those described above.

また、第２の実施形態の検出支援装置１００Ａでは、視線特徴は、コンテンツ画像と、当該コンテンツ画像における視線特徴の実績とを対応付けた学習データを用いて機械学習を実行することにより生成された視線特徴学習モデルを用いて推定される。これにより、第２の実施形態の検出支援装置１００Ａによれば、コンテンツ画像における熟練者の視線特徴を、過去の実績に基づいてより精度よく推定することが可能である。 In addition, in the detection support device 100A of the second embodiment, the gaze feature is estimated using a gaze feature learning model generated by performing machine learning using learning data that associates a content image with the performance of the gaze feature in the content image. As a result, the detection support device 100A of the second embodiment makes it possible to more accurately estimate the gaze feature of an expert in a content image based on past performance.

図８は、実施形態の学習装置２００の構成例を示すブロック図である。学習装置２００は、視線特徴学習モデルを生成する装置である。
学習装置２００は、例えば、学習用コンテンツ画像取得部２０１と、視線情報取得部２０２と、学習用視線特徴画像生成部２０３と、深層学習部２０４と、学習用コンテンツ画像ＤＢ２０５と、視線情報記憶部２０６と、視線特徴画像記憶部２０７と、視線特徴学習モデルＤＢ２０８とを備える。 FIG. 8 is a block diagram showing an example of the configuration of a learning device 200 according to an embodiment. The learning device 200 is a device that generates a gaze feature learning model.
The learning device 200 includes, for example, a learning content image acquisition unit 201, a gaze information acquisition unit 202, a learning gaze feature image generation unit 203, a deep learning unit 204, a learning content image DB 205, a gaze information storage unit 206, a gaze feature image storage unit 207, and a gaze feature learning model DB 208.

学習用コンテンツ画像取得部２０１は、学習用コンテンツ画像を取得する。学習用コンテンツ画像は、推定モデルに機械学習を実行する際に用いられる学習データであって、推定モデルの入力層に入力（設定）する情報である。 The learning content image acquisition unit 201 acquires learning content images. The learning content images are learning data used when performing machine learning on an estimation model, and are information to be input (set) to the input layer of the estimation model.

学習用コンテンツ画像取得部２０１は、学習用コンテンツ画像ＤＢ２０５に記憶された複数の学習用コンテンツ画像の中から、学習量に応じてユーザ等により選択された画像の集合を、学習用コンテンツ画像の集合として取得する。ユーザ等による選択の方法は、任意の方法であってよい。学習用コンテンツ画像取得部２０１は、取得した学習用コンテンツ画像を、学習用視線特徴画像生成部２０３に出力する。 From among the plurality of learning content images stored in the learning content image DB 205, the learning content image acquisition unit 201 acquires, as the set of learning content images, the set of images selected by a user or the like according to the amount of learning. The method of selection by the user or the like may be any method. The learning content image acquisition unit 201 outputs the acquired learning content images to the learning gaze feature image generation unit 203.

視線情報取得部２０２は、視線情報（視線特徴）を取得する。視線情報は、学習用コンテンツ画像に対する熟練者の視線に関する情報であって、例えば、学習用コンテンツ画像を視認する熟練者の視線の時系列変化を示す情報である。視線情報取得部２０２は、例えば、視線情報記憶部２０６に記憶された複数の視線情報の中から、学習用コンテンツ画像に対応する視線情報を選択する。視線情報取得部２０２は、取得した視線情報を、学習用視線特徴画像生成部２０３に出力する。 The gaze information acquisition unit 202 acquires gaze information (gaze features). The gaze information is information about the gaze of an expert on a learning content image, and is, for example, information indicating time-series changes in the gaze of an expert viewing the learning content image. The gaze information acquisition unit 202 selects, for example, the gaze information corresponding to the learning content image from among a plurality of pieces of gaze information stored in the gaze information storage unit 206. The gaze information acquisition unit 202 outputs the acquired gaze information to the learning gaze feature image generation unit 203.

学習用視線特徴画像生成部２０３は、学習用コンテンツ画像に、視線情報を適用することにより、学習用視線特徴画像を生成する。学習用視線特徴画像を生成する方法は、視線特徴画像生成部１１７が視線特徴画像を生成する方法と同様であるため、その説明を省略する。学習用視線特徴画像生成部２０３は、生成した学習用視線特徴画像を、深層学習部２０４に出力すると共に、視線特徴画像記憶部２０７に記憶させる。 The learning gaze feature image generation unit 203 generates a learning gaze feature image by applying the gaze information to the learning content image. The method of generating the learning gaze feature image is the same as the method by which the gaze feature image generation unit 117 generates a gaze feature image, and therefore a description thereof is omitted. The learning gaze feature image generation unit 203 outputs the generated learning gaze feature image to the deep learning unit 204 and also stores it in the gaze feature image storage unit 207.
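One possible way to turn time-series gaze information into a per-pixel learning gaze feature image is sketched below. The representation of gaze information as `(time, x, y)` samples and the use of a normalized dwell count as the gaze feature are assumptions made for illustration only; the function name `gaze_feature_image` is hypothetical.

```python
def gaze_feature_image(width, height, gaze_samples):
    """Accumulate gaze samples per pixel and scale the counts into [0, 1].

    gaze_samples: iterable of (time, x, y) tuples (assumed format).
    Samples that fall outside the image are ignored.
    """
    counts = [[0] * width for _ in range(height)]
    for _time, x, y in gaze_samples:
        if 0 <= x < width and 0 <= y < height:
            counts[y][x] += 1
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0.0] * width for _ in range(height)]
    return [[count / peak for count in row] for row in counts]


# Four samples on a 3x2 image; the last one lies outside the image.
samples = [(0.00, 1, 0), (0.02, 1, 0), (0.04, 2, 1), (0.06, 9, 9)]
image = gaze_feature_image(3, 2, samples)
```

Each value in `image` is tied to the position coordinates of its pixel, so the result can be paired directly with the learning content image as learning data.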

深層学習部２０４は、学習用視線特徴画像を学習データとした学習（深層学習）を行うことにより、視線特徴学習モデルを生成する。深層学習部２０４は、生成した視線特徴学習モデルを視線特徴学習モデルＤＢ２０８に記憶させる。 The deep learning unit 204 generates a gaze feature learning model by performing learning (deep learning) using the learning gaze feature image as learning data. The deep learning unit 204 stores the generated gaze feature learning model in the gaze feature learning model DB 208.

図９は、本実施形態による学習装置２００が行なう処理の動作例を示すフローチャートである。図９では、深層学習による視線特徴学習モデルを生成する処理の動作の流れが示される。
ステップＳ５０：
学習装置２００は、学習用コンテンツ画像取得部２０１により、学習用コンテンツ画像を取得する。学習用コンテンツ画像取得部２０１は、学習用コンテンツ画像を多数取得することが好ましい。一般に、学習データ（学習用コンテンツ画像）を数多くバリエーション豊富に揃えることで良い学習効果が得られるためである。
ステップＳ５１：
学習装置２００は、視線情報取得部２０２により、学習用コンテンツ画像に対応する視線情報を取得する。
ステップＳ５２：
学習装置２００は、学習用視線特徴画像生成部２０３により、ステップＳ５０で取得した学習用コンテンツ画像に、ステップＳ５１で取得した視線情報を適用することにより、学習用視線特徴画像を生成する。
ステップＳ５３：
学習装置２００は、ステップＳ５０で取得した学習用コンテンツ画像の全てにおいて、学習用視線特徴画像を生成したか否かを判定する。学習装置２００は、学習用コンテンツ画像の全てにおいて、学習用視線特徴画像を生成した場合には、ステップＳ５４に示す処理を実行する。学習装置２００は、学習用コンテンツ画像の全てにおいて、学習用視線特徴画像を生成していない場合には、ステップＳ５１に示す処理に戻る。
ステップＳ５４：
学習装置２００は、学習用視線特徴画像を学習データとして深層学習を実行することにより、視線特徴学習モデルを生成する。 FIG. 9 is a flowchart showing an example of the processing performed by the learning device 200 according to this embodiment. FIG. 9 shows the flow of the process of generating a gaze feature learning model by deep learning.
Step S50:
The learning device 200 acquires learning content images using the learning content image acquisition unit 201. The learning content image acquisition unit 201 preferably acquires a large number of learning content images, because, in general, a good learning effect is obtained by preparing learning data (learning content images) that is both large in quantity and rich in variation.
Step S51:
The learning device 200 acquires gaze information corresponding to a learning content image using the gaze information acquisition unit 202.
Step S52:
The learning device 200 uses the learning gaze feature image generation unit 203 to generate a learning gaze feature image by applying the gaze information acquired in step S51 to the learning content image acquired in step S50.
Step S53:
The learning device 200 determines whether learning gaze feature images have been generated for all of the learning content images acquired in step S50. If they have been generated for all of the learning content images, the learning device 200 executes the process shown in step S54. If not, the learning device 200 returns to the process shown in step S51.
Step S54:
The learning device 200 generates a gaze feature learning model by performing deep learning using the learning gaze feature images as learning data.

図１０は、本実施形態による学習装置２００が行なう処理の動作例を示すフローチャートである。図１０では、深層学習による視線特徴学習モデル（以下、単に学習モデルともいう）について転移学習を行うことで新たな学習モデルを生成する処理の動作の流れが示される。
ステップＳ６０：
学習装置２００は、深層学習部２０４により、推定モデルの入力層及び出力層を構成する。推定モデルは、中間層（プーリング層及び畳み込み層）が多層構造の深層学習モデルである。入力層には、学習用コンテンツ画像における各画素の情報が入力される。出力層は、正規化する全結合層である。この出力層は、「１」あるいは「０」との間の小数点の数値を出力する構成となっている。
ステップＳ６１：
深層学習部２０４は、深層学習を用いて新たな学習モデルを生成するか、あるいは既存の汎用的な学習モデルを用いた転移学習により新たな学習モデルを生成するか、を判定する。深層学習部２０４は、例えば、検出支援装置１００Ａによる学習モデルの選択が実行される際に、係る判定を行う。 Fig. 10 is a flowchart showing an example of the operation of the process performed by the learning device 200 according to the present embodiment. Fig. 10 shows the flow of the operation of the process of generating a new learning model by performing transfer learning on a gaze feature learning model by deep learning (hereinafter, also simply referred to as a learning model).
Step S60:
The learning device 200 uses the deep learning unit 204 to configure the input layer and the output layer of an estimation model. The estimation model is a deep learning model whose intermediate layers (pooling layers and convolution layers) form a multi-layer structure. Information on each pixel in the learning content image is input to the input layer. The output layer is a fully connected layer that performs normalization, and is configured to output a decimal value between "0" and "1".
Step S61:
The deep learning unit 204 determines whether to generate a new learning model using deep learning or to generate a new learning model by transfer learning using an existing general-purpose learning model. The deep learning unit 204 makes this determination, for example, when the detection support device 100A selects a learning model.

例えば、深層学習部２０４は、学習用コンテンツ画像を多量に用意できる状況において、視線特徴学習モデルを生成する場合を考える。この場合、各学習用コンテンツ画像に対して、熟練者の視線情報を取得し、正解コンテンツ集合（学習データ）を生成する。その後、深層学習部２０４は、学習用コンテンツ画像の集合と、正解コンテンツ集合とを用いて、深層学習モデル（推定モデル）を機械学習により学習させ、つまり、新規の学習により視線特徴学習モデルを生成する。
一方、深層学習部２０４は、学習用コンテンツ画像を多量に用意できない状況において、視線特徴学習モデルを生成する場合、すでに深層学習により生成された、他の学習用コンテンツ画像に対応する視線特徴学習モデルを転移学習させることにより、視線特徴学習モデルを生成する。なお、学習用コンテンツ画像が多量に用意できる状況であるか否かは、例えば、学習用コンテンツ画像ＤＢ２０５に記憶された学習用コンテンツ画像の数に応じて、或いはユーザの選択操作に応じて判定される。
深層学習部２０４は、新規の学習により視線特徴学習モデルを生成する場合、ステップＳ６５に示す処理を実行する。学習装置２００は、転移学習により視線特徴学習モデルを生成する場合、ステップＳ６２に示す処理を実行する。 For example, consider a case where the deep learning unit 204 generates a gaze feature learning model in a situation where a large amount of learning content images can be prepared. In this case, gaze information of an expert is acquired for each learning content image, and a set of correct answer contents (learning data) is generated. After that, the deep learning unit 204 trains the deep learning model (estimation model) by machine learning using the set of learning content images and the set of correct answer contents, that is, generates a gaze feature learning model by new learning.
On the other hand, when generating a gaze feature learning model in a situation where a large amount of learning content images cannot be prepared, the deep learning unit 204 generates the gaze feature learning model by transfer learning of a gaze feature learning model corresponding to another learning content image that has already been generated by deep learning. Note that whether or not a large amount of learning content images can be prepared is determined, for example, according to the number of learning content images stored in the learning content image DB 205 or according to a selection operation by the user.
When generating a gaze feature learning model by new learning, the deep learning unit 204 executes the process shown in step S65. When generating a gaze feature learning model by transfer learning, the learning device 200 executes the process shown in step S62.
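The determination in step S61 can be sketched as a small decision function. The threshold of 1000 images, the function name `choose_training_method`, and the string labels are hypothetical; the text only states that the decision depends on the number of learning content images stored in the learning content image DB 205 or on a selection operation by the user.

```python
def choose_training_method(num_learning_images,
                           min_images_for_new_learning=1000,
                           user_choice=None):
    """Return "new" for learning from scratch or "transfer" for transfer learning.

    An explicit user selection takes priority; otherwise the number of
    available learning content images is compared with an assumed threshold.
    """
    if user_choice in ("new", "transfer"):
        return user_choice
    if num_learning_images >= min_images_for_new_learning:
        return "new"
    return "transfer"
```

For example, `choose_training_method(50)` selects transfer learning, while `choose_training_method(50, user_choice="new")` follows the user's selection.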

ステップＳ６２：
深層学習部２０４は、視線特徴学習モデルＤＢ２０８に記憶されている学習モデルの中から、所定の学習モデルを選択する。例えば、深層学習部２０４は、ユーザにより選択された学習用コンテンツ画像の集合に対して、他の学習用コンテンツ画像の集合に対して学習済みの学習モデルを選択する。深層学習部２０４は、選択した学習モデルを転移学習に用いる深層学習モデルとして取得する。
ステップＳ６３：
深層学習部２０４は、ステップＳ６２で転移学習に用いるために読み出した深層学習モデルから、入力層からユーザが指定あるいは予め指定されている中間層（適合層）までを、転移学習モデルとして抽出する。そして、深層学習部２０４は、深層学習モデルから、上記適合層以降の中間層を抽出し、上記転移学習モデルの適合層に接続し、かつ出力層を接続することにより、転移学習用深層学習モデルを構成する。
ステップＳ６４：
深層学習部２０４は、学習対象モデル（上記転移学習用深層学習モデルあるいは上記深層学習モデル）の入力層に、学習用コンテンツ画像における熟練者の視線情報に基づき注目度が高いと判断される画素を入力した場合に、出力層から注目度が高いことを示す「１」に近い数値が出力されるように、各ネットワークの層の重みパラメタの最適化処理を行う。また、深層学習部２０４は、学習対象モデルの入力層に、学習用コンテンツ画像における熟練者の注目度が低いと判断される画素を入力した場合に、出力層から注目度が低いことを示す「０」に近い数値が出力されるよう最適化処理を行う。すなわち、深層学習部２０４は、学習用コンテンツ画像に対し、クラス分類の機械学習を行い、学習結果として、視線特徴画像を生成する。 Step S62:
The deep learning unit 204 selects a predetermined learning model from among the learning models stored in the gaze feature learning model DB 208. For example, for a set of learning content images selected by a user, the deep learning unit 204 selects a learning model that has been trained for other sets of learning content images. The deep learning unit 204 acquires the selected learning model as a deep learning model to be used for transfer learning.
Step S63:
The deep learning unit 204 extracts, from the deep learning model read out in step S62 for use in transfer learning, the layers from the input layer to the intermediate layer (adaptation layer) designated by the user or designated in advance as a transfer learning model. The deep learning unit 204 then extracts the intermediate layers subsequent to the adaptation layer from the deep learning model, connects them to the adaptation layer of the transfer learning model, and connects the output layer to configure a deep learning model for transfer learning.
Step S64:
The deep learning unit 204 performs optimization processing of the weight parameters of the layers of each network so that, when a pixel determined to have a high degree of attention based on gaze information of an expert in a learning content image is input to the input layer of the learning target model (the deep learning model for transfer learning or the deep learning model), a numerical value close to "1" indicating a high degree of attention is output from the output layer. Also, the deep learning unit 204 performs optimization processing so that, when a pixel determined to have a low degree of attention of an expert in a learning content image is input to the input layer of the learning target model, a numerical value close to "0" indicating a low degree of attention is output from the output layer. That is, the deep learning unit 204 performs machine learning of class classification on the learning content image and generates a gaze feature image as a learning result.
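The model construction in step S63 can be illustrated with a layer list, as in the sketch below. The dictionary representation of layers, the function name `build_transfer_model`, and in particular the `trainable` flag (freezing the layers up to the adaptation layer and retraining the rest) are assumptions for illustration; the text does not state which layers are updated during the optimization of step S64.

```python
def build_transfer_model(base_layers, adaptation_layer_name, output_layer):
    """Assemble a model for transfer learning from an existing model.

    base_layers: input and intermediate layers of the existing model, in order.
    Layers up to and including the adaptation layer are carried over as the
    transfer learning model; the later intermediate layers and a new output
    layer are connected after it.
    """
    names = [layer["name"] for layer in base_layers]
    split = names.index(adaptation_layer_name) + 1
    # Assumption: carried-over layers are frozen, later layers are retrained.
    transferred = [dict(layer, trainable=False) for layer in base_layers[:split]]
    later = [dict(layer, trainable=True) for layer in base_layers[split:]]
    return transferred + later + [dict(output_layer, trainable=True)]


base = [{"name": "input"}, {"name": "conv1"}, {"name": "pool1"}, {"name": "conv2"}]
model = build_transfer_model(base, "pool1", {"name": "output"})
```

The function copies each layer description, so the existing model stored in the gaze feature learning model DB 208 is left unchanged.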

このとき、深層学習部２０４は、生成した学習モデルに対し、学習用コンテンツとは異なる学習用コンテンツ画像の集合と、それらの画像に対する熟練者の視線情報である正解データ集合との組を入力し、生成した学習モデルに対して学習テストを行うようにしてもよい。
この場合、深層学習部２０４は、学習用コンテンツ画像の集合を、学習モデルに入力した際、出力層の出力する数値が予め設定した第１閾値以上となり、かつ、出力層の出力する数値が予め設定した第２閾値以下となった場合、この学習モデルを視線特徴学習モデルＤＢ２０８に記憶し、視線特徴学習モデルとする。
一方、深層学習部２０４は、上記学習テストにおいて、熟練者の視線が集中する画素に対して学習モデルの出力層の出力する数値が予め設定した第１閾値未満、あるいは検査員の視線が集中しにくい画素に対して、学習対象モデルの出力層の出力する数値が予め設定した第２閾値以上である場合、この学習モデルを視線特徴学習モデルＤＢ２０８に記憶せずに、学習モデルの再学習を行う。 At this time, the deep learning unit 204 may input a set of learning content images different from the learning content and a set of correct answer data which is gaze information of an expert on those images to the generated learning model, and conduct a learning test on the generated learning model.
In this case, when the set of learning content images is input to the learning model, the deep learning unit 204 stores the learning model in the gaze feature learning model DB 208 as a gaze feature learning model if the value output from the output layer is equal to or greater than a preset first threshold for pixels on which the expert's gaze is concentrated, and is equal to or less than a preset second threshold for pixels on which the gaze is not easily concentrated.
On the other hand, if, in the above-mentioned learning test, the value output by the output layer of the learning model for a pixel on which the expert's gaze is focused is less than a preset first threshold, or the value output by the output layer of the model to be learned for a pixel on which the inspector's gaze is not easily focused is equal to or greater than a preset second threshold, the deep learning unit 204 does not store this learning model in the gaze feature learning model DB 208 and re-learns the learning model.
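The learning test can be sketched as a per-pixel check against the two thresholds. The threshold values 0.7 and 0.3, the function name `passes_learning_test`, and the boolean attention mask are hypothetical; the rejection conditions follow the re-learning conditions described above.

```python
def passes_learning_test(predicted, attended,
                         first_threshold=0.7, second_threshold=0.3):
    """Return True if the model may be stored, False if it should be re-learned.

    predicted: 2D list of output-layer values in [0, 1], one per pixel.
    attended:  2D list of booleans, True where the gaze was concentrated.
    """
    for predicted_row, attended_row in zip(predicted, attended):
        for value, is_attended in zip(predicted_row, attended_row):
            if is_attended and value < first_threshold:
                return False  # attended pixel scored too low
            if not is_attended and value >= second_threshold:
                return False  # unattended pixel scored too high
    return True


attended_mask = [[True, False], [False, False]]
good_output = [[0.9, 0.1], [0.2, 0.0]]
bad_output = [[0.5, 0.1], [0.2, 0.0]]
```

With these values, `passes_learning_test(good_output, attended_mask)` accepts the model, and `passes_learning_test(bad_output, attended_mask)` rejects it because the attended pixel falls below the first threshold.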

ステップＳ６５：
深層学習部２０４は、ステップＳ６４で生成した学習モデルから、多層構造の中間層におけるプーリング層及び畳み込み層の出力パラメタ、活性化関数の種類と出力されるパラメタなどの各々を、学習モデルのパラメタとして抽出する。 Step S65:
From the learning model generated in step S64, the deep learning unit 204 extracts, as parameters of the learning model, the output parameters of the pooling layer and convolution layer in the intermediate layer of the multi-layer structure, the type of activation function and the output parameters, etc.

ステップＳ６６：
深層学習部２０４は、生成した学習モデルと、抽出した学習モデルパラメタとを視線特徴学習モデルＤＢ２０８に記憶させる（登録処理）。 Step S66:
The deep learning unit 204 stores the generated learning model and the extracted learning model parameters in the gaze feature learning model DB 208 (registration process).

＜第３の実施形態＞
次に第３の実施形態について説明する。本実施形態では、視覚特徴と視線特徴とを用いて、コンテンツ特徴量を算出する点において、上述した実施形態と相違する。これにより、本実施形態の検出支援装置１００Ｂは、人間の視知覚の情報処理に類似した処理を施すこと、及び人間の視線情報を利用することができ、コンテンツ画像の見え方について、より詳細な情報を提示することができる。本実施形態においては、上述した実施形態と異なる構成についてのみ説明し、上述した実施形態の構成と同様の構成については同一の符号を付し、特に必要な場合を除いてその説明を省略する。 Third Embodiment
Next, a third embodiment will be described. This embodiment differs from the above-mentioned embodiment in that a content feature amount is calculated using visual features and gaze features. As a result, the detection support device 100B of this embodiment can perform processing similar to information processing of human visual perception and use human gaze information, and can present more detailed information about how the content image appears. In this embodiment, only the configuration different from the above-mentioned embodiment will be described, and the same reference numerals will be used for the configuration similar to the configuration of the above-mentioned embodiment, and the description thereof will be omitted unless particularly necessary.

図１１は、第３の実施形態による検出支援装置１００Ｂの構成例を示すブロック図である。検出支援装置１００Ｂは、例えば、視覚特徴視線特徴算出部１１９と、コンテンツ特徴量算出部１０５Ｂとを備える。 FIG. 11 is a block diagram showing an example of the configuration of a detection support device 100B according to the third embodiment. The detection support device 100B includes, for example, a visual feature/gaze feature calculation unit 119 and a content feature amount calculation unit 105B.

視覚特徴視線特徴算出部１１９は、視覚特徴画像と視線特徴画像とを用いて、視覚特徴視線特徴を算出する。視覚特徴視線特徴は、視覚特徴と視線特徴との双方の度合いを示す情報である。視覚特徴視線特徴算出部１１９は、例えば、視覚特徴と視線特徴との間で演算を行うことにより、視覚特徴視線特徴を算出する。ここでの演算には、例えば、視覚特徴と視線特徴との論理積（ＡＮＤ）、論理和（ＯＲ）、排他的論理和（ＸＯＲ）等の各種論理演算や、Ｗｉｎｎｅｒｓｔａｋｅａｌｌ演算や、ビット演算、四則演算等が含まれる。 The visual feature/gaze feature calculation unit 119 calculates the visual feature/gaze feature using the visual feature image and the gaze feature image. The visual feature/gaze feature is information indicating the degree of both the visual feature and the gaze feature. The visual feature/gaze feature calculation unit 119 calculates the visual feature/gaze feature by, for example, performing an operation between the visual feature and the gaze feature. The operations here include, for example, various logical operations between the visual feature and the gaze feature such as logical AND, logical OR, and exclusive OR (XOR), as well as winner-take-all operations, bitwise operations, and the four arithmetic operations.

視覚特徴視線特徴算出部１１９は、視覚特徴と視線特徴との間で演算を行う際に、特徴ごと、或いは画素ごとに重みづけを行ってもよい。 When performing calculations between visual features and gaze features, the visual feature/gaze feature calculation unit 119 may weight each feature or each pixel.
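The pixel-wise operations between a visual feature image and a gaze feature image can be sketched as follows. This is a minimal sketch under assumptions: both images hold values in [0, 1], logical operations are applied after binarizing at a hypothetical threshold of 0.5, winner-take-all is interpreted as keeping the larger of the two values per pixel, and the weighting is a single pair of per-feature weights. The function name `combine` is hypothetical.

```python
def combine(visual, gaze, op, w_visual=0.5, w_gaze=0.5, threshold=0.5):
    """Combine two equally sized 2D feature images pixel by pixel."""

    def is_on(value):
        return value >= threshold

    operations = {
        "and": lambda v, g: float(is_on(v) and is_on(g)),
        "or": lambda v, g: float(is_on(v) or is_on(g)),
        "xor": lambda v, g: float(is_on(v) != is_on(g)),
        "winner_take_all": lambda v, g: max(v, g),  # assumed interpretation
        "weighted_sum": lambda v, g: w_visual * v + w_gaze * g,
    }
    operation = operations[op]
    return [
        [operation(v, g) for v, g in zip(visual_row, gaze_row)]
        for visual_row, gaze_row in zip(visual, gaze)
    ]


visual_image = [[0.9, 0.2]]
gaze_image = [[0.8, 0.6]]
both = combine(visual_image, gaze_image, "and")
either = combine(visual_image, gaze_image, "or")
```

Per-pixel weighting could be added by passing weight images instead of the scalar `w_visual` and `w_gaze`.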

視覚特徴視線特徴算出部１１９により用いられる視覚特徴画像は、視覚特徴画像生成部１０３により生成された画像である。視覚特徴視線特徴算出部１１９により用いられる視線特徴画像は、視線特徴画像生成部１１７により生成された画像である。視覚特徴視線特徴算出部１１９は、生成した視覚特徴視線特徴を、コンテンツ特徴量算出部１０５Ｂに出力する。 The visual feature image used by the visual feature/gaze feature calculation unit 119 is an image generated by the visual feature image generation unit 103. The gaze feature image used by the visual feature/gaze feature calculation unit 119 is an image generated by the gaze feature image generation unit 117. The visual feature/gaze feature calculation unit 119 outputs the generated visual feature/gaze feature to the content feature amount calculation unit 105B.

コンテンツ特徴量算出部１０５Ｂは、視覚特徴視線特徴に画像特徴を適用することによりコンテンツ特徴量を算出する。本実施形態のコンテンツ特徴量は、例えば、視覚特徴視線特徴における画像上の特徴を統計的に示す統計量である。 The content feature amount calculation unit 105B calculates the content feature amount by applying the image feature to the visual feature/gaze feature. The content feature amount in this embodiment is, for example, a statistical amount that statistically indicates the image features in the visual feature/gaze feature.

コンテンツ特徴量は、例えば、画像特徴としてコントラストが選択された場合、コンテンツ画像において、視覚特徴視線特徴がいかなるコントラストを形成しているかを示す値となる。ここで用いられる視覚特徴視線特徴は、視覚特徴視線特徴算出部１１９により生成された情報である。ここで用いられる画像特徴は、画像特徴選択部１０４により選択された画像特徴である。
なお、コンテンツ特徴量は、視覚特徴視線特徴そのものであってもよい。 For example, when contrast is selected as the image feature, the content feature amount is a value indicating what kind of contrast the visual feature/gaze feature forms in the content image. The visual feature/gaze feature used here is information generated by the visual feature/gaze feature calculation unit 119. The image feature used here is the image feature selected by the image feature selection unit 104.
The content feature amount may be the visual feature/gaze feature itself.

図１２は、本実施形態による検出支援装置１００Ｂが行なう処理の動作例を示すフローチャートである。図１２のステップＳ３１、Ｓ３２、Ｓ３６、及びＳ３８～Ｓ４１の各々に示す処理については、図２のステップＳ１１、Ｓ１２、Ｓ１３、及びＳ１５～Ｓ１８の各々に示す処理と同様であるため、その説明を省略する。また、図１２のステップＳ３３、Ｓ３４に示す処理については、図７のステップＳ２１、Ｓ２２に示す処理と同様であるため、その説明を省略する。 Figure 12 is a flowchart showing an example of the processing performed by the detection support device 100B according to this embodiment. The processing shown in each of steps S31, S32, S36, and S38 to S41 in Figure 12 is the same as the processing shown in each of steps S11, S12, S13, and S15 to S18 in Figure 2, and therefore a description thereof is omitted. Likewise, the processing shown in steps S33 and S34 in Figure 12 is the same as the processing shown in steps S21 and S22 in Figure 7, and therefore a description thereof is omitted.

ステップＳ３０：
検出支援装置１００Ｂは、コンテンツ画像選択部１０１により取得したコンテンツ画像を、視覚特徴画像生成部１０３、及び視線特徴画像生成部１１７に出力する。
ステップＳ３５：
検出支援装置１００Ｂは、視覚特徴視線特徴算出部１１９により、Ｓ３２で生成した視覚特徴画像、及びステップＳ３４で生成した視線特徴画像を用いて、視覚特徴視線特徴を算出し、算出した視覚特徴視線特徴をコンテンツ特徴量算出部１０５Ｂに出力する。
ステップＳ３７：
検出支援装置１００Ｂは、コンテンツ特徴量算出部１０５Ｂにより、ステップＳ３５で算出した視覚特徴視線特徴に、画像特徴を適用することにより、コンテンツ特徴量を算出する。 Step S30:
The detection support device 100B outputs the content image acquired by the content image selection unit 101 to the visual feature image generation unit 103 and the gaze feature image generation unit 117.
Step S35:
The detection support device 100B uses the visual feature/gaze feature calculation unit 119 to calculate the visual feature/gaze feature from the visual feature image generated in step S32 and the gaze feature image generated in step S34, and outputs the calculated visual feature/gaze feature to the content feature amount calculation unit 105B.
Step S37:
The detection support device 100B uses the content feature amount calculation unit 105B to calculate the content feature amount by applying the image feature to the visual feature/gaze feature calculated in step S35.

以上説明したように、第３の実施形態の検出支援装置１００Ｂは、視覚特徴視線特徴算出部１１９を備える。視覚特徴視線特徴算出部１１９は、視覚特徴画像と視線特徴画像とを用いて、視覚特徴視線特徴を算出する。これにより、第３の実施形態の検出支援装置１００Ｂによれば、人間の視知覚の情報処理に類似した処理を施すこと、及び人間の視線情報を利用することができ、コンテンツ画像の見え方について、より詳細な情報を提示することができる。 As described above, the detection support device 100B of the third embodiment includes a visual feature gaze feature calculation unit 119. The visual feature gaze feature calculation unit 119 calculates visual feature gaze features using a visual feature image and a gaze feature image. As a result, the detection support device 100B of the third embodiment can perform processing similar to the information processing of human visual perception and utilize human gaze information, making it possible to present more detailed information about how the content image appears.

なお、本発明における検出支援装置１００（１００Ａ、１００Ｂ）の全部または一部の機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませて実行することにより処理を行なってもよい。なお、ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。
また、「コンピュータシステム」は、ホームページ提供環境（あるいは表示環境）を備えたＷＷＷシステムも含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムが送信された場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリ（ＲＡＭ）のように、一定時間プログラムを保持しているものも含むものとする。 In addition, a program for realizing all or part of the functions of the detection support device 100 (100A, 100B) of the present invention may be recorded on a computer-readable recording medium, and the processing may be performed by having a computer system read and execute the program recorded on the recording medium. Note that the term "computer system" here includes an OS and hardware such as peripheral devices.
The "computer system" also includes a WWW system equipped with a homepage providing environment (or display environment). The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, as well as storage devices such as hard disks built into computer systems. Furthermore, the "computer-readable recording medium" also includes media that hold a program for a certain period of time, such as volatile memory (RAM) inside a computer system that serves as a server or client when a program is transmitted via a network such as the Internet or a communication line such as a telephone line.

また、上記プログラムは、このプログラムを記憶装置等に格納したコンピュータシステムから、伝送媒体を介して、あるいは、伝送媒体中の伝送波により他のコンピュータシステムに伝送されてもよい。ここで、プログラムを伝送する「伝送媒体」は、インターネット等のネットワーク（通信網）や電話回線等の通信回線（通信線）のように情報を伝送する機能を有する媒体のことをいう。また、上記プログラムは、前述した機能の一部を実現するためのものであってもよい。さらに、前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるもの、いわゆる差分ファイル（差分プログラム）であってもよい。 The above program may also be transmitted from a computer system in which the program is stored in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium that has the function of transmitting information, such as a network (communication network) such as the Internet or a communication line (communication line) such as a telephone line. The above program may also be one that realizes part of the above-mentioned functions. Furthermore, it may be a so-called difference file (difference program) that can realize the above-mentioned functions in combination with a program already recorded in the computer system.

本発明のいくつかの実施形態を説明したが、これらの実施形態は、例として提示したものであり、発明の範囲を限定することは意図していない。これら実施形態は、その他の様々な形態で実施されることが可能であり、発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更を行うことができる。これら実施形態やその変形は、発明の範囲や要旨に含まれると同様に、特許請求の範囲に記載された発明とその均等の範囲に含まれるものである。 Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, substitutions, and modifications can be made without departing from the gist of the invention. These embodiments and their modifications are within the scope of the invention and its equivalents as set forth in the claims, as well as the scope and gist of the invention.

１００、１００Ａ、１００Ｂ…検出支援装置
１０１…コンテンツ画像選択部
１０２…視覚特徴選択部
１０３…視覚特徴画像生成部
１０４…画像特徴選択部
１０５、１０５Ａ、１０５Ｂ…コンテンツ特徴量算出部
１０６…解析方法選択部
１０７…解析部
１０８…コンテンツ画像ＤＢ
１０９…視覚特徴ＤＢ
１１０…視覚特徴画像記憶部
１１１…画像特徴ＤＢ
１１２…コンテンツ特徴量記憶部
１１３…解析方法ＤＢ
１１４…解析結果記憶部
１１５…解析結果出力部
１１６…視線特徴学習モデル選択部
１１７…視線特徴画像生成部
１１８…視線特徴学習モデルＤＢ
１１９…視覚特徴視線特徴算出部
Reference Signs List
100, 100A, 100B: Detection support device
101: Content image selection unit
102: Visual feature selection unit
103: Visual feature image generation unit
104: Image feature selection unit
105, 105A, 105B: Content feature amount calculation unit
106: Analysis method selection unit
107: Analysis unit
108: Content image DB
109: Visual feature DB
110: Visual feature image storage unit
111: Image feature DB
112: Content feature amount storage unit
113: Analysis method DB
114: Analysis result storage unit
115: Analysis result output unit
116: Gaze feature learning model selection unit
117: Gaze feature image generation unit
118: Gaze feature learning model DB
119: Visual feature/gaze feature calculation unit

Claims

A detection support device that supports detection of the presence or absence of an unexpected pattern caused by the continuity of a pattern in a content image generated by repeatedly arranging a predetermined pattern, the detection support device comprising:
a content image acquisition unit that acquires the content image;
a gaze feature image generation unit that uses a gaze feature learning model to estimate gaze features, which are characteristics of the gaze when detecting the unexpected pattern in the content image, and generates a gaze feature image indicating the estimated gaze features for each pixel in the content image; and
an analysis result output unit that displays the gaze feature image,
wherein the gaze feature learning model is a trained model that has learned a correspondence between a content image for learning and the gaze features of a person who detected the unexpected pattern in the content image for learning.

The detection support device according to claim 1, further comprising:
a visual feature image generation unit that generates, from the content image, a visual feature image representing a visual feature, which is a feature of brightness or color; and
a visual feature/gaze feature calculation unit that performs an operation between the visual feature and the gaze feature using the visual feature image and the gaze feature image, and takes the result of the operation as a visual feature/gaze feature,
wherein the analysis result output unit outputs the visual feature/gaze feature as the gaze and visual features when detecting the unexpected pattern in the content image.

The detection support device according to claim 2, wherein the visual feature image generation unit generates the visual feature image by further using, as the visual feature, a recognition index representing how easily the content image is recognized by the human eye.

The detection support device according to claim 3, wherein the recognition index includes at least one of a visual attention model, a gaze prediction model, and a saliency model.

A learning device that generates a gaze feature learning model to be used by a detection support device that supports detection of the presence or absence of an unexpected pattern caused by the continuity of a pattern in a content image generated by repeatedly arranging a predetermined pattern,
wherein the gaze feature learning model is a trained model that has learned a correspondence between a content image for learning and gaze features, which are characteristics of the gaze of a person who detected the unexpected pattern in the content image for learning,
the learning device comprising a deep learning unit that is configured to be able to select either deep learning or transfer learning using an existing learning model as the method for generating the gaze feature learning model, and that generates the gaze feature learning model using the selected method.

The learning device according to claim 5, further comprising a learning content image database that stores the content images for learning,
wherein the deep learning unit determines whether to select deep learning or transfer learning using an existing learning model according to the number of content images for learning stored in the learning content image database.

A detection support method performed by a computer serving as a detection support device that supports detection of the presence or absence of an unexpected pattern caused by the continuity of a pattern in a content image generated by repeatedly arranging a predetermined pattern, the method comprising:
acquiring, by a content image acquisition unit, the content image;
estimating, by a gaze feature image generation unit using a gaze feature learning model, gaze features, which are characteristics of the gaze when detecting the unexpected pattern in the content image, and generating a gaze feature image indicating the estimated gaze features for each pixel in the content image; and
displaying, by an analysis result output unit, the gaze feature image,
wherein the gaze feature learning model is a trained model that has learned a correspondence between a content image for learning and the gaze features of a person who detected the unexpected pattern in the content image for learning.

A program for causing a computer to function as the detection support device described in claim 1.

A learning method performed by a computer serving as a learning device that generates a gaze feature learning model used by a detection support device that supports detection of the presence or absence of an unexpected pattern caused by the continuity of a pattern in a content image generated by repeatedly arranging a predetermined pattern,
wherein the gaze feature learning model is a trained model that has learned a correspondence between a content image for learning and gaze features, which are characteristics of the gaze of a person who detected the unexpected pattern in the content image for learning, and
a deep learning unit, which is configured to be able to select either deep learning or transfer learning using an existing learning model as the method for generating the gaze feature learning model, generates the gaze feature learning model using the selected method.

A program for causing a computer to function as the learning device described in claim 5.