JP7441107B2

JP7441107B2 - Learning device, representative image extraction device and program

Info

Publication number: JP7441107B2
Application number: JP2020075676A
Authority: JP
Inventors: 桃子前澤; 貴裕望月; 伶遠藤
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2020-04-21
Filing date: 2020-04-21
Publication date: 2024-02-29
Anticipated expiration: 2040-04-21
Also published as: JP2021174117A

Description

The present invention relates to a learning device, a representative image extraction device, and a program for use in the field of video analysis, in which representative images are extracted from video.

BACKGROUND ART
Broadcasting stations have been enriching their program homepages (HP) with the aim of improving viewer contact rates. A program homepage often displays a plurality of representative images extracted from the program video so that visitors can get a rough idea of the program content.

However, extracting representative images from a program video requires considerable effort. For this reason, methods have been proposed for automatically extracting representative images from program videos (see, for example, Patent Document 1 and Non-Patent Document 1).

For example, the method of Patent Document 1 calculates the degree of association between images in an image set based on the results of identifying human faces, scenes, and objects, on GPS (Global Positioning System) information, and on similarity, and then extracts a representative image based on the degree of association and the shooting date.

The method of Non-Patent Document 1 uses a pre-trained GoogLeNet neural network to judge whether the artistic quality of an image is high or low.

Patent Document 1: Japanese Patent No. 6149015

Non-Patent Document 1: Xin Jin, et al., "ILGNet: Inception modules with connected local and global features for efficient image aesthetic quality classification using domain adaptation," IET Computer Vision 13.2 (2018): 206-212.

However, when representative images are extracted from a program video, the method of Patent Document 1 requires special information such as GPS information and the shooting date. It also focuses only on some elements of the image, such as objects and faces, and does not consider the artistic quality of the image as a whole. The method of Non-Patent Document 1, in turn, does not take program production know-how into account.

As a result, a program homepage created with such representative images is not necessarily effective and may fail to present the program content effectively to visitors.

SUMMARY OF THE INVENTION
The present invention has been made to solve the above problem. Its object is to provide a learning device, a representative image extraction device, and a program that make it possible to extract, from a program video, representative images that reflect program production know-how.

To solve the above problem, the learning device of claim 1 is a learning device that trains a neural network, wherein a frame image obtained by sampling a learning program video is a program image, a score at one of a plurality of levels assigned to the program image is a first correct score, a score at one of a plurality of levels assigned to a predetermined image is a second correct score, and the neural network is a model to which the program image and the predetermined image are input alternately and which outputs a one-dimensional score. The learning device comprises: a memory that stores program learning data consisting of the program image and the first correct score, and predetermined learning data consisting of the predetermined image and the second correct score; and a learning unit that reads the program learning data and the predetermined learning data from the memory and trains the neural network using them. The learning unit comprises: a neural network unit that uses the neural network to calculate, as a first score, a one-dimensional score of the program image from the program image included in the program learning data, and to calculate, as a second score, a one-dimensional score of the predetermined image from the predetermined image included in the predetermined learning data; an error calculation unit that calculates, as a first error, the error between the first score calculated by the neural network unit and the first correct score included in the program learning data, and calculates, as a second error, the error between the second score and the second correct score included in the predetermined learning data; and a parameter update unit that updates the parameters of the neural network so that the sum of the first error and the second error calculated by the error calculation unit becomes smaller.

The learning device of claim 2 is the learning device according to claim 1, further comprising a program learning data generation unit that generates the program learning data. The program learning data generation unit comprises: a sampling processing unit that samples the learning program video into the program images; a download processing unit that accesses the URL of the homepage of the program corresponding to the learning program video and downloads still images of the program; a similarity calculation unit that calculates, for each program image sampled by the sampling processing unit, the similarity to the still images downloaded by the download processing unit; and a first correct score assigning unit that assigns the first correct score to the program image based on the similarity calculated by the similarity calculation unit and stores the program learning data, consisting of the program image and the first correct score, in the memory.

The learning device of claim 3 is the learning device according to claim 2, further comprising a predetermined learning data generation unit that generates the predetermined learning data. The predetermined learning data generation unit comprises a second correct score assigning unit that receives open data consisting of the predetermined image and a label, at one of a plurality of levels, assigned in advance to the predetermined image, converts the label into the second correct score so as to assign the second correct score to the predetermined image, and stores the predetermined learning data, consisting of the predetermined image and the second correct score, in the memory.

The learning device of claim 4 is the learning device according to any one of claims 1 to 3, wherein the number of pieces of program learning data is A (A is a positive integer), the number of pieces of predetermined learning data is B (B is a positive integer), A < B, and (B - A) is the result of subtracting A from B. The learning unit trains the neural network using the A pieces of program learning data, (B - A) further pieces of program learning data that make up the shortfall relative to the predetermined learning data and are supplemented using any or all of the A pieces of program learning data, and the B pieces of predetermined learning data.
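The supplementing step of claim 4 can be illustrated with a minimal sketch. The claim states only that the (B - A) missing items are filled in using any or all of the A program learning data items; random resampling with replacement, the function name `pad_program_data`, and the `seed` parameter are assumptions made for this illustration.

```python
import random


def pad_program_data(program_data, target_size, seed=0):
    """Pad A program learning data items up to B items (hypothetical helper).

    The (B - A) extra items are drawn at random, with replacement, from the
    original A items. This choice is an assumption; the claim only requires
    that the extra items come from the existing program learning data.
    """
    a = len(program_data)
    if a == 0 or a > target_size:
        raise ValueError("need 0 < A <= B")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    extra = [rng.choice(program_data) for _ in range(target_size - a)]
    return list(program_data) + extra
```

With A = 3 and B = 7, the result holds the three original items followed by four resampled ones, so the program learning data and the predetermined learning data can be paired one to one during training.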

The representative image extraction device of claim 5 is a representative image extraction device that extracts representative images from a program video, comprising: a sampling processing unit that samples the program video into frame images and outputs the frame images as program images; a score calculation unit that uses the neural network trained by the learning device according to any one of claims 1 to 4 to calculate a one-dimensional score of each program image output by the sampling processing unit; and a selection unit that selects the representative images, based on the scores calculated by the score calculation unit, from all the program images sampled from the program video and output by the sampling processing unit.
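As a rough illustration of the selection unit, the sketch below picks the program images with the highest scores. The claim says only that selection is based on the scores; taking the top k images, the function name `select_representatives`, and the parameter `k` are assumptions.

```python
def select_representatives(scores, k):
    """Return the indices of the k highest-scoring program images.

    `scores[i]` is the one-dimensional score of the i-th sampled program
    image. Top-k selection is an assumed criterion, not one stated in the
    claim. Indices are returned in temporal (ascending) order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Returning the indices in ascending order keeps the selected images in the order in which they appear in the program video.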

The program of claim 6 causes a computer to function as the learning device according to any one of claims 1 to 4.

The program of claim 7 causes a computer to function as the representative image extraction device according to claim 5.

As described above, according to the present invention, representative images that reflect program production know-how can be extracted from a program video.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram showing a configuration example of a learning device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of the program learning data generation unit.
FIG. 3 is a flowchart showing a processing example of the program learning data generation unit.
FIG. 4 is a flowchart showing another configuration example of the program learning data generation unit.
FIG. 5 is a block diagram showing a configuration example of the artistic learning data generation unit.
FIG. 6 is a block diagram showing a configuration example of the learning unit.
FIG. 7 is a flowchart showing a processing example of the learning unit.
FIG. 8 is a block diagram showing a configuration example of the NN unit.
FIG. 9 is a diagram illustrating a specific configuration example of the NN unit.
FIG. 10 is a block diagram showing a configuration example of a representative image extraction device according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating a program homepage creation system of a first example using the representative image extraction device.
FIG. 12 is a diagram illustrating a program DVD sales homepage creation system of a second example using the representative image extraction device.
FIG. 13 is a diagram illustrating the effect of the learning process in the embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS
Embodiments for carrying out the present invention are described in detail below with reference to the drawings.
[Learning device]
First, a learning device according to an embodiment of the present invention is described. FIG. 1 is a block diagram showing a configuration example of the learning device according to the embodiment.

The learning device 1 includes a program learning data generation unit 10, memories 11 and 13, an artistic learning data generation unit 12, and a learning unit 14. Using learning program videos, artistic evaluation open data, and the like, the learning device 1 trains the neural network used by the representative image extraction device 2 (described later) so that the representative image extraction device 2 can extract, from a program video, representative images that reflect program production know-how.

The program learning data generation unit 10 receives a learning program video and the URL (Uniform Resource Locator) of the program homepage corresponding to the program of that video. For each of the frame images obtained by sampling the learning program video (hereinafter, "program images"), the program learning data generation unit 10 calculates the similarity to each of the still images acquired by accessing the URL of the program homepage.

Based on the similarity, the program learning data generation unit 10 assigns a correct score to each program image and stores program learning data, consisting of the program image and the correct score, in the memory 11.

As a result, the memory 11 holds program learning data, consisting of a program image and a correct score, for every program image obtained by sampling the learning program video.

The still images posted on a program homepage can be regarded as representative images that the program production staff selected from the program video by drawing on their know-how. The similarity between a program image and the still images therefore reflects the know-how of the program production staff, and consequently so does the correct score of the program image.

The artistic learning data generation unit 12 receives artistic evaluation open data item by item. Artistic evaluation open data is data that anyone can obtain and use without restriction, in which each image carries a correct label reflecting an evaluation of its artistic quality. Each item consists of an image (hereinafter, "artistic evaluation image") and a correct label (a label indicating high or low artistic quality) that reflects a multi-level evaluation assigned to the artistic evaluation image in advance.

For each input item of artistic evaluation open data, the artistic learning data generation unit 12 converts the correct label included in the item into a correct score according to a predetermined rule. It then stores artistic learning data, consisting of the artistic evaluation image and the correct score, in the memory 13. As noted above, the correct label indicates a high or low level, whereas the correct score is a numerical value.

As a result, the memory 13 holds artistic learning data, consisting of an artistic evaluation image and a correct score, for each of the plurality of items of artistic evaluation open data.

The learning unit 14 includes the neural network to be trained. The learning unit 14 reads the program learning data, consisting of program images and correct scores, from the memory 11, and reads the artistic learning data, consisting of artistic evaluation images and correct scores, from the memory 13. It then trains the neural network using the program learning data and the artistic learning data. The neural network is a model to which program images and artistic evaluation images are input alternately and which outputs a one-dimensional score (importance).
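The training scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: a linear scorer stands in for the neural network, squared error is assumed for both error terms, images are represented by small hypothetical feature vectors, and the names `predict`, `train`, `total_error`, `lr`, and `epochs` are introduced only for this example.

```python
def predict(weights, features):
    """Hypothetical stand-in for the neural network: a linear scorer
    that maps a feature vector to a one-dimensional score."""
    return sum(w * x for w, x in zip(weights, features))


def total_error(weights, program_data, artistic_data):
    """Sum of squared errors over both data sets (assumed error measure)."""
    e1 = sum((predict(weights, x) - y) ** 2 for x, y in program_data)
    e2 = sum((predict(weights, x) - y) ** 2 for x, y in artistic_data)
    return e1 + e2


def train(program_data, artistic_data, lr=0.1, epochs=200):
    """Update the parameters so that (first error + second error) shrinks.

    Each step feeds one program sample and then one artistic sample,
    mirroring the alternating input described in the text.
    """
    weights = [0.0] * len(program_data[0][0])
    for _ in range(epochs):
        for (xp, yp), (xa, ya) in zip(program_data, artistic_data):
            err1 = predict(weights, xp) - yp  # program score vs. correct score
            err2 = predict(weights, xa) - ya  # artistic score vs. correct score
            # Gradient of err1**2 + err2**2 with respect to the weights.
            grad = [2 * err1 * p + 2 * err2 * a for p, a in zip(xp, xa)]
            weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights
```

Because both error terms update the same parameters, the resulting scorer is shaped by the program data and the artistic data together rather than by either set alone.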

As a result, optimal parameters (weight coefficients and the like) for the neural network are obtained. These parameters reflect the know-how of the program production staff and are used in the neural network of the representative image extraction device 2 shown in FIG. 10, described later.

(Program learning data generation unit 10)
Next, the program learning data generation unit 10 shown in FIG. 1 is described in detail. FIG. 2 is a block diagram showing a configuration example of the program learning data generation unit 10, and FIG. 3 is a flowchart showing a processing example of the program learning data generation unit 10. The program learning data generation unit 10 includes a sampling processing unit 20, a download processing unit 21, a similarity calculation unit 22, and a correct score assigning unit 23.

The program learning data generation unit 10 receives a learning program video stored in a hard disk recorder or the like, together with the URL of the program homepage corresponding to the program of that video (step S301). The sampling processing unit 20 receives the learning program video, and the download processing unit 21 receives the URL of the corresponding program homepage.

The sampling processing unit 20 samples program images, which are frame images, from the learning program video at regular intervals (step S302). Let all the sampled program images be P_1, ..., P_N, where N is an integer of 2 or more. The sampling processing unit 20 outputs the program images P_1, ..., P_N to the similarity calculation unit 22.
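Sampling at regular intervals (step S302) amounts to choosing evenly spaced frame indices. The sketch below shows only that index arithmetic; the function name `sample_frame_indices`, the choice to start at frame 0, and the interval expressed in frames are assumptions, and decoding the video itself is outside the scope of the example.

```python
def sample_frame_indices(total_frames, interval):
    """Return the indices of the frames sampled every `interval` frames.

    Starting at frame 0 is an assumption; the text states only that
    sampling is performed at regular intervals.
    """
    if total_frames < 0 or interval <= 0:
        raise ValueError("total_frames must be >= 0 and interval > 0")
    return list(range(0, total_frames, interval))
```

For instance, a 300-frame clip sampled every 30 frames yields N = 10 program images.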

The download processing unit 21 accesses the URL of the program homepage and downloads all the still images posted there (step S303). Let all the downloaded still images be P'_1, ..., P'_M, where M is an integer of 2 or more. The download processing unit 21 outputs the still images P'_1, ..., P'_M to the similarity calculation unit 22.

The similarity calculation unit 22 receives the program images P_1, ..., P_N from the sampling processing unit 20 and the still images P'_1, ..., P'_M from the download processing unit 21. It then calculates, for each program image P_n, the similarity S_{n,m} to each still image P'_m (step S304), where n = 1, ..., N and m = 1, ..., M.
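Step S304 produces an N x M table of similarities. The text here does not specify the similarity measure, so the sketch below assumes cosine similarity between feature vectors purely for illustration; the feature vectors themselves and the names `cosine_similarity` and `similarity_matrix` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_matrix(program_feats, still_feats):
    """Return S where S[n][m] compares program image n with still image m."""
    return [[cosine_similarity(p, s) for s in still_feats]
            for p in program_feats]
```

With non-negative feature vectors, this measure stays within the range 0 to 1.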

The similarity calculation unit 22 outputs each program image P_n and its similarities S_{n,m} (S_{n,1}, ..., S_{n,M}) to the correct score assigning unit 23.

The correct score assigning unit 23 receives each program image P_n and its similarities S_{n,m} (S_{n,1}, ..., S_{n,M}) from the similarity calculation unit 22. For the program image P_n, it then obtains the maximum value B = max_m S_{n,m} among the similarities S_{n,1}, ..., S_{n,M} (step S305).

The correct score assigning unit 23 determines whether the maximum value B is greater than or equal to a preset threshold (step S306). If it determines in step S306 that the maximum value B is greater than or equal to the threshold (step S306: Y), it assigns the correct score of a positive example (= 1) to the program image P_n (step S307).

If, on the other hand, the correct score assigning unit 23 determines in step S306 that the maximum value B is less than the threshold (step S306: N), it assigns the correct score of a negative example (= 0) to the program image P_n (step S308).
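Steps S305 to S308 can be summarized in a short sketch. The function name `assign_binary_scores` and the threshold value used in the usage note are hypothetical; the logic (take the maximum over the still images, then compare it with the threshold) follows the steps described above.

```python
def assign_binary_scores(similarity, threshold):
    """Assign a correct score of 1 or 0 to each program image.

    `similarity[n][m]` is the similarity S_{n,m} between program image n
    and still image m.
    """
    scores = []
    for row in similarity:
        b = max(row)  # step S305: B = max_m S_{n,m}
        # Steps S306 to S308: positive example if B >= threshold, else negative.
        scores.append(1 if b >= threshold else 0)
    return scores
```

For example, with a hypothetical threshold of 0.8, a program image whose best match has similarity 0.9 is labeled 1, and one whose best match has similarity 0.3 is labeled 0.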

Although the correct score assigning unit 23 here assigns a two-level correct score (positive example (= 1) or negative example (= 0)) within the range 0 to 1 to the program image P_n, it may instead assign a correct score with three or more levels. With three levels, for example, the correct score assigning unit 23 applies threshold processing to the maximum value B and assigns one of three correct scores (for example, 0.0, 0.5, or 1.0) to the program image P_n.

In this case, the levels of the correct score need not be equally spaced within the range 0 to 1; for example, 0.0, 0.7, and 1.0 may be used, as long as the spacing is appropriate. The levels are set within the same range as the correct scores of the artistic evaluation images described later with reference to FIG. 5, for example the range 0 to 1.
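The multi-level variant can be sketched with a sorted list of thresholds. The text does not give concrete threshold values, so the values in the usage note, along with the function name `assign_level_score`, are hypothetical.

```python
from bisect import bisect_right


def assign_level_score(b, thresholds, levels):
    """Map the maximum similarity B to one of several correct-score levels.

    `thresholds` must be sorted in ascending order, and `levels` must hold
    exactly one more entry than `thresholds`. A value equal to a threshold
    is assigned the higher level, matching the ">=" test of step S306.
    """
    if len(levels) != len(thresholds) + 1:
        raise ValueError("levels must have len(thresholds) + 1 entries")
    return levels[bisect_right(thresholds, b)]
```

With hypothetical thresholds [0.5, 0.8] and levels [0.0, 0.5, 1.0], a maximum similarity of 0.3 maps to 0.0, 0.6 maps to 0.5, and 0.8 maps to 1.0. Unequally spaced levels such as [0.0, 0.7, 1.0] work the same way.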

Alternatively, the correct score assigning unit 23 may assign the similarity received from the similarity calculation unit 22 to the program image directly as the correct score. In this case, the similarity ranges from 0 to 1.

After step S307 or S308, the correct score assigning unit 23 stores the program learning data, consisting of the program image and the correct score, in the memory 11 (step S309). Steps S304 to S309 are performed for the N program images P_n (n = 1, ..., N), so that N pieces of program learning data are stored in the memory 11.

As a result, each program image P_n is given a correct score that reflects its similarity S_{n,m} to the still images downloaded from the program homepage, and the program learning data is stored in the memory 11. The higher the similarity S_{n,m} (the more suitable the image is as a representative image), the closer the correct score is to 1; the lower the similarity S_{n,m} (the less suitable the image is as a representative image), the closer the correct score is to 0.

FIG. 4 is a flowchart showing another configuration example of the program learning data generation unit 10. This program learning data generation unit 10 includes a sampling processing unit 20 and a correct score assigning unit 24. It receives only the learning program video and does not receive the URL of the program homepage.

The sampling processing unit 20 receives the learning program video, performs the same processing as the sampling processing unit 20 shown in FIG. 2, and outputs the program images to the correct score assigning unit 24.

The correct score assigning unit 24 receives the program images from the sampling processing unit 20 and displays them on a display device (not shown). A user who is a member of the program production staff evaluates each displayed program image and decides its correct score. With a two-level correct score, for example, the user chooses the correct score 1 when the program image is rated highly and the correct score 0 when it is rated poorly.

The correct score assigning unit 24 receives the correct score for each program image through the operation of the program production staff. It then assigns the correct score to the program image, thereby generating program learning data consisting of the program image and the correct score, and stores the data in the memory 11.

As a result, each program image is given a correct score that reflects the know-how of the program production staff, and the program learning data is stored in the memory 11. The higher the evaluation of the program image (the more suitable it is as a representative image), the closer the correct score is to 1; the lower the evaluation (the less suitable it is as a representative image), the closer the correct score is to 0.

(Artistic learning data generation unit 12)
Next, the artistic learning data generation unit 12 shown in FIG. 1 is described in detail. FIG. 5 is a block diagram showing a configuration example of the artistic learning data generation unit 12. The artistic learning data generation unit 12 includes a correct score assigning unit 25.

The correct score assigning unit 25 receives, item by item, artistic evaluation open data consisting of artistic evaluation images and correct labels.

For each item of artistic evaluation open data, the correct score assigning unit 25 converts the correct label included in the item into a correct score according to a predetermined rule, thereby assigning the correct score to the artistic evaluation image included in the item. It then stores the artistic learning data, consisting of the artistic evaluation image and the correct score, in the memory 13.

In general, correct labels are not numerical and therefore cannot be used directly in the learning process. The correct score assigning unit 25 thus converts each correct label into a correct score expressed as a numerical value. A correct score that reflects the correct label and is expressed numerically can then be used in the learning process.

The predetermined rule is a preset rule that converts a p-level correct label into a q-level correct score. Here p and q are integers of 2 or more, and either p ≠ q or p = q is allowed.

Under one such rule, for example, the three-level correct labels "great", "good", and "bad" are converted into a two-level correct score. The three labels rank in descending order of artistic quality as "great" > "good" > "bad". The label "great" is converted into the correct score 1, the label "good" into the correct score 1, and the label "bad" into the correct score 0.

また、所定の規則により、例えば、３段階の正解ラベルが３段階の正解スコアに変換される。正解ラベル（＝great）は正解スコア（＝１．０）に、正解ラベル（＝good）は正解スコア（＝０．５）に、正解ラベル（＝bad）は正解スコア（＝０．０）に変換される。 Alternatively, according to a predetermined rule, the three-level correct labels are converted into three-level correct scores, for example. The correct label (=great) is converted into a correct score (=1.0), the correct label (=good) into a correct score (=0.5), and the correct label (=bad) into a correct score (=0.0).

尚、正解スコアの段階は、０～１の範囲において必ずしも等間隔である必要はなく、例えば０．０，０．７，１．０であってもよく、適切な間隔であればよい。また、正解スコアは、３段階を超える段階であってもよく、前述の番組画像の正解スコアと同様の範囲、例えば０～１の範囲で、その段階が設定されるものとする。 Note that the stages of the correct score do not necessarily have to be at equal intervals within the range of 0 to 1, and may be, for example, 0.0, 0.7, 1.0, as long as they are at appropriate intervals. Further, the correct score may have more than three levels, and the level is set in the same range as the correct score of the program image described above, for example, in the range of 0 to 1.
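The conversion rules described above can be sketched as simple lookup tables. This is a minimal illustration, not the patent's implementation: the table names, the function `assign_correct_scores`, and the image identifiers are hypothetical, and the uneven table only reuses the 0.0, 0.7, 1.0 example values from the text.

```python
# Hypothetical rule tables; names are illustrative and not taken from the patent.
RULE_3_TO_2 = {"great": 1.0, "good": 1.0, "bad": 0.0}          # p=3 labels -> q=2 scores
RULE_3_TO_3 = {"great": 1.0, "good": 0.5, "bad": 0.0}          # p=3 labels -> q=3 scores
RULE_3_TO_3_UNEVEN = {"great": 1.0, "good": 0.7, "bad": 0.0}   # levels need not be equally spaced


def assign_correct_scores(open_data, rule):
    """Convert (image_id, correct_label) pairs into (image_id, correct_score) pairs."""
    learning_data = []
    for image_id, label in open_data:
        score = rule[label]  # raises KeyError if the rule does not cover the label
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score} is outside the range 0 to 1")
        learning_data.append((image_id, score))
    return learning_data


# Stand-in for artistic evaluation open data (assumed structure).
open_data = [("img_001", "great"), ("img_002", "good"), ("img_003", "bad")]
two_level = assign_correct_scores(open_data, RULE_3_TO_2)
three_level = assign_correct_scores(open_data, RULE_3_TO_3)
```

Swapping the rule table is enough to change the number of score levels; the learning data keeps the same (image, score) shape either way.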

正解スコア付与部２５は、正解ラベルを正解スコアに変換する代わりに、番組制作スタッフの操作に従い、番組制作スタッフにより判断された正解スコアを入力することで、芸術性評価画像に対して正解スコアを付与するようにしてもよい。前述と同様に、正解スコアの段階は必ずしも等間隔である必要はない。 Instead of converting the correct label into a correct score, the correct score assigning unit 25 may assign a correct score to the artistic evaluation image by receiving, in accordance with an operation by the program production staff, a correct score determined by the program production staff. As described above, the levels of the correct score do not necessarily have to be equally spaced.

具体的には、正解スコア付与部２５は、芸術性評価オープンデータに含まれる芸術性評価画像及び正解ラベルを、図示しない表示装置に表示する。番組制作スタッフであるユーザは、表示装置に表示された正解ラベルを参照して芸術性評価画像を評価し、芸術性評価画像に対する正解スコアを判断する。 Specifically, the correct score assigning unit 25 displays the artistic evaluation image and the correct answer label included in the artistic evaluation open data on a display device (not shown). A user who is a program production staff member evaluates the artistic evaluation image with reference to the correct answer label displayed on the display device, and determines the correct answer score for the artistic evaluation image.

正解スコア付与部２５は、番組制作スタッフの操作に従い、芸術性評価画像に対する正解スコアを入力する。そして、正解スコア付与部２５は、芸術性評価画像に対して正解スコアを付与することで、芸術性評価画像及び正解スコアからなる芸術性学習データを生成し、これをメモリ１３に格納する。 The correct score assigning unit 25 inputs the correct score for the artistic evaluation image according to the operation of the program production staff. Then, the correct score assigning unit 25 generates artistic learning data consisting of the artistic evaluation image and the correct score by assigning a correct score to the artistic evaluation image, and stores this in the memory 13 .

これにより、芸術性評価画像について、番組制作スタッフのノウハウが反映された正解スコアが付与され、芸術性学習データがメモリ１３に格納される。芸術性評価画像に対する評価が高いほど（代表画像として相応しいほど）、正解スコアは１または１に近い段階の値となり、芸術性評価画像に対する評価が低いほど（代表画像として相応しくないほど）、正解スコアは０または０に近い段階の値となる。 As a result, a correct score reflecting the know-how of the program production staff is assigned to the artistic evaluation image, and the artistic learning data is stored in the memory 13. The higher the evaluation of the artistic evaluation image (the more suitable it is as a representative image), the closer the correct score is to 1, taking the value 1 or a level near 1; the lower the evaluation (the less suitable it is as a representative image), the closer the correct score is to 0, taking the value 0 or a level near 0.

（学習部１４） (Learning unit 14)
次に、図１に示した学習部１４について詳細に説明する。図６は、学習部１４の構成例を示すブロック図であり、図７は、学習部１４の処理例を示すフローチャートである。 Next, the learning unit 14 shown in FIG. 1 will be described in detail. FIG. 6 is a block diagram showing a configuration example of the learning unit 14, and FIG. 7 is a flowchart showing an example of processing by the learning unit 14.

この学習部１４は、切り替え部３０、ＮＮ（ニューラルネットワーク）部３１、誤差算出部３２及びパラメータ更新部３３を備えている。学習部１４は、ステップＳ７０７の処理にて終了条件を満たすまで、番組学習データ及び芸術性学習データの組毎に、ステップＳ７０１～Ｓ７０６の処理を行う。 The learning unit 14 includes a switching unit 30, an NN (neural network) unit 31, an error calculation unit 32, and a parameter updating unit 33. The learning unit 14 performs steps S701 to S706 for each pair of program learning data and artistic learning data until the end condition is satisfied in step S707.

切り替え部３０は、パラメータ更新部３３から、番組学習データまたは芸術性学習データを示す切り替え信号を入力する。そして、切り替え部３０は、切り替え信号が番組学習データを示している場合、メモリ１１から、番組画像及び正解スコアからなる番組学習データを読み出す。一方、切り替え部３０は、切り替え信号が芸術性学習データを示している場合、メモリ１３から、芸術性評価画像及び正解スコアからなる芸術性学習データを読み出す（ステップＳ７０１）。 The switching unit 30 receives a switching signal indicating program learning data or artistic learning data from the parameter updating unit 33. Then, when the switching signal indicates program learning data, the switching unit 30 reads program learning data consisting of a program image and a correct score from the memory 11. On the other hand, when the switching signal indicates artistic learning data, the switching unit 30 reads artistic learning data consisting of an artistic evaluation image and a correct score from the memory 13 (step S701).

これにより、番組学習データを示す切り替え信号が入力される毎に、メモリ１１から、新たな番組学習データが読み出され、芸術性学習データを示す切り替え信号が入力される毎に、メモリ１３から新たな芸術性学習データが読み出される。 As a result, new program learning data is read from the memory 11 each time a switching signal indicating program learning data is input, and new artistic learning data is read from the memory 13 each time a switching signal indicating artistic learning data is input.

切り替え部３０は、切り替え信号が番組学習データを示している場合、番組学習データに含まれる番組画像をＮＮ部３１に出力すると共に、番組画像に対応する正解スコアを誤差算出部３２に出力する。 When the switching signal indicates program learning data, the switching unit 30 outputs the program image included in the program learning data to the NN unit 31, and also outputs the correct score corresponding to the program image to the error calculation unit 32.

一方、切り替え部３０は、切り替え信号が芸術性学習データを示している場合、芸術性学習データに含まれる芸術性評価画像をＮＮ部３１に出力すると共に、芸術性評価画像に対応する正解スコアを誤差算出部３２に出力する。 On the other hand, when the switching signal indicates artistic learning data, the switching unit 30 outputs the artistic evaluation image included in the artistic learning data to the NN unit 31, and outputs the correct score corresponding to the artistic evaluation image to the error calculation unit 32.

ＮＮ部３１は、切り替え部３０から番組画像または芸術性評価画像のテンソルを入力する。そして、ＮＮ部３１は、パラメータ更新部３３によりパラメータが設定されたニューラルネットワークを用いて、番組画像または芸術性評価画像から１次元のスコアを算出し、スコアを誤差算出部３２に出力する。 The NN section 31 receives the tensor of the program image or the artistic evaluation image from the switching section 30 . Then, the NN section 31 calculates a one-dimensional score from the program image or the artistic evaluation image using a neural network whose parameters are set by the parameter update section 33, and outputs the score to the error calculation section 32.

誤差算出部３２は、ＮＮ部３１からスコアを入力すると共に、切り替え部３０から正解スコアを入力し、両者の誤差を算出してパラメータ更新部３３に出力する。例えば、誤差を算出する関数としては、ＭＳＥ（平均二乗誤差）等の、誤差が大きいほど大きい値を出力する関数が用いられる。 The error calculation section 32 receives the score from the NN section 31 and the correct score from the switching section 30, calculates an error between the two, and outputs the result to the parameter update section 33. For example, as a function for calculating the error, a function such as MSE (mean square error) that outputs a larger value as the error becomes larger is used.
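As an illustration of the error function mentioned above, the following is a minimal sketch of MSE over a batch of scores. The function name `mean_squared_error` and the sample values are assumptions made for this example; the patent names MSE only as one possible choice.

```python
def mean_squared_error(scores, correct_scores):
    """Mean of the squared differences between predicted and correct scores."""
    if len(scores) != len(correct_scores) or not scores:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((s - t) ** 2 for s, t in zip(scores, correct_scores)) / len(scores)


# A score close to the correct score gives a small error, a distant one a larger error.
small_error = mean_squared_error([0.9], [1.0])
large_error = mean_squared_error([0.2], [1.0])
```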

具体的には、ＮＮ部３１は、切り替え部３０から番組画像のテンソルを入力した場合、ニューラルネットワークを用いて、番組映像からスコアを算出する（ステップＳ７０２）。このニューラルネットワークとは、後述する図８に示す特徴抽出用ＮＮ４０及びスコア算出用ＮＮ４１である。 Specifically, when the tensor of the program image is input from the switching unit 30, the NN unit 31 calculates a score from the program image using a neural network (step S702). This neural network is a feature extraction NN 40 and a score calculation NN 41 shown in FIG. 8, which will be described later.

そして、誤差算出部３２は、番組画像のスコアと番組学習データに含まれる当該番組画像の正解スコアとの間の誤差を算出する（ステップＳ７０３）。 Then, the error calculation unit 32 calculates the error between the score of the program image and the correct score of the program image included in the program learning data (step S703).

一方、ＮＮ部３１は、切り替え部３０から芸術性評価画像のテンソルを入力した場合、ニューラルネットワークを用いて、芸術性評価画像からスコアを算出する（ステップＳ７０４）。 On the other hand, when the tensor of the artistic evaluation image is input from the switching section 30, the NN section 31 calculates a score from the artistic evaluation image using a neural network (step S704).

そして、誤差算出部３２は、芸術性評価画像のスコアと芸術性学習データに含まれる当該芸術性評価画像の正解スコアとの間の誤差を算出する（ステップＳ７０５）。 Then, the error calculation unit 32 calculates the error between the score of the artistic evaluation image and the correct score of the artistic evaluation image included in the artistic learning data (step S705).

パラメータ更新部３３は、誤差算出部３２から番組画像の誤差及び芸術性評価画像の誤差を入力し、これらの誤差の和が小さくなるように、保持しているパラメータを更新する（ステップＳ７０６）。そして、パラメータ更新部３３は、更新したパラメータをＮＮ部３１に設定する。 The parameter update unit 33 inputs the error of the program image and the error of the artistic evaluation image from the error calculation unit 32, and updates the held parameters so that the sum of these errors becomes small (step S706). Then, the parameter update section 33 sets the updated parameters in the NN section 31.

ここで、パラメータ更新部３３は、ＮＮ部３１に設定したパラメータを保持しているものとする。 Here, it is assumed that the parameter update section 33 holds the parameters set in the NN section 31.

尚、パラメータ更新部３３は、パラメータを更新する処理として、例えばＡｄａｍ、ＳＧＤ（Stochastic Gradient Descent）、誤差逆伝播学習法（Backpropagation）等の一般的なニューラルネットワーク最適化手法を用いる。 Note that, to update the parameters, the parameter updating unit 33 uses a general neural network optimization method such as Adam, SGD (Stochastic Gradient Descent), or error backpropagation.

また、パラメータ更新部３３は、番組画像及び芸術性評価画像を組として、所定数の組（例えば３０組）毎に、パラメータを更新するようにしてもよい。具体的には、パラメータ更新部３３は、所定数の組の誤差をそれぞれ入力し、所定数の組の誤差の和を算出し、当該誤差の和が小さくなるように、パラメータを更新する。 The parameter updating unit 33 may also treat a program image and an artistic evaluation image as one pair and update the parameters once every predetermined number of pairs (for example, every 30 pairs). Specifically, the parameter updating unit 33 receives the errors for the predetermined number of pairs, calculates their sum, and updates the parameters so that this sum becomes smaller.

パラメータ更新部３３は、誤差算出部３２から番組画像の誤差を入力した場合、次に芸術性評価画像の誤差を入力するために、芸術性学習データを示す切り替え信号を切り替え部３０に出力する。 When the parameter update unit 33 receives the error of the program image from the error calculation unit 32, it outputs a switching signal indicating artistic learning data to the switching unit 30 in order to input the error of the artistic evaluation image next.

一方、パラメータ更新部３３は、誤差算出部３２から芸術性評価画像の誤差を入力した場合、次に番組画像の誤差を入力するために、番組学習データを示す切り替え信号を切り替え部３０に出力する。 On the other hand, when the error of the artistic evaluation image is input from the error calculation unit 32, the parameter update unit 33 outputs a switching signal indicating program learning data to the switching unit 30 in order to input the error of the program image next. .

パラメータ更新部３３は、ステップＳ７０６から移行して、パラメータ更新の終了条件を満たすか否かを判定する（ステップＳ７０７）。 The parameter update unit 33 moves from step S706 and determines whether the parameter update termination condition is satisfied (step S707).

パラメータ更新部３３は、ステップＳ７０７において、終了条件を満たさないと判定した場合（ステップＳ７０７：Ｎ）、ステップＳ７０１へ移行し、次の番組学習データ及び芸術性学習データの組について、ステップＳ７０１～Ｓ７０６の処理を行う。つまり、終了条件を満たすまで、番組学習データ及び芸術性学習データの組毎に、ステップＳ７０１～Ｓ７０６の処理が行われる。 If the parameter updating unit 33 determines in step S707 that the end condition is not satisfied (step S707: N), the process returns to step S701, and steps S701 to S706 are performed for the next pair of program learning data and artistic learning data. That is, steps S701 to S706 are performed for each pair of program learning data and artistic learning data until the end condition is satisfied.

一方、パラメータ更新部３３は、ステップＳ７０７において、終了条件を満たすと判定した場合（ステップＳ７０７：Ｙ）、ステップＳ７０６の処理にて更新したパラメータを最適なパラメータとして出力する（ステップＳ７０８）。パラメータ更新部３３により出力された最適なパラメータは、後述する図１０に示す代表画像抽出装置２に備えたスコア算出部５１のニューラルネットワークに設定される。 On the other hand, if the parameter updating unit 33 determines in step S707 that the termination condition is satisfied (step S707: Y), it outputs the parameters updated in the process of step S706 as optimal parameters (step S708). The optimal parameters output by the parameter updating section 33 are set in the neural network of the score calculation section 51 provided in the representative image extraction device 2 shown in FIG. 10, which will be described later.

ここで、ステップＳ７０７における終了条件は、例えば、予め設定された回数分のパラメータ更新が行われたか否か、パラメータの更新量が予め設定された閾値よりも小さいか否かの条件等である。 Here, the termination conditions in step S707 include, for example, whether the parameters have been updated a preset number of times, and whether the amount of parameter updates is smaller than a preset threshold.
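The flow of steps S701 to S708 can be sketched as the loop below. It is a toy under stated assumptions, not the patent's network: a single sigmoid unit over a short feature list stands in for the neural network, plain gradient descent stands in for the optimizer, and the names `train`, `predict`, `lr`, `max_updates`, and `min_update` are hypothetical. Each update uses one program sample and one artistic sample, and reduces the sum of their two squared errors.

```python
import math
from itertools import cycle


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, features):
    """One-dimensional score in (0, 1) from a feature list (toy stand-in for the NN)."""
    w, b = params
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)


def train(program_data, artistic_data, lr=0.5, max_updates=2000, min_update=1e-6):
    """Each data set is a list of (features, correct_score) pairs."""
    dim = len(program_data[0][0])
    w, b = [0.0] * dim, 0.0
    # Alternate between the two data sets, reusing the shorter one as needed.
    pairs = zip(cycle(program_data), cycle(artistic_data))
    for update_count, pair in enumerate(pairs, start=1):
        grad_w, grad_b = [0.0] * dim, 0.0
        for features, target in pair:  # S701-S705: score and error for each sample
            score = predict((w, b), features)
            # Gradient of (score - target)^2 with respect to the pre-sigmoid value.
            delta = 2.0 * (score - target) * score * (1.0 - score)
            grad_w = [g + delta * x for g, x in zip(grad_w, features)]
            grad_b += delta
        step_w = [lr * g for g in grad_w]  # S706: reduce the sum of the two errors
        step_b = lr * grad_b
        w = [wi - s for wi, s in zip(w, step_w)]
        b -= step_b
        update_size = max(abs(s) for s in step_w + [step_b])
        # S707: stop after a preset number of updates or when updates become tiny.
        if update_count >= max_updates or update_size < min_update:
            break
    return w, b  # S708: parameters output as the result of training


program_data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
artistic_data = [([1.0, 0.2], 1.0), ([0.1, 1.0], 0.5), ([0.0, 0.9], 0.0)]
params = train(program_data, artistic_data)
high = predict(params, [1.0, 0.0])
low = predict(params, [0.0, 1.0])
```

Both end conditions named in the text appear in the same check, so either one alone is enough to stop the loop.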

（ＮＮ部３１） (NN unit 31)
次に、図６に示したＮＮ部３１について詳細に説明する。図８は、ＮＮ部３１の構成例を示すブロック図である。このＮＮ部３１は、特徴抽出用ＮＮ４０及びスコア算出用ＮＮ４１を備えて構成される。 Next, the NN unit 31 shown in FIG. 6 will be described in detail. FIG. 8 is a block diagram showing a configuration example of the NN unit 31. The NN unit 31 includes a feature extraction NN 40 and a score calculation NN 41.

特徴抽出用ＮＮ４０は、番組画像または芸術性評価画像を入力データとして、パラメータ更新部３３によりパラメータが設定されたニューラルネットワークの演算により、１０２４次元の画像特徴ベクトルの出力データを求める。 The feature extraction NN 40 receives a program image or an artistic evaluation image as input data, and calculates output data of a 1024-dimensional image feature vector by calculation of a neural network whose parameters are set by the parameter updating unit 33.

スコア算出用ＮＮ４１は、特徴抽出用ＮＮ４０により求めた１０２４次元の画像特徴ベクトルを入力データとして、パラメータ更新部３３によりパラメータが設定されたニューラルネットワークの演算により、１次元のスコアの出力データを求める。 The score calculation NN 41 uses the 1024-dimensional image feature vector obtained by the feature extraction NN 40 as input data, and calculates one-dimensional score output data through neural network calculations whose parameters are set by the parameter update unit 33.

図９は、ＮＮ部３１の具体的な構成例を説明する図であり、図８に示したＮＮ部３１の構成を詳細に表したものである。図９において、「Conv」は畳み込み層を、「MaxPool」は最大値を抽出するプーリング層を、「LocalResponseNorm」は正規化層をそれぞれ示す。また、「Inception Module」は「GoogLeNet」に含まれる技術であり、畳み込み層及びプーリング層を示す。また、「AveragePool」は平均値を算出するプーリング層を、「FC」は全結合層を、「Concat」は連結層を、「Sigmoid」はシグモイド関数を用いる層をそれぞれ示す。また、「Kernel」はフィルタサイズを、「dim」は次元数をそれぞれ示す。 FIG. 9 is a diagram illustrating a specific configuration example of the NN unit 31, and shows the configuration of the NN unit 31 of FIG. 8 in detail. In FIG. 9, "Conv" indicates a convolution layer, "MaxPool" indicates a pooling layer that extracts the maximum value, and "LocalResponseNorm" indicates a normalization layer. "Inception Module" is a technique included in "GoogLeNet" and denotes a block of convolution layers and pooling layers. "AveragePool" indicates a pooling layer that calculates the average value, "FC" indicates a fully connected layer, "Concat" indicates a concatenation layer, and "Sigmoid" indicates a layer that applies the sigmoid function. "Kernel" indicates the filter size, and "dim" indicates the number of dimensions.

番組画像または芸術性評価画像が入力される「Conv」のプーリング層α１から、１０２４次元の画像特徴ベクトルが出力される「Concat」の連結層α２までの各層により、特徴抽出用ＮＮ４０が構成される。 The feature extraction NN 40 is configured by each layer from the pooling layer α1 of “Conv” where program images or artistic evaluation images are input to the connection layer α2 of “Concat” where a 1024-dimensional image feature vector is output. .

また、１０２４次元の画像特徴ベクトルが入力される「FC」の全結合層α３から、１次元のスコアが出力される「Sigmoid」のシグモイド関数α４の出力層までの各層により、スコア算出用ＮＮ４１が構成される。 The score calculation NN 41 is made up of the layers from the "FC" fully connected layer α3, which receives the 1024-dimensional image feature vector, to the output layer of the "Sigmoid" function α4, which outputs the one-dimensional score.

このように、ＮＮ部３１は、番組画像または芸術性評価画像から当該画像の１０２４次元の画像特徴ベクトルを算出する特徴抽出用ＮＮ４０と、当該画像の１０２４次元の画像特徴ベクトルから１次元のスコアを算出するスコア算出用ＮＮ４１から構成される。 In this way, the NN unit 31 consists of the feature extraction NN 40, which calculates a 1024-dimensional image feature vector from a program image or an artistic evaluation image, and the score calculation NN 41, which calculates a one-dimensional score from that 1024-dimensional image feature vector.

このＮＮ部３１により、番組画像または芸術性評価画像に付与された正解スコアの段階数に関わることなく、１次元のスコアが算出される。つまり、ＮＮ部３１としては、番組画像または芸術性評価画像の段階数に応じて異なるニューラルネットワークを用意する必要がなく、段階数に依存することのない固定構成のニューラルネットワークを用意すればよい。 The NN unit 31 calculates a one-dimensional score regardless of the number of correct score stages assigned to the program image or artistic evaluation image. That is, as the NN section 31, there is no need to prepare different neural networks depending on the number of stages of the program image or artistic evaluation image, and it is sufficient to prepare a neural network with a fixed configuration that does not depend on the number of stages.
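To make the fixed output shape concrete, the following is a minimal sketch of a score head that maps a 1024-dimensional feature vector to one score in the range 0 to 1. It assumes a single fully connected layer followed by a sigmoid; the actual layer stack of FIG. 9 may differ, and the names `make_score_head` and `score_from_features`, the random initialization, and the constant feature vector are all hypothetical.

```python
import math
import random

FEATURE_DIM = 1024  # dimension of the image feature vector stated in the text


def make_score_head(seed=0):
    """Create weights and bias for one fully connected layer (assumed structure)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.01, 0.01) for _ in range(FEATURE_DIM)]
    bias = 0.0
    return weights, bias


def score_from_features(features, head):
    """Fully connected layer followed by a sigmoid, giving a single score."""
    weights, bias = head
    if len(features) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} features, got {len(features)}")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


head = make_score_head()
feature_vector = [0.5] * FEATURE_DIM  # stand-in for the output of the feature extraction NN 40
score = score_from_features(feature_vector, head)
```

Because the head always emits one value, the same structure serves data sets with two, three, or more correct score levels.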

以上のように、本発明の実施形態による学習装置１によれば、番組学習データ生成部１０は、学習用番組映像をサンプリングして得られた番組画像について、番組ＨＰのＵＲＬへアクセスして取得した静止画との間の類似度を算出する。そして、番組学習データ生成部１０は、類似度に基づいて、番組画像に対して正解スコアを付与し、番組画像及び正解スコアからなる番組学習データを生成する。 As described above, in the learning device 1 according to the embodiment of the present invention, the program learning data generation unit 10 calculates, for each program image obtained by sampling the learning program video, the similarity between that program image and a still image acquired by accessing the URL of the program homepage. The program learning data generation unit 10 then assigns a correct score to the program image based on the similarity, and generates program learning data consisting of the program image and the correct score.

芸術性学習データ生成部１２は、芸術性評価オープンデータに含まれる正解ラベルを、所定の規則に従って正解スコアに変換し、芸術性評価画像及び正解スコアからなる芸術性学習データを生成する。 The artistic learning data generation unit 12 converts the correct label included in the artistic evaluation open data into a correct score according to a predetermined rule, and generates artistic learning data including the artistic evaluation image and the correct score.

学習部１４は、番組学習データ及び芸術性学習データを用いて、ニューラルネットワークを学習する。具体的には、ＮＮ部３１は、ニューラルネットワークを用いて、番組学習データに含まれる番組画像から１次元のスコアを算出し、誤差算出部３２は、番組画像のスコアと番組学習データに含まれる正解スコアとの間の誤差を算出する。また、ＮＮ部３１は、芸術性評価画像についても１次元のスコアを算出し、誤差算出部３２は、芸術性評価画像のスコアと芸術性学習データに含まれる正解スコアとの間の誤差を算出する。 The learning unit 14 trains the neural network using the program learning data and the artistic learning data. Specifically, the NN unit 31 uses the neural network to calculate a one-dimensional score from the program image included in the program learning data, and the error calculation unit 32 calculates the error between the score of the program image and the correct score included in the program learning data. The NN unit 31 likewise calculates a one-dimensional score for the artistic evaluation image, and the error calculation unit 32 calculates the error between the score of the artistic evaluation image and the correct score included in the artistic learning data.

パラメータ更新部３３は、番組画像の誤差及び芸術性評価画像の誤差の和が小さくなるように、ニューラルネットワークのパラメータを更新し、所定の終了条件を満たしたときのパラメータを最適なパラメータとして出力する。 The parameter updating unit 33 updates the parameters of the neural network so that the sum of the error in the program image and the error in the artistic evaluation image becomes small, and outputs the parameters when a predetermined termination condition is satisfied as the optimal parameters. .

ここで、番組ＨＰの静止画は、番組制作スタッフのノウハウを生かすことで生成された画像であるため、番組画像と静止画の類似度から算出された番組画像の正解スコアは、番組制作のノウハウを考慮したスコアとなる。 Here, the still images on the program homepage are images produced by drawing on the know-how of the program production staff, so the correct score of a program image, calculated from the similarity between the program image and the still images, is a score that takes program production know-how into account.

これにより、番組画像の正解スコアを用いて学習されたニューラルネットワークも、番組制作のノウハウを考慮したものとなる。したがって、後述する代表画像抽出装置２は、学習装置１により学習されたニューラルネットワークを用いることにより、番組映像から、番組制作のノウハウを考慮した代表画像を抽出することができる。また、番組映像以外の特殊なデータを用いることなく、代表画像を抽出することができるから、処理負荷を低減することができる。そして、代表画像を用いて番組ＨＰを作成する際には、作業量を大幅に減らすことができる。 As a result, the neural network trained using the correct scores of program images also takes into account program production know-how. Therefore, by using the neural network learned by the learning device 1, the representative image extraction device 2, which will be described later, can extract a representative image from the program video taking into account the know-how of program production. Furthermore, since the representative image can be extracted without using special data other than the program video, the processing load can be reduced. Further, when creating a program homepage using representative images, the amount of work can be significantly reduced.

また、メモリ１１に格納された番組学習データの数がメモリ１３に格納された芸術性学習データよりも少ない場合であっても、同じ番組学習データを繰り返し用いることにより、番組学習データの不足分を補充することができる。これにより、芸術性学習データと同数の番組学習データを用意することができ、同数の番組学習データ及び芸術性学習データを用いて、ニューラルネットワークを学習することができる。 Furthermore, even if the number of program learning data stored in the memory 11 is smaller than the artistic learning data stored in the memory 13, by repeatedly using the same program learning data, the shortage of program learning data can be compensated for. Can be replenished. Thereby, the same number of program learning data as the artistic learning data can be prepared, and the neural network can be trained using the same number of program learning data and artistic learning data.

具体的には、メモリ１１に格納された番組学習データの数をＡ個（Ａは正の整数）、メモリ１３に格納された芸術性学習データの数をＢ個（Ｂは正の整数）、Ａ＜Ｂ、番組学習データ及び芸術性学習データの差分、すなわちＢ個からＡ個を減算した結果を（Ｂ－Ａ）とする。番組学習データ及び芸術性学習データの差分に相当する（Ｂ－Ａ）個の番組学習データ（芸術性学習データに対する番組学習データの不足分）は、Ａ個の番組学習データのいずれかまたは全てを使用することで補充される。すなわち、学習部１４は、Ａ個の番組学習データ、及び、不足分の（Ｂ－Ａ）個のデータであって、Ａ個の番組学習データのいずれかまたは全てを用いて補充された番組学習データ、並びにＢ個の芸術性学習データを用いて、ニューラルネットワークを学習する。この場合、Ａ個の番組学習データ及び不足分の（Ｂ－Ａ）個の番組学習データの合計数は、芸術性学習データの数と同じＢ個である。 Specifically, let A be the number of items of program learning data stored in the memory 11 (A is a positive integer), let B be the number of items of artistic learning data stored in the memory 13 (B is a positive integer), and assume A<B. The difference between the two, that is, the result of subtracting A from B, is (B-A). The (B-A) items of program learning data corresponding to this difference (the shortfall of program learning data relative to the artistic learning data) are replenished by reusing some or all of the A items of program learning data. In other words, the learning unit 14 trains the neural network using the A items of program learning data, the (B-A) replenishing items drawn from some or all of those A items, and the B items of artistic learning data. In this case, the total of the A items and the (B-A) replenishing items is B, the same as the number of items of artistic learning data.

例えば、番組学習データの数がＡ＝6,000であり、芸術性学習データの数がＢ＝10,000である場合を想定する。この場合、芸術性学習データに対する番組学習データの不足分である（Ｂ－Ａ）＝4,000個の番組学習データは、Ａ＝6,000個の番組学習データの一部を用いて補充される。これにより、不足分の4,000個の番組学習データは、元のＡ＝6,000個の番組学習データを用いて補充することができる。学習部１４は、元のＡ＝6,000個の番組学習データ及び不足分の4,000個の番組学習データ、並びに10,000個の芸術性学習データを用いて、ニューラルネットワークを学習する。 For example, assume that the number of program learning data is A=6,000 and the number of artistic learning data is B=10,000. In this case, the shortage of program learning data (B−A)=4,000 pieces of program learning data with respect to the artistic learning data is supplemented by using a part of the program learning data of A=6,000 pieces. As a result, the missing 4,000 pieces of program learning data can be supplemented using the original A=6,000 pieces of program learning data. The learning unit 14 trains the neural network using the original A=6,000 program learning data, the missing 4,000 program learning data, and 10,000 artistic learning data.

また、番組学習データの数がＡ＝6,000であり、芸術性学習データの数がＢ＝25,000である場合を想定する。この場合、芸術性学習データに対する番組学習データの不足分である（Ｂ－Ａ）＝19,000個の番組学習データは、Ａ＝6,000個の番組学習データが３回重複して使用され、さらに、残りの1,000個については、Ａ＝6,000個の番組学習データの一部が使用される。これにより、不足分の19,000個の番組学習データは、元のＡ＝6,000個の番組学習データを用いて補充することができる。学習部１４は、元のＡ＝6,000個の番組学習データ及び不足分の19,000個の番組学習データ、並びに25,000個の芸術性学習データを用いて、ニューラルネットワークを学習する。 Next, assume that the number of items of program learning data is A=6,000 and the number of items of artistic learning data is B=25,000. In this case, the shortfall of (B-A)=19,000 items of program learning data is filled by reusing all A=6,000 items three times (18,000 items) and then using a part of the A=6,000 items for the remaining 1,000. The missing 19,000 items can thus be replenished from the original A=6,000 items. The learning unit 14 trains the neural network using the original A=6,000 items of program learning data, the 19,000 replenishing items, and the 25,000 items of artistic learning data.
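The replenishment arithmetic in the two examples above can be sketched as follows. The function name `replenish` is hypothetical, and taking the first items of the list for the remainder is an assumption: the text only says that a part of the A items is used, not which part.

```python
def replenish(program_data, target_count):
    """Pad program learning data up to target_count by reusing existing items."""
    a = len(program_data)
    if a == 0:
        raise ValueError("program_data must not be empty")
    if target_count <= a:
        return list(program_data)  # no shortfall to fill
    shortfall = target_count - a   # (B - A)
    full_repeats, remainder = divmod(shortfall, a)
    # Whole-set repeats first, then a part of the set for what is left
    # (using the first `remainder` items is an assumption of this sketch).
    extra = list(program_data) * full_repeats + list(program_data[:remainder])
    return list(program_data) + extra


program_data = list(range(6000))                # stand-in for A = 6,000 items
balanced_10k = replenish(program_data, 10000)   # B = 10,000: shortfall of 4,000
balanced_25k = replenish(program_data, 25000)   # B = 25,000: 3 full repeats + 1,000
```

For B=25,000, `divmod(19000, 6000)` gives three full repeats and a remainder of 1,000, matching the breakdown in the text.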

また、番組画像の正解スコアの段階数と芸術性評価画像の正解スコアの段階数が同じまたは異なる場合であっても、ＮＮ部３１により、番組画像及び芸術性評価画像について統一した１次元のスコアが算出される。つまり、ＮＮ部３１において、番組画像及び芸術性評価画像の段階数に依存することのない固定構成のニューラルネットワークを用いることができるから、段階数に応じて異なるニューラルネットワークを予め用意する必要がない。したがって、簡易な構成にて高精度の学習処理を実現することができる。 Furthermore, even if the number of stages of the correct score of the program image and the number of stages of the correct score of the artistic evaluation image are the same or different, the NN unit 31 provides a unified one-dimensional score for the program image and the artistic evaluation image. is calculated. In other words, the neural network unit 31 can use a neural network with a fixed configuration that does not depend on the number of stages of program images and artistic evaluation images, so there is no need to prepare different neural networks in advance depending on the number of stages. . Therefore, highly accurate learning processing can be achieved with a simple configuration.

図１３は、本発明の実施形態における学習処理の効果を説明する図である。（１）は、非特許文献１の学習処理を示しており、特徴抽出用ＮＮ及びクラス分類用ＮＮを用いて、２段階のクラス（２クラス）の正解スコアが付与された画像ａのデータセットから、２クラスの確率分布が算出される。 FIG. 13 is a diagram illustrating the effect of the learning process in the embodiment of the present invention. (1) shows the learning process of Non-Patent Document 1, in which a feature extraction NN and a class classification NN are used to calculate a two-class probability distribution from a data set of images a to which correct scores of two levels (two classes) have been assigned.

（２）は、一般的なマルチデータセットの学習処理を示している。特徴抽出用ＮＮ及び上側に示すクラス分類用ＮＮを用いて、２クラスの正解スコアが付与された画像ａのデータセットから、２クラスの確率分布が算出される。また、特徴抽出用ＮＮ及び下側に示すクラス分類用ＮＮを用いて、３クラスの正解スコアが付与された画像ｂのデータセットから、３クラスの確率分布が算出される。 (2) shows a general multi-data set learning process. Using the feature extraction NN and the class classification NN shown above, the probability distribution of two classes is calculated from the data set of image a to which the correct scores of two classes have been assigned. Further, using the feature extraction NN and the class classification NN shown below, the probability distribution of the three classes is calculated from the data set of the image b to which the correct scores of the three classes have been assigned.

（３）は、本発明の実施形態における学習処理を示しており、図８に示したＮＮ部３１の特徴抽出用ＮＮ４０及びスコア算出用ＮＮ４１による処理に相当する。特徴抽出用ＮＮ４０及びスコア算出用ＮＮ４１を用いて、２クラスの正解スコアが付与された画像ａ（例えば番組画像）のデータセットから、１次元のスコアが算出される。また、特徴抽出用ＮＮ４０及びスコア算出用ＮＮ４１を用いて、３クラスの正解スコアが付与された画像ｂ（例えば芸術性評価画像）のデータセットから、１次元のスコアが算出される。 (3) shows the learning process in the embodiment of the present invention, and corresponds to the process by the feature extraction NN 40 and score calculation NN 41 of the NN unit 31 shown in FIG. Using the feature extraction NN 40 and the score calculation NN 41, a one-dimensional score is calculated from a data set of images a (for example, program images) to which two classes of correct scores have been assigned. Further, using the feature extraction NN 40 and the score calculation NN 41, a one-dimensional score is calculated from the data set of the image b (for example, an artistic evaluation image) to which three classes of correct scores have been assigned.

（３）において、２クラスのデータセットの場合、例えば第１のクラスの正解スコアは０．０、第２のクラスの正解スコアは１．０である。また、３クラスのデータセットの場合、例えば第１のクラスの正解スコアは０．０、第２のクラスの正解スコアは０．５、第３のクラスの正解スコアは１．０である。 In (3), in the case of a two-class data set, for example, the correct score for the first class is 0.0, and the correct score for the second class is 1.0. Further, in the case of a data set of three classes, for example, the correct score of the first class is 0.0, the correct score of the second class is 0.5, and the correct score of the third class is 1.0.

（２）において、２クラスのデータセットにおける第１のクラス及び３クラスのデータセットにおける第１のクラスについて、これらの正解スコアが意味する画像に対する評価度合いは、似ているが同じではない。例えば、２クラスのデータセットにおける第１のクラスの正解スコアが０．０、３クラスのデータセットにおける第１のクラスの正解スコアも０．０とする。この場合、両データセットのクラス数が異なるため、正解スコアが０．０の画像に対する評価の幅も異なることとなる。 In (2), for the first class in the 2-class data set and the first class in the 3-class data set, the degree of evaluation of images implied by these correct scores is similar but not the same. For example, assume that the correct score of the first class in the two-class data set is 0.0, and the correct answer score of the first class in the three-class data set is also 0.0. In this case, since the numbers of classes in both datasets are different, the range of evaluation for images with a correct answer score of 0.0 will also be different.

このため、（２）に示したとおり、２クラスのデータセット用のクラス分類用ＮＮと、３クラスのデータセット用のクラス分類用ＮＮとに分け、異なる２つのＮＮを用いる必要がある。 Therefore, as shown in (2), it is necessary to use two different NNs, one for class classification for a two-class data set and one for class classification for a three-class data set.

しかしながら、例えば３クラスのデータセットの数が２クラスのデータセットよりも少ない場合には、特徴抽出用ＮＮ、上側に示すクラス分類用ＮＮ及び下側に示すクラス分類用ＮＮの全体として、精度の高い学習を実現することができない。 However, if, for example, the three-class data set contains fewer items than the two-class data set, highly accurate learning cannot be achieved for the feature extraction NN, the upper class classification NN, and the lower class classification NN as a whole.

そこで、（３）に示したように、本発明の実施形態において、１次元のスコアを算出する、両データセットに共通のスコア算出用ＮＮ４１を用いることで、データセットのクラス数に依存することなく、学習処理を実現することができる。 Therefore, as shown in (3), the embodiment of the present invention uses the score calculation NN 41, which calculates a one-dimensional score and is common to both data sets, so that the learning process can be carried out without depending on the number of classes in each data set.

このように、（２）に示したとおり、従来は、複数種類のデータセットを用いてニューラルネットワークを学習する場合、データセット毎に、異なるニューラルネットワークを用意する必要があった。これに対し、（３）に示したとおり、本発明の実施形態では、異なるニューラルネットワークを用意する必要はなく、単一のスコア算出用ＮＮ４１を用いれば済む。つまり、簡易な構成にて高精度の学習処理を実現することができる。 In this way, as shown in (2), conventionally, when learning a neural network using multiple types of data sets, it was necessary to prepare a different neural network for each data set. On the other hand, as shown in (3), in the embodiment of the present invention, there is no need to prepare different neural networks, and it is sufficient to use a single score calculation NN 41. In other words, highly accurate learning processing can be achieved with a simple configuration.

（３）に示す本発明の実施形態は、データセットのクラスとして、順序関係（例えば「great」＞「good」＞「bad」等）がある場合に、特に有効である。 The embodiment of the present invention shown in (3) is particularly effective when the classes of datasets have an order relationship (for example, "great" > "good" > "bad", etc.).

〔代表画像抽出装置〕 [Representative image extraction device]
次に、図１に示した学習装置１により学習されたニューラルネットワークを用いて、番組映像から代表画像を抽出する代表画像抽出装置について説明する。図１０は、本発明の実施形態による代表画像抽出装置の構成例を示すブロック図である。 Next, a representative image extraction device that extracts representative images from program video using the neural network trained by the learning device 1 shown in FIG. 1 will be described. FIG. 10 is a block diagram showing a configuration example of a representative image extraction device according to an embodiment of the present invention.

この代表画像抽出装置２は、サンプリング処理部５０、スコア算出部５１及び選択部５２を備えている。サンプリング処理部５０は、番組映像を入力し、番組映像から一定間隔で、フレーム画像である番組画像をサンプリングし、番組画像をスコア算出部５１に出力する。 The representative image extraction device 2 includes a sampling processing section 50, a score calculation section 51, and a selection section 52. The sampling processing unit 50 inputs the program video, samples the program image, which is a frame image, from the program video at regular intervals, and outputs the program image to the score calculation unit 51.

Note that the sampling processing unit 50 may select in advance a predetermined number of program images from all the program images obtained by sampling the program video, and output only those selected program images to the score calculation unit 51. This reduces the processing load on the score calculation unit 51 and the selection unit 52 in the subsequent stages.
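The sampling step and the optional pre-selection can be sketched as follows. The function name `sample_frame_indices`, its parameters, and the even-spacing rule used for pre-selection are assumptions; the embodiment states only that frames are sampled at regular intervals and that a predetermined number of them may be selected in advance.

```python
def sample_frame_indices(total_frames, fps, interval_sec, max_images=None):
    """Return the indices of frames sampled at a fixed time interval.

    If max_images is given and more frames were sampled, keep only
    max_images of them, spread evenly over the sampled frames
    (one possible pre-selection rule; the rule itself is an assumption).
    """
    step = max(1, round(fps * interval_sec))
    indices = list(range(0, total_frames, step))
    if max_images is not None and len(indices) > max_images:
        stride = len(indices) / max_images
        indices = [indices[int(i * stride)] for i in range(max_images)]
    return indices


# A 10-second clip at 30 fps, sampled once per second.
print(sample_frame_indices(300, 30, 1))
# The same clip, limited to 5 program images.
print(sample_frame_indices(300, 30, 1, max_images=5))
```

Limiting the number of program images here reduces the number of forward passes that the score calculation unit 51 would need to perform.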

The score calculation unit 51 includes the neural network trained by the learning device 1 shown in FIG. 1. That is, the score calculation unit 51 receives the optimal parameters output by the learning device 1 and sets them in the neural network.

The score calculation unit 51 receives the tensor of a program image from the sampling processing unit 50 and calculates a score from the program image using the neural network. The score calculation unit 51 then outputs the program image and its score to the selection unit 52.

As a result, a score is calculated for each of the plurality of program images obtained by sampling the program video, and each program image and its score are output to the selection unit 52.

The selection unit 52 receives, from the score calculation unit 51, the program image and score for each of the program images obtained through sampling by the sampling processing unit 50. The selection unit 52 then sorts the program images in descending order of score and selects the C highest-scoring program images from all the program images as representative images. C is an integer of 1 or more and is set in advance.

The selection unit 52 then sorts the C representative images in chronological order and outputs them in that order.
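The two sorts performed by the selection unit 52 can be sketched as follows. Representing each program image as a `(timestamp, score)` pair and the name `select_representatives` are assumptions made for illustration.

```python
def select_representatives(scored_frames, c):
    """Pick the C highest-scoring program images and return them in time order.

    scored_frames: list of (timestamp, score) pairs, one per program image.
    c: number of representative images, an integer of 1 or more.
    """
    if c < 1:
        raise ValueError("c must be an integer of 1 or more")
    # Sort in descending order of score and keep the top C.
    top = sorted(scored_frames, key=lambda frame: frame[1], reverse=True)[:c]
    # Re-sort the selected images in chronological order for output.
    return sorted(top, key=lambda frame: frame[0])


frames = [(0.0, 0.10), (5.0, 0.90), (10.0, 0.40), (15.0, 0.95), (20.0, 0.70)]
print(select_representatives(frames, 3))
# [(5.0, 0.9), (15.0, 0.95), (20.0, 0.7)]
```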

Note that the selection unit 52 may instead receive all the program images and their corresponding scores, classify all the program images into, for example, three classes based on the scores by threshold processing, and select the program images of the top class as representative images. The selection unit 52 does not necessarily need to divide the scores at equal intervals when classifying the program images.

For example, using preset thresholds (for example, 0.25 and 0.75), the selection unit 52 classifies a program image into the first-stage class when 0.00 ≤ score ≤ 0.25, into the second-stage class when 0.25 < score < 0.75, and into the third-stage class when 0.75 ≤ score ≤ 1.00. The selection unit 52 then selects the program images of the third-stage class as representative images.
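The threshold processing in this example can be sketched as follows. The function names and the `(image_id, score)` representation are assumptions; the thresholds 0.25 and 0.75 and the boundary conditions follow the example above.

```python
def classify_stage(score, low=0.25, high=0.75):
    """Map a score in [0.00, 1.00] to stage class 1, 2, or 3."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0.00, 1.00]")
    if score <= low:       # 0.00 <= score <= 0.25
        return 1
    if score < high:       # 0.25 < score < 0.75
        return 2
    return 3               # 0.75 <= score <= 1.00


def select_top_class(scored_images, low=0.25, high=0.75):
    """Return the images whose score falls into the third-stage class."""
    return [image for image, s in scored_images if classify_stage(s, low, high) == 3]


images = [("a", 0.25), ("b", 0.50), ("c", 0.75), ("d", 0.98)]
print(select_top_class(images))  # ['c', 'd']
```

Unlike the top-C selection, the number of representative images obtained this way varies with the score distribution of the program video.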

As described above, in the representative image extraction device 2 according to the embodiment of the present invention, the score calculation unit 51 calculates a score for each program image obtained by sampling the program video, using the neural network trained by the learning device 1.

The selection unit 52 sorts all the program images obtained by sampling the program video in descending order of score, selects the C highest-scoring program images as representative images, and outputs the C representative images sorted in chronological order.

Here, the neural network trained by the learning device 1 is a model generated with program production know-how taken into account. Using this neural network therefore makes it possible to extract, from a program video, representative images that reflect program production know-how. Furthermore, because the representative images can be extracted without using any special data other than the program video, the processing load can be reduced. In addition, the amount of work required to create a program HP using the representative images can be reduced significantly.

[Example using representative image extraction device 2]
Next, examples using the representative image extraction device 2 shown in FIG. 10 will be described. FIG. 11 is a diagram illustrating a program HP creation system according to a first example using the representative image extraction device 2. The program HP creation system 3 creates a program HP using program EPG (Electronic Programming Guide) information and the program video of the program for which the program HP is to be created.

The program HP creation system 3 includes the representative image extraction device 2, a summary video generation unit 100, and an automatic placement processing unit 101. The summary video generation unit 100 is a component that generates a summary video from the program video by conventional processing. The representative image extraction device 2 is the device according to the embodiment of the present invention shown in FIG. 10 and extracts, for example, three representative images from the program video.

The automatic placement processing unit 101 places the program EPG information, the summary video, and the three representative images at preset positions, and creates a program HP such as the one shown in FIG. 11.

FIG. 12 is a diagram illustrating a program DVD sales HP creation system according to a second example using the representative image extraction device 2. The program DVD sales HP creation system 4 creates a program DVD sales HP using a DVD promotional comment, a DVD package image, and the program DVD video of the program DVD for which the program DVD sales HP is to be created.

The program DVD sales HP creation system 4 includes the representative image extraction device 2 and an automatic placement processing unit 102. The representative image extraction device 2 is the device according to the embodiment of the present invention shown in FIG. 10 and extracts, for example, six representative images from the program video.

The automatic placement processing unit 102 places the DVD promotional comment, the DVD package image, and the six representative images at preset positions, and creates a program DVD sales HP such as the one shown in FIG. 12.

Although the present invention has been described above with reference to an embodiment, the present invention is not limited to that embodiment and can be modified in various ways without departing from its technical concept.

For example, the learning device 1 shown in FIG. 1 trains the neural network using artistic evaluation images from artistic evaluation open data in addition to program images, but it may use program images only. The learning device 1 may also use open data other than artistic evaluation open data in addition to program images. The open data used for learning may be any data consisting of images and correct labels that give each image an evaluation from a predetermined viewpoint.

In addition, the feature extraction NN 40 of the NN unit 31 shown in FIGS. 8 and 9 outputs a 1024-dimensional image feature vector, and the score calculation NN 41 takes the 1024-dimensional image feature vector as input. This 1024-dimensional image feature vector is only an example; in the present invention, the output data of the feature extraction NN 40 and the input data of the score calculation NN 41 are not limited to a 1024-dimensional image feature vector.

Note that an ordinary computer can be used as the hardware configuration of the learning device 1 according to the embodiment of the present invention. The learning device 1 is configured by a computer including a CPU, a volatile storage medium such as a RAM, a non-volatile storage medium such as a ROM, an interface, and the like. The same applies to the representative image extraction device 2 according to the embodiment of the present invention.

The functions of the program learning data generation unit 10, the memories 11 and 13, the artistic learning data generation unit 12, and the learning unit 14 provided in the learning device 1 are each realized by causing the CPU to execute a program describing these functions.

Likewise, the functions of the sampling processing unit 50, the score calculation unit 51, and the selection unit 52 provided in the representative image extraction device 2 are each realized by causing the CPU to execute a program describing these functions.

These programs are stored in the storage medium described above and are read and executed by the CPU. These programs can also be stored in and distributed on storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical disks (CD-ROMs, DVDs, etc.), and semiconductor memories, and can be transmitted and received over a network.

1 Learning device
2 Representative image extraction device
3 Program HP creation system
4 Program DVD sales HP creation system
10 Program learning data generation unit
11, 13 Memory
12 Artistic learning data generation unit
14 Learning unit
20 Sampling processing unit
21 Download processing unit
22 Similarity calculation unit
23, 24, 25 Correct score assigning unit
30 Switching unit
31 NN (neural network) unit
32 Error calculation unit
33 Parameter updating unit
40 Feature extraction NN
41 Score calculation NN
50 Sampling processing unit
51 Score calculation unit
52 Selection unit
100 Summary video generation unit
101, 102 Automatic placement processing unit
P_1, ..., P_N, P_n Program image
P'_1, ..., P'_M, P'_m Still image
S_n,m Similarity
B Maximum value

Claims

A learning device for training a neural network, wherein
a frame image obtained by sampling a learning program video is defined as a program image, a score of one of a plurality of stages assigned to the program image is defined as a first correct score, a score of one of a plurality of stages assigned to a predetermined image is defined as a second correct score, and the neural network is a model to which the program image and the predetermined image are alternately input and which outputs a one-dimensional score, the learning device comprising:
a memory storing program learning data consisting of the program image and the first correct score, and predetermined learning data consisting of the predetermined image and the second correct score; and
a learning unit that reads the program learning data and the predetermined learning data from the memory and trains the neural network using the program learning data and the predetermined learning data,
wherein the learning unit includes:
a neural network unit that, using the neural network, calculates a one-dimensional score of the program image from the program image included in the program learning data as a first score, and calculates a one-dimensional score of the predetermined image from the predetermined image included in the predetermined learning data as a second score;
an error calculation unit that calculates, as a first error, an error between the first score calculated by the neural network unit and the first correct score included in the program learning data, and calculates, as a second error, an error between the second score calculated by the neural network unit and the second correct score included in the predetermined learning data; and
a parameter updating unit that updates parameters of the neural network so that the sum of the first error and the second error calculated by the error calculation unit becomes smaller.

The learning device according to claim 1,
further comprising a program learning data generation unit that generates the program learning data,
wherein the program learning data generation unit includes:
a sampling processing unit that samples the program image from the learning program video;
a download processing unit that accesses a URL of a homepage of a program corresponding to the learning program video and downloads a still image of the program;
a similarity calculation unit that calculates a degree of similarity between the program image sampled by the sampling processing unit and the still image downloaded by the download processing unit; and
a first correct score assigning unit that assigns the first correct score to the program image based on the similarity calculated by the similarity calculation unit, and stores the program learning data consisting of the program image and the first correct score in the memory.

The learning device according to claim 2,
further comprising a predetermined learning data generation unit that generates the predetermined learning data,
wherein the predetermined learning data generation unit includes
a second correct score assigning unit that receives open data consisting of the predetermined image and a label of one of a plurality of stages assigned in advance to the predetermined image, assigns the second correct score to the predetermined image by converting the label into the second correct score, and stores the predetermined learning data consisting of the predetermined image and the second correct score in the memory.

The learning device according to any one of claims 1 to 3,
wherein the number of pieces of the program learning data is A (A is a positive integer), the number of pieces of the predetermined learning data is B (B is a positive integer), A<B, and the result of subtracting A from B is (B-A), and
the learning unit
supplements the (B-A) pieces of data by which the program learning data falls short of the predetermined learning data using any or all of the A pieces of the program learning data, and trains the neural network using the A pieces of the program learning data, the (B-A) pieces of supplemented program learning data, and the B pieces of the predetermined learning data.

A representative image extraction device that extracts a representative image from a program video, comprising:
a sampling processing unit that samples a frame image from the program video and outputs the frame image as a program image;
a score calculation unit that calculates a one-dimensional score of the program image from the program image output by the sampling processing unit, using a neural network trained by the learning device according to any one of claims 1 to 4; and
a selection unit that selects the representative image from all the program images sampled from the program video and output by the sampling processing unit, based on the scores calculated by the score calculation unit.

A program for causing a computer to function as the learning device according to any one of claims 1 to 4.

A program for causing a computer to function as the representative image extraction device according to claim 5.