JP6366626B2 - Generating device, generating method, and generating program

Info

Publication number: JP6366626B2
Application number: JP2016054435A
Authority: JP
Inventors: 隼人小林; 幸浩田頭; 正樹野口
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2016-03-17
Filing date: 2016-03-17
Publication date: 2018-08-01
Anticipated expiration: 2036-03-17
Also published as: JP2017169140A

Description

The present invention relates to a generation device, a generation method, and a generation program.

Conventionally, techniques for processing images included in content such as news articles have been provided. For example, a technique has been provided for processing an image based on the distance between the eyes in an image that includes a human face. Such image processing techniques are sometimes used to generate moving image information (hereinafter also simply referred to as a “moving image”) that summarizes content. For example, a moving image summarizing content may be generated using information on images, such as still images and moving images, included in the content.

JP 2005-108207 A

However, the above conventional techniques do not always appropriately generate a moving image that includes the substance of the content. For example, when a moving image is generated from processed images obtained by cropping regions of an image included in the content, a moving image that includes the substance of the content is not always appropriately generated.

The present application has been made in view of the above, and its object is to provide a generation device, a generation method, and a generation program that appropriately generate a moving image that includes the substance of the content.

The generation device according to the present application includes: an acquisition unit that acquires feature region information, which is information on the region of an object extracted from information on an image included in content; a first generation unit that generates a plurality of processed images from the content based on the feature region information acquired by the acquisition unit; and a second generation unit that generates moving image information in which the plurality of processed images are displayed in an order based on the ranks assigned to the plurality of processed images.

According to one aspect of the embodiment, a moving image that includes the substance of the content can be appropriately generated.

FIG. 1 is a diagram illustrating an example of a generation process according to the embodiment.
FIG. 2 is a diagram illustrating a configuration example of a distribution system according to the embodiment.
FIG. 3 is a diagram illustrating a configuration example of the generation device according to the embodiment.
FIG. 4 is a diagram illustrating an example of a content information storage unit according to the embodiment.
FIG. 5 is a diagram illustrating a configuration example of a terminal device according to the embodiment.
FIG. 6 is a diagram illustrating an example of display on the terminal device according to the embodiment.
FIG. 7 is a flowchart illustrating an example of the generation process according to the embodiment.
FIG. 8 is a diagram illustrating an example of a generation process using a moving image according to the embodiment.
FIG. 9 is a diagram illustrating an example of a generation process using a moving image according to the embodiment.
FIG. 10 is a diagram illustrating an example of a generation process based on key frames of a moving image according to the embodiment.
FIG. 11 is a hardware configuration diagram illustrating an example of a computer that implements the functions of the generation device.

Hereinafter, modes for carrying out the generation device, the generation method, and the generation program according to the present application (hereinafter referred to as “embodiments”) will be described in detail with reference to the drawings. Note that the generation device, the generation method, and the generation program according to the present application are not limited by these embodiments. In the following embodiments, the same parts are denoted by the same reference numerals, and redundant description is omitted.

(Embodiment)
[1. Generation process]
First, with reference to FIG. 1, an example of the generation process according to the embodiment is described, using image information (hereinafter also simply referred to as an “image”) included in article content (hereinafter also simply referred to as “content”). FIG. 1 is a diagram illustrating an example of the generation process according to the embodiment. Specifically, FIG. 1 shows, as an example, a generation process that generates a moving image MV11 using an image IM11 included in content AT11 (see FIG. 4). Hereinafter, a moving image generated by the generation process may be referred to as a summary video. For example, the moving image MV11 generated by the generation process may be referred to as the summary video MV11. The generation process illustrated in FIG. 1 is executed by the generation device 100 (see FIG. 3). In FIG. 1, the generation device 100 generates a plurality of processed images IP111 to IP141 and the like from the content AT11 based on feature region information, which is information on the region of a target (hereinafter also referred to as an “object”) extracted from the image IM11. The generation device 100 then generates the summary video MV11 from the processed images IP111 to IP141 and the like.

FIG. 1 shows an example in which feature region information is extracted based on the image IM11 and the character information IC11 included in the content AT11. In the example illustrated in FIG. 1, the plurality of processed images IP111 to IP141 and the like are generated by cropping predetermined ranges of the image IM11. For example, the processed image IP11 is generated by cropping a range that includes a characteristic portion of the image IM11. Note that “cropping” here refers to a process of cutting out a predetermined region from an image.

The image IM11 in FIG. 1 shows a scene from a baseball game. Specifically, the image IM11 in FIG. 1 shows a scene in which pitcher N, a player on a certain team (team AA), pitches toward catcher P, also a player on team AA, and batter O, a player on the opposing team (team BB) who is at bat, hits the ball. The image IM11 in FIG. 1 also includes umpire Q, the ball thrown by pitcher N, and the like.

The character information IC11 in FIG. 1 includes text about the image IM11. Specifically, the character information IC11 includes content about the final between team AA and team BB held on day Y of month X. The character information IC11 also includes content describing how batter O of team BB hit back the ball thrown by pitcher N of team AA.

First, the generation device 100 extracts feature region information FR11 relating to feature values in the image IM11 (step S11). Specifically, the generation device 100 extracts, based on the image IM11, the feature region information FR11 relating to the feature values in the image IM11. The feature region information here is information on the regions of objects, that is, information indicating where in the image IM11 the objects are located. For example, the generation device 100 extracts the feature region information FR11 of the image IM11 by using, as appropriate, various conventional image processing methods such as saliency detection. For example, the generation device 100 may use, as appropriate, image processing based on an image recognition technique such as R-CNN (Regions with Convolutional Neural Network). The generation device 100 may also extract information by recognizing objects in the image, using various conventional image processing methods as appropriate.

For example, the generation device 100 extracts topics from the character information IC11. The generation device 100 may extract topics from the character information IC11 by using, as appropriate, various conventional methods such as topic analysis. For example, the generation device 100 may extract important keywords from the character information IC11 as topics by analyzing the character information IC11 with natural language processing techniques such as morphological analysis. In the example of FIG. 1, the generation device 100 extracts topics such as “pitcher N”, “batter O”, and “fastball (ball)” from the character information IC11.

The generation device 100 then extracts the feature region information FR11 from the image IM11 based on the topics extracted from the character information IC11. For example, the generation device 100 extracts the feature region information FR11 of the image IM11 by using, as appropriate, various conventional methods such as the image processing described above. For example, the generation device 100 extracts the feature region information FR11 so that, in the image IM11, regions containing objects related to the topics extracted from the character information IC11 have larger feature values.
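The topic-dependent weighting described above can be illustrated with a minimal sketch. It assumes that a per-pixel feature map and labeled bounding boxes for the detected objects are already available. The function name `boost_topic_regions`, the box format, and the `boost` factor are hypothetical and are not taken from the specification.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def boost_topic_regions(
    feature_map: List[List[float]],
    object_boxes: Dict[str, Box],
    topics: List[str],
    boost: float = 2.0,  # hypothetical scaling factor
) -> List[List[float]]:
    """Return a copy of feature_map with topic-related regions scaled up."""
    boosted = [row[:] for row in feature_map]
    for label, (left, top, right, bottom) in object_boxes.items():
        if label not in topics:
            continue  # objects not mentioned in the text keep their values
        for y in range(top, bottom):
            for x in range(left, right):
                boosted[y][x] *= boost
    return boosted


# Toy 4x4 feature map with two detected objects.
feature_map = [[1.0] * 4 for _ in range(4)]
object_boxes = {"pitcher N": (0, 0, 2, 2), "catcher P": (2, 2, 4, 4)}
topics = ["pitcher N", "batter O", "ball"]
boosted = boost_topic_regions(feature_map, object_boxes, topics)
```

In this toy example only the region labeled “pitcher N” is scaled, because “catcher P” is not among the topics. Overlapping boxes would be scaled more than once, which a real implementation might want to avoid.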

In FIG. 1, the generation device 100 extracts the feature regions of the image IM11 based on the image IM11, as shown in the feature region information FR11. For example, the feature region information FR11 indicates the feature value of each pixel in the image IM11. The feature value here is, for example, a numerical value representing the feature. Specifically, the position of each point (pixel) in the feature region information FR11 corresponds to the position it overlaps when the feature region information FR11 is superimposed on the image IM11, and the feature region information FR11 indicates the feature value of the corresponding pixel in the image IM11. In the feature region information FR11 in FIG. 1, regions exhibiting features are shown in darker colors. That is, in the feature region information FR11, the larger the feature value, the darker the color. Specifically, in the feature region information FR11 in FIG. 1, the regions of the image IM11 where people's heads (faces) and the ball are located are shown in dark colors. That is, in FIG. 1, the regions where the face of batter O and the object OB15 are located are shown in dark colors.

Next, the generation device 100 extracts the objects included in the image IM11 and determines the display order of the extracted objects (step S12). For example, the generation device 100 extracts the objects OB11 to OB15 and the like included in the image IM11, as shown in the object list OL11, based on various information such as the feature region information FR11 and the character information IC11. In the example of FIG. 1, the generation device 100 extracts catcher P in the image IM11 as object OB11, pitcher N as object OB12, batter O as object OB13, umpire Q as object OB14, and the ball as object OB15.

Note that the generation device 100 may use various conventional techniques as appropriate to estimate the regions where pitcher N (object OB12), batter O (object OB13), and the ball (object OB15), which are mentioned in the character information IC11, are located. For example, the generation device 100 may estimate these regions according to the shapes, positional relationships, and the like of regions with large feature values. The generation device 100 may also estimate these regions by using various other information as appropriate.

For example, the generation device 100 may extract the objects OB11 to OB15 based on learning information obtained by learning various objects. For example, the generation device 100 may extract the object OB12 from the image IM11 based on learning information obtained by learning the uniform, uniform number, and the like of pitcher N. The generation device 100 may also, for example, extract the object OB13 from the image IM11 based on learning information obtained by learning the face and the like of batter O. The generation device 100 may also, for example, extract the object OB15 from the image IM11 based on learning information obtained by learning baseballs. For example, the generation device 100 may extract the object OB15 from the image IM11 based on previously learned information such as the color distribution within a ball. Note that the generation device 100 may perform the object extraction of step S12 by any technique, as long as the objects OB11 to OB15 and the like can be extracted from the image IM11 using the feature region information FR11 and the like.

The generation device 100 also determines the display order of the extracted objects OB11 to OB15 and the like. For example, the generation device 100 determines the display order of the objects OB11 to OB15 and the like based on the topics extracted from the character information IC11 and on information about the syntax and chronology of the character information IC11.

In the example of FIG. 1, the character information IC11 includes content to the effect that “pitcher N throws the ball” and “batter O hits the ball”. The generation device 100 therefore uses various conventional techniques as appropriate to determine, based on the content of the character information IC11, that batter O comes after pitcher N in the display order. For example, the generation device 100 determines the display order by analyzing the content of the character information IC11 with various conventional techniques such as morphological analysis and syntactic analysis. For example, the generation device 100 analyzes the character information IC11, which is text data, by natural language processing. In addition, because the ball links pitcher N and batter O, the generation device 100 determines that the ball comes between pitcher N and batter O in the display order. The generation device 100 thereby extracts the chronological order pitcher N, ball, batter O. That is, the generation device 100 sets the display order of object OB12 to first, object OB15 to second, and object OB13 to third. The generation device 100 does not set a display order for the other objects in the image IM11, such as OB11 and OB14, treating them as objects of low importance in the content AT11. That is, the generation device 100 regards catcher P and umpire Q in the image IM11 as unimportant to the substance of the content AT11 and sets their display order to “- (none)”.
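As a rough illustration of the display-order decision, the sketch below ranks objects by the position at which their names first appear in the text and leaves unmentioned objects without an order. This plain substring search is only a stand-in, adopted here as an assumption, for the morphological and syntactic analysis mentioned above. The function name `display_order` and the English sample sentence are hypothetical.

```python
from typing import Dict, Optional


def display_order(text: str, object_names: Dict[str, str]) -> Dict[str, Optional[int]]:
    """Rank objects by where their name first appears in the text.

    Objects that are never mentioned get None, i.e. no display order.
    """
    first_mention = {}
    for object_id, name in object_names.items():
        position = text.find(name)
        if position >= 0:
            first_mention[object_id] = position
    ranked = sorted(first_mention, key=first_mention.get)
    order: Dict[str, Optional[int]] = {object_id: None for object_id in object_names}
    for rank, object_id in enumerate(ranked, start=1):
        order[object_id] = rank
    return order


text = "Pitcher N throws the ball. Batter O hits the ball."
object_names = {
    "OB11": "Catcher P",
    "OB12": "Pitcher N",
    "OB13": "Batter O",
    "OB14": "Umpire Q",
    "OB15": "ball",
}
order = display_order(text, object_names)
# order == {"OB11": None, "OB12": 1, "OB13": 3, "OB14": None, "OB15": 2}
```

For this sample text the result matches the order in the example of FIG. 1: OB12 first, OB15 second, OB13 third, and no order for OB11 and OB14.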

The generation device 100 then generates the plurality of processed images IP111 to IP141 and the like from the content AT11, and generates the summary video MV11 in which the generated processed images IP111 to IP141 are displayed (step S13). For example, the generation device 100 generates the plurality of processed images IP111 to IP141 and the like by cropping the image IM11 based on the feature region information FR11 and the object list OL11. In the example of FIG. 1, to simplify the description, the regions AR11 to AR14 cropped from the image IM11 when generating the processed images IP111 to IP141 and the like have an aspect ratio of 1:1. The aspect ratio and shape of the cropped region may differ from one processed image to another. For example, the cropped region for a given processed image may have an aspect ratio of 1:2 or 3:4. The cropped region for a given processed image may also have any of various shapes, such as a circle or a polygon other than a quadrilateral. The size of the cropped region may be determined based on any suitable criterion. For example, the generation device 100 may determine the size of the cropped region based on the feature values of the pixels in the region. For example, the generation device 100 may determine the size of the cropped region based on the average feature value of the pixels in the region. For example, the generation device 100 may determine the size of the cropped region so that the region contains a part of the desired object and the average feature value of the pixels in the region is large. For example, the generation device 100 may determine the size of the region cropped for batter O (object OB13) so that the region contains the face of batter O and the average feature value of the pixels in the region is large.
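One way to picture the size criterion above is a brute-force search over square (1:1) crop regions that must contain a given part, such as a face, keeping the region whose mean feature value is largest. The sketch below is written under that assumption. The exhaustive search and the names `mean_feature` and `best_square_crop` are illustrative choices and are not the method fixed by the specification.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def mean_feature(feature_map: List[List[float]], box: Box) -> float:
    """Average feature value of the pixels inside box."""
    left, top, right, bottom = box
    values = [feature_map[y][x] for y in range(top, bottom) for x in range(left, right)]
    return sum(values) / len(values)


def best_square_crop(feature_map: List[List[float]], must_contain: Box) -> Box:
    """Search every square crop containing must_contain; keep the highest mean."""
    height, width = len(feature_map), len(feature_map[0])
    m_left, m_top, m_right, m_bottom = must_contain
    min_side = max(1, m_right - m_left, m_bottom - m_top)
    best_box: Optional[Box] = None
    best_score = float("-inf")
    for side in range(min_side, min(height, width) + 1):
        # The crop must start early enough to cover the box and stay in the image.
        for top in range(max(0, m_bottom - side), min(m_top, height - side) + 1):
            for left in range(max(0, m_right - side), min(m_left, width - side) + 1):
                box = (left, top, left + side, top + side)
                score = mean_feature(feature_map, box)
                if score > best_score:  # ties keep the smaller, earlier crop
                    best_box, best_score = box, score
    if best_box is None:
        raise ValueError("must_contain does not fit in any square crop")
    return best_box


feature_map = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 9.0, 9.0, 0.0, 0.0],
    [0.0, 9.0, 9.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
face_box = (1, 1, 3, 3)  # hypothetical face location
crop = best_square_crop(feature_map, face_box)
```

Here the tightest square around the face wins because any larger square pulls in low-value pixels and lowers the mean. A real feature map would rarely be this clean, so a practical criterion would probably also bound the minimum crop size.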

In the example of FIG. 1, the generation device 100 generates the processed image IP111, which shows substantially the whole body of pitcher N; the processed image IP121, which shows the uniform number of pitcher N; the processed image IP131, which shows the ball; the processed image IP141, which shows batter O; and so on. For example, the generation device 100 generates the processed image IP111 by cropping the region AR11 of the image IM11. Likewise, it generates the processed image IP121 by cropping the region AR12, the processed image IP131 by cropping the region AR13, and the processed image IP141 by cropping the region AR14. Although FIG. 1 shows only the processed images IP111 to IP141 to simplify the description, the generation device 100 may generate many more processed images from the image IM11.

The generation device 100 then determines the ranks of the plurality of processed images IP111 to IP141 and the like. In the example of FIG. 1, the rank of each processed image corresponds to the order in which it is displayed. That is, in the generated summary video MV11, the processed images are displayed starting from the highest-ranked processed image. For example, among the processed images IP111 to IP141 and the like, the generation device 100 assigns high ranks to the processed images IP111, IP121, and the like, which show pitcher N, the object that is first in the display order. Specifically, the generation device 100 assigns rank A to the processed image IP111, which shows substantially the whole body of pitcher N, and rank B (A+α) to the processed image IP121, which shows the uniform number of pitcher N. Rank A may be rank 1 (first).

When the generation device 100 generates the summary video MV11 by performing interpolation, such as frame interpolation, between the processed images IP111 to IP141 based on the rank of each processed image, the “α” in rank B (A+α) assigned to the processed image IP121 may be “1”. Various conventional techniques, such as linear interpolation and spline interpolation, may be used as appropriate for this interpolation. For example, frame interpolation may be performed with the processed images IP111 and IP121 as frames so that the processed image IP111 and the processed image IP121 are connected smoothly. For example, the generation device 100 may use interpolation to generate a summary video in which the view appears to move in a straight line between objects. When the generation device 100 instead generates a summary video MV11 that displays the processed images IP111 to IP141 and the like consecutively according to rank, the “α” in rank B (A+α) assigned to the processed image IP121 may be “the number of processed images displayed between the processed image IP111 and the processed image IP121, plus 1”.
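The rank offsets and the linear interpolation can be sketched together as follows. The sketch assumes that each key processed image is represented by its crop rectangle and that the in-between images are produced by linearly interpolating those rectangles, so that α equals the number of in-between images plus one. Pairing the two in this way, and the names `assign_ranks`, `interpolate_boxes`, and `crop_sequence`, are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def assign_ranks(num_keyframes: int, in_between: int, first_rank: int = 1) -> List[int]:
    """Rank A for the first key image, then A + alpha, A + 2 * alpha, ...

    alpha = in_between + 1, i.e. the number of images shown between two
    key images plus one. With in_between = 0 the ranks are consecutive.
    """
    alpha = in_between + 1
    return [first_rank + i * alpha for i in range(num_keyframes)]


def interpolate_boxes(start: Box, end: Box, in_between: int) -> List[Box]:
    """Linearly interpolate crop rectangles strictly between start and end."""
    frames: List[Box] = []
    for step in range(1, in_between + 1):
        t = step / (in_between + 1)
        left, top, right, bottom = (s + (e - s) * t for s, e in zip(start, end))
        frames.append((left, top, right, bottom))
    return frames


def crop_sequence(key_boxes: List[Box], in_between: int) -> List[Box]:
    """Key boxes in rank order with interpolated boxes inserted between them."""
    sequence: List[Box] = []
    for current, following in zip(key_boxes, key_boxes[1:]):
        sequence.append(current)
        sequence.extend(interpolate_boxes(current, following, in_between))
    if key_boxes:
        sequence.append(key_boxes[-1])
    return sequence


key_boxes = [(0.0, 0.0, 4.0, 4.0), (8.0, 0.0, 12.0, 4.0)]
ranks = assign_ranks(len(key_boxes), in_between=3)
sequence = crop_sequence(key_boxes, in_between=3)
```

With three in-between images, the second key image receives rank 5 and sits at position 5 of the sequence, so a rank can be read directly as a display position. Spline interpolation, which the text also mentions, would replace only `interpolate_boxes`.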

Further, for example, among the processed images IP111 to IP141 and the like, the generation device 100 assigns the processed image IP131 and the like, which show the ball, the object that is second in the display order, ranks lower than those of the processed images showing pitcher N. Specifically, the generation device 100 assigns rank C (B+β) to the processed image IP131, which shows the ball (object OB15).

When the generation device 100 generates the summary video MV11 by performing interpolation, such as frame interpolation, between the processed images IP111 to IP141 based on the rank of each processed image, the “β” in rank C (B+β) assigned to the processed image IP131 may be “1”. When the generation device 100 instead generates a summary video MV11 that displays the processed images IP111 to IP141 and the like consecutively according to rank, the “β” in rank C (B+β) assigned to the processed image IP131 may be “the number of processed images displayed between the processed image IP121 and the processed image IP131, plus 1”.

Further, for example, among the processed images IP111 to IP141 and the like, the generation device 100 assigns the processed image IP141 and the like, which show batter O, the object that is third in the display order, ranks lower than those of the processed images showing the ball (object OB15). Specifically, the generation device 100 assigns rank D (C+γ) to the processed image IP141, which shows batter O.

When the generation device 100 generates the summary video MV11 by performing interpolation, such as frame interpolation, between the processed images IP111 to IP141 based on the rank of each processed image, the “γ” in rank D (C+γ) assigned to the processed image IP141 may be “1”. When the generation device 100 instead generates a summary video MV11 that displays the processed images IP111 to IP141 and the like consecutively according to rank, the “γ” in rank D (C+γ) assigned to the processed image IP141 may be “the number of processed images displayed between the processed image IP131 and the processed image IP141, plus 1”.

The generation device 100 then generates the summary video MV11 in which the processed images IP111 to IP141 and the like are displayed in an order based on the ranks assigned to them. For example, the generation device 100 may generate the summary video MV11 by interpolating between the processed images IP111 to IP141 through processing such as the frame interpolation described above. For example, when the generated processed images include many processed images besides IP111 to IP141, the generation device 100 may generate the summary video MV11 in which those processed images are displayed in an order based on the ranks assigned to them.

As described above, the generation device 100 generates the processed images IP111 to IP141 and others by cropping the image IM11 based on the feature region information FR11 and the object list OL11. By generating the summary video MV11, in which the processed images IP111 to IP141 and others are displayed in an order based on their assigned ranks, the generation device 100 can appropriately generate a video that includes the substance of the content AT11.

The generation device 100 is not limited to the example described above and may generate the summary video based on various kinds of information. For example, the generation device 100 may determine the proportion of the summary video given to each object according to the importance of that object. For example, it may determine this proportion based on the share of the image occupied by each object, or on the order and frequency with which each object appears in the character information. The proportion referred to here may be a proportion of the playback time of the summary video. For example, the generation device 100 may estimate from the image IM11 and the character information IC11 that the pitcher N and the batter O are highly important, and generate a summary video in which the pitcher N accounts for 40%, the batter O for 50%, and the ball for 10%.
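
As a rough sketch of allocating playback time by importance, the following assumes that each object already has a non-negative importance score and that time is divided in direct proportion to it. Both the proportional rule and the function name `allocate_playback_time` are assumptions made for illustration; the description leaves the exact method open.

```python
from typing import Dict


def allocate_playback_time(importance: Dict[str, float],
                           total_seconds: float) -> Dict[str, float]:
    """Split the playback time among objects in proportion to importance."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    return {name: total_seconds * score / total
            for name, score in importance.items()}


# Importance scores chosen to reproduce the 40% / 50% / 10% example.
shares = allocate_playback_time(
    {"pitcher N": 4, "batter O": 5, "ball": 1}, total_seconds=10)
```

For a 10-second summary video, this gives the pitcher N 4 seconds, the batter O 5 seconds, and the ball 1 second.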

Although the example above shows the generation device 100 generating the summary video MV11 using the character information IC11, the generation device 100 may generate a summary video from the image alone. Likewise, although the example uses the character information IC11 included in the content AT11, the character information may be any information related to the content. The generation device 100 may also generate the summary video based on information, stored in a predetermined storage means, about the display order among objects or among parts of an object. The example above shows the summary video MV11 in which the batter O is displayed after the pitcher N, but the generation device 100 may generate a summary video with a different display order. For example, when the image included in the content shows a comebacker (a ball hit back at the pitcher), or when the character information included in the content reads "the ball hit by the batter struck the pitcher directly ..." or the like, the generation device 100 may display the batter who hit the ball first and the pitcher afterwards. In this way, the generation device 100 may determine the display order of the objects based on the content of the image, the meaning of the character information, and so on.

In addition, for example, the generation device 100 may determine the display order among objects, or among parts of an object, based on information about human common sense. For example, the generation device 100 may determine this display order based on information stored in a database such as a so-called knowledge base, and generate the summary video accordingly. In this case, the generation device 100 may generate the summary video based on display-order information entered by an administrator or the like of the distribution system 1 (see FIG. 2). The generation device 100 may also generate the summary video using a display order learned from various existing videos such as news videos. Suppose, for example, that baseball footage in existing news videos frequently shows the pitcher and then the batter. In this case, the generation device 100 may learn that, in baseball videos, the batter is displayed after the pitcher, and use the learned order to generate a summary video in which the batter is displayed after the pitcher. The generation device 100 may also determine the display order based on various information collected from a network. For example, it may use frequency information calculated from collected web pages and the like, such as how frequently each display order of the objects appears in those pages. As another example, suppose that the objects (people) in a group of several people (for example, an idol group) are ranked, for instance by popularity. In this case, the generation device 100 may generate the summary video so that it includes the objects (people) whose (popularity) rank is at or above a predetermined threshold (for example, fifth place). For example, the generation device 100 may generate the summary video so that the five most popular members of the group are always included. Such popularity information may be acquired from the knowledge base described above or entered by an administrator or the like of the distribution system 1 (see FIG. 2). Further, when the objects include animals (humans), the generation device 100 may determine the display order using information such as the likelihood of a face recognition result. For example, in a scene where many people gather, such as an event or a festival, when focusing on a face in the crowd the generation device 100 may generate the summary video so that it focuses on a face that is well captured, that is, in focus.
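
One simple way to picture learning a display order from existing videos is to count how often one object is shown before another and then sort by those counts. The sketch below assumes each existing video has already been reduced to a list of object labels in display order; the pairwise counting scheme and the names `learn_precedence` and `order_objects` are illustrative assumptions, not the method stated in this description.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Sequence


def learn_precedence(sequences: Iterable[Sequence[str]]) -> Counter:
    """Count, for each ordered pair (a, b), how often a is shown before b."""
    counts: Counter = Counter()
    for seq in sequences:
        for a, b in combinations(seq, 2):  # a appears earlier than b in seq
            counts[(a, b)] += 1
    return counts


def order_objects(objects: List[str], counts: Counter) -> List[str]:
    """Sort objects so that those usually shown first come first."""
    def score(obj: str) -> int:
        return sum(counts[(obj, other)] - counts[(other, obj)]
                   for other in objects if other != obj)
    return sorted(objects, key=score, reverse=True)


# Hypothetical display sequences observed in existing baseball videos.
observed = [["pitcher", "batter"],
            ["pitcher", "batter", "catcher"],
            ["batter", "pitcher"]]
learned = learn_precedence(observed)
display_order = order_objects(["batter", "pitcher"], learned)
```

Here the pitcher precedes the batter in two of the three observed sequences, so `display_order` places the pitcher first.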

For example, an administrator or the like of the distribution system 1 enters information indicating that, when an image or video includes a pitcher and a batter, a person normally looks at the pitcher first and then at the batter. For footage of a model at a fashion show or the like, the administrator enters information indicating that the display range is to move from the legs to the head, that is, from bottom to top. For a product advertisement or the like, the administrator enters information indicating that the display range is to change from one containing the whole person holding the product to a close-up of the product, that is, to zoom in on the product. The generation device 100 may store the entered display-order information in the storage unit 120 (see FIG. 3) and, when generating a summary video, read the corresponding display-order information from the storage unit 120 and use it. The generation device 100 may also use an NG list (a list of objects to be excluded) when generating a summary video (animation). The NG list may include, for example, advertisements and members of the general public. In this case, when the image IM11 shown in FIG. 1 contains the faces of spectators (members of the general public), the generation device 100 may generate the summary video so that those faces are not included. Likewise, when the image IM11 shown in FIG. 1 contains an advertisement or the like, the generation device 100 may generate the summary video so that the advertisement is not included. When the catcher P in the image IM11 shown in FIG. 1 has been placed on the NG list because of a predetermined event (for example, an arrest), the generation device 100 may generate the summary video so that the catcher P is not included. To keep an object on the NG list (hereinafter an "NG object") out of the summary video, the generation device 100 may use various editing techniques. For example, it may generate a summary video containing predetermined edit points (cuts). It may avoid an NG object by not connecting the included objects continuously and instead splitting the video at some point with a suitable editing effect. For example, when an NG object C is located between an object A and an object B that are to be included in the summary video, the generation device 100 may generate a summary video that does not connect the object A and the object B continuously but is split with an arbitrary editing effect so that the object C does not appear. The generation device 100 may also add a predetermined image or perform various editing operations such as insert editing. Further, when there are several candidate paths for the summary video (animation), the generation device 100 may select one that does not include an NG object. For example, when generating a summary video that connects the object A and the object B, if the candidate paths between them include a path on which the NG object C is located and a path on which it is not, the generation device 100 may generate the summary video based on the path on which the NG object C is not located.
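
The path selection described above can be sketched as follows. The sketch assumes each candidate path is represented simply as the list of object labels that the display range would pass over; this representation and the name `choose_path` are hypothetical. Returning `None` signals that every candidate crosses an NG object, in which case a cut or other editing effect would be needed instead.

```python
from typing import Iterable, List, Optional, Sequence


def choose_path(candidate_paths: Sequence[List[str]],
                ng_objects: Iterable[str]) -> Optional[List[str]]:
    """Return the first candidate path that crosses no NG object, else None."""
    ng = set(ng_objects)
    for path in candidate_paths:
        if ng.isdisjoint(path):  # no label on this path is on the NG list
            return path
    return None


# Two hypothetical ways of moving the display range from object A to object B.
paths = [["A", "C", "B"],      # passes over NG object C
         ["A", "ball", "B"]]   # avoids it
selected = choose_path(paths, ng_objects=["C", "advertisement"])
```

In this example `selected` is the second path, since the first one passes over the NG object C.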

The summary video generated by the generation device 100 may be a video compressed in a predetermined compression format. The generation device 100 may also generate a summary video that freezes for a predetermined time just before an important scene. For example, it may generate a summary video that freezes for a predetermined time before an important scene, inserts and displays character information such as a caption, and then continues. For example, for an action scene, the generation device 100 may generate a summary video that pauses immediately before a punch lands, inserts and displays character information such as a caption, and then shows what follows. The generation device 100 may perform this processing based on character information obtained from audio information. The generation device 100 may also generate a summary video of the content to which given content links and display it within the given content. For example, it may generate a summary video of the page to which a given web page links and display it on that web page as a display advertisement. For example, it may generate the summary video from a captured image (screenshot) of the destination page and display it on the given web page as a display advertisement.
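
The pause-and-caption behaviour can be pictured as an operation on a frame timeline. The sketch below assumes the summary video is a list of frame identifiers and that the pause is realised by repeating the frame just before the important scene with a caption attached; the representation and the name `insert_pause` are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

# Assumed timeline entry: (frame identifier, caption or None).
TimelineEntry = Tuple[str, Optional[str]]


def insert_pause(frames: List[str], important_index: int,
                 hold_count: int, caption: str) -> List[TimelineEntry]:
    """Freeze on the frame before `important_index` while showing a caption."""
    if not 1 <= important_index < len(frames):
        raise ValueError("important_index must have a preceding frame")
    still = frames[important_index - 1]
    before = [(f, None) for f in frames[:important_index]]
    held = [(still, caption)] * hold_count  # the frozen, captioned frames
    after = [(f, None) for f in frames[important_index:]]
    return before + held + after


timeline = insert_pause(["f0", "f1", "f2"], important_index=2,
                        hold_count=2, caption="Here it comes")
```

The resulting timeline plays `f0` and `f1`, holds `f1` for two captioned frames, and then continues with `f2`.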

[2. Configuration of the distribution system]
FIG. 2 is a diagram illustrating a configuration example of the distribution system according to the embodiment. As illustrated in FIG. 2, the distribution system 1 includes a terminal device 10, a provider terminal 50, and a generation device 100. The terminal device 10, the provider terminal 50, and the generation device 100 are communicably connected, by wire or wirelessly, via a predetermined network N. The distribution system 1 illustrated in FIG. 2 may include a plurality of terminal devices 10, a plurality of provider terminals 50, and a plurality of generation devices 100.

The terminal device 10 is an information processing device used by a user. The terminal device 10 accepts various operations from the user. In the following, the terminal device 10 may be referred to as the user; that is, "user" may be read as "terminal device 10". The terminal device 10 is realized by, for example, a smartphone, a tablet terminal, a notebook PC (Personal Computer), a desktop PC, a mobile phone, or a PDA (Personal Digital Assistant).

The provider terminal 50 is an information processing device used by a provider of content such as character information and images. For example, the provider uses the provider terminal 50 to supply character information and images such as those shown in FIG. 4 to the generation device 100.

The generation device 100 is an information processing device that generates processed images from an image by processing the image based on a plurality of pieces of feature region information. In the present embodiment, the generation device 100 also provides a content distribution service that distributes the generated combination content to the terminal device 10.

[3. Configuration of the generation device]
Next, the configuration of the generation device 100 according to the embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram illustrating a configuration example of the generation device 100 according to the embodiment. As illustrated in FIG. 3, the generation device 100 includes a communication unit 110, a storage unit 120, and a control unit 130. The generation device 100 may also include an input unit (for example, a keyboard or a mouse) that accepts various operations from an administrator or the like of the generation device 100, and a display unit (for example, a liquid crystal display) for displaying various kinds of information.

(Communication unit 110)
The communication unit 110 is realized by, for example, a NIC (Network Interface Card). The communication unit 110 is connected to the network N by wire or wirelessly and transmits and receives information to and from the terminal device 10.

(Storage unit 120)
The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. As illustrated in FIG. 3, the storage unit 120 according to the embodiment includes a content information storage unit 121.

(Content information storage unit 121)
The content information storage unit 121 according to the embodiment stores various kinds of information about content. FIG. 4 shows an example of the content information storage unit 121 according to the embodiment. The content information storage unit 121 illustrated in FIG. 4 has the items "content ID", "character information", "image information", "image ID", and "provider ID".

"Content ID" indicates identification information for identifying content. "Character information" indicates the character information included in the corresponding content. "Image information" indicates the image included in the corresponding content. For the purposes of explanation, FIG. 4 depicts the image identified by each image ID, but a file path name or the like indicating where the image is stored may be stored as the "image information". "Image ID" indicates identification information for identifying an image. For example, the image identified by the image ID "IM11" corresponds to the image IM11 shown in the example of FIG. 1. "Provider ID" indicates identification information for identifying the provider of the content.

For example, in the example shown in FIG. 4, the content AT11 identified by the content ID "AT11" includes the character information "In the final held on X month Y day, pitcher N of team AA ..." and the image IM11 identified by the image ID "IM11". The content AT11 was acquired from the provider identified by the provider ID "CP11".

Similarly, in the example shown in FIG. 4, the content AT12 identified by the content ID "AT12" includes the character information "Round Z of the league, held on Z month A day, ..." and the image IM12 identified by the image ID "IM12". The content AT12 was acquired from the provider identified by the provider ID "CP12".
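
The items of the content information storage unit 121 can be sketched as a simple record type. This is only an illustration: the field names, the `images/...` file paths, and the `build_store` helper are hypothetical, and the elided character information is left elided as in FIG. 4.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ContentRecord:
    content_id: str      # "content ID"
    character_info: str  # "character information"
    image_id: str        # "image ID"
    image_path: str      # "image information" held as a file path (assumed)
    provider_id: str     # "provider ID"


def build_store(records: Iterable[ContentRecord]) -> Dict[str, ContentRecord]:
    """Index the records by content ID for lookup."""
    return {r.content_id: r for r in records}


store = build_store([
    ContentRecord("AT11",
                  "In the final held on X month Y day, pitcher N of team AA ...",
                  "IM11", "images/IM11.png", "CP11"),
    ContentRecord("AT12",
                  "Round Z of the league, held on Z month A day, ...",
                  "IM12", "images/IM12.png", "CP12"),
])
```

Looking up `store["AT11"]` then yields the record whose image ID is `IM11` and whose provider ID is `CP11`.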

The content information storage unit 121 is not limited to the above and may store various kinds of information depending on the purpose. For example, when the content includes a video, the content information storage unit 121 may store the video. When the content includes a plurality of images, it may store the plurality of images. It may also store information about the category of the content, the date and time when the content was acquired or created, topics extracted from the content, and important words in the character information of the content. The images in the content information storage unit 121 are assumed to be managed (stored) only after it has been determined (confirmed) that permission to process the image, that is, permission for secondary processing, has been obtained from the provider of the image or from a third party holding rights (copyright and the like) in the image.

(Control unit 130)
Returning to FIG. 3, the control unit 130 is realized by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs (corresponding to an example of the generation program) stored in a storage device inside the generation device 100, using the RAM as a work area. Alternatively, the control unit 130 is realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As illustrated in FIG. 3, the control unit 130 includes an acquisition unit 131, an extraction unit 132, a first generation unit 133, a second generation unit 134, and a distribution unit 135, and realizes or executes the information processing functions and operations described below. The internal configuration of the control unit 130 is not limited to the configuration illustrated in FIG. 3 and may be any other configuration that performs the information processing described later. Likewise, the connections among the processing units of the control unit 130 are not limited to those illustrated in FIG. 3 and may be different.

(Acquisition unit 131)
The acquisition unit 131 acquires various kinds of information. For example, the acquisition unit 131 acquires various kinds of information from an external device or the storage unit 120, including various kinds of information about content from an external device or the content information storage unit 121. For example, the acquisition unit 131 acquires various kinds of information about the content AT11 from the content information storage unit 121.

The acquisition unit 131 also acquires feature region information, which is information about the regions of target objects extracted from image-related information included in the content. For example, the acquisition unit 131 acquires the feature region information FR11. In FIG. 1, the acquisition unit 131 acquires the feature region information FR11 extracted from the image IM11, which is included in the content AT11 as image-related information. The acquisition unit 131 may acquire feature region information extracted from the image-related information from an external device, or from the extraction unit 132 or the storage unit 120.

For example, the acquisition unit 131 acquires feature region information extracted from a plurality of pieces of image information included in the content as image-related information. The acquisition unit 131 may also acquire feature region information extracted from video information included in the content as image-related information. The acquisition unit 131 may also acquire feature region information extracted based on character information related to the content. In FIG. 1, the acquisition unit 131 acquires the feature region information FR11 extracted based on the character information IC11 included in the content AT11. The acquisition unit 131 may also acquire feature region information extracted based on audio information related to the content.

The acquisition unit 131 also acquires content distribution requests from the terminal device 10. In addition, the acquisition unit 131 acquires, from an external information processing device, the content to be stored in the content information storage unit 121; for example, it acquires content from the provider terminal 50. The acquisition unit 131 may also acquire information about topics in the content. For example, it may acquire a designation of a topic in the content from the provider of the content, in which case it acquires the designation from the provider terminal 50.

(Extraction unit 132)
The extraction unit 132 extracts feature region information, which relates to feature amounts, from an image included in the content. For example, the extraction unit 132 extracts the feature region information from the image based on various kinds of information. The extraction unit 132 may extract the feature region information based on the image itself. In FIG. 1, the extraction unit 132 extracts the feature region information FR11 from the image IM11 based on the image IM11 included in the content AT11. For example, the extraction unit 132 extracts the feature region information FR11 of the image IM11 by using, as appropriate, various conventional image processing methods such as saliency detection. The extraction unit 132 may also use image processing based on image recognition techniques such as R-CNN. Further, the extraction unit 132 may use various conventional image processing methods, as appropriate, to extract information by recognizing objects in the image.

The extraction unit 132 also extracts feature region information from the image based on character information. For example, the extraction unit 132 extracts the feature region information from the image based on character information related to the content. In FIG. 1, the extraction unit 132 extracts the feature region information FR11 from the image IM11 based on the character information IC11 included in the content AT11. For example, the extraction unit 132 extracts the feature region information FR11 from the image IM11 based on topics extracted from the character information IC11, using various conventional methods such as image processing as appropriate. For example, the extraction unit 132 extracts the feature region information FR11 so that regions of the image IM11 containing an object related to a topic extracted from the character information IC11 have larger feature amounts.
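
A minimal sketch of raising the feature amount of topic-related regions is given below. It assumes that object recognition has already produced labelled regions with a base score and that topic relevance is decided by a simple label match, with matching regions multiplied by a fixed factor. These are simplifying assumptions; the description does not state how the feature amounts are computed, and the name `boost_by_topic` is hypothetical.

```python
from typing import Dict, Iterable, List


def boost_by_topic(regions: List[Dict], topics: Iterable[str],
                   factor: float = 2.0) -> List[Dict]:
    """Return copies of the regions with topic-related scores multiplied."""
    topic_set = {t.lower() for t in topics}
    return [
        {**r, "score": (r["score"] * factor
                        if r["label"].lower() in topic_set else r["score"])}
        for r in regions
    ]


# Hypothetical regions detected in an image, with base feature amounts.
regions = [{"label": "pitcher", "score": 0.4},
           {"label": "crowd", "score": 0.5}]
boosted = boost_by_topic(regions, topics=["Pitcher"])
```

After boosting, the pitcher region scores 0.8 and overtakes the crowd region, which stays at 0.5; the input list is left unchanged.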

Further, for example, the extraction unit 132 may extract the feature region information by applying a technique for generating character information from an image. For example, the feature region information may be acquired by applying a neural network (NN) with an attention mechanism for image caption generation. For example, the extraction unit 132 may extract the feature region information by appropriately using a so-called convolutional neural network (CNN), which repeats convolution and pooling over local regions of an input image, or a recurrent neural network (RNN). The extraction unit 132 may use LSTM (Long Short-Term Memory) as the RNN. For example, the extraction unit 132 extracts, from the image alone, feature region information indicating those features (targets) in the image that would be included in the character information (caption) when the character information is generated. For example, the extraction unit 132 extracts feature region information in which the feature amount is large in a region including a feature (target) that would be included in the character information (caption). Note that the extraction unit 132 may perform the above-described extraction process based on audio information related to the content.

In FIG. 1, the extraction unit 132 extracts objects included in the image IM11 and determines the display order of the extracted objects. For example, the extraction unit 132 extracts the objects OB11 to OB15 and the like, as shown in the object list OL11, based on various information such as the feature region information FR11 and the character information IC11. Specifically, the extraction unit 132 extracts the catcher P included in the image IM11 as the object OB11, the pitcher N as the object OB12, the batter O as the object OB13, the referee Q as the object OB14, and the ball as the object OB15.

Note that the extraction unit 132 may use various conventional techniques as appropriate to estimate the regions in which the pitcher N (object OB12), the batter O (object OB13), and the ball (object OB15) mentioned in the character information IC11 are located. For example, the extraction unit 132 may estimate these regions according to the shapes, positional relationships, and the like of regions having large feature amounts. The extraction unit 132 may also estimate these regions by appropriately using various kinds of information.

For example, the extraction unit 132 may extract the objects OB11 to OB15 based on learning information obtained by learning various objects. For example, the extraction unit 132 may extract the object OB12 from the image IM11 based on learning information obtained by learning the uniform, uniform number, and the like of the pitcher N. Likewise, the extraction unit 132 may extract the object OB13 from the image IM11 based on learning information obtained by learning the face and the like of the batter O, and may extract the object OB15 based on learning information obtained by learning baseballs. Note that the extraction unit 132 may perform object extraction by any technique, as long as the objects OB11 to OB15 and the like can be extracted from the image IM11 using the feature region information FR11 and the like.

Further, the extraction unit 132 determines the display order of the extracted objects OB11 to OB15 and the like. For example, the extraction unit 132 determines the display order of the objects OB11 to OB15 and the like based on the topic extracted from the character information IC11 and on information about the syntax and time series of the character information IC11.

Further, the extraction unit 132 determines from the character information IC11, using various conventional techniques as appropriate, that the batter O comes later in the display order than the pitcher N. Because the ball links the pitcher N and the batter O, the extraction unit 132 places the ball between the pitcher N and the batter O in the display order. The extraction unit 132 thereby extracts the time-series order of the pitcher N, the ball, and the batter O. That is, the extraction unit 132 sets the display order of the object OB12 to first, that of the object OB15 to second, and that of the object OB13 to third. The extraction unit 132 treats the other objects included in the image IM11, such as OB11 and OB14, as objects of low importance in the content AT11 and does not assign them a display order. That is, the extraction unit 132 regards the catcher P and the referee Q included in the image IM11 as unimportant to the substance of the content AT11 and sets their display order to "- (none)".
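The ordering rule in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the function name `assign_display_order`, the `mentions` input (object names in the order the text refers to them), and the `connectors` mapping (an object placed between two others) are hypothetical and do not come from the embodiment, which leaves the analysis technique open.

```python
from typing import Dict, List, Optional, Tuple


def assign_display_order(
    objects: List[str],
    mentions: List[str],
    connectors: Dict[str, Tuple[str, str]],
) -> Dict[str, Optional[int]]:
    """Return a 1-based display order per object, or None if unranked."""
    # Start from the order in which the text mentions each object.
    sequence = list(dict.fromkeys(n for n in mentions if n in objects))
    # Place each connecting object directly after its first endpoint.
    for name, (first, second) in connectors.items():
        if name in objects and first in sequence and second in sequence:
            if name in sequence:
                sequence.remove(name)
            sequence.insert(sequence.index(first) + 1, name)
    # Objects never mentioned stay unranked (the "- (none)" case).
    order: Dict[str, Optional[int]] = {name: None for name in objects}
    for position, name in enumerate(sequence, start=1):
        order[name] = position
    return order


objects = ["catcher P", "pitcher N", "batter O", "referee Q", "ball"]
order = assign_display_order(
    objects,
    mentions=["pitcher N", "batter O"],
    connectors={"ball": ("pitcher N", "batter O")},
)
```

With these inputs the pitcher, the ball, and the batter receive orders 1, 2, and 3, and the catcher and the referee remain unranked, matching the example of FIG. 1.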

Further, the extraction unit 132 may extract a topic from the content. For example, the extraction unit 132 extracts a topic from the content AT11. The extraction unit 132 may also extract a topic based on character information, for example based on the character information IC11. Note that the extraction unit 132 may extract a topic from the content AT11 by appropriately using various conventional methods such as topic analysis. For example, the extraction unit 132 may extract important keywords from the character information IC11 as topics by analyzing the character information IC11 with natural language processing techniques such as morphological analysis as appropriate. The extraction unit 132 may also store various types of information, such as the extracted feature region information, in the storage unit 120.

(First generation unit 133)
The first generation unit 133 generates a plurality of processed images. For example, the first generation unit 133 generates a plurality of processed images from the content based on the feature region information acquired by the acquisition unit 131. In FIG. 1, the first generation unit 133 generates a plurality of processed images IP111 to IP141 and the like from the content AT11. For example, the first generation unit 133 generates the processed images IP111 to IP141 and the like by cropping the image IM11 based on the feature region information FR11 and the object list OL11. For example, the first generation unit 133 generates a plurality of processed images that include the processed images IP111 to IP141, which are generated by cropping regions related to the target objects (objects OB12, OB13, and OB15) included in the image IM11.

In the example of FIG. 1, the first generation unit 133 generates the processed image IP111, which includes substantially the whole body of the pitcher N, the processed image IP121, which includes the uniform number portion of the pitcher N, the processed image IP131, which includes the ball, the processed image IP141, which includes the batter O, and the like. For example, the first generation unit 133 generates the processed image IP111 by cropping the region AR11 of the image IM11, the processed image IP121 by cropping the region AR12, the processed image IP131 by cropping the region AR13, and the processed image IP141 by cropping the region AR14.
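Cropping a region such as AR11 out of the image IM11 can be sketched as below. The image is modelled as a nested list of pixel values and each region as a `(left, top, width, height)` tuple; both representations, the function name `crop`, and the coordinates used in the example are assumptions made for illustration, since the description does not give concrete coordinates.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]              # rows of pixel values
Region = Tuple[int, int, int, int]   # (left, top, width, height)


def crop(image: Image, region: Region) -> Image:
    """Return the sub-image covered by `region`."""
    left, top, width, height = region
    if width <= 0 or height <= 0:
        raise ValueError("region must have a positive size")
    if left < 0 or top < 0:
        raise ValueError("region must lie inside the image")
    if top + height > len(image) or left + width > len(image[0]):
        raise ValueError("region must lie inside the image")
    return [row[left:left + width] for row in image[top:top + height]]


# Toy 4x6 image whose pixel value encodes its position (10 * y + x).
image = [[10 * y + x for x in range(6)] for y in range(4)]

# Hypothetical coordinates standing in for the regions AR11 to AR14.
regions: Dict[str, Region] = {
    "IP111": (0, 0, 2, 4),
    "IP121": (1, 1, 2, 2),
    "IP131": (3, 1, 1, 1),
    "IP141": (4, 0, 2, 4),
}
processed = {name: crop(image, region) for name, region in regions.items()}
```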

Further, the first generation unit 133 determines the ranks of the processed images IP111 to IP141 and the like. For example, among the processed images IP111 to IP141 and the like, the first generation unit 133 assigns high ranks to the processed images IP111, IP121, and the like, which include the pitcher N, the object that is first in the display order. Specifically, the first generation unit 133 assigns rank A to the processed image IP111, which includes substantially the whole body of the pitcher N, and rank B (A + α) to the processed image IP121, which includes the uniform number portion of the pitcher N.

Further, for example, among the processed images IP111 to IP141 and the like, the first generation unit 133 assigns the processed image IP131 and the like, which include the ball, the object that is second in the display order, a rank lower than those of the processed images that include the pitcher N. Specifically, the first generation unit 133 assigns rank C (B + β) to the processed image IP131, which includes the ball (object OB15).

Further, for example, among the processed images IP111 to IP141 and the like, the first generation unit 133 assigns the processed image IP141 and the like, which include the batter O, the object that is third in the display order, a rank lower than those of the processed images that include the ball (object OB15). Specifically, the first generation unit 133 assigns rank D (C + γ) to the processed image IP141, which includes the batter O.
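The rank assignment A, B (A + α), C (B + β), D (C + γ) can be sketched as follows. The sketch assumes that a larger numeric value means a lower rank and, for simplicity, that the increments α, β, and γ are all equal to a single `step`; the function name and the per-image `priority` field (for example, whole body before uniform number) are likewise hypothetical.

```python
from typing import List, Tuple

# (image id, display order of the object it shows, priority within that object)
ImageInfo = Tuple[str, int, int]


def rank_processed_images(
    images: List[ImageInfo],
    start: float = 1.0,
    step: float = 1.0,
) -> List[Tuple[str, float]]:
    """Return (image id, rank value) pairs, smallest value first."""
    # Sort by the object's display order, then by priority within the object.
    ordered = sorted(images, key=lambda item: (item[1], item[2]))
    return [
        (image_id, start + step * index)
        for index, (image_id, _, _) in enumerate(ordered)
    ]


images = [
    ("IP141", 3, 0),  # batter O
    ("IP121", 1, 1),  # pitcher N, uniform number portion
    ("IP131", 2, 0),  # ball
    ("IP111", 1, 0),  # pitcher N, substantially the whole body
]
ranks = rank_processed_images(images)
```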

For example, the first generation unit 133 generates a plurality of processed images from a plurality of pieces of image information. In this case, for example, the first generation unit 133 generates a plurality of processed images that include a processed image generated by cropping a region related to a target object included in predetermined image information among the plurality of pieces of image information. For example, the first generation unit 133 also generates a plurality of processed images from moving image information. In this case, for example, the first generation unit 133 generates a plurality of processed images that include a processed image generated by cropping a region related to a target object included in image information extracted from the moving image information. These points are described in detail later.

Further, the first generation unit 133 may generate the plurality of processed images based on character information. In the example of FIG. 1, the first generation unit 133 generates the processed images IP111 to IP141 and the like by cropping the image IM11 based on the feature region information FR11 extracted based on the character information IC11 and on the object list OL11. The first generation unit 133 may also generate the plurality of processed images based on audio information. Note that the first generation unit 133 may store the generated processed images in the storage unit 120.

(Second generation unit 134)
The second generation unit 134 generates moving image information in which the plurality of processed images are displayed in an order based on the ranks assigned to them. Note that the second generation unit 134 may store the generated moving image information (summary video) in the storage unit 120. For example, the second generation unit 134 generates moving image information in which the processed images are displayed in an order corresponding to the ranks assigned to them based on character information. The second generation unit 134 may also generate moving image information in which the processed images are displayed in an order corresponding to ranks assigned based on audio information. For example, the second generation unit 134 may generate moving image information that includes sound based on the audio information.

In FIG. 1, the second generation unit 134 generates the summary video MV11, in which the processed images IP111 to IP141 and the like are displayed in an order based on the ranks assigned to them. For example, the second generation unit 134 may generate the summary video MV11 by performing interpolation that connects the processed images IP111 to IP141, using processing such as the frame interpolation described above. For example, when the generated processed images include many processed images other than the processed images IP111 to IP141, the second generation unit 134 may generate the summary video MV11 in which all of these processed images are displayed in an order based on the ranks assigned to them.

Further, the second generation unit 134 generates moving image information in which, among the plurality of processed images, a processed image including a second target object is displayed after a processed image including a first target object, where the first target object has been assigned a predetermined display order and the second target object has been assigned a display order lower than that of the first target object. In FIG. 1, the second generation unit 134 generates the summary video MV11 in which, among the processed images IP111 to IP141, the processed image IP131 including the ball, which is second in the display order, is displayed after the processed image IP121 including the pitcher N, who is first in the display order. Likewise, the second generation unit 134 generates the summary video MV11 in which the processed image IP141 including the batter O, who is third in the display order, is displayed after the processed image IP131 including the ball, which is second in the display order.
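One way to picture the ordering and the interpolation between processed images is to sort the crop regions by rank and insert intermediate regions between neighbours, which yields a pan and zoom across the original image. The sketch below uses linear interpolation of crop rectangles as a stand-in; the description does not specify the frame interpolation method, and the names `interpolate`, `build_frame_regions`, and `steps_between`, as well as the coordinates, are assumptions.

```python
from typing import Dict, List, Tuple

Region = Tuple[float, float, float, float]  # (left, top, width, height)


def interpolate(a: Region, b: Region, t: float) -> Region:
    """Linearly blend two crop regions; t=0 gives a, t=1 gives b."""
    left, top, width, height = (x + (y - x) * t for x, y in zip(a, b))
    return (left, top, width, height)


def build_frame_regions(
    ranked: Dict[str, Tuple[float, Region]],
    steps_between: int = 2,
) -> List[Region]:
    """Order regions by rank value and add in-between regions."""
    keyframes = [region for _, region in sorted(ranked.values(), key=lambda v: v[0])]
    frames: List[Region] = []
    for current, following in zip(keyframes, keyframes[1:]):
        frames.append(current)
        for i in range(1, steps_between + 1):
            frames.append(interpolate(current, following, i / (steps_between + 1)))
    if keyframes:
        frames.append(keyframes[-1])
    return frames


# Hypothetical rank values and crop regions for two processed images.
ranked = {
    "IP131": (3.0, (4.0, 0.0, 2.0, 2.0)),
    "IP111": (1.0, (0.0, 0.0, 2.0, 2.0)),
}
frames = build_frame_regions(ranked, steps_between=1)
```

Each resulting region would then be cropped from the source image and scaled to the video size to obtain one frame.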

For example, the second generation unit 134 generates moving image information in which, among the plurality of processed images, a processed image including a second part of a predetermined target object is displayed after a processed image including a first part of that target object, where the first part has been assigned a predetermined display order and the second part has been assigned a display order lower than that of the first part. A case where an image including a model at a fashion show or the like as an object is used will be described as an example. In this case, the model (human) as the object is assumed to be divided vertically into three parts, for example, the legs, the torso, and the head from the bottom. It is also assumed that information has been acquired, through learning based on moving image information and the like that include a model (human) as an object, indicating that when the object is a model (human) it is often displayed from bottom to top in the order of the legs, the torso, and the head. Accordingly, for the model (human) as the object, the legs are assigned display order 1, the torso display order 2, and the head display order 3. Each processed image that the first generation unit 133 generates from a video or image including the model (human) as an object is assigned a rank based on this display order.

In the example described above, the second generation unit 134 generates a summary video in which, for the model (human) as the object, the processed image including the torso, which is second in the display order, is displayed after the processed image including the legs, which are first in the display order. Likewise, the second generation unit 134 generates a summary video in which the processed image including the head, which is third in the display order, is displayed after the processed image including the torso. Thus, when an image or video included in the content includes a model (human) as an object, the second generation unit 134 can generate a summary video in which the legs, the torso, and the head are displayed in that order from bottom to top. Note that the second generation unit 134 may generate a summary video in which the parts of an object are displayed in various orders depending on the object. For example, depending on the object, the second generation unit 134 may generate a summary video in which the object is displayed in the order of left, center, and right, or in clockwise or counterclockwise order. For example, when a video or image shows a meeting scene in which a plurality of people are seated around a round table or the like, the second generation unit 134 may generate a summary video in which the people seated at the round table are displayed in clockwise order.
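The two part orderings mentioned here, bottom to top for a standing person and clockwise around a round table, can be sketched as below. Both helpers are illustrative assumptions: regions are `(left, top, width, height)` tuples in image coordinates with the y axis pointing down, the equal three-way split is a simplification, and the clockwise order is taken around the centroid of the seat positions, starting from the top of the image.

```python
import math
from typing import Dict, List, Tuple

Region = Tuple[int, int, int, int]  # (left, top, width, height)


def split_bottom_to_top(region: Region, parts: int = 3) -> List[Region]:
    """Split a region into horizontal bands, listed from bottom to top."""
    left, top, width, height = region
    # Band edges measured upward from the bottom edge of the region.
    edges = [top + height - (height * i) // parts for i in range(parts + 1)]
    return [(left, edges[i + 1], width, edges[i] - edges[i + 1]) for i in range(parts)]


def clockwise_order(positions: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order names clockwise on screen, starting from the 12 o'clock direction."""
    cx = sum(x for x, _ in positions.values()) / len(positions)
    cy = sum(y for _, y in positions.values()) / len(positions)

    def angle(name: str) -> float:
        x, y = positions[name]
        # The y axis points down in image coordinates, hence the sign flip.
        return math.atan2(x - cx, -(y - cy)) % (2 * math.pi)

    return sorted(positions, key=angle)


bands = split_bottom_to_top((2, 0, 4, 9))  # legs, torso, head
seats = {"left": (0.0, 5.0), "bottom": (5.0, 10.0), "top": (5.0, 0.0), "right": (10.0, 5.0)}
seating = clockwise_order(seats)
```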

(Distribution unit 135)
The distribution unit 135 distributes content to the terminal device 10. For example, the distribution unit 135 distributes content including the summary video generated by the second generation unit 134 to the terminal device 10. For example, the distribution unit 135 distributes the summary video MV11 and the character information IC11 to the terminal device 10. The distribution unit 135 may also distribute content stored in the content information storage unit 121 to the terminal device 10.

[4. Configuration of terminal device]
Next, the configuration of the terminal device 10 according to the embodiment will be described with reference to FIG. 5. FIG. 5 is a diagram illustrating a configuration example of the terminal device 10 according to the embodiment. As illustrated in FIG. 5, the terminal device 10 includes a communication unit 11, a storage unit 12, an input unit 13, an output unit 14, and a control unit 15.

(Communication unit 11)
The communication unit 11 is realized by, for example, a communication circuit. The communication unit 11 is connected to a predetermined communication network (not shown) by wire or wirelessly, and transmits and receives information to and from the generation device 100.

(Storage unit 12)
The storage unit 12 is realized by, for example, a semiconductor memory element such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 12 stores, for example, information related to applications installed in the terminal device 10, such as programs.

(Input unit 13)
The input unit 13 receives various operations from the user. For example, the input unit 13 may receive various operations from the user via a display surface (for example, the display unit 153) using a touch panel function. The input unit 13 may also receive various operations from buttons provided on the terminal device 10, or from a keyboard or mouse connected to the terminal device 10.

(Output unit 14)
The output unit 14 is a display screen of a tablet terminal or the like, realized by, for example, a liquid crystal display or an organic EL (electroluminescence) display, and is a display device for displaying various types of information.

(Control unit 15)
The control unit 15 is realized, for example, by a CPU, an MPU, or the like executing various programs stored in a storage device such as the storage unit 12 inside the terminal device 10, using the RAM as a work area. These programs include, for example, the programs of installed applications. The control unit 15 is also realized, for example, by an integrated circuit such as an ASIC or an FPGA.

As illustrated in FIG. 5, the control unit 15 includes a transmission unit 151, a reception unit 152, and a display unit 153, and realizes or executes the functions and operations of the generation processing described below. Note that the internal configuration of the control unit 15 is not limited to the configuration illustrated in FIG. 5 and may be any other configuration that performs the generation processing described later. Likewise, the connection relationship between the processing units of the control unit 15 is not limited to that illustrated in FIG. 5 and may be another connection relationship.

The transmission unit 151 transmits various types of information to an external information processing apparatus. The transmission unit 151 transmits a content distribution request to the generation device 100 in accordance with a user operation received by the input unit 13. For example, the transmission unit 151 transmits a distribution request from an application to the generation device 100.

The reception unit 152 receives various types of information from an external information processing apparatus. The reception unit 152 receives content distributed from the generation device 100. For example, the reception unit 152 receives a summary video generated from content, such as the summary video MV11 and the character information IC11 (see FIG. 6).

The display unit 153 displays the content received by the reception unit 152. For example, the display unit 153 displays the web page W10 (see FIG. 6) including the summary video MV11 and the character information IC11 received by the reception unit 152.

Note that processing such as the display processing by the control unit 15 described above may be realized by, for example, JavaScript (registered trademark). When the display processing described above is performed by a predetermined application or by a dedicated application, the control unit 15 may include, for example, an application control unit that controls the predetermined application or the dedicated application.

[5. Display example of content including processed images]
Next, the display of content including a summary video on the terminal device 10 according to the embodiment will be described with reference to FIG. 6. FIG. 6 is a diagram illustrating an example of display on the terminal device according to the embodiment. FIG. 6 is described taking as an example the case where the terminal device 10 has received the summary video MV11, the character information IC11, and the like from the generation device 100.

In the example illustrated in FIG. 6, the terminal device 10 displays the web page W10 including the summary video MV11 and the character information IC11. For example, other summary videos and character information (not shown) are arranged below the character information IC11, and the user can change which summary video and character information are displayed by performing a scroll operation or the like.

For example, when the user selects the summary video MV11 displayed on the terminal device 10, the terminal device 10 may play the summary video MV11. For example, when the user touches the region of the terminal device 10 in which the summary video MV11 is displayed, the terminal device 10 may play the summary video MV11. For example, if the terminal device 10 has a function of detecting the user's line of sight, the terminal device 10 may play the summary video MV11 when it detects the user's line of sight directed at the region in which the summary video MV11 is displayed. Note that when the summary video includes audio information, the terminal device 10 may output the audio information through a speaker or the like as the summary video plays. The terminal device 10 may also play the summary video MV11 in response to a scroll operation by the user. For example, when a scroll operation that moves the summary video MV11 toward the bottom of the screen is performed, the terminal device 10 may advance the display of the summary video MV11 according to the amount of movement. For example, when such a scroll operation is performed in the state shown in FIG. 6, the display of the summary video MV11 may change in sequence from the processed image IP111 to the processed images IP121, IP131, IP141, and so on, following the order assigned to the processed images. That is, the terminal device 10 may play the summary video MV11 in response to a scroll operation that moves the summary video MV11 toward the bottom of the screen.

また、例えば、端末装置１０は、要約動画ＭＶ１１を画面の上側へ移動させるスクロール操作を行った場合、その移動量に応じて要約動画ＭＶ１１の表示を戻してもよい。例えば、端末装置１０における要約動画ＭＶ１１の表示が加工画像ＩＰ１４１である場合、要約動画ＭＶ１１を画面の上側へ移動させるスクロール操作を行った場合、各加工画像に付された順序に従って、加工画像ＩＰ１４１から加工画像ＩＰ１３１、ＩＰ１２１、ＩＰ１１１等に要約動画ＭＶ１１の表示を順番に変更してもよい。すなわち、端末装置１０は、要約動画ＭＶ１１を画面の上側へ移動させるスクロール操作に応じて要約動画ＭＶ１１を逆再生してもよい。 Similarly, when a scroll operation that moves the summary video MV11 toward the top of the screen is performed, the terminal device 10 may rewind the display of the summary video MV11 according to the amount of movement. For example, when the summary video MV11 on the terminal device 10 is showing the processed image IP141 and such a scroll operation is performed, the display of the summary video MV11 may change in sequence from the processed image IP141 to the processed images IP131, IP121, IP111, and so on, following the order assigned to each processed image. That is, the terminal device 10 may play the summary video MV11 in reverse in response to a scroll operation that moves the summary video MV11 toward the top of the screen.
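
The scroll-driven playback described above can be sketched as a mapping from the distance scrolled to the index of the processed image to show. This is only an illustration: the function name `frame_index_for_scroll`, the `px_per_frame` parameter, and the 40-pixel step are assumptions that do not appear in the specification.

```python
def frame_index_for_scroll(moved_px: float, px_per_frame: float, n_frames: int) -> int:
    """Map how far the video has been scrolled to the index of the image to show.

    moved_px is positive when the summary video has moved toward the bottom of
    the screen; the result is clamped to the valid index range.
    """
    if n_frames <= 0 or px_per_frame <= 0:
        raise ValueError("n_frames and px_per_frame must be positive")
    index = int(moved_px // px_per_frame)
    return max(0, min(n_frames - 1, index))


# Processed images in their assigned order (identifiers taken from FIG. 6).
processed = ["IP111", "IP121", "IP131", "IP141"]

# Moving the video downward advances the display; moving it back up rewinds it.
forward = [processed[frame_index_for_scroll(px, 40.0, len(processed))]
           for px in (0, 40, 80, 120)]
backward = [processed[frame_index_for_scroll(px, 40.0, len(processed))]
            for px in (120, 80, 40, 0)]
```

Because the index depends only on the current offset, forward and reverse playback fall out of the same function, and offsets outside the range simply hold the first or last image.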

また、端末装置１０は、要約動画ＭＶ１１の表示に応じて、文字情報ＩＣ１１の表示を変更してもよい。例えば、端末装置１０は、文字情報ＩＣ１１のうち、要約動画ＭＶ１１の表示に対応する文章を表示してもよい。例えば、図６に示す状態において、端末装置１０は、文字情報ＩＣ１１のうち、要約動画ＭＶ１１の表示に対応する文章「チームＡＡのピッチャーＮが投げた…」を表示してもよい。また、例えば、要約動画ＭＶ１１の表示が加工画像ＩＰ１４１である状態において、端末装置１０は、文字情報ＩＣ１１のうち、要約動画ＭＶ１１の表示に対応する文章「チームＢＢのバッターＯが打ち返し…」を表示してもよい。 The terminal device 10 may also change the display of the character information IC11 according to what the summary video MV11 is showing. For example, the terminal device 10 may display the sentence in the character information IC11 that corresponds to the current display of the summary video MV11. In the state illustrated in FIG. 6, for instance, the terminal device 10 may display the corresponding sentence "Pitcher N of team AA threw ..." from the character information IC11. Likewise, while the summary video MV11 is showing the processed image IP141, the terminal device 10 may display the corresponding sentence "Batter O of team BB hit it back ..." from the character information IC11.

なお、図６に示すウェブページＷ１０の表示は一例であり、端末装置１０には、どのような対応において要約動画ＭＶ１１や文字情報ＩＣ１１が表示されてもよい。例えば、端末装置１０には、要約動画ＭＶ１１と文字情報ＩＣ１１とは横方向に並べて表示されてもよい。 Note that the display of the web page W10 illustrated in FIG. 6 is an example, and the terminal device 10 may display the summary video MV11 and the character information IC11 in any correspondence. For example, the summary video MV11 and the character information IC11 may be displayed side by side in the horizontal direction on the terminal device 10.

〔６．生成処理フロー〕 [6. Generation process flow]
次に、図７を用いて、実施形態に係る配信システム１におけるコンテンツの生成処理について説明する。図７は、実施形態に係る生成処理の一例を示すフローチャートである。 Next, content generation processing in the distribution system 1 according to the embodiment will be described with reference to FIG. 7. FIG. 7 is a flowchart illustrating an example of the generation process according to the embodiment.

まず、図７に示す例において、生成装置１００は、画像及び文字情報を含むコンテンツを取得する（ステップＳ１０１）。例えば、生成装置１００は、画像及び文字情報を含むコンテンツをコンテンツ情報記憶部１２１から取得する。 First, in the example illustrated in FIG. 7, the generation device 100 acquires content including an image and character information (step S101). For example, the generation device 100 acquires content including an image and character information from the content information storage unit 121.

そして、生成装置１００は、画像及び文字情報に基づいて、画像から特徴領域情報を抽出する（ステップＳ１０２）。図１では、生成装置１００は、画像ＩＭ１１と文字情報ＩＣ１１とに基づいて画像ＩＭ１１から特徴領域情報ＦＲ１１を抽出する。 Then, the generation apparatus 100 extracts feature area information from the image based on the image and character information (step S102). In FIG. 1, the generation device 100 extracts feature area information FR11 from the image IM11 based on the image IM11 and the character information IC11.

また、生成装置１００は、画像に含まれるオブジェクトの表示順を決定する（ステップＳ１０３）。図１では、生成装置１００は、オブジェクトＯＢ１２の表示順を１位、オブジェクトＯＢ１３の表示順を３位、オブジェクトＯＢ１５の表示順を２位に決定する。 Further, the generation device 100 determines the display order of the objects included in the image (step S103). In FIG. 1, the generation apparatus 100 determines that the display order of the object OB12 is first, the display order of the object OB13 is third, and the display order of the object OB15 is second.

その後、生成装置１００は、特徴領域情報に基づいてコンテンツから複数の加工画像を生成する（ステップＳ１０４）。図１では、生成装置１００は、コンテンツＡＴ１１から複数の加工画像ＩＰ１１１〜ＩＰ１４１等を生成する。 Thereafter, the generation device 100 generates a plurality of processed images from the content based on the feature area information (step S104). In FIG. 1, the generation apparatus 100 generates a plurality of processed images IP111 to IP141 from the content AT11.

その後、生成装置１００は、オブジェクトの表示順に基づく順序で、加工画像が表示される動画を生成する（ステップＳ１０５）。図１では、生成装置１００は、複数の加工画像ＩＰ１１１〜ＩＰ１４１等に付された順位に基づく順序で、複数の加工画像ＩＰ１１１〜ＩＰ１４１等が表示される要約動画ＭＶ１１を生成する。 Thereafter, the generation device 100 generates a moving image on which the processed images are displayed in an order based on the display order of the objects (step S105). In FIG. 1, the generation device 100 generates a summary video MV11 in which a plurality of processed images IP111 to IP141 and the like are displayed in an order based on the ranks assigned to the plurality of processed images IP111 to IP141 and the like.
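
Steps S104 and S105 can be sketched as follows. This is a minimal illustration under stated assumptions: the feature regions (step S102) and the display order (step S103) are passed in as inputs because this section does not say how they are computed, an image is modeled as a nested list of pixel values, and the names `Region`, `crop`, and `generate_summary` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A feature region: the object it belongs to and its bounding box."""
    object_id: str
    box: tuple  # (left, top, right, bottom), right and bottom exclusive


def crop(image, box):
    """Return the sub-image covered by box (image is a list of pixel rows)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def generate_summary(image, regions, display_order):
    """S104: crop one processed image per region.
    S105: arrange the processed images in the objects' display order."""
    rank = {object_id: i for i, object_id in enumerate(display_order)}
    ordered = sorted(regions, key=lambda r: rank[r.object_id])
    return [(r.object_id, crop(image, r.box)) for r in ordered]


# Toy 4x6 image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
regions = [
    Region("OB12", (0, 0, 2, 2)),
    Region("OB13", (4, 2, 6, 4)),
    Region("OB15", (2, 1, 4, 3)),
]
# Display order from the FIG. 1 example: OB12 first, OB15 second, OB13 third.
summary = generate_summary(image, regions, ["OB12", "OB15", "OB13"])
```

The returned list is the frame sequence of the summary video; a real implementation would generate many more processed images and encode them as video.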

〔７．動画を用いた生成処理〕 [7. Generation process using a moving image]
次に、図８及び図９を用いて、実施形態に係る動画を用いた生成処理について説明する。図８及び図９は、実施形態に係る動画を用いた生成処理の一例を示す図である。 Next, generation processing using a moving image according to the embodiment will be described with reference to FIGS. 8 and 9. FIGS. 8 and 9 are diagrams illustrating an example of generation processing using a moving image according to the embodiment.

なお、図８や図９の説明においては、特徴領域情報の抽出等の説明は省略するが、各フレームＦＭ２１１〜ＦＭ２６１を画像ＩＭ１１と同様の情報として、図１と同様の処理を行うことにより、特徴領域情報を抽出してもよい。例えば、生成装置１００は、フレームＦＭ２１１〜ＦＭ２６１ごとに特徴領域情報の抽出の処理を行ってもよい。また、オブジェクトの追跡等の種々の従来技術を用いて、各フレームＦＭ２１１〜ＦＭ２６１に含まれるオブジェクトの位置等を特定してもよい。例えば、生成装置１００は、所定間隔で抽出したフレーム（例えばフレームＦＭ２１１等）に対して特徴領域情報の抽出の処理を行い、フレームＦＭ２１１から抽出されたオブジェクトを追跡することにより、各フレームＦＭ２１１〜ＦＭ２６１から特徴領域情報の抽出の処理を行ってもよい。なお、上記は一例であり、生成装置１００は、フレームから特徴領域情報を抽出し、オブジェクトが特定可能であれば、どのような処理により、特徴領域情報の抽出を行ってもよい。まず、図８における動画ＭＣ２１を用いた生成処理について説明する。 The description of FIGS. 8 and 9 omits details such as the extraction of feature region information; the feature region information may be extracted by treating each of the frames FM211 to FM261 as information equivalent to the image IM11 and performing the same processing as in FIG. 1. For example, the generation device 100 may perform the feature region information extraction process on each of the frames FM211 to FM261. The positions and other attributes of the objects included in each of the frames FM211 to FM261 may also be identified using various conventional techniques such as object tracking. For example, the generation device 100 may perform the feature region information extraction process on frames sampled at a predetermined interval (for example, the frame FM211) and then track the objects extracted from the frame FM211, thereby obtaining feature region information for each of the frames FM211 to FM261. The above is only an example; the generation device 100 may extract the feature region information by any process, as long as the feature region information can be extracted from the frames and the objects can be identified. First, the generation process using the moving image MC21 in FIG. 8 will be described.
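
As a rough illustration of running extraction only on frames sampled at a predetermined interval, the sketch below fills in an object's bounding box for the frames in between by linear interpolation. This is a deliberately simple stand-in for a real object tracker, which the specification leaves to conventional techniques; the function name `interpolate_boxes` and the box format are assumptions.

```python
def interpolate_boxes(detections: dict, n_frames: int) -> list:
    """Return one bounding box per frame.

    detections maps a sampled frame index to the box (left, top, right, bottom)
    found there. Frames between two samples get a linearly interpolated box;
    frames before the first or after the last sample reuse the nearest one.
    """
    keys = sorted(detections)
    if not keys:
        raise ValueError("at least one detection is required")
    boxes = []
    for t in range(n_frames):
        prev = max((k for k in keys if k <= t), default=keys[0])
        nxt = min((k for k in keys if k >= t), default=keys[-1])
        if prev == nxt:
            boxes.append(tuple(detections[prev]))
        else:
            w = (t - prev) / (nxt - prev)
            boxes.append(tuple(a + w * (b - a)
                               for a, b in zip(detections[prev], detections[nxt])))
    return boxes


# The object is detected on frames 0 and 4 only; frames 1 to 3 are interpolated.
track = interpolate_boxes({0: (0, 0, 10, 10), 4: (8, 0, 18, 10)}, n_frames=6)
```

Linear interpolation is adequate only for smooth motion over short gaps; an actual tracker would follow the object's appearance from frame to frame.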

図８の例においては、生成装置１００は、上述した処理により動画ＭＣ２１に含まれるオブジェクトを抽出し、抽出したオブジェクトの表示順を決定する。例えば、生成装置１００は、オブジェクト一覧ＯＬ２１に示すように、動画ＭＣ２１に含まれるオブジェクトＯＢ２１〜ＯＢ２３等を抽出する。図８の例では、生成装置１００は、動画ＭＣ２１に含まれる犬ＡをオブジェクトＯＢ２１として抽出する。また、生成装置１００は、動画ＭＣ２１に含まれる犬ＢをオブジェクトＯＢ２２として抽出する。また、生成装置１００は、動画ＭＣ２１に含まれるボールをオブジェクトＯＢ２３として抽出する。 In the example of FIG. 8, the generation apparatus 100 extracts objects included in the moving image MC21 by the above-described processing, and determines the display order of the extracted objects. For example, as illustrated in the object list OL21, the generation apparatus 100 extracts objects OB21 to OB23 and the like included in the moving image MC21. In the example of FIG. 8, the generation device 100 extracts the dog A included in the moving image MC21 as the object OB21. Further, the generation device 100 extracts the dog B included in the moving image MC21 as the object OB22. Further, the generation device 100 extracts a ball included in the moving image MC21 as the object OB23.

また、生成装置１００は、抽出したオブジェクトＯＢ２１〜ＯＢ２３等の表示順を決定する。例えば、生成装置１００は、各フレームＦＭ２１１〜ＦＭ２６１等における撮影範囲の変化や、動画ＭＣ２１におけるオブジェクトＯＢ２１〜ＯＢ２３等の位置の変化に基づいて、オブジェクトＯＢ２１〜ＯＢ２３等の表示順を決定する。なお、図８の例では、説明を簡単にするために、例えば定点カメラのように、撮影範囲は固定されているものとする。そのため、生成装置１００は、オブジェクトＯＢ２１〜ＯＢ２３等の位置の変化に基づいて、オブジェクトＯＢ２１〜ＯＢ２３等の表示順を決定する。図８の例では、ボールが犬Ａの前を通過し左側から右側へ移動し、右側において犬Ｂがボールと重なる。そのため、生成装置１００は、種々の従来技術を適宜用いて、犬Ａの表示順よりも犬Ｂの表示順が後であると決定する。また、生成装置１００は、ボールが犬Ａと犬Ｂとをつなぐ関係にあるため、ボールの表示順を犬Ａと犬Ｂとの間の表示順であると決定する。これにより、生成装置１００は、左側に位置する犬ＡであるオブジェクトＯＢ２１の表示順を１位、ボールであるオブジェクトＯＢ２３の表示順を２位、右側に位置する犬ＢであるオブジェクトＯＢ２２の表示順を３位に決定する。 Further, the generation apparatus 100 determines the display order of the extracted objects OB21 to OB23 and the like. For example, the generating apparatus 100 determines the display order of the objects OB21 to OB23 and the like based on the change in the shooting range in each frame FM211 to FM261 and the change in the position of the objects OB21 to OB23 and the like in the moving image MC21. In the example of FIG. 8, in order to simplify the description, it is assumed that the shooting range is fixed, for example, like a fixed point camera. Therefore, the generation apparatus 100 determines the display order of the objects OB21 to OB23 and the like based on the change in the positions of the objects OB21 to OB23 and the like. In the example of FIG. 8, the ball passes in front of the dog A and moves from the left side to the right side, and the dog B overlaps the ball on the right side. Therefore, the generation apparatus 100 determines that the display order of the dog B is later than the display order of the dog A by appropriately using various conventional techniques. In addition, since the ball has a relationship connecting the dog A and the dog B, the generation apparatus 100 determines that the display order of the balls is the display order between the dog A and the dog B. 
Accordingly, the generation device 100 ranks the object OB21, the dog A located on the left side, first in the display order; the object OB23, the ball, second; and the object OB22, the dog B located on the right side, third.
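
The specification leaves the ordering method to conventional techniques. One hypothetical heuristic that reproduces the ordering in this example is sketched below: stationary objects are ordered by the first frame in which the moving object overlaps them, and the moving object is placed right after the first one it meets. The function names and the box coordinates are invented for illustration.

```python
def overlaps(a, b):
    """True if two boxes (left, top, right, bottom) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def display_order(static_boxes: dict, mover_id: str, mover_track: list) -> list:
    """Order objects by when the moving object first overlaps each of them."""
    first_hit = {}
    for t, box in enumerate(mover_track):
        for object_id, static_box in static_boxes.items():
            if object_id not in first_hit and overlaps(box, static_box):
                first_hit[object_id] = t
    visited = sorted(first_hit, key=first_hit.get)
    if not visited:
        return [mover_id]
    # The moving object connects the first object it meets to the later ones.
    return [visited[0], mover_id, *visited[1:]]


# Dog A (OB21) on the left, dog B (OB22) on the right, ball (OB23) moving right.
dogs = {"OB22": (40, 0, 50, 10), "OB21": (0, 0, 10, 10)}
ball_track = [(x, 4, x + 4, 8) for x in (2, 12, 22, 32, 42)]
order = display_order(dogs, "OB23", ball_track)
```

With a fixed shooting range, as assumed in the FIG. 8 example, the box positions can be compared directly across frames; a moving camera would first require compensating for the change in shooting range.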

そして、生成装置１００は、フレームＦＭ２１１〜ＦＭ２６１等を含む動画ＭＣ２１から複数の加工画像ＩＰ２１１〜ＩＰ２６１等を生成し、生成した複数の加工画像ＩＰ２１１〜ＩＰ２６１が表示される要約動画ＭＶ２１を生成する（ステップＳ２１）。例えば、生成装置１００は、各フレームの特徴領域情報やオブジェクト一覧ＯＬ２１に基づいて、対応するフレーム（画像）をクロッピングすることにより、複数の加工画像ＩＰ２１１〜ＩＰ２６１等を生成する。 Then, the generation device 100 generates a plurality of processed images IP211 to IP261 and so on from the moving image MC21, which includes the frames FM211 to FM261 and so on, and generates the summary video MV21 in which the generated processed images IP211 to IP261 are displayed (step S21). For example, the generation device 100 generates the plurality of processed images IP211 to IP261 and so on by cropping the corresponding frames (images) based on the feature region information of each frame and the object list OL21.

図８の例では、生成装置１００は、犬Ａの全体が含まれる加工画像ＩＰ２１１、ＩＰ２２１、ＩＰ２３１やボールが含まれる加工画像ＩＰ２４１やボール及び犬Ｂが含まれる加工画像ＩＰ２５１、ＩＰ２６１等を生成する。例えば、生成装置１００は、フレームＦＭ２１１の領域ＡＲ２１をクロッピングすることにより、加工画像ＩＰ２１１を生成する。また、例えば、生成装置１００は、フレームＦＭ２２１の領域ＡＲ２２をクロッピングすることにより、加工画像ＩＰ２２１を生成する。また、例えば、生成装置１００は、フレームＦＭ２３１の領域ＡＲ２３をクロッピングすることにより、加工画像ＩＰ２３１を生成する。また、例えば、生成装置１００は、フレームＦＭ２４１の領域ＡＲ２４をクロッピングすることにより、加工画像ＩＰ２４１を生成する。また、例えば、生成装置１００は、フレームＦＭ２５１の領域ＡＲ２５をクロッピングすることにより、加工画像ＩＰ２５１を生成する。また、例えば、生成装置１００は、フレームＦＭ２６１の領域ＡＲ２６をクロッピングすることにより、加工画像ＩＰ２６１を生成する。なお、図８では説明を簡単にするために、加工画像ＩＰ２１１〜ＩＰ２６１のみを図示するが、生成装置１００は、他のフレーム（画像）から多数の加工画像を生成してもよい。また、生成装置１００は、１つのフレームから複数の加工画像を生成してもよい。 In the example of FIG. 8, the generation apparatus 100 generates processed images IP211, IP221, IP231 including the entire dog A, processed images IP241 including the ball, processed images IP251, IP261 including the ball and the dog B, and the like. . For example, the generating apparatus 100 generates the processed image IP211 by cropping the area AR21 of the frame FM211. Further, for example, the generating apparatus 100 generates the processed image IP221 by cropping the area AR22 of the frame FM221. Further, for example, the generating apparatus 100 generates the processed image IP231 by cropping the area AR23 of the frame FM231. Further, for example, the generation apparatus 100 generates the processed image IP241 by cropping the area AR24 of the frame FM241. Further, for example, the generating apparatus 100 generates the processed image IP251 by cropping the area AR25 of the frame FM251. Further, for example, the generating apparatus 100 generates the processed image IP261 by cropping the area AR26 of the frame FM261. In FIG. 8, only the processed images IP211 to IP261 are illustrated for the sake of simplicity, but the generating apparatus 100 may generate a large number of processed images from other frames (images). Further, the generation apparatus 100 may generate a plurality of processed images from one frame.

そして、生成装置１００は、複数の加工画像ＩＰ２１１〜ＩＰ２６１等の順位を決定する。なお、図８の例では、各加工画像ＩＰ２１１〜ＩＰ２６１等の順位は、抽出元となるフレームＦＭ２１１〜ＦＭ２６１の時系列順に対応する。例えば、生成装置１００は、複数の加工画像ＩＰ２１１〜ＩＰ２６１等のうち、加工画像ＩＰ２１１を最も表示順を高くし、その次に加工画像ＩＰ２２１の順位を高くする。また、生成装置１００は、複数の加工画像ＩＰ２１１〜ＩＰ２６１等のうち、加工画像ＩＰ２３１の順位を加工画像ＩＰ２２１の次に高くし、加工画像ＩＰ２４１、ＩＰ２５１、ＩＰ２６１の順位は、加工画像ＩＰ２３１よりも低く、加工画像ＩＰ２４１、ＩＰ２５１、ＩＰ２６１の順に低くなる順位とする。図８の例では、生成装置１００は、加工画像ＩＰ２１１の順位を順位Ａ、加工画像ＩＰ２２１の順位を順位Ｂ、加工画像ＩＰ２３１の順位を順位Ｃ、加工画像ＩＰ２４１の順位を順位Ｄ、加工画像ＩＰ２５１の順位を順位Ｅ、加工画像ＩＰ２６１の順位を順位Ｆに決定する（Ａ＜Ｂ＜Ｃ＜Ｄ＜Ｅ＜Ｆ）。 Then, the generation device 100 determines the ranks of the plurality of processed images IP211 to IP261 and so on. In the example of FIG. 8, the ranks of the processed images IP211 to IP261 and so on correspond to the chronological order of the source frames FM211 to FM261. For example, among the plurality of processed images, the generation device 100 gives the processed image IP211 the highest place in the display order and the processed image IP221 the next highest. The generation device 100 then places the processed image IP231 immediately after the processed image IP221, and places the processed images IP241, IP251, and IP261 below the processed image IP231, in that order. In the example of FIG. 8, the generation device 100 assigns rank A to the processed image IP211, rank B to IP221, rank C to IP231, rank D to IP241, rank E to IP251, and rank F to IP261 (A < B < C < D < E < F).

そして、生成装置１００は、複数の加工画像ＩＰ２１１〜ＩＰ２６１等に付された順位に基づく順序で、複数の加工画像ＩＰ２１１〜ＩＰ２６１等が表示される要約動画ＭＶ２１を生成する。例えば、生成装置１００は、上述したフレーム補間等の処理により、複数の加工画像ＩＰ２１１〜ＩＰ２６１間をつなぐ補間を行うことにより、要約動画ＭＶ２１を生成してもよい。例えば、生成装置１００は、生成した加工画像に加工画像ＩＰ２１１〜ＩＰ２６１以外にも多数の加工画像が含まれる場合、複数の加工画像に付された順位に基づく順序で、複数の加工画像が表示される要約動画ＭＶ２１を生成してもよい。このように、生成装置１００は、動画から要約動画を生成することができる。なお、上述のように、複数のフレームから要約動画を生成する処理は、複数の画像から要約動画を生成する処理に対応する。 Then, the generation device 100 generates the summary video MV21, in which the plurality of processed images IP211 to IP261 and so on are displayed in an order based on the ranks assigned to them. For example, the generation device 100 may generate the summary video MV21 by interpolating between the processed images IP211 to IP261 using processing such as the frame interpolation described above. When the generated processed images include many processed images besides IP211 to IP261, the generation device 100 may likewise generate the summary video MV21 in which those processed images are displayed in an order based on their assigned ranks. In this way, the generation device 100 can generate a summary video from a moving image. As described above, the process of generating a summary video from a plurality of frames corresponds to the process of generating a summary video from a plurality of images.
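
The ordering and interpolation step can be sketched as follows. The sketch sorts processed images by rank and inserts linearly blended in-between images; the linear cross-fade is an assumed stand-in for the frame interpolation referred to above, images are modeled as equally sized nested lists of pixel values, and the names `blend` and `build_summary` are hypothetical.

```python
def blend(a, b, w):
    """Pixel-wise linear blend of two equally sized images (w=0 gives a, w=1 gives b)."""
    return [[(1 - w) * pa + w * pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def build_summary(ranked_images, steps_between=1):
    """Sort (rank, image) pairs by rank and interpolate between neighbours.

    A smaller rank value is displayed earlier, matching A < B < ... < F.
    """
    ordered = [image for _, image in sorted(ranked_images, key=lambda pair: pair[0])]
    frames = []
    for current, following in zip(ordered, ordered[1:]):
        frames.append(current)
        for i in range(1, steps_between + 1):
            frames.append(blend(current, following, i / (steps_between + 1)))
    if ordered:
        frames.append(ordered[-1])
    return frames


# Three 1x1 "images" given out of order; ranks restore the display sequence.
ranked = [(2, [[4.0]]), (1, [[0.0]]), (3, [[8.0]])]
frames = build_summary(ranked, steps_between=1)
```

Sorting on the rank alone keeps the sort stable when two images share a rank, so images with equal ranks retain their input order.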

次に、図９における動画ＭＣ２１を用いた生成処理について説明する。図９の例においては、生成装置１００は、上述した処理により動画ＭＣ２１に含まれるオブジェクトを抽出し、抽出したオブジェクトの表示順を決定する。例えば、生成装置１００は、オブジェクト一覧ＯＬ２１に示すように、動画ＭＣ２１に含まれるオブジェクトＯＢ２１〜ＯＢ２３等を抽出する。図９の例では、生成装置１００は、動画ＭＣ２１に含まれる犬ＡをオブジェクトＯＢ２１として抽出する。また、生成装置１００は、動画ＭＣ２１に含まれる犬ＢをオブジェクトＯＢ２２として抽出する。また、生成装置１００は、動画ＭＣ２１に含まれるボールをオブジェクトＯＢ２３として抽出する。 Next, the generation process using the moving image MC21 in FIG. 9 will be described. In the example of FIG. 9, the generation apparatus 100 extracts objects included in the moving image MC21 by the above-described processing, and determines the display order of the extracted objects. For example, as illustrated in the object list OL21, the generation apparatus 100 extracts objects OB21 to OB23 and the like included in the moving image MC21. In the example of FIG. 9, the generation device 100 extracts the dog A included in the moving image MC21 as the object OB21. Further, the generation device 100 extracts the dog B included in the moving image MC21 as the object OB22. Further, the generation device 100 extracts a ball included in the moving image MC21 as the object OB23.

また、生成装置１００は、抽出したオブジェクトＯＢ２１〜ＯＢ２３等の表示順を決定する。例えば、生成装置１００は、各フレームＦＭ２１１〜ＦＭ２６１等における撮影範囲の変化や、動画ＭＣ２１におけるオブジェクトＯＢ２１〜ＯＢ２３等の位置の変化に基づいて、オブジェクトＯＢ２１〜ＯＢ２３等の表示順を決定する。なお、図９の例では、説明を簡単にするために、例えば定点カメラのように、撮影範囲は固定されているものとする。そのため、生成装置１００は、オブジェクトＯＢ２１〜ＯＢ２３等の位置の変化に基づいて、オブジェクトＯＢ２１〜ＯＢ２３等の表示順を決定する。図９の例では、ボールが犬Ａの前を通過し左側から右側へ移動し、右側において犬Ｂがボールと重なる。そのため、生成装置１００は、種々の従来技術を適宜用いて、犬Ａの表示順よりも犬Ｂの表示順が後であると決定する。また、生成装置１００は、ボールが犬Ａと犬Ｂとをつなぐ関係にあるため、ボールの表示順を犬Ａと犬Ｂとの間の表示順であると決定する。これにより、生成装置１００は、左側に位置する犬ＡであるオブジェクトＯＢ２１の表示順を１位、ボールであるオブジェクトＯＢ２３の表示順を２位、右側に位置する犬ＢであるオブジェクトＯＢ２２の表示順を３位に決定する。 Further, the generation apparatus 100 determines the display order of the extracted objects OB21 to OB23 and the like. For example, the generating apparatus 100 determines the display order of the objects OB21 to OB23 and the like based on the change in the shooting range in each frame FM211 to FM261 and the change in the position of the objects OB21 to OB23 and the like in the moving image MC21. In the example of FIG. 9, it is assumed that the shooting range is fixed, for example, like a fixed point camera, for the sake of simplicity. Therefore, the generation apparatus 100 determines the display order of the objects OB21 to OB23 and the like based on the change in the positions of the objects OB21 to OB23 and the like. In the example of FIG. 9, the ball passes in front of dog A and moves from the left side to the right side, and dog B overlaps the ball on the right side. Therefore, the generation apparatus 100 determines that the display order of the dog B is later than the display order of the dog A by appropriately using various conventional techniques. In addition, since the ball has a relationship connecting the dog A and the dog B, the generation apparatus 100 determines that the display order of the balls is the display order between the dog A and the dog B. 
Accordingly, the generation device 100 ranks the object OB21, the dog A located on the left side, first in the display order; the object OB23, the ball, second; and the object OB22, the dog B located on the right side, third.

そして、生成装置１００は、フレームＦＭ２１１〜ＦＭ２６１等を含む動画ＭＣ２１から複数の加工画像ＩＰ３１１〜ＩＰ３６１等を生成し、生成した複数の加工画像ＩＰ３１１〜ＩＰ３６１が表示される要約動画ＭＶ２２を生成する（ステップＳ２２）。例えば、生成装置１００は、各フレームの特徴領域情報やオブジェクト一覧ＯＬ２１に基づいて、対応するフレーム（画像）をクロッピングすることにより、複数の加工画像ＩＰ３１１〜ＩＰ３６１等を生成する。 Then, the generation device 100 generates a plurality of processed images IP311 to IP361 and so on from the moving image MC21, which includes the frames FM211 to FM261 and so on, and generates the summary video MV22 in which the generated processed images IP311 to IP361 are displayed (step S22). For example, the generation device 100 generates the plurality of processed images IP311 to IP361 and so on by cropping the corresponding frames (images) based on the feature region information of each frame and the object list OL21.

図９の例では、生成装置１００は、犬Ａの全体が含まれる加工画像ＩＰ３１１、ＩＰ３２１、ＩＰ３３１やボールが含まれる加工画像ＩＰ３４１やボール及び犬Ｂが含まれる加工画像ＩＰ３５１、ＩＰ３６１等を生成する。例えば、生成装置１００は、フレームＦＭ２１１の領域ＡＲ３１をクロッピングすることにより、加工画像ＩＰ３１１を生成する。また、例えば、生成装置１００は、フレームＦＭ２２１の領域ＡＲ３２をクロッピングすることにより、加工画像ＩＰ３２１を生成する。また、例えば、生成装置１００は、フレームＦＭ２３１の領域ＡＲ３３をクロッピングすることにより、加工画像ＩＰ３３１を生成する。また、例えば、生成装置１００は、フレームＦＭ２４１の領域ＡＲ３４をクロッピングすることにより、加工画像ＩＰ３４１を生成する。加工画像ＩＰ３４１は、動画ＭＣ２１に含まれ、表示順が付されたオブジェクトＯＢ２１〜ＯＢ２３の全てを含む。このように、図９の例では、生成装置１００は、全体を俯瞰するような加工画像ＩＰ３４１を生成することにより、図８に示す場合と比較して、より動画ＭＣ２１全体の内容を含む要約動画ＭＶ２２を生成することができる。 In the example of FIG. 9, the generation apparatus 100 generates processed images IP311, IP321, and IP331 including the entire dog A, processed images IP341 including the ball, processed images IP351 and IP361 including the ball and the dog B, and the like. . For example, the generating apparatus 100 generates the processed image IP311 by cropping the area AR31 of the frame FM211. Further, for example, the generation apparatus 100 generates the processed image IP321 by cropping the area AR32 of the frame FM221. Further, for example, the generating apparatus 100 generates the processed image IP331 by cropping the area AR33 of the frame FM231. Further, for example, the generation apparatus 100 generates the processed image IP341 by cropping the area AR34 of the frame FM241. The processed image IP341 includes all of the objects OB21 to OB23 that are included in the moving image MC21 and assigned a display order. In this way, in the example of FIG. 9, the generation apparatus 100 generates the processed image IP341 that gives an overview of the whole, and thus the summary video including the entire content of the video MC21 as compared to the case illustrated in FIG. 8. MV22 can be generated.
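
The overview crop IP341 contains every object that has a display order. One simple way to choose such a region, assumed here for illustration, is the union of the objects' bounding boxes, optionally padded and clamped to the frame. The function names, the margin parameter, and the coordinates below are not from the specification.

```python
def union_box(boxes):
    """Smallest box (left, top, right, bottom) that contains every input box."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one box is required")
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def overview_region(boxes, frame_width, frame_height, margin=0):
    """Union of the boxes, padded by margin and clamped to the frame bounds."""
    left, top, right, bottom = union_box(boxes)
    return (max(0, left - margin), max(0, top - margin),
            min(frame_width, right + margin), min(frame_height, bottom + margin))


# Hypothetical boxes for dog A (OB21), dog B (OB22), and the ball (OB23).
object_boxes = [(0, 2, 10, 12), (40, 0, 50, 10), (20, 4, 24, 8)]
tight = union_box(object_boxes)
padded = overview_region(object_boxes, frame_width=60, frame_height=20, margin=5)
```

Cropping the frame FM241 to a region chosen this way would yield an image that shows all three objects at once, which is the property the text attributes to IP341.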

また、例えば、生成装置１００は、フレームＦＭ２５１の領域ＡＲ３５をクロッピングすることにより、加工画像ＩＰ３５１を生成する。また、例えば、生成装置１００は、フレームＦＭ２６１の領域ＡＲ３６をクロッピングすることにより、加工画像ＩＰ３６１を生成する。なお、図９では説明を簡単にするために、加工画像ＩＰ３１１〜ＩＰ３６１のみを図示するが、生成装置１００は、他のフレーム（画像）から多数の加工画像を生成してもよい。また、生成装置１００は、１つのフレームから複数の加工画像を生成してもよい。 Further, for example, the generation apparatus 100 generates the processed image IP351 by cropping the area AR35 of the frame FM251. Further, for example, the generation apparatus 100 generates the processed image IP361 by cropping the area AR36 of the frame FM261. In FIG. 9, only the processed images IP311 to IP361 are illustrated for simplicity of explanation, but the generating apparatus 100 may generate a large number of processed images from other frames (images). Further, the generation apparatus 100 may generate a plurality of processed images from one frame.

そして、生成装置１００は、複数の加工画像ＩＰ３１１〜ＩＰ３６１等の順位を決定する。なお、図９の例では、各加工画像ＩＰ３１１〜ＩＰ３６１等の順位は、抽出元となるフレームＦＭ２１１〜ＦＭ２６１の時系列順に対応する。例えば、生成装置１００は、複数の加工画像ＩＰ３１１〜ＩＰ３６１等のうち、加工画像ＩＰ３１１を最も表示順を高くし、その次に加工画像ＩＰ３２１の順位を高くする。また、生成装置１００は、複数の加工画像ＩＰ３１１〜ＩＰ３６１等のうち、加工画像ＩＰ３３１の順位を加工画像ＩＰ３２１の次に高くし、加工画像ＩＰ３４１、ＩＰ３５１、ＩＰ３６１の順位は、加工画像ＩＰ３３１よりも低く、加工画像ＩＰ３４１、ＩＰ３５１、ＩＰ３６１の順に低くなる順位とする。図９の例では、生成装置１００は、加工画像ＩＰ３１１の順位を順位Ａ、加工画像ＩＰ３２１の順位を順位Ｂ、加工画像ＩＰ３３１の順位を順位Ｃ、加工画像ＩＰ３４１の順位を順位Ｄ、加工画像ＩＰ３５１の順位を順位Ｅ、加工画像ＩＰ３６１の順位を順位Ｆに決定する（Ａ＜Ｂ＜Ｃ＜Ｄ＜Ｅ＜Ｆ）。 Then, the generation device 100 determines the ranks of the plurality of processed images IP311 to IP361 and so on. In the example of FIG. 9, the ranks of the processed images IP311 to IP361 and so on correspond to the chronological order of the source frames FM211 to FM261. For example, among the plurality of processed images, the generation device 100 gives the processed image IP311 the highest place in the display order and the processed image IP321 the next highest. The generation device 100 then places the processed image IP331 immediately after the processed image IP321, and places the processed images IP341, IP351, and IP361 below the processed image IP331, in that order. In the example of FIG. 9, the generation device 100 assigns rank A to the processed image IP311, rank B to IP321, rank C to IP331, rank D to IP341, rank E to IP351, and rank F to IP361 (A < B < C < D < E < F).

そして、生成装置１００は、複数の加工画像ＩＰ３１１〜ＩＰ３６１等に付された順位に基づく順序で、複数の加工画像ＩＰ３１１〜ＩＰ３６１等が表示される要約動画ＭＶ２２を生成する。例えば、生成装置１００は、上述したフレーム補間等の処理により、複数の加工画像ＩＰ３１１〜ＩＰ３６１間をつなぐ補間を行うことにより、要約動画ＭＶ２２を生成してもよい。例えば、生成装置１００は、生成した加工画像に加工画像ＩＰ３１１〜ＩＰ３６１以外にも多数の加工画像が含まれる場合、複数の加工画像に付された順位に基づく順序で、複数の加工画像が表示される要約動画ＭＶ２２を生成してもよい。このように、生成装置１００は、動画から要約動画を生成することができる。なお、上述のように、複数のフレームから要約動画を生成する処理は、複数の画像から要約動画を生成する処理に対応する。なお、生成装置１００は、図８に示す要約動画ＭＶ２１と図９に示す要約動画ＭＶ２２とのいずれを生成するかを、要約動画の生成に用いるコンテンツに含まれる動画ＭＣ２１の内容等に基づいて決定してもよい。また、生成装置１００は、図８に示す要約動画ＭＶ２１と図９に示す要約動画ＭＶ２２とのいずれを生成するかを、配信システム１の管理者等の指定に応じて決定してもよい。 Then, the generation device 100 generates the summary video MV22, in which the plurality of processed images IP311 to IP361 and so on are displayed in an order based on the ranks assigned to them. For example, the generation device 100 may generate the summary video MV22 by interpolating between the processed images IP311 to IP361 using processing such as the frame interpolation described above. When the generated processed images include many processed images besides IP311 to IP361, the generation device 100 may likewise generate the summary video MV22 in which those processed images are displayed in an order based on their assigned ranks. In this way, the generation device 100 can generate a summary video from a moving image. As described above, the process of generating a summary video from a plurality of frames corresponds to the process of generating a summary video from a plurality of images. The generation device 100 may decide whether to generate the summary video MV21 shown in FIG. 8 or the summary video MV22 shown in FIG. 9 based on, for example, the substance of the moving image MC21 included in the content used to generate the summary video. The generation device 100 may also make this decision according to a designation by an administrator of the distribution system 1 or the like.

〔８．動画のキーフレームに基づく生成処理〕 [8. Generation process based on key frames of a moving image]
例えば、生成装置１００は、複数のキーフレームを抽出して処理を行ってもよい。この点について図１０を用いて説明する。図１０は、実施形態に係る動画のキーフレームに基づく生成処理の一例を示す図である。例えば、生成装置１００は、種々の従来技術を適宜用いて複数のキーフレームを抽出してもよい。例えば、生成装置１００は、エッジ検出や肌色検出や音量検出やカメラワーク検出等、種々の技術を用いてキーフレームを抽出してもよい。例えば、生成装置１００は、各画素の変化に基づいて推定されるシーンの転換点をキーフレームとして抽出してもよい。また、生成装置１００は、配信システム１の管理者等によるキーフレームの指定を受け付けてもよい。 For example, the generation device 100 may perform processing by extracting a plurality of key frames. This point will be described with reference to FIG. 10. FIG. 10 is a diagram illustrating an example of a generation process based on key frames of a moving image according to the embodiment. For example, the generation device 100 may extract a plurality of key frames using various conventional techniques as appropriate, such as edge detection, skin color detection, volume detection, and camera work detection. For example, the generation device 100 may extract, as a key frame, a scene change point estimated from the change in each pixel. The generation device 100 may also accept a key frame designation from an administrator of the distribution system 1 or the like.
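
Of the techniques listed, estimating scene change points from pixel changes is the easiest to illustrate. The sketch below marks a frame as a key frame when its mean absolute pixel difference from the previous frame exceeds a threshold. The difference measure, the threshold value, and the function names are assumptions; the specification does not fix a particular method.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def scene_change_keyframes(frames, threshold):
    """Indices of frames that differ from their predecessor by more than threshold."""
    return [i for i in range(1, len(frames))
            if mean_abs_diff(frames[i - 1], frames[i]) > threshold]


# Four tiny grayscale frames: a large jump in pixel values occurs at index 2.
video = [[[0, 0]], [[1, 0]], [[100, 100]], [[101, 100]]]
keyframes = scene_change_keyframes(video, threshold=10)
```

The threshold trades missed scene changes against false positives from fast motion, so in practice it would need tuning per source.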

図１０に示す例において、動画ＭＣ３１には、フレームＦＭ３１１〜ＦＭ３３４等が含まれるものとする。例えば、生成装置１００は、所定の処理により動画ＭＣ３１のキーフレームがフレームＦＭ３１２、ＦＭ３１９、ＦＭ３２７の３つのフレームであると特定する。なお、以下では、フレームＦＭ３１２をキーフレームＫＦ３１とし、フレームＦＭ３１９をキーフレームＫＦ３２とし、フレームＦＭ３２７をキーフレームＫＦ３３とする場合がある。 In the example illustrated in FIG. 10, the moving image MC31 includes frames FM311 to FM334 and the like. For example, the generation apparatus 100 specifies that the key frame of the moving image MC31 is the three frames FM312, FM319, and FM327 by a predetermined process. Hereinafter, the frame FM312 may be referred to as a key frame KF31, the frame FM319 may be referred to as a key frame KF32, and the frame FM327 may be referred to as a key frame KF33.

また、生成装置１００は、各キーフレームＫＦ３１〜ＫＦ３３から後の数フレームを対象に動画生成を行う。例えば、生成装置１００は、キーフレームＫＦ３１から後の数フレームＦＭ３１３〜ＦＭ３１５である関連フレームＣＦ３１−１〜ＣＦ３１−３を対象に複数の加工画像を生成し、複数の加工画像に基づいて動画を生成する（ステップＳ３１）。これにより、生成装置１００は、キーフレームＫＦ３１及び関連フレームＣＦ３１−１〜ＣＦ３１−３から動画情報Ａである動画ＭＶ３１１を生成する。 The generation device 100 also generates a moving image from the several frames that follow each of the key frames KF31 to KF33. For example, the generation device 100 generates a plurality of processed images from the related frames CF31-1 to CF31-3, which are the frames FM313 to FM315 that follow the key frame KF31, and generates a moving image based on the plurality of processed images (step S31). In this way, the generation device 100 generates the moving image MV311, which is moving image information A, from the key frame KF31 and the related frames CF31-1 to CF31-3.

また、例えば、生成装置１００は、キーフレームＫＦ３２から後の数フレームＦＭ３２０〜ＦＭ３２１である関連フレームＣＦ３２−１、ＣＦ３２−２を対象に複数の加工画像を生成し、複数の加工画像に基づいて動画を生成する（ステップＳ３２）。これにより、生成装置１００は、キーフレームＫＦ３２及び関連フレームＣＦ３２−１、ＣＦ３２−２から動画情報Ｂである動画ＭＶ３１２を生成する。 In addition, for example, the generation device 100 generates a plurality of processed images for the related frames CF32-1 and CF32-2 that are several frames FM320 to FM321 after the key frame KF32, and a moving image is generated based on the plurality of processed images. Is generated (step S32). As a result, the generation apparatus 100 generates the moving image MV312 that is the moving image information B from the key frame KF32 and the related frames CF32-1 and CF32-2.

また、例えば、生成装置１００は、キーフレームＫＦ３３から後の数フレームＦＭ３２８〜ＦＭ３３１である関連フレームＣＦ３３−１〜ＣＦ３３−４を対象に複数の加工画像を生成し、複数の加工画像に基づいて動画を生成する（ステップＳ３３）。これにより、生成装置１００は、キーフレームＫＦ３３及び関連フレームＣＦ３３−１〜ＣＦ３３−４から動画情報Ｃである動画ＭＶ３１３を生成する。 Further, for example, the generation device 100 generates a plurality of processed images from the related frames CF33-1 to CF33-4, which are the frames FM328 to FM331 that follow the key frame KF33, and generates a moving image based on the plurality of processed images (step S33). In this way, the generation device 100 generates the moving image MV313, which is moving image information C, from the key frame KF33 and the related frames CF33-1 to CF33-4.

そして、生成装置１００は、動画情報Ａ〜Ｃから要約動画ＭＶ３１を生成する（ステップＳ３４）。例えば、生成装置１００は、動画ＭＶ３１１、ＭＶ３１２、ＭＶ３１３の順で表示される要約動画ＭＶ３１を生成する。このように、生成装置１００は、複数のキーフレームから各々生成される動画をつなげた要約動画を生成する。このように、生成装置１００は、複数のキーフレームが含まれる場合であっても、各キーフレームに対応する動画をつなげることにより、要約動画を生成することができる。 Then, the generation device 100 generates the summary video MV31 from the moving image information A to C (step S34). For example, the generation device 100 generates the summary video MV31 in which the moving images MV311, MV312, and MV313 are displayed in that order. In this way, the generation device 100 generates a summary video by joining the moving images generated from each of the key frames. Thus, even when a moving image contains a plurality of key frames, the generation device 100 can generate a summary video by joining the moving images that correspond to the respective key frames.
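
Steps S31 to S34 can be sketched at the level of frame selection: each key frame contributes a clip made of itself and the related frames that follow it, and the clips are concatenated in order. The sketch omits the cropping into processed images that the text describes for each clip, and the function names are hypothetical; the frame labels and clip lengths follow the FIG. 10 example.

```python
def clip_indices(keyframe, n_following, n_frames):
    """Indices of a key frame and the related frames after it, within bounds."""
    end = min(n_frames, keyframe + n_following + 1)
    return list(range(keyframe, end))


def summary_indices(keyframes_with_lengths, n_frames):
    """Concatenate the clips of all key frames in chronological order."""
    indices = []
    for keyframe, n_following in sorted(keyframes_with_lengths):
        indices.extend(clip_indices(keyframe, n_following, n_frames))
    return indices


# Frames FM311 to FM334 of the moving image MC31.
frame_labels = [f"FM{n}" for n in range(311, 335)]

# Key frames FM312, FM319, FM327 (list indices 1, 8, 16) followed by
# 3, 2, and 4 related frames respectively.
selection = summary_indices([(1, 3), (8, 2), (16, 4)], len(frame_labels))
summary_frames = [frame_labels[i] for i in selection]
```

The three runs of frames in the result correspond to the moving images MV311, MV312, and MV313, joined in that order to form MV31.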

[9. Effects]
As described above, the generation device 100 according to the embodiment includes the acquisition unit 131, the first generation unit 133, and the second generation unit 134. The acquisition unit 131 acquires feature region information, which is information about the region of an object extracted from information about an image included in content. The first generation unit 133 generates a plurality of processed images from the content based on the feature region information acquired by the acquisition unit 131. The second generation unit 134 generates moving image information in which the plurality of processed images are displayed in an order based on the ranks assigned to the plurality of processed images.

The generation device 100 according to the embodiment can thereby appropriately generate a moving image that includes the substance of the content (a "summary moving image" in the embodiment; the same applies hereinafter) by generating a plurality of processed images based on the feature region information, which is information about the region of an object.
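The division of roles among the three units can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the embodiment itself: the data types, the function names `acquire`, `first_generate`, and `second_generate`, the identifiers "IM1" and "IM2", the object labels, and the convention that rank 1 is displayed first are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeatureRegion:
    image_id: str  # image in the content that the region was extracted from
    label: str     # object found in the region
    rank: int      # rank assigned to the region; 1 is shown first (assumption)


@dataclass
class ProcessedImage:
    source: FeatureRegion  # region that the processed image was made from


def acquire(content: Dict[str, List[FeatureRegion]]) -> List[FeatureRegion]:
    """Stand-in for the acquisition unit 131: return the feature region information."""
    return list(content["feature_regions"])


def first_generate(regions: List[FeatureRegion]) -> List[ProcessedImage]:
    """Stand-in for the first generation unit 133: one processed image per region."""
    return [ProcessedImage(region) for region in regions]


def second_generate(images: List[ProcessedImage]) -> List[str]:
    """Stand-in for the second generation unit 134: order the images by rank."""
    ordered = sorted(images, key=lambda image: image.source.rank)
    return [f"{image.source.image_id}:{image.source.label}" for image in ordered]


content = {
    "feature_regions": [
        FeatureRegion("IM2", "object_b", rank=2),
        FeatureRegion("IM1", "object_a", rank=1),
    ]
}
frames = second_generate(first_generate(acquire(content)))
```

The ordering lives only in the last stage, so a different ranking source can be swapped in without touching how the processed images are produced.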

In the generation device 100 according to the embodiment, the acquisition unit 131 acquires feature region information extracted from a plurality of pieces of image information included in the content as the information about an image. The first generation unit 133 generates the plurality of processed images from the plurality of pieces of image information.

The generation device 100 according to the embodiment can thereby appropriately generate a moving image that includes the substance of the content by generating a plurality of processed images from the plurality of pieces of image information included in the content based on the feature region information.

In the generation device 100 according to the embodiment, the first generation unit 133 generates a plurality of processed images that includes a processed image generated by cropping a region related to an object included in predetermined image information among the plurality of pieces of image information.

The generation device 100 according to the embodiment can thereby appropriately generate a moving image that includes the substance of the content by cropping the regions related to objects in the plurality of pieces of image information included in the content.
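Cropping a region related to an object can be illustrated with the minimal Python sketch below. It assumes, purely for illustration, that an image is a list of pixel rows and that a region is a (left, top, right, bottom) box with exclusive right and bottom edges; the function name `crop` and this representation are hypothetical and are not taken from the embodiment.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values
Box = Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)


def crop(image: Image, box: Box) -> Image:
    """Return the part of the image inside the box, clamped to the image bounds."""
    left, top, right, bottom = box
    height = len(image)
    width = len(image[0]) if image else 0
    # Clamp the box so that a region touching the image edge is still usable.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]


# A 4-row by 5-column test image whose pixel value encodes row and column.
image = [[row * 10 + col for col in range(5)] for row in range(4)]
region = crop(image, (1, 1, 4, 3))
```

Here `region` holds the pixels in rows 1 to 2 and columns 1 to 3 of the test image.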

In the generation device 100 according to the embodiment, the acquisition unit 131 acquires feature region information extracted from moving image information included in the content as the information about an image. The first generation unit 133 generates the plurality of processed images from the moving image information.

The generation device 100 according to the embodiment can thereby appropriately generate a moving image that includes the substance of the content by generating a plurality of processed images from the moving image information included in the content based on the feature region information.

In the generation device 100 according to the embodiment, the first generation unit 133 generates a plurality of processed images that includes a processed image generated by cropping a region related to an object included in image information extracted from the moving image information.

The generation device 100 according to the embodiment can thereby appropriately generate a moving image that includes the substance of the content by cropping the regions related to objects in the moving image information included in the content.

In the generation device 100 according to the embodiment, the acquisition unit 131 acquires feature region information extracted based on character information related to the content. The first generation unit 133 generates the plurality of processed images based on the character information. The second generation unit 134 generates moving image information in which the plurality of processed images are displayed in an order corresponding to the ranks assigned to the plurality of processed images based on the character information.

The generation device 100 according to the embodiment can thereby appropriately generate a moving image that includes the substance of the content by generating a plurality of processed images using the feature region information extracted based on the character information related to the content.

In the generation device 100 according to the embodiment, the acquisition unit 131 acquires feature region information extracted based on audio information related to the content. The first generation unit 133 generates the plurality of processed images based on the audio information. The second generation unit 134 generates moving image information in which the plurality of processed images are displayed in an order corresponding to the ranks assigned to the plurality of processed images based on the audio information.

The generation device 100 according to the embodiment can thereby appropriately generate a moving image that includes the substance of the content by generating a plurality of processed images using the feature region information extracted based on the audio information related to the content.

In the generation device 100 according to the embodiment, the second generation unit 134 generates moving image information in which, among the plurality of processed images, a processed image including a first object assigned a predetermined display order is followed by a processed image including a second object assigned a display order lower than that of the first object.

The generation device 100 according to the embodiment can thereby appropriately generate, from the moving image information included in the content, a moving image that includes the substance of the content by generating the moving image so that the objects are displayed in an order based on the display order assigned to each object.
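Ordering processed images by the display order assigned to their objects can be sketched as below. This is a minimal Python illustration, not the embodiment's method: the function name, the object labels, the image identifiers, and the rule that objects without an assigned display order are shown last are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# (processed image identifier, label of the object it contains); hypothetical format
LabeledImage = Tuple[str, str]


def order_by_display_order(
    images: List[LabeledImage], display_order: Dict[str, int]
) -> List[LabeledImage]:
    """Sort images so that objects with a higher display order (smaller number) come first."""
    # Objects without an assigned display order are placed last (assumption).
    # sorted() is stable, so images of the same object keep their input order.
    return sorted(images, key=lambda item: display_order.get(item[1], float("inf")))


display_order = {"first_object": 1, "second_object": 2}
images = [
    ("img_a", "second_object"),
    ("img_b", "first_object"),
    ("img_c", "unranked_object"),
    ("img_d", "first_object"),
]
ordered = order_by_display_order(images, display_order)
```

The same function also covers the case where the labels name parts of a single object rather than separate objects, since only the lookup table changes.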

In the generation device 100 according to the embodiment, the second generation unit 134 generates moving image information in which, among the plurality of processed images, a processed image including a first part, which is a part of a predetermined object and is assigned a predetermined display order, is followed by a processed image including a second part, which is a part of the predetermined object and is assigned a display order lower than that of the first part.

The generation device 100 according to the embodiment can thereby appropriately generate, from the moving image information included in the content, a moving image that includes the substance of the content by generating the moving image so that the parts of an object are displayed in an order based on the display order assigned to each part.

[10. Hardware Configuration]
The generation device 100 according to the embodiment described above is realized by, for example, a computer 1000 configured as shown in FIG. 11. FIG. 11 is a hardware configuration diagram illustrating an example of a computer that realizes the functions of the generation device. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts up, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via the network N and passes it to the CPU 1100, and transmits data generated by the CPU 1100 to other devices via the network N.

The CPU 1100 controls output devices such as a display and a printer and input devices such as a keyboard and a mouse via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600. The CPU 1100 also outputs generated data to the output devices via the input/output interface 1600.

The media interface 1700 reads a program or data stored in the recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the generation device 100 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 130 by executing a program loaded onto the RAM 1200. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them; as another example, it may acquire these programs from another device via the network N.

Several embodiments of the present application have been described above in detail with reference to the drawings. These are examples, and the present invention can be implemented in other forms with various modifications and improvements based on the knowledge of those skilled in the art, beginning with the aspects described in the disclosure of the invention.

[11. Others]
Of the processes described in the above embodiment, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed as desired unless otherwise specified. For example, the various kinds of information shown in each drawing are not limited to the illustrated information.

The components of each illustrated device are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

The embodiments described above can be combined as appropriate to the extent that the processing contents do not contradict each other.

The "units" (sections, modules, units) described above can also be read as "means" or "circuits". For example, the acquisition unit can be read as an acquisition means or an acquisition circuit.

Description of Symbols
1 Distribution system
100 Generation device
121 Content information storage unit
130 Control unit
131 Acquisition unit
132 Extraction unit
133 First generation unit
134 Second generation unit
135 Distribution unit
10 Terminal device
151 Transmission unit
152 Reception unit
153 Display unit
N Network

Claims

A generation device comprising:
an acquisition unit that acquires feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation unit that generates a plurality of processed images from the content based on the feature region information acquired by the acquisition unit; and
a second generation unit that generates moving image information in which the plurality of processed images are displayed in an order based on ranks that are assigned to the plurality of processed images and that are based on information about sentences included in the content.

A generation device comprising:
an acquisition unit that acquires feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation unit that generates a plurality of processed images from the content based on the feature region information acquired by the acquisition unit; and
a second generation unit that generates moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order determined, for each object included in the content, based on information stored in a predetermined database.

A generation device comprising:
an acquisition unit that acquires feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation unit that generates a plurality of processed images from the content based on the feature region information acquired by the acquisition unit; and
a second generation unit that generates moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order learned, for each object included in the content, based on a predetermined group of moving images.

A generation device comprising:
an acquisition unit that acquires feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation unit that generates a plurality of processed images from the content based on the feature region information acquired by the acquisition unit; and
a second generation unit that generates moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order determined, for each object included in the content, based on information collected from a network.

A generation device comprising:
an acquisition unit that acquires feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation unit that generates a plurality of processed images from the content based on the feature region information acquired by the acquisition unit; and
a second generation unit that, when each object included in the content is a living thing, generates moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order determined based on information such as the likelihood of a face recognition result.

The generation device according to any one of claims 1 to 5, wherein
the acquisition unit acquires the feature region information extracted from a plurality of pieces of image information included in the content as the information related to the image, and
the first generation unit generates the plurality of processed images from the plurality of pieces of image information.

The generation device according to claim 6, wherein
the first generation unit generates the plurality of processed images including a processed image generated by cropping a region related to an object included in predetermined image information among the plurality of pieces of image information.

The generation device according to any one of claims 1 to 5, wherein
the acquisition unit acquires the feature region information extracted from moving image information included in the content as the information related to the image, and
the first generation unit generates the plurality of processed images from the moving image information.

The generation device according to claim 8, wherein
the first generation unit generates the plurality of processed images including a processed image generated by cropping a region related to an object included in image information extracted from the moving image information.

The generation device according to any one of claims 1 to 9, wherein
the acquisition unit acquires the feature region information extracted based on character information related to the content,
the first generation unit generates the plurality of processed images based on the character information, and
the second generation unit generates moving image information in which the plurality of processed images are displayed in an order corresponding to the ranks assigned to the plurality of processed images based on the character information.

The generation device according to any one of claims 1 to 10, wherein
the acquisition unit acquires the feature region information extracted based on audio information related to the content,
the first generation unit generates the plurality of processed images based on the audio information, and
the second generation unit generates moving image information in which the plurality of processed images are displayed in an order corresponding to the ranks assigned to the plurality of processed images based on the audio information.

The generation device according to any one of claims 1 to 11, wherein
the second generation unit generates moving image information in which, among the plurality of processed images, a processed image including a first object assigned a predetermined display order is followed by a processed image including a second object assigned a display order lower than the display order assigned to the first object.

The generation device according to any one of claims 1 to 12, wherein
the second generation unit generates moving image information in which, among the plurality of processed images, a processed image including a first part, which is a part of a predetermined object and is assigned a predetermined display order, is followed by a processed image including a second part, which is a part of the predetermined object and is assigned a display order lower than the display order assigned to the first part.

A generation method executed by a computer, the generation method comprising:
an acquisition step of acquiring feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation step of generating a plurality of processed images from the content based on the feature region information acquired in the acquisition step; and
a second generation step of generating moving image information in which the plurality of processed images are displayed in an order based on ranks that are assigned to the plurality of processed images and that are based on information about sentences included in the content.

A generation method executed by a computer, the generation method comprising:
an acquisition step of acquiring feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation step of generating a plurality of processed images from the content based on the feature region information acquired in the acquisition step; and
a second generation step of generating moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order determined, for each object included in the content, based on information stored in a predetermined database.

A generation method executed by a computer, the generation method comprising:
an acquisition step of acquiring feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation step of generating a plurality of processed images from the content based on the feature region information acquired in the acquisition step; and
a second generation step of generating moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order learned, for each object included in the content, based on a predetermined group of moving images.

A generation method executed by a computer, the generation method comprising:
an acquisition step of acquiring feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation step of generating a plurality of processed images from the content based on the feature region information acquired in the acquisition step; and
a second generation step of generating moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order determined, for each object included in the content, based on information collected from a network.

A generation method executed by a computer, the generation method comprising:
an acquisition step of acquiring feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation step of generating a plurality of processed images from the content based on the feature region information acquired in the acquisition step; and
a second generation step of generating, when each object included in the content is a living thing, moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order determined based on information such as the likelihood of a face recognition result.

A program for causing a computer to execute:
an acquisition procedure of acquiring feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation procedure of generating a plurality of processed images from the content based on the feature region information acquired by the acquisition procedure; and
a second generation procedure of generating moving image information in which the plurality of processed images are displayed in an order based on ranks that are assigned to the plurality of processed images and that are based on information about sentences included in the content.

A program for causing a computer to execute:
an acquisition procedure of acquiring feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation procedure of generating a plurality of processed images from the content based on the feature region information acquired by the acquisition procedure; and
a second generation procedure of generating moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order determined, for each object included in the content, based on information stored in a predetermined database.

A program for causing a computer to execute:
an acquisition procedure of acquiring feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation procedure of generating a plurality of processed images from the content based on the feature region information acquired by the acquisition procedure; and
a second generation procedure of generating moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order learned, for each object included in the content, based on a predetermined group of moving images.

A program for causing a computer to execute:
an acquisition procedure of acquiring feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation procedure of generating a plurality of processed images from the content based on the feature region information acquired by the acquisition procedure; and
a second generation procedure of generating moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order determined, for each object included in the content, based on information collected from a network.

A program for causing a computer to execute:
an acquisition procedure of acquiring feature region information, which is information related to a region of an object extracted from information related to an image included in content;
a first generation procedure of generating a plurality of processed images from the content based on the feature region information acquired by the acquisition procedure; and
a second generation procedure of generating, when each object included in the content is a living thing, moving image information in which the plurality of processed images are displayed in an order based on ranks assigned to the plurality of processed images according to a display order determined based on information such as the likelihood of a face recognition result.