JP6623905B2

JP6623905B2 - Server device, information processing method and program

Info

Publication number: JP6623905B2
Application number: JP2016073060A
Authority: JP
Inventors: 亮佐橋
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2016-03-31
Filing date: 2016-03-31
Publication date: 2019-12-25
Anticipated expiration: 2036-03-31
Also published as: JP2017182706A

Description

The present invention belongs to the technical field of server devices, information processing methods, and programs. More specifically, it belongs to the technical field of a server device that transmits a moving image to terminal devices, an information processing method, and a program for that server device.

In recent years, research and development has been conducted on so-called virtual client systems, in which the functions of client terminals connected to a server device via a network such as the Internet are provided from that server device. Prior art documents on such virtual client systems include, for example, Patent Document 1 below. The technology disclosed in Patent Document 1 generates a so-called digest image, which represents the content of a moving image, by cutting out a part of the whole moving image transmitted to each client terminal according to the preferences of the user of that client terminal.

Patent Document 1: JP-A-2004-126811 (FIG. 23, etc.)

However, in the virtual client system described in Patent Document 1, metadata corresponding to the content of the original moving image must be created in order to generate a digest image. The metadata required here includes information such as the appearance times of objects in the original moving image and their attributes, and creating it takes a great deal of labor. Furthermore, creating such metadata requires object identification processing and the like to detect the positions of objects in the original moving image, which also increases the processing load.

The present invention has been made in view of the above problems, and provides a server device, an information processing method, and a program that can generate and transmit a representative image that accurately represents the content of a transmitted moving image without requiring additional data such as metadata.

In order to solve the above problems, the invention according to claim 1 is a server device that transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image, the server device comprising: acquisition means for acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; calculation means for calculating, for each frame and based on the range data acquired by the acquisition means, the number of times each preset region in the frame has been displayed on the terminal devices (the display count); extraction means for extracting from the moving image, based on the display counts calculated by the calculation means, the region of the frame whose display count satisfies a preset criterion, the extraction means extracting from the moving image an image of the minimum resolution preset for transmission of the moving image, centered on the center of the region whose display count is largest in the frame; and generation means for generating, based on the image extracted by the extraction means, a representative image that represents the content of the moving image.
The invention according to claim 2 is a server device that transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image, the server device comprising: acquisition means for acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; calculation means for calculating, for each frame and based on the range data acquired by the acquisition means, the display count on the terminal devices of each preset region in the frame; extraction means for extracting from the moving image, based on the display counts calculated by the calculation means, the region of the frame whose display count satisfies a preset criterion, the extraction means extracting from the moving image an image that includes the region whose display count is largest in the frame and that has the aspect ratio preset for transmission of the moving image; and generation means for generating, based on the image extracted by the extraction means, a representative image that represents the content of the moving image.
The invention according to claim 3 is a server device that transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image, the server device comprising: acquisition means for acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; calculation means for calculating, for each frame and based on the range data acquired by the acquisition means, the display count on the terminal devices of each preset region in the frame; extraction means for extracting from the moving image, based on the display counts calculated by the calculation means, the region of the frame whose display count satisfies a preset criterion; and generation means for generating, based on the region extracted by the extraction means, a representative image that represents the content of the moving image. When the size of the region whose display count exceeds a preset threshold is equal to or smaller than the minimum resolution preset for transmission of the moving image, the extraction means extracts from the moving image an image of the minimum resolution centered on the center of that region. When the size of the region whose display count exceeds the threshold is larger than the minimum resolution, the extraction means extracts from the moving image an image that includes that region and has the aspect ratio preset for transmission of the moving image.
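The counting and extraction described for claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the rectangle representation `(left, top, width, height)`, the function names, the use of a bounding box around the maximum-count pixels, and the clamping of the crop to the frame are all assumptions made for illustration. The per-pixel counting corresponds to the case in which each "region" is a single pixel.

```python
from typing import List, Tuple

# Hypothetical representation of a display range: (left, top, width, height) in frame pixels.
Rect = Tuple[int, int, int, int]


def display_counts(frame_w: int, frame_h: int, ranges: List[Rect]) -> List[List[int]]:
    """For one frame, count how many terminals displayed each pixel."""
    counts = [[0] * frame_w for _ in range(frame_h)]
    for left, top, w, h in ranges:
        for y in range(max(top, 0), min(top + h, frame_h)):
            for x in range(max(left, 0), min(left + w, frame_w)):
                counts[y][x] += 1
    return counts


def max_count_region(counts: List[List[int]]) -> Tuple[Rect, int]:
    """Bounding box of the pixels whose count equals the frame maximum (an assumption)."""
    peak = max(max(row) for row in counts)
    ys = [y for y, row in enumerate(counts) if peak in row]
    xs = [x for row in counts for x, c in enumerate(row) if c == peak]
    left, top = min(xs), min(ys)
    return (left, top, max(xs) - left + 1, max(ys) - top + 1), peak


def crop_min_resolution(region: Rect, min_w: int, min_h: int,
                        frame_w: int, frame_h: int) -> Rect:
    """Minimum-resolution window centered on the region's center.

    Clamping the window to the frame is an assumption of this sketch, and the
    frame is assumed to be at least min_w x min_h.
    """
    left, top, w, h = region
    cx, cy = left + w // 2, top + h // 2
    x = min(max(cx - min_w // 2, 0), frame_w - min_w)
    y = min(max(cy - min_h // 2, 0), frame_h - min_h)
    return (x, y, min_w, min_h)
```

For example, with three display ranges reported for an 8 x 6 frame, `max_count_region(display_counts(8, 6, ranges))` yields the area that all three viewers had on screen, and `crop_min_resolution` turns it into a fixed-size crop that could serve as one frame of the representative image.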

The invention according to claim 4 is the server device according to any one of claims 1 to 3, wherein the acquisition means acquires the range data from the terminal devices belonging to a terminal device group obtained by classifying the plurality of terminal devices by a preset classification method.

The invention according to claim 5 is the server device according to any one of claims 1 to 4, wherein the calculation means calculates the display count for each pixel as the region, the extraction means extracts from the moving image the pixels whose display count satisfies the criterion, and the generation means generates the representative image based on the pixels extracted by the extraction means.

The invention according to claim 6 is the server device according to any one of claims 1 to 5, wherein the extraction means extracts from the moving image the region reproduced during a preset playback time before and after a timing at which the display count calculated by the calculation means peaks along the playback time axis of the moving image, the playback time including that timing.
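The peak-based selection of claim 6 can be sketched as follows. This is an illustration only: the text does not define how a peak is detected, so the local-maximum test used here (strictly higher than the previous frame, not lower than the next), the function name `peak_windows`, and the parameters `frame_rate` and `margin_s` are assumptions.

```python
from typing import List, Tuple


def peak_windows(max_counts: List[int], frame_rate: float,
                 margin_s: float) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) windows around peaks of the display count.

    max_counts[i] is the largest display count found in frame i.
    margin_s is the preset playback time kept before and after each peak.
    """
    margin = int(round(margin_s * frame_rate))
    last = len(max_counts) - 1
    windows = []
    for i in range(1, last):
        # Assumed peak definition: a local maximum along the playback time axis.
        if max_counts[i - 1] < max_counts[i] >= max_counts[i + 1]:
            windows.append((max(i - margin, 0), min(i + margin, last)))
    return windows
```

Because the test is relative to neighboring frames, a window is produced even where the absolute display count is small, which matches the effect stated for claim 6.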

The invention according to claim 7 is the server device according to any one of claims 1 to 6, wherein the extraction means extracts from the moving image the region whose display count calculated by the calculation means is equal to or greater than a preset threshold.

The invention according to claim 8 is an information processing method executed in a server device that transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image, the method comprising: an acquisition step of acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; a calculation step of calculating, for each frame and based on the range data acquired in the acquisition step, the display count on the terminal devices of each preset region in the frame; an extraction step of extracting from the moving image, based on the display counts calculated in the calculation step, the region of the frame whose display count satisfies a preset criterion, the extraction step extracting from the moving image an image of the minimum resolution preset for transmission of the moving image, centered on the center of the region whose display count is largest in the frame; and a generation step of generating, based on the image extracted in the extraction step, a representative image that represents the content of the moving image.
The invention according to claim 9 is an information processing method executed in a server device that transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image, the method comprising: an acquisition step of acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; a calculation step of calculating, for each frame and based on the range data acquired in the acquisition step, the display count on the terminal devices of each preset region in the frame; an extraction step of extracting from the moving image, based on the display counts calculated in the calculation step, the region of the frame whose display count satisfies a preset criterion, the extraction step extracting from the moving image an image that includes the region whose display count is largest in the frame and that has the aspect ratio preset for transmission of the moving image; and a generation step of generating, based on the image extracted in the extraction step, a representative image that represents the content of the moving image.
The invention according to claim 10 is an information processing method executed in a server device that transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image, the method comprising: an acquisition step of acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; a calculation step of calculating, for each frame and based on the range data acquired in the acquisition step, the display count on the terminal devices of each preset region in the frame; an extraction step of extracting from the moving image, based on the display counts calculated in the calculation step, the region of the frame whose display count satisfies a preset criterion; and a generation step of generating, based on the region extracted in the extraction step, a representative image that represents the content of the moving image. When the size of the region whose display count exceeds a preset threshold is equal to or smaller than the minimum resolution preset for transmission of the moving image, the extraction step extracts from the moving image an image of the minimum resolution centered on the center of that region. When the size of the region whose display count exceeds the threshold is larger than the minimum resolution, the extraction step extracts from the moving image an image that includes that region and has the aspect ratio preset for transmission of the moving image.

The invention according to claim 11 causes a computer included in a server device that transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image to execute: an acquisition step of acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; a calculation step of calculating, for each frame and based on the range data acquired in the acquisition step, the display count on the terminal devices of each preset region in the frame; an extraction step of extracting from the moving image, based on the display counts calculated in the calculation step, the region of the frame whose display count satisfies a preset criterion, the extraction step extracting from the moving image an image of the minimum resolution preset for transmission of the moving image, centered on the center of the region whose display count is largest in the frame; and a generation step of generating, based on the image extracted in the extraction step, a representative image that represents the content of the moving image.
The invention according to claim 12 causes a computer included in a server device that transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image to execute: an acquisition step of acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; a calculation step of calculating, for each frame and based on the range data acquired in the acquisition step, the display count on the terminal devices of each preset region in the frame; an extraction step of extracting from the moving image, based on the display counts calculated in the calculation step, the region of the frame whose display count satisfies a preset criterion, the extraction step extracting from the moving image an image that includes the region whose display count is largest in the frame and that has the aspect ratio preset for transmission of the moving image; and a generation step of generating, based on the image extracted in the extraction step, a representative image that represents the content of the moving image.
The invention according to claim 13 is a program that causes a computer included in a server device that transmits a moving image composed of a plurality of frames to each of a plurality of terminal devices that display the moving image to execute: an acquisition step of acquiring, from each terminal device, range data indicating the display range of the moving image on that terminal device; a calculation step of calculating, for each frame and based on the range data acquired in the acquisition step, the display count on the terminal devices of each preset region in the frame; an extraction step of extracting from the moving image, based on the display counts calculated in the calculation step, the region of the frame whose display count satisfies a preset criterion; and a generation step of generating, based on the region extracted in the extraction step, a representative image that represents the content of the moving image. When the size of the region whose display count exceeds a preset threshold is equal to or smaller than the minimum resolution preset for transmission of the moving image, the computer executing the extraction step extracts from the moving image an image of the minimum resolution centered on the center of that region. When the size of the region whose display count exceeds the threshold is larger than the minimum resolution, the computer executing the extraction step extracts from the moving image an image that includes that region and has the aspect ratio preset for transmission of the moving image.

According to the invention described in any one of claims 1, 8, and 11, a representative image that accurately represents the content of a moving image can be generated without requiring additional data such as metadata, and the representative image can be generated from images of the minimum resolution.
According to the invention described in any one of claims 2, 9, and 12, a representative image that accurately represents the content of a moving image can be generated without requiring additional data such as metadata, and it can be generated while maintaining the predetermined aspect ratio.
According to the invention described in any one of claims 3, 10, and 13, a representative image that accurately represents the content of a moving image can be generated without requiring additional data such as metadata, and it can be generated in accordance with the relationship between the size of the region whose display count exceeds the threshold and the predetermined minimum resolution.
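The size-dependent choice described for claims 3, 10, and 13 can be sketched in Python as below. This is a hedged illustration, not the patented implementation: the function name `choose_crop`, the reading of "size equal to or smaller than the minimum resolution" as "both width and height fit", and the choice of the smallest integer multiple of the aspect ratio that contains the region are all assumptions. The sketch does not clamp the result to the frame.

```python
import math
from typing import Tuple

# Hypothetical representation: (left, top, width, height) in frame pixels.
Rect = Tuple[int, int, int, int]


def choose_crop(region: Rect, min_w: int, min_h: int,
                aspect_w: int, aspect_h: int) -> Rect:
    """Pick the crop for a region whose display count exceeds the threshold."""
    left, top, w, h = region
    cx, cy = left + w / 2, top + h / 2
    if w <= min_w and h <= min_h:
        # Region fits inside the minimum resolution: crop exactly that size.
        out_w, out_h = min_w, min_h
    else:
        # Region is larger: grow to the smallest aspect_w:aspect_h box containing it.
        scale = max(math.ceil(w / aspect_w), math.ceil(h / aspect_h))
        out_w, out_h = scale * aspect_w, scale * aspect_h
    # Center the output on the region's center.
    return (math.floor(cx - out_w / 2), math.floor(cy - out_h / 2), out_w, out_h)
```

A small, heavily viewed region therefore still yields an image large enough to transmit, while a large region is kept whole and padded only as far as the preset aspect ratio requires.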

According to the invention described in claim 4, a representative image optimized for the users of the terminal devices belonging to the terminal device group can be generated.

According to the invention described in claim 5, a representative image that represents the content of the moving image more accurately can be generated.

According to the invention described in claim 6, a representative image can be generated that includes a region whose display count peaks relative to the preceding and following timings, even if the display count itself is small.

According to the invention described in claim 7, a representative image including regions with higher display counts can be generated, so that it accurately represents the content of the moving image.

FIG. 1 is a diagram showing a schematic configuration example of the communication system of the present embodiment.
(a) to (c) are diagrams each illustrating calculation of the maximum display count in the present embodiment.
(a) is a diagram illustrating peak values of the maximum display count in the present embodiment; (b) to (e) are diagrams each illustrating cutting out of the maximum display count region in the present embodiment.
(a) is a flowchart showing the user clustering process of the present embodiment; (b) is a flowchart showing the image frame extraction process of the present embodiment; (c) is a flowchart showing the digest image generation process of the present embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

(1) Outline of Configuration and Operation of Communication System
FIG. 1 is a diagram showing a schematic configuration example of the communication system of the present embodiment. As shown in FIG. 1, the communication system S includes a distribution server 1 and a plurality of clients 2. The distribution server 1 of the present embodiment is an example of the server device of the present invention, and the client 2 is an example of the terminal device of the present invention. The distribution server 1 and the clients 2 can communicate with each other via a network NW. The network NW is composed of, for example, the Internet, a mobile communication network, and gateways.

The distribution server 1 transmits content including moving image data to a client 2, for example in response to a request from that client 2. The moving image data represents a moving image composed of a plurality of image frames. The content may also include audio data. The content is transmitted, for example, via the network NW using HTTP (Hyper Text Transfer Protocol) live streaming. For example, the distribution server 1 may perform live distribution, in which content sent from a given video camera is distributed in real time while the camera is shooting. Alternatively, the distribution server 1 may store content in advance and distribute the stored content on demand. Taking a moving image shot by a given video camera as an example, the moving image covering the entire shooting range of the video camera is called the entire moving image, and a moving image covering part of the shooting range is called a partial moving image. The distribution server 1 transmits to the client 2 content including moving image data that represents the moving image (which may be the entire moving image or a partial moving image) of the area within the display range designated by an operation, described later, performed by the user of the client 2.

The client 2 receives the content distributed from the distribution server 1. By playing back the received content, the client 2 displays the moving image represented by the moving image data included in the content on the screen of a display unit 24a, described later. During playback of the content, the client 2 accepts pseudo camera work operations performed by the user with an operation unit 25a, described later. A camera work operation is an operation in which a photographer moves a camera to determine, for example, the position of the camera relative to the subject, the angle (orientation) of the camera relative to the subject, and the size of the subject.
In the present embodiment, the user operates the operation unit 25a as if moving a real camera, and thereby performs pseudo camera work (moving a virtual camera) on the display target in the plurality of image frames that make up the moving image data representing the entire moving image. Such an operation is called a "display range changing operation". With this operation, the user can designate a desired display range within the entire moving image and can change the display range by that designation. The display range is the range, within the whole image frame of the entire moving image, that is displayed on the screen of the display unit 24a. By the display range changing operation, the user can designate position coordinates of the display range relative to the entire moving image that differ from one image frame to another. Pan and tilt operations are examples of such a changing operation; they change the direction of the viewpoint, referenced to the virtual camera, relative to the image frame. The user can also enlarge or reduce the size of the display range relative to the entire moving image by the display range changing operation. That is, the user can enlarge or reduce the size of the display range without changing the content of the displayed image, so that the image within the display range is enlarged or reduced. A zoom operation is an example of such a changing operation; it changes the angle of view from the viewpoint relative to the image frame. Zoom operations include zoom-in and zoom-out operations. Zooming in narrows the angle of view, and zooming out widens it.
The client 2 receives from the distribution server 1 content including moving image data that represents the moving image of the area within the display range specified by the user's operation. Based on the received moving image data, the client 2 displays on the display unit 24a the moving image of the area of the entire moving image that lies within the display range. The client 2 may be, for example, a personal computer, a smartphone, a mobile phone, a television, or a video game machine.

一方、配信サーバ１は、コンテンツを再生するクライアント２からカメラワークデータのアップロードを受け付ける。カメラワークデータは、コンテンツの再生時間の経過に従って、そのクライアント２における動画の表示範囲を示す疑似的なカメラワークに関する情報である。カメラワークデータは、画像フレームに対する表示範囲の位置座標およびサイズと、その画像フレームの再生位置とのセットを、少なくとも表示範囲の変更操作が行われた再生位置ごとに含む。なお、表示範囲の位置座標およびサイズは、パン、チルト、およびズームで表すとよい。パンとは、ビデオカメラ（仮想カメラ）の左右振りをいう。チルトとは、ビデオカメラ（仮想カメラ）の上下振りをいう。ズームとは、表示倍率をいう。再生位置は、動画データの再生開始からの時間的な位置をいう。再生位置は、動画データの再生開始から経過した時間であるという点において、再生時間ともいう。カメラワークデータは本発明の範囲データの一例である。 On the other hand, the distribution server 1 accepts upload of camera work data from the client 2 that reproduces the content. The camera work data is information on pseudo camera work indicating a display range of a moving image on the client 2 as the content reproduction time elapses. The camera work data includes a set of a position coordinate and a size of a display range with respect to an image frame and a reproduction position of the image frame for at least a reproduction position at which a display range change operation is performed. Note that the position coordinates and the size of the display range may be represented by pan, tilt, and zoom. Pan means swinging a video camera (virtual camera) from side to side. The tilt refers to a vertical swing of a video camera (virtual camera). Zoom means a display magnification. The reproduction position refers to a temporal position from the start of reproduction of the moving image data. The reproduction position is also referred to as a reproduction time in that the reproduction position is a time elapsed from the start of reproduction of the moving image data. Camera work data is an example of the range data of the present invention.
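
As a minimal sketch of how such camera work data could be held in memory, the following Python fragment models one entry as a reproduction position plus the position coordinates and size of the display range. The names `CameraWorkSample` and `display_range_at`, the pixel-rectangle representation, and the rule that a display range stays in effect until the next change operation are all assumptions made for illustration; the description only states which items each set contains.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraWorkSample:
    """One hypothetical camera work entry (field names are illustrative)."""
    position_s: float  # reproduction position, seconds from start of playback
    x: int             # left edge of the display range in the image frame
    y: int             # top edge of the display range in the image frame
    width: int         # width of the display range in pixels
    height: int        # height of the display range in pixels


def display_range_at(samples, position_s):
    """Return the sample in effect at position_s, or None before the first one.

    Assumes samples are sorted by position_s and that a display range
    persists until the next change operation is recorded.
    """
    positions = [s.position_s for s in samples]
    index = bisect_right(positions, position_s)
    return samples[index - 1] if index else None
```

With entries recorded at 0.0 s and 12.5 s, a lookup at 5.0 s returns the first entry and a lookup at 12.5 s or later returns the second.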

ここで本実施形態の通信システムＳは、仮想クライアントシステムとして動作する。すなわち通信システムＳは、各クライアント２から送信されたカメラワークデータに基づき、そのクライアント２における表示範囲内の領域の動画を表す動画データを含むコンテンツを、配信サーバ１からそのクライアント２へ送信する。このとき配信サーバ１は、各クライアント２に送信すべき動画データを含むコンテンツを記憶部１２から読み出して、各クライアント２に送信する。これにより各クライアント２では、そのユーザの任意によるカメラワーク操作に対応した表示範囲の動画を表す動画データを含むコンテンツを受信して、表示部２４ａに表示することができる。 Here, the communication system S of the present embodiment operates as a virtual client system. That is, based on the camera work data transmitted from each client 2, the communication system S transmits, from the distribution server 1 to the client 2, content including moving image data representing a moving image in an area within the display range of the client 2. At this time, the distribution server 1 reads the content including the moving image data to be transmitted to each client 2 from the storage unit 12 and transmits the content to each client 2. Thus, each client 2 can receive the content including the moving image data representing the moving image in the display range corresponding to the camera work operation by the user, and display the content on the display unit 24a.

また配信サーバ１は、本実施形態のダイジェスト画像を生成する。本実施形態のダイジェスト画像は、各クライアント２に送信するコンテンツの内容を代表するダイジェスト画像である。本実施形態のダイジェスト画像は、コンテンツに含まれる動画データの一部、あるいはその動画データを構成する一または複数の静止画データにより構成されている。ダイジェスト画像を構成する静止画データは例えば、そのコンテンツの内容を代表する、いわゆるサムネイル画像である。ダイジェスト画像としての動画データまたは静止画データを、以下単に「ダイジェスト画像としての動画データ等」と称する。ダイジェスト画像としての動画データ等は、例えばクライアント２からの要求に応じて、そのクライアント２に配信される。ダイジェスト画像としての動画データ等を受信したクライアント２のユーザは、その動画データ等を再生することにより、そのダイジェスト画像により代表されるコンテンツの内容またはその概要を認識することができる。本実施形態のダイジェスト画像は、各クライアント２から送信されるカメラワークデータに基づいて生成される。生成されたダイジェスト画像としての動画データ等は、例えば配信サーバ１内に記憶され、クライアント２からの要求に応じて送信される。生成されたダイジェスト画像としての動画データ等のクライアント２への送信方法は、例えば、ダイジェスト画像により代表されるコンテンツのクライアント２への送信方法と同様の方法を用いることができる。本実施形態のダイジェスト画像の生成については、後ほど詳述する。 The distribution server 1 also generates the digest image of the present embodiment. The digest image of the present embodiment represents the content transmitted to each client 2. It consists of a part of the moving image data included in the content, or of one or more pieces of still image data constituting that moving image data. The still image data constituting the digest image is, for example, a so-called thumbnail image representing the content. The moving image data or still image data serving as the digest image is hereinafter simply referred to as “moving image data or the like as the digest image”. The moving image data or the like as the digest image is distributed to a client 2, for example, in response to a request from that client 2. By reproducing it, the user of the client 2 that has received the moving image data or the like as the digest image can recognize the content represented by the digest image, or an outline of that content. The digest image of the present embodiment is generated based on the camera work data transmitted from each client 2. 
The generated moving image data or the like as a digest image is stored in, for example, the distribution server 1 and transmitted in response to a request from the client 2. As a method for transmitting the generated moving image data or the like as the digest image to the client 2, for example, a method similar to the method for transmitting the content represented by the digest image to the client 2 can be used. The generation of the digest image according to the present embodiment will be described later in detail.

（２）各装置の構成
次に図１を参照して、本実施形態の通信システムＳに含まれる各装置の構成について説明する。配信サーバ１は図１に示すように、制御部１１、記憶部１２およびインターフェース部１３等を備えて構成される。これらの構成要素は、バス１４に接続されている。インターフェース部１３は、ネットワークＮＷに接続される。インターフェース部１３は本発明の取得手段の一例である。制御部１１は、コンピュータとしてのＣＰＵ（Central Processing Unit）、ＲＯＭ（Read Only Memory）およびＲＡＭ（Random Access Memory）等により構成される。制御部１１はそれぞれ、本発明の算出手段の一例、抽出手段の一例および生成手段の一例である。記憶部１２は、例えばハードディスクドライブにより構成される。記憶部１２には、ＯＳ（Operating System）およびサーバプログラム等が記憶されている。サーバプログラムは、コンテンツまたはダイジェスト画像の送信処理等をＣＰＵに実行させるプログラムである。また記憶部１２には、コンテンツを構成する動画データおよび音声データが記憶される。動画データおよび音声データは、コンテンツＩＤ、および動画データまたは音声データのコンテンツにおける再生位置または再生時間と対応付けて記憶部１２に記憶される。コンテンツＩＤは、コンテンツを識別する識別情報である。 (2) Configuration of each device Next, the configuration of each device included in the communication system S of the present embodiment will be described with reference to FIG. As shown in FIG. 1, the distribution server 1 includes a control unit 11, a storage unit 12, an interface unit 13, and the like. These components are connected to the bus 14. The interface unit 13 is connected to the network NW. The interface unit 13 is an example of an acquisition unit according to the present invention. The control unit 11 includes a CPU (Central Processing Unit) as a computer, a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The control unit 11 is an example of a calculation unit, an example of an extraction unit, and an example of a generation unit of the present invention. The storage unit 12 is configured by, for example, a hard disk drive. The storage unit 12 stores an OS (Operating System), a server program, and the like. The server program is a program that causes the CPU to execute transmission processing of content or a digest image and the like. In addition, the storage unit 12 stores moving image data and audio data constituting the content. 
The moving image data and the audio data are stored in the storage unit 12 in association with the content ID and the reproduction position or the reproduction time in the contents of the moving image data or the audio data. The content ID is identification information for identifying the content.

次にクライアント２は図１に示すように、制御部２１、記憶部２２、ビデオＲＡＭ２３、映像制御部２４、操作処理部２５、音声制御部２６およびインターフェース部２７等を備えて構成される。これらの構成要素は、バス２８に接続されている。映像制御部２４には、ディスプレイを備える表示部２４ａが接続される。制御部２１は、コンピュータとしてのＣＰＵ、ＲＯＭおよびＲＡＭ等により構成される。操作処理部２５には、操作部２５ａが接続される。操作部２５ａには、例えば、マウス、キーボード、リモコン等が含まれる。表示部２４ａと操作部２５ａとを兼ねるタッチパネルが適用されてもよい。制御部２１は、ユーザによる操作部２５ａからの操作指示を、操作処理部２５を介して受け付ける。音声制御部２６には、スピーカ２６ａが接続される。インターフェース部２７は、ネットワークＮＷに接続される。記憶部２２は、例えば、ハードディスクドライブまたはフラッシュメモリ等により構成される。記憶部２２には、ＯＳ、およびプレイヤーソフトウェア等が記憶されている。プレイヤーソフトウェアは、コンテンツまたはダイジェスト画像の受信および再生処理等をＣＰＵに実行させるプログラムである。 Next, as shown in FIG. 1, the client 2 includes a control unit 21, a storage unit 22, a video RAM 23, a video control unit 24, an operation processing unit 25, an audio control unit 26, an interface unit 27, and the like. These components are connected to a bus 28. A display unit 24a including a display is connected to the video control unit 24. The control unit 21 includes a CPU as a computer, a ROM, a RAM, and the like. The operation unit 25a is connected to the operation processing unit 25. The operation unit 25a includes, for example, a mouse, a keyboard, a remote controller, and the like. A touch panel serving both as the display unit 24a and the operation unit 25a may be applied. The control unit 21 receives an operation instruction from the operation unit 25 a by the user via the operation processing unit 25. A speaker 26 a is connected to the audio control unit 26. The interface unit 27 is connected to the network NW. The storage unit 22 includes, for example, a hard disk drive or a flash memory. The storage unit 22 stores an OS, player software, and the like. The player software is a program that causes a CPU to execute processing for receiving and reproducing content or a digest image, and the like.

また配信サーバ１の記憶部１２は、上記カメラワークデータの受信処理等を制御部１１のＣＰＵに実行させるサーバプログラムも記憶している。さらに記憶部１２には、いずれかのクライアント２から受信したカメラワークデータが記憶される。各カメラワークデータは、コンテンツＩＤおよびユーザＩＤに対応付けて記憶部１２に記憶される。コンテンツＩＤは、対象のカメラワークデータが示す表示範囲でクライアント２により再生されていたコンテンツを示す。ユーザＩＤは、対象のカメラワークデータが示す疑似カメラワーク操作を行ったユーザを識別する識別情報である。また、ユーザＩＤは、そのユーザが利用するクライアント２を識別する情報でもある。なお、端末装置識別情報として、クライアント２のＩＰアドレス等が用いられてもよい。 The storage unit 12 of the distribution server 1 also stores a server program that causes the CPU of the control unit 11 to execute the above-described camera work data reception processing and the like. Further, the storage unit 12 stores camera work data received from any one of the clients 2. Each camera work data is stored in the storage unit 12 in association with the content ID and the user ID. The content ID indicates the content reproduced by the client 2 in the display range indicated by the target camera work data. The user ID is identification information for identifying a user who has performed the pseudo camera work operation indicated by the target camera work data. The user ID is also information for identifying the client 2 used by the user. Note that the IP address of the client 2 or the like may be used as the terminal device identification information.

（３）本実施形態のダイジェスト画像の生成および送信
次に図２および図３を参照して、本実施形態のダイジェスト画像の生成およびクライアント２への送信について説明する。図２は、本実施形態の最大表示回数の算出を例示する図である。図３は、本実施形態の最大表示回数のピーク値および本実施形態の最大表示回数領域の切出しをそれぞれ例示する図である。本実施形態の通信システムＳでは、クライアント２に配信されたコンテンツの内容を代表する本実施形態のダイジェスト画像が配信サーバ１において生成され、クライアント２からの要求に応じてそのクライアント２に送信される。配信サーバ１は、配信されたコンテンツについて各クライアント２から送信されたカメラワークデータに基づいて、そのコンテンツについてのダイジェスト画像を生成する。具体的に配信サーバ１は、各クライアント２において指定された表示範囲を示すカメラワークデータを受信する。そして配信サーバ１は、各クライアント２における後述の最大表示回数が、配信されたコンテンツに含まれていた動画データの再生時間軸においてピーク値または最大値となる画像の領域を含む画像フレームを用いて、そのコンテンツについてのダイジェスト画像を生成する。この場合の画像の領域は、一の画像フレームにおいて一または複数の画素からなる画像の領域である。すなわち、一のクライアント２へ配信されたコンテンツに含まれていた動画データを構成する画像フレームが、図２（ａ）に例示する画像フレームＧであるとする。そして、その画像フレームＧの中でそのクライアント２において指定された表示範囲が、図２に相当する表示範囲ＣＷ１であるとする。この表示範囲ＣＷ１は、画像フレームＧ内の領域であり、一または複数の画素により構成されている。この場合にそのクライアント２からは、表示範囲ＣＷ１を示すカメラワークデータが送信される。そして図２（ａ）の場合、表示範囲ＣＷ１の内側の領域の画像のクライアント２における表示回数は、１回である。表示範囲ＣＷ１の内側の領域の画像は、表示範囲ＣＷ１を示すカメラワークデータを送信した一のクライアント２において表示されたからである。これに対し、画像フレームＧにおける表示範囲ＣＷ１の外側の領域の画像は、いずれのクライアント２でも表示されていない。よって、画像フレームＧにおける表示範囲ＣＷ１の外側の領域の画像のクライアント２における表示回数は、０回である。なお図２（ａ）ないし図２（ｃ）では、画像フレームＧの領域（すなわち一または複数の画素からなる領域）ごとのクライアント２における表示回数が、かっこ書きで示されている。なお表示回数は、換言すれば、その画像フレームＧの領域の画像が表示されたクライアント２の数でもある。次に、画像フレームＧから構成される動画データを含むコンテンツが例えば三つのクライアント２に送信されているとすると、各クライアント２からは、例えば図２（ｂ）に例示する表示範囲ＣＷ１ないし表示範囲ＣＷ３をそれぞれ示すカメラワークデータが送信される。ここで、各クライアント２において指定された表示範囲ＣＷ１ないし表示範囲ＣＷ３の画像フレームＧにおける位置が、図２（ｂ）に例示する位置だったとする。この場合に、表示範囲ＣＷ１ないし表示範囲ＣＷ３には、画像フレームＧ内において図２（ｂ）に例示する重複範囲が生じる。そして図２（ｂ）に例示する場合における画像フレームＧの領域ごとの画像のクライアント２における表示回数は、図２（ｂ）においてかっこ書きで示される表示回数となる。すなわち例えば図２（ｂ）において４５度のハッチングで示すように、表示範囲ＣＷ１の内側で且つ表示範囲ＣＷ２および表示範囲ＣＷ２の外側の領域の画像のクライアント２における表示回数は、１回である。一方、表示範囲ＣＷ１および表示範囲ＣＷ２の内側で且つ表示範囲ＣＷ３の外側領域の画像のクライアント２における表示回数は、２回である。さらに、図２（ｂ）においてクロスハッチングで示すように、表示範囲ＣＷ１ないし表示範囲ＣＷ３すべての内側の領域の画像のクライアント２における表示回数は、３回である。そして、クライアント２におけるその画像の表示回数が表示範囲ＣＷ１ないし表示範囲ＣＷ３すべての内側の領域よりも多い領域は、図２（ｂ）に例示する画像フレームＧ内には存在しない。よって、図２（ｂ）に例示する画像フレームＧ内の領域の画像のクライアント２における表示回数の最大値は、「３」である。なお以下の説明において、画像フレームＧ内の領域のクライアント２における表示回数の最大値を、単に「最大表示回数」と称する。すなわち、図２（ｂ）に例示する画像フレームＧに対応する最大表示回数は、「３」である。次に、画像フレームＧから構成される動画データを含むコンテンツが例えば七つのクライアント２に送信されているとす
ると、各クライアント２からは、例えば図２（ｃ）に例示する表示範囲ＣＷ１ないし表示範囲ＣＷ７をそれぞれ示すカメラワークデータが送信される。ここで、各クライアント２において指定された表示範囲ＣＷ１ないし表示範囲ＣＷ７の画像フレームＧにおける位置が、図２（ｃ）に例示する位置だったとする。この場合に、表示範囲ＣＷ１ないし表示範囲ＣＷ７には、画像フレームＧ内において図２（ｃ）に例示する重複範囲が生じる。そして図２（ｃ）に例示する場合における画像フレームＧの領域ごとの画像の表示回数は、図２（ｃ）においてかっこ書きで示される表示回数となる。すなわち例えば図２（ｃ）において４５度のハッチングで示すように、表示範囲ＣＷ４の内側で且つ表示範囲ＣＷ１ないし表示範囲ＣＷ３および表示範囲ＣＷ５ないし表示範囲ＣＷ７の外側の領域の画像のクライアント２における表示回数は、１回である。一方、例えば図２（ｃ）において１３５度のハッチングで示すように、表示範囲ＣＷ４および表示範囲ＣＷ２の内側で且つ表示範囲ＣＷ５の外側の領域の画像のクライアント２における表示回数は、２回である。さらに、図２（ｃ）においてクロスハッチングで示すように、表示範囲ＣＷ１ないし表示範囲ＣＷ５すべての内側の領域の画像のクライアント２における表示回数は、５回である。そして、クライアント２における表示回数が表示範囲ＣＷ１ないし表示範囲ＣＷ５すべての内側の領域よりも多い領域は、画像フレームＧ内には存在しない。よって、図２（ｃ）に例示する画像フレームＧに対応する最大表示回数は、「５」である。本実施形態の配信サーバ１は、通信システムＳに属する各クライアント２からのカメラワークデータに基づき、最大表示回数を、図２に例示するように画像フレームＧごとに算出する。配信サーバ１は、算出した画像フレームＧごとの最大表示回数を、一時的に例えば記憶部１２に記憶する。そして配信サーバ１は、ダイジェスト画像により代表されるコンテンツの全体または一部について、その再生時間に応じた最大表示回数を画像フレームＧごとに検出する。ここで上記画像フレームＧは、コンテンツの再生に従った再生タイミングでクライアント２において表示される。よって画像フレームＧは、それが対応しているコンテンツの再生時間に、クライアント２において表示される。すなわち上記コンテンツの再生時間に応じた最大表示回数は、再生時間を横軸とし、最大表示回数を縦軸とすると、例えば図３（ａ）に例示する変化をする。なお図３（ａ）の横軸は、再生時間に代えて予め設定された再生タイミングからの画像フレーム数であってもよい。ここで図３（ａ）に例示する場合において、前後の再生時間に対して最大表示回数がピーク値を取るタイミングが、再生時間で４０秒、１９０秒および２８０秒であったとし、その場合のピークがそれぞれピークＰ１ないしピークＰ３となったとする。最大表示回数がピークＰ１ないしピークＰ３となるということは、ピークＰ１ないしピークＰ３に対応する画像フレームが、その前後の画像フレームよりも多くのクライアント２において表示されていることを意味する。そしてこの場合、ピークＰ１ないしピークＰ３のそれぞれに対応する画像フレームの内容が、図３（ａ）により最大表示回数の変化が示されるコンテンツの内容を代表しているといえる。そこで配信サーバ１は、図３（ａ）に例示する場合には、ピークＰ１ないしピークＰ３に対応する画像フレームに基づいて、図３（ａ）により最大表示回数の変化が示されるコンテンツの内容を代表するダイジェスト画像を生成する。なお、最大表示回数が再生時間に沿って単純に増加または減少するのみのコンテンツの場合は、上述したようなピークＰ１等ではなく、最大表示回数の最大値に対応する画像フレームに基づいてそのコンテンツの内容を代表するダイジェスト画像を生成してもよい。この場合、最大表示回数の上記最大値に対応する画像フレームが、上記ピークＰ１等に対応する画像フレームに対応する。また配信サーバ１は、画像フレームごとの最大表示回数に代えて、画像フレーム内の領域のクライアント２における表示回数が予め設定された閾値以上となる領域に基づいてダイジェスト画像を生成してもよい。その後配信サーバ１は、生成されたダイジェスト画像としての動画データ等を、クライアント２からの要求に応じてそのクライアント２に送信する。 (3) Generation and Transmission of Digest Image of the Present Embodiment Next , generation of a digest image and transmission to the client 2 of the present 
embodiment will be described with reference to FIGS. 2 and 3. FIG. 2 is a diagram illustrating the calculation of the maximum number of display times according to the present embodiment. FIG. 3 is a diagram illustrating the peak values of the maximum number of display times and the cutting out of the region having the maximum number of display times according to the present embodiment. In the communication system S of the present embodiment, a digest image representing the content distributed to the clients 2 is generated in the distribution server 1 and transmitted to a client 2 in response to a request from that client 2. The distribution server 1 generates the digest image of distributed content based on the camera work data transmitted from each client 2 for that content. Specifically, the distribution server 1 receives camera work data indicating the display range specified at each client 2. The distribution server 1 then generates the digest image for the content using image frames that include the image region whose maximum number of display times (described later) at the clients 2 takes a peak value or a maximum value on the reproduction time axis of the moving image data included in the distributed content. The image region here is a region consisting of one or more pixels in one image frame. Suppose that an image frame constituting the moving image data included in the content distributed to one client 2 is the image frame G illustrated in FIG. 2A, and that the display range specified at that client 2 within the image frame G is the display range CW1 shown in FIG. 2. The display range CW1 is a region in the image frame G and consists of one or more pixels. In this case, that client 2 transmits camera work data indicating the display range CW1. In the case of FIG. 2A, the number of times the image in the region inside the display range CW1 is displayed on the clients 2 is one, because that image was displayed on the one client 2 that transmitted the camera work data indicating the display range CW1. In contrast, the image in the region outside the display range CW1 in the image frame G is not displayed on any client 2, so its number of display times is zero. In FIGS. 2A to 2C, the number of display times at the clients 2 for each region of the image frame G (that is, each region consisting of one or more pixels) is shown in parentheses. In other words, the number of display times is also the number of clients 2 on which the image of that region of the image frame G was displayed. Next, suppose that the content including the moving image data composed of the image frame G is transmitted to, for example, three clients 2. The clients 2 then transmit camera work data indicating, for example, the display ranges CW1 to CW3 illustrated in FIG. 2B, respectively. Suppose that the positions in the image frame G of the display ranges CW1 to CW3 specified at the clients 2 are those illustrated in FIG. 2B. In this case, the display ranges CW1 to CW3 overlap within the image frame G as illustrated in FIG. 2B, and the number of display times at the clients 2 for each region of the image frame G is the number shown in parentheses in FIG. 2B. For example, as shown by the 45-degree hatching in FIG. 2B, the image in the region inside the display range CW1 and outside the display ranges CW2 and CW3 is displayed on the clients 2 once. 
On the other hand, the image in the region inside the display ranges CW1 and CW2 and outside the display range CW3 is displayed on the clients 2 twice. Further, as shown by the cross-hatching in FIG. 2B, the image in the region inside all of the display ranges CW1 to CW3 is displayed on the clients 2 three times. No region whose image is displayed on the clients 2 more often than the region inside all of the display ranges CW1 to CW3 exists in the image frame G illustrated in FIG. 2B. Therefore, the maximum value of the number of display times at the clients 2 over the regions in the image frame G illustrated in FIG. 2B is “3”. In the following description, the maximum value of the number of display times at the clients 2 over the regions in an image frame G is simply referred to as the “maximum number of display times”. That is, the maximum number of display times corresponding to the image frame G illustrated in FIG. 2B is “3”. Next, suppose that the content including the moving image data composed of the image frame G is transmitted to, for example, seven clients 2. The clients 2 then transmit camera work data indicating, for example, the display ranges CW1 to CW7 illustrated in FIG. 2C, respectively. Suppose that the positions in the image frame G of the display ranges CW1 to CW7 specified at the clients 2 are those illustrated in FIG. 2C. In this case, the display ranges CW1 to CW7 overlap within the image frame G as illustrated in FIG. 2C, and the number of display times for each region of the image frame G is the number shown in parentheses in FIG. 2C. For example, as shown by the 45-degree hatching in FIG. 2C, the image in the region inside the display range CW4 and outside the display ranges CW1 to CW3 and CW5 to CW7 is displayed on the clients 2 once. 
On the other hand, as shown for example by the 135-degree hatching in FIG. 2C, the image in the region inside the display ranges CW4 and CW2 and outside the display range CW5 is displayed on the clients 2 twice. Further, as shown by the cross-hatching in FIG. 2C, the image in the region inside all of the display ranges CW1 to CW5 is displayed on the clients 2 five times. No region with a larger number of display times at the clients 2 than the region inside all of the display ranges CW1 to CW5 exists in the image frame G. Therefore, the maximum number of display times corresponding to the image frame G illustrated in FIG. 2C is “5”. The distribution server 1 of the present embodiment calculates the maximum number of display times for each image frame G, as illustrated in FIG. 2, based on the camera work data from each client 2 belonging to the communication system S. The distribution server 1 temporarily stores the calculated maximum number of display times for each image frame G in, for example, the storage unit 12. Then, for the whole or a part of the content to be represented by the digest image, the distribution server 1 detects the maximum number of display times for each image frame G according to its reproduction time. Here, each image frame G is displayed on the clients 2 at a reproduction timing that follows the reproduction of the content. Each image frame G is therefore displayed on the clients 2 at the reproduction time of the content to which it corresponds. Accordingly, with the reproduction time on the horizontal axis and the maximum number of display times on the vertical axis, the maximum number of display times over the reproduction time of the content changes as illustrated, for example, in FIG. 3A. Note that the horizontal axis in FIG. 3A may be the number of image frames from a preset reproduction timing instead of the reproduction time. 
Here, in the example illustrated in FIG. 3A, suppose that the maximum number of display times takes a peak value relative to the preceding and following reproduction times at reproduction times of 40 seconds, 190 seconds, and 280 seconds, and that these peaks are the peaks P1 to P3, respectively. That the maximum number of display times reaches the peaks P1 to P3 means that the image frames corresponding to the peaks P1 to P3 are displayed on more clients 2 than the image frames before and after them. In this case, the contents of the image frames corresponding to the peaks P1 to P3 can be said to represent the content whose change in the maximum number of display times is shown in FIG. 3A. Therefore, in the case illustrated in FIG. 3A, the distribution server 1 generates a digest image representing that content based on the image frames corresponding to the peaks P1 to P3. For content in which the maximum number of display times simply increases or decreases over the reproduction time, the digest image representing the content may be generated based on the image frame corresponding to the maximum value of the maximum number of display times, instead of a peak such as the peak P1 described above. In this case, the image frame corresponding to that maximum value takes the place of the image frame corresponding to the peak P1 or the like. Further, instead of using the maximum number of display times for each image frame, the distribution server 1 may generate the digest image based on regions in the image frames whose number of display times at the clients 2 is equal to or greater than a preset threshold. 
Thereafter, the distribution server 1 transmits the generated moving image data or the like as the digest image to a client 2 in response to a request from that client 2.
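
The counting described for FIG. 2 and the peak search described for FIG. 3A can be sketched in Python as follows. This is an illustrative sketch, not the implementation in the embodiment: the function names `max_display_count` and `peak_indices`, the representation of a display range as a half-open pixel rectangle `(left, top, right, bottom)`, and the brute-force grid search are all assumptions. A peak is taken here to be a frame whose value is greater than the preceding frame and not less than the following one, and the fallback to the overall maximum covers the case where the value only increases or decreases.

```python
from itertools import product


def max_display_count(ranges):
    """Maximum number of display ranges that overlap anywhere in one frame.

    ranges: list of (left, top, right, bottom) half-open rectangles, one per
    client. The frame is split along every rectangle edge, and the number of
    rectangles covering each resulting cell is counted.
    """
    xs = sorted({v for left, _, right, _ in ranges for v in (left, right)})
    ys = sorted({v for _, top, _, bottom in ranges for v in (top, bottom)})
    best = 0
    for (x0, x1), (y0, y1) in product(zip(xs, xs[1:]), zip(ys, ys[1:])):
        covering = sum(
            1
            for left, top, right, bottom in ranges
            if left <= x0 and x1 <= right and top <= y0 and y1 <= bottom
        )
        best = max(best, covering)
    return best


def peak_indices(counts):
    """Frame indices at which the per-frame maximum display count peaks.

    counts: maximum number of display times for each frame, in playback order.
    Falls back to the index of the overall maximum when no interior peak exists.
    """
    peaks = [
        i
        for i in range(1, len(counts) - 1)
        if counts[i - 1] < counts[i] >= counts[i + 1]
    ]
    if not peaks and counts:
        peaks = [max(range(len(counts)), key=counts.__getitem__)]
    return peaks
```

Calling `max_display_count` once per image frame yields the series that `peak_indices` consumes. The grid search is cubic in the number of clients, so a sweep-line approach would be a more suitable choice for large audiences.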

次に、図３（ａ）により最大表示回数の変化が示されるコンテンツの内容を代表するダイジェスト画像の生成方法について、具体的に図３（ｂ）および図３（ｅ）を用いて説明する。以下の説明では、図３（ａ）に示すピークＰ２に対応する画像フレームに基づいたダイジェスト画像の生成を例として説明する。ピークＰ２に対する画像フレームに基づいて本実施形態のダイジェスト画像を生成する場合、配信サーバ１は先ず、ピークＰ２に対する画像フレームに基づき、全体動画を表す動画データからダイジェスト画像の生成に用いる画像フレームを抽出する。ここで配信サーバ１による画像フレームの抽出方法としては、二通りの方法が考えられる。第１の抽出方法は図３（ｂ）に例示するように、再生時間におけるピークＰ２のタイミングを含んでそのタイミングの前後の予め設定された再生時間ＤＡ１に再生された画像フレームを抽出する方法である。すなわち配信サーバ１は、画像フレーム抽出の第１の方法として、最大表示回数が動画データの再生時間軸に対してピークＰ２となったタイミングを含んでその前後の予め設定された再生時間に再生された画像フレームを動画データから抽出する。この場合に配信サーバ１は、再生時間におけるピークＰ２のタイミングの前後の例えば５秒間に再生された合計１０秒分の画像フレームを抽出する。第２の抽出方法は図３（ｃ）に例示するように、再生時間におけるピークＰ２としての最大表示回数に対して予め設定された１未満の係数を乗じて得られる最大表示回数以上の最大表示回数に対応した再生時間ＤＡ２に再生された画像フレームを抽出する方法である。すなわち配信サーバ１は、画像フレーム抽出の第２の方法として、最大表示回数が予め設定された値（割合）以上である画像フレームを動画データから抽出する。この場合配信サーバ１は、ピークＰ２における最大表示回数に対して予め設定された例えば係数０．９を乗じて得られる最大表示回数Ｌ以上の最大表示回数に対応した再生時間ＤＡ２に再生された画像フレームを抽出する。 Next, a method of generating the digest image representing the content whose change in the maximum number of display times is shown in FIG. 3A will be specifically described with reference to FIGS. 3B and 3E. The following description takes as an example the generation of a digest image based on the image frame corresponding to the peak P2 shown in FIG. 3A. When generating the digest image of the present embodiment based on the image frame corresponding to the peak P2, the distribution server 1 first extracts, based on that image frame, the image frames used for generating the digest image from the moving image data representing the entire moving image. Two methods are conceivable for this extraction of image frames by the distribution server 1. The first extraction method, illustrated in FIG. 3B, extracts the image frames reproduced during a preset reproduction time DA1 that includes the timing of the peak P2 in the reproduction time and extends before and after that timing. That is, as the first method of image frame extraction, the distribution server 1 extracts from the moving image data the image frames reproduced during a preset reproduction time that includes, and extends before and after, the timing at which the maximum number of display times reaches the peak P2 on the reproduction time axis of the moving image data. In this case, the distribution server 1 extracts, for example, the image frames reproduced during the 5 seconds before and the 5 seconds after the timing of the peak P2, a total of 10 seconds of image frames. The second extraction method, illustrated in FIG. 3C, extracts the image frames reproduced during the reproduction time DA2 in which the maximum number of display times is equal to or greater than the value obtained by multiplying the maximum number of display times at the peak P2 by a preset coefficient less than 1. That is, as the second method of image frame extraction, the distribution server 1 extracts from the moving image data the image frames whose maximum number of display times is equal to or greater than a preset value (ratio). In this case, the distribution server 1 extracts the image frames reproduced during the reproduction time DA2 in which the maximum number of display times is equal to or greater than the maximum number of display times L obtained by multiplying the maximum number of display times at the peak P2 by a preset coefficient of, for example, 0.9.
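
The two extraction methods can be sketched in Python as below. The function names, the frame-index representation, and the `fps` parameter are hypothetical. For the second method the sketch assumes that DA2 is the contiguous run of frames around the peak that stay at or above the threshold L, which is how FIG. 3C is described; the text does not rule out other readings.

```python
def frames_in_window(peak_index, frame_count, fps, seconds_each_side=5.0):
    """First method: frames within a fixed time window around the peak (DA1)."""
    half = round(seconds_each_side * fps)
    start = max(0, peak_index - half)
    stop = min(frame_count, peak_index + half + 1)
    return range(start, stop)


def frames_above_ratio(counts, peak_index, ratio=0.9):
    """Second method: contiguous frames around the peak whose maximum number
    of display times is at least ratio * (value at the peak), i.e. DA2.

    counts: maximum number of display times per frame, in playback order.
    """
    threshold = counts[peak_index] * ratio  # the level L in FIG. 3C
    start = peak_index
    while start > 0 and counts[start - 1] >= threshold:
        start -= 1
    end = peak_index
    while end + 1 < len(counts) and counts[end + 1] >= threshold:
        end += 1
    return range(start, end + 1)
```

The first method always yields a clip of roughly fixed length, while the second adapts the clip length to how long viewer attention stays near the peak level.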

配信サーバ１は次に、図３（ｂ）または図３（ｃ）に例示する方法を用いて抽出した画像フレームから、その画像フレームにおけるダイジェスト画像生成用の領域を切り出す。配信サーバ１は、例えば通信システムＳによるコンテンツの配信において予め設定されている、各クライアント２における表示の最低解像度およびアスペクト比に基づいて、ダイジェスト画像生成用の領域を画像フレームから切り出す。ダイジェスト画像用の画像フレームの領域は、一または複数の画素からなる領域である。ここで配信サーバ１によるダイジェスト画像生成用の画像フレームの領域の切出し方法としては、二通りの方法が考えられる。第１の切出し方法は図３（ｄ）に例示するように、ダイジェスト画像の生成に用いる画像フレームＧにおいて、その画像フレームＧにおける最大表示回数の領域Ｍの大きさが上記最低解像度以下である場合の切出し方法である。この場合に配信サーバ１は、図３（ｄ）に例示する最大表示回数の領域Ｍの中心を中心とした上記最低解像度分の大きさの領域ＡＲを、ダイジェスト画像生成用として画像フレームＧから切り出す。この場合に、図３（ｄ）に例示する最大表示回数の領域Ｍは、最低解像度分の大きさの領域ＡＲよりも小さくなる。第２の切出し方法は図３（ｅ）に例示するように、ダイジェスト画像の生成に用いる画像フレームＧにおいて、その画像フレームＧにおける最大表示回数の領域Ｍの大きさが上記最低解像度より大きい場合の切出し方法である。この場合に配信サーバ１は、図３（ｅ）に例示する最大表示回数の領域Ｍを含み、かつ上記アスペクト比となる領域ＡＲを、ダイジェスト画像生成用として画像フレームＧから切り出す。 Next, the distribution server 1 cuts out a region for digest image generation from each image frame extracted by the method illustrated in FIG. 3B or FIG. 3C. The distribution server 1 cuts out the region for digest image generation from the image frame based on the minimum resolution and aspect ratio of display at each client 2, which are preset, for example, for content distribution by the communication system S. The region of the image frame used for the digest image consists of one or more pixels. Two methods are conceivable for this cutting out by the distribution server 1 of the image frame region for digest image generation. The first cutting-out method, illustrated in FIG. 3D, applies when, in the image frame G used for generating the digest image, the size of the region M having the maximum number of display times is equal to or smaller than the minimum resolution. In this case, the distribution server 1 cuts out from the image frame G, for digest image generation, a region AR that has the size of the minimum resolution and is centered on the center of the region M illustrated in FIG. 3D. The region M illustrated in FIG. 3D is then smaller than the region AR of the minimum-resolution size. The second cutting-out method, illustrated in FIG. 3E, applies when, in the image frame G used for generating the digest image, the size of the region M having the maximum number of display times is larger than the minimum resolution. In this case, the distribution server 1 cuts out from the image frame G, for digest image generation, a region AR that includes the region M illustrated in FIG. 3E and has the above aspect ratio.
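
A possible way to compute the region AR for both cutting-out methods is sketched below in Python. Several points are assumptions rather than statements from the description: the function name `crop_region`, the use of `(left, top, right, bottom)` pixel rectangles, deriving the aspect ratio from the minimum resolution itself, choosing the smallest region of that aspect ratio that contains M in the second method, and shifting or shrinking AR so that it stays inside the image frame (which can break the aspect ratio when M is nearly as large as the frame).

```python
import math


def crop_region(m, frame_size, min_size):
    """Return the region AR as (left, top, right, bottom).

    m: region M with the maximum number of display times.
    frame_size: (width, height) of the image frame G.
    min_size: (width, height) of the preset minimum display resolution.
    """
    frame_w, frame_h = frame_size
    min_w, min_h = min_size
    m_w, m_h = m[2] - m[0], m[3] - m[1]

    if m_w <= min_w and m_h <= min_h:
        # First method: M fits inside the minimum resolution.
        w, h = min_w, min_h
    else:
        # Second method: smallest region with the same aspect ratio as the
        # minimum resolution that still contains M.
        scale = max(m_w / min_w, m_h / min_h)
        w, h = math.ceil(min_w * scale), math.ceil(min_h * scale)

    # Never exceed the frame itself (assumption, see lead-in).
    w, h = min(w, frame_w), min(h, frame_h)

    # Centre AR on the centre of M, then shift it back inside the frame.
    cx, cy = (m[0] + m[2]) / 2, (m[1] + m[3]) / 2
    left = min(max(0, round(cx - w / 2)), frame_w - w)
    top = min(max(0, round(cy - h / 2)), frame_h - h)
    return (left, top, left + w, top + h)
```

For a 1920x1080 frame and a 640x360 minimum resolution, a 100x60 region M yields a 640x360 region AR around it, while a 1280x300 region M yields a 1280x720 region AR that contains it.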

そして配信サーバ１は、図３（ｂ）または図３（ｃ）のいずれかに例示される抽出方法により抽出した画像フレームＧから図３（ｄ）または図３（ｅ）のいずれかに例示される切出し方法により切り出した領域ＡＲを用いて、図３（ａ）により最大表示回数の変化が示されるコンテンツの内容を代表するダイジェスト画像を生成する。配信サーバ１は、図３（ａ）に例示するピークＰ１ないしピークＰ３について、ダイジェスト画像の生成を繰り返す。そして配信サーバ１は、生成されたダイジェスト画像を一時的に記憶部１２に記憶し、クライアント２からの要求に応じてそのクライアント２に送信する。 The distribution server 1 then generates the digest image representing the content whose change in the maximum number of display times is shown in FIG. 3A, using the region AR cut out, by the cutting-out method illustrated in FIG. 3D or FIG. 3E, from the image frame G extracted by the extraction method illustrated in FIG. 3B or FIG. 3C. The distribution server 1 repeats the generation of the digest image for the peaks P1 to P3 illustrated in FIG. 3A. The distribution server 1 then temporarily stores the generated digest image in the storage unit 12 and transmits it to a client 2 in response to a request from that client 2.

なお配信サーバ１は、通信システムＳを構成する複数のクライアント２を、例えば予め設定されたそれぞれのユーザの属性に基づいて予め分類し、その分類したクライアント群ごとに対応付けて本実施形態のダイジェスト画像を生成してもよい。この場合にクライアント２の複数のクライアント群への分類は、換言するとそのクライアント２のユーザの分類でもある。このような分類を以下「クラスタリング」と称する。また分類されたクライアント群のそれぞれを、以下「クラスタ」と称する。この場合に配信サーバ１は、分類したクラスタをそれぞれ示す情報を、それに属するクライアント２を識別するユーザＩＤとともに記憶部１２に記憶する。そして配信サーバ１は、各クラスタに属するクライアント２からのカメラワークデータをクラスタごとに受信し、クラスタごとに最大表示回数を算出し、クラスタごとに図３（ａ）に例示される再生時間に対する最大表示回数の変化を検出する。配信サーバ１は、検出した最大表示回数の変化に基づき、そのピークに対応する画像フレームを用いた本実施形態のダイジェスト画像をクラスタごとに生成する。配信サーバ１は、生成したダイジェスト画像を、その生成に用いられたカメラワークデータを送信したクライアント２が属するクラスタごとに送信する。 Note that the distribution server 1 may classify the plurality of clients 2 constituting the communication system S in advance, based on, for example, preset attributes of their respective users, and generate the digest image of the present embodiment in association with each of the resulting client groups. In this case, the classification of the clients 2 into a plurality of client groups is, in other words, also a classification of the users of those clients 2. Such classification is hereinafter referred to as “clustering”, and each of the classified client groups is hereinafter referred to as a “cluster”. In this case, the distribution server 1 stores information indicating each cluster in the storage unit 12 together with the user IDs identifying the clients 2 belonging to it. The distribution server 1 then receives the camera work data from the clients 2 belonging to each cluster, calculates the maximum number of display times for each cluster, and detects, for each cluster, the change in the maximum number of display times over the reproduction time as illustrated in FIG. 3A. Based on the detected change, the distribution server 1 generates, for each cluster, the digest image of the present embodiment using the image frames corresponding to the peaks. The distribution server 1 transmits each generated digest image to the cluster to which the clients 2 that transmitted the camera work data used for its generation belong.

Further, when a single image frame corresponding to one of the peaks P1 to P3 contains a plurality of regions having the maximum display count, the digest image may be generated using these regions. As for how the generated digest image is used, it may appear, for example, on a list screen showing distributable content or on a description screen for each piece of content; in addition, it may be used for scene selection and the like in content being played back. The digest image of the present embodiment can also be used in live distribution, not only in the on-demand distribution described above. For live distribution, the digest image may be generated in real time and displayed, for example, during an intermission in the live event. Furthermore, presenting the digest image to, for example, participants who join a live event partway through (viewers joining midway) may make it easier for them to catch up with the event.

(4) Operation of communication system S
Next, the operation of the communication system S will be described with reference to FIG. 4. FIG. 4A is a flowchart showing the user clustering process of the present embodiment. FIG. 4B is a flowchart showing the image frame extraction process of the present embodiment. FIG. 4C is a flowchart showing the digest image generation process of the present embodiment.

(I) User clustering process in distribution server 1
The user clustering process of the present embodiment will be described with reference to FIG. 4A. The control unit 11 of the distribution server 1 in the communication system S of the present embodiment starts the user clustering process shown in FIG. 4A, for example, at preset intervals. First, in step S1, the control unit 11 calculates a user feature quantity indicating the characteristics of the user of each client 2. Here, for example, when a user requests to join the communication system S in order to receive content distributed by it, preset user information describing that user is stored in the storage unit 12 in association with a user ID identifying the user. Examples of this user information include age information indicating the user's age, gender information indicating the user's gender, and hobby information indicating the user's hobbies and preferences. The control unit 11 also stores, in the storage unit 12, viewing history information on content viewed in the past on each client 2 included in the communication system S, in association with the user ID and a content ID. In step S1, the control unit 11 calculates the user feature quantity for each user by a preset method, based on the user information and the viewing history information stored for that user. Then, based on the user feature quantities calculated in step S1, the control unit 11 clusters the users of the clients 2 included in the communication system S at that time, using a conventional clustering technique such as hierarchical clustering (step S2). Step S2 produces clusters of users according to their user feature quantities. The control unit 11 then stores information indicating the clusters generated in step S2 in the storage unit 12 together with the user IDs identifying the clients 2 belonging to them (step S3). The control unit 11 then ends the user clustering process.
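As an illustration of steps S1 to S3, the following is a minimal sketch of hierarchical (agglomerative) clustering over user feature quantities. The embodiment only says that a "preset method" produces the feature quantities and that a conventional technique such as hierarchical clustering is used, so everything specific here is an assumption made for illustration: the function name `cluster_users`, the two-element feature vectors, the Euclidean distance, the single-linkage rule, and the stopping condition of a fixed number of clusters `k`.

```python
import math
from typing import Dict, List, Sequence


def cluster_users(features: Dict[str, Sequence[float]], k: int) -> List[List[str]]:
    """Group user IDs into k clusters by single-linkage agglomerative clustering.

    `features` maps a user ID to its feature vector (hypothetical layout).
    Assumes 1 <= k <= number of users.
    """
    # Start with one cluster per user (the leaves of the hierarchy).
    clusters = [[uid] for uid in sorted(features)]

    def linkage(a: List[str], b: List[str]) -> float:
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(features[u], features[v]) for u in a for v in b)

    # Repeatedly merge the two closest clusters until k remain.
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Example feature vectors: (normalized age, share of sports viewing), both assumed.
features = {"u1": (0.2, 1.0), "u2": (0.25, 0.9), "u3": (0.8, 0.1), "u4": (0.9, 0.0)}
clusters = cluster_users(features, 2)
```

The returned lists of user IDs correspond to what step S3 stores in the storage unit 12, that is, each cluster together with the user IDs belonging to it.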

(II) Image frame extraction process in distribution server 1
The image frame extraction process of the present embodiment will be described with reference to FIG. 4B. The process described below is the one performed when the digest image of the present embodiment is generated as a moving image consisting of a plurality of image frames. It extracts, for generating the digest image of the present embodiment, the scenes of the content from which the digest image is generated. When generating the digest image of the present embodiment as a moving image for some already distributed content, for example on an instruction from its administrator, the control unit 11 of the distribution server 1 first performs the image frame extraction process shown in FIG. 4B.

First, in step S10, for each cluster generated by the user clustering process shown in FIG. 4A, the control unit 11 extracts from the storage unit 12, and aggregates, the camera work data transmitted from the clients 2 belonging to that cluster (see FIG. 2B or FIG. 2C). That is, the control unit 11 acquires camera work data indicating the display range of the content on each client 2, from each client 2 and cluster by cluster, stores it in the storage unit 12, and then extracts and aggregates it. Next, based on the camera work data aggregated in step S10, the control unit 11 calculates the maximum display count in each image frame by the method described with reference to FIG. 2 or FIG. 3 (step S11). That is, based on the acquired camera work data, the control unit 11 calculates, for each image frame, the maximum display count for the predetermined regions within that image frame. Next, the control unit 11 applies so-called smoothing, using for example a preset moving average, to the maximum display counts calculated for the image frames (step S12). Next, using the result of step S12, the control unit 11 calculates the peak values of the maximum display count in the content (step S13). The peak values calculated in step S13 are, for example, the maximum display counts at the peaks P1 to P3 illustrated in FIG. 3A.

Next, the control unit 11 re-sorts the peak values calculated in step S13 from playback-time order (see FIG. 3A) into descending order of peak value. The control unit 11 then takes peak values from the sorted list, starting from the largest, up to the number of image frames to be used for generating the digest image, and extracts the plurality of image frames corresponding to those peak values from the storage unit 12 (step S14). That is, based on the calculated maximum display counts, the control unit 11 extracts a plurality of image frames corresponding to those maximum display counts.

More specifically, as a first example of step S14, the control unit 11 extracts the plurality of image frames played back during a preset playback time (see reference sign "DA1" in FIG. 3B) that includes, and extends before and after, the playback-time timing of each peak having an extracted peak value (one example being the peak P2 illustrated in FIG. 3B). This first example of step S14 corresponds to the first extraction method described with reference to FIG. 3B. Alternatively, as a second example of step S14, the control unit 11 extracts the plurality of image frames played back during the playback time (see reference sign "DA2" in FIG. 3C) corresponding to values at or above the value (see reference sign "L" in FIG. 3C) obtained by multiplying the peak value of each extracted peak (one example being the peak P2 illustrated in FIG. 3C) by a preset coefficient less than 1. This second example of step S14 corresponds to the second extraction method described with reference to FIG. 3C. The control unit 11 then ends the image frame extraction process of the present embodiment.
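Steps S12 to S14 can be sketched as follows, operating on a list that holds one maximum display count per image frame. This is an illustrative sketch, not the embodiment's implementation: the function names, the centered moving-average window, the definition of a peak as a strict local maximum, and the coefficient value 0.8 are all assumptions. The embodiment specifies only that a moving average may be used for smoothing, that peaks are ranked in descending order of value, and that frames are taken either from a preset span around the peak (DA1) or from the span at or above a preset fraction of the peak value (DA2).

```python
from typing import List


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Step S12: centered moving average; the window shrinks at both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def find_peaks(series: List[float]) -> List[int]:
    """Step S13: frame indices that are strict local maxima of the series."""
    return [i for i in range(1, len(series) - 1)
            if series[i - 1] < series[i] > series[i + 1]]


def top_peaks(series: List[float], n: int) -> List[int]:
    """Step S14 ordering: the n highest peaks, in descending order of peak value."""
    return sorted(find_peaks(series), key=lambda i: series[i], reverse=True)[:n]


def window_fixed(peak: int, half_width: int, length: int) -> range:
    """First example (DA1): a preset span of frames before and after the peak."""
    return range(max(0, peak - half_width), min(length, peak + half_width + 1))


def window_relative(series: List[float], peak: int, coeff: float = 0.8) -> range:
    """Second example (DA2): contiguous frames around the peak whose value
    stays at or above coeff * peak value (the level L)."""
    level = series[peak] * coeff
    lo = hi = peak
    while lo > 0 and series[lo - 1] >= level:
        lo -= 1
    while hi < len(series) - 1 and series[hi + 1] >= level:
        hi += 1
    return range(lo, hi + 1)


# Example: an already smoothed series of per-frame maximum display counts.
series = [0, 2, 5, 2, 1, 4, 9, 8, 3, 1, 6, 2]
best = top_peaks(series, 2)                       # two highest peaks
da1 = list(window_fixed(best[0], 2, len(series)))
da2 = list(window_relative(series, best[0]))
```

The relative window adapts to how broad each peak is, whereas the fixed window always yields the same number of frames; which one suits a given piece of content is a design choice the embodiment leaves open.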

(III) Digest image generation process in distribution server 1
The digest image generation process of the present embodiment will be described with reference to FIG. 4C. The process described below is the one performed when the digest image of the present embodiment is generated as a moving image consisting of a plurality of image frames. The control unit 11 of the distribution server 1 starts the digest image generation process shown in FIG. 4C, for example, when the image frame extraction process shown in FIG. 4B ends. The digest image generation process shown in FIG. 4C is executed, for example, for the peak of each peak value extracted in step S14 of FIG. 4B.

First, in step S20, the control unit 11 selects, from among the image frames extracted for each peak by the image frame extraction process shown in FIG. 4B, an image frame to be subjected to the digest image generation process of the present embodiment. Next, in step S21, the control unit 11 acquires information indicating the size of the region having the maximum display count (see reference sign M in FIG. 3D or FIG. 3E) in the image frame selected in step S20. Next, the control unit 11 determines whether the size of the region acquired in step S21 is larger than the above-mentioned minimum resolution preset in the communication system S (step S22). If the size of the region having the maximum display count in the image frame is equal to or less than the minimum resolution (step S22: NO), the control unit 11 cuts out of the image frame selected in step S20, for digest image generation, a region that is centered on the center of the region M, has the size of the minimum resolution, and has the above-mentioned aspect ratio (step S24; see FIG. 3D). That is, based on the calculated maximum display count, the control unit 11 extracts the region of the image frame having the maximum display count. If, on the other hand, the size of the region having the maximum display count is larger than the minimum resolution (step S22: YES), the control unit 11 cuts out of the image frame selected in step S20, for digest image generation, a region that contains the region having the maximum display count and has the above-mentioned aspect ratio (step S23; see FIG. 3E).

The control unit 11 then uses the region cut out in step S23 or step S24 from the image frame selected in step S20 to generate a digest image representing the already distributed content, and stores it in the storage unit 12 (step S25). That is, the control unit 11 generates a digest image representing the content based on the cut-out region. More specifically, the control unit 11 generates the digest image representing the content with the region cut out in step S23 or step S24 serving as an image frame of that digest image. The control unit 11 then determines whether the digest image generation process has been completed for all of the image frames extracted for each peak by the image frame extraction process shown in FIG. 4B (step S26). If it has not been completed for all of them (step S26: NO), the control unit 11 returns to step S20 and selects the next image frame to be processed. If it has been completed for all of them (step S26: YES), the control unit 11 ends the digest image generation process. The moving image data and the like generated as the digest image by the digest image generation process shown in FIG. 4C is transmitted, for example, for each cluster of step S10 in FIG. 4B, to a client 2 belonging to that cluster in response to a request from that client 2.
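The branch at step S22 and the two cropping rules of steps S23 and S24 can be sketched as below. Several points are assumptions, since the embodiment does not fix them here: rectangles are `(x, y, width, height)` tuples in pixels, the region M counts as "equal to or less than the minimum resolution" when both its width and height fit within it, the aspect ratio is passed as an integer pair, and the crop is shifted to stay inside the frame. The function names `clamp` and `crop_for_digest` are hypothetical.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels; assumed layout


def clamp(box: Rect, frame_w: int, frame_h: int) -> Rect:
    """Shift a box so that it lies inside the frame.

    Assumes the box is no larger than the frame in either dimension.
    """
    x, y, w, h = box
    return (min(max(x, 0), frame_w - w), min(max(y, 0), frame_h - h), w, h)


def crop_for_digest(m: Rect, frame: Tuple[int, int],
                    min_res: Tuple[int, int], aspect: Tuple[int, int]) -> Rect:
    """Choose the crop rectangle for the maximum-display-count region M."""
    x, y, w, h = m
    min_w, min_h = min_res
    cx, cy = x + w // 2, y + h // 2  # center of M

    if w <= min_w and h <= min_h:
        # Step S24: a minimum-resolution rectangle centered on the center of M.
        box = (cx - min_w // 2, cy - min_h // 2, min_w, min_h)
    else:
        # Step S23: the smallest rectangle with the preset aspect ratio that
        # contains M. The ratio is approximate after rounding up to whole pixels.
        aw, ah = aspect
        if w * ah >= h * aw:  # M is wider than the target ratio: keep its width
            new_w, new_h = w, -(-w * ah // aw)  # ceiling division
        else:                 # M is taller than the target ratio: keep its height
            new_h, new_w = h, -(-h * aw // ah)
        box = (cx - new_w // 2, cy - new_h // 2, new_w, new_h)
    return clamp(box, *frame)


small = crop_for_digest((100, 100, 40, 20), (1920, 1080), (320, 180), (16, 9))
large = crop_for_digest((400, 300, 640, 200), (1920, 1080), (320, 180), (16, 9))
```

In the first call the region M is smaller than the assumed 320x180 minimum resolution, so the crop is a 320x180 rectangle shifted to fit inside the frame. In the second call M is wider than the minimum resolution, so the crop keeps the width of M and grows vertically to a 16:9 shape.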

As described above, according to the present embodiment, the display count on the clients 2 for each region within an image frame is calculated for each image frame, based on the camera work data acquired from each client 2. A region of an image frame whose calculated display count satisfies a preset criterion (for example, the maximum display count, or a display count at or above a predetermined threshold) is then extracted, and a digest image is generated based on the extracted region. A digest image that accurately represents the content can therefore be generated without requiring additional data such as metadata.
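The counting step summarized above can be sketched for a single image frame as follows. The representation is assumed for illustration only: the frame is divided into a grid of cells standing in for the predetermined regions, and each client's camera work data for the frame is reduced to one rectangle `(x, y, width, height)` in grid units. The function names `display_counts` and `max_count_region` are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in grid cells; assumed layout


def display_counts(ranges: List[Rect], cols: int, rows: int) -> List[List[int]]:
    """Count, for each grid cell, how many clients' display ranges cover it."""
    counts = [[0] * cols for _ in range(rows)]
    for x, y, w, h in ranges:
        # Clip each display range to the frame before counting.
        for r in range(max(y, 0), min(y + h, rows)):
            for c in range(max(x, 0), min(x + w, cols)):
                counts[r][c] += 1
    return counts


def max_count_region(counts: List[List[int]]) -> Tuple[int, Rect]:
    """Return the maximum display count and the bounding box of the cells holding it."""
    peak = max(max(row) for row in counts)
    cells = [(c, r) for r, row in enumerate(counts)
             for c, v in enumerate(row) if v == peak]
    xs = [c for c, _ in cells]
    ys = [r for _, r in cells]
    return peak, (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Three clients viewing overlapping parts of a 6x6 grid.
ranges = [(0, 0, 4, 4), (2, 2, 4, 4), (3, 3, 2, 2)]
counts = display_counts(ranges, cols=6, rows=6)
peak, region = max_count_region(counts)
```

Only the display ranges reported by the clients are needed as input, which is the point made above: no object-level metadata has to be prepared for the content.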

When generating the digest image as a still image, the control unit 11 extracts only the image frames having the respective peak values extracted in step S14 of FIG. 4B (that is, only the image frames at which the maximum display count takes a peak value). The control unit 11 then uses the regions cut out of the extracted image frames by the method shown in FIG. 4C to generate a digest image representing the content that includes the moving image data constituted by those image frames. The control unit 11 may also generate the digest image using image frames played back, for example, at predetermined playback-time intervals before and after each image frame having a peak value extracted in step S14 of FIG. 4B.

Reference Signs List
1 Distribution server
2 Client
11, 21 Control unit
12, 22 Storage unit
13, 27 Interface unit
24a Display unit
25a Operation unit
S Communication system
NW Network
G Image frame
P1, P2, P3 Peak
DA1, DA2 Playback time
M, AR Region

Claims

A server device that transmits a moving image consisting of a plurality of frames to each of a plurality of terminal devices that display the moving image, the server device comprising:
acquiring means for acquiring, from each of the terminal devices, range data indicating a display range of the moving image on that terminal device;
calculating means for calculating, for each frame and based on the range data acquired by the acquiring means, a display count on the terminal devices for each predetermined region in the frame;
extracting means for extracting from the moving image, based on the display counts calculated by the calculating means, the region of the frame whose display count satisfies a preset criterion, the extracting means extracting from the moving image an image of a minimum resolution preset for transmission of the moving image, centered on the center of the region; and
generating means for generating a representative image representing the content of the moving image based on the image extracted by the extracting means.

A server device that transmits a moving image consisting of a plurality of frames to each of a plurality of terminal devices that display the moving image, the server device comprising:
acquiring means for acquiring, from each of the terminal devices, range data indicating a display range of the moving image on that terminal device;
calculating means for calculating, for each frame and based on the range data acquired by the acquiring means, a display count on the terminal devices for each predetermined region in the frame;
extracting means for extracting from the moving image, based on the display counts calculated by the calculating means, the region of the frame whose display count satisfies a preset criterion, the extracting means extracting from the moving image an image that contains the region and has an aspect ratio preset for transmission of the moving image; and
generating means for generating a representative image representing the content of the moving image based on the image extracted by the extracting means.

A server device that transmits a moving image consisting of a plurality of frames to each of a plurality of terminal devices that display the moving image, the server device comprising:
acquiring means for acquiring, from each of the terminal devices, range data indicating a display range of the moving image on that terminal device;
calculating means for calculating, for each frame and based on the range data acquired by the acquiring means, a display count on the terminal devices for each predetermined region in the frame;
extracting means for extracting from the moving image, based on the display counts calculated by the calculating means, the region of the frame whose display count satisfies a preset criterion; and
generating means for generating a representative image representing the content of the moving image based on the region extracted by the extracting means,
wherein, when the size of the region whose display count is greater than a preset threshold is equal to or less than a minimum resolution preset for transmission of the moving image, the extracting means extracts from the moving image an image of the minimum resolution centered on the center of the region whose display count is greater than the threshold, and
when the size of the region whose display count is greater than the threshold is larger than the minimum resolution, the extracting means extracts from the moving image an image that contains the region whose display count is greater than the threshold and has an aspect ratio preset for transmission of the moving image.

The server device according to any one of claims 1 to 3,
wherein the acquiring means acquires the range data from the terminal devices belonging to a terminal device group obtained by classifying the plurality of terminal devices by a preset classification method.

The server device according to any one of claims 1 to 4,
wherein the calculating means calculates the display count for each pixel as the region,
the extracting means extracts from the moving image the pixels whose display count satisfies the criterion, and
the generating means generates the representative image based on the pixels extracted by the extracting means.

The server device according to any one of claims 1 to 5,
wherein the extracting means extracts from the moving image the images of the region that are played back during a preset playback time that includes, and extends before and after, a timing at which the display count calculated by the calculating means peaks along the playback time axis of the moving image.

The server device according to any one of claims 1 to 6,
wherein the extracting means extracts from the moving image the image of the region whose display count calculated by the calculating means is equal to or greater than a preset threshold.

An information processing method executed in a server device that transmits a moving image consisting of a plurality of frames to each of a plurality of terminal devices that display the moving image, the method comprising:
an acquiring step of acquiring, from each of the terminal devices, range data indicating a display range of the moving image on that terminal device;
a calculating step of calculating, for each frame and based on the range data acquired in the acquiring step, a display count on the terminal devices for each predetermined region in the frame;
an extracting step of extracting from the moving image, based on the display counts calculated in the calculating step, the region of the frame whose display count satisfies a preset criterion, the extracting step extracting from the moving image an image of a minimum resolution preset for transmission of the moving image, centered on the center of the region where the display count in the frame is maximum; and
a generating step of generating a representative image representing the content of the moving image based on the image extracted in the extracting step.

An information processing method executed in a server device that transmits a moving image consisting of a plurality of frames to each of a plurality of terminal devices that display the moving image, the method comprising:
an acquiring step of acquiring, from each of the terminal devices, range data indicating a display range of the moving image on that terminal device;
a calculating step of calculating, for each frame and based on the range data acquired in the acquiring step, a display count on the terminal devices for each predetermined region in the frame;
an extracting step of extracting from the moving image, based on the display counts calculated in the calculating step, the region of the frame whose display count satisfies a preset criterion, the extracting step extracting from the moving image an image that contains the region where the display count in the frame is maximum and has an aspect ratio preset for transmission of the moving image; and
a generating step of generating a representative image representing the content of the moving image based on the image extracted in the extracting step.

An information processing method executed in a server device that transmits a moving image consisting of a plurality of frames to each of a plurality of terminal devices that display the moving image, the method comprising:
an acquiring step of acquiring, from each of the terminal devices, range data indicating a display range of the moving image on that terminal device;
a calculating step of calculating, for each frame and based on the range data acquired in the acquiring step, a display count on the terminal devices for each predetermined region in the frame;
an extracting step of extracting from the moving image, based on the display counts calculated in the calculating step, the region of the frame whose display count satisfies a preset criterion; and
a generating step of generating a representative image representing the content of the moving image based on the region extracted in the extracting step,
wherein, when the size of the region whose display count is greater than a preset threshold is equal to or less than a minimum resolution preset for transmission of the moving image, the extracting step extracts from the moving image an image of the minimum resolution centered on the center of the region whose display count is greater than the threshold, and
when the size of the region whose display count is greater than the threshold is larger than the minimum resolution, the extracting step extracts from the moving image an image that contains the region whose display count is greater than the threshold and has an aspect ratio preset for transmission of the moving image.

A program causing a computer included in a server device that transmits a moving image consisting of a plurality of frames to each of a plurality of terminal devices that display the moving image to execute:
an acquiring step of acquiring, from each of the terminal devices, range data indicating a display range of the moving image on that terminal device;
a calculating step of calculating, for each frame and based on the range data acquired in the acquiring step, a display count on the terminal devices for each predetermined region in the frame;
an extracting step of extracting from the moving image, based on the display counts calculated in the calculating step, the region of the frame whose display count satisfies a preset criterion, the extracting step extracting from the moving image an image of a minimum resolution preset for transmission of the moving image, centered on the center of the region where the display count in the frame is maximum; and
a generating step of generating a representative image representing the content of the moving image based on the image extracted in the extracting step.

A program causing a computer included in a server device that transmits a moving image consisting of a plurality of frames to each of a plurality of terminal devices that display the moving image to execute:
an acquiring step of acquiring, from each of the terminal devices, range data indicating a display range of the moving image on that terminal device;
a calculating step of calculating, for each frame and based on the range data acquired in the acquiring step, a display count on the terminal devices for each predetermined region in the frame;
an extracting step of extracting from the moving image, based on the display counts calculated in the calculating step, the region of the frame whose display count satisfies a preset criterion, the extracting step extracting from the moving image an image that contains the region where the display count in the frame is maximum and has an aspect ratio preset for transmission of the moving image; and
a generating step of generating a representative image representing the content of the moving image based on the image extracted in the extracting step.

A program causing a computer included in a server device that transmits a moving image consisting of a plurality of frames to each of a plurality of terminal devices that display the moving image to execute:
an acquiring step of acquiring, from each of the terminal devices, range data indicating a display range of the moving image on that terminal device;
a calculating step of calculating, for each frame and based on the range data acquired in the acquiring step, a display count on the terminal devices for each predetermined region in the frame;
an extracting step of extracting from the moving image, based on the display counts calculated in the calculating step, the region of the frame whose display count satisfies a preset criterion; and
a generating step of generating a representative image representing the content of the moving image based on the region extracted in the extracting step,
wherein, when the size of the region whose display count is greater than a preset threshold is equal to or less than a minimum resolution preset for transmission of the moving image, the computer executing the extracting step is caused to extract from the moving image an image of the minimum resolution centered on the center of the region whose display count is greater than the threshold, and
when the size of the region whose display count is greater than the threshold is larger than the minimum resolution, the computer executing the extracting step is caused to extract from the moving image an image that contains the region whose display count is greater than the threshold and has an aspect ratio preset for transmission of the moving image.