JP2020080115A - Thumbnail output device, thumbnail output method, and thumbnail output program

Info

Publication number: JP2020080115A
Application number: JP2018213903A
Authority: JP
Inventors: Kimiko Kawashima; Kenji Yasunaga
Original assignee: Nippon Telegraph and Telephone West Corp
Current assignee: Nippon Telegraph and Telephone West Corp
Priority date: 2018-11-14
Filing date: 2018-11-14
Publication date: 2020-05-28
Anticipated expiration: 2038-11-14
Also published as: JP6793169B2

Abstract

To output thumbnail data for video data in a way that reflects the user's intention. SOLUTION: A thumbnail output device 1 comprises a person importance score calculation section 24 that: calculates an appearance time for each person recognized in processing target frame data 12 constituting video data 11, by adding up the time from each processing target frame data 12 in which the person is recognized to the next frame data to be processed; calculates, for each person, an appearance person score, which is the ratio of that person's appearance time to the maximum appearance time among all persons; and calculates, as the person importance score of the frame data, the appearance person score of the person whose face area is largest in the processing target frame data 12. The device further comprises a thumbnail output section 29 that outputs frame data with a high appearance person score as thumbnail data 15. SELECTED DRAWING: Figure 1

Description

The present invention relates to a thumbnail output device, a thumbnail output method, and a thumbnail output program that output thumbnails of video data.

In general, one piece of frame data from video data is published as a thumbnail to introduce the video data. The thumbnail is therefore preferably frame data that captures the characteristics of the video data.

A thumbnail can be selected manually, but there are also techniques that use AI (Artificial Intelligence) to extract a thumbnail from video data (see Non-Patent Document 1 and Non-Patent Document 2). The technique disclosed in Non-Patent Document 1 mainly targets genres such as dramas and movies, and creates thumbnails based on past know-how. The technique disclosed in Non-Patent Document 2 generates thumbnails after training on a large number of user-set thumbnails of video data: thumbnails presumed to be of high quality because they have many playbacks, and thumbnails presumed to be of low quality because they have few playbacks.

Non-Patent Document 1: Madeline and six others, "AVA: The Art and Science of Image Discovery at Netflix", [online], February 8, 2018, Netflix Technology Blog, [retrieved October 30, 2018], Internet <URL: https://medium.com/netflix-techblog/ava-the-art-and-science-of-image-discovery-at-netflix-a442f163af6>
Non-Patent Document 2: Munenori Taniguchi, "なんでも自動化するGoogleがスゴイ YouTubeのサムネイル自動生成を解説、人工ニューラルネットワーク技術で最適フレームを選択" [Google, which automates everything, is amazing: an explanation of YouTube's automatic thumbnail generation, selecting the optimal frame with artificial neural network technology], [online], October 14, 2015, engadget, [retrieved October 30, 2018], Internet <URL: https://japanese.engadget.com/2015/10/14/google-youtube/>

However, both documents create thumbnails based on past know-how or past selection results, so neither considers the intention of the user who creates the thumbnail of the video data. For example, a thumbnail cannot be created in a way that takes into account a user's wish to change the thumbnail extraction point according to the genre, content, and so on of the video data.

Accordingly, an object of the present invention is to provide a thumbnail output device, a thumbnail output method, and a thumbnail output program that output a thumbnail of video data while reflecting the user's intention.

To solve the above problem, a first feature of the present invention relates to a thumbnail output device that outputs thumbnail data of video data. The thumbnail output device according to the first feature includes a person importance score calculation unit and a thumbnail output unit. For each person recognized in the frame data constituting the video data, the person importance score calculation unit calculates an appearance time by adding up the time from each frame data in which the person is recognized to the next frame data to be processed. It calculates, for each person, a character score, which is the ratio of that person's appearance time to the maximum appearance time among all persons. It then calculates, as the person importance score of a frame data, the character score of the person with the largest face region in that frame data. The thumbnail output unit outputs frame data with a high character score as thumbnail data.
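As an illustrative, non-limiting sketch, the calculation of the first feature could be written as follows. The function name, the dictionary layout of a frame (`time`, `faces`), and the treatment of the last frame (it has no following frame, so it adds nothing to any appearance time) are assumptions made for this example and are not taken from the specification.

```python
from collections import defaultdict


def person_importance_scores(frames):
    """Return one person importance score per processing target frame.

    frames: list of dicts sorted by time. Each dict has
      'time'  : timestamp of the frame in seconds
      'faces' : list of (person_id, face_area) tuples recognized in the frame
    (This layout is hypothetical.)
    """
    # Appearance time: for every frame in which a person is recognized,
    # add the time until the next processing target frame.
    appearance = defaultdict(float)
    for cur, nxt in zip(frames, frames[1:]):
        interval = nxt["time"] - cur["time"]
        for person_id in {pid for pid, _ in cur["faces"]}:
            appearance[person_id] += interval

    longest = max(appearance.values(), default=0.0)

    scores = []
    for frame in frames:
        if not frame["faces"] or longest == 0.0:
            scores.append(0.0)  # assumption: no recognized person gives score 0
            continue
        # Person with the largest face region in this frame.
        main_person, _ = max(frame["faces"], key=lambda face: face[1])
        # Character score = appearance time / maximum appearance time.
        scores.append(appearance[main_person] / longest)
    return scores


frames = [
    {"time": 0.0, "faces": [("A", 0.2), ("B", 0.1)]},
    {"time": 1.0, "faces": [("A", 0.1), ("B", 0.3)]},
    {"time": 2.0, "faces": [("B", 0.2)]},
    {"time": 3.0, "faces": []},
]
scores = person_importance_scores(frames)
```

In this example person B is recognized in three frames and person A in two, so frames in which B has the largest face receive the maximum score.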

A second feature of the present invention relates to a thumbnail output device that outputs thumbnail data of video data. The thumbnail output device according to the second feature includes a face region area score calculation unit and a thumbnail output unit. The face region area score calculation unit calculates the area of frame data constituting the video data and the area of the face region of a person recognized in the frame data, and calculates a face region area score that is higher for frame data whose face region area is close to an optimum area and lower for frame data whose face region area is far from the optimum area. The thumbnail output unit outputs frame data with a high face region area score as thumbnail data.
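The passage above states only that the score rises as the face region area approaches the optimum area. As a minimal sketch of one function with that property, the example below uses one minus the absolute difference of two area ratios. This particular formula, the function name, and the use of area ratios in the range 0 to 1 are assumptions for illustration.

```python
def face_area_score(face_area_ratio, optimum_ratio):
    """Score in [0, 1] that is highest when the face area equals the optimum.

    face_area_ratio : face region area divided by frame area (0.0 to 1.0)
    optimum_ratio   : optimum area expressed the same way (0.0 to 1.0)
    The linear falloff used here is an assumed example, not a formula
    given in the text.
    """
    return 1.0 - abs(face_area_ratio - optimum_ratio)


exact = face_area_score(0.3, 0.3)   # face area matches the optimum
near = face_area_score(0.1, 0.3)    # somewhat smaller than the optimum
far = face_area_score(0.9, 0.3)     # much larger than the optimum
```

Any other function that decreases monotonically with the distance from the optimum area would serve the same purpose.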

A third feature of the present invention relates to a thumbnail output device that outputs thumbnail data of video data. The thumbnail output device according to the third feature includes a facial expression score calculation unit and a thumbnail output unit. For a person recognized in frame data constituting the video data, the facial expression score calculation unit calculates an expression value for each type of facial expression, and calculates, as the facial expression score of the frame data, the ratio of the largest of these expression values to the sum of the expression values of all expression types for the person in the frame data. The thumbnail output unit outputs frame data with a high facial expression score as thumbnail data.
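The ratio described above can be illustrated with a short sketch. The function name, the dictionary of expression values, and the expression type names are hypothetical; returning 0 when no expression value is available is also an assumption.

```python
def expression_score(expression_values):
    """Ratio of the largest expression value to the sum of all values.

    expression_values: dict mapping an expression type (for example
    'joy' or 'surprise'; the names are illustrative) to a non-negative
    expression value for one person in one frame.
    """
    total = sum(expression_values.values())
    if total <= 0:
        return 0.0  # assumption: no usable expression values gives score 0
    return max(expression_values.values()) / total


strong = expression_score({"joy": 6, "neutral": 2, "surprise": 2})
flat = expression_score({"joy": 1, "neutral": 1, "surprise": 1})
```

A frame dominated by one expression scores close to 1, while a frame whose expression values are spread evenly scores close to 1 divided by the number of expression types.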

A fourth feature of the present invention relates to a thumbnail output device that outputs thumbnail data of video data. The thumbnail output device according to the fourth feature includes a volume score calculation unit and a thumbnail output unit. The volume score calculation unit calculates a volume score that is higher for frame data corresponding to times at which the volume of the video data is high and lower for frame data corresponding to times at which the volume is low. The thumbnail output unit outputs frame data with a high volume score as thumbnail data.
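One possible way to obtain such a score is sketched below, under the assumptions that the volume is available as time-stamped samples, that a frame takes the most recent sample at or before its timestamp, and that the score is the volume divided by the peak volume of the video. None of these choices, nor the function name, is specified in the passage above.

```python
from bisect import bisect_right


def volume_scores(frame_times, volume_samples):
    """Return a volume score in [0, 1] for each frame timestamp.

    frame_times    : list of frame timestamps in seconds
    volume_samples : list of (time, volume) pairs sorted by time
    Normalizing by the peak volume is an assumed example.
    """
    sample_times = [t for t, _ in volume_samples]
    peak = max((v for _, v in volume_samples), default=0.0)

    scores = []
    for frame_time in frame_times:
        # Index of the last sample taken at or before the frame.
        idx = bisect_right(sample_times, frame_time) - 1
        if idx < 0 or peak <= 0:
            scores.append(0.0)
        else:
            scores.append(volume_samples[idx][1] / peak)
    return scores


samples = [(0.0, 2.0), (1.0, 8.0), (2.0, 4.0)]
scores = volume_scores([0.5, 1.0, 2.5], samples)
```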

A fifth feature of the present invention relates to a thumbnail output device that outputs thumbnail data of video data. The thumbnail output device according to the fifth feature includes an integrated score calculation unit and a thumbnail output unit. For frame data of the video data, the integrated score calculation unit calculates an integrated score by multiplying each of a plurality of scores by its weight and summing the products, where the plurality of scores includes one or more of the person importance score of the first feature, the face region area score of the second feature, the facial expression score of the third feature, and the volume score of the fourth feature. The thumbnail output unit outputs frame data with a high integrated score as thumbnail data.
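The weighted sum can be sketched as follows. The function name and the score keys are hypothetical; only the scores that have a weight are included in the sum, which corresponds to using a subset of the available scores.

```python
def integrated_score(scores, weights):
    """Weighted sum of the scores named in `weights`.

    scores  : dict of score name -> value for one frame
    weights : dict of score name -> weight; scores without a weight
              are ignored (key names are illustrative)
    """
    return sum(weight * scores[name] for name, weight in weights.items())


frame_scores = {"person_importance": 0.5, "face_area": 1.0, "volume": 0.25}
weights = {"person_importance": 2.0, "face_area": 1.0}
total = integrated_score(frame_scores, weights)
```

Raising the weight of one score shifts the selection toward frames that do well on that criterion, which is how the user's intention enters the calculation.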

A sixth feature of the present invention relates to a thumbnail output method for outputting thumbnail data of video data. The thumbnail output method according to the sixth feature includes: a step in which a computer calculates, for each person recognized in frame data constituting the video data, an appearance time by adding up the time from each frame data in which the person is recognized to the next frame data to be processed, and calculates, for each person, a character score, which is the ratio of that person's appearance time to the maximum appearance time among all persons; a step in which the computer calculates, as the person importance score of a frame data, the character score of the person with the largest face region in that frame data; and a step in which the computer outputs frame data with a high character score as thumbnail data.

A seventh feature of the present invention relates to a thumbnail output method for outputting thumbnail data of video data. The thumbnail output method according to the seventh feature includes: a step in which a computer calculates the area of frame data constituting the video data and the area of the face region of a person recognized in the frame data, and calculates a face region area score that is higher for frame data whose face region area is close to an optimum area and lower for frame data whose face region area is far from the optimum area; and a step in which the computer outputs frame data with a high face region area score as thumbnail data.

An eighth feature of the present invention relates to a thumbnail output method for outputting thumbnail data of video data. The thumbnail output method according to the eighth feature includes: a step in which a computer calculates, for a person recognized in frame data constituting the video data, an expression value for each type of facial expression, and calculates, as the facial expression score of the frame data, the ratio of the largest of these expression values to the sum of the expression values of all expression types for the person in the frame data; and a step in which the computer outputs frame data with a high facial expression score as thumbnail data.

A ninth feature of the present invention relates to a thumbnail output method for outputting thumbnail data of video data. The thumbnail output method according to the ninth feature includes: a step in which a computer calculates a volume score that is higher for frame data corresponding to times at which the volume of the video data is high and lower for frame data corresponding to times at which the volume is low; and a step in which the computer outputs frame data with a high volume score as thumbnail data.

A tenth feature of the present invention relates to a thumbnail output method for outputting thumbnail data of video data. The thumbnail output method according to the tenth feature includes: a step in which a computer calculates, for frame data of the video data, an integrated score by multiplying each of a plurality of scores by its weight and summing the products, where the plurality of scores includes one or more of the person importance score of the sixth feature, the face region area score of the seventh feature, the facial expression score of the eighth feature, and the volume score of the ninth feature; and a step in which the computer outputs frame data with a high integrated score as thumbnail data.

An eleventh feature of the present invention relates to a thumbnail output program for causing a computer to function as the thumbnail output device according to any one of the first to fifth features of the present invention.

According to the present invention, it is possible to provide a thumbnail output device, a thumbnail output method, and a thumbnail output program that output a thumbnail of video data while reflecting the user's intention.

A diagram illustrating the hardware configuration and functional blocks of the thumbnail output device according to the embodiment of the present invention.
A diagram illustrating an example of the data items of the condition data of the thumbnail output device according to the embodiment of the present invention.
A diagram illustrating an example of the data items of the score data of the thumbnail output device according to the embodiment of the present invention.
A flowchart illustrating the person importance score calculation process performed by the person importance score calculation unit according to the embodiment of the present invention.
A diagram illustrating an example of a person recognition result referred to by the person importance score calculation unit according to the embodiment of the present invention.
A diagram illustrating an example of the processing target frame data referred to by the person importance score calculation unit according to the embodiment of the present invention.
A flowchart illustrating the face region area score calculation process performed by the face region area score calculation unit according to the embodiment of the present invention.
A flowchart illustrating the facial expression score calculation process performed by the facial expression score calculation unit according to the embodiment of the present invention.
A diagram illustrating an example of a facial expression recognition result referred to by the facial expression score calculation unit according to the embodiment of the present invention.
A flowchart illustrating the volume score calculation process performed by the volume score calculation unit according to the embodiment of the present invention.

Next, an embodiment of the present invention will be described with reference to the drawings. In the following description of the drawings, identical or similar parts are denoted by identical or similar reference numerals.

(Thumbnail output device)
A thumbnail output device 1 according to the embodiment of the present invention will be described with reference to FIG. 1. The thumbnail output device 1 outputs, from video data 11, thumbnail data 15 that reflects the user's intention.

The thumbnail output device 1 is a general-purpose computer including a storage device 10 and a processing device 20. The functions shown in FIG. 1 are realized by the general-purpose computer executing the thumbnail output program.

The storage device 10 is a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk, or the like, and stores various data, such as input data, output data, and intermediate data, that the processing device 20 uses to execute processing. The processing device 20 is a CPU (Central Processing Unit); it reads and writes the data stored in the storage device 10 and executes the processing of the thumbnail output device 1.

Although not shown in FIG. 1, the device may also include input/output devices such as a keyboard, a mouse, and a display, and an input/output interface between the input/output devices and the processing device.

The storage device 10 stores the thumbnail output program, and also stores the video data 11, processing target frame data 12, condition data 13, score data 14, and thumbnail data 15.

The video data 11 is the content that the thumbnail data 15 output by the thumbnail output device 1 represents. The video data 11 includes a plurality of frame data, for example 30 frames per second, and time-series audio data. The audio data describes how the sound and the volume change over the course of the video data 11.

The processing target frame data 12 is the frame data, among the plurality of frame data of the video data 11, that is processed by the score calculation unit 23 described later. There may be a plurality of processing target frame data 12. The processing target frame data 12 may be data thinned out from the plurality of frame data in the video data 11 at a predetermined frequency, such as one frame per second, or data thinned out at random. The processing target frame data 12 may also be extracted from the video data 11 through predetermined processing, for example by dividing the video data into scenes and extracting frame data from each scene.

The condition data 13 specifies the conditions for creating the thumbnail data 15. As shown in FIG. 2, the condition data 13 includes an optimum area, a number of thumbnails, and weights. The optimum area is referred to by the face region area score calculation unit 25 described later. In the embodiment of the present invention the optimum area is expressed as an area ratio relative to the frame data, but it may instead be expressed as a number of pixels in the frame data. The number of thumbnails is the number of thumbnail data 15 output by the thumbnail output unit 29 described later, and is set to a natural number.

The weights are referred to when the integrated score calculation unit 28, described later, calculates an integrated score that takes into account a plurality of scores including one or more of the person importance score, the face region area score, the facial expression score, and the volume score. A weight is set for each score used to calculate the integrated score, that is, for one or more of the person importance score, the face region area score, the facial expression score, and the volume score, and for any other scores that are used.
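For illustration only, the three items of the condition data 13 could be held in a small record such as the one below. The class name, field names, default values, and validation rules are assumptions; the optimum area is represented here as an area ratio, which is one of the two representations mentioned in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ConditionData:
    """Hypothetical container for the items of the condition data."""

    optimum_area: float = 0.0          # area ratio relative to the frame, 0.0 to 1.0
    thumbnail_count: int = 1           # number of thumbnails to output (natural number)
    weights: dict = field(default_factory=dict)  # score name -> weight, may be empty

    def __post_init__(self):
        if not 0.0 <= self.optimum_area <= 1.0:
            raise ValueError("optimum_area must be a ratio between 0 and 1")
        if self.thumbnail_count < 1:
            raise ValueError("thumbnail_count must be a natural number")


# Example: close-up of one person, three thumbnails, two weighted scores.
condition = ConditionData(
    optimum_area=0.9,
    thumbnail_count=3,
    weights={"person_importance": 2.0, "face_area": 1.0},
)
```

The weights may be left empty when no integrated score is calculated.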

The score data 14 holds the results calculated by the score calculation unit 23. As shown in FIG. 3, the score data 14 associates the identifier of each frame data with a person importance score, a face region area score, a facial expression score, a volume score, and an integrated score. The frame data identifier shown in FIG. 3 is the identifier of the processing target frame data 12. The person importance score, face region area score, facial expression score, volume score, and integrated score are the results calculated by the person importance score calculation unit 24, the face region area score calculation unit 25, the facial expression score calculation unit 26, the volume score calculation unit 27, and the integrated score calculation unit 28 of the score calculation unit 23, respectively.

The score data 14 need not associate every one of the person importance score, face region area score, facial expression score, volume score, and integrated score with each frame data; it suffices that the scores referred to when outputting the thumbnail data 15 are set. For example, when the thumbnail data 15 is output based only on the person importance score, only the person importance score needs to be set for each frame data. When the thumbnail data 15 is output based on an integrated score that combines the person importance score and the face region area score, only the person importance score, the face region area score, and the integrated score need to be set for each frame data.

The thumbnail data 15 is the thumbnail data output by the thumbnail output device 1. The thumbnail data 15 is determined based on the scores calculated for each processing target frame data 12 and recorded in the score data 14. The storage device 10 may store as many pieces of thumbnail data 15 as the number of thumbnails specified in the condition data 13.

The processing device 20 includes a processing target frame extraction unit 21, a condition data acquisition unit 22, a score calculation unit 23, and a thumbnail output unit 29.

The processing target frame extraction unit 21 extracts, from the frame data constituting the video data 11, the frame data that are candidates for the thumbnail data 15 as the processing target frame data 12. The processing target frame extraction unit 21 may extract the processing target frame data 12 by thinning out the plurality of frame data in the video data 11 at a predetermined frequency, such as one frame per second, or may extract them at random. It may also extract the processing target frame data 12 through predetermined processing, for example by dividing the video data into scenes and extracting frame data from each scene. Alternatively, the processing target frame extraction unit 21 may extract every frame data of the video data 11 as processing target frame data 12.
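Two of the extraction options described above, thinning at a fixed frequency and random extraction, can be sketched in terms of frame indices. The function names and parameters are hypothetical, and decoding the selected frames from the video is outside the scope of this example.

```python
import random


def sample_fixed_rate(total_frames, fps, per_second=1.0):
    """Indices of frames kept when thinning to `per_second` frames per second."""
    step = max(1, round(fps / per_second))
    return list(range(0, total_frames, step))


def sample_random(total_frames, count, seed=None):
    """Indices of `count` frames chosen at random, returned in time order."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_frames), min(count, total_frames)))


# A 3-second clip at 30 frames per second, thinned to one frame per second.
fixed = sample_fixed_rate(total_frames=90, fps=30)
picked = sample_random(total_frames=90, count=5, seed=0)
```

Setting `per_second` equal to `fps` keeps every frame, which corresponds to treating all frame data as processing targets.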

The condition data acquisition unit 22 acquires each item of the condition data 13 described with reference to FIG. 2, for example from user input, and stores the condition data 13 in the storage device 10. These items may instead be stored in the storage device 10 in advance. Not every item of the condition data 13 needs to be set; it suffices that the necessary conditions are set. For example, the weight item need not be set when no integrated score is calculated.

The condition data acquisition unit 22 may also hold an indicator specifying which of the person importance score, face region area score, facial expression score, volume score, and integrated score the output of the thumbnail data 15 is to be based on.

For each processing target frame data 12, the score calculation unit 23 calculates a score that serves as an index for determining the thumbnail data 15. The score calculation unit 23 includes a person importance score calculation unit 24, a face region area score calculation unit 25, a facial expression score calculation unit 26, a volume score calculation unit 27, and an integrated score calculation unit 28.

The embodiment of the present invention describes the case in which the person importance score calculation unit 24, the face region area score calculation unit 25, the facial expression score calculation unit 26, the volume score calculation unit 27, and the integrated score calculation unit 28 calculate the person importance score, the face region area score, the facial expression score, the volume score, and the integrated score, respectively, for each processing target frame data 12, but the invention is not limited to this. For example, when the thumbnail data 15 is output based only on the person importance score, only the person importance score calculation unit 24 needs to operate. When the thumbnail data 15 is output based on the person importance score and the face region area score, only the person importance score calculation unit 24, the face region area score calculation unit 25, and the integrated score calculation unit 28 need to operate. In this way, the calculation units that process the processing target frame data 12 may be limited according to the index on which the output of the thumbnail data 15 is based.

The person importance score calculation unit 24 calculates the person importance score of the processing target frame data 12. A person who appears large and for a long time in the video data 11 is presumed to be an important person, and the person importance score is set so as to be high for frame data in which that important person appears.

The face region area score calculation unit 25 calculates the face region area score of the processing target frame data 12. The face region area score indicates how close the area of the face of a person recognized in the processing target frame data 12 is to the optimum area set in the condition data 13. The optimum area is an indicator of the face region area that the user wants to appear in the thumbnail data 15.

For example, for a movie or a documentary program that closely follows one person, when a close-up image of a person is wanted as the thumbnail data 15, the optimum area is set to 100% or a value close to 100%. For a program that introduces scenery, when scenery is wanted as the thumbnail data 15, the optimum area is set to 0% or a value close to 0%. For a comedy program or the like, when a scene in which several people perform together is wanted as the thumbnail data 15, the optimum area is set to a value corresponding to the face area per person.

The facial expression score calculation unit 26 calculates the facial expression score of the processing target frame data 12. The facial expression score is set so as to be high for frame data in which the person recognized in the processing target frame data 12 has a rich facial expression.

The volume score calculation unit 27 calculates the volume score of the processing target frame data 12. The volume score is set so as to be high when the volume at the time corresponding to the processing target frame data 12 is high. For example, in movies and variety programs, the moments at which the volume is high can be presumed to be the exciting ones, so the volume score is set to be high for frame data at times of high volume. The volume score is suitable when the user wants to output, as the thumbnail data 15, frame data presumed to be highly exciting because the volume is high.

The integrated score calculation unit 28 calculates the integrated score of the processing target frame data 12. The integrated score is calculated, based on the weights defined in the condition data 13, from a plurality of scores including one or more of the person importance score, the face region area score, the facial expression score, and the volume score. The integrated score is suitable when the thumbnail data 15 is to be output from several points of view.

The processing of the person importance score calculation unit 24, the face region area score calculation unit 25, the facial expression score calculation unit 26, the volume score calculation unit 27, and the integrated score calculation unit 28 is described in detail later.

Based on the scores calculated by the score calculation unit 23, the thumbnail output unit 29 outputs, as the thumbnail data 15, as many pieces of processing target frame data 12 as the number of thumbnails specified in the condition data 13, in descending order of score. The thumbnail output unit 29 may use one of the person importance score, the face region area score, the facial expression score, and the volume score, and output the processing target frame data 12 with a high value of that score as the thumbnail data 15. Alternatively, the thumbnail output unit 29 outputs the thumbnail data 15 based on the integrated score, thereby taking several scores into account.
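The selection performed by the thumbnail output unit 29 can be sketched as follows. This is a minimal illustration, assuming that the scores are held in a dictionary keyed by frame identifier; the function name `select_thumbnails` and the tie-breaking rule are hypothetical and are not specified in the source.

```python
from typing import Dict, List


def select_thumbnails(scores: Dict[str, float], num_thumbnails: int) -> List[str]:
    """Return the identifiers of the highest-scoring frames, best first.

    Ties are broken by frame identifier (an assumption made for determinism).
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [frame_id for frame_id, _ in ranked[:num_thumbnails]]
```

For example, with scores of 0.75, 1.0, and 0.5 for frames f1, f2, and f3 and two thumbnails requested, the sketch returns f2 followed by f1.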

(Person importance score calculation unit)
For each person recognized in the processing target frame data 12, the person importance score calculation unit 24 calculates an appearance time by summing, over the processing target frame data in which the person is recognized, the time until the next processing target frame data. It then calculates, for each person, a character score, which is the ratio of that person's appearance time to the maximum appearance time among all persons. The person importance score calculation unit 24 further takes the character score of the person whose face region occupies the largest proportion of the processing target frame data 12 as the person importance score of that processing target frame data 12. The person importance score calculation unit 24 calculates the person importance score for each piece of processing target frame data 12 and stores it in the score data 14.

When the thumbnail data 15 is output based only on the person importance score, the thumbnail output unit 29 outputs frame data with a high character score as the thumbnail data 15.

The person importance score calculation process performed by the person importance score calculation unit 24 is described with reference to FIG. 4.

First, in step S101, the person importance score calculation unit 24 recognizes the persons shown in each piece of processing target frame data 12. For person recognition, an API (Application Programming Interface) offered by, for example, Azure, AWS, or Google may be used. As a result, as shown in FIG. 5, the persons recognized in each piece of frame data are identified for each frame identifier. In the example shown in FIG. 5, persons P1 and P2 are recognized in the processing target frame data 12 with frame identifier f1. Persons P1 and P3 are recognized in the processing target frame data 12 with frame identifier f2, and persons P1, P2, and P3 are recognized in the processing target frame data 12 with frame identifier f3.

In step S102, the person importance score calculation unit 24 identifies the frame data in which each person is recognized. In the example of FIG. 5, person P1 is recognized in the processing target frame data 12 with frame identifiers f1, f2, and f3. Person P2 is recognized in the processing target frame data 12 with frame identifiers f1 and f3. Person P3 is recognized in the processing target frame data 12 with frame identifiers f2 and f3.

In step S103, the person importance score calculation unit 24 calculates the appearance time of each person. The appearance time is the sum, over the processing target frame data 12 in which the person is recognized, of the time until the next processing target frame data.

Among the persons recognized in the processing target frame data, the protagonist tends to have a long appearance time in the case of a movie. In the case of a location program such as a travel program, the reporter's appearance time is long, whereas the appearance times of, for example, a clerk at a shop visited in the program or a passerby who crosses paths with the reporter during shooting tend to be shorter than that of the reporter.

As shown in FIG. 6, assume that the processing target frame data 12 is frame data extracted from the video data 11 at a rate of one frame per time interval Δt. In this case, the appearance time of a person is the number of pieces of processing target frame data 12 in which that person is recognized, multiplied by Δt. Although the case where the processing target frame data 12 is extracted uniformly from the video data 11 is described here, the appearance time is calculated in the same way when the frames are extracted at random: the time from each piece of processing target frame data in which the person is recognized to the next piece of processing target frame data is summed.
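The appearance-time calculation of step S103 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `appearance_times`, the data layout, and the `end_time` argument used to close the interval of the last sampled frame are all assumptions, since the source does not say how the final frame is handled.

```python
from typing import Dict, Sequence, Set


def appearance_times(
    timestamps: Sequence[float],
    recognized: Sequence[Set[str]],
    end_time: float,
) -> Dict[str, float]:
    """Sum, for each person, the time from every sampled frame in which the
    person is recognized to the next sampled frame.

    timestamps: times of the sampled frames in ascending order.
    recognized: recognized[k] is the set of person IDs found in frame k.
    end_time:   time that closes the interval of the last sampled frame
                (an assumption; not specified in the source).
    """
    totals: Dict[str, float] = {}
    for k, people in enumerate(recognized):
        next_time = timestamps[k + 1] if k + 1 < len(timestamps) else end_time
        interval = next_time - timestamps[k]
        for person in people:
            totals[person] = totals.get(person, 0.0) + interval
    return totals
```

With uniform sampling the result reduces to the number of frames in which a person is recognized multiplied by Δt; the same function also covers frames sampled at irregular times.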

When the appearance time of each person has been calculated, in step S104 the person importance score calculation unit 24 calculates the character score of each person. The character score is the ratio of a person's appearance time to the maximum appearance time among the persons in the video data 11. The character score Sp(Pi) of a person Pi (i=1, 2, ..., n, where n is the number of persons recognized in the processing target frames) is given by Equation (1). Here, Ti is the appearance time of the person Pi, and max(Ti) is the maximum of the appearance times over all persons.
Sp(Pi)=Ti/max(Ti) ・・・Equation (1)

Suppose that the appearance time T1 of person P1 is 10 seconds, the appearance time T2 of person P2 is 15 seconds, and the appearance time T3 of person P3 is 20 seconds. The maximum appearance time among the persons in the video data 11 is the 20 seconds of person P3. Accordingly, the character score of person P1 is 10/20, that of person P2 is 15/20, and that of person P3 is 20/20.
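Equation (1) and the numerical example above can be reproduced with a short sketch. The function name `character_scores` and the behavior when no person appears are assumptions made for illustration.

```python
from typing import Dict


def character_scores(appearance: Dict[str, float]) -> Dict[str, float]:
    """Equation (1): Sp(Pi) = Ti / max(Ti)."""
    longest = max(appearance.values(), default=0.0)
    if longest <= 0.0:
        # No person appears at all; returning zeros is an assumption.
        return {person: 0.0 for person in appearance}
    return {person: t / longest for person, t in appearance.items()}
```

For appearance times of 10, 15, and 20 seconds, the sketch yields 0.5, 0.75, and 1.0, which match 10/20, 15/20, and 20/20.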

When the character score has been calculated for each person, the person importance score calculation unit 24 repeats the process of calculating the person importance score for each piece of processing target frame data 12. First, in step S105, the person importance score calculation unit 24 identifies the person with the largest face region in the processing target frame data 12. In step S106, it outputs the character score of that person as the person importance score of the processing target frame data 12.

When the person with the largest face region area in a given piece of processing target frame data f is person Px, the person importance score S1(f) is calculated by, for example, Equation (2).
S1(f)=Sp(Px) ・・・Equation (2)

For example, when the face region of person P2 is larger than the face region of person P1 in the processing target frame data 12 with frame identifier f1, the person importance score of that processing target frame data 12 is 15/20, the character score of person P2.
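Steps S105 and S106 together with Equation (2) can be sketched as follows. The function name `person_importance_score`, the dictionary layout, and the value returned for a frame without any recognized person are assumptions, not details given in the source.

```python
from typing import Dict


def person_importance_score(
    face_areas: Dict[str, float],
    char_scores: Dict[str, float],
) -> float:
    """Equation (2): S1(f) = Sp(Px), where Px has the largest face region in f.

    face_areas:  person ID -> face region area in this frame.
    char_scores: person ID -> character score from Equation (1).
    Returns 0.0 when no person is recognized (an assumption).
    """
    if not face_areas:
        return 0.0
    largest = max(face_areas, key=face_areas.get)
    return char_scores[largest]
```

In the example for frame f1, where the face region of P2 is larger than that of P1, the sketch returns the character score of P2, that is, 0.75.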

When the person importance score has been calculated for each piece of processing target frame data 12 and output to the score data 14, the person importance score calculation unit 24 ends the process.

(Face region area score calculation unit)
The face region area score calculation unit 25 calculates the area of the processing target frame data 12 and the area of the face region of the person recognized in the processing target frame data 12. It calculates a face region area score that is high for processing target frame data 12 whose face region area is close to the optimum area and low for processing target frame data 12 whose face region area is far from the optimum area. The optimum area is set in the condition data 13.

When a plurality of persons are recognized in the processing target frame data 12, there are several conceivable ways for the face region area score calculation unit 25 to calculate the face region area score. For example, it may calculate the score by comparing the face region area of the person with the largest face region against the optimum area. It may instead compare the average of the face region areas of the persons whose face region area is at or above a predetermined threshold against the optimum area. It may also compare the smallest of the face region areas that are at or above a predetermined threshold against the optimum area.

When the thumbnail data 15 is output based only on the face region area score, the thumbnail output unit 29 outputs frame data with a high face region area score as the thumbnail data 15.

The face region area score calculation process performed by the face region area score calculation unit 25 is described with reference to FIG. 7. FIG. 7 illustrates an example in which, when a plurality of persons are recognized in the processing target frame data 12, the face region area score is calculated from the face region area of the person with the largest face region in the processing target frame data 12 and the optimum area set in the condition data 13. The face region area score calculation unit 25 repeats the process of calculating the face region area score for each piece of processing target frame data 12.

In step S201, the face region area score calculation unit 25 recognizes the persons shown in the processing target frame data 12. As in step S101 of FIG. 4, an existing API (Application Programming Interface) offered by, for example, Azure, AWS, or Google may be used for person recognition. In step S202, the face region area of the person with the largest face region among the persons recognized in step S201 is calculated.

When the face region area score of frame identifier f is denoted S2(f) and the optimum area is denoted Space_best, S2(f) is calculated by, for example, Equation (3). Here, w is a parameter that expresses how much deviation from the optimum area Space_best is tolerated. A default value may be given for w, or w may be set in the condition data 13 in advance. Space(Px,f) is the ratio of the number of pixels in the face region of the person with the largest face region to the number of pixels in the processing target frame data 12.
S2(f)=exp(-1*w*abs(1-sqrt(Space(Px,f)/Space_best))) ・・・Equation (3)

In step S203, the face region area score calculation unit 25 calculates the face region area score according to the closeness to the optimum area. For example, when the optimum area is 0.2, processing target frame data 12 whose face region area calculated in step S202 is 0.2 receives a higher face region area score than processing target frame data 12 whose face region area calculated in step S202 is 0.5.
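Equation (3) can be evaluated with the following sketch. The function name `face_area_score` and the default value of `w` are placeholders chosen for illustration. Because Equation (3) divides by Space_best, the sketch rejects an optimum area of zero; how the 0% setting mentioned for landscape programs is handled is not stated in the source.

```python
import math


def face_area_score(face_ratio: float, best_ratio: float, w: float = 1.0) -> float:
    """Equation (3): S2(f) = exp(-w * |1 - sqrt(Space(Px, f) / Space_best)|).

    face_ratio: face region pixels divided by frame pixels, in [0, 1].
    best_ratio: the optimum area Space_best; must be positive here because
                Equation (3) divides by it.
    w:          tolerance parameter; the default of 1.0 is a placeholder.
    """
    if best_ratio <= 0.0:
        raise ValueError("best_ratio must be positive for Equation (3)")
    return math.exp(-w * abs(1.0 - math.sqrt(face_ratio / best_ratio)))
```

With an optimum area of 0.2, a face region area of 0.2 gives a score of 1.0, while an area of 0.5 gives exp(-(sqrt(2.5) - 1)), roughly 0.56 for w = 1, which is consistent with the ordering described in step S203. A larger w makes the score fall off faster as the area departs from the optimum.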

(Facial expression score calculation unit)
The facial expression score calculation unit 26 calculates, for a person recognized in the processing target frame data 12, an expression value for each type of facial expression. It then calculates, as the facial expression score of the frame data, the ratio of the largest expression value to the sum of the expression values over all expression types for the person in the processing target frame data 12.

When a plurality of persons are recognized in the processing target frame data 12, there are several conceivable ways for the facial expression score calculation unit 26 to calculate the facial expression score. For example, it may calculate the facial expression score from the expression values of the person with the largest face region area. It may instead calculate a facial expression score for each person whose face region area is at or above a predetermined threshold, from that person's expression values, and take the average of the calculated scores as the facial expression score of the processing target frame data 12. It may also calculate the facial expression score from the expression values of the person with the smallest face region area among the persons whose face region area is at or above a predetermined threshold.

When the thumbnail data 15 is output based only on the facial expression score, the thumbnail output unit 29 outputs frame data with a high facial expression score as the thumbnail data 15.

The facial expression score calculation process performed by the facial expression score calculation unit 26 is described with reference to FIG. 8. The facial expression score calculation unit 26 repeats the process of calculating the facial expression score for each piece of processing target frame data 12.

In step S301, the facial expression score calculation unit 26 identifies the person with the largest face region in the processing target frame data 12. In step S302, it calculates an expression value for each expression type for the person identified in step S301. An existing API may be used to calculate the expression values.

In the embodiment of the present invention, as shown in FIG. 9, the expression types are joy, anger, sadness, and surprise, and an expression value is set for each of them. Here, assume that for frame identifier f1, a joy value of 5, an anger value of 0, a sadness value of 0, and a surprise value of 1 are calculated for person P1, who has the largest face region.

In step S303, the facial expression score calculation unit 26 calculates the facial expression score from the expression values calculated in step S302. The facial expression score is the ratio of the largest expression value to the sum of the expression values. The facial expression score S3(f) of the processing target frame data 12 is calculated by, for example, Equation (4), using the expression value Sej(f) for each expression type of the person recognized in the processing target frame data 12.
S3(f)=max(Sej(f))/Σ(Sej(f)) ・・・Equation (4)

For the frame identifier f1 shown in FIG. 9, with a joy value of 5, an anger value of 0, a sadness value of 0, and a surprise value of 1, the sum of the expression values is 6 and the largest expression value is 5, so the facial expression score is 5/6.
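Equation (4) and the FIG. 9 example can be checked with a short sketch. The function name `expression_score`, the English keys for the expression types, and the value returned when every expression value is zero are assumptions for illustration.

```python
from typing import Dict


def expression_score(expression_values: Dict[str, float]) -> float:
    """Equation (4): S3(f) = max(Sej(f)) / sum(Sej(f))."""
    total = sum(expression_values.values())
    if total <= 0:
        # No expression detected; returning 0.0 is an assumption.
        return 0.0
    return max(expression_values.values()) / total
```

Because the score is a ratio, it is highest when a single expression type dominates: values of 5, 0, 0, and 1 give 5/6, whereas four equal values would give 1/4.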

When the facial expression score has been calculated for each piece of processing target frame data 12 and output to the score data 14, the facial expression score calculation unit 26 ends the process.

In the embodiment of the present invention, the facial expression score has been described for the case where processing target frame data with a high expression value for any type of expression, regardless of the type, is selected as the thumbnail data 15, but the invention is not limited to this. For example, an expression type may be set in the condition data 13, and frame data with a high expression value for the set type may be output as the thumbnail data 15.

(Volume score calculation unit)
The volume score calculation unit 27 calculates a volume score that is high for frame data corresponding to times when the volume of the video data 11 is high and low for frame data corresponding to times when the volume is low.

When the thumbnail data 15 is output based only on the volume score, the thumbnail output unit 29 outputs frame data with a high volume score as the thumbnail data 15.

The volume score calculation process performed by the volume score calculation unit 27 is described with reference to FIG. 10.

First, in step S401, the volume score calculation unit 27 converts the transition of the volume of the video data 11 into a smooth transition. For example, the volume score calculation unit 27 computes a convolution integral with a Gaussian twice at minute time intervals over the duration of the video data 11, and the transition of the volume score S4(f) of frame identifier f is then calculated by Equation (5). V(f) in Equation (5) is the integral value corresponding to the time of frame identifier f in the video data 11.
S4(f)=4*(V(f)-0.5)^2 ・・・Equation (5)

In step S402, the volume score calculation unit 27 repeats the process of calculating the volume score for each piece of processing target frame data 12. From the transition of the volume calculated in step S401, it obtains the value at the time of the processing target frame data 12 as the volume score of that processing target frame data 12.
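Steps S401 and S402 can be sketched as follows. This is only one possible reading: the kernel width `sigma`, the clamping at the ends of the signal, and the scaling of the smoothed volume V to the range [0, 1] before Equation (5) is applied are all assumptions, since the source does not specify them. The function names `gaussian_smooth` and `volume_scores` are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_smooth(values: Sequence[float], sigma: float) -> List[float]:
    """Convolve values with a normalized Gaussian kernel (edges are clamped)."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    norm = sum(kernel)
    last = len(values) - 1
    return [
        sum(w * values[min(max(i + k, 0), last)] for k, w in zip(offsets, kernel)) / norm
        for i in range(len(values))
    ]


def volume_scores(volume: Sequence[float], sigma: float = 2.0) -> List[float]:
    """Smooth the volume curve twice, scale it to [0, 1], apply Equation (5)."""
    if not volume:
        return []
    smooth = gaussian_smooth(gaussian_smooth(volume, sigma), sigma)
    low, high = min(smooth), max(smooth)
    span = high - low
    # Scaling V to [0, 1] is an assumption; a flat curve maps to 0.5.
    v = [(x - low) / span if span > 0 else 0.5 for x in smooth]
    return [4.0 * (x - 0.5) ** 2 for x in v]
```

The volume score of a piece of processing target frame data is then the element of the returned list at the index corresponding to that frame's time. Taken literally, Equation (5) is symmetric about V = 0.5 and equals 1 at both V = 0 and V = 1, so the way V(f) is scaled before the equation is applied affects which frames receive high scores.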

When the volume score has been calculated for each piece of processing target frame data 12 and output to the score data 14, the volume score calculation unit 27 ends the process.

(Integrated score calculation unit)
For the processing target frame data 12, the integrated score calculation unit 28 calculates an integrated score by multiplying each of a plurality of scores, including one or more of the person importance score, the face region area score, the facial expression score, and the volume score, by its weight and summing the results.

For example, an integrated score that combines the four scores, namely the person importance score, the face region area score, the facial expression score, and the volume score, is calculated by Equation (6). In Equation (6), Sall(f), S1(f), S2(f), S3(f), and S4(f) are, respectively, the integrated score, the person importance score, the face region area score, the facial expression score, and the volume score of the processing target frame data 12 with frame identifier f. a1, a2, a3, and a4 are the weights of the person importance score, the face region area score, the facial expression score, and the volume score, respectively, as set in the condition data 13.

Sall(f)=Σai*Si(f) (i=1,2,3,4) ・・・Equation (6)

The integrated score calculation unit 28 calculates the integrated score based on two or more scores, consisting of one or more of the person importance score, the face region area score, the facial expression score, and the volume score together with other scores. It does so by multiplying each score by its weight and summing the results.
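The weighted sum of Equation (6) can be sketched as follows. The function name `integrated_score`, the score names used as dictionary keys, and the treatment of a score without a weight are assumptions; in the embodiment the weights come from the condition data 13.

```python
from typing import Dict


def integrated_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Equation (6): Sall(f) = sum over i of ai * Si(f).

    Scores without a weight are ignored (weight 0), an assumption.
    """
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())
```

For instance, scores of 0.75, 1.0, 0.5, and 0.25 with weights of 2, 1, 1, and 0 give an integrated score of 3.0. Setting a weight to zero removes that score from consideration, and raising a weight makes the corresponding viewpoint dominate the ranking.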

When the thumbnail data 15 is output based on the integrated score, the thumbnail output unit 29 outputs frame data with a high integrated score as the thumbnail data 15.

In this way, the thumbnail output device 1 according to the embodiment of the present invention can output thumbnail data 15 that reflects the user's intention.

For example, when the video data 11 is a movie, a drama, or the like, the thumbnail data 15 is expected to show the main cast. Accordingly, the desired thumbnail data 15 can be output by extracting the thumbnail data 15 based only on the person importance score, or based on an integrated score calculated with a high weight on the person importance score. In a location program about scenery, on the other hand, an image of the moment when the scenery is filmed, rather than a person, is expected to be extracted as the thumbnail. Accordingly, for the video data 11 of a location program, the desired thumbnail data can be output by extracting the thumbnail data 15 based on an integrated score calculated with the weight of the person importance score set lower than the other weights.

Even when persons appear in the video data 11, the criterion for selecting the thumbnail data 15 that the user wants may differ. For example, thumbnail data 15 showing the main cast is expected for a movie, a drama, or the like, whereas when a reporter introduces a product in a variety program, the thumbnail data 15 is expected to show the product large and the reporter small. Accordingly, for the video data 11 of such a variety program, the desired thumbnail data 15 can be output by extracting the thumbnail data 15 based on an integrated score calculated with the weight of the person importance score set lower than the other weights.

When a thumbnail is created for a program that follows a single person, such as a documentary program, the thumbnail data is expected not only to show that person but also to capture a facial expression suited to the program, for example a smiling or an angry face. Accordingly, the desired thumbnail data 15 can be output by extracting the thumbnail data 15 based only on the facial expression score, or based on an integrated score calculated with a high weight on the facial expression score.

In a variety program, not only the expressions of the cast but also the moments when the audience becomes excited are highlights of the program, so frame data from such moments is expected to be extracted as the thumbnail data 15. Accordingly, the desired thumbnail data 15 can be output by extracting the thumbnail data 15 based only on the volume score, or based on an integrated score calculated with a high weight on the volume score. In a location program, on the other hand, frame data from a moment when the scenery is filmed quietly, rather than a moment when the reporter is talking, is expected to be extracted as the thumbnail data 15. Accordingly, for the video data 11 of a location program, the desired thumbnail data 15 can be output by extracting the thumbnail data 15 based on an integrated score calculated with the weight of the volume score set lower than the other weights.

As described above, the thumbnail output device 1 according to the embodiment of the present invention can output the thumbnail data 15 of the video data 11 in a way that reflects the user's intention.

(Other embodiments)
Although the present invention has been described above by way of an embodiment, the description and drawings that form part of this disclosure should not be understood as limiting the invention. Various alternative embodiments, examples, and operational techniques will be apparent to those skilled in the art from this disclosure.

For example, the thumbnail output device described in the embodiment of the present invention may be configured on a single piece of hardware as shown in FIG. 1, or on a plurality of pieces of hardware according to its functions and the number of processes. It may also be realized on an existing information processing system.

Needless to say, the present invention includes various embodiments and the like that are not described here. Accordingly, the technical scope of the present invention is defined only by the invention-specifying matters according to the claims that are reasonable in view of the above description.

1 Thumbnail output device
10 Storage device
11 Video data
12 Processing target frame data
13 Condition data
14 Score data
15 Thumbnail data
20 Processing device
21 Processing target frame extraction unit
22 Condition data acquisition unit
23 Score calculation unit
24 Person importance score calculation unit
25 Face area area score calculation unit
26 Facial expression score calculation unit
27 Volume score calculation unit
28 Integrated score calculation unit
29 Thumbnail output unit

Claims

A thumbnail output device for outputting thumbnail data of video data, comprising:
a person importance score calculation unit that calculates, for each person recognized in frame data constituting the video data, an appearance time by adding up the time from the frame data in which the person is recognized until the next processing target frame data,
calculates, for each person, a character score that is the ratio of that person's appearance time to the maximum of the appearance times of all persons, and
calculates the character score of the person having the largest face area in the frame data as a person importance score of the frame data; and
a thumbnail output unit that outputs frame data having a high character score as thumbnail data.
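For illustration only, the person importance score recited above could be sketched in Python as follows. The sketch assumes that face recognition has already produced, for each processing target frame, a timestamp and a mapping from a person identifier to a face area; the `Frame` layout, the `end_time` parameter, and the choice to give frames without faces a score of 0 are assumptions, not taken from the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed layout: (timestamp in seconds, {person_id: face area in pixels}).
Frame = Tuple[float, Dict[str, float]]


def person_importance_scores(frames: List[Frame], end_time: float) -> List[float]:
    """Return one person importance score per processing target frame."""
    # Appearance time: for each frame in which a person is recognized, add the
    # time until the next processing target frame (or until end_time for the last).
    appearance: Dict[str, float] = defaultdict(float)
    for i, (timestamp, faces) in enumerate(frames):
        next_timestamp = frames[i + 1][0] if i + 1 < len(frames) else end_time
        for person in faces:
            appearance[person] += next_timestamp - timestamp

    longest = max(appearance.values(), default=0.0)

    scores: List[float] = []
    for _, faces in frames:
        if not faces or longest == 0.0:
            scores.append(0.0)  # assumption: no recognized face gives a score of 0
            continue
        # Character score of the person with the largest face area in this frame.
        main_person = max(faces, key=faces.get)
        scores.append(appearance[main_person] / longest)
    return scores


# Made-up input: person A appears for 3 s in total, person B for 2 s.
example_frames: List[Frame] = [
    (0.0, {"A": 100.0, "B": 50.0}),
    (1.0, {"A": 80.0}),
    (2.0, {"B": 120.0, "A": 60.0}),
    (3.0, {}),
]
example_scores = person_importance_scores(example_frames, end_time=4.0)
```

In this made-up input, the frames where A has the largest face receive a score of 1, and the frame where B has the largest face receives B's character score of 2/3.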

A thumbnail output device for outputting thumbnail data of video data, comprising:
a face area area score calculation unit that calculates, for frame data constituting the video data, the area of the face area of a person recognized in the frame data, and
calculates a face area area score that is high for frame data whose face area is close to an optimum area and low for frame data whose face area is far from the optimum area; and
a thumbnail output unit that outputs frame data having a high face area area score as thumbnail data.
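The claim above only requires that the face area area score fall as the face area moves away from an optimum area; it does not fix a formula here. As one possible sketch under that reading, the Python function below expresses the face area as a fraction of the frame area and lets the score fall off linearly. The linear fall-off, the use of a ratio, and the default `optimal_ratio` of 0.2 are all assumptions made for illustration.

```python
def face_area_score(face_area: float, frame_area: float, optimal_ratio: float = 0.2) -> float:
    """Score in [0, 1] that peaks when face_area / frame_area equals optimal_ratio.

    The linear fall-off and the default optimal_ratio are illustrative assumptions.
    """
    if frame_area <= 0 or optimal_ratio <= 0:
        raise ValueError("frame_area and optimal_ratio must be positive")
    ratio = face_area / frame_area
    # Relative distance from the optimum, clipped so the score never goes negative.
    return max(0.0, 1.0 - abs(ratio - optimal_ratio) / optimal_ratio)
```

Any other function that peaks at the optimum area and decreases on both sides (for example a Gaussian) would fit the same wording.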

A thumbnail output device for outputting thumbnail data of video data, comprising:
a facial expression score calculation unit that calculates, for a person recognized in frame data constituting the video data, a facial expression value for each type of facial expression, and
calculates, as a facial expression score of the frame data, the ratio of the maximum facial expression value to the total of the facial expression values of the person in the frame data; and
a thumbnail output unit that outputs frame data having a high facial expression score as thumbnail data.
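As a minimal sketch of the ratio recited above, the following Python function takes the facial expression values of one person in one frame and returns the largest value divided by their total. The expression labels used in the example and the choice to return 0 when no values are available are assumptions.

```python
from typing import Dict


def expression_score(expression_values: Dict[str, float]) -> float:
    """Ratio of the largest facial expression value to the total of all values."""
    total = sum(expression_values.values())
    if total <= 0:
        return 0.0  # assumption: no usable expression values gives a score of 0
    return max(expression_values.values()) / total


# Made-up values: one clearly dominant expression versus an ambiguous face.
clear_face = expression_score({"happy": 0.8, "sad": 0.1, "neutral": 0.1})
ambiguous_face = expression_score({"happy": 1.0, "sad": 1.0})
```

A frame in which one expression clearly dominates scores close to 1, while a frame with evenly spread expression values scores lower.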

A thumbnail output device for outputting thumbnail data of video data, comprising:
a volume score calculation unit that calculates a volume score that is high for frame data corresponding to a time when the volume of the video data is high and low for frame data corresponding to a time when the volume of the video data is low; and
a thumbnail output unit that outputs frame data having a high volume score as thumbnail data.
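The claim above states only that the volume score rises with the volume at the time of the frame. One simple way to obtain such a score, shown here as a sketch, is to divide each frame's volume by the peak volume of the video. The normalization by the peak, and the assumption that a non-negative volume level per frame is already available, are not taken from the source.

```python
from typing import List


def volume_scores(volumes: List[float]) -> List[float]:
    """Scale per-frame volume levels to [0, 1] relative to the loudest frame."""
    peak = max(volumes, default=0.0)
    if peak <= 0:
        return [0.0 for _ in volumes]  # silent or empty input: all scores are 0
    # Negative inputs are clipped to 0 so the score stays within [0, 1].
    return [max(v, 0.0) / peak for v in volumes]


example_volume_scores = volume_scores([0.0, 5.0, 10.0])
```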

A thumbnail output device for outputting thumbnail data of video data, comprising:
an integrated score calculation unit that calculates, for frame data of the video data, an integrated score by multiplying each of a plurality of scores by a weight and adding the weighted scores, the plurality of scores including one or more of the person importance score according to claim 1, the face area area score according to claim 2, the facial expression score according to claim 3, and the volume score according to claim 4; and
a thumbnail output unit that outputs frame data having a high integrated score as thumbnail data.

A thumbnail output method for outputting thumbnail data of video data, comprising:
a step in which a computer calculates, for each person recognized in frame data constituting the video data, an appearance time by adding up the time from the frame data in which the person is recognized until the next processing target frame data,
and calculates, for each person, a character score that is the ratio of that person's appearance time to the maximum of the appearance times of all persons;
a step in which the computer calculates the character score of the person having the largest face area in the frame data as a person importance score of the frame data; and
a step in which the computer outputs frame data having a high character score as thumbnail data.

A thumbnail output method for outputting thumbnail data of video data, comprising:
a step in which a computer calculates, for frame data constituting the video data, the area of the face area of a person recognized in the frame data,
and calculates a face area area score that is high for frame data whose face area is close to an optimum area and low for frame data whose face area is far from the optimum area; and
a step in which the computer outputs frame data having a high face area area score as thumbnail data.

A thumbnail output method for outputting thumbnail data of video data, comprising:
a step in which a computer calculates, for a person recognized in frame data constituting the video data, a facial expression value for each type of facial expression,
and calculates, as a facial expression score of the frame data, the ratio of the maximum facial expression value to the total of the facial expression values of the person in the frame data; and
a step in which the computer outputs frame data having a high facial expression score as thumbnail data.

A thumbnail output method for outputting thumbnail data of video data, comprising:
a step in which a computer calculates a volume score that is high for frame data corresponding to a time when the volume of the video data is high and low for frame data corresponding to a time when the volume of the video data is low; and
a step in which the computer outputs frame data having a high volume score as thumbnail data.

A thumbnail output method for outputting thumbnail data of video data, comprising:
a step in which a computer calculates, for frame data of the video data, an integrated score by multiplying each of a plurality of scores by a weight and adding the weighted scores, the plurality of scores including one or more of the person importance score according to claim 6, the face area area score according to claim 7, the facial expression score according to claim 8, and the volume score according to claim 9; and
a step in which the computer outputs frame data having a high integrated score as thumbnail data.

A thumbnail output program for causing a computer to function as the thumbnail output device according to any one of claims 1 to 5.