JP6793169B2

JP6793169B2 - Thumbnail output device, thumbnail output method and thumbnail output program

Info

Publication number: JP6793169B2
Application number: JP2018213903A
Authority: JP
Inventors: 喜美子川嶋; 安永　健治; 健治安永
Original assignee: Nippon Telegraph and Telephone West Corp
Current assignee: Nippon Telegraph and Telephone West Corp
Priority date: 2018-11-14
Filing date: 2018-11-14
Publication date: 2020-12-02
Anticipated expiration: 2038-11-14
Also published as: JP2020080115A

Description

The present invention relates to a thumbnail output device, a thumbnail output method, and a thumbnail output program for outputting thumbnails of video data.

In general, a single frame of video data is published as a thumbnail to introduce the video. The thumbnail should therefore be a frame that captures the characteristics of the video data.

Thumbnails can be selected manually, but there are also techniques that use AI (artificial intelligence) to extract thumbnails from video data (see Non-Patent Documents 1 and 2). The technique disclosed in Non-Patent Document 1 mainly targets genres such as dramas and movies and creates thumbnails based on past know-how. The technique disclosed in Non-Patent Document 2 generates thumbnails by training on a large number of thumbnails that users have set for video data: thumbnails with many playbacks, which are presumed to be of high quality, and thumbnails with few playbacks, which are presumed to be of low quality.

Non-Patent Document 1: Madeline and 6 others, "AVA: The Art and Science of Image Discovery at Netflix", [online], February 8, 2018, Netflix Technology Blog, [retrieved October 30, 2018], Internet <URL: https://medium.com/netflix-techblog/ava-the-art-and-science-of-image-discovery-at-netflix-a442f163af6>
Non-Patent Document 2: Munenori Taniguchi, "Google, which automates everything, is amazing: explaining YouTube's automatic thumbnail generation, which selects the optimum frame with artificial neural network technology" (in Japanese), [online], October 14, 2015, engadget, [retrieved October 30, 2018], Internet <URL: https://japanese.engadget.com/2015/10/14/google-youtube/>

However, both documents create thumbnails based on past know-how or past selection results, so neither takes into account the intention of the user who creates the thumbnail of the video data. For example, they cannot create a thumbnail that reflects a user's wish to change the thumbnail extraction point according to the genre, content, and so on of the video data.

Accordingly, an object of the present invention is to provide a thumbnail output device, a thumbnail output method, and a thumbnail output program that output thumbnails of video data reflecting the user's intention.

To solve the above problem, a first feature of the present invention relates to a thumbnail output device that outputs thumbnail data of video data. The thumbnail output device according to the first feature comprises: a person importance score calculation unit that, for each person recognized in the frame data constituting the video data, calculates an appearance time by adding up the time from each frame data in which the person is recognized to the next frame data to be processed, calculates for each person a character score that is the ratio of that person's appearance time to the maximum appearance time among all persons, and calculates, as the person importance score of a frame data, the character score of the person with the largest face area in that frame data; and a thumbnail output unit that outputs frame data with a high character score as thumbnail data.
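
The calculation in the first feature can be illustrated with a minimal Python sketch. The data layout (a time-sorted list of `(timestamp, {person_id: face_area})` tuples) and the function names are assumptions made for illustration and do not appear in the patent. The sketch also assumes that the last processed frame, which has no following frame, adds nothing to the appearance time.

```python
from collections import defaultdict


def character_scores(frames):
    """Return {person_id: appearance time / longest appearance time}.

    frames: list of (timestamp_sec, {person_id: face_area}), sorted by time.
    """
    appearance = defaultdict(float)
    # Each frame credits its persons with the time until the next processed frame.
    for (t, faces), (t_next, _) in zip(frames, frames[1:]):
        for person in faces:
            appearance[person] += t_next - t
    longest = max(appearance.values(), default=0.0)
    if longest == 0.0:
        return {}
    return {person: time / longest for person, time in appearance.items()}


def person_importance_scores(frames):
    """Per frame: the character score of the person with the largest face area."""
    scores = character_scores(frames)
    result = []
    for _, faces in frames:
        if not faces:
            result.append(0.0)  # assumption: no recognized person scores 0
            continue
        biggest = max(faces, key=faces.get)
        result.append(scores.get(biggest, 0.0))
    return result
```

With frames sampled once per second, a person seen in three of the credited frames gets a character score of 1.0 when nobody else appears longer, and a person seen in two of them gets 2/3.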

A second feature of the present invention relates to a thumbnail output device that outputs thumbnail data of video data. The thumbnail output device according to the second feature comprises: a face area score calculation unit that calculates the area of the frame data constituting the video data and the area of the face region of a person recognized in the frame data, and calculates a face area score that is higher for frame data whose face region area is close to an optimum area and lower for frame data whose face region area is far from the optimum area; and a thumbnail output unit that outputs frame data with a high face area score as thumbnail data.

A third feature of the present invention relates to a thumbnail output device that outputs thumbnail data of video data. The thumbnail output device according to the third feature comprises: a facial expression score calculation unit that calculates, for a person recognized in the frame data constituting the video data, a facial expression value for each type of facial expression, and calculates, as the facial expression score of the frame data, the ratio of the largest of those facial expression values to the sum of the facial expression values over all expression types for that person; and a thumbnail output unit that outputs frame data with a high facial expression score as thumbnail data.
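
The facial expression score described above is the largest expression value divided by the sum of all expression values. A minimal sketch follows, assuming a single recognized person per frame and a plain dictionary of expression values. The function name and the expression labels are illustrative and are not taken from the patent.

```python
def expression_score(expression_values):
    """Return max(expression values) / sum(expression values) for one person.

    expression_values: {expression_type: non-negative value}.
    """
    total = sum(expression_values.values())
    if total <= 0:
        return 0.0  # assumption: no usable recognition result scores 0
    return max(expression_values.values()) / total
```

A face dominated by one expression scores close to 1, while a face whose values are spread evenly over k expression types scores 1/k.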

A fourth feature of the present invention relates to a thumbnail output device that outputs thumbnail data of video data. The thumbnail output device according to the fourth feature comprises: a volume score calculation unit that calculates a volume score that is higher for frame data corresponding to times at which the volume of the video data is high and lower for frame data corresponding to times at which the volume is low; and a thumbnail output unit that outputs frame data with a high volume score as thumbnail data.

A fifth feature of the present invention relates to a thumbnail output device that outputs thumbnail data of video data. The thumbnail output device according to the fifth feature comprises: an integrated score calculation unit that calculates, for the frame data of the video data, an integrated score by multiplying each of a plurality of scores by its weight and adding the results, the plurality of scores including one or more of the person importance score of the first feature, the face area score of the second feature, the facial expression score of the third feature, and the volume score of the fourth feature; and a thumbnail output unit that outputs frame data with a high integrated score as thumbnail data.
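
The integrated score is a weighted sum of the selected per-frame scores. The sketch below assumes scores and weights are held in dictionaries keyed by score name. The key names and the function name are hypothetical, and only scores that have a weight are combined.

```python
def integrated_score(scores, weights):
    """Multiply each selected score by its weight and add the results.

    scores:  {score_name: value} for one frame.
    weights: {score_name: weight}; only scores listed here are combined.
    """
    return sum(weight * scores[name] for name, weight in weights.items())
```

For example, weighting a person importance score of 1.0 by 2 and a face area score of 0.5 by 4 gives an integrated score of 4.0.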

A sixth feature of the present invention relates to a thumbnail output method for outputting thumbnail data of video data. The thumbnail output method according to the sixth feature comprises: a step in which a computer calculates, for each person recognized in the frame data constituting the video data, an appearance time by adding up the time from each frame data in which the person is recognized to the next frame data to be processed, and calculates for each person a character score that is the ratio of that person's appearance time to the maximum appearance time among all persons; a step in which the computer calculates, as the person importance score of a frame data, the character score of the person with the largest face area in that frame data; and a step in which the computer outputs frame data with a high character score as thumbnail data.

A seventh feature of the present invention relates to a thumbnail output method for outputting thumbnail data of video data. The thumbnail output method according to the seventh feature comprises: a step in which a computer calculates the area of the frame data constituting the video data and the area of the face region of a person recognized in the frame data, and calculates a face area score that is higher for frame data whose face region area is close to an optimum area and lower for frame data whose face region area is far from the optimum area; and a step in which the computer outputs frame data with a high face area score as thumbnail data.

An eighth feature of the present invention relates to a thumbnail output method for outputting thumbnail data of video data. The thumbnail output method according to the eighth feature comprises: a step in which a computer calculates, for a person recognized in the frame data constituting the video data, a facial expression value for each type of facial expression, and calculates, as the facial expression score of the frame data, the ratio of the largest of those facial expression values to the sum of the facial expression values over all expression types for that person; and a step in which the computer outputs frame data with a high facial expression score as thumbnail data.

A ninth feature of the present invention relates to a thumbnail output method for outputting thumbnail data of video data. The thumbnail output method according to the ninth feature comprises: a step in which a computer calculates a volume score that is higher for frame data corresponding to times at which the volume of the video data is high and lower for frame data corresponding to times at which the volume is low; and a step in which the computer outputs frame data with a high volume score as thumbnail data.

A tenth feature of the present invention relates to a thumbnail output method for outputting thumbnail data of video data. The thumbnail output method according to the tenth feature comprises: a step in which a computer calculates, for the frame data of the video data, an integrated score by multiplying each of a plurality of scores by its weight and adding the results, the plurality of scores including one or more of the person importance score of the sixth feature, the face area score of the seventh feature, the facial expression score of the eighth feature, and the volume score of the ninth feature; and a step in which the computer outputs frame data with a high integrated score as thumbnail data.

An eleventh feature of the present invention relates to a thumbnail output program for causing a computer to function as the thumbnail output device according to any of the first to fifth features of the present invention.

According to the present invention, it is possible to provide a thumbnail output device, a thumbnail output method, and a thumbnail output program that output thumbnails of video data reflecting the user's intention.

A diagram explaining the hardware configuration and functional blocks of the thumbnail output device according to the embodiment of the present invention.
A diagram explaining an example of the data items of the condition data of the thumbnail output device according to the embodiment of the present invention.
A diagram explaining an example of the data items of the score data of the thumbnail output device according to the embodiment of the present invention.
A flowchart explaining the person importance score calculation process performed by the person importance score calculation unit according to the embodiment of the present invention.
A diagram explaining an example of the person recognition result referred to by the person importance score calculation unit according to the embodiment of the present invention.
A diagram explaining an example of the processing target frame data referred to by the person importance score calculation unit according to the embodiment of the present invention.
A flowchart explaining the face area score calculation process performed by the face area score calculation unit according to the embodiment of the present invention.
A flowchart explaining the facial expression score calculation process performed by the facial expression score calculation unit according to the embodiment of the present invention.
A diagram explaining an example of the facial expression recognition result referred to by the facial expression score calculation unit according to the embodiment of the present invention.
A flowchart explaining the volume score calculation process performed by the volume score calculation unit according to the embodiment of the present invention.

Next, an embodiment of the present invention will be described with reference to the drawings. In the following description of the drawings, identical or similar parts are denoted by identical or similar reference numerals.

(Thumbnail output device)
The thumbnail output device 1 according to the embodiment of the present invention will be described with reference to FIG. 1. The thumbnail output device 1 outputs, from the video data 11, thumbnail data 15 that reflects the user's intention.

The thumbnail output device 1 is a general-purpose computer including a storage device 10 and a processing device 20. The functions shown in FIG. 1 are realized by the computer executing a thumbnail output program.

The storage device 10 is a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk, or the like, and stores various data, such as input data, output data, and intermediate data, that the processing device 20 uses to execute processing. The processing device 20 is a CPU (Central Processing Unit) that reads and writes the data stored in the storage device 10 and executes the processing of the thumbnail output device 1.

Although not shown in FIG. 1, the thumbnail output device 1 may also include input/output devices such as a keyboard, a mouse, and a display, and an input/output interface between those devices and the processing device.

The storage device 10 stores the thumbnail output program as well as the video data 11, the processing target frame data 12, the condition data 13, the score data 14, and the thumbnail data 15.

The video data 11 is the content that the thumbnail data 15 output by the thumbnail output device 1 represents. The video data 11 includes a plurality of frame data, for example 30 frames per second, and time-series audio data. The audio data records how the sound and volume of the video data 11 change over time.

The processing target frame data 12 is, among the plurality of frame data of the video data 11, the frame data to be processed by the score calculation unit 23 described later. There may be multiple processing target frame data 12. The processing target frame data 12 may be thinned out at a predetermined frequency from the frame data included in the video data 11, for example one frame per second, or may be thinned out at random. It may also be extracted from the video data 11 through predetermined processing, for example by dividing the video data into scenes and extracting frame data from each scene.

The condition data 13 specifies the conditions for creating the thumbnail data 15. As shown in FIG. 2, the condition data 13 includes an optimum area, a number of thumbnails, and weights. The optimum area is referred to by the face area score calculation unit 25 described later. In the embodiment of the present invention, the optimum area is expressed as an area ratio relative to the frame data, but it may instead be expressed as a number of pixels in the frame data. The number of thumbnails is the number of thumbnail data 15 output by the thumbnail output unit 29 described later, and is set to a natural number.

The weights are referred to when the integrated score calculation unit 28, described later, calculates an integrated score that takes into account a plurality of scores including one or more of the person importance score, the face area score, the facial expression score, and the volume score. A weight is set for each score used to calculate the integrated score, that is, for one or more of those four scores and for any other score that is used.

The score data 14 holds the results calculated by the score calculation unit 23. As shown in FIG. 3, the score data 14 associates a person importance score, a face area score, a facial expression score, a volume score, and an integrated score with a frame data identifier. The frame data identifier shown in FIG. 3 is the identifier of a processing target frame data 12. The five scores are calculated, respectively, by the person importance score calculation unit 24, the face area score calculation unit 25, the facial expression score calculation unit 26, the volume score calculation unit 27, and the integrated score calculation unit 28 of the score calculation unit 23.

The score data 14 does not need to associate all five scores with every frame data; it suffices that the scores referred to when outputting the thumbnail data 15 are set. For example, when the thumbnail data 15 is output based only on the person importance score, only the person importance score needs to be set for each frame data. When the thumbnail data 15 is output based on an integrated score that combines the person importance score and the face area score, only the person importance score, the face area score, and the integrated score need to be set for each frame data.

The thumbnail data 15 is the data output by the thumbnail output device 1. It is determined based on the scores calculated for each processing target frame data 12 and recorded in the score data 14. The storage device 10 may store as many thumbnail data 15 as the number of thumbnails specified in the condition data 13.

The processing device 20 includes a processing target frame extraction unit 21, a condition data acquisition unit 22, a score calculation unit 23, and a thumbnail output unit 29.

The processing target frame extraction unit 21 extracts, from the frame data constituting the video data 11, the frame data that are candidates for the thumbnail data 15 as the processing target frame data 12. The processing target frame extraction unit 21 may extract the processing target frame data 12 by thinning out the frame data included in the video data 11 at a predetermined frequency, for example one frame per second, or may extract them at random. It may also extract the processing target frame data 12 through predetermined processing, for example by dividing the video data into scenes and extracting frame data from each scene. Alternatively, the processing target frame extraction unit 21 may extract every frame data of the video data 11 as processing target frame data 12.
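
Thinning at a predetermined frequency can be sketched as choosing frame indices at a fixed step. The function name and parameters below are hypothetical. The sketch assumes a constant frame rate and covers only fixed-frequency thinning, not random or scene-based extraction.

```python
def sample_frame_indices(total_frames, fps, per_second=1.0):
    """Return indices of frames kept when thinning to `per_second` frames per second.

    total_frames: number of frames in the video.
    fps:          constant frame rate of the video (for example 30).
    """
    # Never use a step below 1, so requesting more than fps keeps every frame.
    step = max(1, round(fps / per_second))
    return list(range(0, total_frames, step))
```

For a 3-second clip at 30 frames per second, thinning to one frame per second keeps frames 0, 30, and 60.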

The condition data acquisition unit 22 acquires each item of the condition data 13 described with reference to FIG. 2, for example from user input, and stores the condition data 13 in the storage device 10. These items may instead be stored in the storage device 10 in advance. Not every item of the condition data 13 needs to be set; it suffices that the necessary conditions are set. For example, when no integrated score is calculated, the weight item need not be set.

The condition data acquisition unit 22 may also hold an indicator of which of the person importance score, the face area score, the facial expression score, the volume score, and the integrated score is used to output the thumbnail data 15.

The score calculation unit 23 calculates, for each processing target frame data 12, a score that serves as an indicator for determining the thumbnail data 15. The score calculation unit 23 includes a person importance score calculation unit 24, a face area score calculation unit 25, a facial expression score calculation unit 26, a volume score calculation unit 27, and an integrated score calculation unit 28.

The embodiment of the present invention describes the case in which the person importance score calculation unit 24, the face area score calculation unit 25, the facial expression score calculation unit 26, the volume score calculation unit 27, and the integrated score calculation unit 28 calculate the person importance score, the face area score, the facial expression score, the volume score, and the integrated score, respectively, for each processing target frame data 12, but the invention is not limited to this. For example, when the thumbnail data 15 is output based only on the person importance score, only the person importance score calculation unit 24 needs to run. When the thumbnail data 15 is output based on the person importance score and the face area score, only the person importance score calculation unit 24, the face area score calculation unit 25, and the integrated score calculation unit 28 need to run. In this way, the calculation units that process the processing target frame data 12 may be limited according to the indicator used to output the thumbnail data 15.

The person importance score calculation unit 24 calculates the person importance score of the processing target frame data 12. A person who appears large and for a long time in the video data 11 is presumed to be an important person, and the person importance score is set to be higher for frame data in which that important person appears.

The face area score calculation unit 25 calculates the face area score of the processing target frame data 12. The face area score indicates how close the area of the face of the person recognized in the processing target frame data 12 is to the optimum area set in the condition data 13. The optimum area is an indicator of the face area that the user wants to appear in the thumbnail data 15.

For example, for a movie or a documentary program that closely follows one person, when a close-up image of a person should be selected as the thumbnail data 15, the optimum area is set to 100% or a value close to 100%. For a program that introduces scenery, when the scenery should be selected as the thumbnail data 15, the optimum area is set to 0% or a value close to 0%. For a comedy program or the like, when a scene in which several people give one performance should be selected as the thumbnail data 15, the optimum area is set to a value corresponding to the face area per person.
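
This passage states only that the face area score rises as the face area approaches the optimum area; it does not give a formula. The sketch below therefore uses an assumed linear falloff, `1 - |area - optimum|`, with both values expressed as ratios of the frame area in the range 0 to 1. The function name and the falloff are illustrative choices, not the method of the patent.

```python
def face_area_score(face_area_ratio, optimum_area_ratio):
    """Score in [0, 1] that is highest when the face area equals the optimum.

    Both arguments are ratios of the frame area, from 0.0 to 1.0.
    The linear falloff is an assumption made for this sketch.
    """
    distance = abs(face_area_ratio - optimum_area_ratio)
    return max(0.0, 1.0 - distance)
```

With an optimum area near 1.0 a close-up frame scores highest, and with an optimum area near 0.0 a frame with no large face, such as scenery, scores highest.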

The facial expression score calculation unit 26 calculates the facial expression score of the processing target frame data 12. The facial expression score is set to be higher for frame data in which the person recognized in the processing target frame data 12 has a rich facial expression.

The volume score calculation unit 27 calculates the volume score of the processing target frame data 12. The volume score is set to be higher when the volume at the time corresponding to the processing target frame data 12 is high. In a movie or a variety show, for example, loud moments can be presumed to be the exciting ones, so the volume score is set to be higher for frame data at times when the volume is high. The volume score is suitable when frame data presumed to be exciting because of its high volume should be output as the thumbnail data 15.
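
This passage states only that louder moments receive higher volume scores. One possible normalization, assumed here purely for illustration, divides the volume at each processed frame's timestamp by the peak volume among those frames. The function name and the input layout are hypothetical.

```python
def volume_scores(volumes):
    """Return each volume divided by the peak volume, giving scores in [0, 1].

    volumes: non-negative volume levels, one per processed frame.
    Normalizing by the peak is an assumption made for this sketch.
    """
    peak = max(volumes, default=0.0)
    if peak <= 0:
        return [0.0 for _ in volumes]  # silent or empty input scores 0 everywhere
    return [v / peak for v in volumes]
```

The loudest frame receives a score of 1.0 and the others are scaled relative to it.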

The integrated score calculation unit 28 calculates the integrated score of the processing target frame data 12. The integrated score is calculated, based on the weights defined in the condition data 13, from a plurality of scores that include one or more of the person importance score, the face area area score, the facial expression score, and the volume score. The integrated score is suitable when thumbnail data 15 is to be output based on several viewpoints.

The processes of the person importance score calculation unit 24, the face area area score calculation unit 25, the facial expression score calculation unit 26, the volume score calculation unit 27, and the integrated score calculation unit 28 are described in detail later.

Based on the scores calculated by the score calculation unit 23, the thumbnail output unit 29 outputs, in descending order of score, as many items of processing target frame data 12 as the number of thumbnails specified in the condition data 13, as thumbnail data 15. The thumbnail output unit 29 may use one of the person importance score, the face area area score, the facial expression score, and the volume score, and output the processing target frame data 12 with a high value of that score as thumbnail data 15. Alternatively, the thumbnail output unit 29 uses the integrated score and so takes several scores into account when outputting thumbnail data 15.
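The selection performed by the thumbnail output unit 29 can be sketched as a sort followed by a cut at the requested number of thumbnails. This is a minimal illustration, not the patented implementation; the function name `select_thumbnails` and the dictionary layout of the scores are assumptions made for the example.

```python
def select_thumbnails(scores, num_thumbnails):
    """Return frame identifiers in descending order of score.

    scores: dict mapping a frame identifier to its score (hypothetical layout).
    num_thumbnails: number of thumbnails requested in the condition data.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [frame_id for frame_id, _ in ranked[:num_thumbnails]]


# Illustrative scores for three processing target frames.
frame_scores = {"f1": 0.75, "f2": 0.50, "f3": 1.00}
top_two = select_thumbnails(frame_scores, 2)
```

The same function works whether `scores` holds a single score type or the integrated score, because only the ordering matters.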

(Person importance score calculation unit)
For each person recognized in the processing target frame data 12, the person importance score calculation unit 24 calculates an appearance time by summing, over every processing target frame data in which the person is recognized, the time from that frame data to the next processing target frame data. It then calculates, for each person, a character score, which is the ratio of that person's appearance time to the maximum appearance time among all persons. The person importance score calculation unit 24 further takes the character score of the person with the largest face area ratio in the processing target frame data 12 as the person importance score of that processing target frame data 12. The person importance score calculation unit 24 calculates the person importance score for each item of processing target frame data 12 and stores it in the score data 14.

When thumbnail data 15 is output based only on the person importance score, the thumbnail output unit 29 outputs frame data with a high character score as thumbnail data 15.

The person importance score calculation process performed by the person importance score calculation unit 24 is described with reference to FIG. 4.

First, in step S101, the person importance score calculation unit 24 recognizes the persons appearing in each item of processing target frame data 12. An API (Application Programming Interface) such as those of Azure, AWS, or Google may be used for person recognition. As a result, as shown in FIG. 5, the persons recognized in each frame data are associated with the identifier of that frame. In the example shown in FIG. 5, persons P1 and P2 are recognized in the processing target frame data 12 with frame identifier f1, persons P1 and P3 are recognized in the processing target frame data 12 with frame identifier f2, and persons P1, P2, and P3 are recognized in the processing target frame data 12 with frame identifier f3.

In step S102, the person importance score calculation unit 24 identifies the frame data in which each person is recognized. In the example of FIG. 5, person P1 is recognized in the processing target frame data 12 with frame identifiers f1, f2, and f3. Person P2 is recognized in the processing target frame data 12 with frame identifiers f1 and f3. Person P3 is recognized in the processing target frame data 12 with frame identifiers f2 and f3.
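Steps S101 and S102 amount to inverting a frame-to-persons table into a person-to-frames table. The sketch below reproduces the FIG. 5 example under the assumption that the recognition result is already available as a dictionary; the recognition call itself is omitted, and the name `frames_per_person` is hypothetical.

```python
from collections import defaultdict

# Recognition result of step S101, following the FIG. 5 example.
frame_to_persons = {
    "f1": ["P1", "P2"],
    "f2": ["P1", "P3"],
    "f3": ["P1", "P2", "P3"],
}


def frames_per_person(frame_to_persons):
    """Step S102: list, for each person, the frames in which that person is recognized."""
    person_to_frames = defaultdict(list)
    for frame_id, persons in frame_to_persons.items():
        for person in persons:
            person_to_frames[person].append(frame_id)
    return dict(person_to_frames)


person_to_frames = frames_per_person(frame_to_persons)
```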

In step S103, the person importance score calculation unit 24 calculates the appearance time of each person. The appearance time is the sum, over the processing target frame data 12 in which the person is recognized, of the time from each such frame data to the next processing target frame data.

In a movie, the main character tends to have the longest appearance time among the persons recognized in the processing target frame data. In a location program such as a travel program, the reporter's appearance time is long, whereas the appearance times of, for example, a clerk at a shop visited in the program or a passer-by who crosses paths with the reporter during filming tend to be shorter than the reporter's.

As shown in FIG. 6, assume that the processing target frame data 12 is frame data extracted from the video data 11 at a rate of one frame every time interval Δt. In this case, the appearance time of a person is the number of items of processing target frame data 12 in which that person is recognized, multiplied by Δt. The case described here is one in which the processing target frame data 12 is extracted uniformly from the video data 11, but when it is extracted at random, the appearance time of a person is calculated in the same way, by summing the time from each processing target frame data in which the person is recognized to the next processing target frame data.
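The appearance time calculation of step S103 can be sketched so that it covers both uniform and random extraction: each frame contributes the interval up to the next processing target frame. The text does not say what interval the last frame contributes, so the sketch assumes it runs to the end of the video; `video_duration`, the function name, and the data layout are all hypothetical.

```python
def appearance_times(frame_times, frame_to_persons, video_duration):
    """Sum, per person, the time from each frame where they appear to the next frame.

    frame_times: dict mapping frame identifier to its timestamp in seconds.
    frame_to_persons: dict mapping frame identifier to the persons recognized in it.
    video_duration: end of the video in seconds (assumed bound for the last frame).
    """
    ordered = sorted(frame_times, key=frame_times.get)
    totals = {}
    for index, frame_id in enumerate(ordered):
        if index + 1 < len(ordered):
            next_time = frame_times[ordered[index + 1]]
        else:
            next_time = video_duration  # assumption: last interval ends with the video
        interval = next_time - frame_times[frame_id]
        for person in frame_to_persons.get(frame_id, []):
            totals[person] = totals.get(person, 0.0) + interval
    return totals


# Uniform extraction with an interval of 5 seconds, using the FIG. 5 recognition result.
times = appearance_times(
    {"f1": 0.0, "f2": 5.0, "f3": 10.0},
    {"f1": ["P1", "P2"], "f2": ["P1", "P3"], "f3": ["P1", "P2", "P3"]},
    video_duration=15.0,
)
```

With uniform extraction this reduces to the number of frames in which the person is recognized multiplied by the interval, as stated in the text.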

When the appearance time of each person has been calculated, in step S104 the person importance score calculation unit 24 calculates the character score of each person. The character score is the ratio of a person's appearance time to the maximum appearance time of any person in the video data 11. The character score Sp(Pi) of person Pi (i = 1, 2, ..., n, where n is the number of persons recognized in the processing target frames) is given by Equation (1), where Ti is the appearance time of person Pi and max(Ti) is the largest of T1, ..., Tn.
Sp(Pi) = Ti / max(Ti) ... Equation (1)

Suppose that the appearance time T1 of person P1 is 10 seconds, the appearance time T2 of person P2 is 15 seconds, and the appearance time T3 of person P3 is 20 seconds. The maximum appearance time of any person in the video data 11 is T3, the 20 seconds of person P3. The character score of person P1 is therefore 10/20, that of person P2 is 15/20, and that of person P3 is 20/20.
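Equation (1) and the worked example above can be checked with a few lines of code. This is a minimal sketch; the function name `character_scores` is hypothetical, and it assumes at least one person has a positive appearance time.

```python
def character_scores(appearance_time):
    """Equation (1): each person's appearance time divided by the largest appearance time."""
    if not appearance_time:
        return {}
    longest = max(appearance_time.values())  # assumed to be positive
    return {person: seconds / longest for person, seconds in appearance_time.items()}


# Appearance times in seconds from the worked example.
scores = character_scores({"P1": 10, "P2": 15, "P3": 20})
```

The person with the longest appearance time always receives a character score of 1, so the scores are comparable across videos of different lengths.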

When the character score has been calculated for each person, the person importance score calculation unit 24 repeats the process of calculating the person importance score for each item of processing target frame data 12. First, in step S105, the person importance score calculation unit 24 identifies the person with the largest face area in the processing target frame data 12. In step S106, it outputs the character score of that person as the person importance score of the processing target frame data 12.

When the person with the largest face area in a given processing target frame data f is person Px, the person importance score S1(f) is calculated by, for example, Equation (2).
S1(f) = Sp(Px) ... Equation (2)

For example, when the face area of person P2 is larger than the face area of person P1 in the processing target frame data 12 with frame identifier f1, the person importance score of that processing target frame data 12 is 15/20, the character score of person P2.
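Steps S105 and S106 with Equation (2) can be sketched as a lookup: find the person with the largest face area in the frame, then return that person's character score. The face area values below are invented for illustration, and returning 0.0 for a frame without any recognized person is an assumption that the text does not address.

```python
def person_importance_score(face_areas, character_score):
    """Equation (2): character score of the person with the largest face area in the frame.

    face_areas: dict mapping each person in the frame to their face area ratio.
    character_score: dict mapping each person to their character score Sp.
    """
    if not face_areas:
        return 0.0  # assumption: a frame with no recognized person scores zero
    largest_face_person = max(face_areas, key=face_areas.get)
    return character_score[largest_face_person]


sp = {"P1": 10 / 20, "P2": 15 / 20, "P3": 20 / 20}
# Frame f1: P2's face is larger than P1's (illustrative area ratios).
s1_f1 = person_importance_score({"P1": 0.05, "P2": 0.12}, sp)
```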

When the person importance score has been calculated for each item of processing target frame data 12 and output to the score data 14, the person importance score calculation unit 24 ends the process.

(Face area area score calculation unit)
The face area area score calculation unit 25 calculates the area of the processing target frame data 12 and the area of the face area of the person recognized in the processing target frame data 12. It then calculates a face area area score that is high for processing target frame data 12 whose face area is close to the optimum area and low for processing target frame data 12 whose face area is far from the optimum area. The optimum area is set in the condition data 13.

When a plurality of persons are recognized in the processing target frame data 12, there are several conceivable ways for the face area area score calculation unit 25 to calculate the face area area score. For example, it may compare the face area of the person with the largest face area against the optimum area. It may compare the average face area of the persons whose face area is equal to or larger than a predetermined threshold against the optimum area. Or it may compare the smallest face area among the persons whose face area is equal to or larger than a predetermined threshold against the optimum area.
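The three options for frames with several persons differ only in which face area is compared with the optimum area. A minimal sketch follows; the strategy names, the function name, and the choice to return `None` when no face qualifies are all assumptions made for the example.

```python
from statistics import mean


def representative_face_area(face_areas, strategy="largest", threshold=0.0):
    """Pick the face area that is compared with the optimum area.

    face_areas: list of face area ratios, one per recognized person.
    strategy: "largest", "mean", or "smallest" (hypothetical names).
    threshold: minimum face area considered by "mean" and "smallest".
    """
    if not face_areas:
        return None
    if strategy == "largest":
        return max(face_areas)
    eligible = [area for area in face_areas if area >= threshold]
    if not eligible:
        return None  # assumption: no face reaches the threshold
    if strategy == "mean":
        return mean(eligible)
    if strategy == "smallest":
        return min(eligible)
    raise ValueError(f"unknown strategy: {strategy}")


areas = [0.125, 0.25, 0.5]
largest = representative_face_area(areas, "largest")
average = representative_face_area(areas, "mean", threshold=0.2)
smallest = representative_face_area(areas, "smallest", threshold=0.2)
```

The threshold keeps small background faces from pulling down the average or being picked as the smallest face.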

When thumbnail data 15 is output based only on the face area area score, the thumbnail output unit 29 outputs frame data with a high face area area score as thumbnail data 15.

The face area area score calculation process performed by the face area area score calculation unit 25 is described with reference to FIG. 7. FIG. 7 illustrates an example in which, when a plurality of persons are recognized in the processing target frame data 12, the face area area score is calculated from the face area of the person with the largest face area in the processing target frame data 12 and the optimum area set in the condition data 13. The face area area score calculation unit 25 repeats the process of calculating the face area area score for each item of processing target frame data 12.

In step S201, the face area area score calculation unit 25 recognizes the persons appearing in the processing target frame data 12. As in step S101 of FIG. 4, an existing API (Application Programming Interface) such as those of Azure, AWS, or Google may be used for person recognition. In step S202, it calculates the face area of the person with the largest face area among the persons recognized in step S201.

Let S2(f) be the face area area score of frame identifier f and Space_best be the optimum area. The face area area score S2(f) is calculated by, for example, Equation (3). Here, w is a parameter that expresses how much deviation from the optimum area Space_best is tolerated. w may be given a default value or may be set in advance in the condition data 13. Space(Px, f) is the ratio of the number of pixels in the face area of the person with the largest face area to the number of pixels in the processing target frame data 12.
S2(f) = exp(-1 * w * abs(1 - sqrt(Space(Px, f) / Space_best))) ... Equation (3)

In step S203, the face area area score calculation unit 25 calculates the face area area score according to how close the face area is to the optimum area. For example, when the optimum area is 0.2, processing target frame data 12 whose face area calculated in step S202 is 0.2 receives a higher face area area score than processing target frame data 12 whose face area calculated in step S202 is 0.5.
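Equation (3) can be evaluated directly to confirm the comparison in the example above. This is a minimal sketch; the default `w=1.0` is an assumption, since the text does not state a default value. Equation (3) divides by Space_best, so it is undefined when the optimum area is exactly 0; the sketch rejects that input rather than guessing a treatment.

```python
import math


def face_area_score(face_ratio, optimum_area, w=1.0):
    """Equation (3): exp(-w * |1 - sqrt(face_ratio / optimum_area)|).

    face_ratio: face pixels divided by frame pixels, Space(Px, f).
    optimum_area: Space_best from the condition data; must be positive here.
    w: tolerance parameter (default of 1.0 is an assumption).
    """
    if optimum_area <= 0:
        raise ValueError("Equation (3) is undefined for a non-positive optimum area")
    return math.exp(-w * abs(1.0 - math.sqrt(face_ratio / optimum_area)))


at_optimum = face_area_score(0.2, 0.2)   # face area equals the optimum area
too_large = face_area_score(0.5, 0.2)    # face area larger than the optimum area
```

A larger `w` makes the score fall off more steeply as the face area moves away from the optimum area.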

(Facial expression score calculation unit)
For the person recognized in the processing target frame data 12, the facial expression score calculation unit 26 calculates a facial expression value for each type of facial expression. It then calculates, as the facial expression score of the frame data, the ratio of the largest of these facial expression values to their total.

When a plurality of persons are recognized in the processing target frame data 12, there are several conceivable ways for the facial expression score calculation unit 26 to calculate the facial expression score. For example, it may calculate the facial expression score from the facial expression values of the person with the largest face area. It may calculate a facial expression score for each person whose face area is equal to or larger than a predetermined threshold, from that person's facial expression values, and take the average of these scores as the facial expression score of the processing target frame data 12. Or it may calculate the facial expression score from the facial expression values of the person with the smallest face area among the persons whose face area is equal to or larger than a predetermined threshold.

When thumbnail data 15 is output based only on the facial expression score, the thumbnail output unit 29 outputs frame data with a high facial expression score as thumbnail data 15.

The facial expression score calculation process performed by the facial expression score calculation unit 26 is described with reference to FIG. 8. The facial expression score calculation unit 26 repeats the process of calculating the facial expression score for each item of processing target frame data 12.

In step S301, the facial expression score calculation unit 26 identifies the person with the largest face area in the processing target frame data 12. In step S302, it calculates a facial expression value for each type of facial expression for the person identified in step S301. An existing API may be used to calculate the facial expression values.

In the embodiment of the present invention, as shown in FIG. 9, the types of facial expression are joy, anger, sadness, and surprise, and a facial expression value is set for each. Here, suppose that for frame identifier f1 the person P1 with the largest face area is given a joy value of 5, an anger value of 0, a sadness value of 0, and a surprise value of 1.

In step S303, the facial expression score calculation unit 26 calculates the facial expression score from the facial expression values calculated in step S302. The facial expression score is the ratio of the maximum facial expression value to the total of the facial expression values. The facial expression score S3(f) of the processing target frame data 12 is calculated by, for example, Equation (4), using the facial expression value Sej(f) for each type of facial expression of the person recognized in the processing target frame data 12.
S3(f) = max(Sej(f)) / Σ(Sej(f)) ... Equation (4)

For frame identifier f1 shown in FIG. 9, with a joy value of 5, an anger value of 0, a sadness value of 0, and a surprise value of 1, the total of the facial expression values is 6 and the maximum facial expression value is 5, so the facial expression score is 5/6.
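Equation (4) and the FIG. 9 example can be reproduced as follows. This is a minimal sketch; the function name is hypothetical, and returning 0.0 when every facial expression value is zero is an assumption, since Equation (4) is undefined in that case.

```python
def expression_score(expression_values):
    """Equation (4): largest facial expression value divided by the sum of all values.

    expression_values: dict mapping a facial expression type to its value.
    """
    total = sum(expression_values.values())
    if total == 0:
        return 0.0  # assumption: a neutral face with all-zero values scores zero
    return max(expression_values.values()) / total


# Facial expression values of person P1 in frame f1, from the FIG. 9 example.
s3_f1 = expression_score({"joy": 5, "anger": 0, "sadness": 0, "surprise": 1})
```

The score approaches 1 when a single facial expression dominates and drops toward 1/4 when the four types are evenly mixed.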

When the facial expression score has been calculated for each item of processing target frame data 12 and output to the score data 14, the facial expression score calculation unit 26 ends the process.

In the embodiment of the present invention, the facial expression score is described for the case in which processing target frame data with a high facial expression value for any type of facial expression is selected as thumbnail data 15, but the invention is not limited to this. For example, a type of facial expression may be set in the condition data 13 so that frame data with a high facial expression value for that type is output as thumbnail data 15.

(Volume score calculation unit)
The volume score calculation unit 27 calculates a volume score that is high for frame data corresponding to times when the volume of the video data 11 is high and low for frame data corresponding to times when the volume is low.

When thumbnail data 15 is output based only on the volume score, the thumbnail output unit 29 outputs frame data with a high volume score as thumbnail data 15.

The volume score calculation process performed by the volume score calculation unit 27 is described with reference to FIG. 10.

First, in step S401, the volume score calculation unit 27 converts the volume transition of the video data 11 into a smooth transition. For example, the volume score calculation unit 27 computes a convolution integral with a Gaussian twice at minute time intervals along the time axis of the video data 11, and calculates the transition of the volume score S4(f) of frame identifier f by Equation (5). V(f) in Equation (5) is the integral value corresponding to the time of frame identifier f in the video data 11.
S4(f) = 4 * (V(f) - 0.5)^2 ... Equation (5)

In step S402, the volume score calculation unit 27 repeats the process of calculating the volume score for each item of processing target frame data 12. From the transition calculated in step S401, it acquires the value at the time of the processing target frame data 12 as the volume score of that processing target frame data 12.

When the volume score has been calculated for each item of processing target frame data 12 and output to the score data 14, the volume score calculation unit 27 ends the process.
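Steps S401 and S402 can be sketched as two passes of Gaussian smoothing over a sampled volume curve, followed by Equation (5) and a lookup at the time of each frame. The text does not say how V(f) is scaled, what kernel width is used, or how the edges are handled, so the min-max scaling to the range 0 to 1, the `sigma` and `radius` defaults, the edge clamping, and all function names are assumptions. Under this scaling, Equation (5) as printed gives the same score of 1 to V = 0 and V = 1 and a score of 0 to V = 0.5.

```python
import math


def gaussian_kernel(sigma, radius):
    """Discrete Gaussian weights normalized to sum to 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


def smooth(samples, kernel):
    """Convolve samples with the kernel, clamping indices at both ends (assumed edge rule)."""
    radius = len(kernel) // 2
    last = len(samples) - 1
    smoothed = []
    for i in range(len(samples)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += weight * samples[j]
        smoothed.append(acc)
    return smoothed


def volume_score_curve(volume, sigma=2.0, radius=6):
    """Step S401: smooth twice, scale to [0, 1] (assumption), then apply Equation (5)."""
    kernel = gaussian_kernel(sigma, radius)
    twice = smooth(smooth(volume, kernel), kernel)
    low, high = min(twice), max(twice)
    if high == low:
        scaled = [0.5] * len(twice)  # assumption: a flat curve maps to the midpoint
    else:
        scaled = [(value - low) / (high - low) for value in twice]
    return [4.0 * (value - 0.5) ** 2 for value in scaled]


def volume_score_at(curve, frame_time, sample_interval):
    """Step S402: read the curve at the sample covering the frame's timestamp."""
    index = min(int(frame_time / sample_interval), len(curve) - 1)
    return curve[index]


volume = [0.0] * 10 + [1.0] * 10  # quiet first half, loud second half (illustrative)
curve = volume_score_curve(volume)
score_f = volume_score_at(curve, frame_time=1.9, sample_interval=0.1)
```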

(Integrated score calculation unit)
For the processing target frame data 12, the integrated score calculation unit 28 calculates an integrated score by multiplying each of a plurality of scores, including one or more of the person importance score, the face area area score, the facial expression score, and the volume score, by its weight and summing the results.

For example, an integrated score that combines the four scores (the person importance score, the face area area score, the facial expression score, and the volume score) is calculated by Equation (6). In Equation (6), Sall(f), S1(f), S2(f), S3(f), and S4(f) are, respectively, the integrated score, the person importance score, the face area area score, the facial expression score, and the volume score of the processing target frame data 12 with frame identifier f. a1, a2, a3, and a4 are, respectively, the weights of the person importance score, the face area area score, the facial expression score, and the volume score set in the condition data 13.

Sall(f) = Σ ai * Si(f) (i = 1, 2, 3, 4) ... Equation (6)

The integrated score calculation unit 28 calculates the integrated score from two or more scores: at least one of the person importance score, the face area area score, the facial expression score, and the volume score, together with other scores. It calculates the integrated score by multiplying each score by its weight and summing the results.
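Equation (6) is a weighted sum, which can be sketched as follows. The score names used as dictionary keys and the weight values are invented for illustration; in the described device the weights come from the condition data 13.

```python
def integrated_score(scores, weights):
    """Equation (6): sum of weight * score over the scores named in weights.

    scores: dict mapping a score name to its value for one frame.
    weights: dict mapping the same names to the weights a1..a4.
    """
    return sum(weights[name] * scores[name] for name in weights)


# Illustrative values for one frame; key names are hypothetical.
frame_scores = {"person_importance": 0.75, "face_area": 1.0, "expression": 0.5, "volume": 0.25}
# Example weights that emphasize the person importance score and ignore the volume score.
drama_weights = {"person_importance": 2.0, "face_area": 1.0, "expression": 1.0, "volume": 0.0}
s_all = integrated_score(frame_scores, drama_weights)
```

Setting a weight to 0 removes that score from the ranking, which is one way to express the genre-specific preferences that the weights are meant to capture.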

When thumbnail data 15 is output based on the integrated score, the thumbnail output unit 29 outputs frame data with a high integrated score as thumbnail data 15.

In this way, the thumbnail output device 1 according to the embodiment of the present invention can output thumbnail data 15 that reflects the user's intention.

For example, when the video data 11 is a movie, a drama, or the like, the thumbnail data 15 is expected to show the main cast. The desired thumbnail data 15 can therefore be output by extracting thumbnail data 15 based only on the person importance score, or based on an integrated score calculated with a high weight on the person importance score. In a location program featuring scenery, on the other hand, an image of the moment the scenery is filmed, rather than of a person, is expected to be extracted as the thumbnail. For the video data 11 of such a location program, the desired thumbnail data can therefore be output by extracting thumbnail data 15 based on an integrated score calculated with the weight of the person importance score set lower than the other weights.

Even when a person appears in the video data 11, the criterion for selecting the thumbnail data 15 the user wants may differ. For example, a movie or a drama calls for thumbnail data 15 showing the main cast, whereas when a reporter introduces a product in a variety show, the thumbnail data 15 is expected to show the product large and the reporter small. For the video data 11 of such a variety show, the desired thumbnail data 15 can therefore be output by extracting thumbnail data 15 based on an integrated score calculated with the weight of the person importance score set lower than the other weights.

When creating a thumbnail for a program that closely follows one person, such as a documentary, the thumbnail data is expected not only to show that person but also to show a facial expression that suits the program, such as a smile or an angry look. The desired thumbnail data 15 can therefore be output by extracting thumbnail data 15 based only on the facial expression score, or based on an integrated score calculated with a high weight on the facial expression score.

In a variety show, the highlights are not only the expressions of the cast but also the moments when the audience gets excited, so frame data from those moments is expected to be extracted as thumbnail data 15. The desired thumbnail data 15 can therefore be output by extracting thumbnail data 15 based only on the volume score, or based on an integrated score calculated with a high weight on the volume score. In a location program, on the other hand, frame data from a moment when the scenery is filmed quietly, rather than a moment when the reporter is talking, is expected to be extracted as thumbnail data 15. For the video data 11 of a location program, the desired thumbnail data 15 can therefore be output by extracting thumbnail data 15 based on an integrated score calculated with the weight of the volume score set lower than the other weights.

In this way, the thumbnail output device 1 according to the embodiment of the present invention can output thumbnail data 15 of the video data 11 that reflects the user's intention.

(Other embodiments)
Although the present invention has been described above by way of an embodiment, the statements and drawings that form part of this disclosure should not be understood as limiting the invention. Various alternative embodiments, examples, and operational techniques will be apparent to those skilled in the art from this disclosure.

For example, the thumbnail output device described in the embodiment of the present invention may be configured on a single piece of hardware as shown in FIG. 1, or on a plurality of pieces of hardware according to its functions and the number of processes. It may also be implemented on an existing information processing system.

The present invention naturally includes various embodiments and the like that are not described here. Accordingly, the technical scope of the present invention is defined only by the invention-specifying matters of the claims that are reasonable in light of the above description.

1 Thumbnail output device
10 Storage device
11 Video data
12 Processing target frame data
13 Condition data
14 Score data
15 Thumbnail data
20 Processing device
21 Processing target frame extraction unit
22 Condition data acquisition unit
23 Score calculation unit
24 Person importance score calculation unit
25 Face area area score calculation unit
26 Facial expression score calculation unit
27 Volume score calculation unit
28 Integrated score calculation unit
29 Thumbnail output unit

Claims

A thumbnail output device that outputs thumbnail data of video data, comprising:
a person importance score calculation unit that, for each person recognized in the frame data constituting the video data, calculates an appearance time by adding, for each frame data in which the person is recognized, the time until the frame data to be processed next,
calculates, for each person, a character score that is the ratio of that person's appearance time to the maximum of the appearance times of the persons, and
calculates the character score of the person having the largest face area in the frame data as the person importance score of the frame data; and
a thumbnail output unit that outputs frame data having a high character score as thumbnail data.

The thumbnail output device according to claim 1, further comprising
a face area area score calculation unit that calculates the area of the face area of a person recognized in the frame data constituting the video data, and calculates a face area area score that is high for frame data whose face area is close to an optimum area and low for frame data whose face area is far from the optimum area,
wherein the thumbnail output unit outputs, as thumbnail data, frame data having a high integrated score obtained by multiplying the person importance score and the face area area score by respective weights and adding them.

The thumbnail output device according to claim 1, further comprising
a facial expression score calculation unit that calculates, for a person recognized in the frame data constituting the video data, a facial expression value for each facial expression type, and calculates, as the facial expression score of the frame data, the ratio of the maximum of the facial expression values to the sum of the facial expression values over the facial expression types of the person in the frame data,
wherein the thumbnail output unit outputs, as thumbnail data, frame data having a high integrated score obtained by multiplying the person importance score and the facial expression score by respective weights and adding them.

The thumbnail output device according to claim 1, further comprising
a volume score calculation unit that calculates a volume score that is high for frame data corresponding to a time when the volume of the video data is high and low for frame data corresponding to a time when the volume is low,
wherein the thumbnail output unit outputs, as thumbnail data, frame data having a high integrated score obtained by multiplying the person importance score and the volume score by respective weights and adding them.

A thumbnail output device that outputs thumbnail data of video data, comprising:
an integrated score calculation unit that calculates, for the frame data of the video data, an integrated score by multiplying a plurality of scores, including one or more of the person importance score according to claim 1, the face area area score according to claim 2, the facial expression score according to claim 3, and the volume score according to claim 4, by respective weights and adding them; and
a thumbnail output unit that outputs frame data having a high integrated score as thumbnail data.

A thumbnail output method for outputting thumbnail data of video data, comprising:
a step in which a computer, for each person recognized in the frame data constituting the video data, calculates an appearance time by adding, for each frame data in which the person is recognized, the time until the frame data to be processed next;
a step in which the computer calculates, for each person, a character score that is the ratio of that person's appearance time to the maximum of the appearance times of the persons;
a step in which the computer calculates the character score of the person having the largest face area in the frame data as the person importance score of the frame data; and
a step in which the computer outputs frame data having a high character score as thumbnail data.

The thumbnail output method according to claim 6, further comprising:
a step in which the computer calculates the area of the face area of a person recognized in the frame data constituting the video data, and calculates a face area area score that is high for frame data whose face area is close to an optimum area and low for frame data whose face area is far from the optimum area; and
a step in which the computer outputs, as thumbnail data, frame data having a high integrated score obtained by multiplying the person importance score and the face area area score by respective weights and adding them.

The thumbnail output method according to claim 6, further comprising:
a step in which the computer calculates, for a person recognized in the frame data constituting the video data, a facial expression value for each facial expression type, and calculates, as the facial expression score of the frame data, the ratio of the maximum of the facial expression values to the sum of the facial expression values over the facial expression types of the person in the frame data; and
a step in which the computer outputs, as thumbnail data, frame data having a high integrated score obtained by multiplying the person importance score and the facial expression score by respective weights and adding them.

The thumbnail output method according to claim 6, further comprising:
a step in which the computer calculates a volume score that is high for frame data corresponding to a time when the volume of the video data is high and low for frame data corresponding to a time when the volume is low; and
a step in which the computer outputs, as thumbnail data, frame data having a high integrated score obtained by multiplying the person importance score and the volume score by respective weights and adding them.

A thumbnail output method for outputting thumbnail data of video data, comprising:
a step in which a computer calculates, for the frame data of the video data, an integrated score by multiplying a plurality of scores, including one or more of the person importance score according to claim 6, the face area area score according to claim 7, the facial expression score according to claim 8, and the volume score according to claim 9, by respective weights and adding them; and
a step in which the computer outputs frame data having a high integrated score as thumbnail data.

A thumbnail output program for causing a computer to function as the thumbnail output device according to any one of claims 1 to 5.
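
For illustration only, the person importance score calculation recited in the device and method claims above might be sketched in Python as follows. The data layout (a time-sorted list of dictionaries with `time` and `faces` keys), the function names `character_scores` and `person_importance_score`, and the handling of the last frame and of frames without faces are assumptions made for this sketch; the claims do not specify them.

```python
from collections import defaultdict


def character_scores(frames):
    """Compute the character score of each person.

    `frames` is assumed to be sorted by time; each entry is a dict with
    'time' (seconds) and 'faces' (a list of (person_id, face_area) pairs).
    """
    appearance = defaultdict(float)
    # For each frame in which a person is recognized, add the time until
    # the next frame to be processed. Assumption: the last frame has no
    # successor and therefore contributes no appearance time.
    for current, following in zip(frames, frames[1:]):
        interval = following["time"] - current["time"]
        for person_id in {pid for pid, _ in current["faces"]}:
            appearance[person_id] += interval
    longest = max(appearance.values(), default=0.0)
    if longest == 0.0:
        return {}
    # Ratio of each person's appearance time to the maximum appearance time.
    return {pid: t / longest for pid, t in appearance.items()}


def person_importance_score(frame, scores):
    """Character score of the person with the largest face area in the frame.

    Assumption: a frame with no recognized face scores 0.0.
    """
    if not frame["faces"]:
        return 0.0
    main_person, _ = max(frame["faces"], key=lambda face: face[1])
    return scores.get(main_person, 0.0)


# Example: four frames sampled one second apart.
frames = [
    {"time": 0.0, "faces": [("A", 100), ("B", 50)]},
    {"time": 1.0, "faces": [("A", 80)]},
    {"time": 2.0, "faces": [("B", 120), ("A", 30)]},
    {"time": 3.0, "faces": []},
]
scores = character_scores(frames)
importance = [person_importance_score(f, scores) for f in frames]
```

In this example, person A accumulates three seconds and person B two seconds, so A receives a character score of 1 and B a score of 2/3; the third frame, in which B has the largest face area, therefore receives B's score as its person importance score.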