JP2021132328A

JP2021132328A - Information processing method, information processing device, and computer program

Info

Publication number: JP2021132328A
Application number: JP2020027209A
Authority: JP
Inventors: 大資玉城; Daisuke Tamaki; 伸行松下; Nobuyuki Matsushita; ヘーラトサマン; Herath Saman; 健一郎金井; Kenichiro Kanai; フェドトフキリル; Fedotov Kyril; 宏輝藤原; Hiroki Fujiwara; 祐也杉田; Yuya Sugita; アブドゥルラーマンアブドゥルガニ; Abdul Rahman Abdulgani
Original assignee: Exa Wizards Inc
Current assignee: Exa Wizards Inc
Priority date: 2020-02-20
Filing date: 2020-02-20
Publication date: 2021-09-09
Anticipated expiration: 2040-02-20
Also published as: JP6830634B1

Abstract

To provide an information processing method, an information processing device, and a computer program that can be expected to facilitate the editing of moving images. SOLUTION: In an information processing method according to an embodiment, in which an information processing device generates moving image data, the information processing device acquires moving image data, identifies features of the acquired moving image data, and edits the moving image data according to the identified features. The information processing device may identify a plurality of features at the same time point in the moving image data and edit the moving image data according to the combination of those features. The information processing device may also extract a plurality of pieces of partial moving image data from the moving image data, edit each extracted piece, and combine the edited pieces. SELECTED DRAWING: Figure 7

Description

The present invention relates to an information processing method, an information processing device, and a computer program for editing moving image data.

In recent years, devices capable of capturing moving images have become widespread, and many users can easily shoot video. However, editing the captured moving image data requires a certain amount of knowledge and skill, so video editing has remained a high barrier for ordinary users.

Patent Document 1 proposes a moving image processing device that quantifies the facial expression of an operator, captured on video while a moving image is being shot or played back, to calculate an evaluation value; records the calculated evaluation value on the same timeline as the original moving image; and generates a digest by sequentially extracting partial moving images from the original based on the recorded evaluation values.

Patent Document 1: Japanese Unexamined Patent Publication No. 2014-112787

However, the moving image processing device described in Patent Document 1 merely extracts partial moving images from the original moving image to generate a digest, which is insufficient to assist the user in editing the moving image.

The present invention has been made in view of these circumstances, and its object is to provide an information processing method, an information processing device, and a computer program that can be expected to facilitate the editing of moving images.

An information processing method according to one embodiment is a method in which an information processing device generates moving image data: the information processing device acquires moving image data, identifies features of the acquired moving image data, and edits the moving image data according to the identified features.

According to one embodiment, the editing of moving images can be expected to become easier.

FIG. 1 is a schematic diagram outlining the information processing system according to the present embodiment.
FIG. 2 is a block diagram showing the configuration of the server device according to the present embodiment.
FIG. 3 is a schematic diagram showing an example of the editing method determination table.
FIG. 4 is a block diagram showing the configuration of the terminal device according to the present embodiment.
FIG. 5 is a schematic diagram showing an example of the moving image playback screen displayed by the terminal device.
FIG. 6 is a flowchart showing the procedure of processing performed by the terminal device according to the present embodiment.
FIG. 7 is a schematic diagram illustrating the moving image data editing processing performed by the server device.
FIG. 8 is a schematic diagram illustrating an example of editing processing that superimposes an image.
FIG. 9 is a flowchart showing the procedure of processing performed by the server device in the present embodiment.
FIG. 10 is a schematic diagram showing an example of the edit settings screen displayed by the terminal device according to Modification 1.
FIG. 11 is a schematic diagram illustrating the configuration of the information processing system according to Modification 2.

Specific examples of the information processing system according to embodiments of the present invention are described below with reference to the drawings. The present invention is not limited to these examples; it is defined by the claims and is intended to include all modifications within the meaning and scope equivalent to the claims.

<System configuration>
FIG. 1 is a schematic diagram outlining the information processing system according to the present embodiment. The system includes a server device 1 that provides a moving image editing service and a terminal device 3, such as a smartphone or tablet, owned by a user of the service. The server device 1 and the terminal device 3 can communicate via a network N that includes a mobile phone network, a wireless LAN (Local Area Network), the Internet, and the like.

For example, after shooting a moving image with the camera mounted on the terminal device 3, the user instructs the terminal device 3 to automatically edit the resulting moving image data. In response, the terminal device 3 communicates with the server device 1 via the network N and transmits the moving image data to be edited. The server device 1 acquires the moving image data transmitted from the terminal device 3, applies appropriate editing processing to it, and returns the edited moving image data to the terminal device 3. The terminal device 3 receives and stores the edited moving image data, and then plays it back (displays it), posts it to an SNS (Social Networking Service), or the like.

FIG. 2 is a block diagram showing the configuration of the server device 1 according to the present embodiment. The server device 1 includes a processing unit 11, a storage unit (storage) 12, a communication unit (transceiver) 13, and the like. Although the present embodiment is described as if a single server device 1 performs the processing, the processing may be distributed among a plurality of server devices 1.

The processing unit 11 is configured using an arithmetic processing unit such as a CPU (Central Processing Unit), an MPU (Micro-Processing Unit), or a GPU (Graphics Processing Unit), together with a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. By reading and executing the server program 12a stored in the storage unit 12, the processing unit 11 performs various processes related to editing moving image data.

The storage unit 12 is configured using a large-capacity storage device such as a hard disk. It stores the various programs executed by the processing unit 11 and the various data the processing unit 11 needs. In the present embodiment, the storage unit 12 stores the server program 12a executed by the processing unit 11, together with the feature identification model 12b and the editing method determination table 12c used in the editing processing.

In the present embodiment, the server program 12a is provided recorded on a recording medium 99 such as a memory card or optical disc, and the server device 1 reads the server program 12a from the recording medium 99 and stores it in the storage unit 12. Alternatively, the server program 12a may be written to the storage unit 12 when the server device 1 is manufactured; the server device 1 may acquire the server program 12a by communication from another remote server device that distributes it; or a writing device may read the server program 12a from the recording medium 99 and write it to the storage unit 12 of the server device 1. In short, the server program 12a may be provided either by distribution over a network or recorded on the recording medium 99.

The server device 1 according to the present embodiment edits moving image data using so-called artificial intelligence and has a feature identification model 12b used for the editing processing. The feature identification model 12b is a learning model trained in advance by machine learning; for example, a neural network or an SVM (Support Vector Machine) may be used. The feature identification model 12b may be provided in any manner: for example, together with the server program 12a on the recording medium 99, or distributed separately from the server program 12a by another server device. The server device 1 according to the present embodiment acquires the pre-trained feature identification model 12b and stores it in the storage unit 12. In the present embodiment, the machine learning that generates the feature identification model 12b is assumed to be performed by a device other than the server device 1, but this is not limiting; the server device 1 may perform the machine learning itself. Furthermore, the server device 1 need not hold the feature identification model 12b at all: it may ask another device that has the model to perform the processing using it and then acquire the processing result from that device.

The feature identification model 12b is a learning model that, given moving image data as input, outputs information identifying the features of the scenes contained in that data. Various learning models, such as a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network), may be adopted as the feature identification model 12b. In the present embodiment the model takes moving image data as input, but other information, such as GPS (Global Positioning System) position information or time information, may be input in addition. The information output by the feature identification model 12b may include, for example: what appears in the moving image (a person, automobile, bicycle, building, tree, animal, plant, etc.); attributes of what appears (facial expression, age, gender, motion, etc.); where the moving image was shot (indoors, outdoors, sea, mountains, forest, city, etc.); the time of day of shooting (morning, daytime, evening, night, etc.); and the weather at the time of shooting (sunny, rain, cloudy, snow, etc.). The feature identification model 12b need not be realized as a single learning model; it may be realized as a collection of learning models, for example one that detects what appears in the moving image and another that detects the attributes of what appears.
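Realizing the model as a collection of sub-models can be sketched as a wrapper that runs each sub-model on the same frames and merges their outputs into one feature dictionary. This is a minimal illustration under assumptions: the frame container, the feature keys, and `make_composite_model` are hypothetical, and the lambdas in the test merely stand in for trained detectors.

```python
from typing import Any, Callable, Dict, List

Frames = List[Any]  # hypothetical container of decoded frames
SubModel = Callable[[Frames], Dict[str, Any]]


def make_composite_model(sub_models: List[SubModel]) -> SubModel:
    """Return one callable that runs every sub-model and merges their outputs."""
    def run(frames: Frames) -> Dict[str, Any]:
        features: Dict[str, Any] = {}
        for model in sub_models:
            # If two sub-models report the same key, the later one wins.
            features.update(model(frames))
        return features
    return run
```

Injecting sub-models as plain callables keeps the wrapper independent of how each detector (for objects, attributes, place, time of day, or weather) is actually implemented.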

The editing method determination table 12c stores scene information, which is one or more pieces of information about the features identified for moving image data, in association with the editing methods to be applied to that moving image data. FIG. 3 is a schematic diagram showing an example of the editing method determination table 12c. In the illustrated table, the weather, place, and time of day, and the presence or absence of children, adults, and automobiles, are given as examples of features included in the scene information. For each combination of these features, the table predefines editing methods in two columns, Method 1 and Method 2.

In this example, the editing method determination table 12c specifies the following. When the scene information identified from the moving image data indicates sunny weather, an outdoor location, daytime, and a child who is male and smiling, the editing methods are to add a laughter effect image and to add upbeat BGM. When the scene information indicates an indoor location and an adult who appears angry, the editing methods are to add an anger effect image and to alter the voice of the person shown in the moving image. When the scene information indicates an indoor location and a crawling child (crawling on all fours, belly crawling, or bottom shuffling), the editing method is to add a character image. When the scene information indicates rainy weather, an outdoor location, evening, and a moving automobile, the editing method is to play the video in slow motion.
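The table lookup can be pictured as a list of (required features, editing methods) rows. The sketch below encodes the four example rows described above; the feature keys, value strings, and method names are hypothetical labels, and returning the methods of the first row whose conditions are all satisfied is an assumed matching policy that the source does not specify.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the example rows of Fig. 3:
# (features the scene information must contain, editing methods to apply)
EDIT_TABLE: List[Tuple[Dict[str, str], List[str]]] = [
    ({"weather": "fine", "place": "outdoor", "time": "day", "child": "male_smiling"},
     ["add_laugh_effect", "add_bright_bgm"]),
    ({"place": "indoor", "adult": "angry"},
     ["add_anger_effect", "change_voice"]),
    ({"place": "indoor", "child": "crawling"},
     ["add_character_image"]),
    ({"weather": "rain", "place": "outdoor", "time": "evening", "car": "moving"},
     ["slow_playback"]),
]


def decide_edit_methods(scene: Dict[str, str]) -> List[str]:
    """Return the methods of the first row whose conditions all match the scene."""
    for conditions, methods in EDIT_TABLE:
        if all(scene.get(key) == value for key, value in conditions.items()):
            return list(methods)  # copy so callers cannot mutate the table
    return []  # no row matched: leave the clip unedited
```

Extra features in the scene information that a row does not mention are ignored, so one table row can cover many slightly different scenes.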

The contents of the editing method determination table 12c described above are only an example and are not limiting. The scene information may include features other than weather, place, time of day, and the presence or absence of children, adults, and automobiles. The number of editing methods for a given combination of scene information need not be two; it may be one, or three or more.

The communication unit 13 communicates with various devices via the network N, which includes a mobile phone network, the Internet, and the like. In the present embodiment, the communication unit 13 communicates with one or more terminal devices 3 via the network N. The communication unit 13 transmits data supplied by the processing unit 11 to other devices and passes data received from other devices to the processing unit 11.

The storage unit 12 may be an external storage device connected to the server device 1. The server device 1 may be a multicomputer comprising a plurality of computers, or a virtual machine built in software. The server device 1 is also not limited to the above configuration and may include, for example, a reading unit that reads information stored on a portable storage medium, an input unit that accepts operation input, or a display unit that displays images.

When the processing unit 11 of the server device 1 reads and executes the server program 12a stored in the storage unit 12, a moving image acquisition unit 11a, a partial moving image extraction unit 11b, a feature identification unit 11c, an editing method determination unit 11d, an editing processing unit 11e, a combination processing unit 11f, an edited moving image transmission unit 11g, and the like are realized in the processing unit 11 as software functional units.
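How units 11b to 11f hand data to one another can be sketched as a small pipeline with each stage injected as a callable. This is an illustrative skeleton under assumptions, not the server program 12a: the `EditingPipeline` class, the list-of-frame-records clip format, and the stage signatures are hypothetical, and acquisition (11a) and transmission (11g) are left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: a clip is a list of per-frame records.
Clip = List[dict]


@dataclass
class EditingPipeline:
    """Chains the editing stages; each stage is an injected callable."""
    extract: Callable[[Clip], List[Clip]]          # stands in for unit 11b
    identify: Callable[[Clip], Dict[str, str]]     # stands in for unit 11c
    decide: Callable[[Dict[str, str]], List[str]]  # stands in for unit 11d
    edit: Callable[[Clip, List[str]], Clip]        # stands in for unit 11e

    def run(self, video: Clip) -> Clip:
        digest: Clip = []
        for clip in self.extract(video):
            scene = self.identify(clip)     # scene information for this clip
            methods = self.decide(scene)    # editing methods for that scene
            # Stands in for unit 11f: append each edited clip to the digest.
            digest.extend(self.edit(clip, methods))
        return digest
```

Keeping the stages swappable mirrors the text's flexibility, for example replacing the table-based `decide` with a learned model.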

The moving image acquisition unit 11a acquires, from the terminal device 3, the moving image data to be edited automatically. For example, the communication unit 13 receives the moving image data that the terminal device 3 sends together with a request for editing, and the moving image acquisition unit 11a acquires the data by storing it in a free area of the storage unit 12 or the like. Along with the moving image data, the moving image acquisition unit 11a may also acquire information such as the date, time, and location at which the moving image data was shot. The shooting date and time can be attached to the moving image data based on, for example, the calendar and clock functions of the terminal device 3. The shooting location can be determined from, for example, GPS signals received by the terminal device 3 and attached to the moving image data.

The partial moving image extraction unit 11b extracts one or more pieces of partial moving image data from the moving image data to be edited; that is, it performs the extraction needed to generate a so-called digest. From the whole of the moving image data, the partial moving image extraction unit 11b extracts portions having predetermined features, such as portions showing a specific person or object, portions in which a person shown moves a lot, portions in which a person shown is smiling, portions showing many people, portions where the scene changes, or portions where the audio volume is high. Because the extraction of partial moving image data by the partial moving image extraction unit 11b uses existing techniques, a detailed description is omitted. For example, using the same technique as the moving image processing device described in Patent Document 1, the partial moving image extraction unit 11b may quantify the facial expression of an operator captured while the moving image is being shot or played back, calculate an evaluation value, and extract partial moving image data based on that value. Alternatively, the partial moving image extraction unit 11b may use the feature identification model 12b to identify the features of one or more scenes contained in the moving image data and extract scenes having predetermined features as partial moving image data.
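Of the extraction criteria listed, "portions where the audio volume is high" is the simplest to illustrate. The sketch below assumes a precomputed per-frame loudness list (a hypothetical input) and returns half-open `[start, end)` frame ranges in which loudness stays at or above a threshold; the function name and the `threshold` and `min_len` parameters are illustrative, not taken from the source.

```python
from typing import List, Optional, Tuple


def extract_segments(volumes: List[float], threshold: float,
                     min_len: int = 1) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges where volume >= threshold.

    Runs shorter than min_len frames are discarded.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(volumes):
        if v >= threshold and start is None:
            start = i  # a loud run begins
        elif v < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))  # the loud run ended at frame i
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and len(volumes) - start >= min_len:
        segments.append((start, len(volumes)))
    return segments
```

Other criteria (smiles, many people, scene changes) fit the same pattern once a per-frame score is available from a detector.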

The feature identification unit 11c uses the feature identification model 12b stored in the storage unit 12 to identify the features of the scenes contained in each of the one or more pieces of partial moving image data extracted by the partial moving image extraction unit 11b. The feature identification unit 11c inputs the extracted partial moving image data to the feature identification model 12b and acquires the identification result that the model outputs. It then creates scene information that summarizes the one or more features identified for the partial moving image data. The feature identification unit 11c may also be configured to input information such as the shooting date, time, and location of the partial moving image data to the feature identification model 12b.
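The source does not say how the model's outputs are condensed into one piece of scene information per clip. One plausible approach, shown here purely as an assumption, is a per-key majority vote over frame-level predictions; the function name and dictionary keys are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_scene_info(frame_predictions: List[Dict[str, str]]) -> Dict[str, str]:
    """Summarize per-frame predictions into scene information by majority vote."""
    votes: Dict[str, Counter] = {}
    for prediction in frame_predictions:
        for key, value in prediction.items():
            votes.setdefault(key, Counter())[value] += 1
    # Keep the most frequent value for each feature key.
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}
```

A vote smooths out single-frame misclassifications; a keyword that appears in only some frames (such as weather) is still kept with its most common value.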

The editing method determination unit 11d determines the editing method to apply to the partial moving image data based on the scene information about the features identified by the feature identification unit 11c and on the editing method determination table 12c stored in the storage unit 12. Specifically, it looks up the editing method determination table 12c using the one or more features contained in the scene information and determines the editing method by acquiring the method associated with those features.

In the present embodiment, the editing method determination unit 11d determines the editing method using the editing method determination table 12c, but this is not limiting. For example, the editing method may be determined using a learning model trained in advance by machine learning to output a suitable editing method for input such as moving image data or scene information. When a learning model is used to determine the editing method, evaluations of the editing method can, for example, be collected from users who have viewed the edited moving image data, and the learning model can be retrained based on those evaluations.
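The feedback loop can be illustrated with a deliberately simple stand-in. Instead of the machine-learned model the text describes, the hypothetical `FeedbackEditSelector` below keeps a running mean of user ratings for each (scene information, editing method) pair and chooses the best-rated method; it demonstrates the evaluate-and-update cycle, not the patent's model.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Key = Tuple[FrozenSet[Tuple[str, str]], str]


class FeedbackEditSelector:
    """Hypothetical stand-in for a learned editing-method model."""

    def __init__(self, candidates: List[str], prior: float = 0.0):
        self.candidates = candidates
        self.prior = prior  # score used before any rating is received
        self.total: Dict[Key, float] = defaultdict(float)
        self.count: Dict[Key, int] = defaultdict(int)

    @staticmethod
    def _key(scene: Dict[str, str], method: str) -> Key:
        return (frozenset(scene.items()), method)

    def score(self, scene: Dict[str, str], method: str) -> float:
        k = self._key(scene, method)
        return self.total[k] / self.count[k] if self.count[k] else self.prior

    def choose(self, scene: Dict[str, str]) -> str:
        # Ties go to the earliest candidate in the list.
        return max(self.candidates, key=lambda m: self.score(scene, m))

    def feedback(self, scene: Dict[str, str], method: str, rating: float) -> None:
        """Record a user's rating of an edited video (the "retraining" step)."""
        k = self._key(scene, method)
        self.total[k] += rating
        self.count[k] += 1
```

A real system would generalize across similar scenes; this exact-match table only shows where user evaluations enter the loop.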

The editing processing unit 11e edits the partial moving image data according to the editing method determined by the editing method determination unit 11d. In the present embodiment, the editing processing unit 11e may, for example: superimpose various images, such as decorative images, character images, or effect images, on the moving image; change the time of day of a scene in the moving image (from daytime to night, from night to daytime, from daytime to evening, etc.); change the style of the moving image (from an ordinary video to a painting-like or anime-like style, etc.); add sound effects or background sound to the moving image data; change the tone or timbre of the voice of a person speaking in the moving image; or change the playback speed of the moving image data (slow motion, frame-by-frame, double speed, fast forward, etc.). These editing processes are only examples, and the editing processing unit 11e may be configured to perform various other editing processes.
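Of the editing processes listed, a playback-speed change is the easiest to show concretely. The sketch assumes a clip is a sequence of frames replayed at a fixed frame rate: slowing down repeats frames and speeding up skips them (nearest-frame resampling). Audio and frame interpolation are ignored, and the function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def change_speed(frames: Sequence[T], speed: float) -> List[T]:
    """Resample frames so playback at the original frame rate runs `speed` times faster.

    speed < 1 gives slow motion (frames repeat); speed > 1 gives fast playback
    (frames are skipped).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n_out = int(len(frames) / speed)
    # Output frame i shows the source frame at time i * speed.
    return [frames[min(int(i * speed), len(frames) - 1)] for i in range(n_out)]
```

For example, `change_speed(frames, 0.5)` doubles the clip length for the slow-motion edit assigned to the rainy-evening scene in the table example.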

The combination processing unit 11f generates digest moving image data by combining the plurality of pieces of partial moving image data that were extracted by the partial moving image extraction unit 11b and edited by the editing processing unit 11e.
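Combining can be as simple as concatenation. The source does not state the order in which edited clips are joined; the sketch below assumes chronological order by each clip's start frame in the source video, and the `(start_frame, frames)` pair format and function name are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def combine_clips(clips: Sequence[Tuple[int, Sequence[T]]]) -> List[T]:
    """Concatenate edited clips in the order they appeared in the source video."""
    digest: List[T] = []
    for _, frames in sorted(clips, key=lambda clip: clip[0]):
        digest.extend(frames)
    return digest
```

Sorting by source position keeps the digest's storyline intact even if clips finish editing out of order, for example when they are processed in parallel.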

The edited moving image transmission unit 11g transmits the digest moving image data, generated by the combination processing unit 11f from the combined partial moving image data, to the terminal device 3 that requested the editing.

FIG. 4 is a block diagram showing the configuration of the terminal device 3 according to the present embodiment. The terminal device 3 includes a processing unit 31, a storage unit (storage) 32, a communication unit (transceiver) 33, a display unit (display) 34, an operation unit 35, a camera 36, and the like. The terminal device 3 may be configured using an information processing device, such as a smartphone, tablet terminal, or personal computer, that can shoot moving images with the camera 36. The terminal device 3 may also be a shooting device, such as a video camera, that captures moving images.

The processing unit 31 is configured using an arithmetic processing unit such as a CPU or MPU, together with a ROM, a RAM, and the like. By reading and executing the program 32a stored in the storage unit 32, the processing unit 31 performs various processes, such as shooting moving images with the camera 36 and displaying the shot moving images on the display unit 34.

The storage unit 32 is configured using a non-volatile memory element such as flash memory. It stores the various programs executed by the processing unit 31 and the various data the processing unit 31 needs. In the present embodiment, the storage unit 32 stores the program 32a executed by the processing unit 31. The program 32a is distributed by a remote server device or the like, and the terminal device 3 acquires it by communication and stores it in the storage unit 32. Alternatively, the program 32a may be written to the storage unit 32 when the terminal device 3 is manufactured; the terminal device 3 may read the program 32a from a recording medium 98 such as a memory card or optical disc and store it in the storage unit 32; or a writing device may read the program 32a from the recording medium 98 and write it to the storage unit 32 of the terminal device 3. In short, the program 32a may be provided either by distribution over a network or recorded on the recording medium 98.

通信部３３は、携帯電話通信網及びインターネット等を含むネットワークＮを介して、種々の装置との間で通信を行う。本実施の形態において通信部３３は、ネットワークＮを介して、サーバ装置１との間で通信を行う。通信部３３は、処理部３１から与えられたデータを他の装置へ送信すると共に、他の装置から受信したデータを処理部３１へ与える。 The communication unit 33 communicates with various devices via a network N including a mobile phone communication network and the Internet. In the present embodiment, the communication unit 33 communicates with the server device 1 via the network N. The communication unit 33 transmits the data given by the processing unit 31 to another device, and gives the data received from the other device to the processing unit 31.

表示部３４は、液晶ディスプレイ等を用いて構成されており、処理部３１の処理に基づいて種々の画像及び文字等を表示する。 The display unit 34 is configured by using a liquid crystal display or the like, and displays various images, characters, and the like based on the processing of the processing unit 31.

操作部３５は、ユーザの操作を受け付け、受け付けた操作を処理部３１へ通知する。例えば操作部３５は、機械式のボタン又は表示部３４の表面に設けられたタッチパネル等の入力デバイスによりユーザの操作を受け付ける。また例えば操作部３５は、マウス及びキーボード等の入力デバイスであってよく、これらの入力デバイスは端末装置３に対して取り外すことが可能な構成であってもよい。 The operation unit 35 accepts user operations and notifies the processing unit 31 of the accepted operations. For example, the operation unit 35 accepts user operations through an input device such as mechanical buttons or a touch panel provided on the surface of the display unit 34. The operation unit 35 may also be an input device such as a mouse or keyboard, and such input devices may be detachable from the terminal device 3.

カメラ３６は、ＣＣＤ（Charge Coupled Device）イメージセンサ又はＣＭＯＳ（Complementary Metal Oxide Semiconductor）イメージセンサ等の撮像素子を用いて構成されている。カメラ３６は、例えば端末装置３の筐体の適所に配置されている。カメラ３６は、撮像素子により撮影した画像を処理部３１へ与える。本実施の形態においてカメラ３６は、動画像を撮影することができる。なお本実施の形態においては、端末装置３がカメラ３６を備え、カメラ３６にて撮影された動画像データに対して編集処理が行われるものとするが、これに限るものではない。例えばデジタルビデオカメラ等の他の装置で撮影された動画像データを端末装置３が取得したものが編集処理の対象とされてもよく、インターネット等において公開されている動画像データを端末装置３が取得したものが編集対象とされてもよい。 The camera 36 is configured using an image sensor such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor. The camera 36 is arranged, for example, at an appropriate position on the housing of the terminal device 3. The camera 36 supplies the images captured by the image sensor to the processing unit 31. In the present embodiment, the camera 36 can capture moving images. In the present embodiment, the terminal device 3 includes the camera 36 and editing is performed on the moving image data captured by the camera 36, but the present invention is not limited to this. For example, moving image data captured by another device such as a digital video camera and then acquired by the terminal device 3 may be the target of editing, and so may moving image data published on the Internet or the like and acquired by the terminal device 3.

また本実施の形態に係る端末装置３は、記憶部３２に記憶されたプログラム３２ａを処理部３１が読み出して実行することにより、撮影処理部３１ａ、表示処理部３１ｂ、編集指示受付部３１ｃ、動画像送信部３１ｄ及び動画像受信部３１ｅ等がソフトウェア的な機能部として処理部３１に実現される。なおプログラム３２ａは、本実施の形態に係る情報処理システムに専用のプログラムであってもよく、インターネットブラウザ又はウェブブラウザ等の汎用のプログラムであってもよい。プログラム３２ａは、例えば動画像を他のユーザと共有するＳＮＳ（Social Networking Service）のアプリケーションプログラムであってよい。 In the terminal device 3 according to the present embodiment, the processing unit 31 reads out and executes the program 32a stored in the storage unit 32, whereby a shooting processing unit 31a, a display processing unit 31b, an editing instruction receiving unit 31c, a moving image transmitting unit 31d, a moving image receiving unit 31e, and the like are implemented in the processing unit 31 as software functional units. The program 32a may be a program dedicated to the information processing system according to the present embodiment, or may be a general-purpose program such as an Internet browser or web browser. The program 32a may also be, for example, an application program for an SNS (Social Networking Service) through which moving images are shared with other users.

撮影処理部３１ａは、カメラ３６による動画像の撮影に係る処理を行う。撮影処理部３１ａは、例えばカメラ３６のシャッタースピード、フレームレート、ＩＳＯ感度、露出及び絞り値等の制御を行う。また撮影処理部３１ａは、カメラ３６が撮影した動画像のデータを取得し、例えばＭＰ４又はＡＶＩ（Audio Video Interleave）等の適宜のファイル形式の動画像データとして記憶部３２に記憶する。 The shooting processing unit 31a performs processing related to capturing moving images with the camera 36. For example, the shooting processing unit 31a controls the shutter speed, frame rate, ISO sensitivity, exposure, aperture value, and the like of the camera 36. The shooting processing unit 31a also acquires the data of the moving images captured by the camera 36 and stores it in the storage unit 32 as moving image data in an appropriate file format such as MP4 or AVI (Audio Video Interleave).

表示処理部３１ｂは、記憶部３２に記憶された動画像データを再生して表示部３４に表示する処理を行う。本実施の形態において表示処理部３１ｂが表示する動画像データには、カメラ３６によって撮影された動画像データと、サーバ装置１によって編集処理がなされた動画像データとが含まれる。また表示処理部３１ｂは、ユーザが本システムを利用するためのメニュー画面、ホーム画面又は設定画面等の表示を行う。これらの画面表示に必要なデータは、プログラム３２ａと共に記憶部３２に記憶されている。 The display processing unit 31b performs a process of reproducing the moving image data stored in the storage unit 32 and displaying it on the display unit 34. The moving image data displayed by the display processing unit 31b in the present embodiment includes the moving image data taken by the camera 36 and the moving image data edited by the server device 1. In addition, the display processing unit 31b displays a menu screen, a home screen, a setting screen, or the like for the user to use the system. The data required for displaying these screens is stored in the storage unit 32 together with the program 32a.

編集指示受付部３１ｃは、操作部３５に対してなされた操作に基づいて、動画像データに対してサーバ装置１による自動的な編集処理を実施する指示をユーザから受け付ける処理を行う。例えば編集指示受付部３１ｃは、表示処理部３１ｂが動画像データを再生して表示する画面に設けられたボタン又はアイコン等に対するタッチ操作、タップ操作又はクリック操作等に応じて、動画像データに対する編集処理の実施指示を受け付ける。また例えば、表示処理部３１ｂが撮影済みの動画像データのファイル名又はサムネイル画像等の一覧表示を行い、編集指示受付部３１ｃは、一覧表示された動画像データから編集処理の対象とする一又は複数の動画像データの選択操作を受け付けることで、選択された動画像データに対する編集処理の実施指示を受け付ける。なお編集指示受付部３１ｃによる編集指示の受け付け方法は一例であって、これに限るものではない。 The editing instruction receiving unit 31c receives from the user, based on an operation performed on the operation unit 35, an instruction to have the server device 1 automatically edit moving image data. For example, the editing instruction receiving unit 31c accepts an instruction to edit the moving image data in response to a touch, tap, or click operation on a button, icon, or the like provided on the screen on which the display processing unit 31b plays and displays the moving image data. As another example, the display processing unit 31b may display a list of file names, thumbnail images, or the like of captured moving image data, and the editing instruction receiving unit 31c may accept an operation that selects, from the listed data, one or more pieces of moving image data to be edited, thereby accepting an instruction to edit the selected moving image data. These ways of receiving an editing instruction are merely examples, and the editing instruction receiving unit 31c is not limited to them.

動画像送信部３１ｄは、編集指示受付部３１ｃが編集指示を受け付けた場合に、編集対象の動画像データを記憶部３２から読み出し、読み出した動画像データを通信部３３にてサーバ装置１へ送信する処理を行う。動画像送信部３１ｄは、編集処理の実施を依頼するメッセージ等と共に、動画像データをサーバ装置１へ送信する。またこのときに動画像送信部３１ｄは、動画像データの自動編集に対する条件又は設定等の情報をサーバ装置１へ送信してもよい。 When the editing instruction receiving unit 31c receives an editing instruction, the moving image transmitting unit 31d reads the moving image data to be edited from the storage unit 32 and transmits it to the server device 1 via the communication unit 33. The moving image transmitting unit 31d transmits the moving image data to the server device 1 together with a message requesting the editing process, or the like. At this time, the moving image transmitting unit 31d may also transmit information such as conditions or settings for the automatic editing of the moving image data to the server device 1.
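
The request described above pairs the moving image data with a message asking for editing and, optionally, editing conditions. A minimal sketch of such a message follows, assuming a JSON body with base64-encoded video; the field names (`type`, `filename`, `video`, `conditions`) and the `max_length_s` condition are hypothetical, since the source does not define any message format.

```python
import base64
import json
from typing import Optional


def build_edit_request(video_bytes: bytes, filename: str,
                       conditions: Optional[dict] = None) -> str:
    """Serialize a hypothetical automatic-editing request as a JSON string.

    The field names below are illustrative assumptions; the source does not
    define a message format.
    """
    message = {
        "type": "auto_edit_request",  # the message asking the server to edit
        "filename": filename,
        # Binary video data is base64-encoded so that it can travel inside JSON.
        "video": base64.b64encode(video_bytes).decode("ascii"),
    }
    if conditions:  # optional conditions or settings for automatic editing
        message["conditions"] = conditions
    return json.dumps(message)


request = build_edit_request(b"\x00\x01", "sports_day.mp4", {"max_length_s": 60})
```

For real video sizes, a multipart upload or a streamed body would likely be more practical than base64 inside JSON; the sketch only shows the pairing of data, request message, and optional conditions.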

動画像受信部３１ｅは、サーバ装置１が編集処理を行った編集済の動画像データを通信部３３にて受信する処理を行う。動画像受信部３１ｅは、受信した動画像データを記憶部３２に記憶する。なお本実施の形態においては、サーバ装置１は編集した動画像データを端末装置３へ送信し、端末装置３は編集済の動画像データを記憶部３２に記憶する構成とするが、これに限るものではない。例えば、編集済みの動画像データをサーバ装置１が記憶しておき、端末装置３は編集済みの動画像データを再生して表示する場合に、その都度、サーバ装置１から編集済の動画像データを取得して再生する、いわゆるストリーミング再生の構成が採用されてもよい。 The moving image receiving unit 31e receives, via the communication unit 33, the edited moving image data produced by the editing process of the server device 1, and stores the received data in the storage unit 32. In the present embodiment, the server device 1 transmits the edited moving image data to the terminal device 3, which stores it in the storage unit 32, but the present invention is not limited to this. For example, a so-called streaming playback configuration may be adopted, in which the server device 1 stores the edited moving image data and the terminal device 3 acquires it from the server device 1 each time it plays and displays the edited moving image data.

＜編集指示受付処理＞ <Editing instruction reception process>
図５は、端末装置３が表示する動画像再生画面の一例を示す模式図である。本実施の形態に係る端末装置３は、動画像再生画面を表示部３４に表示して、動画像データの再生及び表示を行う。例えば端末装置３は、動画像再生画面の最上部に、再生する動画像のタイトルを示す「タイトル：小学校の運動会」等の文字列を表示する。動画像のタイトルは、例えばユーザが予め付したものであってもよく、また例えば動画像データに付されたファイル名等であってもよい。端末装置３は、タイトルの文字列の下方に設けられた矩形の再生領域に、動画像を再生して表示する。再生領域の下部には、再生、停止及び早送り等を制御するためのアイコン等が並べて設けられており、端末装置３はこれらのアイコンに対するユーザの操作に応じて動画像の再生を制御する。 FIG. 5 is a schematic diagram showing an example of the moving image reproduction screen displayed by the terminal device 3. The terminal device 3 according to the present embodiment displays the moving image reproduction screen on the display unit 34 and plays and displays moving image data on it. For example, the terminal device 3 displays, at the top of the moving image reproduction screen, a character string such as "Title: Elementary school athletic meet" indicating the title of the moving image to be played. The title may be, for example, one assigned in advance by the user, or the file name or the like attached to the moving image data. The terminal device 3 plays and displays the moving image in a rectangular reproduction area provided below the title string. Icons and the like for controlling playback, stop, fast forward, and so on are arranged side by side below the reproduction area, and the terminal device 3 controls playback of the moving image according to the user's operations on these icons.

また操作用のアイコンの下方には動画像データに関する情報を表示する情報表示領域が設けられており、端末装置３は、例えば動画像データが撮影された日時又は場所等の情報をこの領域に表示する。本実施の形態において、動画像再生画面の情報表示領域の下部には、「自動編集」のラベルが付されたアイコン１０１と、「動画共有」のラベルが付されたアイコン１０２とが並べて設けられている。アイコン１０１は、動画像再生画面に表示されている動画像に対するユーザからの自動編集の指示を受け付けるためのものである。アイコン１０１に対するタッチ操作、タップ操作又はクリック操作等を受け付けた場合、端末装置３は、動画像データをサーバ装置１へ送信して、この動画像データの自動編集を依頼する。アイコン１０２は、動画像データを例えば動画像共有サイト又はＳＮＳ等へ投稿し、一又は複数の他ユーザにこの動画像データを公開するための操作を受け付ける。動画像データの共有については、既存の技術であるため、詳細な説明は省略する。 Below the operation icons, an information display area that displays information related to the moving image data is provided, and the terminal device 3 displays in this area information such as the date and time or place at which the moving image data was captured. In the present embodiment, an icon 101 labeled "Automatic editing" and an icon 102 labeled "Video sharing" are arranged side by side at the bottom of the information display area of the moving image reproduction screen. The icon 101 is for accepting from the user an instruction to automatically edit the moving image displayed on the moving image reproduction screen. When a touch, tap, click, or similar operation on the icon 101 is accepted, the terminal device 3 transmits the moving image data to the server device 1 and requests automatic editing of that data. The icon 102 accepts an operation for posting the moving image data to, for example, a video sharing site or an SNS and publishing it to one or more other users. Since sharing of moving image data is an existing technology, a detailed description is omitted.

端末装置３からの編集処理の依頼に応じてサーバ装置１は、端末装置３から送信される動画像データを取得して、この動画像データに対する編集処理を自動的に行う。本実施の形態においてサーバ装置１は、元の動画像データから適宜のシーンを抽出し、抽出した各シーンに適宜の画像処理及び音声処理等を施して結合することによりダイジェスト動画像を作成する編集処理を自動的に行う。サーバ装置１は、編集済みの動画像データを、依頼元の端末装置３へ送信する。 In response to an editing request from the terminal device 3, the server device 1 acquires the moving image data transmitted from the terminal device 3 and automatically edits it. In the present embodiment, the server device 1 automatically performs an editing process that creates a digest moving image by extracting suitable scenes from the original moving image data, applying suitable image processing, audio processing, and the like to each extracted scene, and combining the results. The server device 1 transmits the edited moving image data to the requesting terminal device 3.
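
The digest creation described above has three stages: extract scenes, process each scene, and combine them. A minimal sketch of that composition follows, assuming each stage is supplied as a callable; the function and parameter names are hypothetical, and the toy usage treats a "video" as a list of frame numbers purely for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

Clip = TypeVar("Clip")  # any clip representation: file path, frame list, ...


def make_digest(video: Clip,
                extract_scenes: Callable[[Clip], List[Clip]],
                edit_scene: Callable[[Clip], Clip],
                concatenate: Callable[[Sequence[Clip]], Clip]) -> Clip:
    """Extract scenes, edit each one, and join the results into a digest."""
    scenes = extract_scenes(video)             # 1. pick suitable scenes
    edited = [edit_scene(s) for s in scenes]   # 2. image/audio processing per scene
    return concatenate(edited)                 # 3. combine into one digest


# Toy usage: a "video" is a list of frame numbers.
digest = make_digest(
    list(range(10)),
    extract_scenes=lambda v: [v[0:2], v[5:7]],
    edit_scene=lambda s: [x * 10 for x in s],
    concatenate=lambda parts: [x for p in parts for x in p],
)
```

Injecting the stages keeps the pipeline independent of any particular extraction method, which matches the statement that the server device 1 may use any method to extract scenes.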

アイコン１０１に対する操作に応じて動画像データをサーバ装置１へ送信した端末装置３は、例えば「しばらくお待ちください」等のメッセージを動画像再生画面に重畳して表示し、サーバ装置１から送信される編集済みの動画像データの受信を待機する。編集済みの動画像データを受信した場合、端末装置３は、編集前の動画像データに代えて、サーバ装置１から受信した編集済みの動画像データを再生領域にて再生して表示する。 After transmitting the moving image data to the server device 1 in response to an operation on the icon 101, the terminal device 3 displays a message such as "Please wait" superimposed on the moving image reproduction screen and waits to receive the edited moving image data from the server device 1. When it receives the edited moving image data, the terminal device 3 plays and displays the edited data received from the server device 1 in the reproduction area in place of the unedited moving image data.

また編集済みの動画像データを視聴したユーザは、例えば動画像データに対して行われた編集処理の内容に不満等を有する場合、自動編集のアイコン１０１に対する操作を再び行うことで、動画像データに対する再編集処理を依頼することが可能であってもよい。サーバ装置１は、端末装置３からの再編集の依頼に応じて、動画像データに対して以前とは異なる内容の編集処理を施して、端末装置３へ送信することができる。 A user who has watched the edited moving image data and is, for example, dissatisfied with the editing applied to it may be able to request re-editing of the moving image data by operating the automatic editing icon 101 again. In response to a re-editing request from the terminal device 3, the server device 1 can edit the moving image data differently from the previous time and transmit the result to the terminal device 3.
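
The source says only that a re-edit uses content different from the previous edit. One simple way to guarantee that, sketched here as an assumption rather than the patent's method, is to remember the methods used last time and choose randomly among the remaining candidates; the function name and the method strings are hypothetical.

```python
import random
from typing import Optional, Sequence


def choose_reedit_method(candidates: Sequence[str],
                         previously_used: Sequence[str],
                         rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick an editing method that was not used in the previous edit.

    Returns None when every candidate has already been used.
    """
    if rng is None:
        rng = random.Random()
    unused = [c for c in candidates if c not in previously_used]
    if not unused:
        return None  # nothing new left to try
    return rng.choice(unused)
```

Passing a seeded `random.Random` makes the choice reproducible, which is convenient when testing the re-edit path.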

図６は、本実施の形態に係る端末装置３が行う処理の手順を示すフローチャートである。本実施の形態に係る端末装置３の処理部３１の表示処理部３１ｂは、例えば記憶部３２に記憶された一の動画像データの選択操作等を受け付けた場合に、選択された動画像データを再生すべく、図５に示した構成の動画像再生画面を表示部３４に表示する（ステップＳ１）。表示処理部３１ｂは、例えば動画像再生画面に設けられた再生アイコン等に対する操作を受け付けたか否かに基づいて、再生操作がなされたか否かを判定する（ステップＳ２）。再生操作がなされた場合（Ｓ２：ＹＥＳ）、表示処理部３１ｂは、記憶部３２から動画像データを読み出し、読み出した動画像データを再生して表示部３４に表示する再生処理を行い（ステップＳ３）、処理を終了する。 FIG. 6 is a flowchart showing the procedure of the processing performed by the terminal device 3 according to the present embodiment. When, for example, an operation selecting one piece of moving image data stored in the storage unit 32 is accepted, the display processing unit 31b of the processing unit 31 of the terminal device 3 displays the moving image reproduction screen configured as shown in FIG. 5 on the display unit 34 in order to play the selected data (step S1). The display processing unit 31b determines whether a playback operation has been performed, based on, for example, whether an operation on a playback icon or the like provided on the moving image reproduction screen has been accepted (step S2). When a playback operation has been performed (S2: YES), the display processing unit 31b reads the moving image data from the storage unit 32, plays it, and displays it on the display unit 34 (step S3), and the processing ends.

再生操作がなされていない場合（Ｓ２：ＮＯ）、処理部３１の編集指示受付部３１ｃは、動画像再生画面に設けられた自動編集のアイコン１０１に対する操作がなされたか否かに基づいて、ユーザから動画像データの自動編集の指示が与えられたか否かを判定する（ステップＳ４）。自動編集の指示が与えられていない場合（Ｓ４：ＮＯ）、処理部３１は、ステップＳ１へ処理を戻す。 When no playback operation has been performed (S2: NO), the editing instruction receiving unit 31c of the processing unit 31 determines whether the user has given an instruction to automatically edit the moving image data, based on whether the automatic editing icon 101 provided on the moving image reproduction screen has been operated (step S4). When no automatic editing instruction has been given (S4: NO), the processing unit 31 returns the processing to step S1.

自動編集の指示が与えられた場合（Ｓ４：ＹＥＳ）、処理部３１の動画像送信部３１ｄは、編集対象の動画像データを記憶部３２から読み出して、読み出した動画像データをサーバ装置１へ送信する（ステップＳ５）。その後、処理部３１の動画像受信部３１ｅは、サーバ装置１が送信する編集済みの動画像データを受信したか否かを判定する（ステップＳ６）。編集済みの動画像データを受信していない場合（Ｓ６：ＮＯ）、動画像受信部３１ｅは、サーバ装置１から動画像データを受信するまで待機する。サーバ装置１から編集済みの動画像データを動画像受信部３１ｅが受信した場合（Ｓ６：ＹＥＳ）、表示処理部３１ｂは、受信した編集済みの動画像データを再生して表示部３４に表示する再生処理を行い（ステップＳ７）、処理を終了する。 When an automatic editing instruction has been given (S4: YES), the moving image transmitting unit 31d of the processing unit 31 reads the moving image data to be edited from the storage unit 32 and transmits it to the server device 1 (step S5). The moving image receiving unit 31e of the processing unit 31 then determines whether the edited moving image data transmitted by the server device 1 has been received (step S6). If it has not been received (S6: NO), the moving image receiving unit 31e waits until the moving image data is received from the server device 1. When the moving image receiving unit 31e has received the edited moving image data from the server device 1 (S6: YES), the display processing unit 31b plays the received edited data and displays it on the display unit 34 (step S7), and the processing ends.
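
The flowchart steps S1 to S7 can be sketched as a small control loop. This is a minimal sketch assuming the UI and network actions are injected as callables (all names are hypothetical); the polling loop for S6 stands in for waiting, whereas a real application would more likely block on the network response or use a callback.

```python
from typing import Callable, Optional


def terminal_flow(show_screen: Callable[[], None],
                  play_pressed: Callable[[], bool],
                  auto_edit_pressed: Callable[[], bool],
                  send_video: Callable[[], None],
                  poll_edited: Callable[[], Optional[str]],
                  play: Callable[[str], None]) -> str:
    """Steps S1-S7 of FIG. 6, with UI and network actions injected as callables."""
    while True:
        show_screen()                  # S1: show the reproduction screen
        if play_pressed():             # S2: playback operation?
            play("original")           # S3: play the stored data
            return "played_original"
        if auto_edit_pressed():        # S4: automatic editing instructed?
            send_video()               # S5: send the data to the server
            edited = poll_edited()     # S6: has edited data arrived?
            while edited is None:      # S6: NO -> keep waiting
                edited = poll_edited()
            play(edited)               # S7: play the edited data
            return "played_edited"
        # S4: NO -> loop back to S1
```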

＜編集処理＞ <Editing process>
図７は、サーバ装置１が行う動画像データの編集処理を説明するための模式図である。本実施の形態に係るサーバ装置１は、端末装置３から編集依頼と共に送信される動画像データに対して編集処理を行い、ダイジェスト動画像データを生成する処理を行う。本図においては、端末装置３からサーバ装置１が取得した動画像データを全体動画像として最上部に図示している。 FIG. 7 is a schematic diagram for explaining the editing process that the server device 1 performs on moving image data. The server device 1 according to the present embodiment edits the moving image data transmitted from the terminal device 3 together with an editing request, and generates digest moving image data. In this figure, the moving image data that the server device 1 acquired from the terminal device 3 is shown at the top as the whole moving image.

まず本実施の形態に係るサーバ装置１は、この全体動画像から一又は複数の部分動画像を抽出する処理を行う。なお、全体動画像から部分動画像を抽出してダイジェスト動画像を生成する技術は既存のものであるため、サーバ装置１が行う部分動画像の抽出方法の詳細な説明は省略する。サーバ装置１は、どのような方法で全体動画像から部分動画像を抽出してもよい。図示の例では、サーバ装置１は、全体動画像から３つの部分動画像を抽出している。 First, the server device 1 according to the present embodiment extracts one or more partial moving images from the whole moving image. Since techniques for generating a digest moving image by extracting partial moving images from a whole moving image are already known, a detailed description of how the server device 1 extracts the partial moving images is omitted; the server device 1 may use any extraction method. In the illustrated example, the server device 1 extracts three partial moving images from the whole moving image.

例えばサーバ装置１は、全体動画像の中で笑顔の人が映されている部分、笑顔の人がより多く映されている部分を抽出することができる。また例えばサーバ装置１は、特定の人又は物が映されている部分を抽出することができる。また例えばサーバ装置１は、全体動画像の中に映されている人又は物等の動作又は変化等が大きい部分を抽出することができ、動作又は変化等が小さい部分を抽出してもよい。また例えばサーバ装置１は、動画像に映されている人の会話の音量が閾値を超えた部分を抽出することができる。上記の部分動画像の抽出方法は一例であって、これに限るものではない。 For example, the server device 1 can extract portions of the whole moving image in which a smiling person appears, or in which more smiling people appear. As another example, the server device 1 can extract portions in which a specific person or object appears. The server device 1 can also extract portions in which a person, object, or the like shown in the whole moving image moves or changes significantly, or, conversely, portions in which the movement or change is small. Further, the server device 1 can extract portions in which the volume of the conversation of people shown in the moving image exceeds a threshold. These extraction methods are merely examples, and the extraction is not limited to them.
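
The threshold-based criteria above (for example, conversation volume exceeding a threshold) reduce to finding runs of frames whose per-frame score is above a threshold. A minimal sketch follows, assuming a per-frame score such as loudness or number of smiling faces is already available; the `min_len` filter that drops very short runs is an added assumption, not something the source specifies.

```python
from typing import List, Optional, Sequence, Tuple


def extract_segments(scores: Sequence[float], threshold: float,
                     min_len: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges where score > threshold.

    Runs shorter than min_len frames are discarded.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i                          # a run above the threshold begins
        elif score <= threshold and start is not None:
            if i - start >= min_len:           # keep runs that are long enough
                segments.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))  # run reaching the last frame
    return segments
```

Each returned range can then be cut out as one partial moving image.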

次いでサーバ装置１は、抽出した部分動画像毎に、映されているシーンの特徴を特定する処理を行う。本実施の形態においてサーバ装置１は、記憶部１２に記憶した特徴特定モデル１２ｂに部分動画像のデータを入力し、この部分動画像データに含まれるシーンの特徴を特定した出力情報を特徴特定モデル１２ｂから取得する。図示の例では、部分動画像に対して天候、撮影場所、時間帯、自動車の有無及び状態、人物１（子供且つ男）の有無及び状態、並びに、人物２（大人且つ女）の有無及び状態等の特徴が特定されている。また自動車に関しては走行しているか又は停車しているかが特徴として特定され、人物１に関しては笑顔であるか否かが特徴として特定され、人物２に関しては話しているか否かが特徴として特定されている。なお図示の特徴は一例であって、これに限るものではなく、動画像データからはこれら以外の種々の特徴が特定されてよい。例えば動画像データに自転車、建物、木、植物又は動物等が映されているか否かを特徴として特定としてもよく、映っている人又は物の属性として表情、年齢、性別又は動作等の種類を特徴として特定してもよい。 Next, for each extracted partial moving image, the server device 1 identifies the features of the scene shown in it. In the present embodiment, the server device 1 inputs the data of the partial moving image into the feature identification model 12b stored in the storage unit 12 and obtains from the model output information identifying the features of the scenes contained in the partial moving image data. In the illustrated example, features such as the weather, shooting location, time of day, presence and state of a car, presence and state of person 1 (a boy), and presence and state of person 2 (an adult woman) are identified for the partial moving images. For the car, whether it is moving or stopped is identified as a feature; for person 1, whether he is smiling; and for person 2, whether she is talking. The illustrated features are merely examples, and various other features may be identified from the moving image data. For example, whether a bicycle, building, tree, plant, animal, or the like appears in the moving image data may be identified as a feature, and attributes of a person or object shown, such as facial expression, age, gender, or type of motion, may also be identified as features.

特徴特定モデル１２ｂには、例えば動画像データを構成する複数のフレームが時系列で入力される。特徴特定モデル１２ｂは、時系列で入力される各フレームについて上記の特徴を特定する。特徴特定モデル１２ｂが出力する情報に基づいてサーバ装置１は、動画像データに対して特定された時系列の特徴の情報を取得することができる。サーバ装置１は、動画像データを構成するフレーム毎に特徴を特定してもよく、所定数のフレームの集まり毎に特徴を特定してもよく、動画像データの再生時間において１秒等の所定時間毎に特徴を特定してもよい。いずれにしても本実施の形態に係るサーバ装置１が特定する特徴は、時間の経過に従って変化し得る時系列の特徴である。 For example, the multiple frames constituting the moving image data are input to the feature identification model 12b in time series, and the model identifies the features above for each frame as it is input. Based on the information output by the feature identification model 12b, the server device 1 can obtain the time-series features identified for the moving image data. The server device 1 may identify features for every frame of the moving image data, for every group of a predetermined number of frames, or for every predetermined interval of playback time, such as one second. In any case, the features identified by the server device 1 according to the present embodiment are time-series features that can change over time.
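
Identifying features per group of frames or per second, as described above, can be derived from per-frame labels. A minimal sketch, assuming majority voting within each window (the source does not say how per-frame results are combined); with a known frame rate, `window` equal to the frames per second yields one label per second of playback time.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def aggregate_labels(frame_labels: Sequence[Hashable], window: int) -> List[Hashable]:
    """Reduce per-frame labels to one label per window of `window` frames.

    The label that occurs most often in each window wins; ties go to the label
    seen first in that window. A trailing partial window is kept.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    for start in range(0, len(frame_labels), window):
        chunk = frame_labels[start:start + window]
        result.append(Counter(chunk).most_common(1)[0][0])
    return result
```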

サーバ装置１は、特徴特定モデル１２ｂとして、例えば動画像データの入力に対して、この動画像データに映されているシーンの天候が晴、曇、雨及び雪等のいずれであるかを分類する学習モデルを用いることができる。この特徴特定モデル１２ｂは、例えば動画像データと天候のラベルとが対応付けられた教師データを用いて予め機械学習がなされることで生成される。またサーバ装置１は、特徴特定モデル１２ｂとして、例えば動画像データの入力に対して、この動画像データに映されているシーンの撮影場所が屋内又は屋外のいずれであるかを分類する学習モデルを用いることができる。またサーバ装置１は、例えば動画像データの入力に対して、この動画像データに映されているシーンの時間帯が朝、昼間、夕方又は夜間のいずれであるかを分類する学習モデルを用いることができる。 As the feature identification model 12b, the server device 1 can use, for example, a learning model that receives moving image data as input and classifies the weather of the scene shown in it as sunny, cloudy, rainy, snowy, or the like. Such a feature identification model 12b is generated in advance, for example, by machine learning using training data in which moving image data is associated with weather labels. The server device 1 can also use, as the feature identification model 12b, a learning model that classifies whether the scene shown in the input moving image data was shot indoors or outdoors, and a learning model that classifies whether the time of day of the scene is morning, daytime, evening, or night.

またサーバ装置１は、特徴特定モデル１２ｂとして、例えば動画像データの入力に対して、この動画像データに自動車が映されているか否か、写されている自動車の位置、及び、写されている自動車の属性として走行しているか又は停車しているか等を特定する学習モデルを用いることができる。図示の例では、部分動画像１において停車している自動車が映されている期間が特定されている。サーバ装置１は、特徴特定モデル１２ｂから時系列の特徴を取得することによって、部分動画像１に自動車が映されているか否かのみでなく、部分動画像１に自動車が映されている期間を特定することができる。 The server device 1 can also use, as the feature identification model 12b, a learning model that identifies, for input moving image data, whether a car appears in the data, the position of the car, and, as an attribute of the car, whether it is moving or stopped. In the illustrated example, the period during which a stopped car appears in partial moving image 1 is identified. By obtaining time-series features from the feature identification model 12b, the server device 1 can identify not only whether a car appears in partial moving image 1 but also the period during which it appears.
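
Identifying the period during which a stopped car appears amounts to grouping consecutive frames that share a state and converting frame indices to seconds. A minimal sketch, assuming per-frame states where `None` means the object is absent and a constant frame rate `fps`; the state strings are illustrative.

```python
from itertools import groupby
from typing import Hashable, List, Optional, Sequence, Tuple


def feature_periods(states: Sequence[Optional[Hashable]],
                    fps: float) -> List[Tuple[Hashable, float, float]]:
    """Turn per-frame states into (state, start_s, end_s) periods.

    None means "not present in this frame" and produces no period.
    """
    periods = []
    frame = 0
    for state, run in groupby(states):
        length = sum(1 for _ in run)           # number of consecutive frames
        if state is not None:
            periods.append((state, frame / fps, (frame + length) / fps))
        frame += length
    return periods
```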

同様にサーバ装置１は、特徴特定モデル１２ｂとして、例えば動画像データの入力に対して、この動画像データに人が映されているか否か、写されている人の位置、並びに、写されている人の属性として表情、年齢及び性別等を特定する学習モデルを用いることができる。図示の例では、部分動画像１において子供の男性が映されている期間が特定され、更にこの子供の表情が笑顔である期間が特定されている。また図示の例では、部分動画像１において大人の女性が映されている期間が特定され、更にこの女性が話している期間が特定されている。 Similarly, the server device 1 can use, as the feature identification model 12b, a learning model that identifies, for input moving image data, whether a person appears in the data, the position of the person, and attributes of the person such as facial expression, age, and gender. In the illustrated example, the period during which a boy appears in partial moving image 1 is identified, as is the period during which the boy is smiling. Likewise, the period during which an adult woman appears in partial moving image 1 is identified, as is the period during which she is talking.

サーバ装置１は、上記のような複数の特徴を動画像データから特定する複数の学習モデルを、記憶部１２に特徴特定モデル１２ｂとして記憶している。サーバ装置１は、これら複数の学習モデルに対してそれぞれ動画像データを入力し、各学習モデルが出力する情報を取得することによって、動画像データから複数の特徴を特定することができる。ただし特徴特定モデル１２ｂは、これら複数の特徴を出力する１つの学習モデルとして生成されてもよい。 The server device 1 stores in the storage unit 12, as the feature identification model 12b, multiple learning models that identify the features described above from moving image data. By inputting the moving image data to each of these learning models and obtaining the information each model outputs, the server device 1 can identify multiple features from the moving image data. Alternatively, the feature identification model 12b may be generated as a single learning model that outputs all of these features.
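
Running several single-feature models and merging their outputs per frame can be sketched as follows. The "models" here are hypothetical toy lambdas standing in for trained classifiers, and frames are represented as small dicts purely for illustration; a real system would pass image tensors to actual learning models.

```python
from typing import Any, Callable, Dict, Iterable, List, Mapping

# A "model" here is any callable mapping one frame to one feature value.
Model = Callable[[Any], Any]


def identify_features(frames: Iterable[Any],
                      models: Mapping[str, Model]) -> List[Dict[str, Any]]:
    """Run every model on every frame and merge the outputs per frame."""
    return [{name: model(frame) for name, model in models.items()}
            for frame in frames]


# Toy stand-ins for trained classifiers (hypothetical, for illustration only).
models = {
    "weather": lambda f: "sunny" if f["brightness"] > 0.5 else "cloudy",
    "location": lambda f: "outdoor" if f["sky_visible"] else "indoor",
}
frames = [{"brightness": 0.9, "sky_visible": True},
          {"brightness": 0.2, "sky_visible": False}]
features = identify_features(frames, models)
```

The result is one feature dictionary per frame, which is the time-series form that the later table lookup consumes. A single multi-output model would simply be one callable returning the whole dictionary.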

各部分動画像について時系列の特徴を特定したサーバ装置１は、特定した特徴に基づいて記憶部１２の編集方法決定テーブル１２ｃを参照することによって、各部分動画像に対して行う編集処理を決定する。サーバ装置１は、特定された部分動画像についての時系列の特徴の組み合わせに合致する組み合わせを、編集方法決定テーブル１２ｃにシーン情報として定められた複数の組み合わせの中から検索する。部分動画像から特定した特徴の組み合わせに合致する組み合わせが編集方法決定テーブル１２ｃに存在する場合、サーバ装置１は、この組み合わせに対応付けられた編集方法を編集方法決定テーブル１２ｃから取得する。 Having identified the time-series features of each partial moving image, the server device 1 determines the editing process to apply to each partial moving image by referring to the editing method determination table 12c in the storage unit 12 based on the identified features. The server device 1 searches the combinations defined as scene information in the editing method determination table 12c for one that matches the combination of time-series features identified for the partial moving image. When such a matching combination exists in the table, the server device 1 obtains the editing methods associated with it from the editing method determination table 12c.

例えば、図７に示す例では、天候が晴、撮影場所が屋外、時間帯が昼間、且つ、子供の男性が笑顔という特徴の組み合わせが特定されるタイミングが存在している。これらの特徴の組み合わせは、図３に示す編集方法決定テーブル１２ｃの１番目に存在しており、サーバ装置１は、対応する編集方法として「笑いエフェクト画像追加」及び「明るいＢＧＭ」の２種を編集方法決定テーブル１２ｃから取得することができる。 For example, in the example shown in FIG. 7, there is a point in time at which the combination of features "weather: sunny, shooting location: outdoors, time of day: daytime, and boy: smiling" is identified. This combination is the first entry in the editing method determination table 12c shown in FIG. 3, so the server device 1 can obtain the two corresponding editing methods, "add laughter effect image" and "bright BGM", from the editing method determination table 12c.
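
The lookup described above can be sketched as a scan over table rows. This sketch assumes a row matches when every condition in its scene information holds for the identified features, with extra identified features ignored; the source says only that the combination must "match", so this subset-matching rule is an assumption. Only the FIG. 3 row described in the text is included, and the key names are hypothetical.

```python
from typing import Any, List, Mapping, Sequence, Tuple

# (scene information, editing methods); only the row described for FIG. 3 is shown.
EDIT_TABLE: Sequence[Tuple[Mapping[str, Any], Sequence[str]]] = [
    ({"weather": "sunny", "location": "outdoor",
      "time_of_day": "daytime", "boy": "smiling"},
     ["add laughter effect image", "bright BGM"]),
]


def lookup_edit_methods(features: Mapping[str, Any],
                        table: Sequence[Tuple[Mapping[str, Any], Sequence[str]]] = EDIT_TABLE
                        ) -> List[str]:
    """Return the methods of the first row whose conditions all hold."""
    for scene, methods in table:
        if all(features.get(key) == value for key, value in scene.items()):
            return list(methods)
    return []  # no matching combination in the table
```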

サーバ装置１は、部分動画像に対して時系列に特定した複数の特徴の組み合わせに対して、時系列に編集方法を編集方法決定テーブル１２ｃから取得して、時系列に編集方法を決定する。例えば、部分動画像のフレーム毎に特徴が特定されている場合、サーバ装置１は、フレーム毎に編集方法を決定することができる。ただしサーバ装置１は、フレーム毎に特徴が特定されている場合であっても、例えば１０フレーム毎又は１００フレーム毎等の所定フレーム毎に編集方法を決定してもよく、また例えば動画像の再生時間の１秒毎又は１０秒毎等の所定時間毎に編集方法を決定してもよい。また例えばサーバ装置１は、部分動画像において最も長い時間に亘って現れる特徴の組み合わせを１つ又は複数選択して代表の特徴とし、選択した代表の特徴に基づいてこの部分動画像に対する編集方法を決定してもよい。 For the combinations of features identified in time series for a partial moving image, the server device 1 obtains editing methods from the editing method determination table 12c in time series, thereby determining the editing methods in time series. For example, when features have been identified for every frame of the partial moving image, the server device 1 can determine an editing method for every frame. Even in that case, however, the server device 1 may determine an editing method for every predetermined number of frames, such as every 10 or 100 frames, or for every predetermined interval of playback time, such as every 1 or 10 seconds. As another example, the server device 1 may select, as representative features, one or more feature combinations that appear for the longest time in the partial moving image, and determine the editing method for the partial moving image based on the selected representative features.
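
Selecting the feature combinations that appear for the longest time can be sketched by counting how many frames each combination covers. This minimal sketch assumes per-frame feature dictionaries with hashable values and equal frame durations, so frame counts are proportional to time; it counts total frames rather than the longest continuous run, which is one reading of "longest time".

```python
from collections import Counter
from typing import Any, Dict, List, Mapping, Sequence


def representative_combinations(frame_features: Sequence[Mapping[str, Any]],
                                k: int = 1) -> List[Dict[str, Any]]:
    """Return the k feature combinations that cover the most frames."""
    # Dicts are unhashable, so each combination is frozen into a sorted tuple.
    keys = [tuple(sorted(f.items())) for f in frame_features]
    return [dict(combo) for combo, _ in Counter(keys).most_common(k)]
```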

全体動画像から抽出した全ての部分動画像について編集方法を決定したサーバ装置１は、決定した編集方法による部分動画像の編集処理を行う。本実施の形態においてサーバ装置１は、例えば以下の編集処理を行うことができる。ただし以下に列挙する編集処理は一例であって、サーバ装置１はこれら以外の様々な編集処理を行ってよい。 Having determined editing methods for all the partial moving images extracted from the whole moving image, the server device 1 edits each partial moving image using the determined methods. In the present embodiment, the server device 1 can perform, for example, the editing processes listed below. These are merely examples, and the server device 1 may perform various other editing processes.
・装飾画像、キャラクタ画像又はエフェクト画像等の画像を重畳する編集処理
・動画像に含まれるシーンの時間帯を変更する編集処理
・動画像のスタイルを変更する編集処理
・効果音又は背景音を追加する編集処理
・音声について声色又は声音等を変更する編集処理
・再生速度を変更する編集処理
- Editing process that superimposes images such as decorative images, character images, or effect images
- Editing process that changes the time of day of a scene in the moving image
- Editing process that changes the style of the moving image
- Editing process that adds sound effects or background sound
- Editing process that changes the voice quality or tone of the audio
- Editing process that changes the playback speed
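
Of the editing processes listed above, the playback speed change is the simplest to illustrate. A minimal sketch, assuming speed is changed by dropping or repeating source frames at a fixed output frame rate; this is one common approach, not necessarily the patent's, and it ignores audio and frame interpolation.

```python
import math
from fractions import Fraction
from typing import List


def resample_for_speed(n_frames: int, speed: float) -> List[int]:
    """Source-frame indices to show, in order, to play n_frames at `speed`x.

    speed > 1 skips frames (fast forward); speed < 1 repeats frames (slow motion).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Work with an exact fraction to avoid floating-point drift in i * step.
    step = Fraction(speed).limit_denominator(1000)
    count = math.ceil(Fraction(n_frames) / step)
    return [int(i * step) for i in range(count)]
```

For example, `resample_for_speed(10, 2)` keeps every second frame, while a speed of 0.5 shows each frame twice.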

FIG. 8 is a schematic diagram illustrating an example of the editing process that superimposes images. In this process, the server device 1 can, for example, turn a person in the moving image into an astronaut by superimposing a decorative image of a space suit on that person. The server device 1 can also decorate a person by superimposing a decorative image of glasses or sunglasses on the person's face. It can likewise change a person's outfit by superimposing a decorative image of clothing. As further examples, the server device 1 can superimpose a character image of a chick near a baby shown in the moving image. It can also superimpose an effect image on a soccer ball at a soccer player's feet.

The images superimposed by the server device 1 may be animated so that they move as the moving image plays. The image data to be superimposed is stored in advance in the storage unit 12. The choice among the many prestored images can be made in several ways. It may be specified in the editing method determination table 12c, or it may be an image associated in advance with a particular identified feature. It may also be chosen at random, or decided by various other methods. The server device 1 then performs image processing to superimpose the selected image at an appropriate position in the original moving image. To determine the position and orientation of the superimposed image, the server device 1 can, for example, detect the positions of people or objects in the moving image. It can also detect people's faces and postures, and the moving direction or speed of people or objects.
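Superimposing a prestored image at a detected position amounts to alpha compositing. Below is a minimal NumPy sketch under these assumptions:

- The overlay is an RGBA array and the frame is an RGB array.
- The top-left position (`top`, `left`) has already been derived from a face, person, or object detector. The detector is not shown.
- The function name is hypothetical.

```python
import numpy as np


def overlay_rgba(frame: np.ndarray, overlay: np.ndarray, top: int, left: int) -> np.ndarray:
    """Alpha-blend an RGBA `overlay` onto an RGB `frame` with its top-left corner at (top, left).

    Parts of the overlay that fall outside the frame are clipped.
    Returns a new uint8 frame; the input frame is not modified.
    """
    out = frame.astype(np.float32)  # astype returns a copy
    oh, ow = overlay.shape[:2]
    fh, fw = frame.shape[:2]
    # Intersection of the overlay rectangle with the frame.
    y0, x0 = max(top, 0), max(left, 0)
    y1, x1 = min(top + oh, fh), min(left + ow, fw)
    if y0 >= y1 or x0 >= x1:
        return frame.copy()  # overlay lies entirely outside the frame
    patch = overlay[y0 - top:y1 - top, x0 - left:x1 - left].astype(np.float32)
    alpha = patch[..., 3:4] / 255.0  # per-pixel opacity in [0, 1]
    region = out[y0:y1, x0:x1]
    out[y0:y1, x0:x1] = alpha * patch[..., :3] + (1.0 - alpha) * region
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

To animate the overlay, call the function once per frame with a position, or a different overlay image, that changes over time.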

The server device 1 can also change the time of day of a scene included in the moving image. It does so by, for example, applying image processing that changes the hue, saturation, and brightness of the original moving image. In this way, the server device 1 can turn daytime into evening or night, night into daytime or evening, or evening into daytime or night.
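As a rough illustration of a rule-based time-of-day change, the sketch below uses Python's standard `colorsys` module. It converts each pixel to HSV, rotates the hue, and scales saturation and brightness (value). The preset names and numbers are invented for illustration and are not from the source. A production system would process whole frames with a vectorized library rather than per-pixel Python loops.

```python
import colorsys
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical presets: (hue rotation in turns, saturation scale, value scale).
PRESETS: Dict[str, Tuple[float, float, float]] = {
    "evening": (-0.02, 1.2, 0.75),  # small hue rotation (yellows drift toward orange), richer, darker
    "night": (0.0, 0.5, 0.4),       # desaturated and much darker
}


def _clamp01(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def adjust_pixel(rgb: RGB, hue_shift: float, sat_scale: float, val_scale: float) -> RGB:
    """Rotate the hue and scale saturation/value of one 8-bit RGB pixel."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb))
    h = (h + hue_shift) % 1.0
    r, g, b = colorsys.hsv_to_rgb(h, _clamp01(s * sat_scale), _clamp01(v * val_scale))
    return (round(r * 255), round(g * 255), round(b * 255))


def change_time_of_day(frame: List[List[RGB]], preset: str) -> List[List[RGB]]:
    """Apply a time-of-day preset to a frame given as rows of RGB tuples."""
    hue_shift, sat_scale, val_scale = PRESETS[preset]
    return [[adjust_pixel(p, hue_shift, sat_scale, val_scale) for p in row] for row in frame]
```

For example, a white pixel under the "night" preset becomes mid-gray, and black stays black under every preset.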

The server device 1 can also change the style of the moving image using a so-called style transfer learning model. This is an existing technique, so a detailed description is omitted. Briefly, a style transfer learning model outputs a converted image in which the style (artistic style) of an input image has been converted. It is trained in advance by machine learning so that the converted image is close to both the input image and a style image. One learning model is generated for each target style, and the server device 1 can store and use multiple style transfer learning models in the storage unit 12. The target style can be chosen in several ways. It may be specified in the editing method determination table 12c, or it may be a style associated in advance with a particular identified feature. It may also be chosen at random, or decided by various other methods. In this way, the server device 1 can, for example, convert a moving image shot in an ordinary style into a painting-like or anime-like style.
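The source treats style transfer as an existing technique and does not name a specific method. One well-known formulation (Gatys et al.) scores a candidate image on two terms. The content term measures how closely its CNN feature maps match those of the input image. The style term measures how closely the Gram matrices of its feature maps match those of the style image. Per-style feed-forward networks fit the description of "one model per style" and are commonly trained with this kind of loss. The NumPy sketch below computes that objective for a single layer. It assumes the feature maps have already been extracted by some pretrained network, which is not shown, and the weights are illustrative.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """(C, H, W) feature map -> (C, C) channel correlation matrix, normalized by C*H*W."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)


def style_transfer_loss(
    generated: np.ndarray,
    content: np.ndarray,
    style: np.ndarray,
    content_weight: float = 1.0,
    style_weight: float = 1e3,
) -> float:
    """Single-layer style transfer objective on precomputed feature maps.

    The content term keeps the output close to the input image's features;
    the style term keeps its Gram matrix close to the style image's.
    """
    content_loss = float(np.mean((generated - content) ** 2))
    style_loss = float(np.mean((gram_matrix(generated) - gram_matrix(style)) ** 2))
    return content_weight * content_loss + style_weight * style_loss
```

The Gram matrix discards spatial layout and keeps only channel correlations. This is why the style image need not have the same size or content as the input.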

The server device 1 can also add sound effects or background sounds to the moving image. The data for these sounds is stored in advance in the storage unit 12. For example, the server device 1 reads from the storage unit 12 the data for the sound effect or background sound specified in the editing method determination table 12c. It then adds that sound to the moving image data.
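Adding a stored sound effect or background sound comes down to summing audio samples. The minimal sketch below assumes both tracks share the same sample rate and channel layout and hold floating-point samples in [-1.0, 1.0]. The function name, gain, and looping option are hypothetical.

```python
from typing import List


def mix_in(
    track: List[float],
    sound: List[float],
    start: int = 0,
    gain: float = 0.5,
    loop: bool = False,
) -> List[float]:
    """Add `sound` (scaled by `gain`) into `track` from sample index `start`.

    With loop=False the sound plays once and is truncated at the end of the track;
    with loop=True it repeats until the end (useful for background sounds).
    Results are clipped to [-1.0, 1.0].
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    out = list(track)
    if not sound:
        return out
    end = len(out) if loop else min(len(out), start + len(sound))
    for j in range(start, end):
        s = sound[(j - start) % len(sound)]
        out[j] = max(-1.0, min(1.0, out[j] + gain * s))
    return out
```

Hard clipping is the simplest overflow guard. A real mixer would more likely lower the gain or apply a limiter to avoid audible distortion.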

The server device 1 can also change the voice quality or tone of voices produced by people, animals, or the like shown in the moving image. For example, it can convert a male voice into a female voice, or convert a person's voice into the voice of a specific animated character. To do so, the server device 1 extracts, for example, a person's voice from the moving image and applies processing such as frequency conversion to the extracted voice.
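The source only says that processing "such as frequency conversion" is applied. The simplest form of frequency conversion is resampling, sketched below with linear interpolation. Reading the samples faster raises every frequency by the same ratio. This naive approach also shortens or lengthens the clip. A practical voice changer would instead use a duration-preserving method such as a phase vocoder or PSOLA, and would first isolate the voice from other sounds. Both steps are outside this sketch, and the function name is hypothetical.

```python
from typing import List


def resample_pitch(samples: List[float], semitones: float) -> List[float]:
    """Naive pitch shift by resampling with linear interpolation.

    Shifting up by n semitones reads the input 2**(n/12) times faster, so the
    output is also shorter by that factor (duration is NOT preserved).
    """
    ratio = 2.0 ** (semitones / 12.0)
    n_out = int(len(samples) / ratio)
    out: List[float] = []
    for i in range(n_out):
        pos = i * ratio
        k = min(int(pos), len(samples) - 1)  # guard against float rounding at the end
        frac = pos - k
        nxt = samples[k + 1] if k + 1 < len(samples) else samples[k]
        out.append(samples[k] * (1.0 - frac) + nxt * frac)
    return out
```

For example, +12 semitones (one octave up) keeps every second sample, and -12 semitones doubles the length by interpolating between samples.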

The server device 1 can also change the playback speed of the moving image data, for example by applying slow-motion, frame-by-frame, double-speed, or fast-forward playback. The speed-change method can be chosen in several ways. It may be specified in the editing method determination table 12c, or it may be a method associated in advance with a particular identified feature. It may also be chosen at random, or decided by various other methods.
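At a constant output frame rate, slow-motion, double-speed, and fast-forward playback can be sketched as frame resampling. Frame-by-frame playback is not covered here. The sketch also shows one way to combine the three selection options named above: table entry, feature association, or random choice. The fixed order of precedence, the preset names, and the multipliers are all assumptions for illustration.

```python
import random
from typing import Dict, List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical speed presets (playback-speed multipliers).
SPEEDS: Dict[str, float] = {"slow_motion": 0.5, "double_speed": 2.0, "fast_forward": 4.0}


def retime(frames: Sequence[T], speed: float) -> List[T]:
    """Resample a frame sequence for an unchanged output frame rate.

    speed > 1 skips frames (faster playback); speed < 1 repeats frames (slow motion).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n_out = int(len(frames) / speed)
    return [frames[min(int(i * speed), len(frames) - 1)] for i in range(n_out)]


def choose_speed_method(
    table_choice: Optional[str],
    feature: str,
    associations: Dict[str, str],
    rng: random.Random,
) -> str:
    """Pick a method: table entry first, then a feature association, then random.

    The source lists these as alternatives; this precedence is an assumption.
    """
    if table_choice is not None:
        return table_choice
    if feature in associations:
        return associations[feature]
    return rng.choice(sorted(SPEEDS))
```

Usage: `retime(frames, SPEEDS[choose_speed_method(None, "goal", {"goal": "slow_motion"}, random.Random())])` plays a clip tagged "goal" at half speed.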

In the present embodiment, the server device 1 edits the moving image by performing image or audio processing according to predetermined methods (rules). This applies to editing processes such as superimposing images, changing the time of day, adding sound effects or background sounds, changing voice quality or tone, and changing the playback speed. However, the invention is not limited to this. As with the style-change process, the server device 1 may also perform these editing processes using trained learning models.

After editing all the partial moving images, the server device 1 combines them into a single piece of moving image data. The server device 1 transmits the resulting moving image data, as edited digest moving image data, to the terminal device 3 that requested the editing.

After transmitting the edited moving image data to the terminal device 3, the server device 1 may accept a re-editing request from that terminal device 3. On receiving such a request, the server device 1 may, for example, redo the editing process from the unedited whole moving image data. It may instead change, add to, or delete the edits in the edited digest moving image data. The server device 1 may, for example, randomly replace the editing method determined from the editing method determination table 12c with a different one. Alternatively, the editing method determination table 12c may store several editing methods, such as a second editing method, a third editing method, and so on, and the server device 1 may apply them in order. As another example, the server device 1 may store in advance a table that associates each previous editing method with the editing method to use when re-editing. It can then determine the re-editing method from that table. The server device 1 may determine the re-editing method in any manner.
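The two table-based re-editing options above can be sketched as simple lookups. The table contents and function names are hypothetical, as are two behaviors: wrapping around after the last stored method, and keeping the previous method when no mapping exists. The source does not specify either.

```python
from typing import Dict, FrozenSet, List

Combo = FrozenSet[str]

# Hypothetical table: each combination lists the methods to use on the
# 1st, 2nd, 3rd, ... edit of the same video.
ORDERED_METHODS: Dict[Combo, List[str]] = {
    frozenset({"person", "beach"}): ["superimpose_sunglasses", "style_anime", "slow_motion"],
}

# Hypothetical table: previous editing method -> method to use when re-editing.
NEXT_METHOD: Dict[str, str] = {
    "superimpose_sunglasses": "style_painting",
    "style_painting": "add_background_sound",
}


def method_for_attempt(combo: Combo, attempt: int) -> str:
    """Return the method for a 1-based edit attempt, wrapping around after the last entry."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    methods = ORDERED_METHODS[combo]
    return methods[(attempt - 1) % len(methods)]


def method_after(previous: str) -> str:
    """Look up the re-edit method for the previous method; keep it if no mapping exists."""
    return NEXT_METHOD.get(previous, previous)
```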

FIG. 9 is a flowchart showing the procedure of the processing performed by the server device 1 in the present embodiment. The processing unit 11 of the server device 1 starts this processing in response to a request from the terminal device 3. First, the moving image acquisition unit 11a of the processing unit 11 acquires the moving image data that the terminal device 3 sends together with the editing request. It treats this data as the whole moving image data to be edited (step S11). The acquired whole moving image data is temporarily stored in, for example, the storage unit 12.

Next, the partial moving image extraction unit 11b of the processing unit 11 extracts one or more pieces of partial moving image data from the whole moving image data acquired in step S11 (step S12). The feature identification unit 11c of the processing unit 11 then identifies time-series features of each piece of partial moving image data extracted in step S12. It does this using the feature identification model 12b stored in the storage unit 12 (step S13). Next, the editing method determination unit 11d of the processing unit 11 refers to the editing method determination table 12c stored in the storage unit 12. Based on the features identified in step S13, it obtains the editing methods defined there and thereby determines the editing method for each partial moving image (step S14).

Next, the editing processing unit 11e of the processing unit 11 edits each piece of partial moving image data according to the editing methods determined in step S14 (step S15). The combination processing unit 11f of the processing unit 11 combines all the partial moving image data edited in step S15 in chronological order (step S16). Finally, the edited moving image transmission unit 11g of the processing unit 11 transmits the combined moving image data from step S16 to the terminal device 3 that requested the editing (step S17), and the processing ends.
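Steps S12 to S16 can be summarized as a small driver function. In this sketch:

- The step implementations are passed in as hypothetical callables: extraction, feature identification with the feature identification model 12b, table lookup, and editing.
- A clip is modeled as a list of frame indices.
- The extractor is assumed to return clips in chronological order.
- Acquiring the video (S11) and sending the result back to the terminal (S17) are left to the caller.

```python
from typing import Callable, List

Clip = List[int]  # stand-in: a clip is modeled as a list of frame indices


def edit_video(
    video: Clip,                              # S11: whole moving image received with the request
    extract: Callable[[Clip], List[Clip]],    # S12: partial moving image extraction
    identify: Callable[[Clip], List[str]],    # S13: time-series features of a clip
    decide: Callable[[List[str]], str],       # S14: editing method from the table
    apply_edit: Callable[[Clip, str], Clip],  # S15: editing process
) -> Clip:
    """Run S12-S16 and return the combined digest; the caller sends it back (S17)."""
    edited: List[Clip] = []
    for clip in extract(video):
        features = identify(clip)
        method = decide(features)
        edited.append(apply_edit(clip, method))
    # S16: concatenate in chronological order (extract is assumed to return clips in order).
    digest: Clip = []
    for clip in edited:
        digest.extend(clip)
    return digest
```

Injecting each step as a callable keeps the control flow identical whether a step is rule-based or uses a learning model.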

<Summary>
With the configuration described above, the server device 1 according to the present embodiment acquires moving image data to be edited from the terminal device 3. It identifies features of the acquired moving image data and edits the data according to those features. The server device 1 can thus automatically apply editing suited to the features of the moving image data. Because users do not need to edit the moving image data themselves, editing can be expected to become easier for them.

The server device 1 according to the present embodiment also identifies multiple features at the same point in time in the moving image data. It then determines an editing method according to the combination of those features and edits the moving image data accordingly. As a result, the server device 1 can be expected to apply a wide variety of edits to moving image data.

The server device 1 according to the present embodiment also performs at least one of the following editing processes:
- superimposing decorative images or the like
- changing the time of day of a scene
- changing the style
- adding sound effects or background sounds
- changing voices
- changing the playback speed

As a result, the server device 1 can be expected to create moving image data that is highly entertaining or appealing. The editing processes performed by the server device 1 are not limited to these, and it may perform various other editing processes.

The server device 1 according to the present embodiment also extracts multiple pieces of partial moving image data from the whole of the moving image data acquired from the terminal device 3. It edits each extracted partial moving image and generates digest moving image data by combining the edited pieces. From moving image data with a long playback time, the server device 1 can thus generate edited digest moving image data with a shorter playback time. In the present embodiment, the server device 1 extracts partial moving images from the whole moving image before editing, but the invention is not limited to this. The server device 1 may edit the whole moving image without extracting partial moving images, which is suitable, for example, when the whole moving image is short. In other words, the server device 1 may extract the whole moving image itself as a single partial moving image.

The terminal device 3 according to the present embodiment displays an icon 101 on the moving image playback screen together with the moving image. The user operates this icon to instruct editing of the moving image data. On receiving the instruction, the terminal device 3 sends an editing request to the server device 1. In response, the server device 1 edits the moving image data and transmits the edited data to the terminal device 3. The terminal device 3 then plays and displays the edited moving image data on the playback screen in place of the unedited data. Users can thus be expected to instruct editing easily and to watch the edited moving image data easily.

In the present embodiment, the server device 1 acquires moving image data from the terminal device 3, edits it, and transmits the edited data back to the terminal device 3. However, the invention is not limited to this configuration. For example, the terminal device 3 may perform the editing process itself, in which case the information processing system need not include the server device 1. The information processing described in the present embodiment may be performed by multiple devices in cooperation or by a single device.

In the present embodiment, the server device 1 edits a single piece of moving image data, but the invention is not limited to this. For example, the server device 1 may acquire multiple pieces of moving image data from the terminal device 3 and extract one or more partial moving images from each. It may then edit them and combine the edited partial moving images into a single digest moving image.

In the present embodiment, the server device 1 uses a learning model that receives moving image data and outputs features of that data. It may instead perform editing using other learning models, such as the following. For example, a learning model may be trained by machine learning to take moving image data as input and directly apply the editing described in the present embodiment. The server device 1 can then input the moving image data acquired from the terminal device 3 into this model, obtain the edited moving image data it outputs, and transmit that data to the terminal device 3.

As another example, the server device 1 may use a learning model that outputs an editing method. Its input may be moving image data, scene information, or the like, or the features identified by the feature identification model 12b. After the terminal device 3 receives, plays, and displays the edited moving image data, it may obtain the user's evaluation of the edits. For example, an operation on the video-sharing icon 102 on the moving image playback screen shown in FIG. 5 can be treated as a positive evaluation. A second or subsequent operation on the automatic-editing icon 101 (that is, a re-editing request) can be treated as a negative evaluation. The terminal device 3 accumulates these evaluations and feeds them back to the server device 1, where they can be used to retrain the learning model that outputs editing methods. A learning model retrained on user evaluations makes it possible to produce edits that better reflect user preferences.
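The feedback loop above can be sketched as turning UI events into labeled training examples. The event names, the label values (1 for positive, 0 for negative), and the class are hypothetical. The terminal is assumed to emit the re-edit event only for the second and later taps on icon 101, since the first tap is the original editing request. The retraining step itself is not shown.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical event names for the UI operations described in the text.
EVENT_LABELS: Dict[str, int] = {
    "share_icon_102": 1,   # user shared the edited video -> positive evaluation
    "reedit_icon_101": 0,  # user requested a re-edit -> negative evaluation
}

Example = Tuple[FrozenSet[str], str, int]  # (features, editing method, label)


class FeedbackLog:
    """Accumulates labeled examples for retraining an editing-method model."""

    def __init__(self) -> None:
        self.examples: List[Example] = []

    def record(self, features: FrozenSet[str], method: str, event: str) -> Optional[int]:
        """Store an example if the event carries an evaluation; return its label or None."""
        label = EVENT_LABELS.get(event)
        if label is not None:
            self.examples.append((features, method, label))
        return label

    def approval_rate(self, method: str) -> Optional[float]:
        """Fraction of positive evaluations for a method, or None if it has none yet."""
        labels = [lab for _, m, lab in self.examples if m == method]
        return sum(labels) / len(labels) if labels else None
```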

In the present embodiment, moving image data shot by the terminal device 3 is the target of editing, but the invention is not limited to this. The server device 1 may acquire moving image data shot by another device, such as a video camera, edit it, and transmit the edited data to the terminal device 3. The user may also obtain moving image data published on the Internet or elsewhere on the terminal device 3 and have it edited. Gameplay on a game console may likewise be saved as moving image data and edited. In that case, either the server device 1 or the game console may perform the editing.

<Modification 1>
FIG. 10 is a schematic diagram showing an example of the edit setting screen displayed by the terminal device 3 according to Modification 1. In the information processing system according to Modification 1, the user can enable or disable each of the various editing processes performed by the server device 1. The terminal device 3 displays the illustrated edit setting screen on the display unit 34 when, for example, a setting item related to editing is selected on a menu screen or the like (not shown).

The edit setting screen shows the title "Edit settings" at the top, with the available editing methods listed below it, each with a check box. The terminal device 3 toggles each check box between checked and unchecked in response to user operations and updates the display. In the illustrated example, nine editing methods can be set: "Add decorative image", "Add character image", "Add effect image", "Change time of day", "Change style", "Add sound effect", "Add background sound", "Change voice", and "Change playback speed". Of these, "Change time of day" and "Change playback speed" are unchecked and are therefore regarded as excluded by the user.

At the bottom of the edit setting screen is a button labeled "OK". When the terminal device 3 receives a touch, tap, click, or similar operation on this button, it reads the state of each item's check box and determines whether each editing method may be used. When requesting the server device 1 to edit moving image data, the terminal device 3 sends setting information indicating whether each editing method is enabled.

The server device 1 receives the setting information from the terminal device 3 and uses it to determine whether each editing method may be applied. For example, the server device 1 determines an editing method by referring to the editing method determination table 12c based on the combination of features identified from the moving image data. If the method defined in the table has been disabled, the server device 1 does not edit with that method. In this case, it may edit using an alternative method or skip the editing.
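A minimal sketch of how the setting information might be transmitted and applied. The JSON keys mirror the nine items in the FIG. 10 example but are hypothetical names. `resolve_method` tries the table's method first, then any alternatives, and returns None to signal that editing is skipped. Treating methods absent from the settings as allowed is an assumption.

```python
import json
from typing import Dict, List, Optional

# Setting information as it might be sent with an edit request (hypothetical keys),
# mirroring the FIG. 10 example in which two items are unchecked.
settings: Dict[str, bool] = {
    "add_decorative_image": True,
    "add_character_image": True,
    "add_effect_image": True,
    "change_time_of_day": False,
    "change_style": True,
    "add_sound_effect": True,
    "add_background_sound": True,
    "change_voice": True,
    "change_playback_speed": False,
}
payload = json.dumps(settings)  # serialized for transmission to the server device


def resolve_method(candidates: List[str], enabled: Dict[str, bool]) -> Optional[str]:
    """Return the first candidate the user has not disabled, or None to skip editing.

    `candidates` starts with the method from the editing method determination
    table, followed by any alternatives; unknown methods are treated as allowed.
    """
    for method in candidates:
        if enabled.get(method, True):
            return method
    return None
```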

<Modification 2>
FIG. 11 is a schematic diagram illustrating the configuration of the information processing system according to Modification 2. In this system, the server device 1 acquires moving image data shot by one or more cameras 5 installed in a facility such as a theme park, amusement park, zoo, event venue, or wedding hall. The server device 1 extracts partial moving images from the multiple pieces of moving image data acquired from the cameras 5 and generates digest moving image data. It then transmits the generated digest to the terminal device 3 of a predetermined user.

Consider, for example, a user whose face image or the like has been registered in advance. The server device 1 according to Modification 2 extracts scenes showing that user from the multiple pieces of moving image data, generating multiple partial moving images. It can then apply editing such as that described above to these partial moving images and combine them into digest moving image data. Finally, it can transmit the generated moving image data to, for example, the e-mail address the user registered together with the face image.
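Extracting the scenes in which a registered user appears can be sketched in two steps. First, compare the face embeddings found in each frame against the embedding computed from the user's registered face image. Then group consecutive matching frames into segments. The face detector and embedding model are not shown. The embeddings, the cosine-similarity threshold, and the minimum segment length are hypothetical values chosen for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def user_segments(
    faces_per_frame: Sequence[Sequence[Vector]],
    user_embedding: Vector,
    threshold: float = 0.8,
    min_len: int = 2,
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) frame ranges where the registered user
    appears in at least `min_len` consecutive frames."""
    present = [
        any(cosine(face, user_embedding) >= threshold for face in faces)
        for faces in faces_per_frame
    ]
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, hit in enumerate(present + [False]):  # sentinel closes a trailing run
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments
```

Each returned range can then be cut out as one partial moving image and passed to the same editing and combining steps as in the main embodiment.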

The embodiments disclosed herein are illustrative in all respects and should not be considered restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

1 Server device
3 Terminal device
5 Camera
11 Processing unit
11a Moving image acquisition unit
11b Partial moving image extraction unit
11c Feature identification unit
11d Editing method determination unit
11e Editing processing unit
11f Combination processing unit
11g Edited moving image transmission unit
12 Storage unit
12a Server program
12b Feature identification model
12c Editing method determination table
13 Communication unit
31 Processing unit
31a Shooting processing unit
31b Display processing unit
31c Editing instruction reception unit
31d Moving image transmission unit
31e Moving image reception unit
32 Storage unit
32a Program
33 Communication unit
34 Display unit
35 Operation unit
36 Camera
98, 99 Recording medium
101, 102 Icon

An information processing method according to one embodiment is a method in which an information processing device generates moving image data. The information processing device:
- acquires moving image data shot by one or more cameras installed in a facility;
- extracts, from the acquired moving image data, multiple pieces of partial moving image data showing a registered user;
- identifies features of the extracted partial moving image data;
- edits the partial moving image data according to the identified features;
- combines the edited partial moving image data; and
- transmits the combined moving image data to a registered terminal device.

An information processing method according to one embodiment is a method in which an information processing device generates moving image data. The information processing device:
- acquires moving image data shot by one or more cameras installed in a facility;
- extracts, from the acquired moving image data, multiple pieces of partial moving image data showing a registered user;
- identifies features of the partial moving image data by inputting the extracted data into a learning model and obtaining the information the model outputs, where the model has been trained by machine learning to receive moving image data and output information identifying features of the scenes it contains;
- edits the partial moving image data based on the editing method associated with that information;
- combines the edited partial moving image data; and
- transmits the combined moving image data to a registered terminal device.

Claims

1. An information processing method in which an information processing device generates moving image data, wherein
the information processing device:
acquires moving image data;
identifies features of the acquired moving image data; and
edits the moving image data according to the identified features.

2. The information processing method according to claim 1, wherein
the information processing device:
identifies a plurality of features at the same time point in the moving image data; and
edits the moving image data according to a combination of the identified features.
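Claim 2 edits according to a combination of features found at the same time point. One plausible reading, sketched below, keys a rule table on sets of co-occurring features and prefers the most specific matching set; the feature names, edit names, and the "largest matching set wins" policy are assumptions for illustration only.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical rules: features co-occurring at one time point -> editing method name.
COMBINATION_RULES: Dict[FrozenSet[str], str] = {
    frozenset({"smile"}): "add_sparkle",
    frozenset({"smile", "jump"}): "slow_motion_with_sparkle",
    frozenset({"rain"}): "add_rain_sound",
}


def choose_edit(features_at_t: Iterable[str]) -> Optional[str]:
    """Return the edit for the most specific rule whose features all occur at this time point."""
    observed = frozenset(features_at_t)
    matches = [rule for rule in COMBINATION_RULES if rule <= observed]  # subset test
    if not matches:
        return None
    return COMBINATION_RULES[max(matches, key=len)]


print(choose_edit({"smile", "jump", "outdoor"}))  # slow_motion_with_sparkle
print(choose_edit({"smile"}))                     # add_sparkle
print(choose_edit({"walk"}))                      # None
```

A flat table like this keeps the combination logic declarative; a learned model could replace it without changing the calling code.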

3. The information processing method according to claim 1 or 2, wherein
the editing process includes at least one of:
a process of superimposing a decorative image, a character image, or an effect image on the moving image data;
a process of changing the time of day of a scene included in the moving image data;
a process of changing the style of the moving image data;
a process of adding a sound effect or background sound to the moving image data;
a process of changing sound included in the moving image data; and
a process of changing the playback speed of the moving image data.
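Of the editing processes listed in claim 3, changing the playback speed is the simplest to show concretely. The sketch below assumes a constant output frame rate and resamples frames by taking the nearest earlier source frame (no interpolation, audio not handled); it is one common approach, not a method stated in the source.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def change_playback_speed(frames: Sequence[T], speed: float) -> List[T]:
    """Resample frames for playback at `speed`x while keeping the output frame rate.

    speed > 1 drops frames (fast forward); speed < 1 repeats frames (slow motion).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    out: List[T] = []
    i = 0
    while i * speed < len(frames):
        out.append(frames[int(i * speed)])  # nearest earlier source frame
        i += 1
    return out


print(change_playback_speed(list("abcd"), 2.0))  # ['a', 'c']
print(change_playback_speed(list("abc"), 0.5))   # ['a', 'a', 'b', 'b', 'c', 'c']
```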

4. The information processing method according to any one of claims 1 to 3, wherein
the information processing device:
extracts a plurality of pieces of partial moving image data from the moving image data;
performs the editing process on the extracted partial moving image data; and
combines the edited partial moving image data.

5. The information processing method according to any one of claims 1 to 4, wherein
the information processing device:
accepts settings related to the editing process; and
performs the editing process according to the accepted settings.
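Claim 5 leaves the content of the settings open. As one hypothetical example, the sketch below lets the user switch off individual edit kinds and choose a background-music track, then filters the automatically proposed edits accordingly; `EditSettings`, `PROPOSED`, and the edit names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class EditSettings:
    # Hypothetical user-accepted settings; the claim does not define their content.
    disabled: Set[str] = field(default_factory=set)  # edit kinds the user switched off
    bgm_track: str = "default.mp3"                   # background sound used when "bgm" is applied


# Hypothetical edits proposed automatically for each identified feature.
PROPOSED: Dict[str, List[str]] = {
    "goal": ["slow_motion", "bgm"],
    "cheer": ["sparkle"],
}


def plan_edits(features: List[str], settings: EditSettings) -> List[str]:
    """List the edits to run: proposals for the features, filtered and parameterised by settings."""
    plan: List[str] = []
    for feature in features:
        for kind in PROPOSED.get(feature, []):
            if kind in settings.disabled:
                continue  # respect the user's choice to turn this edit off
            step = f"bgm:{settings.bgm_track}" if kind == "bgm" else kind
            if step not in plan:  # avoid applying the same edit twice
                plan.append(step)
    return plan


print(plan_edits(["goal", "cheer"], EditSettings()))
# ['slow_motion', 'bgm:default.mp3', 'sparkle']
print(plan_edits(["goal", "cheer"], EditSettings(disabled={"slow_motion"}, bgm_track="anthem.mp3")))
# ['bgm:anthem.mp3', 'sparkle']
```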

6. The information processing method according to any one of claims 1 to 5, wherein
the information processing device:
outputs, together with the moving image data, an image for accepting an instruction to execute the editing process on the moving image data;
edits the moving image data when the execution instruction is accepted; and
outputs the edited moving image data in place of the unedited moving image data.
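Claim 6 describes showing the video together with a control for executing the edit, and swapping in the edited video once the instruction arrives. A minimal state-holder sketch follows, using a hypothetical `EditablePlayer` class; hiding the control after the edit and ignoring repeated instructions are design choices of this sketch, not requirements of the claim.

```python
from typing import Callable, Dict, List

Video = List[str]


class EditablePlayer:
    """Hypothetical state holder: the current video plus an execute-edit control."""

    def __init__(self, video: Video, editor: Callable[[Video], Video]) -> None:
        self.video = video
        self._editor = editor
        self.edited = False

    def render(self) -> Dict[str, object]:
        # What a client would display: the video and whether to show the execute-edit control.
        return {"video": self.video, "show_edit_button": not self.edited}

    def on_execute_instruction(self) -> None:
        if self.edited:
            return  # already edited; ignore repeated instructions
        self.video = self._editor(self.video)  # edited data replaces the pre-edit data
        self.edited = True


player = EditablePlayer(["f0", "f1"], lambda v: [f.upper() for f in v])
print(player.render())  # {'video': ['f0', 'f1'], 'show_edit_button': True}
player.on_execute_instruction()
print(player.render())  # {'video': ['F0', 'F1'], 'show_edit_button': False}
```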

7. An information processing device comprising:
an acquisition unit that acquires moving image data;
an identification unit that identifies features of the moving image data acquired by the acquisition unit; and
an editing unit that edits the moving image data according to the features identified by the identification unit.

8. A computer program that causes a computer to execute processing of:
acquiring moving image data;
identifying features of the acquired moving image data; and
editing the moving image data according to the identified features.