JP2020005309A

JP2020005309A - Moving image editing server and program

Info

Publication number: JP2020005309A
Application number: JP2019170075A
Authority: JP
Inventors: 雄康高松; Yuko Takamatsu; 孝弘坪野; Takahiro Tsubono; 尚武石橋; Naotake Ishibashi
Original assignee: Open8 Inc
Current assignee: Open8 Inc
Priority date: 2019-09-19
Filing date: 2019-09-19
Publication date: 2020-01-09

Abstract

To provide a server and a program that make it easy to create moving image content. SOLUTION: A moving image editing server, and a program therefor, create moving image content to be distributed to a user terminal. The server comprises: a template management unit that manages a template in which a plurality of cuts with tag information are defined; a material reading unit that receives input of image data and character data; a material analysis unit that recognizes an object in the input image data and outputs its similarity to the tag information; an image insertion unit that allocates the image data to the template on the basis of the similarity; and a telop insertion unit that allocates a telop created from the character data to the template. Moving image content is created by allocating the image data and the telop to a selected template. SELECTED DRAWING: Figure 2

Description

The present invention relates to a server and a program for automatically generating moving image content to be distributed to a user terminal.

Conventionally, moving images have been divided into a plurality of chapters and annotated with metadata.
For example, Patent Literature 1 proposes a moving image processing apparatus that efficiently searches a moving image having a plurality of chapters for a scene image at a desired moment. The apparatus includes: a large block division unit that divides the moving image into a plurality of large blocks at predetermined unit-time intervals; a complexity analysis unit that quantifies the complexity of image change in each large block; a small block division unit that divides the playback time of each large block into a plurality of small blocks according to the complexity value; and a chapter creation unit that creates chapters by sequentially grouping the small blocks, in time-series order, into sets of a predetermined number.

Patent Literature 1: JP 2011-130007 A

Creating moving image content takes a great deal of time and effort, so there has been demand for a system that makes it easy to create such content.

Accordingly, an object of the present invention is to provide a server and a program that make it possible to create moving image content easily.

The moving image editing server of the present invention is a server that creates moving image content for distribution to a user terminal. It includes: a template management unit that manages a template in which a plurality of cuts with tag information are defined; a material reading unit that receives input of image data and character data; a material analysis unit that recognizes an object in the input image data and outputs its similarity to the tag information; an image insertion unit that allocates image data to the template based on the similarity; and a telop insertion unit that allocates a telop created from the character data to the template. The server creates moving image content by allocating the image data and the telop to a selected template.
In the moving image editing server, the telop insertion unit may allocate character data to the template based on its similarity to the tag information.
The moving image editing server may further include a template recommendation unit that outputs a recommended template based on the input image data.
The moving image editing server may include a classifier, which is a trained model obtained by machine learning on learning data and which outputs annotation words for input image data. In that case, the similarity output by the material analysis unit may be the word similarity between the annotation words output by the classifier and the tag information.

In the moving image editing server, color information may be attached to the template. The material analysis unit may then output a color similarity between the image data and the color information, and the image insertion unit may allocate image data to the template based on the word similarity and/or the color similarity.
In the moving image editing server, the telop insertion unit may create the telop by summarizing the text contained in the character data.
In the moving image editing server, the material reading unit may have a function of performing speech recognition on the audio in a moving image file and inputting the result as character data.
In the moving image editing server, the material reading unit may have a function of acquiring image data from a database or the Web in addition to the input image data.

The moving image distribution system of the present invention includes the moving image editing server described above and a moving image distribution server. The moving image distribution server includes: a Web scraper that collects content information from a Web page on which a moving image distribution slot is set; a content analysis unit that analyzes the content information and determines its similarity to moving image content created in advance; and a moving image distribution unit that distributes moving image content with high similarity. The system can also be used, for example, to distribute video advertisements.

The moving image editing server program of the present invention is a moving image editing program for a server that distributes moving image content to user terminals accessing it via the Internet. The program causes the server to function as: a template management unit that manages a template in which a plurality of cuts with tag information are defined; a material reading unit that receives input of image data and character data; a material analysis unit that recognizes an object in the input image data and outputs its similarity to the tag information; an image insertion unit that allocates image data to the template based on the similarity; and a telop insertion unit that inserts a telop created from the character data.

According to the present invention, it is possible to provide a server and a program that make it easy to create moving image content.

FIG. 1 is a configuration diagram of the moving image editing system according to the embodiment.
FIG. 2 is a configuration diagram of the moving image editing server according to the embodiment.
FIG. 3 is a diagram illustrating an example of a screen layout that forms a template.
FIG. 4 is a configuration diagram of the composite moving image creation unit.
FIG. 5 is a diagram illustrating an example of the material input screen.
FIG. 6 is a diagram illustrating the steps for creating moving image content from materials.
FIG. 7 is an explanatory diagram of the process of assigning the most suitable image or moving image to each cut.
FIG. 8 is an explanatory diagram of a screen that lists the cuts constituting the moving image content.
FIG. 9 is an explanatory diagram of the screen for inserting BGM into the moving image content.
FIG. 10 is the processing flow of the summary sentence creation function.
FIG. 11 is a diagram illustrating the process of applying morphological analysis to a document and dividing it into tokens.
FIG. 12 is a diagram illustrating the insertion of sentences into the cuts of a template.
FIG. 13 is a configuration diagram of the moving image distribution server according to the embodiment.

<Configuration>
As shown in FIG. 1, the moving image editing system according to the embodiment includes a moving image editing server 1, an administrator terminal 2, and a plurality of user terminals 3. FIG. 1 shows the moving image editing server 1 as a single machine, but it can also be implemented by a plurality of server devices.

The moving image editing server 1 is built by installing moving image editing software and database software on a server device that includes a processing unit having a CPU, a storage unit having a storage device such as an HDD, and a communication unit having a LAN port. As shown in FIG. 2, the moving image editing software includes a template management unit 11, a classifier creation unit 12, and a composite moving image creation unit 13. As also shown in FIG. 2, the database software manages a template DB 21, a learning data DB 22, a composite moving image DB 23, and an embedded material DB 24.

The template management unit 11 manages a plurality of templates stored in the template DB 21. Each template consists of a plurality of cuts, and a screen layout and a playback time are defined for each cut. A template may already have image files assigned to it; such image files can be selected from the embedded material DB 24. FIG. 3 shows an example of the screen layout of a cut in a template. In the figure, the edited article material (text) is inserted into the telop field 31, the selected image material into the image field 32, and the logo material into the logo field 33. Each template carries style information, color information, and tag information. The color information and the tag information are used by the template recommendation unit 134 described later.
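The template structure described above can be pictured as a small data model. The following is a minimal sketch only; the class names, field names, the layout identifier, and the sRGB encoding of the color information are assumptions made for illustration and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Cut:
    layout: str                    # screen layout identifier (hypothetical name)
    duration_sec: float            # playback time defined for this cut
    tags: list[str] = field(default_factory=list)  # tag information used for matching


@dataclass
class Template:
    name: str
    style: str                     # style information, e.g. "cute"
    color: tuple[int, int, int]    # color information as an sRGB triple (assumed encoding)
    tags: list[str]                # tag information attached to the template
    cuts: list[Cut]

    def total_duration(self) -> float:
        """Sum of the playback times of all cuts."""
        return sum(cut.duration_sec for cut in self.cuts)


# Hypothetical template with two cuts sharing one layout.
format_a = Template(
    name="Format A",
    style="cute",
    color=(255, 182, 193),
    tags=["cosmetics", "female"],
    cuts=[
        Cut("telop_image_logo", 3.0, ["person"]),
        Cut("telop_image_logo", 4.5, ["nail"]),
    ],
)
```

Keeping the tags on both the template and its cuts mirrors the two uses in the text: template-level tags for recommendation and cut-level tags for image assignment.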

The classifier creation unit 12 acquires learning data from the learning data DB 22 and performs machine learning on it to create the classifier 133 described later, which is a trained model. The classifier creation unit 12 creates the classifier 133, for example, a few times a year. The learning data may be data collected from the Internet or in-house data to which labels have been added, or a labeled data set may be procured and used.

As shown in FIG. 4, the composite moving image creation unit 13 includes a material reading unit 131, a material analysis unit 132, a classifier 133, a template recommendation unit 134, an image insertion unit 135, a telop insertion unit 136, and a music insertion unit 137.

The material reading unit 131 displays a material input screen page in the Web browser of the user terminal 3. The material reading unit 131 includes an image data input unit that receives input of image data and a character data input unit that receives input of character data.
FIG. 5 illustrates an example of the material input screen output by the material reading unit 131. The user submits image or moving image material by dragging and dropping image data (images or moving images) into the material submission field 51 or the logo submission field 52. The number of image or moving image materials to submit can be increased or decreased by clicking the + button or the - button.
Character data (article material) is submitted by entering text in the title field 53 and the body field 54. After the materials have been submitted, clicking the recommended format button 55 outputs a recommended format in the recommended format display area. The example of FIG. 5 displays only one recommended format, but a plurality of recommended formats may instead be displayed in order of recommendation. Unlike in the embodiment, the user may also manually select any format from among a plurality of formats. Text may also be input by applying speech recognition to the audio in a moving image file and converting it to text; for example, a speech recognition service provided as an external API may be used. Furthermore, article material and image or moving image material may be acquired from a URL entered by the user and stored in the embedded material DB 24.

The material analysis unit 132 calculates the degree of relevance between the information extracted from the input material and the tag information assigned to each format. The information is extracted from the material by the classifier 133.
The classifier 133 is a trained model using a convolutional neural network; given a moving image or an image as input, it extracts specific information (annotation words). The first classifier of the embodiment outputs words that describe the category of the moving image or image (for example, stylish or simple). The second classifier of the embodiment outputs words that describe objects appearing in the moving image or image (for example, seafood, grilled meat, person, or furniture).

The template recommendation unit 134 recommends to the user the format with the highest relevance, based on the output of the material analysis unit 132.
The image insertion unit 135 is a trained model using a convolutional neural network. It calculates the similarity between the information extracted from the input material and the words, color, and brightness assigned to each cut, and assigns the most similar image or moving image to each cut. The similarity may be determined, and assignments made, not only for the image and moving image material input by the user but also for image and moving image material stored in advance in the embedded material DB 24.
Word similarity is determined by preparing a trained model that has learned word vectors and applying a method such as cosine similarity or Word Mover's Distance to those vectors. Color similarity is determined, for example, by calculating the Euclidean distance in CIELAB coordinates. Specifically, the color difference from the reference color is calculated for every pixel of the image, and an image whose total is small is judged to be close in color. For a moving image, this calculation is performed on images sampled from its frames. Brightness is determined, for example, by converting the image to grayscale and comparing the RMS contrast values of the pixels. As with color, the calculation for a moving image is performed on images sampled from its frames.
The telop insertion unit 136 inserts a telop (text) into each cut of the selected template. The summary sentence creation function of the telop insertion unit 136 is described in detail later.
The music insertion unit 137 inserts music that serves as BGM into the moving image content.
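The color and brightness comparisons described for the image insertion unit can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation used in the embodiment: the sRGB-to-CIELAB conversion with a D65 white point, the Rec. 601 grayscale weights, the use of a per-pixel mean instead of a raw total (so that images of different sizes are comparable), and all function names are choices made here.

```python
import math


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB, assuming a D65 white point."""
    def to_linear(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)
    # Linear sRGB to CIE XYZ.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def mean_color_distance(pixels, reference_rgb):
    """Mean Euclidean distance in CIELAB between each pixel and a reference color."""
    reference = srgb_to_lab(reference_rgb)
    return sum(math.dist(srgb_to_lab(p), reference) for p in pixels) / len(pixels)


def rms_contrast(pixels):
    """Standard deviation of the normalized grayscale intensities of the pixels."""
    gray = [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for r, g, b in pixels]
    mean = sum(gray) / len(gray)
    return math.sqrt(sum((v - mean) ** 2 for v in gray) / len(gray))


# A reddish image is closer to a red reference color than a bluish one.
reddish = [(250, 10, 10), (240, 30, 20)]
bluish = [(10, 10, 250), (20, 30, 240)]
red_is_closer = mean_color_distance(reddish, (255, 0, 0)) < mean_color_distance(bluish, (255, 0, 0))
```

For a moving image, the same functions would be applied to pixels from a sample of frames, as the text describes.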

The administrator terminal 2 and the user terminals 3 are computers that each include an input unit, a display unit, a processing unit, a storage unit, and a communication unit. Examples are computers equipped with a Web browser, such as smartphones, tablet terminals (tablet PCs), notebook computers, and desktop computers.
The administrator uses the administrator terminal 2 to change the settings of the moving image editing server 1 and to operate and manage the databases.
The user can access the moving image editing server 1 from a user terminal 3 and view the created moving image content.

As shown in FIG. 6, the moving image editing system of the present invention executes a material analysis step of analyzing the material input by the user, a format selection step of selecting the format of the moving image to be created, and a moving image composition step of assigning telops and image data (still images and moving images) to the selected format.
In the material analysis step, the material analysis unit 132 analyzes materials such as articles, images (still images), and moving images, and the template recommendation unit 134 selects a format that is highly relevant to the input material and recommends it to the user. For example, if the tag information extracted from the uploaded material is "person", "nail", and "soap", the material is judged to be close to "cosmetics" and "female", and Format A, which has the "cute" style, is recommended. The recommended format may also be selected based on color information extracted from the material, or by using both the color information and the tag information.
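The tag-based recommendation in this step can be sketched with cosine similarity over word vectors. This is a minimal sketch under assumptions: the three-dimensional vectors below are toy values invented for the example (a real system would load vectors from a trained embedding model), and the scoring rule that averages each material tag's best match is one plausible choice, not one stated in the specification.

```python
import math

# Toy word vectors, invented for illustration only.
WORD_VECTORS = {
    "person": (0.6, 0.5, 0.1),
    "nail": (0.9, 0.2, 0.0),
    "soap": (0.8, 0.3, 0.1),
    "cosmetics": (0.9, 0.3, 0.1),
    "female": (0.7, 0.5, 0.0),
    "car": (0.0, 0.2, 0.9),
    "business": (0.1, 0.6, 0.7),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def tag_set_similarity(material_tags, format_tags):
    """Average, over the material tags, of the best cosine match among the format tags."""
    best = [
        max(cosine(WORD_VECTORS[m], WORD_VECTORS[t]) for t in format_tags)
        for m in material_tags
    ]
    return sum(best) / len(best)


def recommend(material_tags, formats):
    """Return format names ordered from most to least relevant."""
    return sorted(
        formats,
        key=lambda name: tag_set_similarity(material_tags, formats[name]),
        reverse=True,
    )


formats = {"Format A": ["cosmetics", "female"], "Format B": ["car", "business"]}
ranking = recommend(["person", "nail", "soap"], formats)
```

With these toy vectors, `ranking[0]` is "Format A", matching the example in the text. Returning the full ranking also supports the variant in which several recommended formats are shown in order.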

In the format selection step, the user selects the format of the moving image to be used in the moving image composition. The user may adopt the format most strongly recommended by the template recommendation unit 134 as it is, may choose a preferred format from among several recommended formats, or may choose any format that was not recommended. Each cut in a format has a decorated frame, and an animation effect for the inserted image may also be set (for example, effects like split, fade, slide in/out, spin, and turn in PowerPoint).

In the moving image composition step, the telop insertion unit 136 creates summary sentences from the material and assigns them to the cuts, and the image insertion unit 135 assigns images and moving images extracted from the material to the cuts. As shown in FIG. 7, each cut in a format is associated with brightness information, color information, and tag information such as person, car, business, or natural scenery. The image insertion unit 135 determines the most suitable image or moving image for each cut based on word, color, and brightness similarity, and assigns it to that cut. In the embodiment, the image insertion unit 135 assigns the image and moving image material input by the user to the cuts, but image and moving image material prepared in advance may be assigned instead.
When the assignment of images and moving images is complete, the cuts that make up the moving image content can be listed on the screen, as shown in FIG. 8. For each cut, the playback time (in seconds) is displayed together with the image or moving image and the telop. The user can edit a telop by clicking the text button or the text field, and can replace an image by clicking the image button.
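The per-cut assignment can be sketched as a greedy selection. The sketch below makes several assumptions that the specification does not state: each image is used at most once, cuts are processed in order, and a simple tag-overlap score stands in for the combined word, color, and brightness similarity. The dictionary keys and file names are hypothetical.

```python
def assign_images(cuts, images, score):
    """Greedily give each cut the not-yet-used image with the highest score."""
    remaining = list(images)
    assignment = {}
    for cut in cuts:
        if not remaining:
            break  # more cuts than images; later cuts stay unassigned
        best = max(remaining, key=lambda image: score(cut, image))
        assignment[cut["id"]] = best["id"]
        remaining.remove(best)
    return assignment


def tag_overlap(cut, image):
    """Stand-in similarity: Jaccard overlap between the two tag sets."""
    a, b = set(cut["tags"]), set(image["tags"])
    return len(a & b) / len(a | b) if a | b else 0.0


cuts = [
    {"id": 1, "tags": ["person"]},
    {"id": 2, "tags": ["car"]},
    {"id": 3, "tags": ["nature", "landscape"]},
]
images = [
    {"id": "a.jpg", "tags": ["car", "road"]},
    {"id": "b.jpg", "tags": ["person", "business"]},
    {"id": "c.mp4", "tags": ["nature"]},
]
result = assign_images(cuts, images, tag_overlap)
# result maps cut 1 to "b.jpg", cut 2 to "a.jpg", and cut 3 to "c.mp4"
```

Passing the score function as a parameter keeps the selection logic separate from how similarity is computed, so a weighted combination of word, color, and brightness scores could be substituted without changing `assign_images`.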

BGM can also be inserted into the created moving image content by the music insertion unit 137. Clicking the BGM button on the screen of FIG. 8 displays a list of BGM tracks that can be inserted, as shown in FIG. 9. Clicking the play button to the right of a track name starts playback of that track. Selecting the check box to the left of a track name and then clicking the preview button plays a preview of the moving image content with the BGM inserted.

(Summary sentence creation function)
The summary sentence creation function of the telop insertion unit 136 is described with reference to FIGS. 10 to 12.
STEP 91: Paragraph division and sentence division
The telop insertion unit 136 divides the document entered in the body field 54 into paragraphs, and divides the text of each paragraph into sentences. A sentence that would be too long to read comfortably if displayed as a telop in a single scene (for example, 80 characters or more) is further divided into several sentences at positions that satisfy conditions such as a specific part of speech or notation.
STEP 92: Morphological analysis of the document
The telop insertion unit 136 applies morphological analysis to each sentence and extracts tokens, which are the minimum units for syntactic analysis. As shown in FIG. 11, each token is given a part of speech.
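STEP 91 can be sketched with regular expressions from the standard library. This is a simplified illustration: the specification splits long sentences at positions chosen by part of speech or notation, whereas the sketch below splits at commas as a stand-in, and the function names and the blank-line paragraph convention are assumptions. Morphological analysis (STEP 92) needs a dedicated tokenizer and is not shown.

```python
import re

MAX_TELOP_CHARS = 80  # example threshold given in the text


def split_paragraphs(document):
    """Split a document into paragraphs at blank lines (assumed convention)."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def split_sentences(paragraph):
    """Split after Japanese sentence enders, or after . ! ? followed by whitespace."""
    parts = re.split(r"(?<=[。！？])|(?<=[.!?])\s+", paragraph)
    return [s.strip() for s in parts if s.strip()]


def split_long_sentence(sentence, limit=MAX_TELOP_CHARS):
    """Break a sentence of `limit` characters or more at commas (stand-in rule)."""
    if len(sentence) < limit:
        return [sentence]
    clauses = [c for c in re.split(r"(?<=[、,])", sentence) if c]
    pieces, current = [], ""
    for clause in clauses:
        # Start a new piece when adding this clause would reach the limit.
        if current and len(current) + len(clause) >= limit:
            pieces.append(current.strip())
            current = clause
        else:
            current += clause
    if current:
        pieces.append(current.strip())
    return pieces
```

A single clause longer than the limit is returned unsplit by this sketch; a real implementation would need a fallback rule for that case.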

STEP 93: Deletion of unnecessary words and paragraphs
The telop insertion unit 136 deletes sentences and paragraphs that are defined as invalid by predefined rules for identifying invalid sentences. For example, it deletes lines that begin with a specific symbol such as "■" or "▼", paragraphs enclosed by specific symbols, and paragraphs that contain a URL, an e-mail address, a postal address, or a telephone number.
STEP 94: Deletion of stop words
The telop insertion unit 136 deletes from the tokens words that carry little meaning (stop words), such as "に" (ni), "から" (kara), "これ" (kore), and "さん" (san), as well as specific parts of speech such as particles.
STEP 95: Creation of token bigrams
Tokens that satisfy a specific condition (for example, a predefined part-of-speech condition) are joined to obtain token bigrams. For example, "2014年" (noun, proper noun, general) and "6月" (noun, proper noun, general) are joined to form "2014年6月" (June 2014), and "ヴェルディ" (Verdy; proper noun) and "協賛" (sponsorship; common noun) are joined to form "ヴェルディ協賛" (Verdy sponsorship).
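STEPs 93 to 95 can be sketched as three small filters. The sketch assumes that tokens arrive as `(surface, part_of_speech)` pairs from an external morphological analyzer; the part-of-speech labels, the regular expression for invalid lines, and the rule that any two adjacent nouns are joined are simplified assumptions for illustration.

```python
import re

# Lines starting with a marker symbol, or containing a URL, e-mail address, or phone number.
INVALID_LINE = re.compile(
    r"^[■▼]|https?://|[\w.+-]+@[\w-]+\.[\w.-]+|\d{2,4}-\d{2,4}-\d{4}"
)
STOP_WORDS = {"に", "から", "これ", "さん"}
DROPPED_POS = {"particle"}               # hypothetical part-of-speech labels
JOINABLE_POS = {"noun", "proper_noun"}


def drop_invalid_lines(lines):
    """STEP 93: remove lines matched by the invalid-line rules."""
    return [line for line in lines if not INVALID_LINE.search(line)]


def drop_stop_tokens(tokens):
    """STEP 94: remove stop words and tokens with unwanted parts of speech."""
    return [
        (surface, pos)
        for surface, pos in tokens
        if surface not in STOP_WORDS and pos not in DROPPED_POS
    ]


def token_bigrams(tokens):
    """STEP 95: join adjacent tokens when both satisfy the part-of-speech condition."""
    return [
        first[0] + second[0]
        for first, second in zip(tokens, tokens[1:])
        if first[1] in JOINABLE_POS and second[1] in JOINABLE_POS
    ]


tokens = [
    ("2014年", "noun"), ("6月", "noun"), ("に", "particle"),
    ("開催する", "verb"), ("ヴェルディ", "proper_noun"), ("協賛", "noun"),
]
bigrams = token_bigrams(drop_stop_tokens(tokens))
```

Because stop words are removed before bigrams are formed, tokens that were separated only by a particle become adjacent, so a stricter joining condition may be needed in practice.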

STEP 96: Extraction of important sentences
Based on the tokens and token bigrams, the tokens and token bigrams that serve as characteristic words are extracted using a measure of word importance such as TF-IDF. The sentences are then segmented using the word similarity determination described above, and an important sentence is extracted from each segment to form the summary.
STEP 97: Fitting to the template
The summary (important sentences) is parsed into phrases and a syntax tree. Because the template defines the number of characters that can be inserted into each cut, each sentence is split, based on the modification relationships between phrases, so that a span that reads naturally as text fits into each cut, and is then fitted to the template. FIG. 12 shows an example of inserting sentences into the cuts of a template.
The summary sentence creation function described above can support not only Japanese but also other languages, including English.
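The TF-IDF scoring in STEP 96 can be sketched as follows. This is a simplified illustration under assumptions: each sentence is treated as its own document when computing the inverse document frequency, a sentence is scored by the sum of its token scores, and the segmentation by word similarity is omitted. The function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(sentences):
    """Return one {token: tf-idf} dict per sentence; each sentence is a token list."""
    n = len(sentences)
    # Number of sentences in which each token appears.
    doc_freq = Counter(tok for sent in sentences for tok in set(sent))
    scores = []
    for sent in sentences:
        counts = Counter(sent)
        scores.append({
            tok: (count / len(sent)) * math.log(n / doc_freq[tok])
            for tok, count in counts.items()
        })
    return scores


def top_sentences(sentences, k):
    """Indices of the k highest-scoring sentences, returned in document order."""
    totals = [sum(s.values()) for s in tf_idf_scores(sentences)]
    ranked = sorted(range(len(sentences)), key=lambda i: totals[i], reverse=True)
    return sorted(ranked[:k])


sentences = [
    ["event", "held", "june"],
    ["event", "sponsor", "verdy", "sponsor"],
    ["event"],
]
selected = top_sentences(sentences, 2)
```

A token that appears in every sentence, such as "event" here, gets a score of zero, so sentences made up only of common words rank last.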

<Moving image distribution server>
A moving image distribution server having a function of distributing the created moving image content may be added. The Web page 6, in which a moving image distribution script is embedded, is served by an external Web server 5.
As shown in FIG. 13, the moving image distribution server 4 includes a Web scraper, an article body extraction unit, an article body analysis unit, a moving image analysis unit, a moving image distribution unit, a hash value database, and a moving image database. The article body analysis unit and the moving image analysis unit may be referred to collectively as the content analysis unit. The moving image distribution server 4 can also be implemented by a plurality of server devices.
The Web scraper collects content information from Web pages on which a moving image distribution slot is set.
The article body extraction unit is software that extracts the part corresponding to the article body from the collected content information, and can be implemented with a tool such as Readability.
The article body analysis unit applies morphological analysis to the extracted article body, extracts important keywords by a method such as TF-IDF, generates a body hash value using a hash generation method such as MinHash or b-bit Minwise Hashing, and stores the hash value in the hash value database together with the URL of the Web page.
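The MinHash step can be sketched as follows. This is a minimal sketch under assumptions: SHA-1 with an integer seed prefix is used as the family of hash functions, the signature length of 64 is arbitrary, and the b-bit variant mentioned in the text is not shown. The function names are hypothetical.

```python
import hashlib

NUM_HASHES = 64  # signature length; an arbitrary choice for this sketch


def _seeded_hash(seed, token):
    """Derive a 64-bit hash of the token that depends on the seed."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(keywords, num_hashes=NUM_HASHES):
    """MinHash signature of a non-empty keyword collection."""
    return [
        min(_seeded_hash(seed, kw) for kw in keywords)
        for seed in range(num_hashes)
    ]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


article = minhash_signature(["cosmetics", "nail", "soap", "summer"])
telop = minhash_signature(["nail", "soap", "summer", "sale"])
similarity = estimated_jaccard(article, telop)
```

Only the fixed-length signature needs to be stored per Web page or moving image, which is what makes it practical to compare hash values later instead of the full keyword sets.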

The video analysis unit performs morphological analysis on the telops of the videos in the collected content information, extracts important keywords by a method such as TF-IDF, generates a telop hash value in the same manner as the article body analysis unit, and stores it in the hash value database. The video analysis unit then calculates the similarity between the generated telop hash value and the stored article hash values, and stores the result.
When the script embedded in the Web page 6 requests a video, the video distribution unit obtains from the video database a video that has a high similarity to the Web page 6, as identified by the received URL, and that is currently available for distribution, and distributes it. More specifically, the request URL that the Web page 6 sends when requesting a video includes the URL of the Web page 6 as a parameter. The video distribution server 4 extracts the URL of the Web page 6 from the received parameter, selects a video judged to have a high similarity based on the analysis result of the video analysis unit, and distributes it to the Web page 6.
The video database can store videos created by the moving image editing server 1 described above. With the video distribution server 4 of this embodiment, for example, distributing a video advertisement that is highly similar to the article body of the Web page 6 makes it possible to efficiently deliver targeted advertising that matches the interests of the user.
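The selection step performed at request time can be sketched as follows. This is a minimal sketch under assumptions that are not in the source: the query parameter name `page_url`, the in-memory dictionaries standing in for the stored similarity results and the video database, the `available` flag, and all URLs and video identifiers are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def page_url_from_request(request_url, param="page_url"):
    """Extract the requesting Web page's URL from the request URL's query string."""
    query = parse_qs(urlparse(request_url).query)  # parse_qs also percent-decodes values
    values = query.get(param)
    return values[0] if values else None


def select_video(request_url, similarity_db, video_db, min_similarity=0.0):
    """Return the ID of the most similar video that is currently deliverable, or None.

    similarity_db maps page URL -> {video_id: precomputed similarity}.
    video_db maps video_id -> metadata with a boolean "available" flag.
    """
    page_url = page_url_from_request(request_url)
    if page_url is None:
        return None
    candidates = [
        (score, video_id)
        for video_id, score in similarity_db.get(page_url, {}).items()
        if score >= min_similarity and video_db.get(video_id, {}).get("available", False)
    ]
    if not candidates:
        return None
    _, best_id = max(candidates)  # highest similarity wins
    return best_id


# Hypothetical usage with precomputed similarities for one page.
similarity_db = {"https://news.example.com/a1": {"v1": 0.9, "v2": 0.7}}
video_db = {"v1": {"available": False}, "v2": {"available": True}}
request_url = "https://videos.example.com/get?page_url=https%3A%2F%2Fnews.example.com%2Fa1"
chosen = select_video(request_url, similarity_db, video_db)
```

In this example `v1` has the higher similarity but is not currently deliverable, so the sketch falls back to `v2`. Looking up similarities that were computed in advance keeps the request path to a dictionary lookup, which matches the description of the similarity being calculated and stored beforehand.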

According to the moving image editing system of the embodiment described above, moving image content can be created easily without separately preparing moving image editing software, a moving image server, editors with specialized skills, and the like. For example, the system is expected to be used in the following situations.
1) Turning information on products sold at e-commerce shops into videos
2) Distributing press release information, CSR information, and the like as videos
3) Turning manuals on usage methods, operation flows, and the like into videos
4) Producing creatives that can be used as video advertisements

The preferred embodiment of the present invention has been described above, but the technical scope of the present invention is not limited to the description of the above embodiment. Various changes and improvements can be made to the above embodiment, and forms incorporating such changes or improvements are also included in the technical scope of the present invention.

1 Moving image editing server
2 Administrator terminal
3 User terminal
4 Video distribution server
5 External Web server
11 Template management unit
12 Classifier creation unit
13 Composite moving image creation unit
21 Template DB
22 Learning data DB
23 Composite moving image DB
24 Embedding material DB

Claims

A moving image editing server that creates moving image content to be distributed to a user terminal, comprising:
a template management unit that manages a template in which a plurality of cuts provided with tag information are defined;
a material reading unit that receives input of image data and character data;
a material analysis unit that recognizes an object in the input image data and outputs a similarity to the tag information;
an image insertion unit that assigns the image data to the template based on the similarity; and
a telop insertion unit that assigns a telop created from the character data to the template,
wherein the moving image content is created by assigning the image data and the telop to a selected template.

The moving image editing server according to claim 1, wherein the telop insertion unit assigns the character data to the template based on a similarity to the tag information.

The moving image editing server according to claim 1, further comprising a template recommendation unit that outputs a template recommended based on the input image data.

The moving image editing server according to any one of claims 1 to 3, further comprising a classifier that is a trained model obtained by machine learning using learning data and that outputs annotation words for input image data,
wherein the similarity output by the material analysis unit is a word similarity between the annotation words output by the classifier and the tag information.

The moving image editing server according to claim 4, wherein the template is provided with color information,
the material analysis unit outputs a color similarity between the image data and the color information, and
the image insertion unit assigns the image data to the template based on the word similarity and/or the color similarity.

The moving image editing server according to any one of claims 1 to 5, wherein the telop insertion unit creates the telop by summarizing text information included in the character data.

The moving image editing server according to claim 1, wherein the material reading unit has a function of recognizing voice in a moving image file and inputting it as character data.

The moving image editing server according to any one of claims 1 to 7, wherein the material reading unit has a function of acquiring image data from a database or the Web in addition to the input image data.

A video distribution system comprising the moving image editing server according to any one of claims 1 to 7 and a video distribution server,
wherein the video distribution server comprises:
a Web scraper that collects content information of a Web page on which a video distribution surface is set;
a content analysis unit that analyzes the content information and determines a degree of similarity with previously created moving image content; and
a moving image advertisement distribution unit that distributes moving image content having a high degree of similarity.

A moving image editing program for a server that distributes moving image content to a user terminal accessing the server via the Internet,
the program causing the server to function as:
a template management unit that manages a template in which a plurality of cuts provided with tag information are defined;
a material reading unit that receives input of image data and character data;
a material analysis unit that recognizes an object in the input image data and outputs a similarity to the tag information;
an image insertion unit that assigns the image data to the template based on the similarity; and
a telop insertion unit that inserts a telop created from the character data.