JP6299531B2

JP6299531B2 - Singing video editing device, singing video viewing system

Info

Publication number: JP6299531B2
Application number: JP2014175826A
Authority: JP
Inventors: 靖司柳原
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2014-08-29
Filing date: 2014-08-29
Publication date: 2018-03-28
Anticipated expiration: 2034-08-29
Also published as: JP2016051031A

Description

The present invention relates to a technique for editing a plurality of singing videos, each of which records the video and audio of a singer singing a song.

In recent years, services have been provided that publish, for viewing, singing videos in which the video and audio of a user singing a karaoke song are recorded. The viewing system described in Patent Document 1 is known as a karaoke service of this kind.

Patent Document 1 describes a technique in which a plurality of singing videos are extracted based on view counts and scoring results, the extracted singing videos are allocated to individual scoring sections, and an edited singing video, produced by joining the portion of each allocated singing video that corresponds to its scoring section, is published.

Patent Document 1: JP 2014-109659 A

However, the technique described in Patent Document 1 does not necessarily select and edit singing videos that match the interests of the user viewing them. Moreover, merely joining single singing videos section by section cannot provide the user with sufficient entertainment value.

The present invention has been made to solve the above problems. Its object is to provide a technique for editing a plurality of singing videos in a manner that takes into account the interests of the viewing user and entertainment value.

The singing video editing device of the present invention comprises storage means, acquisition means, determination means, and output control means. The storage means stores a plurality of singing videos, in each of which the video and audio of a singer singing a song are recorded, in association with singing evaluation information. The singing evaluation information represents the results of evaluating the singing recorded in a singing video against a predetermined evaluation item for each predetermined period. The acquisition means acquires, for a user who has requested viewing of singing videos corresponding to a specific song from among those stored in the storage means, a user parameter representing the user's degree of interest in the evaluation item.

For the plurality of singing videos corresponding to the specific song, the determination means determines, for each of the performance sections into which the specific song is divided, whether the sung melodies of the singing videos harmonize with one another. The determination means also determines the degree of match between the user parameter and the singing evaluation result of each singing video in each performance section. Based on these determinations, the determination means decides, for each performance section, whether to output a single singing video or a plurality of singing videos.

The output control means switches the singing video to be output for each performance section, from among the plurality of singing videos corresponding to the specific song, in accordance with the output method decided by the determination means. It joins the corresponding portions of the singing videos selected for the successive performance sections and outputs them in order, thereby producing one continuous singing video covering all performance sections. When the determination means decides to output a single singing video, the output control means selects as the output target the single singing video, from among the plurality, whose singing evaluation result in the given performance section best matches the degree of interest represented by the user parameter. When the determination means decides to output a plurality of singing videos, the output control means composites the singing videos whose sung melodies harmonize in the given performance section and selects the composite singing video as the output target.

According to the present invention, a plurality of singing videos can be joined and presented in various ways for each performance section: for example, a singing video that matches the viewer's preferences regarding singing evaluation can be presented alone, or a singing video composited from several singing videos whose sung melodies harmonize with one another can be presented. In this way, more singing videos can be offered to the viewer in a form that suits the viewer's preferences or has high entertainment value.

In recent years, karaoke scoring functions that separately evaluate a plurality of evaluation items, such as pitch, rhythm, and singing technique, have become widespread. The configuration described in claim 2 is therefore preferable. That is, the singing evaluation information stored in association with a singing video represents the results of evaluating the singing recorded in that singing video against a plurality of types of evaluation items, and the acquisition means acquires user parameters representing the degree of interest in each of the evaluation items. In this way, for singing videos associated with evaluation results based on a plurality of evaluation items, the viewer can be provided with a singing video edited so that it accurately reflects the viewer's preference for each evaluation item.

Next, the singing video viewing system described in claim 3 comprises storage means, acceptance means, determination means, output control means, receiving means, and reproduction means. Of these, the acceptance means accepts from the user a viewing request for singing videos corresponding to a specific song from among those stored in the storage means, together with an instruction specifying a user parameter that represents the degree of interest in the evaluation item. The receiving means receives the singing video output by the output control means. The reproduction means plays back the singing video received by the receiving means and causes predetermined display means and audio output means to output it. According to the present invention, a system can be realized that receives a singing video edited by the singing video editing device of claim 1, based on the singing videos of the specific song and the user parameter specified by the user, and plays that singing video back.

FIG. 1 is a block diagram showing the schematic configuration of the karaoke system.
FIG. 2 is a sequence diagram showing the procedure of the processing executed by the user viewing terminal and the distribution server.
FIG. 3 is an example of a GUI screen for entering the user interest parameters.
FIG. 4 is a flowchart (1) showing the procedure of the singing video distribution processing.
FIG. 5 is a flowchart (2) showing the procedure of the singing video distribution processing.
FIG. 6 is an explanatory diagram showing how the output form of the singing video changes over time.

Embodiments of the present invention are described below with reference to the drawings. The present invention is not limited to the following embodiments and can be implemented in various forms.

[Description of the configuration of the karaoke system 1]
As shown in FIG. 1, the karaoke system 1 is configured such that a distribution server 2 and a user viewing terminal 3 are each connected to the Internet 10 and can communicate with each other. For brevity, FIG. 1 shows only one user viewing terminal 3; in practice, a large number of these devices are connected.

The distribution server 2 is a server device implemented as a computer system and includes a control unit 21, a storage unit 22, and a communication unit 23. In response to a request from the user viewing terminal 3, the distribution server 2 distributes singing videos, in which the video and audio of a singer singing a song are recorded, to the user viewing terminal 3 via the Internet 10. The distribution server 2 distributes singing videos by streaming, which allows playback while the data is being downloaded.

The control unit 21 is an information processing device built around a CPU, RAM, ROM, and the like, and governs control of the entire apparatus. By executing processing in accordance with a predetermined program, the control unit 21 controls each part of the distribution server 2 and performs various computations. The storage unit 22 is a storage device for storing programs, various databases, and the like. The communication unit 23 is a communication interface that connects the distribution server 2 to the Internet 10 for data communication with the user viewing terminal 3.

The storage unit 22 of the distribution server 2 holds a singing video database that accumulates the data of the singing videos published through the singing video distribution service provided by this system. The database accumulates singing videos and analysis scoring result data uploaded via the Internet from karaoke performance devices (for example, the user viewing terminal 3 or other commercial karaoke machines). A karaoke performance device uploads to the distribution server 2, in association with the song's identification information, a singing video captured during a karaoke performance together with analysis scoring result data obtained by analyzing and evaluating the singing voice on a plurality of items (for example, pitch, rhythm, stability, passion, and technique). As a result, the singing videos and analysis scoring result data of many singers accumulate on the distribution server 2. A singing video contains video information recording the video and singing voice of a singer singing a song using a karaoke performance device. Each singing video is also associated with song identification information that identifies the song sung.

The analysis scoring result data is information representing the result of a precise analysis, performed by the analysis scoring function of the karaoke performance device, of how skillful the singing in the singing video is. This data is associated with the singing video in which the scored singing is recorded. It includes numerical values obtained by evaluating the user's singing, for each predetermined performance section, on each of a plurality of evaluation items: "pitch", "rhythm", "stability", "passion", and "technique". It also includes numerical values obtained by counting, for each predetermined performance section, the occurrences of specific singing techniques such as vibrato. Methods of analysis scoring in karaoke are well known, so a detailed description is omitted.
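As a rough illustration of the association described above, the following sketch models one database entry in Python. The class and field names (`PhraseScore`, `SingingVideoRecord`, `video_id`, `song_id`) are hypothetical, and the sketch assumes that scores have already been aggregated into one entry per phrase; the source does not specify a storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhraseScore:
    """Scores for one performance section (assumed here to be one phrase)."""
    items: Dict[str, float]    # evaluation item name -> numeric score
    vibrato_counts: List[int]  # detection counts for vibrato types V1..V8


@dataclass
class SingingVideoRecord:
    """One singing video with its associated song ID and scoring results."""
    video_id: str              # hypothetical identifier of the video
    song_id: str               # song identification information
    phrases: List[PhraseScore] = field(default_factory=list)

    def phrase(self, i: int) -> PhraseScore:
        """Return the scores of the i-th phrase (1-based, as in the text)."""
        return self.phrases[i - 1]
```

Keeping the per-section scores next to the video record makes the later per-phrase comparison against the user interest parameters a simple lookup.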

The user viewing terminal 3 is an information processing device with the function of acquiring singing video data from the distribution server 2 and playing it back. It is embodied by, for example, a commercial karaoke machine installed in a karaoke venue, a smartphone or similar advanced mobile phone owned by an individual user, or a personal computer. The user viewing terminal 3 communicates with the distribution server 2 via the Internet 10. It accepts from the user a request for singing videos of a specific song, plays back by streaming the singing video acquired from the distribution server 2 in response to the request, and outputs the reproduced video and audio.

[Communication processing procedure]
The procedure of the communication processing performed between the distribution server 2 and the user viewing terminal 3 is described with reference to the sequence diagram of FIG. 2.

In S10, the user viewing terminal 3 transmits a video search request to the distribution server 2. The request contains the identification information of the song that the user requested at the user viewing terminal 3. In S11, the distribution server 2 searches the singing video database in the storage unit 22 for singing videos of the song indicated by the request and returns to the requesting user viewing terminal 3 a video search result containing the identification information of the matching singing videos.

In S12, the user viewing terminal 3 transmits information specifying a plurality of playback candidate videos and the user interest parameters. The playback candidate videos are the singing videos that the user has designated, from the video search result received from the distribution server 2, as candidates for viewing. The user interest parameters are numerical values representing the user's degree of interest in the evaluation items of the analysis scoring results. The user viewing terminal 3 accepts the user interest parameters from the user and notifies the distribution server 2 of their values.

The method of specifying the user interest parameters is described with reference to FIG. 3. The user viewing terminal 3 accepts the user interest parameters by presenting the graphical user interface (GUI) illustrated in FIG. 3. As illustrated, the GUI consists of a radar chart 31 for entering the degree of interest in each scoring item and a checklist 32 for entering, for technique (one of the scoring items), whether the user is interested in each of several types of vibrato.

The radar chart 31 lets the user specify a degree of interest, as one of four levels from 0 to 3, for each of five evaluation items shared with the analysis scoring results of the singing videos: "pitch", "rhythm", "stability", "passion", and "technique". The user viewing terminal 3 arranges the values entered through the radar chart 31 into a vector to obtain the scoring item interest vector P_1 = {pitch, rhythm, stability, passion, technique}. In the example of FIG. 3, P_1 = {1, 1, 1, 3, 3}.

The checklist 32 lets the user specify, with checkboxes, whether the user is interested in each of eight types of vibrato V1 to V8, which are detected by the analysis scoring function and differ in how they are voiced. The user viewing terminal 3 assigns the value 1 to each vibrato that the user has checked in the checklist 32 and the value 0 to each vibrato left unchecked. It then arranges these values into a vector to obtain the vibrato interest vector P_2 = {V1, V2, V3, V4, V5, V6, V7, V8}. In the example of FIG. 3, P_2 = {0, 1, 0, 0, 0, 0, 1, 0}.
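The construction of the two interest vectors can be sketched as follows. This is a minimal Python illustration, not the terminal's actual implementation; the function names and the English item keys are assumptions introduced here, while the item order, the 0 to 3 range, and the 1/0 checkbox encoding follow the description above.

```python
from typing import Dict, List, Set

# Item order follows the vector definitions in the text.
SCORING_ITEMS = ["pitch", "rhythm", "stability", "passion", "technique"]
VIBRATO_TYPES = [f"V{n}" for n in range(1, 9)]  # V1 .. V8


def scoring_interest_vector(levels: Dict[str, int]) -> List[int]:
    """Build P_1 from the radar chart input (each level is 0..3)."""
    vector = []
    for item in SCORING_ITEMS:
        level = levels[item]
        if not 0 <= level <= 3:
            raise ValueError(f"interest level for {item} must be 0..3")
        vector.append(level)
    return vector


def vibrato_interest_vector(checked: Set[str]) -> List[int]:
    """Build P_2 from the checklist: 1 if checked, 0 otherwise."""
    return [1 if v in checked else 0 for v in VIBRATO_TYPES]
```

With the inputs of FIG. 3 (interest 3 in passion and technique, 1 elsewhere, and V2 and V7 checked), these functions reproduce the vectors given in the text.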

The user viewing terminal 3 transmits the scoring item interest vector P_1 and the vibrato interest vector P_2 obtained through the radar chart 31 and the checklist 32 to the distribution server 2 as the user interest parameters. Returning to the sequence diagram of FIG. 2: in S13, the distribution server 2 transmits the streaming data of the singing video to be played back to the user viewing terminal 3 in time order.

At this point, based on the user interest parameters notified by the user viewing terminal 3, the distribution server 2 decides, for each phrase of the song, whether to output a single singing video chosen from the playback candidate videos or to composite and output several of them. The distribution server 2 then switches the singing video to be output for each phrase in accordance with the decided output method and joins and distributes the data of the singing videos selected for the successive phrases, thereby outputting one continuous singing video covering all performance sections. The details of this processing are described later.

If, during streaming of the singing video, the user performs an operation at the user viewing terminal 3 to re-specify the user interest parameters, the user viewing terminal 3 transmits the newly obtained user interest parameters to the distribution server 2 (S14). In S15, the distribution server 2 decides the output method for the subsequent singing video based on the newly notified user interest parameters and transmits streaming data to the user viewing terminal 3 in accordance with the decided output method.

[Description of the singing video distribution processing]
The procedure of the singing video distribution processing executed by the control unit 21 of the distribution server 2 is described with reference to the flowcharts of FIGS. 4 and 5. This processing is executed when a video search request (see S10 in FIG. 2) is transmitted from the user viewing terminal 3.

In S100, the control unit 21 receives a video search request from the user viewing terminal 3. The request contains the identification information of the song the user wishes to view. In S102, the control unit 21 searches the singing video database in the storage unit 22 for singing videos matching the song identification information in the received request and returns a list of the matching singing videos to the requesting user viewing terminal 3.

In S104, the control unit 21 receives the identification information of the playback candidate videos and the user interest parameters from the user viewing terminal 3. It is assumed here that a plurality of singing videos have been designated as playback candidate videos. The user interest parameters contain the scoring item interest vector P_1 and the vibrato interest vector P_2. The information received here is stored in the memory of the control unit 21.

In S106, the control unit 21 initializes the counter of the phrase number i, which indicates the position of a phrase within the song corresponding to the playback candidate videos (i = 1). A "phrase" here is a unit obtained by dividing the melody of the song into a plurality of performance sections. Where each phrase begins and ends can be determined, for example, from metadata that defines the phrase division in advance for each song.
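One way to read the phrase-division metadata is as a sorted list of phrase start times. The sketch below, in Python, assumes that representation; the source only says that such metadata is predefined per song, so the function name `phrase_bounds`, the use of seconds, and the `song_end` parameter are illustrative assumptions.

```python
from typing import List, Tuple


def phrase_bounds(
    phrase_starts: List[float], song_end: float, i: int
) -> Tuple[float, float]:
    """Return (start, end) in seconds of the i-th phrase (1-based).

    phrase_starts is assumed to hold the start time of every phrase in
    ascending order; the last phrase runs until song_end.
    """
    if not 1 <= i <= len(phrase_starts):
        raise IndexError(f"phrase number {i} is out of range")
    start = phrase_starts[i - 1]
    end = phrase_starts[i] if i < len(phrase_starts) else song_end
    return start, end
```

Given these bounds, the same time window can be cut from every playback candidate video, since all candidates are performances of the same song.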

In S108, the control unit 21 examines, for the sung melody in the section corresponding to the i-th phrase of all playback candidate videos, how well the singing videos harmonize with one another. In S110, the control unit 21 determines from the result of S108 whether the playback candidate videos include a plurality of singing videos whose sung melodies harmonize. If such singing videos exist (S110: YES), the control unit 21 proceeds to S116. If they do not (S110: NO), the control unit 21 proceeds to S112.

In S112, the control unit 21 calculates, for every playback candidate video, the similarity between the analysis scoring result of the section corresponding to the i-th phrase and the user interest parameters. Specifically, it calculates the following two similarities for each playback candidate video.

The first is the similarity Sim_1-Xi between the average scoring result S_1-Xi for the i-th phrase of a playback candidate singing video X and the scoring item interest vector P_1. Here, S_1-Xi is the set of averages, over the section corresponding to the i-th phrase, of the scoring items "pitch", "rhythm", "stability", "passion", and "technique" in the analysis scoring result for singing video X, expressed as the vector {pitch, rhythm, stability, passion, technique}.

The similarity Sim_1-Xi is expressed by Formula 1 below. Under Formula 1, Sim_1-Xi is a positive number no greater than 1.

The second is the similarity Sim_2-Xi between the vibrato detection result S_2-Xi for the i-th phrase of singing video X and the vibrato interest vector P_2. Here, S_2-Xi is the set of detection counts of each of the eight vibrato types V1 to V8 in the section corresponding to the i-th phrase of the analysis scoring result for singing video X, expressed as the vector {V1, V2, V3, V4, V5, V6, V7, V8}.

The similarity Sim_2-Xi is expressed by Formula 2 below. Under Formula 2, Sim_2-Xi is a positive number no greater than 1.
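Formulas 1 and 2 are not reproduced in this text, so the measure actually used cannot be confirmed here. As one possibility that is consistent with the stated bound (a value no greater than 1 for vectors with non-negative components), the Python sketch below uses cosine similarity. Treat the choice of cosine similarity, the function name, and the zero-vector convention as assumptions rather than as the patent's definition.

```python
import math
from typing import Sequence


def cosine_similarity(s: Sequence[float], p: Sequence[float]) -> float:
    """Cosine of the angle between a score vector s and an interest vector p.

    ASSUMPTION: stands in for Formulas 1 and 2, which are not shown in
    the text. Returns 0.0 when either vector is all zeros, which is a
    convention chosen here, not something the source specifies.
    """
    if len(s) != len(p):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(s, p))
    norm = math.sqrt(sum(a * a for a in s)) * math.sqrt(sum(b * b for b in p))
    return dot / norm if norm else 0.0
```

Under this assumption, the same function would serve for both Sim_1-Xi (five-element score vectors against P_1) and Sim_2-Xi (eight-element vibrato counts against P_2).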

Next, in S114, the control unit 21 determines whether there are a plurality of singing videos whose similarity calculated in S112 is at least a prescribed value (for example, 0.8). If there are (S114: YES), the control unit 21 proceeds to S116. If there are not (S114: NO), the control unit 21 proceeds to S122.
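The branching across S110, S114, and S130 can be summarized in a short Python sketch. It assumes a single similarity score per video; the text computes two similarities and does not state how they are combined for the S114 comparison, so that reduction is left to the caller. The function name `choose_output` and the string labels are hypothetical.

```python
from typing import Dict, List, Tuple


def choose_output(
    harmonizing: List[str],
    similarity: Dict[str, float],
    threshold: float = 0.8,
) -> Tuple[str, List[str]]:
    """Decide the output for one phrase.

    harmonizing: IDs of videos found to harmonize in this phrase (S108).
    similarity:  video ID -> similarity to the user interest parameters
                 (ASSUMPTION: one combined score per video).
    """
    # S110: two or more harmonizing videos -> composite them.
    if len(harmonizing) >= 2:
        return "composite", list(harmonizing)
    # S114: two or more videos at or above the prescribed value -> composite.
    above = [vid for vid, sim in similarity.items() if sim >= threshold]
    if len(above) >= 2:
        return "composite", above
    # S130: otherwise output the single best-matching video.
    best = max(similarity, key=lambda vid: similarity[vid])
    return "single", [best]
```

Note that in the flow as described, similarities are only computed (S112) when the harmony check fails, so a real implementation could defer that computation until it is needed.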

In S116, the control unit 21 creates a composite video by combining the portions of the relevant singing videos that correspond to the i-th phrase. Specifically, it divides the entire display area of the singing video into a plurality of partial areas and assigns the picture of a different singing video to each partial area. It also mixes the audio of the relevant singing videos. When the previous phrase (the (i-1)-th) was output as a single singing video, the control unit 21 applies a presentation effect such as a wipe, fade, or split to the picture of the composite video as the transition into the composite video.
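The two compositing operations described above, dividing the display area and mixing the audio, can be illustrated with a small Python sketch. The near-square grid layout and the sample-wise averaging are assumptions chosen for simplicity; the source does not say how the partial areas are arranged or how the audio is mixed, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def split_display(width: int, height: int, n: int) -> List[Rect]:
    """Divide the display area into n equal cells on a near-square grid."""
    if n < 1:
        raise ValueError("need at least one video")
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    cell_w, cell_h = width // cols, height // rows
    return [
        ((k % cols) * cell_w, (k // cols) * cell_h, cell_w, cell_h)
        for k in range(n)
    ]


def mix_audio(tracks: List[List[float]]) -> List[float]:
    """Mix audio tracks by averaging samples (truncated to the shortest)."""
    if not tracks:
        raise ValueError("need at least one track")
    length = min(len(t) for t in tracks)
    return [sum(t[j] for t in tracks) / len(tracks) for j in range(length)]
```

For two videos on a 1280x720 display this yields a side-by-side split; for three it yields a 2x2 grid with one cell left empty. Averaging keeps the mixed signal within the range of the inputs, at the cost of lowering each singer's level.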

Next, in S118, the control unit 21 streams the frame data of the composite video for the i-th phrase created in S116 sequentially to the requesting user viewing terminal 3. In S120, the control unit 21 determines whether distribution of the composite video for the i-th phrase has finished. If it has not (S120: NO), the control unit 21 returns to S118 and continues distribution. If it has (S120: YES), the control unit 21 proceeds to S134 (FIG. 5).

In S122, reached on a negative determination in S114, the control unit 21 determines whether the singing video output for the previous ((i-1)-th) phrase was a composite video. If it was (S122: YES), the control unit 21 proceeds to S128. If it was not (S122: NO), the control unit 21 proceeds to S130.

In S128, as the transition from the state in which a composite video made up of a plurality of singing videos was output in the preceding phrase to a single singing video in the next phrase, the control unit 21 applies a presentation effect such as a wipe, a fade, or a split. In S130, the control unit 21 sequentially delivers, to the requesting user viewing terminal 3 in streaming format, the frame data of the portion corresponding to the i-th phrase of the single singing video for which the similarity calculated in S112 is greatest.

In S132, the control unit 21 determines whether delivery of the singing video for the i-th phrase has finished. If delivery has not finished (S132: NO), the control unit 21 returns to S130 and continues delivery. If delivery has finished (S132: YES), the control unit 21 proceeds to S134 (FIG. 5).
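The selection in S130 of the single singing video with the greatest similarity could be sketched as follows. This passage does not restate how the similarity of S112 is calculated, so the sketch substitutes cosine similarity as a stand-in; the function names and the evaluation item names `pitch` and `vibrato` are likewise hypothetical.

```python
import math
from typing import Dict

# Evaluation item name -> value; the item names are hypothetical.
Scores = Dict[str, float]


def cosine_similarity(a: Scores, b: Scores) -> float:
    """Cosine similarity between two item-to-value mappings (assumed measure)."""
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # No direction to compare against.
    return dot / (norm_a * norm_b)


def best_single_video(interest: Scores, phrase_scores: Dict[str, Scores]) -> str:
    """Return the id of the video whose phrase evaluation best matches the interest."""
    return max(
        phrase_scores,
        key=lambda video_id: cosine_similarity(interest, phrase_scores[video_id]),
    )


# Evaluation results of one phrase for two singing videos A and B.
phrase_scores = {
    "A": {"pitch": 0.9, "vibrato": 0.1},
    "B": {"pitch": 0.2, "vibrato": 0.9},
}
chosen = best_single_video({"pitch": 1.0, "vibrato": 0.0}, phrase_scores)
```

With these illustrative numbers, a viewer interested only in pitch is matched to video A, and a viewer interested only in vibrato would be matched to video B.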

In the following S134, the control unit 21 increments the counter for the phrase number i. In S136, the control unit 21 determines whether a new user interest parameter has been received from the requesting user viewing terminal 3. If a user interest parameter has been received (S136: YES), the control unit 21 proceeds to S138. If no user interest parameter has been received (S136: NO), the control unit 21 proceeds to S140.

In S138, the control unit 21 replaces the user interest parameter stored in memory with the newly received user interest parameter. In S140, the control unit 21 determines whether delivery of the singing video data has been completed for all phrases constituting the song. If delivery has not been completed for all phrases (S140: NO), the control unit 21 returns to S108 (FIG. 4). If delivery has been completed for all phrases (S140: YES), the control unit 21 ends this process.
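The loop formed by S134 to S140 could be sketched as below. The per-phrase decision and delivery (S108 to S132) are abstracted into a hypothetical callback `choose_output`, and parameters arriving from the terminal are simulated by a dictionary keyed by the phrase during which they arrive; both are assumptions of this sketch, not part of the embodiment.

```python
from typing import Callable, Dict, List, Optional

Params = Dict[str, float]


def run_distribution(
    num_phrases: int,
    choose_output: Callable[[int, Params], str],
    initial_params: Params,
    updates_during_phrase: Dict[int, Params],
) -> List[str]:
    """Deliver phrases 1..num_phrases, applying parameter updates between phrases."""
    if num_phrases < 1:
        return []
    params = dict(initial_params)
    delivered: List[str] = []
    i = 1
    while True:
        # S108 to S132 (abstracted): decide and deliver the output for phrase i.
        delivered.append(choose_output(i, params))
        finished_phrase = i
        i += 1  # S134: increment the phrase counter.
        # S136: was a new user interest parameter received?
        received: Optional[Params] = updates_during_phrase.get(finished_phrase)
        if received is not None:
            params = dict(received)  # S138: overwrite the stored parameter.
        if i > num_phrases:  # S140: all phrases delivered?
            return delivered


def pick(i: int, params: Params) -> str:
    """Toy stand-in for the per-phrase decision: choose by the stronger interest."""
    return "A" if params["pitch"] >= params["vibrato"] else "B"


result = run_distribution(
    4,
    pick,
    {"pitch": 1.0, "vibrato": 0.0},
    {2: {"pitch": 0.0, "vibrato": 1.0}},
)
```

In this toy run, a parameter received while phrase 2 is being delivered takes effect from phrase 3 onward, which mirrors the update point between S138 and the return to S108.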

[Transition of the output mode of the singing video]
An example of how singing videos are output by the singing video distribution process described above (see FIGS. 4 and 5) is explained with reference to FIG. 6. FIGS. 6(a) to 6(f) show, in chronological order, how the output mode switches from phrase to phrase of the song for two singing videos A and B.

FIG. 6(a) shows the state in which the section of video A corresponding to the first phrase is being output. FIG. 6(b) shows the state in which the section of video A corresponding to the second phrase continues to be output alone. FIG. 6(c) shows the process of switching, in the third phrase, to a composite video made up of video A and video B. Here, a presentation effect is inserted in which the video of video B newly frames in while the video of video A continues to be displayed from the second phrase.

FIG. 6(d) shows the state, after the presentation effect, in which the composite video of the sections of videos A and B corresponding to the third phrase is being output. In this composite video, the entire display area of the singing video is divided into left and right halves, and the videos of singing videos A and B are displayed simultaneously in the respective halves. FIG. 6(e) shows the process of switching, in the fourth phrase, to the state in which video B is output alone. Here, a presentation effect is inserted in which, of videos A and B that were combined and output in the third phrase, the video of video A frames out. FIG. 6(f) shows the state, after the presentation effect, in which the section of video B corresponding to the fourth phrase is being output alone.
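The sequence of FIGS. 6(a) to 6(f) can be reproduced with a small sketch that inserts a presentation effect whenever the output switches between a single video and a composite video. The function name `build_timeline`, the per-phrase plan format, and the event strings are hypothetical and serve only to illustrate the ordering of states.

```python
from typing import List, Optional


def build_timeline(plan: List[List[str]]) -> List[str]:
    """Turn a per-phrase list of video ids into an ordered list of output states."""
    events: List[str] = []
    prev: Optional[List[str]] = None
    for i, videos in enumerate(plan, start=1):
        is_composite = len(videos) > 1
        # An effect is inserted only when the mode changes (single <-> composite).
        if prev is not None and (len(prev) > 1) != is_composite:
            if is_composite:
                entering = [v for v in videos if v not in prev]
                events.append(f"phrase {i}: effect, {'+'.join(entering)} frames in")
            else:
                leaving = [v for v in prev if v not in videos]
                events.append(f"phrase {i}: effect, {'+'.join(leaving)} frames out")
        mode = "composite" if is_composite else "single"
        events.append(f"phrase {i}: {mode} {'+'.join(videos)}")
        prev = videos
    return events


# The example of FIG. 6: A, A, A and B together, then B alone.
timeline = build_timeline([["A"], ["A"], ["A", "B"], ["B"]])
```

For this plan the sketch yields six states, one for each of FIGS. 6(a) to 6(f): two single outputs of A, the frame-in effect for B, the composite of A and B, the frame-out effect for A, and the single output of B.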

[Effects]
The karaoke system 1 of the embodiment provides the following effects.
A plurality of singing videos can be joined and presented in various ways for each phrase of the song: for example, a singing video with high similarity to the viewer's user interest parameter is presented alone, or a singing video that combines a plurality of singing videos whose singing melodies are synchronized with one another is presented. This makes it possible to provide singing videos to the viewer in a highly entertaining manner. For example, simultaneously displaying a plurality of singing videos whose singing melodies are synchronized can make the performance even more exciting.

In addition, inserting a predetermined presentation effect when switching, for each phrase of the song, between a single singing video and a composite video made up of a plurality of singing videos enhances the entertainment value of the singing video. Furthermore, because the degree of interest in each of the plurality of evaluation items used in karaoke analytical scoring can be specified as the user interest parameter, a singing video edited in a manner that accurately reflects the viewer's preference for each evaluation item can be provided to the viewer.

[Modifications]
The embodiment above describes a case in which singing videos are delivered from the distribution server 2 to the user viewing terminal 3 via a network. Alternatively, the invention may be configured as a single device that has the functions of both the distribution server 2 and the user viewing terminal 3, for example a stand-alone computer system that acquires the user interest parameter and, based on it, edits and plays back a plurality of singing videos.

[Correspondence with the configuration recited in the claims]
The elements of the karaoke system 1 of the embodiment correspond to the elements recited in the claims as follows.

The control unit 21 of the distribution server 2 corresponds to the acquisition means, the determining means, and the output control means. The storage unit 22 of the distribution server 2 corresponds to the storage means. The user viewing terminal 3 corresponds to the accepting means, the receiving means, and the reproducing means.

1: karaoke system, 2: distribution server, 21: control unit, 22: storage unit, 23: communication unit, 3: user viewing terminal, 10: Internet.

Claims

A singing video editing device comprising:
storage means for storing, in association with each other, a plurality of singing videos in which video and audio of a singer singing a song are recorded, and singing evaluation information representing, for each predetermined period, the result of evaluating the singing recorded in the singing video with respect to a predetermined evaluation item;
acquisition means for acquiring a user parameter representing the degree of interest in the evaluation item of a user who has requested viewing of a singing video corresponding to a specific song from among the singing videos stored in the storage means;
determining means for determining, for the plurality of singing videos corresponding to the specific song and for each performance section obtained by dividing the specific song into a plurality of performance sections, whether the singing melodies of the plurality of singing videos are synchronized with one another, and the degree of conformity between the singing evaluation result in each performance section of each of the plurality of singing videos and the user parameter, and for deciding for each performance section, based on the determination results, whether to output a single singing video or a plurality of singing videos; and
output control means for switching, in accordance with the output method decided by the determining means, the singing video to be output for each performance section from among the plurality of singing videos corresponding to the specific song, and for sequentially joining and outputting the portions of the singing videos to be output in the respective performance sections, thereby outputting a continuous singing video covering all performance sections,
wherein, when the determining means has decided to output a single singing video, the output control means selects as the output target the single singing video, among the plurality of singing videos, for which the degree of conformity between the singing evaluation result in the given performance section and the user parameter is highest, and, when the determining means has decided to output a plurality of singing videos, the output control means combines the plurality of singing videos whose singing melodies are synchronized in the given performance section and selects the combined singing video as the output target.

The singing video editing device according to claim 1,
wherein the singing evaluation information stored in the storage means in association with the singing videos represents the results of evaluating the singing recorded in the singing video with respect to a plurality of types of evaluation items, and
the acquisition means acquires a user parameter representing the degree of interest in each of the plurality of types of evaluation items.

A singing video viewing system comprising:
storage means for storing, in association with each other, a plurality of singing videos in which video and audio of a singer singing a song are recorded, and singing evaluation information representing, for each predetermined period, the result of evaluating the singing recorded in the singing video with respect to a predetermined evaluation item;
accepting means for accepting from a user a viewing request for a singing video corresponding to a specific song from among the singing videos stored in the storage means, and an instruction specifying a user parameter representing the degree of interest in the evaluation item;
determining means for determining, for the plurality of singing videos corresponding to the specific song and for each performance section obtained by dividing the specific song into a plurality of performance sections, whether the singing melodies of the plurality of singing videos are synchronized with one another, and the degree of conformity between the singing evaluation result in each performance section of each of the plurality of singing videos and the user parameter, and for deciding for each performance section, based on the determination results, whether to output a single singing video or a plurality of singing videos;
output control means for switching, in accordance with the output method decided by the determining means, the singing video to be output for each performance section from among the plurality of singing videos corresponding to the specific song, and for sequentially joining and outputting the portions of the singing videos to be output in the respective performance sections, thereby outputting a continuous singing video covering all performance sections;
receiving means for receiving the singing video output by the output control means; and
reproducing means for reproducing the singing video received by the receiving means and outputting it to predetermined display means and audio output means,
wherein, when the determining means has decided to output a single singing video, the output control means selects as the output target the single singing video, among the plurality of singing videos, for which the degree of conformity between the singing evaluation result in the given performance section and the user parameter is highest, and, when the determining means has decided to output a plurality of singing videos, the output control means combines the plurality of singing videos whose singing melodies are synchronized in the given performance section and selects the combined singing video as the output target.