JP2023144076A

JP2023144076A - Program, information processing method and information processing device

Info

Publication number: JP2023144076A
Application number: JP2023132975A
Authority: JP
Inventors: 岬壱岐; Misaki Iki; 雄大石川; Takehiro Ishikawa; 直美菅; Naomi Suga; 琢人田寺; Takuto Tadera; アンジャナーゴビンダラジャン; Govindarajan Anjana; 樹理投野; Juri Tono; 隆資岡; Takashi Oka
Original assignee: Line Corp
Current assignee: Z Intermediate Global Corp
Priority date: 2021-12-27
Filing date: 2023-08-17
Publication date: 2023-10-06
Also published as: JP7335316B2; WO2023127486A1; JP2023096830A

Abstract

[Problem] To provide a program capable of generating, from sound source data, reference data that is more suitable as a standard for scoring singing. [Solution] The program causes a computer of an information processing device to execute: a sound source data acquisition step of acquiring sound source data including a melody part of a song; a singing voice separation and extraction step of separating and extracting, from the sound source data, first singing voice data of a first singer and second singing voice data of a second singer different from the first singer; and a reference data generation step of generating, based on the first singing voice data, first reference data for the portion of the melody part sung by the first singer, and generating, based on the second singing voice data, second reference data for the portion of the melody part sung by the second singer. [Selected drawing] FIG. 8

Description

The present disclosure relates to a program, an information processing method, and an information processing device that generate reference data from sound source data of a song.

A karaoke apparatus is known (see, for example, Patent Document 1) that includes main vocal data extraction means and reference data generation means. The main vocal data extraction means extracts, from song data, main vocal data (the singing data of the main melody) by canceling the band of the center-localized (center-panned) vocal signal in the singing data of the song data and subtracting the remaining data from the original data (singing data). The reference data generation means generates, from the extracted main vocal data, reference data that serves as a standard for scoring singing.
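
The prior-art extraction just described (cancel the center-panned component in the vocal band, then subtract what remains from the original) can be illustrated with a minimal NumPy sketch. It is not the implementation of Patent Document 1: the helper name `extract_center_vocal`, the FFT-based band masking, the default 100 Hz to 4 kHz vocal band, and the choice to process only the left channel are all assumptions made for illustration.

```python
import numpy as np


def extract_center_vocal(left, right, sample_rate, band=(100.0, 4000.0)):
    """Hypothetical helper: estimate the center-panned part of the left channel.

    Inside `band`, the center component is cancelled by keeping only the side
    signal (L - R) / 2; that "remaining" spectrum is then subtracted from the
    original left channel, which leaves (L + R) / 2 inside the band and
    nothing outside it.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left)
    spec_l = np.fft.rfft(left)
    spec_r = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])

    # Center cancellation (karaoke-style) restricted to the assumed vocal band.
    remaining = spec_l.copy()
    remaining[in_band] = (spec_l[in_band] - spec_r[in_band]) / 2.0

    # Original minus remaining data = center-panned component in the band.
    return np.fft.irfft(spec_l - remaining, n=n)


# Usage: a 440 Hz "voice" identical in both channels plus a 1 kHz part in
# opposite phase (not center-panned); only the voice should survive.
sr = 8000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 440 * t)
wide = np.sin(2 * np.pi * 1000 * t)
est = extract_center_vocal(voice + wide, voice - wide, sr)
```

Because any source panned to the center inside the band survives the subtraction, a center-panned instrument would also end up in the "vocal" output, which is one way non-singing sound can leak into the main vocal data.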

Patent Document 1: Japanese Patent Application Publication No. 2015-225302

However, with the technique of Patent Document 1, the extracted main vocal data does not always match the part to be sung in karaoke. For example, production elements that use human voices outside the sung parts, or instrument sounds erroneously extracted because of limited extraction accuracy, may be mixed into the extracted main vocal data. If reference data is generated from main vocal data containing such non-singing parts that do not match the karaoke singing parts, the reference data may not be appropriate as a standard for scoring singing.

The present disclosure has been made to solve this problem. Its object is to provide a program, an information processing method, and an information processing device capable of generating, from sound source data including a melody part of a song, reference data that is more suitable as a standard for scoring singing.

A program according to the present disclosure is executed by a computer of an information processing device and causes the computer to execute: a sound source data acquisition step of acquiring sound source data including a melody part of a song; a singing voice separation and extraction step of separating and extracting, from the sound source data, first singing voice data of a first singer and second singing voice data of a second singer different from the first singer; and a reference data generation step of generating, based on the first singing voice data, first reference data for the portion of the melody part sung by the first singer, and generating, based on the second singing voice data, second reference data for the portion of the melody part sung by the second singer.
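
As a hedged illustration of the reference data generation step, the sketch below turns one separated singing-voice signal into frame-wise reference pitches (MIDI note numbers) and marks frames where that singer is silent as `None`, so the reference covers only the part that singer sings. The function name `make_reference`, the frame length, the RMS silence threshold, the autocorrelation pitch estimator, and the MIDI representation are all assumptions; this section of the disclosure does not specify the format of the reference data.

```python
import numpy as np


def make_reference(voice, sample_rate, frame_len=1024, silence_rms=0.01,
                   fmin=80.0, fmax=1000.0):
    """Hypothetical reference generator: one MIDI note number (or None) per frame.

    Frames whose RMS falls below `silence_rms` are treated as "this singer is
    not singing" and get None, so the reference covers only that singer's part.
    """
    voice = np.asarray(voice, dtype=float)
    min_lag = int(sample_rate / fmax)   # shortest period considered
    max_lag = int(sample_rate / fmin)   # longest period considered
    reference = []
    for start in range(0, len(voice) - frame_len + 1, frame_len):
        frame = voice[start:start + frame_len]
        if np.sqrt(np.mean(frame ** 2)) < silence_rms:
            reference.append(None)
            continue
        # Autocorrelation for lags 0 .. frame_len - 1.
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        # The strongest peak in the allowed lag range is taken as the pitch period.
        lag = min_lag + int(np.argmax(ac[min_lag:max_lag]))
        f0 = sample_rate / lag
        reference.append(int(round(69 + 12 * np.log2(f0 / 440.0))))
    return reference


# Usage: about half a second of silence, then a 440 Hz tone standing in for
# the first singer's separated voice.
sr = 16000
t = np.arange(8192) / sr
first_singer = np.concatenate([np.zeros(8192), 0.5 * np.sin(2 * np.pi * 440 * t)])
ref = make_reference(first_singer, sr)
```

Running the same function on the second singer's separated voice would yield the second reference, with `None` wherever the second singer is silent.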

An information processing method according to the present disclosure is executed by a computer of an information processing device and includes: a sound source data acquisition step of acquiring sound source data including a melody part of a song; a singing voice separation and extraction step of separating and extracting, from the sound source data, first singing voice data of a first singer and second singing voice data of a second singer different from the first singer; and a reference data generation step of generating, based on the first singing voice data, first reference data for the portion of the melody part sung by the first singer, and generating, based on the second singing voice data, second reference data for the portion of the melody part sung by the second singer.

An information processing device according to the present disclosure includes: a sound source data acquisition unit that acquires sound source data including a melody part of a song; a singing voice separation and extraction unit that separates and extracts, from the sound source data, first singing voice data of a first singer and second singing voice data of a second singer different from the first singer; and a reference data generation unit that generates, based on the first singing voice data, first reference data for the portion of the melody part sung by the first singer, and generates, based on the second singing voice data, second reference data for the portion of the melody part sung by the second singer.
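
The three units of the information processing device can be pictured as a small pipeline. In this sketch the class name `InformationProcessingDevice`, its fields, and the method `build_references` are hypothetical, and the separator is injected as a callable because this passage does not fix a particular separation technique; any source-separation model that returns one signal per singer could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Signal = Sequence[float]
Reference = List[Optional[int]]


@dataclass
class InformationProcessingDevice:
    """Hypothetical wiring of the three units described above."""

    acquire: Callable[[str], Signal]                  # sound source data acquisition unit
    separate: Callable[[Signal], Dict[str, Signal]]   # singing voice separation and extraction unit
    generate: Callable[[Signal], Reference]           # reference data generation unit

    def build_references(self, song_id: str) -> Dict[str, Reference]:
        source = self.acquire(song_id)
        voices = self.separate(source)   # one signal per singer
        # Each singer's reference is built only from that singer's own voice.
        return {singer: self.generate(voice) for singer, voice in voices.items()}


# Usage with toy stand-ins for each unit.
device = InformationProcessingDevice(
    acquire=lambda song_id: [0.0, 0.2, 0.4, 0.6],
    separate=lambda src: {"singer1": src[:2], "singer2": src[2:]},
    generate=lambda voice: [None if abs(x) < 0.1 else 60 for x in voice],
)
refs = device.build_references("demo-song")
```

Keeping the units as injected callables mirrors the disclosure's separation of concerns: the acquisition source, the separation model, and the reference format can each be swapped without touching the others.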

The program, information processing method, and information processing device according to the present disclosure make it possible to generate, from sound source data including a melody part of a song, reference data that is more suitable as a standard for scoring singing.

FIG. 1 is a diagram showing the overall configuration of a communication system according to Embodiment 1.
FIG. 2 is a block diagram showing the configuration of a terminal included in the communication system according to Embodiment 1.
FIG. 3 is a block diagram showing the configuration of a server included in the communication system according to Embodiment 1.
FIG. 4 is a diagram showing an example of synchronized lyrics data stored in the server included in the communication system according to Embodiment 1.
FIG. 5 is a flowchart showing an example of processing in the communication system according to Embodiment 1.
FIG. 6 is a block diagram showing the configuration of main parts of a modified example of the server included in the communication system according to Embodiment 1.
FIG. 7 is a flowchart showing an example of processing in a modified example of the communication system according to Embodiment 1.
FIG. 8 is a block diagram showing the configuration of a server included in the communication system according to Embodiment 2.
FIG. 9 is a flowchart showing an example of processing in the communication system according to Embodiment 2.

Embodiments for implementing the program, information processing method, and information processing device according to the present disclosure are described below with reference to the accompanying drawings. In the drawings, identical or corresponding parts are given the same reference numerals, and redundant descriptions are simplified or omitted as appropriate. For convenience, the positional relationships of structures may be described with reference to the illustrated state. The present disclosure is not limited to the following embodiments; without departing from its spirit, the embodiments may be freely combined, any component of any embodiment may be modified, and any component of any embodiment may be omitted.

Embodiment 1.
Embodiment 1 of the present disclosure is described with reference to FIGS. 1 to 7. FIG. 1 is a diagram showing the overall configuration of a communication system. FIG. 2 is a block diagram showing the configuration of a terminal included in the communication system. FIG. 3 is a block diagram showing the configuration of a server included in the communication system. FIG. 4 is a diagram showing an example of synchronized lyrics data stored in the server included in the communication system. FIG. 5 is a flowchart showing an example of processing in the communication system. FIG. 6 is a block diagram showing the configuration of main parts of a modified example of the server included in the communication system. FIG. 7 is a flowchart showing an example of processing in a modified example of the communication system.

As shown in FIG. 1, the communication system 400 according to this embodiment includes a server 100 and terminals 200, which are communicably connected via a network 300. In the configuration example described here, the server 100 provides services such as distribution of song sound source data and karaoke scoring to the terminals 200 owned by users via the network 300. The number of terminals 200 connected to the network 300 is not limited to two; there may be one, or three or more.

The network 300 connects one or more terminals 200 with one or more servers 100. That is, the network 300 is a communication network that provides a connection path over which a terminal 200, once connected to a server 100, can exchange data with it. One or more portions of the network 300 may be a wired network or a wireless network.

The network 300 may include, for example, an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), part of the Internet, part of the public switched telephone network (PSTN), a mobile phone network, ISDN (Integrated Services Digital Network), a wireless LAN, LTE (Long Term Evolution), CDMA (Code Division Multiple Access), Bluetooth (registered trademark), satellite communication, or a combination of two or more of these. The network 300 may include one or more networks 300.

The terminal 200 may be any information processing terminal capable of implementing the functions of the embodiments of the present disclosure. Examples include a smartphone, a mobile phone (feature phone), a computer (e.g., a desktop PC, laptop PC, or tablet PC), a media computer platform (e.g., a cable or satellite set-top box, or a digital video recorder), a handheld computer device (e.g., a PDA (Personal Digital Assistant) or an e-mail client), a wearable terminal (e.g., a glasses-type or watch-type device), and other types of computers or communication platforms. The terminal 200 may also be referred to as an information processing terminal.

The server 100 has the function of providing predetermined services to the terminal 200. The server 100 may be any information processing device capable of implementing the functions of the embodiments of the present disclosure. Examples include a server device, a computer (e.g., a desktop PC, laptop PC, or tablet PC), a media computer platform (e.g., a cable or satellite set-top box, or a digital video recorder), a handheld computer device (e.g., a PDA or an e-mail client), and other types of computers or communication platforms. The server 100 may also be referred to as an information processing device. When there is no need to distinguish between the server 100 and the terminal 200, each of them may be referred to as an information processing device.

The terminals 200 basically share the same configuration. The configuration of the terminal 200 is described next with reference to FIG. 2. The terminal 200 includes a terminal control unit 230, a terminal storage unit 220, a terminal communication unit 210, an input/output unit 240, a display unit 250, a microphone 260, a speaker 270, and a camera 280. These hardware components are interconnected via, for example, a bus. The terminal 200 does not need to include every component described here; for example, it may be configured without one or more individual components such as the camera 280.

The terminal communication unit 210 transmits and receives various data via the network 300. The communication may be wired or wireless, and any communication protocol may be used as long as the two sides can communicate. The terminal communication unit 210 has the function of communicating with the server 100 via the network 300 and includes a terminal transmitter 211 and a terminal receiver 212. The terminal transmitter 211 transmits various data to the server 100 in accordance with instructions from the terminal control unit 230. The terminal receiver 212 receives various data transmitted from the server 100 and passes it to the terminal control unit 230. The terminal communication unit 210 may also be referred to as a terminal communication I/F (interface) or, when it is implemented as a physically structured circuit, as a terminal communication circuit.
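
The split of the terminal communication unit 210 into a transmitter 211 and a receiver 212 can be sketched as follows. The in-memory queues standing in for the network 300, the message contents, and every class and method name are hypothetical; a real terminal would use whatever wired or wireless protocol is available.

```python
import queue
from typing import Any, Callable


class TerminalTransmitter:
    """Sends data toward the server when the control unit instructs it to."""

    def __init__(self, uplink: "queue.Queue[Any]") -> None:
        self._uplink = uplink

    def send(self, data: Any) -> None:
        self._uplink.put(data)


class TerminalReceiver:
    """Takes data arriving from the server and hands it to the control unit."""

    def __init__(self, downlink: "queue.Queue[Any]",
                 deliver: Callable[[Any], None]) -> None:
        self._downlink = downlink
        self._deliver = deliver

    def poll(self) -> None:
        # Drain everything that has arrived so far.
        while not self._downlink.empty():
            self._deliver(self._downlink.get())


class TerminalCommunicationUnit:
    """Groups the transmitter and receiver, mirroring units 211 and 212."""

    def __init__(self, uplink: "queue.Queue[Any]", downlink: "queue.Queue[Any]",
                 deliver: Callable[[Any], None]) -> None:
        self.transmitter = TerminalTransmitter(uplink)
        self.receiver = TerminalReceiver(downlink, deliver)


# Usage: queues stand in for the network 300; a list stands in for the
# terminal control unit 230.
uplink = queue.Queue()
downlink = queue.Queue()
received = []
comm = TerminalCommunicationUnit(uplink, downlink, received.append)
comm.transmitter.send({"message": "hello"})
downlink.put({"message": "welcome"})   # pretend the server replied
comm.receiver.poll()
```

The server communication unit 110 has the same shape with the directions reversed.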

The input/output unit 240 includes an input unit and an output unit. The input unit is a device for entering various operations on the terminal 200. The output unit is a device that outputs the results of processing by the terminal 200. The input unit and the output unit may be integrated into one device or provided as separate devices.

The input unit is realized by any type of device capable of accepting input from the user and passing information about that input to the terminal control unit 230, or by a combination of such devices. Examples include a touch panel, a touch display, hardware keys such as a keyboard, a pointing device such as a mouse, a camera (operation input via moving images), and a microphone (operation input by voice).

The output unit is realized by any type of device capable of outputting the results of processing by the terminal control unit 230, or by a combination of such devices. Examples include a touch panel, a touch display, a speaker (audio output), a lens (e.g., for 3D (three-dimensional) output or hologram output), and a printer.

The display unit 250 is realized by any type of device capable of displaying content according to display data written to a frame buffer, or by a combination of such devices. Examples include a touch panel, a touch display, a monitor (e.g., a liquid crystal display or an OELD (Organic Electroluminescence Display)), a head-mounted display (HMD), projection mapping, holograms, and devices capable of displaying images, text information, and the like in the air or another medium (which may or may not be a vacuum). The display unit 250 may or may not be capable of displaying display data in 3D.

When the input/output unit 240 includes a touch panel, the input/output unit 240 and the display unit 250 may have substantially the same size and shape and be arranged facing each other.

The terminal control unit 230 has a physically structured circuit for executing functions implemented by code or instructions contained in a program, and is realized by, for example, a data processing device built into hardware. The terminal control unit 230 may therefore also be referred to as a control circuit.

Examples of the terminal control unit 230 include a central processing unit (CPU), a microprocessor, a processor core, a multiprocessor, an ASIC (Application-Specific Integrated Circuit), and an FPGA (Field Programmable Gate Array).

The terminal storage unit 220 stores the various programs and data that the terminal 200 needs in order to operate. It includes various storage media such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), flash memory, RAM (Random Access Memory), and ROM (Read Only Memory). The terminal storage unit 220 may also be referred to as a memory.

The terminal 200 stores a program in the terminal storage unit 220, and by executing this program the terminal control unit 230 performs the processing of each unit included in it. That is, the program stored in the terminal storage unit 220 causes the terminal 200 to implement each function executed by the terminal control unit 230. In other words, the functions of the units of the terminal 200 are realized by a processor executing a program stored in memory, through cooperation between the hardware and software of the terminal 200. This program may also be referred to as a program module.

The microphone 260 is used to input audio data, the speaker 270 is used to output audio data, and the camera 280 is used to acquire moving image data and/or still image data.

The configuration of the server 100 is described next with reference to FIG. 3. The server 100 includes a server control unit 130, a server storage unit 120, and a server communication unit 110. These hardware components are interconnected via, for example, a bus.

The server control unit 130 has a physically structured circuit for executing functions implemented by code or instructions contained in a program, and is realized by, for example, a data processing device built into hardware. The server control unit 130 is typically a central processing unit (CPU), but may also be a microprocessor, processor core, multiprocessor, ASIC, FPGA, or the like. In the present disclosure, the server control unit 130 is not limited to these.

The server storage unit 120 stores the various programs and data that the server 100 needs in order to operate. It is realized by various storage media such as an HDD, an SSD, or flash memory, although in the present disclosure the server storage unit 120 is not limited to these. The server storage unit 120 may also be referred to as a memory.

The server communication unit 110 transmits and receives various data via the network 300. The communication may be wired or wireless, and any communication protocol may be used as long as the two sides can communicate. The server communication unit 110 has the function of communicating with the terminal 200 via the network 300 and includes a server transmitter 111 and a server receiver 112. The server transmitter 111 transmits various data to the terminal 200 in accordance with instructions from the server control unit 130. The server communication unit 110 also receives various data transmitted from the terminal 200 and passes it to the server control unit 130. The server communication unit 110 may also be referred to as a server communication I/F (interface) or, when it is implemented as a physically structured circuit, as a server communication circuit.

The server 100 may also include an input/output unit and a display as part of its hardware. The input/output unit is a device for entering various operations on the server 100 and is realized by any type of device capable of accepting input from a user and passing information about that input to the server control unit 130, or by a combination of such devices. The display is typically a monitor (e.g., a liquid crystal display or an OELD). The hardware of the server 100 may also be configured without the display.

The server 100 stores a program in the server storage unit 120, and by executing this program the server control unit 130 performs the processing of each unit included in it. That is, the program stored in the server storage unit 120 causes the server 100 to implement each function executed by the server control unit 130. In other words, the functions of the units of the server 100 are realized by a processor executing a program stored in memory, through cooperation between the hardware and software of the server 100. This program may also be referred to as a program module.

The terminal control unit 230 of the terminal 200 and/or the server control unit 130 of the server 100 may realize each process not only with a CPU having a control circuit but also with logic circuits (hardware) or dedicated circuits formed on an integrated circuit (IC chip), an LSI (Large Scale Integration), or the like. These circuits may be realized by one or more integrated circuits, and the multiple processes described in the embodiments of the present disclosure may be realized by a single integrated circuit. Depending on the degree of integration, an LSI may also be called a VLSI, super LSI, ultra LSI, or the like.

The program according to the embodiments of the present disclosure (e.g., a software program, computer program, or program module) may be provided stored in a computer-readable storage medium. The storage medium can store the program in a "non-transitory tangible medium". The program may implement only some of the functions of the embodiments of the present disclosure. It may also be a so-called difference file (difference program) that implements those functions in combination with a program already recorded on the storage medium.

The storage medium may include one or more semiconductor-based or other integrated circuits (ICs) (e.g., field-programmable gate arrays (FPGAs) or application-specific ICs (ASICs)), hard disk drives (HDDs), hybrid hard drives (HHDs), optical discs, optical disc drives (ODDs), magneto-optical discs, magneto-optical drives, floppy diskettes, floppy disk drives (FDDs), magnetic tape, solid-state drives (SSDs), RAM drives, Secure Digital cards or drives, any other suitable storage medium, or any suitable combination of two or more of these. Where appropriate, the storage medium may be volatile, non-volatile, or a combination of volatile and non-volatile. The storage medium is not limited to these examples and may be any device or medium capable of storing the program. The storage medium may also be referred to as a memory.

The program of the present disclosure may or may not be provided to the server 100 and/or the terminal 200 via any transmission medium capable of transmitting it (a communication network, broadcast waves, etc.). When the program is provided via a transmission medium, the server 100 and/or the terminal 200 can realize the functions of the functional units described in each embodiment by, for example, executing a program downloaded over the Internet or the like.

Embodiments of the present disclosure may also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission. At least part of the processing in the server 100 and/or the terminal 200 may or may not be realized by cloud computing using one or more computers. At least part of the processing in the terminal 200 may or may not be performed by the server 100; in that case, at least part of the processing of the functional units of the terminal control unit 230 of the terminal 200 may or may not be performed by the server 100. Conversely, at least part of the processing in the server 100 may or may not be performed by the terminal 200; in that case, at least part of the processing of the functional units of the server control unit 130 of the server 100 may or may not be performed by the terminal 200.

The program of the present disclosure may be implemented using, for example, a scripting language such as ActionScript or JavaScript (registered trademark), an object-oriented programming language such as Objective-C or Java (registered trademark), or a markup language such as HTML5.

In the communication system 400 according to this embodiment, as shown in FIG. 3, the server storage unit 120 of the server 100 includes a sound source data storage unit 121 and a lyrics data storage unit 122. The sound source data storage unit 121 stores sound source data of songs. The sound source data consists of waveform data in which the accompaniment part and the melody part of a song are mixed together. The melody part is the main melody and is usually a vocal part sung by a person. Various audio file formats, such as WAVE, AIFF, MP3, AAC, and FLAC, can be used for the sound source data.

The lyrics data storage unit 122 stores lyrics data of songs. The lyrics data is the text of the lyrics of a song's melody part. In the configuration example described here, the lyrics data is synchronized lyrics data, which contains multiple lyric phrases. In the synchronized lyrics data, each lyric phrase is associated with a display start time so that it can be displayed in synchronization with playback of the song's sound source data.

An LRC file, for example, can be used as such synchronized lyrics data; an example is shown in FIG. 4. An LRC file is a plain-text file that follows a specific format. In the LRC format, each line is a lyric phrase that is displayed at one time. The number enclosed in "[" and "]" at the beginning of each line is the display start time of that line's lyric phrase, written in the form mm:ss.xx, where mm is minutes, ss is seconds, and xx is hundredths of a second.
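As an illustration of the mm:ss.xx timestamp format described above, the following Python sketch parses LRC text into (display start time in seconds, lyric phrase) pairs. The function name `parse_lrc` is hypothetical. Only the basic form described here is handled; extended LRC features such as several timestamps on one line are out of scope.

```python
import re
from typing import List, Tuple

# Matches "[mm:ss.xx]phrase". Metadata tags such as "[ar:...]" do not match
# and are skipped (an assumption about what the caller wants).
_LRC_LINE = re.compile(r"^\[(\d{2}):(\d{2})\.(\d{2})\](.*)$")


def parse_lrc(text: str) -> List[Tuple[float, str]]:
    """Return (display_start_seconds, lyric_phrase) pairs sorted by start time."""
    entries: List[Tuple[float, str]] = []
    for raw in text.splitlines():
        match = _LRC_LINE.match(raw.strip())
        if match is None:
            continue  # metadata tag, blank line or malformed line
        mm, ss, xx = (int(g) for g in match.groups()[:3])
        seconds = mm * 60 + ss + xx / 100  # xx is hundredths of a second
        entries.append((seconds, match.group(4).strip()))
    entries.sort(key=lambda entry: entry[0])
    return entries
```

A line such as "[01:10.68]" with no text yields a pair whose phrase is the empty string, which later steps can treat as the start of an instrumental section.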

In the communication system 400 according to this embodiment, the server 100 includes, as functions realized by the server control unit 130, a sound source data acquisition unit 131, a lyrics data acquisition unit 132, a singing part identification unit 133, and a reference data generation unit 134. The sound source data acquisition unit 131 acquires sound source data stored in the sound source data storage unit 121. The lyrics data acquisition unit 132 acquires lyrics data stored in the lyrics data storage unit 122.

The singing part identification unit 133 identifies the singing parts of the sound source data acquired by the sound source data acquisition unit 131, based on the lyrics data acquired by the lyrics data acquisition unit 132. In the configuration example described here, the lyrics data acquired by the lyrics data acquisition unit 132 is synchronized lyrics data, and the singing part identification unit 133 identifies the singing parts of the sound source data from the display start time of each lyric phrase in the synchronized lyrics data.

The identification of singing parts by the singing part identification unit 133 is explained below with a concrete example. With the synchronized lyrics data shown in FIG. 4, the first 11.14 seconds of the song are a prelude without lyrics. The span from 11.14 seconds to 1 minute 10.68 seconds is the first-verse singing part. The span from 1 minute 10.68 seconds to 1 minute 34.16 seconds is an interlude without lyrics. The span from 1 minute 34.16 seconds to 2 minutes 16.71 seconds is the second-verse singing part, and everything after 2 minutes 16.71 seconds is a postlude. The prelude, interlude, and postlude are non-singing parts.
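The interval logic in this example can be sketched as follows. The description does not state how the end of a verse is encoded in FIG. 4. This sketch therefore assumes that an entry with empty lyric text marks the start of an instrumental section, and that the song length is known. The input is a list of (display start time, phrase) pairs sorted by time. `singing_intervals` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def singing_intervals(entries: List[Tuple[float, str]],
                      song_end: float) -> List[Tuple[float, float]]:
    """Merge consecutive non-empty lyric phrases into (start, end) singing intervals.

    Assumption: an entry with empty text starts a non-singing section, and a
    section that is still open at the last entry runs until song_end.
    """
    intervals: List[Tuple[float, float]] = []
    current_start: Optional[float] = None
    for start, phrase in entries:
        if phrase and current_start is None:
            current_start = start  # first sung phrase of a section
        elif not phrase and current_start is not None:
            intervals.append((current_start, start))  # blank entry closes the section
            current_start = None
    if current_start is not None:
        intervals.append((current_start, song_end))  # sung until the end of the song
    return intervals
```

Under this assumption, entries at 11.14 s, 70.68 s (blank), 94.16 s and 136.71 s (blank) reproduce the two singing parts listed above.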

The reference data generation unit 134 generates reference data for the melody part of the sound source data. The reference data describes the sound elements of the melody part. These elements include pitch, note onset timing, and note duration.

The reference data generation unit 134 first extracts human singing voice data from the sound source data. Known methods can be used for this extraction. One method exploits the fact that, in a stereo source, the human singing voice (vocal part) is often panned to the center. Another extracts the frequency components of the human singing voice. A third removes waveform components with the timbres of instruments and other non-voice sounds. The reference data generation unit 134 then detects the pitch of the extracted singing voice data, for example with a known pitch detection algorithm, and outputs the detected pitch as the reference data.
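The description leaves the pitch detection algorithm open ("a known pitch detection algorithm"). As one possible choice, the sketch below applies plain autocorrelation to a single frame and converts the result to a MIDI note number. Production systems typically use more robust estimators such as YIN. The function names, the 80 to 1000 Hz search range, and the silence threshold are illustrative assumptions.

```python
import math
from typing import List, Optional


def estimate_pitch(frame: List[float], sample_rate: int,
                   fmin: float = 80.0, fmax: float = 1000.0) -> Optional[float]:
    """Estimate the fundamental frequency (Hz) of one frame, or None if silent.

    Plain autocorrelation: the lag with the largest correlation inside the
    [fmin, fmax] search range is taken as the pitch period.
    """
    n = len(frame)
    if n == 0 or sum(x * x for x in frame) < 1e-9:
        return None  # silence: no pitch
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(n - 1, int(sample_rate / fmin))
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else None


def hz_to_midi(freq_hz: float) -> float:
    """Convert Hz to a fractional MIDI note number (A4 = 440 Hz = note 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)
```

Running `estimate_pitch` frame by frame over the extracted vocal yields a pitch track. The integer lag limits the resolution, so a real implementation would usually interpolate around the peak.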

In particular, in the server 100 according to this embodiment, the reference data generation unit 134 generates the melody part's reference data from the sound source data of the singing parts identified by the singing part identification unit 133. In other words, the reference data generation unit 134 excludes the non-singing parts from the song's sound source data before generating the melody part's reference data.

In the example of synchronized lyrics data shown in FIG. 4, the reference data generation unit 134 generates the melody part's reference data from two spans of the sound source data: the first-verse singing part (11.14 seconds to 1 minute 10.68 seconds) and the second-verse singing part (1 minute 34.16 seconds to 2 minutes 16.71 seconds). Equivalently, it generates the reference data after excluding the non-singing parts: from the start of the song to 11.14 seconds, from 1 minute 10.68 seconds to 1 minute 34.16 seconds, and from 2 minutes 16.71 seconds onward.
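Here is a minimal sketch of restricting reference data to the singing parts. It assumes a frame-wise pitch track (one value per hop, None for unvoiced frames) and singing intervals in seconds. The description excludes non-singing audio before generating the reference data. Filtering the resulting pitch track instead, as this sketch does, is a simplification. `reference_from_pitch_track` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def reference_from_pitch_track(pitches: List[Optional[float]], hop_seconds: float,
                               intervals: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return (time_s, pitch_hz) points that fall inside a singing interval.

    Frames in the prelude, interludes and postlude are dropped, so voices or
    mis-extracted instrument sounds there never reach the reference data.
    """
    reference: List[Tuple[float, float]] = []
    for k, pitch in enumerate(pitches):
        t = k * hop_seconds  # start time of frame k
        if pitch is None:
            continue  # unvoiced frame
        if any(start <= t < end for start, end in intervals):
            reference.append((t, pitch))
    return reference
```

With the FIG. 4 intervals, [(11.14, 70.68), (94.16, 136.71)], every frame in the prelude, interlude, and postlude would be discarded.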

As shown in FIG. 3, the server storage unit 120 further includes a reference data storage unit 123, which stores the reference data of each song generated by the reference data generation unit 134.

In the configuration example shown in FIG. 3, the server 100 further includes an accompaniment part acquisition unit 135 as a function realized by the server control unit 130. The accompaniment part acquisition unit 135 applies so-called vocal cancellation to the sound source data acquired by the sound source data acquisition unit 131 and extracts the song's accompaniment part. Known methods can be used for the vocal cancellation. The accompaniment part data acquired by the accompaniment part acquisition unit 135 can be used as a so-called karaoke backing track.
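The description only says that known vocal cancellation methods can be used. One well-known method subtracts the right channel from the left so that center-panned sound cancels out. It is shown below as an assumption, not as the method of this disclosure. It also removes other center-panned instruments, which is why more elaborate source separation is often preferred. `cancel_center_vocals` is a hypothetical name.

```python
from typing import List


def cancel_center_vocals(left: List[float], right: List[float]) -> List[float]:
    """Mono karaoke track computed as (L - R) / 2.

    Sound panned exactly to the center (often the lead vocal, but also bass and
    kick drum) is identical in both channels and cancels; side-panned sound remains.
    """
    return [(lv - rv) / 2 for lv, rv in zip(left, right)]
```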

The server transmission unit 111 of the server 100 transmits the sound source data of songs stored in the sound source data storage unit 121 to the terminal 200, which provides a music distribution service. The server transmission unit 111 may also transmit the lyrics data of songs stored in the lyrics data storage unit 122 to the terminal 200. It may likewise transmit the accompaniment part data acquired by the accompaniment part acquisition unit 135.

The terminal reception unit 212 of the terminal 200 receives the sound source data, lyrics data, and accompaniment part data of songs transmitted from the server 100. In the configuration example shown in FIG. 2, the terminal 200 includes a playback processing unit 231, a display processing unit 232, and a singer voice acquisition unit 233 as functions realized by the terminal control unit 230. The playback processing unit 231 plays back the song sound source data received by the terminal reception unit 212 and outputs it from the speaker 270 or the like.

The playback processing unit 231 also plays back the accompaniment part data received by the terminal reception unit 212 and outputs it from the speaker 270 or the like. During this playback, the display processing unit 232 causes the display unit 250 to display the song's lyrics data received by the terminal reception unit 212. As described above, the lyrics data in this configuration example is synchronized lyrics data. The display processing unit 232 can therefore display the lyrics in synchronization with the playback of the accompaniment part data by the playback processing unit 231. While the playback processing unit 231 is playing back the accompaniment part data, the singer voice acquisition unit 233 acquires the sound input to the microphone 260 as the singer's singing voice data.
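Synchronized display reduces to finding the last phrase whose display start time is not later than the current playback position. Below is a minimal sketch using binary search. It assumes the phrases are already sorted as (start time in seconds, phrase text) pairs. `current_phrase` is a hypothetical name.

```python
import bisect
from typing import List, Optional, Tuple


def current_phrase(entries: List[Tuple[float, str]], position: float) -> Optional[str]:
    """Return the lyric phrase to show at `position` seconds into playback.

    Returns None before the first phrase's display start time.
    """
    starts = [start for start, _ in entries]
    i = bisect.bisect_right(starts, position) - 1  # last start <= position
    return entries[i][1] if i >= 0 else None
```

A player would call this on each display refresh. For long lyrics the `starts` list could be built once and reused rather than on every call.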

The terminal transmission unit 211 transmits the singing voice data acquired by the singer voice acquisition unit 233 to the server 100. The server reception unit 112 of the server 100 receives the singer's singing voice data transmitted from the terminal 200.

In the configuration example shown in FIG. 3, the server 100 further includes a singing voice acquisition unit 136 and an evaluation unit 137 as functions realized by the server control unit 130. The singing voice acquisition unit 136 acquires the singing voice data received by the server reception unit 112. The evaluation unit 137 of the server 100 then evaluates the singer's singing voice data acquired by the singing voice acquisition unit 136 by comparing it with the reference data stored in the reference data storage unit 123. For example, the evaluation unit 137 detects the pitch of the singer's singing voice data using the known pitch detection algorithm described above. It then compares that pitch with the reference pitch in the reference data, and the more closely the two pitches agree, the better the evaluation. The evaluation result from the evaluation unit 137 is transmitted to the terminal 200 via the server transmission unit 111 and the terminal reception unit 212, and is displayed, for example, on the display unit 250.
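The description says only that closer pitch agreement earns a better evaluation. One simple way to quantify this is to count the frames in which the singer is within a tolerance of the reference pitch. The sketch below rests on three assumptions, none of which come from the description: time-aligned frames, a 50-cent tolerance, and a percentage score. `pitch_agreement_score` is a hypothetical name.

```python
import math
from typing import List, Optional


def pitch_agreement_score(sung: List[Optional[float]], reference: List[Optional[float]],
                          tolerance_cents: float = 50.0) -> float:
    """Percentage of reference frames in which the singer is within tolerance.

    Frames are assumed to be time-aligned; octave errors are not forgiven.
    """
    scored = hits = 0
    for s, r in zip(sung, reference):
        if r is None:
            continue  # no reference note in this frame: nothing to score
        scored += 1
        if s is not None and abs(1200 * math.log2(s / r)) <= tolerance_cents:
            hits += 1
    return 100.0 * hits / scored if scored else 0.0
```

The difference is measured in cents (1200 * log2 of the frequency ratio), so the same tolerance applies equally to low and high notes.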

The reference data may also be transmitted from the server 100 to the terminal 200. In that case, the terminal 200 can, for example, use the reference data to display a so-called guide melody on the display unit 250 while it plays back the accompaniment part data that serves as the karaoke backing track. The evaluation unit 137 may also be provided in the terminal 200 instead of the server 100.

Next, an example of the operation of the communication system 400 according to this embodiment is described with reference to the flowchart in FIG. 5. First, in step S10, the sound source data acquisition unit 131 of the server 100 acquires sound source data stored in the sound source data storage unit 121. In the following step S11, the lyrics data acquisition unit 132 acquires the lyrics data stored in the lyrics data storage unit 122. After step S11, the server control unit 130 proceeds to step S12.

In step S12, the singing part identification unit 133 identifies the singing parts of the sound source data acquired in step S10, based on the lyrics data acquired in step S11. In the following step S13, the reference data generation unit 134 generates the melody part's reference data from the sound source data of the singing parts identified in step S12. After step S13, the server 100 proceeds to step S14.

In step S14, the reference data storage unit 123 of the server 100 stores the reference data generated in step S13. The accompaniment part acquisition unit 135 also extracts the accompaniment part from the sound source data acquired in step S10. The server transmission unit 111 then transmits the sound source data, lyrics data, and accompaniment part data to the terminal 200. After step S14, the terminal 200 proceeds to step S15.

In step S15, the terminal reception unit 212 of the terminal 200 receives the song's sound source data, lyrics data, and accompaniment part data transmitted from the server 100 in step S14. The playback processing unit 231 of the terminal 200 then plays back the accompaniment part data received by the terminal reception unit 212 and outputs it from the speaker 270 or the like. The display processing unit 232 of the terminal 200 causes the display unit 250 to display the song's lyrics data received by the terminal reception unit 212. After step S15, the terminal 200 proceeds to step S16.

In step S16, the singer voice acquisition unit 233 of the terminal 200 acquires the singer's singing voice data input to the microphone 260. In the following step S17, the terminal transmission unit 211 transmits the singing voice data acquired in step S16 to the server 100, and the server reception unit 112 of the server 100 receives it. After step S17, the server 100 proceeds to step S18.

In step S18, the singing voice acquisition unit 136 of the server 100 acquires the singing voice data received by the server reception unit 112 in step S17. In the following step S19, the evaluation unit 137 of the server 100 evaluates the singer's singing voice data acquired in step S18 by comparing it with the reference data stored in the reference data storage unit 123 in step S14. In the next step S20, the evaluation result from the evaluation unit 137 is transmitted to the terminal 200 via the server transmission unit 111 and the terminal reception unit 212, and is displayed, for example, on the display unit 250.

In the above flow, step S10 corresponds to the sound source data acquisition step, step S11 to the lyrics data acquisition step, step S12 to the singing part identification step, and step S13 to the reference data generation step. Step S18 corresponds to the singing voice acquisition step, and step S19 to the evaluation step.

With the server 100 of the communication system 400 configured as described above, the lyrics data is used to identify the singing parts of the sound source data. The melody part's reference data is then generated from the sound source data of those singing parts. This keeps two kinds of sound out of the reference data: voices used as production effects outside the singing parts, and instrument sounds that are extracted by mistake because of limited extraction accuracy. As a result, the generated reference data is more appropriate as a standard for scoring singing.

Next, a modification of the server 100 of the communication system 400 according to this embodiment is described. In this modification, as shown in FIG. 6, the singing part identification unit 133 of the server 100 includes a singing voice extraction unit 141, a text conversion unit 142, and a collation unit 143. The singing voice extraction unit 141 extracts human singing voice data from the sound source data acquired by the sound source data acquisition unit 131, using the known methods described above.

The text conversion unit 142 converts the singing voice data extracted by the singing voice extraction unit 141 into text using known speech recognition processing and generates singing text data. The collation unit 143 collates the singing text data generated by the text conversion unit 142 with the lyrics data acquired by the lyrics data acquisition unit 132 to identify the singing parts of the sound source data. That is, the collation unit 143 identifies portions where the singing text data matches the lyrics data as singing parts. Conversely, it identifies portions where they do not match as non-singing parts.
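The collation step can be illustrated with Python's standard `difflib` module. This sketch assumes two inputs: hypothetical speech recognition output in the form (word, start seconds, end seconds), and whitespace-separated lyrics. Japanese lyrics would first need a tokenizer. Runs of at least two consecutive recognized words that also appear consecutively in the lyrics are treated as singing parts, and everything else as non-singing parts. `sung_segments` and the `min_words` threshold are illustrative.

```python
import difflib
from typing import List, Tuple


def sung_segments(recognized: List[Tuple[str, float, float]], lyrics: str,
                  min_words: int = 2) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) ranges where recognized words match the lyrics."""
    heard = [word.lower() for word, _, _ in recognized]
    expected = lyrics.lower().split()
    matcher = difflib.SequenceMatcher(a=heard, b=expected, autojunk=False)
    segments: List[Tuple[float, float]] = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:  # ignore short, likely accidental matches
            first = recognized[block.a]
            last = recognized[block.a + block.size - 1]
            segments.append((first[1], last[2]))
    return segments
```

Ad-libs or production voices that do not appear in the lyrics, such as a shouted "yeah" in the intro, never form a matching run and so stay in the non-singing parts.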

Next, an example of the operation of the server 100 according to this modification is described with reference to the flowchart in FIG. 7. FIG. 7 shows the sub-process of step S12 (the singing part identification step) in the flowchart of FIG. 5. First, in step S30, the singing voice extraction unit 141 extracts human singing voice data from the sound source data acquired in step S10 of FIG. 5. In the following step S31, the text conversion unit 142 converts the singing voice data acquired in step S30 into text by speech recognition processing and generates singing text data. Then, in step S32, the collation unit 143 collates the singing text data acquired in step S31 with the lyrics data acquired in step S11 of FIG. 5 to identify the singing parts of the sound source data.

In the above flow, step S30 corresponds to the singing voice extraction step, step S31 to the text conversion step, and step S32 to the collation step.

In this modification, the lyrics data need not include a display start time for each lyric phrase. The singing and non-singing parts of the sound source data can therefore be identified even with lyrics data that is not synchronized. Sounds in the non-singing parts are kept out of the reference data, so the generated reference data is more appropriate as a standard for scoring singing.

Embodiment 2.
Embodiment 2 of the present disclosure is described with reference to FIGS. 8 and 9. FIG. 8 is a block diagram showing the configuration of a server included in the communication system. FIG. 9 is a flowchart showing an example of processing in the communication system.

The program, information processing method, and information processing apparatus according to Embodiment 2 are described below, focusing on the differences from Embodiment 1. Configurations whose description is omitted are basically the same as in Embodiment 1. In the following description, configurations that are the same as or correspond to those of Embodiment 1 are, as a rule, given the same reference numerals as in Embodiment 1.

In the communication system 400 according to this embodiment, as shown in FIG. 8, the server 100 further includes a singing voice separation and extraction unit 150 as a function realized by the server control unit 130. The singing voice separation and extraction unit 150 separates and extracts the singing voice data of each of multiple people from the sound source data acquired by the sound source data acquisition unit 131. For example, the singing voice separation and extraction unit 150 first extracts human singing voice data from the sound source data, using the known methods described above. If the extracted singing voice data contains the voices of multiple people, the unit then separates it by person. This separation can use a known method such as waveform analysis of the singing voice data. Alternatively, the singing voice separation and extraction unit 150 may separate each person's singing voice data directly from the sound source data.

In this embodiment, the singing part identification unit 133 identifies the singing parts of the sound source data based on the singing voice data of a specific person. That person is one of the multiple people whose singing voice data was extracted by the singing voice separation and extraction unit 150. The specific person's singing voice data is, for example, that of the main vocalist. The singing part identification unit 133 determines which person's singing voice data to use from cues such as the stereo position (localization) and volume of each person's singing voice data, and the proportion of the song's playback time that it occupies. If the main vocalist's gender is known from the song's artist name, the specific person may also be determined by whether each person's singing voice data is a male or female voice.

The singing part identification unit 133 identifies the singing parts of the specific person's singing voice data determined in this way and treats them as the singing parts of the song's sound source data. The reference data generation unit 134 then generates the melody part's reference data from the sound source data of the singing parts identified by the singing part identification unit 133.

The specific person's singing voice data is not limited to one person and may cover two or more people. In that case, the singing part identification unit 133 identifies each of those people's singing parts in the sound source data based on their respective singing voice data. The reference data generation unit 134 then generates the melody part's reference data from the sound source data of each person's singing parts. For songs in which several singers take turns singing the melody part, this makes it possible to generate separate reference data for the portion assigned to each singer. For example, a karaoke accompaniment can be played in which only one chosen singer is cancelled, and the reference data can then be used to score only the portion that singer is responsible for.
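Scoring only one singer's portion can be sketched as follows. The sketch assumes hypothetical per-singer reference data made of (frame index, pitch in Hz) points, and a user pitch track on the same frame grid. Frames assigned to other singers are never examined. The 50-cent tolerance and the name `score_assigned_part` are assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical per-singer reference: (frame_index, pitch_hz) points for that singer only.
Reference = List[Tuple[int, float]]


def score_assigned_part(references: Dict[str, Reference], singer: str,
                        user_pitch: List[Optional[float]],
                        tolerance_cents: float = 50.0) -> float:
    """Percentage of `singer`'s reference frames that the user matched in pitch.

    Frames belonging to other singers are ignored, so the user is scored only on
    the portion whose vocal was cancelled from the backing track.
    """
    part = references.get(singer, [])
    if not part:
        return 0.0
    hits = 0
    for k, ref_hz in part:
        sung = user_pitch[k] if k < len(user_pitch) else None
        if sung is not None and abs(1200 * math.log2(sung / ref_hz)) <= tolerance_cents:
            hits += 1
    return 100.0 * hits / len(part)
```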

Next, an example of the operation of the communication system 400 according to this embodiment will be described with reference to the flowchart of FIG. 9. Steps S10 and S11 in this flowchart are the same as steps S10 and S11 in the flowchart of FIG. 5, so their description is omitted here. After step S11, the server control unit 130 proceeds to step S40.

In step S40, the singing voice separation and extraction unit 150 separates and extracts the singing voice data of each of a plurality of people from the sound source data acquired in step S10. Step S40 corresponds to the singing voice separation and extraction step. After step S40, the server control unit 130 proceeds to step S12. In step S12, the singing part identification unit 133 identifies the singing part in the sound source data based on the singing voice data of a specific person among the singing voice data of the plurality of people extracted in step S40. Steps S13 to S20, which follow step S12, are the same as steps S13 to S20 in the flowchart of FIG. 5, so their description is omitted here.

Note that, in this embodiment, the server 100 need not include the singing part identification unit 133. In this case, the reference data generation unit 134 generates the reference data of the melody part based on the singing voice data of a specific person among the singing voice data of the plurality of people extracted by the singing voice separation and extraction unit 150. Also in this case, step S12 in the flowchart of FIG. 9 is not performed; after step S40, the server control unit 130 proceeds directly to step S13. In step S13, the reference data generation unit 134 generates the reference data of the melody part based on the singing voice data of the specific person among the singing voice data of the plurality of people extracted in step S40. With this configuration as well, because the reference data is generated from the singing voice data of a specific person among the separated singing voice data of multiple people, portions outside the singing part in which a human voice is used as a production effect are kept from being reflected in the reference data, and reference data that is more appropriate as a standard for scoring singing can be generated.
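The two variants of the FIG. 9 flow (with step S12, and without the singing part identification unit 133) can be contrasted in a short sketch. It assumes, hypothetically, that step S40 yields one per-frame pitch track per person and that the melody of the sound source data is available as a per-frame pitch contour; `reference_for` and its parameters are illustrative names only, not part of the disclosure.

```python
from typing import Dict, List, Optional

Track = List[Optional[float]]  # hypothetical: per-frame pitch (MIDI); None = silent


def reference_for(separated: Dict[str, Track], source_melody: Track,
                  singer_id: str, identify_part: bool = True) -> Track:
    """Build reference data for one specific person (illustrative only).

    separated: assumed output of step S40, one pitch track per person.
    identify_part=True loosely mirrors step S12: the source melody is kept
    only on frames where the specific person is voiced.
    identify_part=False mirrors the variant without the singing part
    identification unit: the person's own separated track is used directly.
    """
    target = separated[singer_id]
    if not identify_part:
        return list(target)
    return [m if t is not None else None
            for m, t in zip(source_melody, target)]
```

In either variant, a voice used only as an effect (for example, a shout placed on a separate track at frame 0) never enters the reference data, because only the chosen person's track decides which frames are kept.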

In the configuration examples of the embodiments described above, the sound source data acquisition unit 131, the lyrics data acquisition unit 132, the singing part identification unit 133 (including the singing voice extraction unit 141, the text conversion unit 142, and the collation unit 143), the reference data generation unit 134, the accompaniment part acquisition unit 135, the singing voice acquisition unit 136, the evaluation unit 137, and the singing voice separation and extraction unit 150 are provided in a single server 100. However, these units may instead be distributed across a plurality of server devices rather than provided in the single server 100. The functions of the units of the server 100 may also be realized by a plurality of devices working together.

Although the invention according to the present disclosure has been described based on the drawings and examples, it should be noted that those skilled in the art can easily make various variations and modifications based on the present disclosure, and that such variations and modifications are therefore included within the scope of the invention according to the present disclosure. For example, the functions included in each unit, means, step, and so on can be rearranged as long as no logical contradiction arises, and a plurality of means or steps can be combined into one or divided. The configurations shown in the embodiments described above may also be combined as appropriate.

The program, information processing method, and information processing device according to the present disclosure can be used as a program, an information processing method, and an information processing device that generate reference data from sound source data of a song.

100 Server
110 Server communication unit
111 Server transmission unit
112 Server reception unit
120 Server storage unit
121 Sound source data storage unit
122 Lyrics data storage unit
123 Reference data storage unit
130 Server control unit
131 Sound source data acquisition unit
132 Lyrics data acquisition unit
133 Singing part identification unit
134 Reference data generation unit
135 Accompaniment part acquisition unit
136 Singing voice acquisition unit
137 Evaluation unit
141 Singing voice extraction unit
142 Text conversion unit
143 Collation unit
150 Singing voice separation and extraction unit
200 Terminal
210 Terminal communication unit
211 Terminal transmission unit
212 Terminal reception unit
220 Terminal storage unit
230 Terminal control unit
231 Playback processing unit
232 Display processing unit
233 Singer voice acquisition unit
240 Input/output unit
250 Display unit
260 Microphone
270 Speaker
280 Camera
300 Network
400 Communication system

Claims

1. A program to be executed by a computer of an information processing device, the program causing the computer of the information processing device to execute:
a sound source data acquisition step of acquiring sound source data including a melody part of a song;
a singing voice separation and extraction step of separating and extracting, from the sound source data, first singing voice data of a first singer and second singing voice data of a second singer different from the first singer; and
a reference data generation step of generating, based on the first singing voice data, first reference data of the portion of the melody part sung by the first singer, and generating, based on the second singing voice data, second reference data of the portion of the melody part sung by the second singer.

2. The program according to claim 1, further causing the computer of the information processing device to execute:
a singing voice acquisition step of acquiring singing voice data of a singer; and
an evaluation step of evaluating the singing voice data acquired in the singing voice acquisition step by comparing it with either the first reference data or the second reference data.

3. The program according to claim 1, further causing the computer of the information processing device to execute a step of generating first accompaniment data, in which the portion of the melody part sung by the first singer is canceled from the sound source data, and second accompaniment data, in which the portion of the melody part sung by the second singer is canceled from the sound source data.

4. The program according to claim 3, further causing the computer of the information processing device to execute:
a singing voice acquisition step of acquiring singing voice data of a singer; and
an evaluation step of, when the first accompaniment data is played back, evaluating the singing voice data acquired in the singing voice acquisition step by comparing it with the first reference data, and, when the second accompaniment data is played back, evaluating the singing voice data acquired in the singing voice acquisition step by comparing it with the second reference data.

5. An information processing method executed by a computer of an information processing device, the method comprising:
a sound source data acquisition step of acquiring sound source data including a melody part of a song;
a singing voice separation and extraction step of separating and extracting, from the sound source data, first singing voice data of a first singer and second singing voice data of a second singer different from the first singer; and
a reference data generation step of generating, based on the first singing voice data, first reference data of the portion of the melody part sung by the first singer, and generating, based on the second singing voice data, second reference data of the portion of the melody part sung by the second singer.

6. An information processing device comprising:
a sound source data acquisition unit that acquires sound source data including a melody part of a song;
a singing voice separation and extraction unit that separates and extracts, from the sound source data, first singing voice data of a first singer and second singing voice data of a second singer different from the first singer; and
a reference data generation unit that generates, based on the first singing voice data, first reference data of the portion of the melody part sung by the first singer, and generates, based on the second singing voice data, second reference data of the portion of the melody part sung by the second singer.