JP2017111274A

JP2017111274A - Data processor

Info

Publication number: JP2017111274A
Application number: JP2015244995A
Authority: JP
Inventors: Shuichi Matsumoto (松本 秀一)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2015-12-16
Filing date: 2015-12-16
Publication date: 2017-06-22

Abstract

PROBLEM TO BE SOLVED: To make sound data easier to utilize by adjusting the reproduction timing of acquired sound data.

SOLUTION: A data processor comprises: a sound data acquisition unit that acquires first sound data including time information; a feature amount calculation unit that generates first feature amount data associated with the time information; a specifying unit that, on the basis of second feature amount data and the first feature amount data, specifies a first data position of the first feature amount data corresponding to a second data position of the second feature amount data; a moving unit that, on the basis of the temporal positional relationship between the first data position and the second data position, changes the time information such that the reproduction timings are translated; and an expansion/contraction unit that compares the first feature amount data associated with the changed time information with the second feature amount data to detect a correspondence relationship between the respective data portions, and changes the time information such that the intervals between the reproduction timings expand or contract on the basis of the correspondence relationship.

SELECTED DRAWING: Figure 4

Description

The present invention relates to a technique for adjusting a plurality of pieces of data.

Recent karaoke apparatuses offer various functions beyond simply singing along. Examples include a function that analyzes and evaluates a singing voice and a function that automatically adjusts the tempo of a song. A technique for automatically adjusting the tempo of a song according to the singing is disclosed in, for example, Patent Document 1. Patent Document 2 discloses a technique for synchronized performance of a plurality of parts.

Patent Document 1: Japanese Patent No. 3944930
Patent Document 2: JP 2003-263161 A

An object of the present invention is to make sound data easier to utilize by adjusting the reproduction timing of acquired sound data.

According to an embodiment of the present invention, there is provided a data processing device comprising: a sound data acquisition unit that acquires first sound data including time information that defines the reproduction timing of each part of the data; a feature amount calculation unit that calculates a feature amount of the first sound data in time series and generates first feature amount data in which the feature amount is associated with the time information; a specifying unit that, based on second feature amount data indicating a feature amount of second sound data and on the first feature amount data, specifies a first data position of the first feature amount data corresponding to a second data position of the second feature amount data; a moving unit that changes the time information so as to translate the reproduction timing based on the temporal positional relationship between the first data position and the second data position; and an expansion/contraction unit that compares the first feature amount data associated with the changed time information with the second feature amount data to detect a correspondence relationship between the respective parts of the data, and changes the time information so as to expand or contract the intervals between reproduction timings based on the correspondence relationship.
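The translation performed by the moving unit can be illustrated with a minimal Python sketch. It assumes that the time information is a list of playback times in seconds and that the two corresponding data positions have already been identified by the specifying unit. The function name `shift_time_info` and its parameters are hypothetical and do not appear in the specification.

```python
def shift_time_info(times, first_pos_time, second_pos_time):
    """Translate every playback time by one common offset so that the
    first data position lands on the second data position.

    Assumption: `times` holds playback times in seconds. The intervals
    between consecutive times are left unchanged by this step.
    """
    offset = second_pos_time - first_pos_time
    return [t + offset for t in times]


# The singing is 0.5 s late relative to the matching reference position.
times = [10.5, 11.0, 11.5, 12.0]
shifted = shift_time_info(times, first_pos_time=10.5, second_pos_time=10.0)
```

Because every time receives the same offset, this step only moves the data as a whole. Changing the spacing between reproduction timings is left to the expansion/contraction unit.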

The expansion/contraction unit may change the time information based on an expansion/contraction process in which a predetermined reproduction timing of the first sound data is moved based on the correspondence relationship, so that the intervals between reproduction timings are expanded in one of a first section before that reproduction timing and a second section after it, and contracted in the other.
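One way to picture this expansion/contraction process is as a piecewise-linear remapping of playback times around the moved timing. The sketch below is an illustration under stated assumptions, not the method of the embodiment: it assumes that the ends of the range stay fixed and that times are remapped linearly within each section. The name `pivot_stretch` and its parameters are hypothetical.

```python
def pivot_stretch(times, start, end, pivot, new_pivot):
    """Move the playback time `pivot` to `new_pivot` inside [start, end].

    Times in the first section [start, pivot] and in the second section
    [pivot, end] are remapped linearly, so one section is expanded and
    the other is contracted. Times outside [start, end] are unchanged.
    Assumption: the section ends stay fixed and the mapping is linear.
    """
    if not (start < pivot < end and start < new_pivot < end):
        raise ValueError("pivot and new_pivot must lie strictly inside (start, end)")
    out = []
    for t in times:
        if t <= start or t >= end:
            out.append(t)
        elif t <= pivot:
            # First section: scale distances measured from `start`.
            out.append(start + (t - start) * (new_pivot - start) / (pivot - start))
        else:
            # Second section: scale distances measured from `end`.
            out.append(end - (end - t) * (end - new_pivot) / (end - pivot))
    return out


times = [0.0, 1.0, 2.0, 3.0, 4.0]
stretched = pivot_stretch(times, start=0.0, end=4.0, pivot=2.0, new_pivot=3.0)
```

In this usage the timing at 2.0 s moves to 3.0 s, so the intervals in the first section grow from 1.0 s to 1.5 s while those in the second section shrink from 1.0 s to 0.5 s.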

The expansion/contraction unit may change the time information by performing a divided expansion/contraction process in which the range of the sound data is divided into a plurality of divided sections and the expansion/contraction process is performed in the divided sections.

The expansion/contraction unit may execute the divided expansion/contraction process a plurality of times, and the divided sections may become shorter each time the divided expansion/contraction process is executed.
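The coarse-to-fine character of the repeated divided process can be sketched as a schedule of section boundaries. The text above only states that the divided sections become shorter on each execution. Halving the section length on every pass, the equal section lengths, and the function names below are assumptions made for illustration.

```python
def split_sections(start, end, count):
    """Return `count` equal-length (start, end) sections covering [start, end]."""
    width = (end - start) / count
    return [(start + i * width, start + (i + 1) * width) for i in range(count)]


def coarse_to_fine_sections(start, end, passes):
    """List the divided sections used on each pass.

    Assumption: the number of sections doubles on every pass (2, 4, 8, ...),
    so the sections become shorter each time the divided process runs.
    """
    return [split_sections(start, end, 2 ** (k + 1)) for k in range(passes)]


schedule = coarse_to_fine_sections(0.0, 8.0, passes=3)
```

Each pass would then apply the expansion/contraction process within every section of its list, so large timing deviations are corrected first and smaller local ones afterwards.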

Before the expansion/contraction process, the expansion/contraction unit may change the time information so as to expand or contract the first sound data as a whole based on the correspondence relationship.

The second sound data may include a plurality of pieces of sound data, the second feature amount data may indicate the feature amounts of the plurality of pieces of sound data and the distribution of those feature amounts, and the expansion/contraction unit may detect the correspondence relationship based on the distribution of the feature amounts in the second feature amount data.

The data processing device may further comprise a synthesis unit that synthesizes the first sound data, including the time information changed by the expansion/contraction unit, with the second sound data.

According to an embodiment of the present invention, there is also provided a program for causing a computer to execute: acquiring first sound data including time information that defines the reproduction timing of each part of the data; calculating a feature amount of the first sound data in time series and generating first feature amount data in which the feature amount is associated with the time information; specifying, based on second feature amount data indicating a feature amount of second sound data and on the first feature amount data, a first data position of the first feature amount data corresponding to a second data position of the second feature amount data; changing the time information so as to translate the reproduction timing based on the temporal positional relationship between the first data position and the second data position; and comparing the first feature amount data associated with the changed time information with the second feature amount data to detect a correspondence relationship between the respective parts of the data, and changing the time information so as to expand or contract the intervals between reproduction timings based on the correspondence relationship.

According to an embodiment of the present invention, sound data can be made easier to utilize by adjusting the reproduction timing of the acquired sound data.

FIG. 1 is a block diagram showing the configuration of the data processing system in the first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the evaluation device in the first embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the evaluation function in the first embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of the expansion/contraction function in the first embodiment of the present invention.
FIG. 5 is a flowchart showing the expansion/contraction processing in the first embodiment of the present invention.
FIG. 6 is a diagram showing a specific example of the specifying process and the moving process in the first embodiment of the present invention.
FIG. 7 is a diagram showing a specific example of the overall expansion/contraction process in the first embodiment of the present invention.
FIG. 8 is a diagram showing a specific example of the first divided expansion/contraction process in the first embodiment of the present invention.
FIG. 9 is a diagram showing a specific example of the second divided expansion/contraction process in the first embodiment of the present invention.
FIG. 10 is a diagram showing a specific example of the specifying process and the moving process in the first embodiment of the present invention (an example in which only part of a song is sung).
FIG. 11 is a diagram showing a specific example of the specifying process and the moving process in the second embodiment of the present invention (an example in which the singing is compared with a guide melody).
FIG. 12 is a flowchart showing the expansion/contraction processing in the third embodiment of the present invention.
FIG. 13 is a diagram showing an example of the evaluation target window in the third embodiment of the present invention.
FIG. 14 is a diagram explaining the OFFSET calculated by the OFFSET calculation process in the third embodiment of the present invention.
FIG. 15 is a diagram explaining the SHIFT calculated by the SHIFT calculation process in the third embodiment of the present invention.
FIG. 16 is a diagram explaining the correspondence relationship between the respective data parts of the model feature amount data Sd and the singing feature amount data Ss in the third embodiment of the present invention.
FIG. 17 is a diagram explaining the expansion and contraction of the singing feature amount data in the third embodiment of the present invention.

Hereinafter, a data processing system according to an embodiment of the present invention will be described in detail with reference to the drawings. The embodiments described below are examples of embodiments of the present invention, and the present invention is not limited to them. In the drawings referred to in the embodiments, identical parts or parts having similar functions are denoted by the same reference signs or by similar reference signs (a number followed simply by A, B, etc.), and repeated description of them may be omitted.

<First Embodiment>
[Overview]
The data processing system according to the first embodiment of the present invention will be described in detail with reference to the drawings. The data processing system in the first embodiment registers, in a database, data of the singing voice (singing voice data) of a user who sings (hereinafter sometimes called a singer). The singing voices registered in the database are statistically processed and used, for example, as a criterion for karaoke singing evaluation. With this evaluation criterion, a singing can be evaluated against the average singing of singers. A duet can also be reproduced by playing back some (for example, two) of the singing voices registered in the database.

Depending on how a song is sung, there may be a time lag (early or late singing) between a singing already registered in the database and a singing about to be registered. When such a time lag exists, the portion being sung at a given reproduction timing differs even for the same song. Singing voice data collected in this state does not form a useful data set.

Therefore, the data processing system in the first embodiment expands and contracts a singing on the time axis by changing the information that defines the reproduction timing of each part of the singing voice data to be registered in the database. Hereinafter, "expansion/contraction" refers to expansion and contraction on the time axis; the processing is performed so that the pitch of the singing is maintained and not changed by the expansion/contraction. Through this processing, for singing data of the same song, the data already registered and the data to be registered are adjusted so that, as far as possible, the same portion is sung at the same reproduction timing. The data processing system in the first embodiment is described below.

[Configuration of data processing system]
FIG. 1 is a block diagram showing the configuration of the data processing system in the first embodiment of the present invention. The data processing system 1000 includes an evaluation device 1, a data processing device 3, and a database 5. These components are connected via a network NW such as the Internet. In this example, a plurality of evaluation devices 1 are connected to the network NW. The evaluation device 1 is, for example, a karaoke device, and in this example is a karaoke device capable of singing evaluation. The evaluation device 1 may instead be a portable device such as a smartphone.

In this example, singing voices are input at the evaluation devices 1, the singing voices are expanded and contracted in the data processing device 3, and the data of the expanded and contracted singing voices is registered in the database 5. In the database 5, the singing voice data generated in each evaluation device 1 is thus converted so that the singing voice is expanded and contracted, and is registered in association with each piece of music.

The database 5 may hold not only the singing voice data but also feature amount data obtained by calculating a predetermined feature amount from the singing voice data. In this example, the feature amount corresponds to the pitch of the singing voice, and it is registered as feature amount data in association with the data of each singing voice. The feature amount may be calculated in the evaluation device 1, or in the data processing device 3 or the database 5. In this example, the plurality of pieces of singing voice data associated with each piece of music are used as an evaluation criterion when the evaluation device 1 evaluates a singing voice. This evaluation criterion includes data of a model voice that consolidates the plurality of singing voices (model voice data) and data obtained by statistically processing the plurality of singing voices (for example, distribution data such as the pitch distribution at each timing). In this example, it further includes feature amount data (model feature amount data) representing the temporal change of the feature amount of the model voice (in this example, the pitch of the model voice). Here, the model feature amount data is the mean or median of the feature amounts of the plurality of singing voices. The distribution data may be included in this data.
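The statistical processing described above can be sketched in a few lines of Python. The sketch assumes that the pitch tracks of all registered singings are already aligned frame by frame and have equal length. It takes the per-frame median as the model feature amount and the per-frame population standard deviation as one possible form of the distribution data. The function name `model_feature_data` and the choice of standard deviation are assumptions, not taken from the specification.

```python
import statistics


def model_feature_data(pitch_tracks):
    """Return (medians, spreads) computed frame by frame.

    Assumption: every track in `pitch_tracks` has the same length and the
    same time base, so frame i of each track refers to the same timing.
    """
    medians, spreads = [], []
    for frame in zip(*pitch_tracks):
        medians.append(statistics.median(frame))   # model feature amount
        spreads.append(statistics.pstdev(frame))   # spread of the distribution
    return medians, spreads


# Three singings of the same three frames (pitch in Hz).
tracks = [
    [220.0, 247.0, 262.0],
    [222.0, 245.0, 262.0],
    [218.0, 249.0, 262.0],
]
medians, spreads = model_feature_data(tracks)
```

The frame-by-frame statistics are only meaningful when the same portion of the song falls on the same frame in every track, which is the alignment that the expansion/contraction of the singing voice data is intended to provide.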

The model voice data and the singing voice data include time information that defines, by means of a time code, the reproduction timing of each part (data position) of the data. The time code may be, for example, an SMPTE code, which defines reproduction timing in hours, minutes, seconds, and frames. The time information may be any information that defines the reproduction timing of each part of the data. The distribution data and the model feature amount data also include time information corresponding to the model voice data. Although the distribution data and the feature amount data are not themselves reproduced, the time information serves to identify data positions in time series in relation to the model voice data.
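As an illustration of time information in hours, minutes, seconds, and frames, the following sketch converts between a timecode string and seconds. It assumes a non-drop-frame timecode written as `HH:MM:SS:FF` and a frame rate of 30 frames per second; the specification does not fix either of these, and the function names are hypothetical.

```python
def smpte_to_seconds(timecode, fps=30):
    """Convert 'HH:MM:SS:FF' (non-drop-frame, assumed) to seconds."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError("frame number must be in the range 0..fps-1")
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def seconds_to_smpte(total_seconds, fps=30):
    """Convert seconds to 'HH:MM:SS:FF', rounding to the nearest frame."""
    total_frames = round(total_seconds * fps)
    frames = total_frames % fps
    whole = total_frames // fps
    return f"{whole // 3600:02d}:{(whole % 3600) // 60:02d}:{whole % 60:02d}:{frames:02d}"


position = smpte_to_seconds("00:01:30:15")
label = seconds_to_smpte(90.5)
```

Working in seconds makes it straightforward to shift or rescale reproduction timings and then write the result back as a timecode.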

[Configuration of data processing device]
As shown in FIG. 1, the data processing device 3 includes a control unit 31, a storage unit 33, and a communication unit 39. The control unit 31 includes an arithmetic processing circuit such as a CPU. The control unit 31 causes the CPU to execute a control program stored in the storage unit 33, thereby realizing various functions in the data processing device 3. The realized functions include a function of expanding and contracting a singing voice in time (hereinafter sometimes called the expansion/contraction function). The expansion/contraction function will be described later.

The storage unit 33 is a storage device such as a nonvolatile memory or a hard disk. The storage unit 33 stores the control program for realizing the expansion/contraction function. The control program only needs to be executable by a computer, and may be provided stored on a computer-readable recording medium such as a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or a semiconductor memory. In that case, the data processing device 3 only needs to include a device that reads the recording medium. The control program may also be downloaded via the network NW. Under the control of the control unit 31, the communication unit 39 connects to the network NW and exchanges information with external devices connected to the network NW.

[Configuration of evaluation device]
FIG. 2 is a block diagram showing the configuration of the evaluation device in the first embodiment of the present invention. The evaluation device 1 includes a control unit 11, a storage unit 13, an operation unit 15, a display unit 17, a communication unit 19, and a signal processing unit 21. These components are connected via a bus. A microphone 23 and a speaker 25 are connected to the signal processing unit 21.

The control unit 11 includes an arithmetic processing circuit such as a CPU. The control unit 11 causes the CPU to execute a control program stored in the storage unit 13, thereby realizing various functions in the evaluation device 1. The realized functions include a function of evaluating a singing voice (hereinafter sometimes called the evaluation function). The storage unit 13 is a storage device such as a nonvolatile memory or a hard disk. The storage unit 13 stores the control program for realizing the evaluation function. The control program only needs to be executable by a computer, and may be provided stored on a computer-readable recording medium such as a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or a semiconductor memory. In that case, the evaluation device 1 only needs to include a device that reads the recording medium. The control program may also be downloaded via the network NW.

The storage unit 13 also stores music data, singing voice data, and evaluation criteria information as data relating to singing. The music data includes data related to a karaoke song, for example, guide melody data, accompaniment data, and lyrics data. The guide melody data indicates the melody of the song. The accompaniment data indicates the accompaniment of the song. The guide melody data and the accompaniment data may be expressed in the MIDI format. The lyrics data consists of data for displaying the lyrics of the song and data indicating the timing at which the color of the displayed lyrics telop is changed. These data may be acquired from an external server.

The singing voice data indicates the singing voice that the singer has input from the microphone 23, and is buffered in the storage unit 13. The singing voice data includes the time information described above. The evaluation criteria information is information that the evaluation function uses as a criterion for evaluating the singing voice, and includes, for example, information on an arithmetic expression for calculating an evaluation value. The singing that serves as the reference against which the singing voice to be evaluated is compared, that is, a model singing (model voice), may be included in the evaluation criteria information. The evaluation criteria information may be generated by acquiring the evaluation criterion registered in the database 5.

The operation unit 15 is a device such as operation buttons provided on an operation panel or a remote controller, a keyboard, or a mouse, and outputs a signal corresponding to the input operation to the control unit 11. The operation unit 15 allows input operations commonly performed on a karaoke device, such as selecting a piece of music. The display unit 17 is a display device such as a liquid crystal display or an organic EL display, and displays a screen under the control of the control unit 11. The operation unit 15 and the display unit 17 may be integrated to form a touch panel. Under the control of the control unit 11, the communication unit 19 connects to a communication line such as the Internet and exchanges information with an external device such as a server. The function of the storage unit 13 may be realized by an external device with which the communication unit 19 can communicate.

The signal processing unit 21 includes a sound source that generates an audio signal from a MIDI-format signal, an A/D converter, a D/A converter, and the like. The singing voice is converted into an electric signal by the microphone 23 and input to the signal processing unit 21, where it is A/D converted and output to the control unit 11. As described above, the singing voice is buffered in the storage unit 13 as singing voice data. The accompaniment data is read out by the control unit 11, D/A converted in the signal processing unit 21, and output from the speaker 25 as the accompaniment of the song. At this time, the guide melody may also be output from the speaker 25.

[Evaluation function]
The evaluation function realized by the control unit 11 of the evaluation device 1 executing the control program will now be described. Part or all of the configuration that realizes the evaluation function described below may be implemented in hardware.

FIG. 3 is a block diagram showing the configuration of the evaluation function in the first embodiment of the present invention. The evaluation function 100 includes an input sound acquisition unit 101, a feature amount calculation unit 103, a comparison unit 105, and an evaluation value calculation unit 107. The input sound acquisition unit 101 acquires singing voice data indicating the singing voice input from the microphone 23.

The feature amount calculation unit 103 analyzes the singing voice data acquired by the input sound acquisition unit 101 and calculates a feature amount of the singing, in this example the temporal change of the pitch (frequency) of the singing voice, that is, a singing pitch waveform. Specifically, the singing pitch waveform is calculated by a known method, such as a method using the zero crossings of the singing voice waveform or a method using an FFT (Fast Fourier Transform). Besides pitch, the feature amount may be a volume level, a harmonic overtone ratio, or the like. The feature amount is not limited to the characteristic value itself, and may be a value obtained as the amount of change of the characteristic value, for example by a calculation using differentiation. Expressing the feature amount as an amount of change removes the absolute-value component, so that even a key-shifted singing, for example, can be evaluated appropriately.
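The two ideas in this paragraph, a zero-crossing pitch estimate and a change-based feature that ignores key shifts, can be illustrated as follows. This is a simplified sketch rather than the calculation used in the embodiment. It estimates the frequency of one clean, single-tone frame from the spacing of its upward zero crossings, and it assumes that pitch is expressed in semitones so that a key shift is an additive constant. The names `zero_cross_pitch` and `delta_feature` are hypothetical.

```python
import math


def zero_cross_pitch(samples, sample_rate):
    """Estimate frequency (Hz) from the spacing of upward zero crossings.

    Returns 0.0 when fewer than two crossings are found. Assumption: the
    frame contains a single clean tone; real voices need more robust methods.
    """
    crossings = [
        i for i in range(1, len(samples))
        if samples[i - 1] < 0.0 <= samples[i]
    ]
    if len(crossings) < 2:
        return 0.0
    periods = len(crossings) - 1
    return periods * sample_rate / (crossings[-1] - crossings[0])


def delta_feature(pitch_track):
    """Frame-to-frame change of the pitch track (a discrete derivative)."""
    return [b - a for a, b in zip(pitch_track, pitch_track[1:])]


# A 200 Hz sine sampled at 8 kHz for 0.1 s, with a small phase offset.
rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / rate + 0.5) for n in range(800)]
estimated = zero_cross_pitch(frame, rate)

# Pitch in semitones (assumed): a +3 semitone key shift leaves deltas unchanged.
original = [60.0, 62.0, 64.0, 62.0]
key_shifted = [p + 3.0 for p in original]
```

With pitch on a semitone scale, `delta_feature(original)` and `delta_feature(key_shifted)` are identical, which is the sense in which the change-based feature removes the absolute component.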

The comparison unit 105 compares the feature amount of the singing voice obtained by the feature amount calculation unit 103 (the singing pitch waveform) with the feature amount of the model voice (the model pitch waveform). The comparison unit 105 acquires the model feature amount data registered in the database 5 in association with the sung piece of music, and obtains the feature amount of the model voice from this model feature amount data. As the comparison result, in this example, the degree of coincidence between the singing pitch waveform and the model pitch waveform is calculated based on, for example, the distance between the waveforms. Here, the smaller the distance between the waveforms, the higher the degree of coincidence. The distribution data may also be acquired from the database 5 so that the calculation of the degree of coincidence is weighted in consideration of the pitch distribution of the model voice. For example, even if the pitches of the singing voice and the model voice are far apart in the portion being compared, the degree of coincidence may be made less likely to decrease when the pitch distribution of the model voice is wide than when it is narrow.
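A distribution-weighted degree of coincidence could look like the sketch below. The specification does not give a formula, so the one used here is an assumption chosen only to show the described behavior: the per-frame pitch distance is divided by the spread of the model pitch distribution, and the score is the mean of `1 / (1 + normalized distance)`. The name `coincidence` and the `floor` parameter are hypothetical.

```python
def coincidence(singing, model, spreads, floor=1.0):
    """Return a degree of coincidence in (0, 1]; 1.0 means identical tracks.

    Assumption: each frame's distance is divided by the model's spread at
    that frame (at least `floor`), so a deviation counts for less where
    the registered singings themselves vary widely.
    """
    if not singing or not (len(singing) == len(model) == len(spreads)):
        raise ValueError("tracks must be non-empty and of equal length")
    total = 0.0
    for s, m, spread in zip(singing, model, spreads):
        distance = abs(s - m) / max(spread, floor)
        total += 1.0 / (1.0 + distance)
    return total / len(singing)


model = [60.0, 64.0]
singing = [62.0, 64.0]            # first frame is 2 semitones off
narrow = coincidence(singing, model, spreads=[1.0, 1.0])
wide = coincidence(singing, model, spreads=[4.0, 1.0])
exact = coincidence(model, model, spreads=[1.0, 1.0])
```

In this usage the same 2-semitone deviation lowers the score less when the model distribution at that frame is wide (`wide`) than when it is narrow (`narrow`).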

The evaluation value calculation unit 107 calculates an evaluation value, which serves as an index for evaluating the singing voice, based on the comparison result from the comparison unit 105. In this example, the higher the degree of coincidence calculated by the comparison unit 105, the higher the calculated evaluation value and the higher the evaluation of the singing voice. The evaluation value calculation unit 107 need not calculate the evaluation value from the degree of coincidence alone, and may also take other factors into account. The evaluation result may be presented on the display unit 17. This concludes the description of the evaluation function 100.

［伸縮機能］ [Expansion / contraction function]
データ処理装置３の制御部３１が制御プログラムを実行することによって実現される伸縮機能について説明する。なお、以下に説明する伸縮機能を実現する構成の一部または全部は、ハードウエアによって実現されてもよい。 The expansion / contraction function realized by the control unit 31 of the data processing device 3 executing the control program will be described. Note that part or all of the configuration for realizing the expansion / contraction function described below may be realized by hardware.

図４は、本発明の第１実施形態における伸縮機能の構成を示すブロック図である。伸縮機能３００は、音データ取得部３０１、特徴量算出部３０３、特定部３０５、移動部３０７、および伸縮部３０９を含む。 FIG. 4 is a block diagram showing the configuration of the expansion / contraction function in the first embodiment of the present invention. The expansion / contraction function 300 includes a sound data acquisition unit 301, a feature amount calculation unit 303, a specification unit 305, a movement unit 307, and an expansion / contraction unit 309.

音データ取得部３０１は、評価装置１において生成された歌唱音声データを、ネットワークＮＷを介して取得する。この歌唱音声データには、上述したようにデータ各部の再生タイミングを規定するための時刻情報が含まれている。 The sound data acquisition unit 301 acquires the singing voice data generated in the evaluation device 1 via the network NW. This singing voice data includes time information for defining the reproduction timing of each part of the data as described above.

特徴量算出部３０３は、音データ取得部３０１によって取得された歌唱音声データを解析し、歌唱の特徴量を時系列に算出し、この特徴量を示す特徴量データを生成する。特徴量の算出方法は、上述した特徴量算出部１０３における方法と同様である。すなわち、特徴量は、歌唱音声のピッチであり、特徴量データは、このピッチの時間的な変化を表す。生成される特徴量データは、上記の時刻情報を含んでいる。すなわち、歌唱音声データのうち特定の再生タイミングにおける歌唱音声の特徴量が、特徴量データのうち特定の再生タイミングに対応するデータ位置の特徴量に対応することになる。 The feature amount calculation unit 303 analyzes the singing voice data acquired by the sound data acquisition unit 301, calculates the feature amount of the song in time series, and generates feature amount data indicating the feature amount. The feature amount calculation method is the same as the method in the feature amount calculation unit 103 described above. That is, the feature amount is the pitch of the singing voice, and the feature amount data represents a temporal change of this pitch. The generated feature data includes the time information described above. That is, the feature quantity of the singing voice at a specific playback timing in the singing voice data corresponds to the feature quantity at the data position corresponding to the specific playback timing in the feature quantity data.

特定部３０５は、特徴量算出部３０３において生成された特徴量データ（歌唱特徴量データという）と、歌唱された楽曲に対応付けられてデータベース５に登録されている模範特徴量データとに基づいて、それぞれのデータのうち、所定データ位置についての関係を特定する。所定データ位置は、この例では、歌唱区間の冒頭のデータ位置を示す。そのため、特定部３０５は、歌唱特徴量データにおける歌唱区間の冒頭のデータ位置（第１のデータ位置という）に対応する、模範特徴量データにおけるデータ位置（第２のデータ位置という）を特定する。なお、特定部３０５、移動部３０７および伸縮部３０９の処理（伸縮処理）について、具体例については後述する。 Based on the feature amount data generated by the feature amount calculation unit 303 (referred to as singing feature amount data) and the model feature amount data registered in the database 5 in association with the sung song, the specifying unit 305 specifies the relationship between predetermined data positions in the respective data. In this example, the predetermined data position is the data position at the beginning of the singing section. Therefore, the specifying unit 305 specifies the data position in the model feature amount data (referred to as the second data position) that corresponds to the data position at the beginning of the singing section in the singing feature amount data (referred to as the first data position). A specific example of the processing (expansion / contraction process) of the specifying unit 305, the moving unit 307, and the expansion / contraction unit 309 will be described later.

移動部３０７は、特定部３０５において特定されたデータ位置の関係に基づいて、歌唱音声データの再生タイミングを平行移動するように時刻情報を変更する。このように変更される時刻情報は、歌唱音声データおよび歌唱特徴量データに含まれる時刻情報である。ここでは、時刻情報のタイムコードを全体的にシフトさせる、すなわちデータ位置を平行移動することによって、歌唱特徴量データの第１のデータ位置が、模範特徴量データの第２のデータ位置に一致するように時刻情報を変更する。 The moving unit 307 changes the time information so as to translate the reproduction timing of the singing voice data, based on the relationship between the data positions specified by the specifying unit 305. The time information changed in this way is the time information included in the singing voice data and the singing feature amount data. Here, the time code of the time information is shifted as a whole, that is, the data positions are moved in parallel, and the time information is thereby changed so that the first data position of the singing feature amount data coincides with the second data position of the model feature amount data.

伸縮部３０９は、移動部３０７によって時刻情報が変更された歌唱特徴量データと、模範特徴量データとを比較して、データ各部の対応関係を検出する。この対応関係は、歌唱特徴量データと模範特徴量データとの類似度が高くなるデータ部分の関係を示す。対応関係の具体的な検出方法は、後述する具体例において説明する。伸縮部３０９は、この対応関係に基づいて、データ間隔（再生タイミングの間隔）を伸縮するように時刻情報を変更する。ここで変更される時刻情報も、歌唱音声データおよび歌唱特徴量データに含まれる時刻情報である。伸縮部３０９は、時刻情報を変更した歌唱音声データおよび歌唱特徴量データを、歌唱された楽曲に対応付けてデータベース５に登録する。以上が伸縮機能３００についての説明である。 The expansion / contraction unit 309 compares the singing feature amount data whose time information has been changed by the moving unit 307 with the model feature amount data, and detects the correspondence between the data units. This correspondence relationship indicates the relationship between the data portions where the similarity between the singing feature amount data and the model feature amount data is high. A specific method for detecting the correspondence will be described in a specific example described later. The expansion / contraction unit 309 changes the time information so as to expand and contract the data interval (reproduction timing interval) based on this correspondence. The time information changed here is also the time information included in the singing voice data and the singing feature amount data. The expansion / contraction unit 309 registers the singing voice data and the singing feature amount data whose time information is changed in the database 5 in association with the sung music. The above is the description of the expansion / contraction function 300.

［伸縮処理］ [Expansion / contraction process]
続いて、伸縮機能３００において実行される伸縮処理について、具体的に説明する。伸縮処理は、上記の伸縮機能３００のうち、特定部３０５、移動部３０７、および伸縮部３０９において実行される処理をまとめたものである。そのため、伸縮処理は、特徴量算出部３０３において歌唱特徴量データが算出されると開始される。 Next, the expansion / contraction process executed in the expansion / contraction function 300 will be specifically described. The expansion / contraction process is a collection of the processes executed in the specifying unit 305, the moving unit 307, and the expansion / contraction unit 309 of the expansion / contraction function 300 described above. Therefore, the expansion / contraction process is started when the singing feature amount data is calculated by the feature amount calculation unit 303.

図５は、本発明の第１実施形態における伸縮処理を示すフローチャートである。図６は、本発明の第１実施形態における特定処理および移動処理の具体例を示す図である。図７は、本発明の第１実施形態における全体伸縮処理の具体例を示す図である。図８は、本発明の第１実施形態における第１分割伸縮処理の具体例を示す図である。図９は、本発明の第１実施形態における第２分割伸縮処理の具体例を示す図である。以下に説明するステップＳ１０１、Ｓ１０３の処理は特定部３０５により実行され、ステップＳ１０５の処理は移動部３０７により実行される。これらの処理は、図６を用いて説明する。また、ステップ１０７〜Ｓ１１１の処理は、伸縮部３０９によって実行される。これらの処理は、図７から図９を用いて説明する。 FIG. 5 is a flowchart showing the expansion / contraction process in the first embodiment of the present invention. FIG. 6 is a diagram showing a specific example of the specifying process and the moving process in the first embodiment of the present invention. FIG. 7 is a diagram showing a specific example of the entire expansion / contraction process in the first embodiment of the present invention. FIG. 8 is a diagram showing a specific example of the first division expansion / contraction process in the first embodiment of the present invention. FIG. 9 is a diagram showing a specific example of the second division expansion / contraction process in the first embodiment of the present invention. The processing of steps S101 and S103 described below is executed by the specifying unit 305, and the processing of step S105 is executed by the moving unit 307. These processes will be described with reference to FIG. 6. The processing of steps S107 to S111 is executed by the expansion / contraction unit 309. These processes will be described with reference to FIGS. 7 to 9.

図６には、模範音声データの存在する区間Ｄｄ、模範特徴量データＳｄ、歌唱音声データの存在する区間Ｄｓ、および歌唱特徴量データＳｓを模した図が示されている。横軸方向は、時間軸ｔである。まず、歌唱特徴量データＳｓに基づいて、歌唱区間Ｄｓ１を抽出する（図５；ステップＳ１０１）。歌唱区間Ｄｓ１は、この例では、いわゆるＳＡＤ（song activity direction）といわれる方法を用い、特徴量（ピッチ）が算出できた領域に対応する区間として定められている。なお、いわゆるＶＡＤ（voice activity direction）といわれる方法を用い、音量レベルによって歌唱区間Ｄｓ１が定められていてもよい。 FIG. 6 shows a diagram simulating the section Dd in which the model voice data exists, the model feature data Sd, the section Ds in which the singing voice data exists, and the singing feature data Ss. The horizontal axis direction is the time axis t. First, the singing section Ds1 is extracted based on the singing feature amount data Ss (FIG. 5; step S101). In this example, the singing section Ds1 is defined as a section corresponding to an area where a feature amount (pitch) can be calculated by using a so-called SAD (song activity direction) method. The so-called VAD (voice activity direction) method may be used to determine the singing section Ds1 according to the volume level.
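As a rough illustration of step S101, the sketch below takes the singing section to be the span of frames for which a pitch could be calculated. The function name `singing_section` and the use of `None` to mark frames without a pitch are assumptions for this example; an actual detector may also bridge short gaps or use the volume level instead.

```python
def singing_section(times, pitches):
    """Return (start, end) timings spanning the frames where a pitch was obtained.

    times: reproduction timing of each frame.
    pitches: pitch of each frame, or None where no pitch could be calculated.
    Returns None when no frame has a pitch.
    """
    voiced = [t for t, p in zip(times, pitches) if p is not None]
    if not voiced:
        return None
    return voiced[0], voiced[-1]
```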

続いて、模範特徴量データＳｄのうち、歌唱特徴量データＳｓの所定データ位置と対応する関係にあるデータ位置を特定する（図５；ステップＳ１０３）。上述したように所定データ位置は、ここでは、歌唱区間の冒頭のデータ位置を示す。歌唱特徴量データＳｓのうち、歌唱区間Ｄｓ１の冒頭の区間Ｄｓ２におけるデータ部分を、模範特徴量データＳｄと比較する。比較結果に基づき、模範特徴量データＳｄのうち、区間Ｄｓ２のデータ部分に類似した区間Ｄｄ２を検出する。 Subsequently, a data position corresponding to a predetermined data position of the singing feature value data Ss is specified in the model feature value data Sd (FIG. 5; step S103). As described above, the predetermined data position here indicates the data position at the beginning of the singing section. Of the singing feature amount data Ss, the data portion in the beginning section Ds2 of the singing section Ds1 is compared with the model feature amount data Sd. Based on the comparison result, the section Dd2 similar to the data portion of the section Ds2 is detected from the exemplary feature data Sd.

この例では、区間Ｄｓ２の歌唱特徴量データＳｓの波形を、模範特徴量データＳｄの様々な領域の波形と比較し、最も類似度が高くなる領域を区間Ｄｄ２として特定する。これによって、歌唱特徴量データＳｓのうち区間Ｄｓ２のデータ位置（例えば、区間Ｄｓ２の中央のデータ位置の再生タイミングｔｓ１）と、模範特徴量データＳｄのうち区間Ｄｄ２のデータ位置（例えば、区間Ｄｄ２の中央のデータ位置の再生タイミングｔｄ１）とが対応する関係にあると特定する。 In this example, the waveform of the singing feature amount data Ss in the section Ds2 is compared with the waveforms of various regions of the model feature amount data Sd, and the region having the highest similarity is specified as the section Dd2. Accordingly, the data position of the section Ds2 in the singing feature amount data Ss (for example, the reproduction timing ts1 of the center data position of the section Ds2) and the data position of the section Dd2 in the model feature amount data Sd (for example, the reproduction timing td1 of the center data position of the section Dd2) are specified as corresponding to each other.
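The search for the section Dd2 can be pictured as a sliding-window comparison. The following Python sketch assumes both curves are sampled at the same frame interval and uses the mean absolute difference as the distance; the function name `best_match_offset` and these choices are illustrative assumptions, not details given in the description.

```python
def best_match_offset(segment, model):
    """Return the start index of the window in model that is closest to segment.

    segment: pitch values of the opening section Ds2 of the singing data.
    model: pitch values of the model feature amount data Sd.
    """
    n = len(segment)
    if n == 0 or n > len(model):
        raise ValueError("segment must be non-empty and no longer than model")
    best_index, best_distance = 0, float("inf")
    for start in range(len(model) - n + 1):
        window = model[start:start + n]
        # Smaller distance between the waveforms means higher similarity.
        distance = sum(abs(a - b) for a, b in zip(segment, window)) / n
        if distance < best_distance:
            best_index, best_distance = start, distance
    return best_index
```

The returned index, converted to a reproduction timing, would play the role of the data position in the section Dd2 that is paired with the data position in the section Ds2.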

類似度は、上述した比較部１０５において算出される一致度と同様に算出される。すなわち、波形間の距離が近いほど、高い類似度として算出される。なお、模範特徴量データに対応する分布データをデータベース５から取得しておき、特徴量の分布を類似度の算出に用いてもよい。例えば、特徴量の分布が広がりの少ない区間、すなわち特徴量のばらつきの少ない区間の模範特徴量データの波形のみ類似度の算出の対象としてもよい。また、分布に応じて類似度の算出に重み付けをしてもよい。なお、評価装置１における評価値の算出アルゴリズムと同様の方法により、類似度を評価値として算出するようにしてもよい。 The degree of similarity is calculated in the same manner as the degree of coincidence calculated in the comparison unit 105 described above. That is, the closer the distance between the waveforms is, the higher the similarity is calculated. Note that distribution data corresponding to the exemplary feature quantity data may be acquired from the database 5 and the feature quantity distribution may be used to calculate the similarity. For example, only the waveform of the exemplary feature quantity data in a section where the distribution of the feature quantity has a small spread, that is, a section where the variation of the feature quantity is small may be the target of similarity calculation. Moreover, you may weight the calculation of similarity according to distribution. Note that the similarity may be calculated as the evaluation value by a method similar to the evaluation value calculation algorithm in the evaluation apparatus 1.

続いて、対応関係にあるデータ位置に基づいて、歌唱特徴量データＳｓを時間軸上で平行移動させる（図５；ステップＳ１０５）。平行移動するときには、再生タイミングｔｓ１が再生タイミングｔｄ１になるように歌唱特徴量データＳｓの時刻情報を変更する。すなわち、各データ位置の再生タイミングに（ｔｄ１−ｔｓ１）を加算するように変更すればよい。このように時刻情報が変更されたデータが、図６に示す歌唱特徴量データＳｓ２である。 Subsequently, the singing feature value data Ss is translated on the time axis based on the data positions in the correspondence relationship (FIG. 5; step S105). When moving in parallel, the time information of the singing feature amount data Ss is changed so that the reproduction timing ts1 becomes the reproduction timing td1. That is, it may be changed so that (td1-ts1) is added to the reproduction timing of each data position. The data in which the time information is changed in this way is singing feature amount data Ss2 shown in FIG.
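The parallel movement of step S105 amounts to adding the constant (td1 - ts1) to every reproduction timing. A minimal sketch, with the hypothetical function name `shift_times` and time information held as a plain list of timings in seconds:

```python
def shift_times(times, ts1, td1):
    """Translate all reproduction timings so that ts1 lands on td1."""
    offset = td1 - ts1
    return [t + offset for t in times]
```

For example, with ts1 = 2.0 and td1 = 5.0, the timings 1.0, 2.0, and 3.0 become 4.0, 5.0, and 6.0, so intervals between data positions are unchanged.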

続いて、歌唱特徴量データＳｓ２と模範特徴量データＳｄとを比較して、歌唱特徴量データＳｓ２のデータ全体を伸縮させる（図５；ステップＳ１０７）。この処理を全体伸縮処理という。ここでは、歌唱特徴量データＳｓ２のデータ全体を比例的に伸縮させ、模範特徴量データＳｄとの類似度ができるだけ高くなるように、時刻情報を変更する。このとき、上記の再生タイミングｔｓ１のデータ位置または区間Ｄｓ２の最初のデータ位置については、その位置を変更しないようにする。例えば、全体的に９０％に縮める場合には、位置を変更しないデータ以降のデータについて、それぞれ、データ位置の再生タイミングに０．９を乗じて時刻情報とすればよい。このように、伸縮させるときには、伸縮させる区間においては均等に伸縮するように時刻情報を変更する。以下に説明する分割伸縮処理も、同一区間内での伸縮は均等に伸縮するように時刻情報を変更する点で同様である。 Subsequently, the singing feature value data Ss2 and the model feature value data Sd are compared, and the entire data of the singing feature value data Ss2 is expanded or contracted (FIG. 5; step S107). This process is referred to as an entire expansion / contraction process. Here, the entire time data of the singing feature value data Ss2 is proportionally expanded and contracted, and the time information is changed so that the similarity to the model feature value data Sd is as high as possible. At this time, the data position of the reproduction timing ts1 or the first data position of the section Ds2 is not changed. For example, in the case of reducing to 90% as a whole, time information may be obtained by multiplying the reproduction timing of the data position by 0.9 for the data after the data whose position is not changed. Thus, when expanding and contracting, the time information is changed so as to expand and contract evenly in the section to be expanded and contracted. The split expansion / contraction process described below is also similar in that the time information is changed so that expansion / contraction within the same section is evenly expanded / contracted.

時刻情報が変更されたデータが、図７に示す歌唱特徴量データＳｓ３である。ここでの類似度の算出についても、上述した方法と同じようにすればよい。例えば、歌唱特徴量データＳｓ３と模範特徴量データＳｄとの波形間の距離ができるだけ小さくなるように、歌唱特徴量データＳｓ２を伸縮する。類似度の算出には、分布データを上述のように類似度のは算出に含めてもよい。なお、図７においては、歌唱区間の最後のデータ位置の再生タイミングが一致するように示されているが、最後のデータ位置の再生タイミングは、上記の通り類似度に依存するため、必ずしも図示の通りになるとは限らない。 The data in which the time information has been changed is the singing feature amount data Ss3 shown in FIG. 7. The similarity here may be calculated in the same manner as described above. For example, the singing feature amount data Ss2 is expanded or contracted so that the distance between the waveforms of the singing feature amount data Ss3 and the model feature amount data Sd becomes as small as possible. The distribution data may be included in the calculation of the similarity as described above. In FIG. 7, the reproduction timings of the last data positions of the singing sections are shown as coinciding, but since the reproduction timing of the last data position depends on the similarity as described above, the result does not necessarily match the drawing.
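One way to picture the entire expansion / contraction process is a search over candidate scale factors, keeping the one that brings the stretched singing curve closest to the model curve. The sketch below is an assumption-laden illustration: the names `sample`, `stretch_times`, and `best_global_scale`, the scaling about a fixed anchor timing, the linear interpolation, and the candidate list `scales` are not taken from the description.

```python
def sample(times, values, t):
    """Linearly interpolate values at time t; return None outside the range."""
    if t < times[0] or t > times[-1]:
        return None
    for i in range(1, len(times)):
        if t <= times[i]:
            span = times[i] - times[i - 1]
            frac = 0.0 if span == 0 else (t - times[i - 1]) / span
            return values[i - 1] + frac * (values[i] - values[i - 1])
    return values[-1]


def stretch_times(times, anchor, scale):
    """Scale all timings uniformly about the anchor, which stays in place."""
    return [anchor + (t - anchor) * scale for t in times]


def best_global_scale(times, sung, model_times, model, anchor, scales):
    """Return the candidate scale giving the smallest mean distance to the model."""
    best_scale, best_distance = None, float("inf")
    for scale in scales:
        stretched = stretch_times(times, anchor, scale)
        diffs = []
        for t, value in zip(stretched, sung):
            ref = sample(model_times, model, t)
            if ref is not None:  # compare only where the model curve exists
                diffs.append(abs(value - ref))
        if not diffs:
            continue
        distance = sum(diffs) / len(diffs)
        if distance < best_distance:
            best_scale, best_distance = scale, distance
    return best_scale
```

A practical version would also need to guard against scales that shrink the overlap with the model curve to only a few points, which this sketch does not do.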

続いて、歌唱特徴量データＳｓ３を複数の区間に分割し、それぞれの区間に対して模範特徴量データＳｄを比較して、歌唱特徴量データＳｓ３を区間毎に伸縮させる（図５；ステップＳ１０９）。この処理を分割伸縮処理という。分割伸縮処理は複数回行われることが望ましい。まず、１回目の分割伸縮処理について、図８を用いて説明する。最初のデータ位置ＳＴと最後のデータ位置ＥＮとの中点のデータ位置Ｍ１を決定する。これによって、歌唱特徴量データＳｓ３をＳＴとＭ１との区間ＳＰ１と、Ｍ１とＥＮとの区間ＳＬ１とに分割する。 Subsequently, the singing feature amount data Ss3 is divided into a plurality of sections, each section is compared with the model feature amount data Sd, and the singing feature amount data Ss3 is expanded or contracted section by section (FIG. 5; step S109). This process is called a division expansion / contraction process. The division expansion / contraction process is desirably performed a plurality of times. First, the first division expansion / contraction process will be described with reference to FIG. 8. The data position M1 at the midpoint between the first data position ST and the last data position EN is determined. The singing feature amount data Ss3 is thereby divided into the section SP1 between ST and M1 and the section SL1 between M1 and EN.

ここでのデータの伸縮は、中点のデータ位置Ｍ１の再生タイミングを時間的に前後に移動させることによって実行される。Ｍ１を時間的に前に移動させると、区間ＳＰ１が縮小され、区間ＳＬ１が伸長される。逆にＭ１を時間的に後に移動させると、区間ＳＰ１が伸長され、区間ＳＬ１が縮小される。このとき、各区間の一方の端部にあたるＳＴおよびＥＮのデータ位置は変更しない。そのため、ＳＴおよびＥＮは固定区切位置という場合がある。また、Ｍ１は移動区切位置という場合がある。 The expansion / contraction of the data here is executed by moving the reproduction timing of the data point M1 at the middle point back and forth in time. When M1 is moved forward in time, the section SP1 is reduced and the section SL1 is expanded. Conversely, when M1 is moved later in time, the section SP1 is expanded and the section SL1 is contracted. At this time, the data positions of ST and EN corresponding to one end of each section are not changed. Therefore, ST and EN may be referred to as fixed delimiter positions. M1 may be referred to as a movement delimiter position.

図８に示すように、データ位置Ｍ１を、より前の時間のＭ１ｓに移動させると、区間ＳＰ１はＳＰ１ｓに縮小され、区間ＳＬ１はＳＬ１ｓに伸長される。このようにして歌唱特徴量データＳｓ３を伸縮させるように時刻情報を変更したものが歌唱特徴量データＳｓ４である。Ｍ１の移動先であるＭ１ｓの再生タイミングは、歌唱特徴量データＳｓ４と模範特徴量データＳｄとの類似度ができるだけ高くなるように決められる。この類似度の算出に関しても上述した方法で行えばよい。 As shown in FIG. 8, when the data position M1 is moved to M1s at an earlier time, the section SP1 is reduced to SP1s and the section SL1 is extended to SL1s. The singing feature value data Ss4 is obtained by changing the time information so as to expand and contract the singing feature value data Ss3. The reproduction timing of M1s, which is the movement destination of M1, is determined so that the similarity between the singing feature value data Ss4 and the model feature value data Sd is as high as possible. The calculation of the similarity may be performed by the method described above.
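Moving the boundary M1 to M1s while ST and EN stay fixed corresponds to a piecewise-linear remapping of the time information, with each section stretched uniformly. A minimal sketch, assuming all timings lie between ST and EN; the function name `move_split_point` and its argument names are hypothetical:

```python
def move_split_point(times, st, m1, en, m1s):
    """Remap timings so that m1 moves to m1s while st and en stay fixed."""
    if not (st < m1 < en and st < m1s < en):
        raise ValueError("m1 and m1s must lie strictly between st and en")
    remapped = []
    for t in times:
        if t <= m1:
            # Section SP1: scaled uniformly between ST and the moved boundary.
            remapped.append(st + (t - st) * (m1s - st) / (m1 - st))
        else:
            # Section SL1: scaled uniformly between the moved boundary and EN.
            remapped.append(en - (en - t) * (en - m1s) / (en - m1))
    return remapped
```

In the described process, the destination M1s would be chosen by trying candidate positions and keeping the one that maximizes the similarity to the model feature amount data; that search is omitted here.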

図８に示す分割伸縮処理が終了すると、分割単位が最小値に到達したか判定する（図５；ステップＳ１１１）。分割単位とは、図８の例では分割された区間の１つのことをいう。この区間長が、予め決められた最小値以下になった場合（ステップＳ１１１；Ｙｅｓ）には、伸縮処理を終了する。一方、最小値に到達していない場合（ステップＳ１１１；Ｎｏ）には、再び分割伸縮処理を実行する（ステップＳ１０９）。 When the division expansion / contraction process shown in FIG. 8 ends, it is determined whether the division unit has reached the minimum value (FIG. 5; step S111). The division unit refers to one of the divided sections in the example of FIG. When the section length is equal to or smaller than a predetermined minimum value (step S111; Yes), the expansion / contraction process is terminated. On the other hand, when the minimum value has not been reached (step S111; No), the division expansion / contraction process is executed again (step S109).

ステップＳ１０９における分割伸縮処理が２回目以降に実行される場合には、実行する度に、直前に行われた分割伸縮処理のときの分割単位をより細かくする。この例では、図９に示すように、区間ＳＰ１、ＳＬ１、ＳＰ２、ＳＬ２の４分割として、１回目よりも分割単位を細かくする。まず、歌唱特徴量データＳｓ４を大きく２つに分ける。ここでは、ＳＴ１からＥＮ１の区間と、ＳＴ２からＥＮ２の区間との２つの区間に分ける。ＥＮ１（ＳＴ２）は、ＳＴ１とＥＮ２との中点である。ＳＴ１、ＥＮ１（ＳＴ２）、ＥＮ２については、分割伸縮処理においてデータ位置が固定される区切位置（固定区切位置）である。一方、２つの区間のそれぞれの中点において、データ位置を移動するための区切位置（移動区切位置）としてＭ１、Ｍ２を設定する。これによって、上記の４つの区間に分割される。 When the division expansion / contraction process in step S109 is executed after the second time, the division unit at the time of the division expansion / contraction process performed immediately before is made finer each time it is executed. In this example, as shown in FIG. 9, the division unit is made finer than the first time as four divisions of sections SP1, SL1, SP2, and SL2. First, the singing feature value data Ss4 is roughly divided into two. Here, it is divided into two sections, a section from ST1 to EN1, and a section from ST2 to EN2. EN1 (ST2) is the midpoint between ST1 and EN2. ST1, EN1 (ST2), and EN2 are delimitation positions (fixed delimitation positions) at which data positions are fixed in the division expansion / contraction process. On the other hand, M1 and M2 are set as delimiter positions (movement delimiter positions) for moving the data positions at the midpoints of the two sections. As a result, it is divided into the above four sections.

このときの分割伸縮処理は、最初の分割伸縮処理の時と同様に、Ｍ１を時間的に前後に移動させることで区間ＳＰ１、ＳＬ１を伸縮し、Ｍ２を時間的に前後に移動させることで区間ＳＰ２、ＳＬ２を伸縮する。図９の例のように、Ｍ１をＭ１ｓに移動させ、Ｍ２をＭ２ｓに移動させると、区間ＳＰ１、ＳＰ２は、それぞれＳＰ１ｓ、ＳＰ２ｓに伸長され、区間ＳＬ１、ＳＬ２は、それぞれＳＬ１ｓ、ＳＬ２ｓに縮小される。このようにして歌唱特徴量データＳｓ３を伸縮させるように時刻情報を変更したものが歌唱特徴量データＳｓ５である。Ｍ１、Ｍ２の移動先であるＭ１ｓ、Ｍ２ｓのデータ位置は、歌唱特徴量データＳｓ５と模範特徴量データＳｄとの類似度ができるだけ高くなるように決められる。 As in the first division expansion / contraction process, the division expansion / contraction process at this time expands and contracts the sections SP1 and SL1 by moving M1 back and forth in time, and expands and contracts the sections SP2 and SL2 by moving M2 back and forth in time. When M1 is moved to M1s and M2 is moved to M2s as in the example of FIG. 9, the sections SP1 and SP2 are expanded to SP1s and SP2s, respectively, and the sections SL1 and SL2 are contracted to SL1s and SL2s, respectively. The singing feature amount data Ss5 is obtained by changing the time information so as to expand and contract the singing feature amount data Ss3 in this way. The data positions of M1s and M2s, which are the movement destinations of M1 and M2, are determined so that the similarity between the singing feature amount data Ss5 and the model feature amount data Sd becomes as high as possible.

３回目の分割伸縮処理は、さらに分割数を増やして分割単位を細かくする。このとき、固定区切位置として、ＳＴ１〜ＳＴ４、ＥＮ１〜ＥＮ４を定める。また、隣接する固定区切位置の間において、移動区切位置として、Ｍ１、Ｍ２、Ｍ３、Ｍ４を定める。固定区切位置および移動区切位置によって分割単位（区間ＳＰ１、ＳＬ１、・・・、ＳＰ４、ＳＬ４）が決められる。そして、上記と同様にして、移動区切位置Ｍ１、Ｍ２、・・・を移動させて、各区間を伸縮させる。 In the third division expansion / contraction process, the number of divisions is further increased to make the division unit finer. At this time, ST1 to ST4 and EN1 to EN4 are determined as fixed delimiter positions. Also, M1, M2, M3, and M4 are defined as movement partition positions between adjacent fixed partition positions. The division unit (sections SP1, SL1,..., SP4, SL4) is determined by the fixed division position and the movement division position. In the same manner as described above, the movement delimiter positions M1, M2,... Are moved to expand and contract each section.

このようにして、分割単位が予め決められた最小値に到達するまで、分割単位を細かくしながら分割伸縮処理を繰り返していく。分割伸縮処理を繰り返すことによって、時刻情報を変更していく。これによって、歌唱特徴量データは、模範特徴量データに類似するように伸縮される。これに伴い、変更された時刻情報が歌唱音声データについても適用されて、歌唱音声データが模範音声データに類似するように伸縮されることになる。 In this way, the division expansion / contraction process is repeated while making the division unit fine until the division unit reaches a predetermined minimum value. The time information is changed by repeating the division expansion / contraction process. Accordingly, the singing feature value data is expanded and contracted so as to be similar to the model feature value data. Along with this, the changed time information is also applied to the singing voice data, and the singing voice data is expanded and contracted so as to be similar to the model voice data.
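The repetition controlled by step S111 can be sketched as a generator that lists the fixed and movement delimiter positions for each pass, doubling the number of sections until one section is no longer than the minimum. The name `division_passes`, the equal division, and the doubling rule follow the example of FIGS. 8 and 9 but are otherwise assumptions; in the actual process each pass would operate on the time information already changed by the previous pass.

```python
def division_passes(st, en, min_length):
    """Yield (fixed, moving) delimiter positions for each division pass.

    Pass 1 has fixed positions [st, en] and one movement position (2 sections),
    pass 2 has 4 sections, pass 3 has 8 sections, and so on.
    """
    blocks = 1
    while True:
        block_length = (en - st) / blocks
        fixed = [st + i * block_length for i in range(blocks + 1)]
        moving = [(a + b) / 2 for a, b in zip(fixed, fixed[1:])]
        yield fixed, moving
        unit = block_length / 2  # length of one section SPn or SLn
        if unit <= min_length:   # corresponds to step S111; Yes
            break
        blocks *= 2
```

For a 16-second singing section and a 2-second minimum, this yields three passes with 2, 4, and 8 sections.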

なお、分割伸縮処理は、様々に変更可能である。例えば、分割伸縮処理の回数が増えると、分割数を増やして、分割単位を細かく、すなわち１つの区間長を短くなるようにしていたが、予め決められたルールにしたがって、処理の回数に応じて分割数を増減させてもよい。また、歌唱特徴量データは、分割後のそれぞれの区間長が同じになるように均等に分割されていたが、予め決められたルールにしたがって、それぞれの区間長が異なるように分割されてもよい。また、ステップＳ１１１において、分割単位が最小値に到達したことを判定するのではなく、分割伸縮処理後の歌唱特徴量データと模範特徴量データとの類似度が所定値以上になったことを判定するようにしてもよい。 The division expansion / contraction process can be modified in various ways. For example, in the above description the number of divisions is increased as the number of division expansion / contraction processes increases, so that the division unit becomes finer, that is, each section length becomes shorter; however, the number of divisions may instead be increased or decreased according to the number of processes in accordance with a predetermined rule. In addition, although the singing feature amount data is divided equally so that the section lengths after division are the same, it may be divided so that the section lengths differ in accordance with a predetermined rule. Further, in step S111, instead of determining that the division unit has reached the minimum value, it may be determined that the similarity between the singing feature amount data after the division expansion / contraction process and the model feature amount data has become equal to or greater than a predetermined value.

上記の例では、楽曲全体が歌唱されている場合を想定して説明した。一方、楽曲の一部のみが歌唱されている場合等、歌唱の期間が短い歌唱音声データを伸縮させる場合であっても、上記の伸縮処理によれば、図１０に示すように処理が可能である。 The above example has been described on the assumption that the entire song is sung. On the other hand, even when singing voice data with a short singing period is to be expanded or contracted, such as when only a part of the song is sung, the expansion / contraction process described above allows processing as shown in FIG. 10.

図１０は、本発明の第１実施形態における特定処理および移動処理の具体例（曲の一部のみ歌唱した場合の例）を示す図である。図１０に示すように歌唱音声データの存在する区間Ｄｓは、模範音声データが存在する区間Ｄｄに比べて大幅に短い。このような場合には、楽曲の一部のみを歌唱している場合がある。この場合であっても、図５に示す伸縮処理と同様に処理をすればよい。その結果、図６に示す例と同様にして図１０に示す例のように処理される。すなわち、歌唱区間Ｄｓ１を抽出し、冒頭の区間Ｄｓ２の歌唱特徴量データＳｓの波形を、模範特徴量データＳｄの様々な領域の波形と比較して、最も類似度が高くなる領域を区間Ｄｄ２として特定する。 FIG. 10 is a diagram showing a specific example of the specifying process and the moving process in the first embodiment of the present invention (an example in which only a part of a song is sung). As shown in FIG. 10, the section Ds in which the singing voice data exists is significantly shorter than the section Dd in which the model voice data exists. In such a case, only a part of the song may have been sung. Even in this case, processing may be performed in the same manner as the expansion / contraction process shown in FIG. 5. As a result, processing proceeds as in the example shown in FIG. 10, in the same manner as the example shown in FIG. 6. That is, the singing section Ds1 is extracted, the waveform of the singing feature amount data Ss in the opening section Ds2 is compared with the waveforms of various regions of the model feature amount data Sd, and the region having the highest similarity is specified as the section Dd2.

楽曲が２コーラスある場合、歌唱した区間が、いずれかの１コーラスである場合がある。この際、１コーラス目が類似しているのか、２コーラス目が類似しているのかがわかりにくい場合がある。このような場合（類似度の差が所定値以内の差となる区間がある場合）、別の種類の特徴量をさらに用いてもよい。例えば、歌詞の違いによる音素の違いを判断するため、倍音比率を用いるとよい場合がある。 When there are two choruses of music, the sung section may be any one chorus. At this time, it may be difficult to determine whether the first chorus is similar or the second chorus is similar. In such a case (when there is a section where the difference in similarity is within a predetermined value), another type of feature amount may be further used. For example, in order to determine the difference in phonemes due to the difference in lyrics, it may be desirable to use a harmonic ratio.

これによって、歌唱特徴量データＳｓのうち区間Ｄｓ２の所定のデータ位置の再生タイミングｔｓ２と、模範特徴量データＳｄのうち区間Ｄｄ２の所定のデータ位置の再生タイミングｔｄ２とが対応する関係にあると特定する。そして、対応関係にあるデータ位置に基づいて、再生タイミングｔｓ２が再生タイミングｔｄ２になるように時刻情報を変更し、歌唱特徴量データＳｓを時間軸上で平行移動させる。図１０に示す例では、区間ＳＡに移動するように時刻情報が変更される。その後の伸縮処理については、区間ＳＡの範囲内で実施される。分割伸縮処理においては、区間ＳＡを複数の区間に分割して伸縮させればよい。 As a result, the reproduction timing ts2 of the predetermined data position in the section Ds2 of the singing feature amount data Ss and the reproduction timing td2 of the predetermined data position in the section Dd2 of the model feature amount data Sd are specified as corresponding to each other. Then, based on the corresponding data positions, the time information is changed so that the reproduction timing ts2 becomes the reproduction timing td2, and the singing feature amount data Ss is translated on the time axis. In the example shown in FIG. 10, the time information is changed so that the data moves to the section SA. The subsequent expansion / contraction process is performed within the range of the section SA. In the division expansion / contraction process, the section SA may be divided into a plurality of sections and expanded or contracted.

なお、楽曲全体が歌唱されているものの、１コーラス目と２コーラス目との間が長く、歌唱区間が複数に分かれて検出される場合もある。このような場合には、それぞれの歌唱区間について、図１０に示す処理を行って、伸縮処理をすればよい。 In addition, although the whole music is sung, the distance between the first chorus and the second chorus is long, and the singing section may be detected in a plurality of parts. In such a case, the process shown in FIG. 10 may be performed for each singing section to perform the expansion / contraction process.

以上の通り、第１実施形態に係るデータ処理システム１０００においては、評価装置１において入力された歌唱者の歌唱音声のデータをデータベース５に登録する。このとき、すでに同じ楽曲としてデータベース５において、模範音声データの一部として登録されている歌唱音声データとは、時間的なずれ（早い歌唱または遅い歌唱）が生じていても、このずれを少なくすることができる。その結果、新たな歌唱音声データを模範音データの一部に含むようにして、歌唱の特徴量を統計処理する際に、有用なデータとすることができる。また、データベース５に登録された同一楽曲の複数の歌唱音声データのうち、２つの歌唱音声データを用いて再生すると、デュエット歌唱を再現することもできる。これらの歌唱音声データは、ネットワークＮＷを介して評価装置１または様々な端末（スマートフォン、パーソナルコンピュータ等）にダウンロードして聴取することもできる。 As described above, in the data processing system 1000 according to the first embodiment, the singing voice data of the singer input in the evaluation device 1 is registered in the database 5. At this time, even if there is a temporal deviation (early or late singing) from the singing voice data already registered in the database 5 as part of the model voice data for the same song, this deviation can be reduced. As a result, the new singing voice data can be included as part of the model voice data and can serve as useful data when the singing feature amounts are statistically processed. Moreover, when two of the plurality of singing voice data registered in the database 5 for the same song are reproduced together, a duet can also be reproduced. These singing voice data can also be downloaded via the network NW to the evaluation device 1 or to various terminals (smartphones, personal computers, etc.) and listened to.

＜第２実施形態＞ <Second Embodiment>
第２実施形態においては、歌唱音声データを伸縮させる際に比較の対象となる模範特徴量データが、ガイドメロディを示すデータである場合について説明する。このように模範特徴量データは、歌唱音声のピッチ等の特徴量の波形を示すデータに限らず、離散的なピッチ（例えば、１００ｃｅｎｔ単位で定めたピッチ）によってメロディを表したデータであってもよい。この場合の具体的な例について図１１を用いて説明する。 In the second embodiment, a case will be described in which the model feature amount data to be compared when expanding and contracting the singing voice data is data indicating a guide melody. As described here, the model feature amount data is not limited to data indicating the waveform of a feature amount such as the pitch of a singing voice, and may be data representing a melody by discrete pitches (for example, pitches defined in units of 100 cents). A specific example of this case will be described with reference to FIG. 11.

図１１は、本発明の第２実施形態における特定処理および移動処理の具体例を示す図である。図１１に示すように、模範特徴量データＳｄは、離散的なピッチとその期間によってメロディを表したデータである。このような模範特徴量データＳｄを基準として、上記の実施形態と同様な歌唱特徴量データＳｓを伸縮させる例を示している。図５に示す伸縮処理と同様に、歌唱区間Ｄｓ１を抽出し、冒頭の区間Ｄｓ２の歌唱特徴量データＳｓの波形を、模範特徴量データＳｄの様々な領域のメロディと比較して、最も類似度が高くなる領域を区間Ｄｄ２として特定する。類似度の算出については、例えば、離散的なピッチのメロディを波形とみなした上で、上記実施形態と同様に、区間Ｄｓ２における歌唱特徴量データＳｓの波形との距離を用いればよい。 FIG. 11 is a diagram showing a specific example of the specifying process and the moving process in the second embodiment of the present invention. As shown in FIG. 11, the model feature amount data Sd is data representing a melody by discrete pitches and their durations. The figure shows an example in which singing feature amount data Ss similar to that of the above embodiment is expanded and contracted with such model feature amount data Sd as the reference. As in the expansion / contraction process shown in FIG. 5, the singing section Ds1 is extracted, the waveform of the singing feature amount data Ss in the opening section Ds2 is compared with the melody in various regions of the model feature amount data Sd, and the region having the highest similarity is specified as the section Dd2. The similarity may be calculated, for example, by regarding the melody of discrete pitches as a waveform and using its distance from the waveform of the singing feature amount data Ss in the section Ds2, as in the above embodiment.
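Regarding a melody of discrete pitches as a waveform can be illustrated by sampling the notes into a step-shaped pitch curve on the same time grid as the singing feature amount data. In the sketch below, the note representation `(start, end, pitch)` and the function name `melody_to_curve` are assumptions for illustration only.

```python
def melody_to_curve(notes, times):
    """Sample a guide melody into a step-shaped pitch curve.

    notes: list of (start, end, pitch) tuples, with pitch for example in cents.
    times: timings at which the curve is sampled.
    Returns the pitch at each timing, or None where no note sounds.
    """
    curve = []
    for t in times:
        pitch = None
        for start, end, value in notes:
            if start <= t < end:
                pitch = value
                break
        curve.append(pitch)
    return curve
```

The resulting list can then be compared with the singing pitch curve using the same distance as in the first embodiment, skipping timings where either curve has no pitch.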

これによって、歌唱特徴量データＳｓのうち区間Ｄｓ２の所定のデータ位置の再生タイミングｔｓ３と、模範特徴量データＳｄのうち区間Ｄｄ２の所定のデータ位置の再生タイミングｔｄ３とが対応する関係にあると特定する。そして、対応関係にあるデータ位置に基づいて、再生タイミングｔｓ３が再生タイミングｔｄ３になるように時刻情報を変更し、歌唱特徴量データＳｓを時間軸上で平行移動させる。その後の伸縮処理については、第１実施形態と同様に実行される。第１実施形態と異なる点は、模範となる比較の対象が離散的なピッチのメロディである点である。類似度の算出は上記と同様に、離散的なピッチのメロディを波形とみなした上で、上記実施形態と同様に、歌唱特徴量データの波形との距離を用いればよい。 As a result, the reproduction timing ts3 of the predetermined data position in the section Ds2 of the singing feature amount data Ss and the reproduction timing td3 of the predetermined data position in the section Dd2 of the model feature amount data Sd are specified as corresponding to each other. Then, based on the corresponding data positions, the time information is changed so that the reproduction timing ts3 becomes the reproduction timing td3, and the singing feature amount data Ss is translated on the time axis. The subsequent expansion / contraction process is executed in the same manner as in the first embodiment. The difference from the first embodiment is that the model used for comparison is a melody of discrete pitches. As above, the similarity may be calculated by regarding the melody of discrete pitches as a waveform and using its distance from the waveform of the singing feature amount data, as in the above embodiment.

<Third Embodiment>
In the third embodiment, the expansion/contraction process is performed by a method different from that of the first embodiment. The specifying unit 305 and the moving unit 307, however, execute substantially the same processing as in the first embodiment. The expansion/contraction unit 309 still changes the time information according to the distance between the waveforms of the singing feature data and the model feature data, but the processing by which it does so differs from the first embodiment. That is, whereas the first embodiment uses the division expansion/contraction process, the third embodiment adopts a sequential expansion/contraction process. The sequential expansion/contraction process is described below.

FIG. 12 is a flowchart showing the expansion/contraction process in the third embodiment of the present invention. The singing section Ds is extracted based on the singing feature data Ss (FIG. 12; step S201). In this example, the model feature data Sd is data from which the singing section Dd has been extracted in advance. Subsequently, OFFSET is calculated (step S203). This processing is executed by the specifying unit 305. The method of calculating OFFSET is described with reference to FIGS. 13 and 14.

FIG. 13 is a diagram showing an example of an evaluation target window in the third embodiment of the present invention. FIG. 14 is a diagram explaining the OFFSET calculated by the OFFSET calculation process in the third embodiment of the present invention. First, as shown in FIG. 13, a section Bd is set at the beginning of the model feature data Sd, and a section Bs is set at the beginning of the singing feature data Ss. The sections Bd and Bs have the same length, for example about 30 seconds.

Subsequently, as shown in FIG. 14, while the singing feature data Ss (singing section Ds) is translated in time as a whole, the data portion of the section Bd of the model feature data Sd is compared with the data portion of the section Bs of the singing feature data Ss, and the position of the section Bs that gives the highest similarity is calculated. The similarity corresponds to the distance between the waveform of the data portion of the section Bd of the model feature data Sd and that of the data portion of the section Bs of the singing feature data Ss (the smaller the distance, the higher the similarity). The point at which the similarity is highest is calculated using the method of least squares.

Specifically, the calculation is as follows. Let DSd(t) be the waveform of the model feature data Sd expressed as a function of time t, and let DSs(t) be the waveform of the singing feature data Ss expressed as a function of time t. Let ts be the amount by which the singing feature data Ss is translated. ΣC is calculated by summing C = (DSd(t) − DSs(t + ts))² over the range of t in the section Bd (for example, t = 0 to tb). Then the ts that minimizes ΣC is detected. The detected ts is the OFFSET value to be calculated. OFFSET is calculated in this way. As described in the first embodiment, when the section Ds is short, only part of the section Dd may have been sung. In that case OFFSET may take a large value. Even so, the subsequent processing can be performed in the same way.
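A minimal sketch of this least-squares search is given below, assuming that both waveforms are sampled on the same uniform frame grid and that ts is searched over a bounded integer range. The function name `find_offset`, the parameters `max_shift` and `min_overlap`, and the averaging of the cost over the overlapping frames (so that partial overlaps are not favored) are assumptions added for illustration and are not stated in the description.

```python
def find_offset(model, singing, window_len, max_shift, min_overlap=None):
    """Return the integer shift ts that minimizes the squared error
    between model[t] and singing[t + ts] for t in [0, window_len).

    Frames where t + ts falls outside `singing` are skipped, and the cost
    is averaged over the frames actually compared (an assumption made here).
    """
    if min_overlap is None:
        min_overlap = max(1, window_len // 2)
    best_ts, best_cost = None, float("inf")
    for ts in range(-max_shift, max_shift + 1):
        total, count = 0.0, 0
        for t in range(window_len):
            i = t + ts
            if 0 <= i < len(singing):
                d = model[t] - singing[i]
                total += d * d
                count += 1
        if count < min_overlap:
            continue  # too little overlap to trust this candidate
        cost = total / count
        if cost < best_cost:
            best_ts, best_cost = ts, cost
    return best_ts


# Hypothetical data: the singer starts three frames late.
model = [0, 1, 3, 5, 3, 1, 0, 2, 4, 2, 0, 0]
singing = [0, 0, 0] + model
offset = find_offset(model, singing, window_len=8, max_shift=5)
```

With this sign convention, `offset` is the number of frames by which the singing data lags the model data inside the window Bd.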

Subsequently, the singing feature data Ss is translated along the time axis (FIG. 12; step S205). For this translation, the time information of the singing feature data Ss is changed so that the calculated OFFSET is applied. That is, the time information is changed so that the reproduction timing of each data position of the singing feature data Ss is shifted by OFFSET. In the following description, the singing feature data Ss1 (singing section Ds1) denotes the data whose time information has been changed to reflect OFFSET.
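The translation step only rewrites the time information and leaves the feature values untouched, which can be sketched as below. The representation of the data as `(time_sec, value)` pairs, the function name `apply_offset`, and the sign convention (a positive offset means the singing is late and is therefore subtracted) are assumptions made for this example.

```python
def apply_offset(timed_features, offset_sec):
    """Shift the reproduction timing of every (time_sec, value) pair by the
    same amount, so the intervals between data positions are unchanged."""
    return [(t - offset_sec, value) for t, value in timed_features]


# Hypothetical singing feature data that starts 3.0 s late.
ss = [(3.0, 60), (3.5, 62), (4.0, 64)]
ss1 = apply_offset(ss, 3.0)
```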

Subsequently, SHIFT, which corresponds to the deviation between the model feature data Sd and the singing feature data Ss1, is calculated (FIG. 12; step S207). The method of calculating SHIFT is described with reference to FIG. 15.

FIG. 15 is a diagram explaining the SHIFT calculated by the SHIFT calculation process in the third embodiment of the present invention. First, the position of the section Bd, which was set at the beginning of the model feature data Sd, is changed. The amount of the change is a predetermined time (referred to as the unit time). The unit time may be shorter or longer than the length of the section Bd.

Then, the section Bs located at the position corresponding to the section Bd is detected. Here, the position of the section Bs in the singing feature data Ss1 is detected so that the similarity between the data portion of the changed section Bd and the data portion of the changed section Bs is highest. The specific method of calculating the position of the section Bs corresponding to the section Bd is the same as the method of calculating OFFSET. The difference in position between the detected section Bs and the section Bd is calculated as the SHIFT value.

As shown in FIG. 15, the section Bd is advanced by the unit time at a time and the above processing is repeated, so that a SHIFT is calculated for each position of the section Bd (step S209; No, step S207). When the section Bd reaches the last data of the model feature data Sd (step S209; Yes), the calculation of SHIFT ends. SHIFT > 0 is defined as the case where the section Bs is ahead in time of the section Bd.
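The loop of steps S207 and S209 can be sketched as a sliding-window search. This is an illustrative sketch under several assumptions: frame-indexed waveforms, an integer search range `max_shift`, a preference for the smallest absolute shift when costs tie, and a SHIFT value defined as the detected Bs position minus the Bd position. The sign used in the publication may differ, and the names `window_cost` and `compute_shifts` are hypothetical.

```python
def window_cost(model, singing, start, ts, window_len):
    """Sum of squared differences between model[start:start+window_len]
    and the singing data displaced by ts; None if the window leaves the data."""
    total = 0.0
    for t in range(start, start + window_len):
        i = t + ts
        if not (0 <= i < len(singing)):
            return None
        d = model[t] - singing[i]
        total += d * d
    return total


def compute_shifts(model, singing, window_len, unit_step, max_shift):
    """Advance the window Bd by unit_step and record the best shift for each."""
    candidates = sorted(range(-max_shift, max_shift + 1), key=abs)
    shifts = []
    start = 0
    while start + window_len <= len(model):
        best_ts, best_cost = None, float("inf")
        for ts in candidates:
            cost = window_cost(model, singing, start, ts, window_len)
            if cost is not None and cost < best_cost:
                best_ts, best_cost = ts, cost
        shifts.append((start, best_ts))
        start += unit_step
    return shifts


# Hypothetical data: the singing holds one frame longer in the middle.
model = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
singing = [0, 1, 4, 9, 16, 25, 25, 36, 49, 64, 81, 100, 121]
shifts = compute_shifts(model, singing, window_len=3, unit_step=3, max_shift=2)
```

In this toy input the first two windows match with no displacement, and the later windows are found one frame further along in the singing data.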

FIG. 16 is a diagram explaining the correspondence between the parts of the model feature data Sd and the singing feature data Ss in the third embodiment of the present invention. The horizontal axis corresponds to the model feature data Sd, and the vertical axis corresponds to the singing feature data Ss.

The SHIFT value described above corresponds to the amount of deviation from the SHIFT = 0 line. That is, when there is no temporal deviation between the model feature data Sd and the singing feature data Ss other than OFFSET, the correspondence coincides with the broken line SHIFT = 0. In the section before the data position tm, the singing voice lags the model voice in the position that should be sung, and SHIFT < 0. In the section after the data position tm, the singing voice is ahead of the model voice in the position that should be sung, and SHIFT > 0. SF is the SHIFT value at the last data. Even where there is a temporal deviation, if the singing voice and the model voice progress at the same speed, the line is parallel to the SHIFT = 0 line. If the singing voice progresses more slowly than the model voice, the slope is smaller than that of the SHIFT = 0 line; if it progresses faster, the slope is greater.

Subsequently, based on the calculated SHIFT, the sequential expansion/contraction process that expands and contracts the singing feature data Ss1 is executed (step S211). In this example, the time information of the singing feature data Ss1 is changed according to the SHIFT at each data position. For example, if the SHIFT value at each data position t is S(t), each data position "t" in the time information of the singing feature data Ss1 is changed to "t − S(t)". If there are data positions for which no SHIFT value has been calculated, interpolated SHIFT values may be used. If the complexity of the line indicating the correspondence between the parts of the data is equal to or greater than a predetermined value, the SHIFT calculation process may be executed again. In that case, the candidate with the second highest similarity may be used, or the method of calculating the similarity may be changed.
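The mapping from t to t − S(t) can be sketched as follows. The description does not specify how missing SHIFT values are filled in, so linear interpolation between the calculated points, with the end values held constant outside the calculated range, is an assumption made here, as are the names `interpolate_shift` and `sequential_stretch`.

```python
def interpolate_shift(shift_points, t):
    """Linearly interpolate S(t) from (time, shift) points.

    shift_points must be sorted by strictly increasing time; outside the
    covered range the nearest end value is used (an assumption).
    """
    if t <= shift_points[0][0]:
        return shift_points[0][1]
    if t >= shift_points[-1][0]:
        return shift_points[-1][1]
    for (t0, s0), (t1, s1) in zip(shift_points, shift_points[1:]):
        if t0 <= t <= t1:
            return s0 + (s1 - s0) * (t - t0) / (t1 - t0)


def sequential_stretch(timed_features, shift_points):
    """Replace each data position t with t - S(t); values are unchanged."""
    return [(t - interpolate_shift(shift_points, t), value)
            for t, value in timed_features]


# Hypothetical SHIFT values calculated only at t = 0 s and t = 10 s.
shift_points = [(0.0, -1.0), (10.0, 1.0)]
ss1 = [(0.0, 60), (5.0, 62), (10.0, 64)]
sss = sequential_stretch(ss1, shift_points)
```

Because S(t) grows from −1 to 1 across this example, the interval between neighboring data positions shrinks from 5 s to 4 s, which is the contraction that the varying SHIFT is meant to produce.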

The complexity mentioned above may be defined as an index of how smooth the line indicating the correspondence between the parts of the data is, taking a lower value the smoother the line is. For example, the complexity is high when the line indicating the correspondence contains a discontinuity or a rapidly changing region (such as a wave-shaped region).
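The description leaves the exact complexity index open. One possible index, shown here purely as an assumption, is the mean absolute second difference of the SHIFT sequence: it is zero for a straight correspondence line and grows with jumps and oscillations. The function name `complexity` is hypothetical.

```python
def complexity(shifts):
    """Mean absolute second difference of a SHIFT sequence.

    Returns 0.0 for a straight line and larger values for lines with
    discontinuities or wave-like regions. One of many possible indices.
    """
    if len(shifts) < 3:
        return 0.0
    second = [shifts[i - 1] - 2 * shifts[i] + shifts[i + 1]
              for i in range(1, len(shifts) - 1)]
    return sum(abs(d) for d in second) / len(second)


smooth = complexity([0, 1, 2, 3, 4])   # steady drift
wavy = complexity([0, 2, 0, 2, 0])     # wave-shaped region
jump = complexity([0, 0, 0, 5, 5])     # single discontinuity
```

A threshold on such an index could then decide whether the SHIFT calculation process is executed again.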

FIG. 17 is a diagram explaining the expansion and contraction of the singing feature data in the third embodiment of the present invention. FIG. 17 shows the singing feature data Ss (singing section Ds), the singing feature data Ss1 (singing section Ds1) in which OFFSET has been adjusted, and the singing feature data Sss (singing section Dss) to which the sequential expansion/contraction process has further been applied, each in comparison with the model feature data Sd (singing section Dd).

For the singing feature data Ss, Ss1, and Sss, the figure schematically shows the change in data positions that accompanies the change in time information. OFFSET, tm, and SF are the same as in the example above. The time information is changed so that the reproduction timing is advanced in the section before tm and moved back in the section after tm. By applying the time information changed in this way to the singing voice data, the singing voice data can be expanded and contracted into data that resembles the model voice data.

<Fourth Embodiment>
The fourth embodiment describes a case where, in the expansion/contraction function 300 of the above embodiments, the singing voice data and singing feature data whose time information has been changed are combined with already registered data when they are registered in the database 5.

The expansion/contraction function in the fourth embodiment further includes a synthesizing unit in addition to the expansion/contraction function 300 of the above embodiments. Apart from the synthesizing unit, it is the same as the expansion/contraction function 300 already described. In the fourth embodiment, the output of the expansion/contraction unit 309 in the expansion/contraction function 300 is input to the synthesizing unit instead of to the database 5. That is, the synthesizing unit acquires the singing voice data and singing feature data whose time information has been changed by the expansion/contraction unit 309. The synthesizing unit then acquires the model voice data and model feature data of the corresponding song from the database 5, synthesizes them with the data acquired from the expansion/contraction unit 309, and updates the registered content of the database 5.

Synthesis includes adding the singing voice data acquired from the expansion/contraction unit 309 as one of the plurality of pieces of singing voice data that constitute the model voice data acquired from the database 5. Likewise, it includes adding the singing feature data acquired from the expansion/contraction unit 309 to the model feature data acquired from the database 5. The synthesizing unit 311 may additionally perform statistical processing on the new model voice data and model feature data obtained by the synthesis and then update the registered content of the database 5.
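As a rough sketch of such a synthesis followed by statistical processing, the model feature data can be pictured as a collection of time-aligned feature tracks to which the newly aligned track is added, after which per-frame statistics are recomputed. The data layout, the choice of mean and population standard deviation as the statistics, and the names `synthesize` and `frame_statistics` are assumptions for illustration. The description does not fix these details.

```python
from statistics import mean, pstdev


def synthesize(model_tracks, new_track):
    """Return a new track list that also contains new_track.

    All tracks are assumed to share one frame grid after alignment.
    """
    if model_tracks and len(new_track) != len(model_tracks[0]):
        raise ValueError("new_track must be aligned to the model frame grid")
    return model_tracks + [new_track]


def frame_statistics(tracks):
    """Per-frame (mean, population standard deviation) across all tracks."""
    return [(mean(frame), pstdev(frame)) for frame in zip(*tracks)]


# Hypothetical registered model tracks and one newly aligned singing track.
registered = [[60, 62], [62, 62]]
updated = synthesize(registered, [61, 65])
stats = frame_statistics(updated)
```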

<Other Embodiments>
The above embodiments describe the case where the feature amount is the pitch of the sound. Besides pitch, the feature amount may be a volume level, a harmonic ratio, or the like. The feature amount is also not limited to a value obtained as the characteristic value itself; it may be a value obtained as the amount of change in the characteristic value, for example by a calculation using differentiation. Expressing the feature as an amount of change removes the absolute-value component, which is effective for a parameter such as volume level, whose absolute value carries little significance and whose relative change matters. Singing that has been key-shifted can also be evaluated appropriately.
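The effect of using the amount of change can be illustrated with a first difference, used here as a simple stand-in for the differentiation mentioned above. The example assumes that pitch is expressed on a semitone scale, so that a key shift adds a constant to every value. The function name `delta_features` and the sample values are hypothetical.

```python
def delta_features(values):
    """First difference of a feature sequence (change between neighbors)."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical pitch sequence in semitones and the same melody sung
# three semitones higher.
original = [60, 62, 64, 62]
key_shifted = [p + 3 for p in original]

delta_original = delta_features(original)
delta_shifted = delta_features(key_shifted)
```

Both sequences give the same differences, so a comparison based on them is not affected by the constant key shift.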

The sound represented by the singing voice data and the model voice data is not limited to the voice of a singer; it may be a voice produced by singing synthesis or an instrument sound. In the case of an instrument sound, a single-note performance is desirable.

1 ... evaluation device, 11 ... control unit, 13 ... storage unit, 15 ... operation unit, 17 ... display unit, 19 ... communication unit, 21 ... signal processing unit, 23 ... microphone, 25 ... speaker, 3 ... data processing device, 31 ... control unit, 33 ... storage unit, 39 ... communication unit, 5 ... database, 100 ... evaluation function, 101 ... input sound acquisition unit, 102 ... model sound acquisition unit, 103 ... feature amount calculation unit, 105 ... comparison unit, 107 ... evaluation value calculation unit, 300 ... expansion/contraction function, 301 ... sound data acquisition unit, 303 ... feature amount calculation unit, 305 ... specifying unit, 307 ... moving unit, 309 ... expansion/contraction unit

Claims

A data processing apparatus comprising:
a sound data acquisition unit that acquires first sound data including time information that defines the reproduction timing of each part of the data;
a feature amount calculation unit that calculates a feature amount of the first sound data in time series and generates first feature amount data in which the feature amount is associated with the time information;
a specifying unit that specifies, based on second feature amount data indicating a feature amount of second sound data and on the first feature amount data, a first data position of the first feature amount data corresponding to a second data position of the second feature amount data;
a moving unit that changes the time information so as to translate the reproduction timing based on a temporal positional relationship between the first data position and the second data position; and
an expansion/contraction unit that compares the first feature amount data associated with the changed time information with the second feature amount data to detect a correspondence between the parts of the data, and changes the time information so as to expand or contract the interval of the reproduction timing based on the correspondence.

The data processing apparatus according to claim 1, wherein the expansion/contraction unit changes the time information based on an expansion/contraction process that moves a predetermined reproduction timing of the first sound data based on the correspondence, thereby extending the interval of the reproduction timing in one of a first section before that reproduction timing and a second section after it, and shortening the interval of the reproduction timing in the other.

The data processing apparatus, wherein the expansion/contraction unit divides the range of the sound data into a plurality of division sections and changes the time information by executing a division expansion/contraction process that performs the expansion/contraction process in each division section.

The data processing apparatus according to claim 3, wherein the expansion/contraction unit executes the division expansion/contraction process a plurality of times and shortens the division section each time the division expansion/contraction process is executed.

The data processing apparatus according to claim 1, wherein
the second sound data includes a plurality of pieces of sound data,
the second feature amount data indicates the feature amounts of the plurality of pieces of sound data and a distribution of the feature amounts, and
the expansion/contraction unit detects the correspondence based on the distribution of the feature amounts in the second feature amount data.

The data processing apparatus according to claim 5, further comprising a synthesizing unit that synthesizes, with the second sound data, the first sound data including the time information changed by the expansion/contraction unit.