JP2020064177A - Information processing apparatus and program

Info

Publication number: JP2020064177A
Application number: JP2018195753A
Authority: JP
Inventors: 立司西; Tatsuji Nishi
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2018-10-17
Filing date: 2018-10-17
Publication date: 2020-04-23
Anticipated expiration: 2038-10-17
Also published as: JP7232011B2

Abstract

To associate a performer's performance sound with a score sound on a score even if the performance speed changes.

SOLUTION: A recognition unit 12 recognizes, from performance sound data representing a performance sound collected by a sound collection unit 11, the pitch and sound pressure of the performance sound. When the degree of matching between the pitch of a first score sound included in score data and the pitch of the performance sound recognized from the performance sound data exceeds a threshold, a specifying unit 13 identifies the start timing of a first performance sound corresponding to the first score sound. Further, when the degree of matching between the pitch of a second score sound that follows the first score sound in the score data and the pitch of a performance sound that follows the first performance sound recognized from the performance sound data exceeds a threshold, the specifying unit 13 identifies the end timing of the first performance sound corresponding to the first score sound.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a technique for associating a performer's performance sound with a score sound on a score.

To evaluate a performer's performance, it is necessary to identify which position on the score the performer is playing (see, for example, Patent Document 1).

Patent Document 1: JP 2007-133199 A

With the recent spread of mobile terminals such as smartphones, there is demand for a service in which such a terminal picks up a performance on a piano or other instrument and evaluates it. One conceivable approach for such a service is for the terminal to advance the displayed score at a predetermined performance speed while the user plays along with the progress of the score, and then to compare the performance sound with the sound on the score at the same timing.

However, because the performance speed is predetermined in such an approach, the performance cannot be evaluated correctly when the user plays at a speed of the user's own choosing.

An object of the present invention is therefore to associate a performer's performance sound with a score sound on a score even when the performance speed changes.

To solve the above problem, the present invention provides an information processing apparatus comprising: a recognition unit that recognizes the pitch of a performance sound in performance sound data representing a collected performance sound; and a specifying unit that identifies the correspondence between score sounds and performance sounds based on the degree of matching between the pitch of a score sound indicated by score data and the pitch of the performance sound recognized in the performance sound data by the recognition unit, wherein the specifying unit identifies the start timing of a first performance sound corresponding to a first score sound included in the score data when the degree of matching between the pitch of the first score sound and the pitch of the performance sound recognized in the performance sound data exceeds a threshold, and identifies the end timing of the first performance sound corresponding to the first score sound when the degree of matching between the pitch of a second score sound that follows the first score sound in the score data and the pitch of a performance sound that follows the first performance sound recognized in the performance sound data exceeds a threshold.

The apparatus may further include an evaluation unit that evaluates the performance by comparing the score sound and the performance sound that the specifying unit has identified as corresponding to each other.

The specifying unit may compare the pitch of one score sound in the score data with the pitch of the performance sound recognized in the performance sound data a predetermined number of times in temporal succession, and may determine that the score sound and the performance sound correspond to each other when the degree of matching exceeds a threshold in each of the comparisons.

The predetermined number of times may vary according to the length of the one score sound, the tempo in the score data, or the tempo of the performance.

The method by which the specifying unit identifies the start timing or end timing of the performance sound corresponding to a score sound may differ between the case where score sounds of the same pitch are consecutive in the score data and the case where score sounds of different pitches are consecutive.

When a first score sound and a second score sound of the same pitch are consecutive in the score data, the specifying unit may identify, within the section of performance sound whose degree of matching with the pitch of the first and second score sounds exceeds the threshold, a first performance sound corresponding to the first score sound and a second performance sound corresponding to the second score sound, based on increases and decreases in the sound pressure of the performance sound.

When a first score sound and a second score sound of different pitches are consecutive in the score data, the specifying unit may identify the start timing of a first performance sound corresponding to the first score sound when the degree of matching between the pitch of the first score sound included in the score data and the pitch of the performance sound recognized in the performance sound data exceeds a threshold, and may identify the end timing of the first performance sound corresponding to the first score sound when the degree of matching between the pitch of the second score sound, which follows the first score sound in the score data, and the pitch of a performance sound that follows the first performance sound recognized in the performance sound data exceeds a threshold.

The specifying unit may determine that the first performance sound corresponding to the first score sound has ended when the degree of matching between the pitch of the second score sound, which follows the first score sound in the score data, and the pitch of the performance sound that follows the first performance sound recognized in the performance sound data has remained at or below the threshold for a predetermined period.

When such a lapse of the predetermined period, during which the degree of matching between the pitch of the second score sound and the pitch of the performance sound following the first performance sound does not exceed the threshold, occurs a predetermined number of times in succession, the specifying unit may identify the correspondence between score sounds and performance sounds by collating a combination of the pitches of a plurality of temporally consecutive score sounds in the score data with a combination of the pitches of a plurality of performance sounds recognized in temporal succession in the performance sound data.
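The collation of pitch combinations described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `relocate_in_score`, the use of note names as pitches, exact equality as the match test, and the rule that a position is accepted only when the match is unique are all hypothetical and are not taken from the specification.

```python
from typing import List, Optional


def relocate_in_score(score_pitches: List[str],
                      recent_pitches: List[str]) -> Optional[int]:
    """Return the score index of the last recently played note, or None.

    Hypothetical helper: slides the sequence of recently recognized
    performance pitches over the score and accepts the position only if
    exactly one window of the score matches the whole sequence.
    """
    n = len(recent_pitches)
    if n == 0:
        return None
    hits = [
        i
        for i in range(len(score_pitches) - n + 1)
        if score_pitches[i:i + n] == recent_pitches
    ]
    # An ambiguous or missing match gives no reliable position.
    if len(hits) != 1:
        return None
    return hits[0] + n - 1


score = ["C4", "G4", "E4", "C4", "D4", "F4"]
position = relocate_in_score(score, ["C4", "D4", "F4"])
ambiguous = relocate_in_score(score, ["C4"])
```

A longer sequence of recent pitches makes a unique match more likely, at the cost of needing more played notes before the position can be recovered.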

The present invention also provides a program for causing a computer to function as: a recognition unit that recognizes the pitch of a performance sound in performance sound data representing a collected performance sound; and a specifying unit that identifies the correspondence between score sounds and performance sounds based on the degree of matching between the pitch of a score sound indicated by score data and the pitch of the performance sound recognized in the performance sound data by the recognition unit, the specifying unit identifying the start timing of a first performance sound corresponding to a first score sound included in the score data when the degree of matching between the pitch of the first score sound and the pitch of the performance sound recognized in the performance sound data exceeds a threshold, and identifying the end timing of the first performance sound corresponding to the first score sound when the degree of matching between the pitch of a second score sound that follows the first score sound in the score data and the pitch of a performance sound that follows the first performance sound recognized in the performance sound data exceeds a threshold.

According to the present invention, a performer's performance sound can be associated with a score sound on a score even when the performance speed changes.

FIG. 1 is a diagram showing an example of the configuration of a performance evaluation system according to an embodiment of the present invention.

FIG. 2 is a block diagram showing an example of the hardware configuration of a user terminal.

FIG. 3 is a block diagram showing an example of the functional configuration of the user terminal.

FIG. 4 is a flowchart showing an example of the basic operation of the user terminal.

FIG. 5 is a flowchart showing an example of the identification processing of the user terminal.

FIG. 6 is a diagram explaining an example of the correspondence between score sounds and performance sounds.

[Configuration]
FIG. 1 is a diagram showing an example of a performance evaluation system 1 of the present embodiment. The performance evaluation system 1 includes a user terminal 10, such as a smartphone or a tablet, and a server device 20. A network 2 connects the user terminal 10 and the server device 20 so that they can communicate. The network 2 is, for example, a LAN (Local Area Network), a WAN (Wide Area Network), or a combination of these, and includes a wired section or a wireless section. Although FIG. 1 shows one user terminal 10 and one server device 20, there may be a plurality of each.

When playing an instrument such as a piano, the user places the user terminal 10 nearby and starts the performance evaluation program installed on the user terminal 10. Once the user begins playing, the user terminal 10 identifies which sound on the score each sound played by the user corresponds to and compares the two, thereby evaluating the quality of the performance. The server device 20 provides the user terminal 10 with the performance evaluation program and the data used when the program is executed, and acquires from the user terminal 10 and stores the evaluation results and other data obtained by executing the program. While the performance evaluation program is running, the user terminal 10 and the server device 20 may operate in cooperation, or the user terminal 10 may operate alone. This embodiment describes an example in which the user terminal 10 operates alone.

FIG. 2 is a diagram showing an example of the hardware configuration of the user terminal 10. Physically, the user terminal 10 is configured as a computer device that includes a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, and a bus connecting these. In the following description, the term "device" can be read as a circuit, a device, a unit, or the like. The hardware configuration of the user terminal 10 may include one or more of each device shown in the figure, or may omit some of the devices.

Each function of the user terminal 10 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, whereby the processor 1001 performs computation, controls communication by the communication device 1004, and controls at least one of reading and writing of data in the memory 1002 and the storage 1003.

The processor 1001 controls the entire computer by, for example, running an operating system. The processor 1001 may be configured as a central processing unit (CPU) that includes an interface with peripheral devices, a control device, an arithmetic device, registers, and the like. A baseband signal processing unit, a call processing unit, and the like may also be realized by the processor 1001.

The processor 1001 reads programs (program code), software modules, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002 and executes various kinds of processing in accordance with them. The program used is one that causes a computer to execute at least part of the operations described below. The functional blocks of the user terminal 10 may be realized by a control program that is stored in the memory 1002 and runs on the processor 1001. The various kinds of processing may be executed by a single processor 1001, or may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented on one or more chips. The program may be transmitted from the network 2 to the user terminal 10 via a telecommunication line.

The memory 1002 is a computer-readable recording medium and may be composed of at least one of, for example, a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory). The memory 1002 may be called a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store programs (program code), software modules, and the like that are executable to carry out the method according to the present embodiment.

The storage 1003 is a computer-readable recording medium and may be composed of at least one of, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory (for example, a card, a stick, or a key drive), a floppy (registered trademark) disk, and a magnetic strip. The storage 1003 may be called an auxiliary storage device. The storage 1003 stores the performance evaluation program and a group of score data described later.

The communication device 1004 is hardware (a transmitting and receiving device) for performing communication between computers via at least one of a wired network and a wireless network, and is also called, for example, a network device, a network controller, a network card, or a communication module. The communication device 1004 may include a high-frequency switch, a duplexer, a filter, a frequency synthesizer, and the like in order to realize at least one of, for example, frequency division duplex (FDD) and time division duplex (TDD). For example, a transmitting and receiving antenna, an amplifier unit, a transmitting and receiving unit, and a transmission line interface may be realized by the communication device 1004. The transmitting and receiving unit may be implemented with the transmitting unit and the receiving unit separated physically or logically.

The input device 1005 is an input device that accepts input from outside (for example, a keyboard, a mouse, a microphone, a switch, a button, or a sensor). The output device 1006 is an output device that produces output to the outside (for example, a display, a speaker, or an LED lamp). The input device 1005 and the output device 1006 may be integrated into a single component (for example, a touch panel).

The devices such as the processor 1001 and the memory 1002 are connected by a bus for communicating information. The bus may be a single bus, or different buses may be used between different pairs of devices.

The user terminal 10 may also include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array), and some or all of the functional blocks may be realized by that hardware. For example, the processor 1001 may be implemented using at least one of these kinds of hardware.

FIG. 3 is a block diagram showing an example of the functional configuration of the user terminal 10. As shown in FIG. 3, the user terminal 10 realizes the following functional blocks: a sound collection unit 11, a recognition unit 12, a specifying unit 13, and an evaluation unit 14.

The sound collection unit 11 collects the sound produced when the user plays an instrument such as a piano (referred to as the performance sound).

The recognition unit 12 recognizes the pitch and the intensity (sound pressure) of the performance sound in the performance sound data representing the performance sound collected by the sound collection unit 11.

The specifying unit 13 stores score data. The score data includes information describing each sound that makes up a piece of music, such as its sounding timing, pitch, sound pressure, and length. Each sound indicated by the score data is called a score sound. The specifying unit 13 identifies the correspondence between score sounds and performance sounds based on the degree of matching between the pitch of a score sound indicated by the score data and the pitch of the performance sound recognized in the performance sound data by the recognition unit 12. Because the specifying unit 13 identifies this correspondence from changes in pitch, that is, without depending on the length of each sound, the performance sounds and the score sounds can be associated with each other even if the user's performance speed changes.
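The tempo independence described above can be illustrated with a minimal sketch. It assumes that consecutive score sounds have different pitches, that each analysis frame has already been reduced to a note name (or `None` for silence), and that exact equality stands in for "degree of matching exceeds a threshold". The names `Segment` and `align_by_pitch_change` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    score_index: int           # which score sound this segment corresponds to
    start_frame: int           # frame where the performance sound starts
    end_frame: Optional[int]   # frame where it ends (None if still sounding)


def align_by_pitch_change(score_pitches: List[str],
                          frame_pitches: List[Optional[str]]) -> List[Segment]:
    """Associate performance frames with score sounds using pitch changes only."""
    segments: List[Segment] = []
    idx = 0                              # index of the score sound being awaited
    current: Optional[Segment] = None
    for t, pitch in enumerate(frame_pitches):
        if current is None:
            # Waiting for the first score sound to start.
            if idx < len(score_pitches) and pitch == score_pitches[idx]:
                current = Segment(idx, t, None)
        else:
            nxt = idx + 1
            # The next score pitch appearing ends the current sound
            # and starts the next one at the same frame.
            if nxt < len(score_pitches) and pitch == score_pitches[nxt]:
                current.end_frame = t
                segments.append(current)
                idx = nxt
                current = Segment(idx, t, None)
    if current is not None:
        segments.append(current)
    return segments


score = ["C4", "G4", "E4"]
slow = ["C4"] * 5 + ["G4"] * 3 + ["E4"] * 4
fast = ["C4"] * 2 + ["G4"] * 1 + ["E4"] * 2
slow_segments = align_by_pitch_change(score, slow)
fast_segments = align_by_pitch_change(score, fast)
```

Both the slow and the fast rendition map onto the same three score sounds; only the frame boundaries differ, which is the information an evaluation step could then compare against the score.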

The evaluation unit 14 evaluates the user's performance by comparing the score sound and the performance sound that the specifying unit 13 has identified as corresponding to each other. The evaluation unit 14 then outputs the evaluation result as an index value of the quality of the performance (for example, a score or a level). The output may take any form that can notify the user, such as display or audio output.

As noted above, when the user terminal 10 and the server device 20 operate in cooperation, the functional blocks shown in FIG. 3 are realized by the user terminal 10 and the server device 20 together. For example, the user terminal 10 may realize the sound collection unit 11 and the recognition unit 12 while the server device 20 realizes the specifying unit 13 and the evaluation unit 14, although the division is of course not limited to this example.

[Operation]
The operation of the user terminal 10 will be described with reference to FIGS. 4 to 6. The procedures of the processes shown in FIGS. 4 and 5 are written in the performance evaluation program installed on the user terminal 10. In FIG. 4, the user operates the user terminal 10 to start the performance evaluation program and specifies the piece of music to be played. When the user then performs an operation on the user terminal 10 to start the performance, the sound collection unit 11 of the user terminal 10 collects the sound of the instrument played by the user (step S11: sound collection processing).

Next, the recognition unit 12 recognizes the pitch and sound pressure of the performance sound in the performance sound data representing the performance sound collected by the sound collection unit 11 (step S12: recognition processing).

FIG. 6 is a diagram explaining an example of the correspondence between score sounds and performance sounds. In FIG. 6, (A) shows an example of score sound data with pitch on the vertical axis and time on the horizontal axis. (B) shows an example of performance sound data for the section of the score corresponding to (A), with pitch on the vertical axis and time on the horizontal axis. (C) shows an example of performance sound data for the section corresponding to (B), with sound pressure on the vertical axis and time on the horizontal axis. In the score data example of FIG. 6(A), the pitch is C4 in the first section of the score (time T0 to T1), G4 in the second section (time T1 to T2), and E4 in the third section (time T2 to T3).

The recognition unit 12 analyzes the frequency of the performance sound data at sufficiently short time intervals (for example, every 20 msec) and detects its pitch. When pitches that fall within the range defined as the same pitch are detected a predetermined number of times in succession (for example, 20 msec × 4 times), the recognition unit 12 recognizes that pitch as the pitch of the performance sound. The timing at which a pitch within that range was first detected is recognized as the start timing of the performance sound of that pitch. Through this processing, in the example of FIG. 6(B), the pitch C4 is recognized in the first section (time T0 to T1), the pitch G4 in the second section (time T1 to T2), and the pitch E4 in the third section (time T2 to T3).
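The run-length rule in this paragraph can be sketched as follows. The sketch assumes that each 20 msec frame has already been quantized to a note name (so that "within the range defined as the same pitch" reduces to equality) and that silent frames are `None`; the function name `recognize_notes` and the parameter `required_run` are hypothetical.

```python
from typing import List, Optional, Tuple


def recognize_notes(frame_pitches: List[Optional[str]],
                    required_run: int = 4) -> List[Tuple[str, int]]:
    """Return (pitch, start_frame) for every pitch held for required_run frames.

    The start frame is the frame where the pitch was first detected,
    not the frame where the run reached the required length.
    """
    notes: List[Tuple[str, int]] = []
    run_pitch: Optional[str] = None
    run_start = 0
    run_len = 0
    for t, pitch in enumerate(frame_pitches):
        if pitch is not None and pitch == run_pitch:
            run_len += 1
        else:
            # A different pitch (or silence) restarts the run.
            run_pitch = pitch
            run_start = t
            run_len = 1 if pitch is not None else 0
        # Report the note once, at the moment the run becomes long enough.
        if run_pitch is not None and run_len == required_run:
            notes.append((run_pitch, run_start))
    return notes


frames = ["C4", "C4", "D4", "C4", "C4", "C4", "C4", "C4",
          None, "G4", "G4", "G4", "G4"]
recognized = recognize_notes(frames)
FRAME_MS = 20
start_times_ms = [start * FRAME_MS for _, start in recognized]
```

In this example the isolated `D4` frame and the two-frame `C4` run before it are too short to count, so the first recognized sound is the `C4` that begins at frame 3.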

The recognition unit 12 also recognizes the sound pressure of the performance sound data together with its pitch. When the instrument is a keyboard instrument such as a piano, the sound pressure is greatest at the moment the user's finger strikes a key and then decays over time. For this reason, as illustrated in FIG. 6(C), in each of the first, second, and third sections the sound pressure is greatest at the beginning of the section and gradually decreases thereafter.

In FIG. 4, the specifying unit 13 reads the score data of the piece of music specified by the user from the storage 1003 and identifies the correspondence between score sounds and performance sounds based on the degree of matching between the pitch of the score sound indicated by the score data and the pitch of the performance sound recognized in the performance sound data by the recognition unit 12 (step S13: identification processing).

FIG. 5 is a flowchart showing an example of the identification processing of step S13. When the user instructs the start of the performance and collection of the performance sound begins, the specifying unit 13 first determines whether an attack has occurred (step S131). An attack here means that, as a result of a piano key being struck, the sound pressure of the performance sound recognized by the recognition unit 12 exceeds a first threshold and the difference between consecutively detected sound pressures exceeds a second threshold. The first threshold is the sound pressure threshold Th illustrated in FIG. 6(C) and serves as the reference value for determining whether a piano key has been struck. The second threshold is the reference value for determining whether a piano key has been struck based on the difference in sound pressure over the time interval (for example, 20 msec) that is the unit in which the recognition unit 12 recognizes sound pressure.
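The two-threshold attack test can be sketched as below. The sketch assumes one sound pressure value per analysis frame on an arbitrary linear scale, and interprets "difference between consecutively detected sound pressures" as the rise from the previous frame to the current one; the function name `detect_attacks` and the sample values are hypothetical.

```python
from typing import List


def detect_attacks(pressures: List[float],
                   first_threshold: float,
                   second_threshold: float) -> List[int]:
    """Return the frame indices at which an attack is detected.

    A frame counts as an attack when its sound pressure exceeds
    first_threshold AND its rise over the previous frame exceeds
    second_threshold.
    """
    attacks: List[int] = []
    for t in range(1, len(pressures)):
        rise = pressures[t] - pressures[t - 1]
        if pressures[t] > first_threshold and rise > second_threshold:
            attacks.append(t)
    return attacks


# Two key strikes, each followed by decay (arbitrary units).
pressures = [0.0, 0.9, 0.7, 0.5, 0.4, 0.95, 0.8, 0.6]
attack_frames = detect_attacks(pressures, first_threshold=0.3, second_threshold=0.2)
```

Using the rise between frames in addition to the absolute level means a loud but decaying note is not mistaken for a new key strike, which is what allows a repeated note of the same pitch to be split at the second attack.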

When an attack has occurred (step S131; Yes), the identification unit 13 determines whether the degree of matching between the pitch of the performance sound, recognized by the recognition unit 12 through the procedure described above, and the pitch of the first score sound in the score data exceeds a certain threshold (step S132). Hereinafter, this is expressed as whether the pitch of the performance sound matches the pitch of the first score sound in the score data. On determining that the two pitches match, the identification unit 13 determines that the user has started playing the instrument (step S133).

The identification unit 13 determines whether the first score sound in the score data (referred to as the first score sound) and the score sound that follows it (that is, the second score sound in the score data, referred to as the second score sound) have the same pitch (step S134). When the first and second score sounds have the same pitch, that is, when the same note repeats in the score data (step S134; same pitch), the identification unit 13 determines whether a fixed time has elapsed since the start timing of the performance sound recognized in steps S131 to S132 (referred to as the first performance sound) (step S135). Once the fixed time has elapsed from the start timing of the first performance sound (step S135; Yes), the identification unit 13 determines whether the sound pressure of the performance sound recognized by the recognition unit 12 exceeds the threshold Th and the difference between consecutively detected sound pressures exceeds the second threshold, that is, whether an attack has occurred (step S136).

When an attack has occurred (step S136; Yes), the identification unit 13 determines whether the pitch of the second score sound, which is the same as that of the first score sound, matches the pitch of the performance sound recognized by the recognition unit 12 at the time of this attack (referred to as the second performance sound) (step S137). When the pitch of the second score sound matches the pitch of the second performance sound (step S137; Yes), the identification unit 13 determines that the timing of the match is both the end timing of the first performance sound and the start timing of the second performance sound (step S138). In this way, when score sounds of the same pitch are consecutive in the score data, the identification unit 13 works within the section of the performance sound whose degree of matching with the pitch of the score sound exceeds the threshold. Within that section, it uses the rise and fall of the sound pressure to identify the first performance sound corresponding to the first score sound and the second performance sound corresponding to the second score sound.

The identification unit 13 then sets the sound of interest to the next score sound in the score data (step S139) and proceeds to step S134. That is, in the processing from step S134 onward, the identification unit 13 treats the second score sound in the score data as the first score sound described above and the third score sound in the score data as the second score sound described above.

When no attack occurs in step S136 (step S136; No) and a predetermined period elapses, resulting in a timeout (step S140; Yes), the identification unit 13 likewise determines that the timing of the timeout is both the end timing of the first performance sound and the start timing of the second performance sound (step S138). This predetermined period is, for example, a period equal to or longer than the length of the first score sound (for example, the length of the first score sound × 1.2). The identification unit 13 then sets the sound of interest to the next score sound in the score data (step S139) and proceeds to step S135.

Similarly, when the pitch of the second score sound, which is the same as that of the first score sound, still does not match the pitch of the performance sound in step S137 (step S137; No) and the predetermined period elapses, resulting in a timeout (step S140; Yes), the identification unit 13 determines that the timing of the timeout is both the end timing of the first performance sound and the start timing of the second performance sound (step S138). This predetermined period is, for example, a period equal to or longer than the length of the first score sound (for example, the length of the first score sound × 1.2). In other words, when the period during which the degree of matching between the pitch of the second score sound following the first score sound in the score data and the pitch of the performance sound following the first performance sound recognized in the performance sound data does not exceed the threshold has lasted for the predetermined period, the identification unit 13 determines that the first performance sound corresponding to the first score sound has ended. The identification unit 13 then sets the sound of interest to the next score sound in the score data (step S139) and proceeds to step S135.

The reason for waiting a fixed time in step S135 before moving to step S136 is that the attack of the first performance sound and the attack of the second performance sound should be separated in time to some extent.

On the other hand, when score sounds of different pitches are consecutive in the score data in step S134 (step S134; different pitch), the method by which the identification unit 13 identifies the start timing or end timing of the performance sound corresponding to a score sound differs from the case described above, in which score sounds of the same pitch are consecutive. First, as in step S135, the identification unit 13 determines whether a fixed time has elapsed since the start timing of the first performance sound (step S141). Once the fixed time has elapsed from the start timing of the first performance sound (step S141; Yes), the identification unit 13 determines whether the pitch of the second score sound matches the pitch of the second performance sound, which the recognition unit 12 recognizes after that fixed time has elapsed (step S142). When the pitch of the second score sound matches the pitch of the second performance sound (step S142; Yes), the identification unit 13 determines that the timing of the match is both the end timing of the first performance sound and the start timing of the second performance sound (step S138). The identification unit 13 then sets the sound of interest to the next score sound in the score data (step S139) and proceeds to step S135.

When the pitch of the second score sound does not match the pitch of the second performance sound in step S142 (step S142; No) and a predetermined period elapses, resulting in a timeout (step S143; Yes), the identification unit 13 likewise determines that the timing of the timeout is both the end timing of the first performance sound and the start timing of the second performance sound (step S138). This predetermined period is, for example, a period equal to or longer than the length of the second score sound (for example, the length of the second score sound × 1.2). The identification unit 13 then sets the sound of interest to the next score sound in the score data (step S139) and proceeds to step S135.

By performing the above processing on all of the score sound data, the identification unit 13 identifies the correspondence between score sounds and performance sounds.
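
The flow of FIG. 5 (steps S131 to S143) can be sketched roughly as follows. This is a simplified illustration under several assumptions that the specification does not state: pitches are already quantized to integers, the degree-of-matching test is reduced to equality, time is counted in recognition frames, and the timeout is measured from the start of the first performance sound. The names `Frame`, `ScoreSound`, `track`, `min_gap`, and `timeout_factor` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    pitch: Optional[int]  # recognized pitch, or None if none was recognized
    attack: bool          # True if an attack was detected in this frame


@dataclass
class ScoreSound:
    pitch: int
    length: int           # nominal length in frames


def track(score: list[ScoreSound], frames: list[Frame],
          min_gap: int = 2, timeout_factor: float = 1.2) -> list[int]:
    """Return the start frame identified for each score sound, in order."""
    starts: list[int] = []
    # S131-S133: wait for an attack whose pitch matches the first score sound.
    t = 0
    while t < len(frames) and not (
        frames[t].attack and frames[t].pitch == score[0].pitch
    ):
        t += 1
    if t == len(frames):
        return starts
    starts.append(t)

    for k in range(1, len(score)):
        first, second = score[k - 1], score[k]
        same_pitch = first.pitch == second.pitch            # S134
        # Timeout length follows the text: first score sound for repeated
        # pitches (S140), second score sound otherwise (S143).
        base = first.length if same_pitch else second.length
        deadline = starts[-1] + round(base * timeout_factor)
        t = starts[-1] + min_gap                            # S135 / S141
        found = None
        while t < len(frames) and t < deadline:
            f = frames[t]
            # Repeated pitch: need a new attack (S136) plus a match (S137).
            # Different pitch: a pitch match alone is enough (S142).
            if f.pitch == second.pitch and (f.attack or not same_pitch):
                found = t
                break
            t += 1
        if found is None:
            if deadline >= len(frames):
                break               # performance data ended before timeout
            found = deadline        # timeout treated as the boundary
        starts.append(found)        # S138; the loop advance acts as S139
    return starts
```

Each returned start frame is also the end frame of the preceding performance sound, which is how the text defines the boundary in step S138.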

Returning to FIG. 4, the evaluation unit 14 evaluates the user's performance by comparing the score sounds and performance sounds that the identification unit 13 has identified as corresponding (step S14: evaluation processing). The comparison is made from the viewpoint of how faithfully the performance sound reproduces the score sound. The evaluation unit 14 then outputs the evaluation result as an index value of the quality of the performance.

According to the embodiment described above, the performer's performance sounds can be associated with the score sounds on the score even when the performance speed changes. With an instrument such as a piano, the sound pressure decreases once a certain time has passed after a note is first sounded, so the pitch of the performance sound may not be recognized correctly. In this embodiment, however, the end timing of the first performance sound is identified when the second performance sound, which follows the first performance sound whose sound pressure has decreased, is detected, so the length of a performance sound can be identified more accurately. A technique called DP matching (dynamic programming) is known for score tracking of the kind this embodiment aims at, but the technique of this embodiment uses a simpler procedure than DP matching and is therefore faster. Consequently, this embodiment makes it possible to evaluate a performance sooner while it is being played.

[Modifications]
The present invention is not limited to the embodiment described above. The embodiment may be modified as follows. Two or more of the following modifications may also be combined.
[Modification 1]
On an instrument, several notes may be played at the same time; this is called a chord. When a chord is played, the recognition unit 12 uses well-known techniques to recognize the pitches of the notes that make up the chord together with their likelihood (ranked from the most likely, the first rank, to the X-th rank). When such a chord appears in the score, the identification unit 13 compares the pitch of each of the notes that make up the chord (for example, notes of pitches C4, G4, and E4) with the pitch recognized at the first rank from the performance sound by the recognition unit 12. When the recognized first-rank pitch matches any of the notes that make up the chord, the identification unit 13 determines that the performance sound corresponding to the chord's score sound in the score data has started and that the performance sound corresponding to the score sound preceding the chord in the score data has ended.
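
The chord check of Modification 1 can be reduced to a membership test, as in the following sketch. It assumes that the recognizer returns candidate pitches as note names ordered from most to least likely; the function name and the list representation are hypothetical.

```python
from collections.abc import Collection, Sequence


def chord_started(ranked_pitches: Sequence[str], chord: Collection[str]) -> bool:
    """Return True when the first-rank recognized pitch is one of the chord's notes.

    ranked_pitches: candidate pitches ordered from most to least likely
    (assumed representation of the first-rank to X-th rank results).
    """
    if not ranked_pitches:
        return False
    return ranked_pitches[0] in chord


# The chord from the text: C4, G4, and E4.
score_chord = {"C4", "G4", "E4"}
hit = chord_started(["G4", "C4", "E4"], score_chord)    # first rank G4 is in the chord
miss = chord_started(["A4", "C4", "E4"], score_chord)   # first rank A4 is not
```

Only the first-rank pitch is consulted, as in the text, so a chord note that appears at a lower rank does not count as a match in this sketch.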

[Modification 2]
In the embodiment, the recognition unit 12 detects the pitch of the performance sound at a certain time interval (for example, 20 msec). When pitches within a range defined as the same pitch are detected a predetermined number of consecutive times (for example, 20 msec × 4 times), it recognizes that pitch as the pitch of the performance sound. This predetermined number may be a fixed value, or may vary according to the length of the score sound, the tempo of the score data, or the tempo of the performance (performance speed). For example, the recognition unit 12 may increase the predetermined number as the score sound of interest becomes longer. The recognition unit 12 may also decrease the predetermined number as the tempo specified in the score data becomes faster. Furthermore, because the tempo of the performance (performance speed) can be determined once the identification unit 13 has identified the start and end timings of performance sounds, the recognition unit 12 may decrease the predetermined number as the tempo of the performance becomes faster.
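
One way to picture Modification 2 is the sketch below, which scales the required number of consecutive detections with the length of the score sound and then looks for a run of that length. The scaling rule (`fraction`), the clamping bounds, and both function names are assumptions made for illustration; the specification only states that the count may grow with note length and shrink with tempo. Pitches are assumed to be already quantized, so that "within the same-pitch range" reduces to equality.

```python
from typing import Optional, Sequence


def required_count(score_length_ms: float, frame_ms: int = 20,
                   fraction: float = 0.25, minimum: int = 2,
                   maximum: int = 8) -> int:
    """Number of consecutive same-pitch detections needed (hypothetical rule).

    Longer score sounds give a larger count. A faster tempo shortens the
    score sound in milliseconds and therefore gives a smaller count.
    """
    n = int(score_length_ms * fraction // frame_ms)
    return max(minimum, min(maximum, n))


def first_stable_pitch(detected: Sequence[Optional[int]],
                       required: int) -> Optional[int]:
    """Return the first pitch detected `required` times in a row, else None."""
    run_pitch: Optional[int] = None
    run_len = 0
    for p in detected:
        if p is not None and p == run_pitch:
            run_len += 1
        else:
            run_pitch = p
            run_len = 1 if p is not None else 0
        if run_len >= required:
            return run_pitch
    return None


# With a count of 4, as in the 20 msec x 4 example of the text:
pitch = first_stable_pitch([60, 61, 62, 62, 62, 62, 64], required=4)
```

Clamping the count keeps very short notes recognizable and stops very long notes from delaying recognition unnecessarily.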

[Modification 3]
In the embodiment, when the period during which the degree of matching between the pitch of the second score sound following the first score sound in the score data and the pitch of the performance sound following the first performance sound recognized in the performance sound data does not exceed the threshold has lasted for the predetermined period, the identification unit 13 determines that the first performance sound corresponding to the first score sound has ended. Such a timeout may occur a predetermined number of times in succession, for example when the user makes mistakes repeatedly. In that situation the performance position in the score data may become unknown. The identification unit 13 may then identify the correspondence between score sounds and performance sounds by collating the combination of pitches of several temporally consecutive score sounds in the score data with the combination of pitches of several performance sounds recognized consecutively in the performance sound data. Specifically, the recognition unit 12 divides the performance sound data into several performance sounds, using the positions at which attacks occur as boundaries, and recognizes the pitch of each performance sound. The identification unit 13 searches the score data before and after the point at which the performance position became unknown for a score sound that matches the pitch of a recognized performance sound, and then determines whether the pitch of the next performance sound matches that of the next score sound. When a combination of at least a certain number of consecutive pitches matches, the identification unit 13 determines that position in the score data to be the performance position and resumes the processing of FIG. 5 described above.
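
The re-synchronization of Modification 3 can be sketched as a windowed search for a run of matching pitches. The window size, the minimum run length, the preference for the candidate closest to the last known position, and the name `find_position` are all assumptions; the specification only requires that "a certain number or more" of consecutive pitches match somewhere before or after the lost position.

```python
from typing import Optional, Sequence


def find_position(score: Sequence[int], played: Sequence[int],
                  last_known: int, window: int = 8,
                  min_run: int = 3) -> Optional[int]:
    """Return the score index that aligns with played[0], or None.

    score:      pitches of the score sounds, in order
    played:     pitches of the performance sounds split at attack positions
    last_known: score index at which the position was lost
    """
    if len(played) < min_run:
        return None
    lo = max(0, last_known - window)
    hi = min(len(score) - min_run, last_known + window)
    # Try the candidates nearest to the lost position first (assumption).
    candidates = sorted(range(lo, hi + 1), key=lambda i: abs(i - last_known))
    for i in candidates:
        run = 0
        while (run < len(played) and i + run < len(score)
               and score[i + run] == played[run]):
            run += 1
        if run >= min_run:
            return i
    return None


# Hypothetical C major scale as the score, position lost near index 1.
score_pitches = [60, 62, 64, 65, 67, 69, 71, 72]
position = find_position(score_pitches, [65, 67, 69], last_known=1)
```

Once a position is returned, tracking would resume from the score sound after the matched run, which corresponds to restarting the processing of FIG. 5.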

[Modification 4]
The user terminal 10, such as a smartphone or tablet, has been described as an example of the information processing apparatus according to the present invention, but the present invention is applicable to any information processing apparatus that implements the functional blocks illustrated in FIG. 3.
In addition, the instrument used for the performance in the present invention is not limited to the piano described in the embodiment.

[Other modifications]
The block diagrams used to describe the above embodiment show blocks in units of functions. These functional blocks (components) are implemented by any combination of at least one of hardware and software. The method of implementing each functional block is not particularly limited. That is, each functional block may be implemented using a single device that is physically or logically coupled, or using two or more physically or logically separate devices that are connected directly or indirectly (for example, by wire or wirelessly). A functional block may also be implemented by combining software with the single device or the multiple devices.

Functions include, but are not limited to, judging, deciding, determining, calculating, computing, processing, deriving, investigating, searching, ascertaining, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, considering, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating (mapping), and assigning. For example, a functional block (component) that performs transmission is called a transmitting unit or a transmitter. In either case, as described above, the method of implementation is not particularly limited.

For example, the user terminal in an embodiment of the present disclosure may function as a computer that performs the processing of the wireless communication method of the present disclosure.

Notification of information is not limited to the aspects and embodiments described in the present disclosure and may be performed using other methods. For example, notification of information may be performed by physical layer signaling (for example, DCI (Downlink Control Information) or UCI (Uplink Control Information)), higher layer signaling (for example, RRC (Radio Resource Control) signaling, MAC (Medium Access Control) signaling, or broadcast information (MIB (Master Information Block), SIB (System Information Block))), other signals, or a combination of these. RRC signaling may also be called an RRC message and may be, for example, an RRC Connection Setup message or an RRC Connection Reconfiguration message.

Each aspect and embodiment described in the present disclosure may be applied to at least one of the following: systems that use LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system), FRA (Future Radio Access), NR (New Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other suitable systems, and next-generation systems extended on the basis of these. Multiple systems may also be combined and applied (for example, a combination of at least one of LTE and LTE-A with 5G).

The order of the processing procedures, sequences, flowcharts, and the like of each aspect and embodiment described in the present disclosure may be changed as long as no contradiction arises. For example, the methods described in the present disclosure present the elements of the various steps in an exemplary order and are not limited to the specific order presented.

Information and the like may be output from a higher layer (or lower layer) to a lower layer (or higher layer). It may also be input and output via multiple network nodes.

Input and output information and the like may be stored in a specific location (for example, a memory) or managed using a management table. Information and the like that is input or output may be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.

A determination may be made using a value represented by one bit (0 or 1), using a Boolean value (true or false), or by comparing numerical values (for example, comparison with a predetermined value).

Each aspect and embodiment described in the present disclosure may be used alone, used in combination, or switched during execution. Notification of predetermined information (for example, notification of "being X") is not limited to explicit notification and may be performed implicitly (for example, by not notifying the predetermined information).
Although the present disclosure has been described in detail above, it is clear to those skilled in the art that the present disclosure is not limited to the embodiments described herein. The present disclosure can be implemented with modifications and changes without departing from the spirit and scope of the present disclosure as defined by the claims. The description of the present disclosure is therefore for illustrative purposes and has no restrictive meaning with respect to the present disclosure.

Software, whether called software, firmware, middleware, microcode, hardware description language, or by another name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like.
Software, instructions, information, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, server, or other remote source using at least one of wired technology (coaxial cable, optical fiber cable, twisted pair, Digital Subscriber Line (DSL), and the like) and wireless technology (infrared, microwave, and the like), at least one of these wired and wireless technologies is included in the definition of a transmission medium.

The information, signals, and the like described in the present disclosure may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination of these.
Terms described in the present disclosure and terms necessary for understanding the present disclosure may be replaced with terms having the same or similar meanings. For example, at least one of a channel and a symbol may be a signal (signaling). A signal may also be a message. A component carrier (CC) may also be called a carrier frequency, a cell, a frequency carrier, or the like.

The terms "system" and "network" as used in the present disclosure are used interchangeably.

The information, parameters, and the like described in the present disclosure may be represented using absolute values, using values relative to a predetermined value, or using other corresponding information. For example, a radio resource may be indicated by an index.
The names used for the parameters described above are not limiting in any respect. Furthermore, formulas and the like that use these parameters may differ from those explicitly disclosed in the present disclosure. Because the various channels (for example, PUCCH and PDCCH) and information elements can be identified by any suitable names, the various names assigned to these channels and information elements are not limiting in any respect.

In the present disclosure, terms such as "mobile station (MS)", "user terminal", "user equipment (UE)", and "terminal" may be used interchangeably.
A mobile station may also be referred to by those skilled in the art as a subscriber station, mobile unit, subscriber unit, wireless unit, remote unit, mobile device, wireless device, wireless communication device, remote device, mobile subscriber station, access terminal, mobile terminal, wireless terminal, remote terminal, handset, user agent, mobile client, client, or some other suitable term.

ユーザ端末は、送信装置、受信装置、通信装置などと呼ばれてもよい。なお、ユーザ端末の少なくとも一方は、移動体に搭載されたデバイス、移動体自体などであってもよい。当該移動体は、乗り物（例えば、車、飛行機など）であってもよいし、無人で動く移動体（例えば、ドローン、自動運転車など）であってもよいし、ロボット（有人型又は無人型）であってもよい。なお、ユーザ端末は、必ずしも通信動作時に移動しない装置も含む。例えば、ユーザ端末１０は、センサなどのＩｏＴ（Internet of Things）機器であってもよい。本開示におけるユーザ端末は、基地局で読み替えてもよい。この場合、上述のユーザ端末１０が有する機能を基地局が有する構成としてもよい。 The user terminal may be called a transmitting device, a receiving device, a communication device, or the like. At least one of the user terminals may be a device mounted on a moving body, the moving body itself, or the like. The moving body may be a vehicle (for example, a car or an airplane), an unmanned moving body (for example, a drone or a self-driving car), or a robot (manned or unmanned). The user terminal also includes devices that do not necessarily move during communication operation. For example, the user terminal 10 may be an IoT (Internet of Things) device such as a sensor. The user terminal in the present disclosure may be read as a base station. In this case, the base station may be configured to have the functions of the user terminal 10 described above.

「判断(determining)」、「決定(determining)」という用語は、多種多様な動作を包含する場合がある。「判断」、「決定」は、例えば、判定(judging)、計算(calculating)、算出(computing)、処理(processing)、導出(deriving)、調査(investigating)、探索(looking up、search、inquiry)（例えば、テーブル、データベース又は別のデータ構造での探索）、確認(ascertaining)した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、受信(receiving)（例えば、情報を受信すること）、送信(transmitting)(例えば、情報を送信すること)、入力(input)、出力(output)、アクセス(accessing)（例えば、メモリ中のデータにアクセスすること）した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、解決(resolving)、選択(selecting)、選定(choosing)、確立(establishing)、比較(comparing)などした事を「判断」「決定」したとみなす事を含み得る。つまり、「判断」「決定」は、何らかの動作を「判断」「決定」したとみなす事を含み得る。また、「判断（決定）」は、「想定する（assuming）」、「期待する（expecting）」、「みなす（considering）」などで読み替えられてもよい。 The terms "judging" and "determining" used in the present disclosure may encompass a wide variety of operations. "Judging" and "determining" may include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up, searching, or inquiring (for example, searching a table, a database, or another data structure), or ascertaining, as having "judged" or "determined". "Judging" and "determining" may also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in a memory) as having "judged" or "determined". They may further include regarding resolving, selecting, choosing, establishing, comparing, and the like as having "judged" or "determined". In other words, "judging" and "determining" may include regarding some operation as having "judged" or "determined". In addition, "judging (determining)" may be read as "assuming", "expecting", "considering", and the like.

「接続された(connected)」、「結合された(coupled)」という用語、又はこれらのあらゆる変形は、２又はそれ以上の要素間の直接的又は間接的なあらゆる接続又は結合を意味し、互いに「接続」又は「結合」された２つの要素間に１又はそれ以上の中間要素が存在することを含むことができる。要素間の結合又は接続は、物理的なものであっても、論理的なものであっても、或いはこれらの組み合わせであってもよい。例えば、「接続」は「アクセス」で読み替えられてもよい。本開示で使用する場合、２つの要素は、１又はそれ以上の電線、ケーブル及びプリント電気接続の少なくとも一つを用いて、並びにいくつかの非限定的かつ非包括的な例として、無線周波数領域、マイクロ波領域及び光（可視及び不可視の両方）領域の波長を有する電磁エネルギーなどを用いて、互いに「接続」又は「結合」されると考えることができる。 The terms "connected" and "coupled", and any variations thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination of these. For example, "connection" may be read as "access". As used in the present disclosure, two elements can be considered to be "connected" or "coupled" to each other using at least one of one or more electric wires, cables, and printed electrical connections, and, as some non-limiting and non-exhaustive examples, using electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the optical (both visible and invisible) region.

本開示において使用する「に基づいて」という記載は、別段に明記されていない限り、「のみに基づいて」を意味しない。言い換えれば、「に基づいて」という記載は、「のみに基づいて」と「に少なくとも基づいて」の両方を意味する。 As used in this disclosure, the phrase “based on” does not mean “based only on,” unless expressly specified otherwise. In other words, the phrase "based on" means both "based only on" and "based at least on."

本開示において使用する「第１の」、「第２の」などの呼称を使用した要素へのいかなる参照も、それらの要素の量又は順序を全般的に限定しない。これらの呼称は、２つ以上の要素間を区別する便利な方法として本開示において使用され得る。したがって、第１及び第２の要素への参照は、２つの要素のみが採用され得ること、又は何らかの形で第１の要素が第２の要素に先行しなければならないことを意味しない。 Any reference to elements using the designations "first," "second," etc. as used in this disclosure does not generally limit the amount or order of those elements. These designations may be used in this disclosure as a convenient way to distinguish between two or more elements. Thus, references to the first and second elements do not mean that only two elements may be employed, or that the first element must precede the second element in any way.

上記の各装置の構成における「手段」を、「部」、「回路」、「デバイス」等に置き換えてもよい。 The “means” in the configuration of each of the above devices may be replaced with “unit”, “circuit”, “device”, and the like.

本開示において、「含む（include）」、「含んでいる（including）」及びそれらの変形が使用されている場合、これらの用語は、用語「備える（comprising）」と同様に、包括的であることが意図される。さらに、本開示において使用されている用語「又は（or）」は、排他的論理和ではないことが意図される。 Where the terms "include", "including", and variations thereof are used in the present disclosure, these terms are intended to be inclusive in the same manner as the term "comprising". Furthermore, the term "or" as used in the present disclosure is not intended to be an exclusive or.

本開示において、例えば、英語でのa, an及びtheのように、翻訳により冠詞が追加された場合、本開示は、これらの冠詞の後に続く名詞が複数形であることを含んでもよい。 In the present disclosure, where translations add articles, such as a, an, and the in English, the disclosure may include that the noun that follows these articles is in the plural.

本開示において、「ＡとＢが異なる」という用語は、「ＡとＢが互いに異なる」ことを意味してもよい。なお、当該用語は、「ＡとＢがそれぞれＣと異なる」ことを意味してもよい。「離れる」、「結合される」などの用語も、「異なる」と同様に解釈されてもよい。 In the present disclosure, the phrase "A and B are different" may mean "A and B are different from each other". The phrase may also mean "A and B are each different from C". Terms such as "separated" and "coupled" may be interpreted in the same manner as "different".

１…演奏評価システム、２…ネットワーク、１０…ユーザ端末、１１…収音部、１２…認識部、１３…特定部、１４…評価部、２０…サーバ装置、１００１…プロセッサ、１００２…メモリ、１００３…ストレージ、１００４…通信装置、１００５…入力装置、１００６…出力装置。 DESCRIPTION OF SYMBOLS 1 ... Performance evaluation system, 2 ... Network, 10 ... User terminal, 11 ... Sound collection unit, 12 ... Recognition unit, 13 ... Specifying unit, 14 ... Evaluation unit, 20 ... Server device, 1001 ... Processor, 1002 ... Memory, 1003 ... Storage, 1004 ... Communication device, 1005 ... Input device, 1006 ... Output device.

Claims

An information processing apparatus comprising: a recognition unit that recognizes the pitch of a performance sound in performance sound data representing the collected performance sound; and
a specifying unit that specifies a correspondence relationship between a musical score sound and the performance sound based on the degree of coincidence between the pitch of the musical score sound indicated by musical score data and the pitch of the performance sound recognized in the performance sound data by the recognition unit,
wherein the specifying unit
specifies the start timing of a first performance sound corresponding to a first musical score sound included in the musical score data when the degree of coincidence between the pitch of the first musical score sound and the pitch of the performance sound recognized in the performance sound data exceeds a threshold value, and
specifies the end timing of the first performance sound corresponding to the first musical score sound when the degree of coincidence between the pitch of a second musical score sound following the first musical score sound in the musical score data and the pitch of the performance sound following the first performance sound recognized in the performance sound data exceeds a threshold value.
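
The start and end timing rule of claim 1 can be illustrated with a minimal Python sketch. It is not the implementation disclosed in this publication. The function names `coincidence` and `align`, the cent-based definition of the degree of coincidence, the fixed analysis frame length, and the default threshold are all assumptions introduced for illustration.

```python
import math


def coincidence(score_hz, played_hz):
    """Hypothetical degree of coincidence in [0, 1].

    Assumption: 1.0 for identical pitch, falling linearly to 0.0 at a
    distance of one semitone (100 cents) or more.
    """
    if score_hz <= 0 or played_hz <= 0:
        return 0.0  # silence or unrecognized pitch never matches
    cents = abs(1200.0 * math.log2(played_hz / score_hz))
    return max(0.0, 1.0 - cents / 100.0)


def align(score_hz, frames_hz, frame_sec=0.01, threshold=0.5):
    """Return [start, end] times in seconds for each musical score sound.

    score_hz  : pitches of the musical score sounds, in score order
    frames_hz : recognized performance pitch per analysis frame
    A time stays None when the corresponding timing was never specified.
    """
    spans = [[None, None] for _ in score_hz]
    current = -1  # index of the musical score sound currently sounding
    for k, played in enumerate(frames_hz):
        nxt = current + 1
        if nxt >= len(score_hz):
            break
        # The next score sound is matched: this both starts its performance
        # sound and ends the performance sound of the previous score sound.
        if coincidence(score_hz[nxt], played) > threshold:
            t = k * frame_sec
            if current >= 0:
                spans[current][1] = t
            spans[nxt][0] = t
            current = nxt
    return spans
```

Because the sketch advances only when the next musical score sound is actually heard, the result does not depend on how fast the performer plays. It does not separate consecutive musical score sounds of the same pitch; claims 5 and 6 address that case with a different cue.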

The information processing apparatus according to claim 1, further comprising an evaluation unit that evaluates the performance by comparing the musical score sound and the performance sound that are in the correspondence relationship specified by the specifying unit.

The information processing apparatus according to claim 1 or 2, wherein the specifying unit compares the pitch of one musical score sound in the musical score data with the pitch of the performance sound recognized in the performance sound data a predetermined number of times consecutively in time, and specifies that the musical score sound and the performance sound correspond when the degree of coincidence of the pitches exceeds a threshold value in each of the comparisons.

The information processing apparatus according to claim 3, wherein the predetermined number of times differs depending on the length of the musical score sound, the tempo in the musical score data, or the tempo of performance.
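
The repeated comparison of claims 3 and 4 can be sketched as follows in Python. The claims do not state how the predetermined number of times is derived, so the rule in `required_count` (a fixed fraction of the nominal length of the musical score sound) is only an assumption, as are both function names and all default values.

```python
def required_count(note_beats, tempo_bpm, frame_sec=0.01, fraction=0.25):
    """Hypothetical rule for the predetermined number of comparisons.

    Assumption: require a match over a fixed fraction of the nominal
    duration of the musical score sound, so longer sounds and slower
    tempos demand more consecutive matching frames.
    """
    note_sec = note_beats * 60.0 / tempo_bpm
    return max(1, int(note_sec * fraction / frame_sec))


def first_confirmed(coincidences, threshold, n):
    """Index of the first frame of the first run of n consecutive frames
    whose degree of coincidence exceeds the threshold, or None."""
    run = 0
    for k, c in enumerate(coincidences):
        run = run + 1 if c > threshold else 0
        if run >= n:
            return k - n + 1
    return None
```

Requiring a run of matches rather than a single match keeps a momentary pitch recognition error from being taken as the start of a performance sound.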

The information processing apparatus according to any one of claims 1 to 4, wherein the method by which the specifying unit specifies the start timing or the end timing of the performance sound corresponding to a musical score sound differs depending on whether musical score sounds of the same pitch are consecutive or musical score sounds of different pitches are consecutive in the musical score data.

The information processing apparatus according to claim 5, wherein, when a first musical score sound and a second musical score sound of the same pitch are consecutive in the musical score data, the specifying unit specifies, within the section of the performance sound in which the degree of coincidence with the pitch of the first musical score sound and the second musical score sound exceeds a threshold value, the first performance sound corresponding to the first musical score sound and the second performance sound corresponding to the second musical score sound based on the increase and decrease in the sound pressure of the performance sound.
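
For consecutive musical score sounds of the same pitch, claim 6 relies on the increase and decrease in sound pressure instead of pitch. The Python sketch below shows one possible way to locate the boundary. The specific rule (a rise of at least `min_rise_db` above the lowest level reached after the first peak) and the name `split_by_pressure` are assumptions, not taken from the claims.

```python
def split_by_pressure(pressure_db, min_rise_db=6.0):
    """Return the frame index where the second performance sound starts.

    pressure_db : sound pressure per frame within a section whose pitch
                  already matches both musical score sounds
    Assumption: the second sound begins at the first frame whose level is
    at least min_rise_db above the lowest level seen after the first peak.
    Returns None when no such rise is found.
    """
    if len(pressure_db) < 3:
        return None
    peak_passed = False
    lowest = pressure_db[0]
    for k in range(1, len(pressure_db)):
        cur = pressure_db[k]
        if not peak_passed:
            # Wait until the first sound starts to decay.
            if cur < pressure_db[k - 1]:
                peak_passed = True
                lowest = cur
        elif cur < lowest:
            lowest = cur  # still decaying
        elif cur - lowest >= min_rise_db:
            return k  # pressure rose again: a new sound was struck
    return None
```

The returned index can serve as both the end timing of the first performance sound and the start timing of the second.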

The information processing apparatus according to claim 5 or 6, wherein, when a first musical score sound and a second musical score sound of different pitches are consecutive in the musical score data, the specifying unit specifies the start timing of the first performance sound corresponding to the first musical score sound when the degree of coincidence between the pitch of the first musical score sound included in the musical score data and the pitch of the performance sound recognized in the performance sound data exceeds a threshold value, and
specifies the end timing of the first performance sound corresponding to the first musical score sound when the degree of coincidence between the pitch of the second musical score sound following the first musical score sound in the musical score data and the pitch of the performance sound following the first performance sound recognized in the performance sound data exceeds a threshold value.

The information processing apparatus according to any one of the preceding claims, wherein the specifying unit determines that the first performance sound corresponding to the first musical score sound has ended when a period in which the degree of coincidence between the pitch of the second musical score sound following the first musical score sound in the musical score data and the pitch of the performance sound following the first performance sound recognized in the performance sound data does not exceed a threshold value has continued for a predetermined period.

The information processing apparatus according to claim 8, wherein, when the period in which the degree of coincidence between the pitch of the second musical score sound and the pitch of the performance sound following the first performance sound does not exceed the threshold value has exceeded the predetermined period a predetermined number of times in succession, the specifying unit specifies the correspondence relationship between the musical score sounds and the performance sounds by comparing a combination of the pitches of a plurality of musical score sounds that are temporally consecutive in the musical score data with a combination of the pitches of a plurality of performance sounds recognized consecutively in time in the performance sound data.
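
The fallback of claim 9, which compares combinations of consecutive pitches, can be sketched in Python as a search for the position in the score that matches the most recently recognized pitches. The use of MIDI note numbers, the tolerance, the tie-breaking by closeness to an expected position, and the name `relocate` are all assumptions made for this illustration.

```python
def relocate(score_midi, recent_midi, expected=0, tolerance=0.5):
    """Find where a run of recently recognized pitches occurs in the score.

    score_midi  : pitches of the musical score sounds as MIDI note numbers
    recent_midi : pitches of the last few recognized performance sounds
    expected    : score index where the performer was last believed to be
    Returns the starting score index of the matching run closest to
    `expected`, or None when no run matches within the tolerance.
    """
    m = len(recent_midi)
    if m == 0 or m > len(score_midi):
        return None
    matches = [
        i
        for i in range(len(score_midi) - m + 1)
        if all(
            abs(s - p) <= tolerance
            for s, p in zip(score_midi[i:i + m], recent_midi)
        )
    ]
    if not matches:
        return None
    # A motif may repeat in the score; prefer the occurrence nearest to
    # the previously tracked position.
    return min(matches, key=lambda i: abs(i - expected))
```

Matching several pitches at once lets tracking resume after the performer skips or repeats part of the score, where a single pitch would often be ambiguous.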

A program for causing a computer to function as:
a recognition unit that recognizes the pitch of a performance sound in performance sound data representing the collected performance sound; and
a specifying unit that specifies a correspondence relationship between a musical score sound and the performance sound based on the degree of coincidence between the pitch of the musical score sound indicated by musical score data and the pitch of the performance sound recognized in the performance sound data by the recognition unit, wherein the specifying unit specifies the start timing of a first performance sound corresponding to a first musical score sound included in the musical score data when the degree of coincidence between the pitch of the first musical score sound and the pitch of the performance sound recognized in the performance sound data exceeds a threshold value, and specifies the end timing of the first performance sound corresponding to the first musical score sound when the degree of coincidence between the pitch of a second musical score sound following the first musical score sound in the musical score data and the pitch of the performance sound following the first performance sound recognized in the performance sound data exceeds a threshold value.