WO2022202199A1 - Code estimation device, training device, code estimation method, and training method - Google Patents

Code estimation device, training device, code estimation method, and training method

Info

Publication number
WO2022202199A1
WO2022202199A1 (PCT/JP2022/009233)
Authority
WO
WIPO (PCT)
Prior art keywords
series data
string
information
time
chord
Prior art date
Application number
PCT/JP2022/009233
Other languages
French (fr)
Japanese (ja)
Inventor
正博 鈴木 (Masahiro Suzuki)
Original Assignee
ヤマハ株式会社 (Yamaha Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ヤマハ株式会社 (Yamaha Corporation)
Priority to JP2023508892A
Priority to CN202280023333.9A
Publication of WO2022202199A1

Links

Images

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10G: REPRESENTATION OF MUSIC; RECORDING MUSIC IN NOTATION FORM; ACCESSORIES FOR MUSIC OR MUSICAL INSTRUMENTS NOT OTHERWISE PROVIDED FOR, e.g. SUPPORTS
    • G10G3/00: Recording music in notation form, e.g. recording the mechanical operation of a musical instrument
    • G10G3/04: Recording music in notation form using electrical means
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/36: Accompaniment arrangements
    • G10H1/38: Chord

Definitions

  • the present invention relates to a chord estimation device and method for estimating chords for playing a musical instrument, and a training device and method for constructing a chord estimation device.
  • in Patent Document 1, a chord is estimated for each specific section. For example, one chord is estimated per bar. If chord estimation with a higher degree of freedom could be performed from given notes, the production of musical scores with chords could be supported more appropriately.
  • the purpose of the present invention is to perform chord estimation with a high degree of freedom based on musical note strings.
  • a chord estimation apparatus includes a reception unit that receives time-series data including a note string composed of a plurality of notes, and an estimation unit that uses a trained model to estimate, based on the time-series data, chord string information indicating a chord string corresponding to the note string.
  • a training apparatus includes a first acquisition unit that acquires input time-series data including a reference note string composed of a plurality of notes, a second acquisition unit that acquires output chord string information indicating a chord string corresponding to the reference note string, and a construction unit that constructs a trained model that has learned the input/output relationship between the input time-series data and the output chord string information.
  • a chord estimation method is executed by a computer, and includes receiving time-series data including a note string composed of a plurality of notes, and estimating, using a trained model and based on the time-series data, chord string information indicating a chord string corresponding to the note string.
  • a training method is executed by a computer, and includes acquiring input time-series data including a reference note string composed of a plurality of notes, acquiring output chord string information indicating a chord string corresponding to the reference note string, and constructing a trained model that has learned the input/output relationship between the input time-series data and the output chord string information.
  • chord estimation with a high degree of freedom can be performed based on a string of musical notes.
  • FIG. 1 is a block diagram showing the configuration of a processing system including a chord estimation device and a training device according to one embodiment of the present invention.
  • FIG. 2 is a diagram showing an example of input time-series data included in training data.
  • FIG. 3 is a diagram showing an example of output chord string information included in training data.
  • FIG. 4 is a block diagram showing the configuration of the training device and chord estimation device.
  • FIG. 5 shows an example of an arranged musical score displayed on the display unit.
  • FIG. 6 is a flowchart showing an example of training processing.
  • FIG. 7 is a flowchart showing an example of chord estimation processing.
  • FIG. 8 is a diagram showing a modified example of output chord string information included in training data.
  • FIG. 1 is a block diagram showing the configuration of a processing system including a chord estimation device and a training device according to one embodiment of the present invention.
  • the processing system 100 includes a RAM (Random Access Memory) 110, a ROM (Read Only Memory) 120, a CPU (Central Processing Unit) 130, a storage unit 140, an operation unit 150, and a display unit 160.
  • the processing system 100 is implemented by a computer such as a personal computer, tablet terminal, or smart phone.
  • the processing system 100 may be realized by cooperative operation of a plurality of computers connected by a communication path such as Ethernet, or may be realized by an electronic musical instrument such as an electronic piano having performance functions.
  • the RAM 110, ROM 120, CPU 130, storage unit 140, operation unit 150, and display unit 160 are connected to the bus 170.
  • the RAM 110, ROM 120, and CPU 130 constitute the training device 10 and the chord estimation device 20.
  • although the training device 10 and the chord estimation device 20 are configured by the common processing system 100 in this embodiment, they may be configured by separate processing systems.
  • the RAM 110 consists of, for example, a volatile memory, and is used as a work area for the CPU 130.
  • the ROM 120 is, for example, a non-volatile memory and stores a training program and a code estimation program.
  • the CPU 130 performs the training process by executing the training program stored in the ROM 120 on the RAM 110. Further, the CPU 130 performs the chord estimation process by executing the chord estimation program stored in the ROM 120 on the RAM 110. Details of the training process and the chord estimation process will be described later.
  • the training program or the chord estimation program may be stored in the storage unit 140 instead of the ROM 120.
  • the training program or the chord estimation program may be provided in a form stored in a computer-readable storage medium and installed in the ROM 120 or the storage unit 140.
  • a training program or chord estimation program distributed from a server (including a cloud server) on a network may be installed in the ROM 120 or the storage unit 140.
  • the storage unit 140 includes a storage medium such as a hard disk, an optical disk, a magnetic disk, or a memory card, and stores a trained model M and a plurality of training data D.
  • the trained model M or each piece of training data D may be stored in a computer-readable storage medium instead of the storage unit 140.
  • the trained model M or the respective training data D may be stored on a server on a network.
  • the trained model M is a machine learning model trained so as to present chord strings to be referred to when the user of the chord estimation device 20 (hereinafter referred to as a performer) plays a piece of music.
  • a trained model M is constructed using a plurality of training data D.
  • a user of the training device 10 can generate the training data D by operating the operation unit 150 .
  • the training data D is data created based on the musical knowledge or musical sense of the reference performer.
  • the reference performer has a relatively high level of skill in playing the piece of music.
  • a reference performer may be the performer's mentor or teacher in the performance of the musical composition.
  • the training data D indicates a set of input time-series data and output chord string information.
  • the input time-series data indicates a reference note string consisting of a plurality of notes.
  • the input time-series data is data that forms a melody or accompaniment sound with a plurality of notes.
  • the input time-series data may be image data representing images of musical scores.
  • the output chord string information is data in which chords corresponding to the reference note string are arranged in time series. The chord string corresponding to the reference note string is provided by the reference performer.
  • FIGS. 2 and 3 are diagrams showing an example of the training data D.
  • the example in FIG. 2 shows input time-series data including a reference note string consisting of a plurality of notes.
  • the example in FIG. 3 shows output chord string information indicating a chord string corresponding to the reference note string.
  • the input time-series data has a metrical structure and additional information in addition to the reference note string.
  • the input time-series data A shown in FIG. 2 is data obtained by extracting data for the first two bars of a song. In the input time-series data A, bars are separated by "bar", and beats are separated by "beat". In this way, the input time-series data A has a metrical structure with the "bar” and "beat” information.
  • Elements A1 to A37 indicate the reference note string of the first bar. That is, the elements A1 to A37 are separated into bars by the "bar” before the element A1 and the "bar” after the element A37. In addition, it is divided into beats by "beat” after elements A8, A18, and A26.
  • the element A0 is additional information.
  • as the additional information, for example, key information, genre information, and difficulty level information are used.
  • key information is added by the Key element.
  • the key information is information specifying the key of the music represented by the reference note string.
  • the numerical value following "Key" designates the key.
  • Genre information is information that designates the genre of music represented by the reference note string.
  • as the genre information, genres such as rock, pop, and jazz are specified, for example. By designating genre information as additional information, the correspondence between the reference note string and a chord string suited to the genre is machine-learned.
  • the difficulty level information is information indicating the difficulty level of the musical score indicated by the reference note string.
  • a chord string corresponding to the reference note string and the difficulty level of the score is machine-learned. For example, for a score with a low difficulty level, machine learning is performed while interpolating notes from a small number of tones. For a score with a high difficulty level, machine learning is performed while selecting the notes that form chords from an excessive number of tones.
  • elements other than the element A0, "bar” and “beat” correspond to the reference note string.
  • the element A0 is placed at the beginning of the input time-series data A, that is, before the reference note string (elements A1 to A37), but it may be placed at any position in the input time-series data A.
  • in elements A1 to A37 of the reference note string, "L" means the left hand, "R" means the right hand, and the number following "L" or "R" designates the pitch. "on" and "off" mean key depression and key release, respectively. "wait" means waiting, and the number following "wait" designates the length of time.
  • elements A1 to A5 indicate that the keys of pitches 77 and 74 are pressed with the right hand while the keys of pitches 53 and 46 are simultaneously pressed with the left hand, followed by a wait of 11 time units.
  • elements A6 to A8 indicate that the left-hand keys of pitches 53 and 46 are released simultaneously, followed by a wait of 1 time unit. Elements A9 to A11 then indicate that the left hand presses pitches 53 and 46 again, followed by a wait of 5 time units.
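  • the event-token format described above ("bar"/"beat" markers, a "Key" element, L/R key press and release events, and "wait" lengths) can be sketched with a small parser. The concrete token spellings below (`Key_5`, `R_on_77`, `wait_11`, and so on) are hypothetical, since the publication describes the elements only abstractly; this is a sketch of the idea, not the patent's actual encoding.

```python
# Sketch of parsing a note-event token stream like input time-series data A.
# Token spellings are hypothetical; the publication only describes the
# elements ("bar", "beat", Key, L/R on/off with a pitch number, "wait").

def parse_tokens(tokens):
    """Return the key value and a list of timed events from event tokens."""
    time = 0          # current position in wait units
    key = None
    events = []       # (time, "bar"/"beat") or (time, hand, action, pitch)
    for tok in tokens:
        parts = tok.split("_")
        if parts[0] == "Key":
            key = int(parts[1])                 # key designation (additional info)
        elif tok in ("bar", "beat"):
            events.append((time, tok))          # metrical structure markers
        elif parts[0] in ("L", "R"):
            hand, action, pitch = parts[0], parts[1], int(parts[2])
            events.append((time, hand, action, pitch))
        elif parts[0] == "wait":
            time += int(parts[1])               # advance time by the given length
    return key, events

# Elements A0-A8 of the description: right hand presses 77 and 74, left hand
# presses 53 and 46, wait 11 units, left hand releases both, wait 1 unit.
key, events = parse_tokens([
    "Key_5", "bar",
    "R_on_77", "R_on_74", "L_on_53", "L_on_46", "wait_11",
    "L_off_53", "L_off_46", "wait_1",
])
print(key)         # 5
print(events[-1])  # (11, 'L', 'off', 46)
```

  • the parser makes the role of "wait" explicit: it is the only element that advances time, so press and release events between two waits are simultaneous, exactly as in the description of elements A1 to A8.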
  • the output chord string information B shown in FIG. 3 indicates a chord string corresponding to the reference note string included in the input time-series data A.
  • the chords corresponding to elements A1 to A37 of the input time-series data A are represented by elements B1 to B3 and elements B4 to B6. That is, elements B1 to B6 indicate the chord string corresponding to the first bar of the input time-series data A.
  • in the output chord string information B, bars are also separated by "bar" and beats by "beat". The range delimited by the "bar" before element B1 and the "bar" after element B6 corresponds to the first bar.
  • one chord is indicated by three elements.
  • Elements B1 to B3 define the chord of the first beat of the first bar.
  • Elements B4 to B6 define the chord of the fourth beat of the first bar.
  • Elements B7 to B9 define the chord of the first beat of the fourth bar.
  • of the three elements indicating a chord, the first element (B1, B4, B7) represents basic chord information.
  • the basic chord information (chord) is a numerical value from 1 to 24 designating a major or minor chord for each of the 12 tones (C, C#, D, D#, ..., A, A#, B).
  • the second element (B2, B5, B8) of the three elements indicating the chord indicates chord type information.
  • the chord type information indicates a numerical value designating the type of tension chord.
  • the third element (B3, B6, B9) represents chord root information.
  • the chord root information (root) indicates a numerical value designating the root note of the on-chord.
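  • the three-element chord encoding above can be illustrated with a small decoder. The assignment of the numbers 1 to 24 is an assumption (here 1 to 12 are taken as the major chords C..B and 13 to 24 as the minor chords), as is the tension table; the publication only states that the values designate major and minor chords over the 12 tones, a tension chord type, and an on-chord root.

```python
# Decode a (chord, type, root) element triple such as (B1, B2, B3) into a
# chord symbol. The numbering scheme is assumed: 1-12 = major chords C..B,
# 13-24 = minor chords C..B; type 0 = no tension, root 0 = no on-chord.

TONES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
TENSIONS = {0: "", 1: "7", 2: "maj7", 3: "9"}   # hypothetical tension table

def decode_chord(basic, tension, root):
    if not 1 <= basic <= 24:
        raise ValueError("basic chord information must be 1..24")
    tone = TONES[(basic - 1) % 12]
    quality = "" if basic <= 12 else "m"        # major for 1-12, minor for 13-24
    name = tone + quality + TENSIONS.get(tension, "")
    if root:                                    # on-chord (slash chord) root
        name += "/" + TONES[root - 1]
    return name

print(decode_chord(1, 0, 0))    # "C"
print(decode_chord(22, 1, 5))   # "Am7/E"
```

  • splitting a chord into basic information, tension type, and on-chord root keeps each of the three output elements in a small, closed value range, which is convenient for a sequence model's output vocabulary.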
  • FIG. 4 is a block diagram showing the configuration of the training device 10 and the chord estimation device 20.
  • the training device 10 includes a first acquisition unit 11, a second acquisition unit 12, and a construction unit 13 as functional units.
  • the functional units of the training device 10 are implemented by the CPU 130 of FIG. 1 executing the training program. At least part of the functional units of the training device 10 may be realized by hardware such as an electronic circuit.
  • the first acquisition unit 11 acquires the input time-series data A from each training data D stored in the storage unit 140 or the like.
  • the second acquisition unit 12 acquires the output chord string information B from each training data D.
  • the construction unit 13 performs machine learning using the input time-series data A acquired by the first acquisition unit 11 as an input element and the output chord string information B acquired by the second acquisition unit 12 as an output element. By repeating this machine learning over a plurality of training data D, the construction unit 13 constructs a trained model M indicating the input/output relationship between the input time-series data A and the output chord string information B.
  • in this embodiment, the construction unit 13 builds the trained model M by training a Transformer, but the embodiment is not limited to this.
  • the construction unit 13 may construct the trained model M by training another type of machine learning model that handles time series.
  • the trained model M constructed by the construction unit 13 is stored in the storage unit 140, for example.
  • the trained model M constructed by the construction unit 13 may be stored in a server or the like on the network.
  • the chord estimation device 20 includes a reception unit 21, an estimation unit 22, and a generation unit 23 as functional units.
  • the functional units of the chord estimation device 20 are implemented by the CPU 130 of FIG. 1 executing the chord estimation program. At least part of the functional units of the chord estimation device 20 may be realized by hardware such as an electronic circuit.
  • the reception unit 21 receives time-series data including a string of notes made up of a plurality of notes.
  • the performer can give image data representing an image of the musical score to the reception unit 21 as time-series data.
  • the performer can generate time-series data by operating the operation unit 150 and provide it to the reception unit 21 .
  • the time-series data has the same configuration as the input time-series data A in FIG. 2. In other words, the time-series data has a metrical structure and additional information in addition to the note string.
  • the estimation unit 22 estimates chord string information using the trained model M stored in the storage unit 140 or the like.
  • the chord string information indicates a chord string corresponding to the note string accepted by the reception unit 21, and is estimated based on the note string and the additional information. Since the time-series data has the same configuration as the input time-series data A, the chord string information has the same configuration as the output chord string information B.
  • the generation unit 23 generates musical score information based on the note string of the time-series data received by the reception unit 21 and the chord string information estimated by the estimation unit 22.
  • the musical score information is information on an arranged musical score for piano, and is data in which chord information is added to staff notation.
  • for example, the musical score information is MIDI data to which the chord string information is added.
  • the display unit 160 displays the musical score with chords based on the musical score information generated by the generating unit 23 .
  • FIG. 5 shows an example of a musical score with chords displayed on the display unit 160.
  • the chorded musical score displays the chord string information estimated by the estimation unit 22 so as to correspond to each note of the note string accepted by the reception unit 21.
  • FIG. 6 is a flowchart showing an example of training processing by the training apparatus 10 of FIG.
  • the training process in FIG. 6 is performed by CPU 130 in FIG. 1 executing a training program.
  • the first acquisition unit 11 acquires the input time-series data A from each training data D (step S1).
  • the second acquisition unit 12 acquires the output code string information B from each training data D (step S2). Either of steps S1 and S2 may be performed first, or may be performed simultaneously.
  • the construction unit 13 performs machine learning using the input time-series data A acquired in step S1 as an input element and the output chord string information B acquired in step S2 as an output element (step S3). Subsequently, the construction unit 13 determines whether or not sufficient machine learning has been performed (step S4). If the machine learning is insufficient, the construction unit 13 returns to step S3. Steps S3 and S4 are repeated while changing the parameters until sufficient machine learning has been performed. The number of iterations of machine learning changes according to the quality conditions that the trained model M to be constructed should satisfy.
  • the construction unit 13 saves the input/output relationship between the input time-series data A and the output chord string information B learned by the machine learning in step S3 as the trained model M (step S5). This completes the training process.
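  • the flow of steps S1 to S5 can be sketched as follows. The Transformer itself is omitted and replaced by a placeholder training step, and token vocabularies stand in for the model's learned artifacts; this is an illustration of the loop structure, not the patent's implementation.

```python
# Sketch of the training process (steps S1-S5). The model and loss are
# placeholders: the publication trains a Transformer, which is omitted here.

def build_vocab(sequences):
    """Map each distinct token to an integer ID (shared across sequences)."""
    vocab = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

def train(training_data, quality_threshold=0.1, max_iters=100):
    # S1/S2: acquire input time-series data A and output chord string info B.
    inputs = [d["A"] for d in training_data]
    outputs = [d["B"] for d in training_data]
    in_vocab, out_vocab = build_vocab(inputs), build_vocab(outputs)

    loss, iters = 1.0, 0
    while loss > quality_threshold and iters < max_iters:   # S3/S4 loop
        loss *= 0.5       # placeholder for one Transformer training step
        iters += 1
    # S5: save the learned artifacts as the "trained model".
    return {"in_vocab": in_vocab, "out_vocab": out_vocab, "loss": loss}

data = [{"A": ["bar", "R_on_77", "wait_11"], "B": ["bar", "chord_1", "type_0"]}]
model = train(data)
print(len(model["in_vocab"]))   # 3
```

  • the quality condition of step S4 appears here as the `quality_threshold` on the loss: training iterates until the condition is met, so the iteration count varies with the required quality, as stated above.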
  • FIG. 7 is a flowchart showing an example of chord estimation processing by the chord estimation device 20 of FIG.
  • the chord estimation process in FIG. 7 is performed by CPU 130 in FIG. 1 executing a chord estimation program.
  • the receiving unit 21 receives time-series data (step S11).
  • the estimation unit 22 estimates chord string information from the time-series data received in step S11, using the trained model M saved in step S5 of the training process (step S12).
  • chord string information including one or more chord strings is estimated from the note string included in the time-series data.
  • in this way, chord estimation is performed with a high degree of freedom.
  • since the chord change timing is also estimated along the flow of time, more appropriate chord estimation is performed.
  • the time-series data does not contain information that serves as a delimiter for chord changes, but the estimation unit 22 performs chord estimation including the chord change timing.
  • after that, the generation unit 23 generates musical score information based on the note string of the time-series data received in step S11 and the chord string information estimated in step S12 (step S13). A score with chords may be displayed on the display unit 160 based on the generated score information. This completes the chord estimation process.
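  • steps S11 to S13 can be sketched as a three-stage pipeline. The `estimate` stage below is a stub standing in for the trained model M (in the device it would be inference with the trained Transformer), and the token names are hypothetical.

```python
# Sketch of the chord estimation process (steps S11-S13) as a pipeline.
# The trained model M is replaced by a stub; in the device, the estimation
# unit would run the trained model over the time-series tokens.

def receive(time_series_tokens):                  # S11: reception unit 21
    if not time_series_tokens:
        raise ValueError("time-series data must contain a note string")
    return list(time_series_tokens)

def estimate(tokens):                             # S12: estimation unit 22 (stub)
    # Placeholder: emit one chord triple per "bar" marker. A real model also
    # chooses the chord-change timing freely, not one chord per bar.
    return [("chord_1", "type_0", "root_0") for t in tokens if t == "bar"]

def generate_score(tokens, chords):               # S13: generation unit 23
    return {"notes": tokens, "chords": chords}    # stand-in for score info

tokens = receive(["bar", "R_on_77", "wait_11", "bar", "R_on_74", "wait_4"])
score = generate_score(tokens, estimate(tokens))
print(len(score["chords"]))   # 2
```

  • separating the three stages mirrors the device's functional units: the reception, estimation, and generation units can each be replaced (for example, dropping the generation unit, as the modifications below allow) without changing the others.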
  • as described above, the chord estimation device 20 includes the reception unit 21 that receives time-series data including a note string composed of a plurality of notes, and the estimation unit 22 that uses the trained model M to estimate, based on the time-series data, chord string information indicating a chord string corresponding to the note string.
  • the trained model M is used to estimate appropriate chord string information from the temporal flow of the plurality of notes in the time-series data. This makes it possible to present a chorded musical score based on time-series data including a note string. Since one or more chord strings are estimated from the note string, chord estimation is performed with a high degree of freedom.
  • the trained model M may be a machine learning model that has learned the input/output relationship between input time-series data A including a reference note string consisting of a plurality of notes and output chord string information B indicating a chord string corresponding to the reference note string. In this case, chord string information can be easily estimated from the time-series data.
  • the estimation unit 22 may also estimate the chord change timing in the chord string. As a result, more appropriate chord estimation corresponding to the note string is performed.
  • the input time-series data A may include genre information specifying the genre of music represented by the reference note string.
  • the time-series data may also include genre information that designates the genre of music represented by a string of musical notes.
  • the estimation unit 22 may estimate the chord string information based on the time-series data including the genre information. In this way, chord estimation suitable for the genre of the music is performed.
  • the input time-series data A may include key information that specifies the key of the music represented by the reference note string.
  • the time-series data may also include key information that specifies the key of music represented by a string of notes.
  • the estimation unit 22 may estimate the chord string information based on the time-series data including the key information. In this way, chord estimation suitable for the key of the music is performed.
  • the input time-series data A may include difficulty level information specifying the difficulty level of the musical score indicated by the reference note string.
  • the time-series data may also include difficulty level information that designates the difficulty level of the musical score indicated by the note string.
  • the estimation unit 22 may estimate the chord string information based on the time-series data including the difficulty level information. As a result, appropriate chord estimation is performed according to the difficulty level of the musical score indicated by the note string.
  • the chord estimation device 20 may further include the generation unit 23 that generates musical score information indicating a chorded musical score in which the chord string information is added so as to correspond to each note of the note string.
  • the training apparatus 10 includes the first acquisition unit 11 that acquires input time-series data A including a reference note string composed of a plurality of notes, the second acquisition unit 12 that acquires output chord string information B indicating a chord string corresponding to the reference note string, and the construction unit 13 that constructs a trained model M that has learned the input/output relationship between the input time-series data A and the output chord string information B.
  • with this configuration, a trained model M that has learned the input/output relationship between the input time-series data A and the output chord string information B can be easily constructed.
  • the input time-series data A includes additional information, and the time-series data includes additional information, but the embodiment is not limited to this.
  • the input time-series data A only needs to include the reference note string, and does not have to include additional information.
  • the time-series data may include musical note sequences and may not include additional information.
  • the input time-series data A has "bar” and "beat” information as the metrical structure, but the embodiment is not limited to this.
  • the input time-series data A may not have a metrical structure.
  • FIG. 8 is a diagram showing an example of output chord string information B prepared for input time-series data A having no metrical structure. As shown in FIG. 8, this output chord string information B does not have a metrical structure consisting of "bar" and "beat" information.
  • the construction unit 13 may construct different trained models M according to the type of additional information, or may construct one trained model M.
  • the input time-series data A may include, as additional information, a plurality of information out of key information, genre information, and difficulty level information.
  • in the above embodiment, the chord estimation device 20 includes the generation unit 23, but the embodiment is not limited to this.
  • the performer can create a musical score with chords by transcribing the chord string information estimated by the estimation unit 22 onto a desired musical score. Therefore, the chord estimation device 20 does not have to include the generation unit 23.
  • in the above embodiment, the trained model M is trained using the training data D so as to estimate chord string information for a piano performance, but the embodiment is not limited to this.
  • the trained model M may be trained so as to estimate chord string information for a performance with another musical instrument, such as a guitar or drums.
  • the user of the chord estimation device 20 is a performer.
  • the machine learning by the training device 10 may be performed in advance by the staff of the musical score production company.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

In the present invention, a chord estimation device comprises: a reception unit for receiving time-series data including a note string composed of a plurality of notes; and an estimation unit for estimating, using a trained model and on the basis of the time-series data, chord string information indicating a chord string corresponding to the note string.

Description

Chord estimation device, training device, chord estimation method, and training method

 The present invention relates to a chord estimation device and method for estimating chords for playing a musical instrument, and to a training device and method for constructing the chord estimation device.

 There are musical scores with chords added. Performers can enjoy playing instruments such as the piano or guitar by playing the chords. When creating a musical score with chords, the producer assigns chords based on the melody and accompaniment indicated by the notes. Assigning chords requires musical knowledge and sense. Patent Document 1 below discloses a chord progression estimation device that estimates chords from performance information or acoustic signals.

 [Patent Document 1] Japanese Patent No. 6151121

 In Patent Document 1, a chord is estimated for each specific section. For example, one chord is estimated per bar. If chord estimation with a higher degree of freedom could be performed from given notes, the production of chorded musical scores could be supported more appropriately.

 An object of the present invention is to perform chord estimation with a high degree of freedom based on a note string.

 A chord estimation device according to one aspect of the present invention includes a reception unit that receives time-series data including a note string composed of a plurality of notes, and an estimation unit that uses a trained model to estimate, based on the time-series data, chord string information indicating a chord string corresponding to the note string.

 A training device according to another aspect of the present invention includes a first acquisition unit that acquires input time-series data including a reference note string composed of a plurality of notes, a second acquisition unit that acquires output chord string information indicating a chord string corresponding to the reference note string, and a construction unit that constructs a trained model that has learned the input/output relationship between the input time-series data and the output chord string information.

 A chord estimation method according to yet another aspect of the present invention is executed by a computer, and includes receiving time-series data including a note string composed of a plurality of notes, and estimating, using a trained model and based on the time-series data, chord string information indicating a chord string corresponding to the note string.

 A training method according to yet another aspect of the present invention is executed by a computer, and includes acquiring input time-series data including a reference note string composed of a plurality of notes, acquiring output chord string information indicating a chord string corresponding to the reference note string, and constructing a trained model that has learned the input/output relationship between the input time-series data and the output chord string information.

 According to the present invention, chord estimation with a high degree of freedom can be performed based on a note string.
図1は本発明の一実施の形態に係るコード推定装置および訓練装置を含む処理システムの構成を示すブロック図である。FIG. 1 is a block diagram showing the configuration of a processing system including a chord estimation device and a training device according to one embodiment of the present invention.
図2は訓練データに含まれる入力時系列データの一例を示す図である。FIG. 2 is a diagram showing an example of input time-series data included in training data.
図3は訓練データに含まれる出力コード列情報の一例を示す図である。FIG. 3 is a diagram showing an example of output chord string information included in training data.
図4は訓練装置およびコード推定装置の構成を示すブロック図である。FIG. 4 is a block diagram showing the configuration of the training device and the chord estimation device.
図5は表示部に表示されるアレンジ楽譜の一例を示す。FIG. 5 shows an example of an arranged musical score displayed on the display unit.
図6は訓練処理の一例を示すフローチャートである。FIG. 6 is a flowchart showing an example of the training process.
図7はコード推定処理の一例を示すフローチャートである。FIG. 7 is a flowchart showing an example of the chord estimation process.
図8は訓練データに含まれる出力コード列情報の変形例を示す図である。FIG. 8 is a diagram showing a modified example of output chord string information included in training data.
 (1)処理システムの構成
 以下、本発明の実施の形態に係るコード推定装置、訓練装置、コード推定方法および訓練方法について図面を用いて詳細に説明する。図1は、本発明の一実施の形態に係るコード推定装置および訓練装置を含む処理システムの構成を示すブロック図である。図1に示すように、処理システム100は、RAM(ランダムアクセスメモリ)110、ROM(リードオンリメモリ)120、CPU(中央演算処理装置)130、記憶部140、操作部150および表示部160を備える。
(1) Configuration of Processing System Hereinafter, a chord estimation device, a training device, a chord estimation method, and a training method according to embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a processing system including a chord estimation device and a training device according to one embodiment of the present invention. As shown in FIG. 1, the processing system 100 includes a RAM (Random Access Memory) 110, a ROM (Read Only Memory) 120, a CPU (Central Processing Unit) 130, a storage unit 140, an operation unit 150, and a display unit 160.
 処理システム100は、パーソナルコンピュータ、タブレット端末またはスマートフォン等のコンピュータにより実現される。あるいは、処理システム100は、イーサネット等の通信路により接続された複数のコンピュータの共同動作により実現されてもよいし、電子ピアノ等の演奏機能を備えた電子楽器により実現されてもよい。 The processing system 100 is implemented by a computer such as a personal computer, tablet terminal, or smart phone. Alternatively, the processing system 100 may be realized by cooperative operation of a plurality of computers connected by a communication path such as Ethernet, or may be realized by an electronic musical instrument such as an electronic piano having performance functions.
 RAM110、ROM120、CPU130、記憶部140、操作部150および表示部160は、バス170に接続される。RAM110、ROM120およびCPU130により訓練装置10およびコード推定装置20が構成される。本実施の形態では、訓練装置10とコード推定装置20とは共通の処理システム100により構成されるが、別個の処理システムにより構成されてもよい。 The RAM 110, ROM 120, CPU 130, storage unit 140, operation unit 150, and display unit 160 are connected to the bus 170. The RAM 110, ROM 120, and CPU 130 constitute the training device 10 and the chord estimation device 20. Although the training device 10 and the chord estimation device 20 are configured by the common processing system 100 in the present embodiment, they may be configured by separate processing systems.
 RAM110は、例えば揮発性メモリからなり、CPU130の作業領域として用いられる。ROM120は、例えば不揮発性メモリからなり、訓練プログラムおよびコード推定プログラムを記憶する。CPU130は、ROM120に記憶された訓練プログラムをRAM110上で実行することにより訓練処理を行う。また、CPU130は、ROM120に記憶されたコード推定プログラムをRAM110上で実行することによりコード推定処理を行う。訓練処理およびコード推定処理の詳細については後述する。 The RAM 110 is, for example, a volatile memory and is used as a work area for the CPU 130. The ROM 120 is, for example, a non-volatile memory and stores a training program and a chord estimation program. The CPU 130 performs the training process by executing the training program stored in the ROM 120 on the RAM 110. The CPU 130 also performs the chord estimation process by executing the chord estimation program stored in the ROM 120 on the RAM 110. Details of the training process and the chord estimation process will be described later.
 訓練プログラムまたはコード推定プログラムは、ROM120ではなく記憶部140に記憶されてもよい。あるいは、訓練プログラムまたはコード推定プログラムは、コンピュータが読み取り可能な記憶媒体に記憶された形態で提供され、ROM120または記憶部140にインストールされてもよい。あるいは、処理システム100がインターネット等のネットワークに接続されている場合には、当該ネットワーク上のサーバ(クラウドサーバを含む。)から配信された訓練プログラムまたはコード推定プログラムがROM120または記憶部140にインストールされてもよい。 The training program or the chord estimation program may be stored in the storage unit 140 instead of the ROM 120. Alternatively, the training program or the chord estimation program may be provided in a form stored in a computer-readable storage medium and installed in the ROM 120 or the storage unit 140. Alternatively, when the processing system 100 is connected to a network such as the Internet, a training program or chord estimation program distributed from a server (including a cloud server) on the network may be installed in the ROM 120 or the storage unit 140.
 記憶部140は、ハードディスク、光学ディスク、磁気ディスクまたはメモリカード等の記憶媒体を含み、訓練済モデルMおよび複数の訓練データDを記憶する。訓練済モデルMまたは各訓練データDは、記憶部140に記憶されず、コンピュータが読み取り可能な記憶媒体に記憶されていてもよい。あるいは、処理システム100がネットワークに接続されている場合には、訓練済モデルMまたは各訓練データDは、当該ネットワーク上のサーバに記憶されていてもよい。 The storage unit 140 includes a storage medium such as a hard disk, an optical disk, a magnetic disk, or a memory card, and stores a trained model M and a plurality of training data D. The trained model M or each piece of training data D may not be stored in the storage unit 140, but may be stored in a computer-readable storage medium. Alternatively, if the processing system 100 is connected to a network, the trained model M or respective training data D may be stored on a server on that network.
 (2)訓練データ
 訓練済モデルMは、コード推定装置20の使用者(以下、演奏者と呼ぶ。)が楽曲を演奏するときに参照するコード列を提示するために訓練された機械学習モデルである。訓練済モデルMは、複数の訓練データDを用いて構築される。訓練装置10の使用者は、操作部150を操作することにより、訓練データDを生成することができる。訓練データDは、参照演奏者の音楽的知識または音楽的センス等に基づいて作成されたデータである。参照演奏者は、楽曲の演奏に関して比較的高い技量を有する。参照演奏者は、楽曲の演奏における演奏者の指導者または師であってもよい。
(2) Training Data The trained model M is a machine learning model that has been trained to present a chord string to be referred to when the user of the chord estimation device 20 (hereinafter referred to as a performer) plays a piece of music. The trained model M is constructed using a plurality of training data D. A user of the training device 10 can generate the training data D by operating the operation unit 150. The training data D is data created based on the musical knowledge or musical sense of a reference performer. The reference performer has a relatively high level of skill in playing the piece of music. The reference performer may be the performer's instructor or teacher in the performance of musical pieces.
 訓練データDは、入力時系列データと出力コード列情報との組を示す。入力時系列データは、複数の音符からなる参照音符列を示す。例えば、入力時系列データは、複数の音符によってメロディや伴奏音を構成するデータである。入力時系列データは楽譜の画像を示す画像データであってもよい。出力コード列情報は、参照音符列に対応するコードが時系列に配置されたデータである。参照音符列に対応するコード列は、参照演奏者により付与される。 The training data D indicates a set of input time-series data and output chord string information. The input time-series data indicates a reference note string consisting of a plurality of notes. For example, the input time-series data is data in which a plurality of notes form a melody or accompaniment. The input time-series data may be image data representing an image of a musical score. The output chord string information is data in which chords corresponding to the reference note string are arranged in time series. The chord string corresponding to the reference note string is assigned by the reference performer.
 図2および図3は、各訓練データDの一例を示す図である。図2の例は、複数の音符からなる参照音符列を含む入力時系列データを示す。図3の例は、参照音符列に対応するコード列を示す出力コード列情報を示す。 FIGS. 2 and 3 are diagrams showing an example of each training data D. The example in FIG. 2 shows input time-series data including a reference note string consisting of a plurality of notes. The example in FIG. 3 shows output chord string information indicating a chord string corresponding to the reference note string.
 本実施の形態においては、入力時系列データは、参照音符列に加えて、拍節構造および付加情報を有する。図2に示す入力時系列データAは、曲の先頭の2小節分のデータを抜粋したデータである。入力時系列データAは、“bar”によって小節が区切られ、“beat”によって拍が区切られている。このように、入力時系列データAは、“bar”および“beat”情報により拍節構造を備える。要素A1~A37は、最初の1小節の参照音符列を示す。つまり、要素A1~A37は、要素A1の前の“bar”と要素A37の後の“bar”によって小節に区切られている。また、要素A8、A18、A26の後の“beat”によって拍に区切られている。 In this embodiment, the input time-series data has a metrical structure and additional information in addition to the reference note string. The input time-series data A shown in FIG. 2 is data obtained by extracting data for the first two bars of a song. In the input time-series data A, bars are separated by "bar", and beats are separated by "beat". In this way, the input time-series data A has a metrical structure with the "bar" and "beat" information. Elements A1 to A37 indicate the reference note string of the first bar. That is, the elements A1 to A37 are separated into bars by the "bar" before the element A1 and the "bar" after the element A37. In addition, it is divided into beats by "beat" after elements A8, A18, and A26.
 要素A0は、付加情報である。付加情報としては、例えば、調情報、ジャンル情報、難易度情報などが利用される。図2の例では、Key要素により調情報が付加されている。調情報は、参照音符列で表現される音楽の調を指定する情報である。Keyに続く数値は調を指定する数値である。付加情報として調情報が指定されることにより、参照音符列および調に応じたコード列が機械学習される。ジャンル情報は、参照音符列で表現される音楽のジャンルを指定する情報である。ジャンル情報としては、例えば、ロック、ポップス、ジャズなどのジャンルが指定される。付加情報としてジャンル情報が指定されることにより、参照音符列およびジャンルに応じたコード列が機械学習される。難易度情報は、参照音符列で示される楽譜の難易度を示す情報である。付加情報として難易度情報が指定されることにより、参照音符列および楽譜の難易度に応じたコード列が機械学習される。例えば、低難易度の楽譜であれば少ない音数から音符の補間を行いつつ機械学習が行われる。また、高難易度の楽譜であれば過剰な音数の中からコードを構成する音符を選択しつつ機械学習が行われる。 The element A0 is additional information. As the additional information, for example, key information, genre information, or difficulty information is used. In the example of FIG. 2, key information is added by the Key element. The key information is information specifying the key of the music represented by the reference note string; the numerical value following Key designates the key. When key information is specified as additional information, a chord string corresponding to the reference note string and the key is machine-learned. The genre information is information that designates the genre of the music represented by the reference note string, for example, rock, pop, or jazz. When genre information is specified as additional information, a chord string corresponding to the reference note string and the genre is machine-learned. The difficulty information is information indicating the difficulty level of the musical score indicated by the reference note string. When difficulty information is specified as additional information, a chord string corresponding to the reference note string and the difficulty level of the score is machine-learned. For example, for a low-difficulty score, machine learning is performed while interpolating notes from a small number of tones, and for a high-difficulty score, machine learning is performed while selecting the notes that constitute chords from an excessive number of tones.
入力時系列データAの要素のうち、要素A0、“bar”および“beat”以外の要素は、参照音符列に対応する。要素A1~A37は、1小節目の参照音符列を示す。本例では、要素A0は入力時系列データAにおける先頭、すなわち参照音符列(要素A1~A37)の前に配置されるが、入力時系列データAにおける任意の位置に配置されてもよい。 Among the elements of the input time-series data A, elements other than the element A0, "bar" and "beat" correspond to the reference note string. Elements A1 to A37 indicate the reference note string of the first measure. In this example, the element A0 is placed at the beginning of the input time-series data A, that is, before the reference note string (elements A1 to A37), but it may be placed at any position in the input time-series data A.
 要素A1~A37に例示するように、参照音符列において、“L”は左手を意味し、“R”は右手を意味し、“L”または“R”に続く数字は音階を意味する。また、“on”および“off”はそれぞれ押鍵および離鍵を意味する。また、“wait”は待機を意味し、“wait”に続く数字は時間の長さを意味する。したがって、要素A1~A5は、右手で音階77および音階74の鍵を押すと同時に、左手で音階53と音階46の鍵を同時に押した後、11単位時間だけ維持することを示す。そして、11単位時間だけ維持した後、要素A6~A8は、左手の音階53と音階46の鍵を同時に離した後、1単位時間だけ維持することを示す。そして、1単位時間だけ維持した後、要素A9~要素A11は、左手で音階53と音階46を再び押した後、5単位時間だけ待機することを示す。 As exemplified by elements A1 to A37, in the reference note string, "L" means the left hand, "R" means the right hand, and the number following "L" or "R" indicates a pitch. "on" and "off" mean key depression and key release, respectively, and "wait" means waiting, the number following "wait" indicating a length of time. Thus, elements A1 to A5 indicate that the keys of pitches 77 and 74 are pressed with the right hand and, at the same time, the keys of pitches 53 and 46 are pressed with the left hand, after which this state is held for 11 time units. Elements A6 to A8 then indicate that the left-hand keys of pitches 53 and 46 are released simultaneously and the state is held for 1 time unit. Elements A9 to A11 then indicate that pitches 53 and 46 are pressed again with the left hand, followed by a wait of 5 time units.
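As an illustration of how the element encoding described above can be interpreted, the following sketch parses a flat element list of this form into timed note events. The parser, its function name, and the exact token spellings such as "L53" are illustrative assumptions and are not part of this description:

```python
def parse_tokens(tokens):
    """Convert a flat element list ("bar"/"beat"/"wait n"/"L53 on" ...)
    into a list of (time, hand, pitch, on_or_off) note events."""
    events = []
    time = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("bar", "beat"):
            i += 1                       # metrical markers carry no duration
        elif tok == "wait":
            time += int(tokens[i + 1])   # advance time by the stated units
            i += 2
        else:                            # e.g. "L53" followed by "on"/"off"
            hand, pitch = tok[0], int(tok[1:])
            events.append((time, hand, pitch, tokens[i + 1]))
            i += 2
    return events

# The opening of the first bar in the example: press four keys, hold 11 units,
# release the two left-hand keys, hold 1 unit.
events = parse_tokens(
    ["bar", "R77", "on", "R74", "on", "L53", "on", "L46", "on",
     "wait", "11", "L53", "off", "L46", "off", "wait", "1"])
```

Under this reading, the four "on" events share time 0 and the two "off" events occur 11 time units later, matching the description of elements A1 to A8.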
 図3に示す出力コード列情報Bは、入力時系列データAに含まれる参照音符列に対応するコード列を示す。入力時系列データAの要素A1~A37に対応するコード列は、要素B1~B3および要素B4~B6で表されている。つまり、要素B1~B6は、入力時系列データAの1小節目に対応するコード列を示す。出力コード列情報Bにおいても、“bar”によって小節が区切られ、“beat”によって拍が区切られている。要素B1の前の“bar”および要素B6の後の“bar”によって区切られた範囲が1小節目に対応している。 The output chord string information B shown in FIG. 3 indicates a chord string corresponding to the reference note string included in the input time-series data A. The chord string corresponding to elements A1 to A37 of the input time-series data A is represented by elements B1 to B3 and elements B4 to B6. That is, elements B1 to B6 indicate the chord string corresponding to the first bar of the input time-series data A. In the output chord string information B as well, bars are separated by "bar" and beats by "beat". The range delimited by the "bar" before element B1 and the "bar" after element B6 corresponds to the first bar.
 出力コード列情報Bにおいて、1つのコードは3つの要素で示される。要素B1~B3において1小節目の1拍目のコードが規定される。要素B4~B6において1小節目の4拍目のコードが規定される。要素B7~B9において4小節目の1拍目のコードが規定される。コードを示す3つの要素のうち、1番目の要素(B1,B4,B7)は、基本コード情報を示す。基本コード情報(chord)は、12音(C,C#,D,D#,・・・A,A#,B)それぞれについてメジャーコードおよびマイナーコードの種別を指定する1~24の数値を示す。コードを示す3つの要素のうち、2番目の要素(B2,B5,B8)は、コードタイプ情報を示す。コードタイプ情報(type)は、テンションコードの種別を指定する数値を示す。コードを示す3つの要素のうち、3番目の要素(B3,B6,B9)は、コードルート情報を示す。コードルート情報(root)は、オンコードのルート音を指定する数値を示す。 In the output chord string information B, one chord is indicated by three elements. Elements B1 to B3 define the chord on the first beat of the first bar, elements B4 to B6 the chord on the fourth beat of the first bar, and elements B7 to B9 the chord on the first beat of the fourth bar. Of the three elements representing a chord, the first element (B1, B4, B7) indicates basic chord information. The basic chord information (chord) is a numerical value from 1 to 24 that designates a major or minor chord for each of the 12 tones (C, C#, D, D#, ... A, A#, B). The second element (B2, B5, B8) indicates chord type information; the chord type information (type) is a numerical value designating the type of tension chord. The third element (B3, B6, B9) indicates chord root information; the chord root information (root) is a numerical value designating the root note of an on-chord (slash chord).
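A decoder for these (chord, type, root) triplets might look as follows. The exact numeric assignments are not given in this description; the sketch assumes, purely for illustration, that chord values 1 to 12 denote the major chords C through B, values 13 to 24 the corresponding minor chords, and a couple of arbitrary tension-type codes:

```python
PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def decode_chord(chord, type_=0, root=0):
    """Turn a (chord, type, root) triplet into a chord name string.
    The numeric assignments are assumed, not taken from the description."""
    quality = "" if chord <= 12 else "m"        # assumed: 1-12 major, 13-24 minor
    name = PITCHES[(chord - 1) % 12] + quality
    if type_:                                   # tension chord type (assumed codes)
        name += {1: "7", 2: "maj7"}.get(type_, "(type%d)" % type_)
    if root:                                    # on-chord: append the bass note
        name += "/" + PITCHES[root - 1]
    return name
```

Under these assumptions, `decode_chord(1)` yields "C", `decode_chord(13)` yields "Cm", and `decode_chord(8, root=5)` yields the on-chord "G/E".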
 (3)訓練装置およびコード推定装置
 図4は、訓練装置10およびコード推定装置20の構成を示すブロック図である。図4に示すように、訓練装置10は、機能部として、第1の取得部11、第2の取得部12および構築部13を含む。図1のCPU130が訓練プログラムを実行することにより、訓練装置10の機能部が実現される。訓練装置10の機能部の少なくとも一部は、電子回路等のハードウエアにより実現されてもよい。
(3) Training Device and Chord Estimation Device FIG. 4 is a block diagram showing the configuration of the training device 10 and the chord estimation device 20. As shown in FIG. 4, the training device 10 includes a first acquisition unit 11, a second acquisition unit 12, and a construction unit 13 as functional units. The functional units of the training device 10 are implemented by the CPU 130 of FIG. 1 executing the training program. At least some of the functional units of the training device 10 may be realized by hardware such as electronic circuits.
 第1の取得部11は、記憶部140等に記憶された各訓練データDから入力時系列データAを取得する。第2の取得部12は、各訓練データDから出力コード列情報Bを取得する。構築部13は、各訓練データDについて、第1の取得部11により取得された入力時系列データAを入力要素とし、第2の取得部12により取得された出力コード列情報Bを出力要素とする機械学習を行う。複数の訓練データDについて機械学習を繰り返すことにより、構築部13は、入力時系列データAと出力コード列情報Bとの間の入出力関係を示す訓練済モデルMを構築する。 The first acquisition unit 11 acquires the input time-series data A from each training data D stored in the storage unit 140 or the like. The second acquisition unit 12 acquires the output chord string information B from each training data D. For each training data D, the construction unit 13 performs machine learning using the input time-series data A acquired by the first acquisition unit 11 as an input element and the output chord string information B acquired by the second acquisition unit 12 as an output element. By repeating machine learning for the plurality of training data D, the construction unit 13 constructs a trained model M representing the input/output relationship between the input time-series data A and the output chord string information B.
 本例では、構築部13はTransformerを訓練することにより訓練済モデルMを構築するが、実施の形態はこれに限定されない。構築部13は、時系列を扱う他の方式の機械学習モデルを訓練することにより訓練済モデルMを構築してもよい。構築部13により構築された訓練済モデルMは、例えば記憶部140に記憶される。構築部13により構築された訓練済モデルMは、ネットワーク上のサーバ等に記憶されてもよい。 In this example, the construction unit 13 constructs the trained model M by training a Transformer, but the embodiment is not limited to this. The construction unit 13 may construct the trained model M by training another type of machine learning model that handles time series. The trained model M constructed by the construction unit 13 is stored, for example, in the storage unit 140, or may be stored on a server or the like on a network.
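A sequence model such as a Transformer consumes integer token IDs rather than the textual elements shown in FIGS. 2 and 3, so a training pipeline of this kind implicitly needs a vocabulary mapping. The following is a minimal illustrative sketch; the special tokens and function names are assumptions, not part of this description:

```python
def build_vocab(sequences):
    """Assign an integer ID to every distinct element across the sequences."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(seq, vocab):
    """Wrap a sequence in begin/end markers and map it to integer IDs."""
    return [vocab["<bos>"]] + [vocab[t] for t in seq] + [vocab["<eos>"]]

# A fragment of input time-series data A in the element notation of FIG. 2.
src = ["bar", "Key", "3", "R77", "on", "wait", "11"]
vocab = build_vocab([src])
ids = encode(src, vocab)
```

The same vocabulary scheme would be applied separately to the output chord string elements ("chord", "type", "root", numbers) so that both sides of the input/output relationship are integer sequences.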
 コード推定装置20は、機能部として、受付部21、推定部22および生成部23を含む。図1のCPU130がコード推定プログラムを実行することにより、コード推定装置20の機能部が実現される。コード推定装置20の機能部の少なくとも一部は、電子回路等のハードウエアにより実現されてもよい。 The code estimation device 20 includes a reception unit 21, an estimation unit 22, and a generation unit 23 as functional units. The functional units of the code estimation device 20 are implemented by the CPU 130 of FIG. 1 executing the code estimation program. At least part of the functional units of the code estimation device 20 may be realized by hardware such as an electronic circuit.
 本実施の形態では、受付部21は、複数の音符からなる音符列を含む時系列データを受け付ける。演奏者は、楽譜の画像を示す画像データを時系列データとして受付部21に与えることができる。あるいは、演奏者は、操作部150を操作することにより時系列データを生成し、受付部21に与えることができる。本例では、時系列データは、図2の入力時系列データAと同様の構成を有する。つまり、時系列データは、音符列に加えて、拍節構造および付加情報を有する。 In the present embodiment, the reception unit 21 receives time-series data including a string of notes made up of a plurality of notes. The performer can give image data representing an image of the musical score to the reception unit 21 as time-series data. Alternatively, the performer can generate time-series data by operating the operation unit 150 and provide it to the reception unit 21 . In this example, the time-series data has the same configuration as the input time-series data A in FIG. In other words, time-series data has a metrical structure and additional information in addition to a string of musical notes.
 推定部22は、記憶部140等に記憶された訓練済モデルMを用いてコード列情報を推定する。コード列情報は、受付部21により受け付けられた音符列に対応するコード列を示し、音符列および付加情報に基づいて推定される。時系列データが、入力時系列データAと同様の構成を有することにより、コード列情報は出力コード列情報Bと同様の構成を有する。生成部23は、受付部21により受け付けられた時系列データの音符列と、推定部22により推定されたコード列情報とに基づいて楽譜情報を生成する。例えば、楽譜情報は、ピアノのアレンジ楽譜の情報であり、五線譜の上にコード情報が付記されたデータである。あるいは、楽譜情報は、コード列情報が付加されたMIDIデータである。 The estimation unit 22 estimates chord string information using the trained model M stored in the storage unit 140 or the like. The chord string information indicates a chord string corresponding to the note string received by the reception unit 21, and is estimated based on the note string and the additional information. Because the time-series data has the same configuration as the input time-series data A, the chord string information has the same configuration as the output chord string information B. The generation unit 23 generates musical score information based on the note string of the time-series data received by the reception unit 21 and the chord string information estimated by the estimation unit 22. For example, the musical score information is information on an arranged piano score, that is, data in which chord information is added above the staff notation. Alternatively, the musical score information is MIDI data to which the chord string information is added.
 表示部160には、生成部23により生成された楽譜情報に基づいてコード付き楽譜が表示される。図5は、表示部160に表示されるコード付き楽譜の一例を示す。図5に示すように、コード付き楽譜には、推定部22により推定されたコード列情報が受付部21により受け付けられた音符列の各音符に対応するように示される。 The display unit 160 displays a musical score with chords based on the musical score information generated by the generation unit 23. FIG. 5 shows an example of a musical score with chords displayed on the display unit 160. As shown in FIG. 5, in the musical score with chords, the chord string information estimated by the estimation unit 22 is shown so as to correspond to each note of the note string received by the reception unit 21.
 (4)訓練処理およびコード推定処理
 図6は、図4の訓練装置10による訓練処理の一例を示すフローチャートである。図6の訓練処理は、図1のCPU130が訓練プログラムを実行することにより行われる。まず、第1の取得部11は、各訓練データDから入力時系列データAを取得する(ステップS1)。また、第2の取得部12は、各訓練データDから出力コード列情報Bを取得する(ステップS2)。ステップS1,S2は、いずれが先に実行されてもよいし、同時に実行されてもよい。
(4) Training Processing and Chord Estimation Processing FIG. 6 is a flowchart showing an example of the training process performed by the training device 10 of FIG. 4. The training process in FIG. 6 is performed by the CPU 130 of FIG. 1 executing the training program. First, the first acquisition unit 11 acquires the input time-series data A from each training data D (step S1). The second acquisition unit 12 acquires the output chord string information B from each training data D (step S2). Steps S1 and S2 may be performed in either order, or simultaneously.
 次に、構築部13は、各訓練データDについて、ステップS1で取得された入力時系列データAを入力要素とし、ステップS2で取得された出力コード列情報Bを出力要素とする機械学習を行う(ステップS3)。続いて、構築部13は、十分な機械学習が実行されたか否かを判定する(ステップS4)。機械学習が不十分な場合、構築部13はステップS3に戻る。十分な機械学習が実行されるまで、パラメータが変化されつつステップS3,S4が繰り返される。機械学習の繰り返し回数は、構築される訓練済モデルMが満たすべき品質条件に応じて変化する。 Next, for each training data D, the construction unit 13 performs machine learning using the input time-series data A acquired in step S1 as an input element and the output chord string information B acquired in step S2 as an output element (step S3). Subsequently, the construction unit 13 determines whether or not sufficient machine learning has been performed (step S4). If the machine learning is insufficient, the construction unit 13 returns to step S3. Steps S3 and S4 are repeated, with the parameters being updated, until sufficient machine learning has been performed. The number of iterations of machine learning varies according to the quality conditions that the constructed trained model M should satisfy.
 十分な機械学習が実行された場合、構築部13は、ステップS3の機械学習により習得した入力時系列データAと出力コード列情報Bとの間の入出力関係を訓練済モデルMとして保存する(ステップS5)。これにより、訓練処理が終了する。 When sufficient machine learning has been performed, the construction unit 13 saves, as the trained model M, the input/output relationship between the input time-series data A and the output chord string information B learned through the machine learning in step S3 (step S5). This completes the training process.
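Steps S3 to S5 can be summarized schematically as the following loop. `train_step` and `quality_ok` are hypothetical stand-ins for one parameter update and for the sufficiency check of step S4; neither is specified in this description:

```python
def train(model, data, train_step, quality_ok, max_iters=1000):
    """Repeat machine learning (S3) until the quality condition holds (S4),
    then return the model to be saved as trained model M (S5)."""
    for _ in range(max_iters):
        for A, B in data:              # input time-series data / output chord info
            model = train_step(model, A, B)
        if quality_ok(model):
            break
    return model

# Toy illustration only: "training" counts updates until a threshold is reached.
trained = train(
    model=0,
    data=[("A1", "B1"), ("A2", "B2")],
    train_step=lambda m, A, B: m + 1,
    quality_ok=lambda m: m >= 6,
)
```

In a real pipeline `train_step` would be a gradient update of the Transformer's parameters and `quality_ok` a validation-loss check corresponding to the quality condition mentioned above.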
 図7は、図4のコード推定装置20によるコード推定処理の一例を示すフローチャートである。図7のコード推定処理は、図1のCPU130がコード推定プログラムを実行することにより行われる。まず、受付部21は、時系列データを受け付ける(ステップS11)。次に、推定部22は、訓練処理のステップS5で保存された訓練済モデルMを用いて、ステップS11で受け付けられた時系列データからコード列情報を推定する(ステップS12)。このとき、時系列データに含まれる音符列からは1つ、または複数のコード列を含むコード列情報が推定されるので、自由度の高いコード推定が行われる。また、時間的流れの中でコードチェンジのタイミングも推定されるので、より適切なコード推定が行われる。つまり、時系列データにはコードチェンジの区切りとなる情報は含まれていないが、推定部22は、コードチェンジのタイミングを含めたコード推定を行う。 FIG. 7 is a flowchart showing an example of the chord estimation process performed by the chord estimation device 20 of FIG. 4. The chord estimation process in FIG. 7 is performed by the CPU 130 of FIG. 1 executing the chord estimation program. First, the reception unit 21 receives time-series data (step S11). Next, the estimation unit 22 estimates chord string information from the time-series data received in step S11, using the trained model M saved in step S5 of the training process (step S12). At this time, chord string information including one or more chord strings is estimated from the note string included in the time-series data, so chord estimation with a high degree of freedom is performed. Moreover, because the timing of chord changes within the temporal flow is also estimated, more appropriate chord estimation is performed. That is, although the time-series data contains no information delimiting chord changes, the estimation unit 22 performs chord estimation including the timing of chord changes.
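The estimation in step S12 amounts to decoding an output token sequence from the trained model. Below is a hedged sketch of greedy autoregressive decoding, where `model.next_token(src, prefix)` is a hypothetical interface standing in for the trained model M; the stub model exists only to demonstrate the loop:

```python
def estimate_chords(model, src_tokens, max_len=64, eos="<eos>"):
    """Generate chord-string tokens one at a time until <eos> or max_len."""
    out = []
    while len(out) < max_len:
        tok = model.next_token(src_tokens, out)  # most likely next token
        if tok == eos:
            break
        out.append(tok)
    return out

class StubModel:
    """Stand-in that replays a fixed answer, for demonstration only."""
    def __init__(self, answer):
        self.answer = list(answer)
    def next_token(self, src, prefix):
        return self.answer[len(prefix)] if len(prefix) < len(self.answer) else "<eos>"

chords = estimate_chords(
    StubModel(["bar", "chord", "8", "type", "0", "root", "0"]),
    src_tokens=["bar", "R77", "on"])
```

Because the model emits the "bar"/"beat" markers and chord triplets itself, the positions at which new triplets appear in the output are exactly the chord-change timings mentioned above.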
 その後、生成部23は、ステップS11で受け付けられた時系列データの音符列およびステップS12で推定されたコード列情報に基づいて楽譜情報を生成する(ステップS13)。生成された楽譜情報に基づいて、コード付き楽譜が表示部160に表示されてもよい。これにより、コード推定処理が終了する。 After that, the generation unit 23 generates musical score information based on the note string of the time-series data received in step S11 and the chord string information estimated in step S12 (step S13). A musical score with chords may be displayed on the display unit 160 based on the generated musical score information. This completes the chord estimation process.
 (5)実施の形態の効果
 以上説明したように、本実施の形態に係るコード推定装置20は、複数の音符からなる音符列を含む時系列データを受け付ける受付部21と、訓練済モデルMを用いて、音符列に対応するコード列を示すコード列情報を推定する推定部22とを備える。この構成によれば、訓練済モデルMを用いて、時系列データにおける複数の音符の時間的流れから適切なコード列情報が推定される。これにより、音符列を含む時系列データに基づいてコード付き楽譜を提示することができる。音符列からは1つ、または複数のコード列が推定されるので、自由度の高いコード推定が行われる。
(5) Effects of the Embodiment As described above, the chord estimation device 20 according to the present embodiment includes the reception unit 21 that receives time-series data including a note string composed of a plurality of notes, and the estimation unit 22 that uses the trained model M to estimate chord string information indicating a chord string corresponding to the note string. With this configuration, appropriate chord string information is estimated from the temporal flow of the plurality of notes in the time-series data using the trained model M. This makes it possible to present a musical score with chords based on time-series data including a note string. Since one or more chord strings are estimated from the note string, chord estimation with a high degree of freedom is performed.
 訓練済モデルMは、複数の音符からなる参照音符列を含む入力時系列データAと、参照音符列の各音符に対応するコード列を示す出力コード列情報Bとの間の入出力関係を習得した機械学習モデルであってもよい。この場合、時系列データからコード列情報を容易に推定することができる。 The trained model M may be a machine learning model that has learned the input/output relationship between input time-series data A including a reference note string composed of a plurality of notes and output chord string information B indicating a chord string corresponding to each note of the reference note string. In this case, chord string information can be easily estimated from time-series data.
 推定部22は、コード列におけるコードチェンジのタイミングについても推定してもよい。これにより、音符列に対応した、より適切なコード推定が行われる。 The estimation unit 22 may also estimate the timing of chord changes in the chord string. As a result, more appropriate chord estimation corresponding to the note string is performed.
 入力時系列データAは、参照音符列で表現される音楽のジャンルを指定するジャンル情報を含んでもよい。また、時系列データは、音符列で表現される音楽のジャンルを指定するジャンル情報を含んでもよい。そして、推定部22は、ジャンル情報を含む時系列データに基づいて、コード列情報を推定してもよい。これにより、音楽のジャンルに適したコード推定が行われる。 The input time-series data A may include genre information specifying the genre of the music represented by the reference note string. The time-series data may likewise include genre information specifying the genre of the music represented by the note string. The estimation unit 22 may then estimate the chord string information based on the time-series data including the genre information. Thus, chord estimation suitable for the genre of the music is performed.
 入力時系列データAは、参照音符列で表現される音楽の調を指定する調情報を含んでもよい。また、時系列データは、音符列で表現される音楽の調を指定する調情報を含んでもよい。そして、推定部22は、調情報を含む時系列データに基づいて、コード列情報を推定してもよい。これにより、音楽の調に適したコード推定が行われる。 The input time-series data A may include key information specifying the key of the music represented by the reference note string. The time-series data may likewise include key information specifying the key of the music represented by the note string. The estimation unit 22 may then estimate the chord string information based on the time-series data including the key information. Thus, chord estimation suited to the key of the music is performed.
 入力時系列データAは、参照音符列で示される楽譜の難易度を指定する難易度情報を含んでもよい。また、時系列データは、音符列で示される楽譜の難易度を指定する難易度情報を含んでもよい。そして、推定部22は、難易度情報を含む時系列データに基づいて、コード列情報を推定してもよい。これにより、音符列で示される楽譜の難易度に応じて適切なコード推定が行われる。 The input time-series data A may include difficulty information specifying the difficulty level of the musical score indicated by the reference note string. The time-series data may likewise include difficulty information specifying the difficulty level of the musical score indicated by the note string. The estimation unit 22 may then estimate the chord string information based on the time-series data including the difficulty information. Thus, appropriate chord estimation is performed according to the difficulty level of the musical score indicated by the note string.
 コード推定装置20は、音符列の各音符に対応するようにコード列情報が付されたコード付き楽譜を示す楽譜情報を生成する生成部23をさらに備えてもよい。 The chord estimation device 20 may further include the generation unit 23 that generates musical score information indicating a musical score with chords, to which the chord string information is attached so as to correspond to each note of the note string.
 本実施の形態に係る訓練装置10は、複数の音符からなる参照音符列を含む入力時系列データAを取得する第1の取得部11と、参照音符列に対応するコード列を示す出力コード列情報Bを取得する第2の取得部12と、入力時系列データAと出力コード列情報Bとの間の入出力関係を習得した訓練済モデルMを構築する構築部13とを備える。この構成によれば、入力時系列データAと出力コード列情報Bとの間の入出力関係を習得した訓練済モデルMを容易に構築することができる。 The training device 10 according to the present embodiment includes the first acquisition unit 11 that acquires input time-series data A including a reference note string composed of a plurality of notes, the second acquisition unit 12 that acquires output chord string information B indicating a chord string corresponding to the reference note string, and the construction unit 13 that constructs a trained model M that has learned the input/output relationship between the input time-series data A and the output chord string information B. With this configuration, a trained model M that has learned the input/output relationship between the input time-series data A and the output chord string information B can be easily constructed.
 (6)他の実施の形態
 上記実施の形態において、入力時系列データAは付加情報を含み、時系列データは付加情報を含むが、実施の形態はこれに限定されない。入力時系列データAは、参照音符列を含めばよく、付加情報を含まなくてもよい。同様に、時系列データは、音符列を含めばよく、付加情報を含まなくてもよい。
(6) Other Embodiments In the above embodiment, the input time-series data A includes additional information, and so does the time-series data, but the embodiment is not limited to this. The input time-series data A only needs to include the reference note string and does not have to include additional information. Similarly, the time-series data only needs to include the note string and does not have to include additional information.
 上記実施の形態において、入力時系列データAは拍節構造として“bar”および“beat”情報を有するが、実施の形態はこれに限定されない。入力時系列データAは拍節構造を有していなくてもよい。図8は、拍節構造を有していない入力時系列データAに対応して準備された出力コード列情報Bの一例を示す図である。図8に示すように、出力コード列情報Bは、“bar”および“beat”情報からなる拍節構造を有していない。 In the above embodiment, the input time-series data A has "bar" and "beat" information as a metrical structure, but the embodiment is not limited to this. The input time-series data A does not have to have a metrical structure. FIG. 8 is a diagram showing an example of output chord string information B prepared for input time-series data A having no metrical structure. As shown in FIG. 8, this output chord string information B has no metrical structure consisting of "bar" and "beat" information.
 上記実施の形態において、入力時系列データAは付加情報として調情報、ジャンル情報および難易度情報を有する場合を例に説明した。構築部13は、付加情報の種類に応じて異なる訓練済モデルMを構築してもよいし、1つの訓練済モデルMを構築してもよい。あるいは、入力時系列データAは付加情報として、調情報、ジャンル情報および難易度情報のうち複数の情報を含めてもよい。 In the above embodiment, the case where the input time-series data A has key information, genre information, and difficulty information as additional information has been described as an example. The construction unit 13 may construct different trained models M according to the type of additional information, or may construct a single trained model M. Alternatively, the input time-series data A may include, as additional information, two or more of the key information, genre information, and difficulty information.
 Also, in the above embodiment, the chord estimation device 20 includes the generation unit 23, but the embodiments are not limited to this. A player can create a chord-annotated musical score by transcribing the chord string information estimated by the estimation unit 22 onto a desired score. Therefore, the chord estimation device 20 need not include the generation unit 23.
 In the above embodiment, the model is trained with the training data D to estimate chord string information for a performance on the piano, but the embodiments are not limited to this. The model may be trained with training data D to estimate chord string information for a performance on another instrument, such as a guitar or drums.
 In the above embodiment, the case where the user of the chord estimation device 20 is a performer has been described as an example, but the user of the chord estimation device 20 may be, for example, a staff member of a musical score production company. Also, the machine learning by the training device 10 may be performed in advance by staff of the musical score production company.

Claims (16)

  1. A chord estimation device comprising:
     a reception unit that receives time-series data including a note string made up of a plurality of notes; and
     an estimation unit that uses a trained model to estimate, based on the time-series data, chord string information indicating a chord string corresponding to the note string.
  2. The chord estimation device according to claim 1, wherein the trained model is a model that has learned an input/output relationship between input time-series data including a reference note string made up of a plurality of notes and output chord string information indicating a chord string corresponding to the reference note string.
  3. The chord estimation device according to claim 1 or 2, wherein the estimation unit also estimates timings of chord changes in the chord string.
  4. The chord estimation device according to claim 2, wherein
     the input time-series data includes genre information specifying a genre of music represented by the reference note string,
     the time-series data includes genre information specifying a genre of music represented by the note string, and
     the estimation unit estimates the chord string information based on the time-series data including the genre information.
  5. The chord estimation device according to claim 2, wherein
     the input time-series data includes key information specifying a key of music represented by the reference note string,
     the time-series data includes key information specifying a key of music represented by the note string, and
     the estimation unit estimates the chord string information based on the time-series data including the key information.
  6. The chord estimation device according to claim 2, wherein
     the input time-series data includes difficulty level information specifying a difficulty level of a musical score indicated by the reference note string,
     the time-series data includes difficulty level information specifying a difficulty level of a musical score indicated by the note string, and
     the estimation unit estimates the chord string information based on the time-series data including the difficulty level information.
  7. The chord estimation device according to any one of claims 1 to 6, further comprising a generation unit that generates score information indicating a chord-annotated musical score in which the chord string information is attached so as to correspond to each note of the note string.
  8. A training device comprising:
     a first acquisition unit that acquires input time-series data including a reference note string made up of a plurality of notes;
     a second acquisition unit that acquires output chord string information indicating a chord string corresponding to the reference note string; and
     a construction unit that constructs a trained model that has learned an input/output relationship between the input time-series data and the output chord string information.
  9. A computer-implemented chord estimation method comprising:
     receiving time-series data including a note string made up of a plurality of notes; and
     using a trained model to estimate, based on the time-series data, chord string information indicating a chord string corresponding to the note string.
  10. The computer-implemented chord estimation method according to claim 9, wherein the trained model is a model that has learned an input/output relationship between input time-series data including a reference note string made up of a plurality of notes and output chord string information indicating a chord string corresponding to the reference note string.
  11. The computer-implemented chord estimation method according to claim 9 or 10, wherein the estimating also estimates timings of chord changes in the chord string.
  12. The computer-implemented chord estimation method according to claim 10, wherein
     the input time-series data includes genre information specifying a genre of music represented by the reference note string,
     the time-series data includes genre information specifying a genre of music represented by the note string, and
     the estimating estimates the chord string information based on the time-series data including the genre information.
  13. The computer-implemented chord estimation method according to claim 10, wherein
     the input time-series data includes key information specifying a key of music represented by the reference note string,
     the time-series data includes key information specifying a key of music represented by the note string, and
     the estimating estimates the chord string information based on the time-series data including the key information.
  14. The computer-implemented chord estimation method according to claim 10, wherein
     the input time-series data includes difficulty level information specifying a difficulty level of a musical score indicated by the reference note string,
     the time-series data includes difficulty level information specifying a difficulty level of a musical score indicated by the note string, and
     the estimating estimates the chord string information based on the time-series data including the difficulty level information.
  15. The computer-implemented chord estimation method according to any one of claims 9 to 14, further comprising generating score information indicating a chord-annotated musical score in which the chord string information is attached so as to correspond to each note of the note string.
  16. A computer-implemented training method comprising:
     acquiring input time-series data including a reference note string made up of a plurality of notes;
     acquiring output chord string information indicating a chord string corresponding to the reference note string; and
     constructing a trained model that has learned an input/output relationship between the input time-series data and the output chord string information.
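The training flow of claim 16 and the estimation flow of claim 9 can be sketched end to end. A toy lookup table stands in for the trained model here; `train` and `estimate` are hypothetical stand-ins for the construction unit and estimation unit, and the chord labels are illustrative.

```python
def train(input_sequences, output_chord_strings):
    """Learn an input/output relationship between note strings and chord
    strings (here reduced to a lookup table instead of a real model)."""
    return {tuple(seq): chords
            for seq, chords in zip(input_sequences, output_chord_strings)}

def estimate(model, note_string):
    """Estimate chord string information for a received note string."""
    return model.get(tuple(note_string), ["N.C."])  # "no chord" fallback

# Training: pairs of reference note strings and corresponding chord strings.
model = train([[60, 64, 67]], [["C"]])

# Estimation: a note string taken from the received time-series data.
assert estimate(model, [60, 64, 67]) == ["C"]
```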
PCT/JP2022/009233 2021-03-26 2022-03-03 Code estimation device, training device, code estimation method, and training method WO2022202199A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023508892A JPWO2022202199A1 (en) 2021-03-26 2022-03-03
CN202280023333.9A CN117043852A (en) 2021-03-26 2022-03-03 Chord estimation device, training device, chord estimation method, and training method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-052532 2021-03-26
JP2021052532 2021-03-26

Publications (1)

Publication Number Publication Date
WO2022202199A1

Family

ID=83396894

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/009233 WO2022202199A1 (en) 2021-03-26 2022-03-03 Code estimation device, training device, code estimation method, and training method

Country Status (3)

Country Link
JP (1) JPWO2022202199A1 (en)
CN (1) CN117043852A (en)
WO (1) WO2022202199A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015031738A (en) * 2013-07-31 2015-02-16 株式会社河合楽器製作所 Chord progression estimation and detection device and chord progression estimation and detection program
WO2020145326A1 (en) * 2019-01-11 2020-07-16 ヤマハ株式会社 Acoustic analysis method and acoustic analysis device


Also Published As

Publication number Publication date
CN117043852A (en) 2023-11-10
JPWO2022202199A1 (en) 2022-09-29

Similar Documents

Publication Publication Date Title
CN112382257B (en) Audio processing method, device, equipment and medium
CN111630590B (en) Method for generating music data
KR101942814B1 (en) Method for providing accompaniment based on user humming melody and apparatus for the same
EP3489946A1 (en) Real-time jamming assistance for groups of musicians
US7411125B2 (en) Chord estimation apparatus and method
JP2019152716A (en) Information processing method and information processor
JP6760450B2 (en) Automatic arrangement method
JP2012506061A (en) Analysis method of digital music sound signal
JP6565528B2 (en) Automatic arrangement device and program
JP6693176B2 (en) Lyrics generation device and lyrics generation method
Jensen Evolutionary music composition: A quantitative approach
JP6645085B2 (en) Automatic arrangement device and program
WO2022202199A1 (en) Code estimation device, training device, code estimation method, and training method
JP7375302B2 (en) Acoustic analysis method, acoustic analysis device and program
US20220383843A1 (en) Arrangement generation method, arrangement generation device, and generation program
US6984781B2 (en) Music formulation
JP2019109357A (en) Feature analysis method for music information and its device
CN116710998A (en) Information processing system, electronic musical instrument, information processing method, and program
Vargas et al. Artificial musical pattern generation with genetic algorithms
Suthaphan et al. Music generator for elderly using deep learning
WO2022244403A1 (en) Musical score writing device, training device, musical score writing method and training method
WO2022190453A1 (en) Fingering presentation device, training device, fingering presentation method, and training method
KR102490769B1 (en) Method and device for evaluating ballet movements based on ai using musical elements
Akimoto et al. SketTune: Real-time input assistance for novices to compose music for self-expression
WO2020171035A1 (en) Sound signal synthesis method, generative model training method, sound signal synthesis system, and program

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 22774995; country of ref document: EP; kind code of ref document: A1)
WWE Wipo information: entry into national phase (ref document number: 202280023333.9, country of ref document: CN; ref document number: 2023508892, country of ref document: JP)
NENP Non-entry into the national phase (ref country code: DE)
122 Ep: PCT application non-entry in European phase (ref document number: 22774995; country of ref document: EP; kind code of ref document: A1)