WO2022202199A1 - Code estimation device, training device, code estimation method, and training method - Google Patents

Code estimation device, training device, code estimation method, and training method

Info

Publication number
WO2022202199A1
WO2022202199A1 (PCT/JP2022/009233)
Authority
WO
WIPO (PCT)
Prior art keywords
series data
string
information
time
chord
Prior art date
Application number
PCT/JP2022/009233
Other languages
French (fr)
Japanese (ja)
Inventor
正博 鈴木 (Masahiro Suzuki)
Original Assignee
ヤマハ株式会社 (Yamaha Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ヤマハ株式会社 (Yamaha Corporation)
Priority to JP2023508892A
Priority to CN202280023333.9A
Publication of WO2022202199A1

Links

Images

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10G: REPRESENTATION OF MUSIC; RECORDING MUSIC IN NOTATION FORM; ACCESSORIES FOR MUSIC OR MUSICAL INSTRUMENTS NOT OTHERWISE PROVIDED FOR, e.g. SUPPORTS
    • G10G3/00: Recording music in notation form, e.g. recording the mechanical operation of a musical instrument
    • G10G3/04: Recording music in notation form using electrical means
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/36: Accompaniment arrangements
    • G10H1/38: Chord

Definitions

  • the present invention relates to a chord estimation device and method for estimating chords for playing a musical instrument, and a training device and method for constructing a chord estimation device.
  • in Patent Document 1, a chord is estimated for each specific section. For example, one chord is estimated per bar. If chord estimation with a higher degree of freedom could be performed from given notes, the production of musical scores with chords could be supported more appropriately.
  • the purpose of the present invention is to perform chord estimation with a high degree of freedom based on musical note strings.
  • a chord estimation apparatus includes a reception unit that receives time-series data including a note string composed of a plurality of notes, and an estimation unit that uses a trained model to estimate, based on the time-series data, chord string information indicating a chord string corresponding to the note string.
  • a training apparatus includes a first acquisition unit that acquires input time-series data including a reference note string composed of a plurality of notes, a second acquisition unit that acquires output chord string information indicating a chord string corresponding to the reference note string, and a construction unit that constructs a trained model that has learned the input/output relationship between the input time-series data and the output chord string information.
  • a chord estimation method is executed by a computer, and includes receiving time-series data including a note string composed of a plurality of notes, and estimating, using a trained model and based on the time-series data, chord string information indicating a chord string corresponding to the note string.
  • a training method is executed by a computer, and includes acquiring input time-series data including a reference note string composed of a plurality of notes, acquiring output chord string information indicating a chord string corresponding to the reference note string, and constructing a trained model that has learned the input/output relationship between the input time-series data and the output chord string information.
  • chord estimation with a high degree of freedom can be performed based on a string of musical notes.
  • FIG. 1 is a block diagram showing the configuration of a processing system including a chord estimation device and a training device according to one embodiment of the present invention.
  • FIG. 2 is a diagram showing an example of input time-series data included in training data.
  • FIG. 3 is a diagram showing an example of output chord string information included in training data.
  • FIG. 4 is a block diagram showing the configuration of the training device and chord estimation device.
  • FIG. 5 shows an example of an arranged musical score displayed on the display unit.
  • FIG. 6 is a flowchart showing an example of training processing.
  • FIG. 7 is a flowchart showing an example of chord estimation processing.
  • FIG. 8 is a diagram showing a modified example of output chord string information included in training data.
  • FIG. 1 is a block diagram showing the configuration of a processing system including a chord estimation device and a training device according to one embodiment of the present invention.
  • the processing system 100 includes a RAM (Random Access Memory) 110, a ROM (Read Only Memory) 120, a CPU (Central Processing Unit) 130, a storage unit 140, an operation unit 150, and a display unit 160.
  • the processing system 100 is implemented by a computer such as a personal computer, tablet terminal, or smart phone.
  • the processing system 100 may be realized by cooperative operation of a plurality of computers connected by a communication path such as Ethernet, or may be realized by an electronic musical instrument such as an electronic piano having performance functions.
  • the RAM 110, ROM 120, CPU 130, storage unit 140, operation unit 150, and display unit 160 are connected to the bus 170.
  • the RAM 110, ROM 120, and CPU 130 constitute the training device 10 and the chord estimation device 20.
  • although the training device 10 and the chord estimation device 20 are configured by the common processing system 100 in this embodiment, they may be configured by separate processing systems.
  • the RAM 110 consists of, for example, a volatile memory, and is used as a work area for the CPU 130.
  • the ROM 120 is, for example, a non-volatile memory and stores a training program and a code estimation program.
  • the CPU 130 performs the training process by executing the training program stored in the ROM 120 on the RAM 110. Further, the CPU 130 performs the chord estimation process by executing the chord estimation program stored in the ROM 120 on the RAM 110. Details of the training process and the chord estimation process will be described later.
  • the training program or the chord estimation program may be stored in the storage unit 140 instead of the ROM 120.
  • the training program or the chord estimation program may be provided in a form stored in a computer-readable storage medium and installed in the ROM 120 or the storage unit 140.
  • a training program or chord estimation program distributed from a server (including a cloud server) on a network may be installed in the ROM 120 or the storage unit 140.
  • the storage unit 140 includes a storage medium such as a hard disk, an optical disk, a magnetic disk, or a memory card, and stores a trained model M and a plurality of training data D.
  • the trained model M or each piece of training data D may be stored in a computer-readable storage medium instead of the storage unit 140.
  • the trained model M or the respective training data D may be stored on a server on a network.
  • the trained model M is a machine learning model trained so as to present chord strings to be referred to when the user of the chord estimation device 20 (hereinafter referred to as a performer) plays a piece of music.
  • a trained model M is constructed using a plurality of training data D.
  • a user of the training device 10 can generate the training data D by operating the operation unit 150 .
  • the training data D is data created based on the musical knowledge or musical sense of the reference performer.
  • the reference performer has a relatively high level of skill in playing the piece of music.
  • a reference performer may be the performer's mentor or teacher in the performance of the musical composition.
  • the training data D indicates a set of input time-series data and output chord string information.
  • the input time-series data indicates a reference note string consisting of a plurality of notes.
  • the input time-series data is data that forms a melody or accompaniment sound with a plurality of notes.
  • the input time-series data may be image data representing images of musical scores.
  • the output chord string information is data in which chords corresponding to the reference note string are arranged in time series. The chord string corresponding to the reference note string is provided by the reference performer.
  • FIGS. 2 and 3 are diagrams showing an example of the training data D.
  • the example in FIG. 2 shows input time-series data including a reference note string consisting of a plurality of notes.
  • the example in FIG. 3 shows output chord string information indicating a chord string corresponding to the reference note string.
  • the input time-series data has a metrical structure and additional information in addition to the reference note string.
  • the input time-series data A shown in FIG. 2 is data obtained by extracting data for the first two bars of a song. In the input time-series data A, bars are separated by "bar", and beats are separated by "beat". In this way, the input time-series data A has a metrical structure with the "bar” and "beat” information.
  • Elements A1 to A37 indicate the reference note string of the first bar. That is, the elements A1 to A37 are separated into bars by the "bar” before the element A1 and the "bar” after the element A37. In addition, it is divided into beats by "beat” after elements A8, A18, and A26.
  • the element A0 is additional information.
  • as the additional information, for example, key information, genre information, and difficulty level information are used.
  • key information is added by the Key element.
  • the key information is information specifying the key of the music represented by the reference note string.
  • the numerical value following "Key" designates the key.
  • Genre information is information that designates the genre of music represented by the reference note string.
  • as the genre information, genres such as rock, pop, and jazz are specified, for example. By designating genre information as additional information, the correspondence between the reference note string and a chord string suited to the genre is machine-learned.
  • the difficulty level information is information indicating the difficulty level of the musical score indicated by the reference note string.
  • a chord string corresponding to the reference note string and the difficulty level of the score is machine-learned. For example, for a score with a low difficulty level, machine learning is performed while interpolating notes from a small number of tones. For a score with a high difficulty level, machine learning is performed while selecting the notes that form chords from an excessive number of tones.
  • elements other than the element A0, "bar” and “beat” correspond to the reference note string.
  • the element A0 is placed at the beginning of the input time-series data A, that is, before the reference note string (elements A1 to A37), but it may be placed at any position in the input time-series data A.
  • in elements A1 to A37 of the reference note string, "L" means the left hand, "R" means the right hand, and the number following "L" or "R" designates the pitch. "on" and "off" mean key depression and key release, respectively. "wait" means waiting, and the number following "wait" designates the length of time.
  • elements A1 to A5 indicate that the keys of pitches 77 and 74 are pressed with the right hand while the keys of pitches 53 and 46 are simultaneously pressed with the left hand, followed by a wait of 11 time units.
  • elements A6 to A8 indicate that the left-hand keys of pitches 53 and 46 are released simultaneously, followed by a wait of 1 time unit. Elements A9 to A11 then indicate that the left hand presses pitches 53 and 46 again, followed by a wait of 5 time units.
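  • the event-token format described above ("bar"/"beat" markers, a "Key" element, L/R key press and release events, and "wait" lengths) can be sketched with a small parser. The concrete token spellings below (`Key_5`, `R_on_77`, `wait_11`, and so on) are hypothetical, since the publication describes the elements only abstractly; this is a sketch of the idea, not the patent's actual encoding.

```python
# Sketch of parsing a note-event token stream like input time-series data A.
# Token spellings are hypothetical; the publication only describes the
# elements ("bar", "beat", Key, L/R on/off with a pitch number, "wait").

def parse_tokens(tokens):
    """Return the key value and a list of timed events from event tokens."""
    time = 0          # current position in wait units
    key = None
    events = []       # (time, "bar"/"beat") or (time, hand, action, pitch)
    for tok in tokens:
        parts = tok.split("_")
        if parts[0] == "Key":
            key = int(parts[1])                 # key designation (additional info)
        elif tok in ("bar", "beat"):
            events.append((time, tok))          # metrical structure markers
        elif parts[0] in ("L", "R"):
            hand, action, pitch = parts[0], parts[1], int(parts[2])
            events.append((time, hand, action, pitch))
        elif parts[0] == "wait":
            time += int(parts[1])               # advance time by the given length
    return key, events

# Elements A0-A8 of the description: right hand presses 77 and 74, left hand
# presses 53 and 46, wait 11 units, left hand releases both, wait 1 unit.
key, events = parse_tokens([
    "Key_5", "bar",
    "R_on_77", "R_on_74", "L_on_53", "L_on_46", "wait_11",
    "L_off_53", "L_off_46", "wait_1",
])
print(key)         # 5
print(events[-1])  # (11, 'L', 'off', 46)
```

  • the parser makes the role of "wait" explicit: it is the only element that advances time, so press and release events between two waits are simultaneous, exactly as in the description of elements A1 to A8.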
  • the output chord string information B shown in FIG. 3 indicates a chord string corresponding to the reference note string included in the input time-series data A.
  • the chords corresponding to elements A1 to A37 of the input time-series data A are represented by elements B1 to B3 and elements B4 to B6. That is, elements B1 to B6 indicate the chord string corresponding to the first bar of the input time-series data A.
  • in the output chord string information B, bars are also separated by "bar" and beats by "beat". The range delimited by the "bar" before element B1 and the "bar" after element B6 corresponds to the first bar.
  • one chord is indicated by three elements.
  • Elements B1 to B3 define the chord of the first beat of the first bar.
  • Elements B4 to B6 define the chord of the fourth beat of the first bar.
  • Elements B7 to B9 define the chord of the first beat of the fourth bar.
  • of the three elements indicating a chord, the first element (B1, B4, B7) represents basic chord information.
  • the basic chord information (chord) is a numerical value from 1 to 24 designating a major or minor chord for each of the 12 tones (C, C#, D, D#, ..., A, A#, B).
  • the second element (B2, B5, B8) of the three elements indicating the chord indicates chord type information.
  • the chord type information indicates a numerical value designating the type of tension chord.
  • the third element (B3, B6, B9) represents chord root information.
  • the chord root information (root) indicates a numerical value designating the root note of the on-chord.
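  • the three-element chord encoding above can be illustrated with a small decoder. The assignment of the numbers 1 to 24 is an assumption (here 1 to 12 are taken as the major chords C..B and 13 to 24 as the minor chords), as is the tension table; the publication only states that the values designate major and minor chords over the 12 tones, a tension chord type, and an on-chord root.

```python
# Decode a (chord, type, root) element triple such as (B1, B2, B3) into a
# chord symbol. The numbering scheme is assumed: 1-12 = major chords C..B,
# 13-24 = minor chords C..B; type 0 = no tension, root 0 = no on-chord.

TONES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
TENSIONS = {0: "", 1: "7", 2: "maj7", 3: "9"}   # hypothetical tension table

def decode_chord(basic, tension, root):
    if not 1 <= basic <= 24:
        raise ValueError("basic chord information must be 1..24")
    tone = TONES[(basic - 1) % 12]
    quality = "" if basic <= 12 else "m"        # major for 1-12, minor for 13-24
    name = tone + quality + TENSIONS.get(tension, "")
    if root:                                    # on-chord (slash chord) root
        name += "/" + TONES[root - 1]
    return name

print(decode_chord(1, 0, 0))    # "C"
print(decode_chord(22, 1, 5))   # "Am7/E"
```

  • splitting a chord into basic information, tension type, and on-chord root keeps each of the three output elements in a small, closed value range, which is convenient for a sequence model's output vocabulary.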
  • FIG. 4 is a block diagram showing the configuration of the training device 10 and the chord estimation device 20.
  • the training device 10 includes a first acquisition unit 11, a second acquisition unit 12, and a construction unit 13 as functional units.
  • the functional units of the training device 10 are implemented by the CPU 130 of FIG. 1 executing the training program. At least part of the functional units of the training device 10 may be realized by hardware such as an electronic circuit.
  • the first acquisition unit 11 acquires the input time-series data A from each training data D stored in the storage unit 140 or the like.
  • the second acquisition unit 12 acquires the output chord string information B from each training data D.
  • the construction unit 13 performs machine learning using the input time-series data A acquired by the first acquisition unit 11 as an input element and the output chord string information B acquired by the second acquisition unit 12 as an output element. By repeating this machine learning over a plurality of training data D, the construction unit 13 constructs a trained model M indicating the input/output relationship between the input time-series data A and the output chord string information B.
  • in this embodiment, the construction unit 13 builds the trained model M by training a Transformer, but the embodiment is not limited to this.
  • the construction unit 13 may construct the trained model M by training another type of machine learning model that handles time series.
  • the trained model M constructed by the construction unit 13 is stored in the storage unit 140, for example.
  • the trained model M constructed by the construction unit 13 may be stored in a server or the like on the network.
  • the chord estimation device 20 includes a reception unit 21, an estimation unit 22, and a generation unit 23 as functional units.
  • the functional units of the chord estimation device 20 are implemented by the CPU 130 of FIG. 1 executing the chord estimation program. At least part of the functional units of the chord estimation device 20 may be realized by hardware such as an electronic circuit.
  • the reception unit 21 receives time-series data including a string of notes made up of a plurality of notes.
  • the performer can give image data representing an image of the musical score to the reception unit 21 as time-series data.
  • the performer can generate time-series data by operating the operation unit 150 and provide it to the reception unit 21 .
  • the time-series data has the same configuration as the input time-series data A in FIG. 2. In other words, the time-series data has a metrical structure and additional information in addition to the note string.
  • the estimation unit 22 estimates chord string information using the trained model M stored in the storage unit 140 or the like.
  • the chord string information indicates a chord string corresponding to the note string accepted by the reception unit 21, and is estimated based on the note string and the additional information. Since the time-series data has the same configuration as the input time-series data A, the chord string information has the same configuration as the output chord string information B.
  • the generation unit 23 generates musical score information based on the note string of the time-series data received by the reception unit 21 and the chord string information estimated by the estimation unit 22.
  • the musical score information is information on an arranged musical score for piano, and is data in which chord information is added to staff notation.
  • for example, the musical score information is MIDI data to which the chord string information is added.
  • the display unit 160 displays the musical score with chords based on the musical score information generated by the generating unit 23 .
  • FIG. 5 shows an example of a musical score with chords displayed on the display unit 160.
  • the chorded musical score displays the chord string information estimated by the estimation unit 22 so as to correspond to each note of the note string accepted by the reception unit 21.
  • FIG. 6 is a flowchart showing an example of training processing by the training apparatus 10 of FIG.
  • the training process in FIG. 6 is performed by CPU 130 in FIG. 1 executing a training program.
  • the first acquisition unit 11 acquires the input time-series data A from each training data D (step S1).
  • the second acquisition unit 12 acquires the output code string information B from each training data D (step S2). Either of steps S1 and S2 may be performed first, or may be performed simultaneously.
  • the construction unit 13 performs machine learning using the input time-series data A acquired in step S1 as an input element and the output chord string information B acquired in step S2 as an output element (step S3). Subsequently, the construction unit 13 determines whether or not sufficient machine learning has been performed (step S4). If the machine learning is insufficient, the construction unit 13 returns to step S3. Steps S3 and S4 are repeated while changing the parameters until sufficient machine learning has been performed. The number of iterations of machine learning changes according to the quality conditions that the trained model M to be constructed should satisfy.
  • the construction unit 13 saves the input/output relationship between the input time-series data A and the output chord string information B learned by the machine learning in step S3 as the trained model M (step S5). This completes the training process.
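  • the flow of steps S1 to S5 can be sketched as follows. The Transformer itself is omitted and replaced by a placeholder training step, and token vocabularies stand in for the model's learned artifacts; this is an illustration of the loop structure, not the patent's implementation.

```python
# Sketch of the training process (steps S1-S5). The model and loss are
# placeholders: the publication trains a Transformer, which is omitted here.

def build_vocab(sequences):
    """Map each distinct token to an integer ID (shared across sequences)."""
    vocab = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

def train(training_data, quality_threshold=0.1, max_iters=100):
    # S1/S2: acquire input time-series data A and output chord string info B.
    inputs = [d["A"] for d in training_data]
    outputs = [d["B"] for d in training_data]
    in_vocab, out_vocab = build_vocab(inputs), build_vocab(outputs)

    loss, iters = 1.0, 0
    while loss > quality_threshold and iters < max_iters:   # S3/S4 loop
        loss *= 0.5       # placeholder for one Transformer training step
        iters += 1
    # S5: save the learned artifacts as the "trained model".
    return {"in_vocab": in_vocab, "out_vocab": out_vocab, "loss": loss}

data = [{"A": ["bar", "R_on_77", "wait_11"], "B": ["bar", "chord_1", "type_0"]}]
model = train(data)
print(len(model["in_vocab"]))   # 3
```

  • the quality condition of step S4 appears here as the `quality_threshold` on the loss: training iterates until the condition is met, so the iteration count varies with the required quality, as stated above.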
  • FIG. 7 is a flowchart showing an example of chord estimation processing by the chord estimation device 20 of FIG.
  • the chord estimation process in FIG. 7 is performed by CPU 130 in FIG. 1 executing a chord estimation program.
  • the receiving unit 21 receives time-series data (step S11).
  • the estimation unit 22 estimates chord string information from the time-series data received in step S11, using the trained model M saved in step S5 of the training process (step S12).
  • chord string information including one or more chord strings is estimated from the note string included in the time-series data.
  • in this way, chord estimation is performed with a high degree of freedom.
  • since the chord change timing is also estimated along the flow of time, more appropriate chord estimation is performed.
  • the time-series data does not contain information that serves as a delimiter for chord changes, but the estimation unit 22 performs chord estimation including the chord change timing.
  • after that, the generation unit 23 generates musical score information based on the note string of the time-series data received in step S11 and the chord string information estimated in step S12 (step S13). A score with chords may be displayed on the display unit 160 based on the generated score information. This completes the chord estimation process.
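  • steps S11 to S13 can be sketched as a three-stage pipeline. The `estimate` stage below is a stub standing in for the trained model M (in the device it would be inference with the trained Transformer), and the token names are hypothetical.

```python
# Sketch of the chord estimation process (steps S11-S13) as a pipeline.
# The trained model M is replaced by a stub; in the device, the estimation
# unit would run the trained model over the time-series tokens.

def receive(time_series_tokens):                  # S11: reception unit 21
    if not time_series_tokens:
        raise ValueError("time-series data must contain a note string")
    return list(time_series_tokens)

def estimate(tokens):                             # S12: estimation unit 22 (stub)
    # Placeholder: emit one chord triple per "bar" marker. A real model also
    # chooses the chord-change timing freely, not one chord per bar.
    return [("chord_1", "type_0", "root_0") for t in tokens if t == "bar"]

def generate_score(tokens, chords):               # S13: generation unit 23
    return {"notes": tokens, "chords": chords}    # stand-in for score info

tokens = receive(["bar", "R_on_77", "wait_11", "bar", "R_on_74", "wait_4"])
score = generate_score(tokens, estimate(tokens))
print(len(score["chords"]))   # 2
```

  • separating the three stages mirrors the device's functional units: the reception, estimation, and generation units can each be replaced (for example, dropping the generation unit, as the modifications below allow) without changing the others.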
  • as described above, the chord estimation device 20 includes the reception unit 21 that receives time-series data including a note string composed of a plurality of notes, and the estimation unit 22 that uses the trained model M to estimate, based on the time-series data, chord string information indicating a chord string corresponding to the note string.
  • the trained model M is used to estimate appropriate chord string information from the temporal flow of the plurality of notes in the time-series data. This makes it possible to present a chorded musical score based on time-series data including a note string. Since one or more chord strings are estimated from the note string, chord estimation is performed with a high degree of freedom.
  • the trained model M may be a machine learning model that has learned the input/output relationship between input time-series data A including a reference note string consisting of a plurality of notes and output chord string information B indicating a chord string corresponding to the reference note string. In this case, chord string information can be easily estimated from the time-series data.
  • the estimation unit 22 may also estimate the chord change timing in the chord string. As a result, more appropriate chord estimation corresponding to the note string is performed.
  • the input time-series data A may include genre information specifying the genre of music represented by the reference note string.
  • the time-series data may also include genre information that designates the genre of music represented by a string of musical notes.
  • the estimation unit 22 may estimate the chord string information based on the time-series data including the genre information. In this way, chord estimation suitable for the genre of the music is performed.
  • the input time-series data A may include key information that specifies the key of the music represented by the reference note string.
  • the time-series data may also include key information that specifies the key of music represented by a string of notes.
  • the estimation unit 22 may estimate the chord string information based on the time-series data including the key information. In this way, chord estimation suitable for the key of the music is performed.
  • the input time-series data A may include difficulty level information specifying the difficulty level of the musical score indicated by the reference note string.
  • the time-series data may also include difficulty level information that designates the difficulty level of the musical score indicated by the note string.
  • the estimation unit 22 may estimate the chord string information based on the time-series data including the difficulty level information. As a result, appropriate chord estimation is performed according to the difficulty level of the musical score indicated by the note string.
  • the chord estimation device 20 may further include the generation unit 23 that generates musical score information indicating a chorded musical score in which the chord string information is added so as to correspond to each note of the note string.
  • the training apparatus 10 includes the first acquisition unit 11 that acquires input time-series data A including a reference note string composed of a plurality of notes, the second acquisition unit 12 that acquires output chord string information B indicating a chord string corresponding to the reference note string, and the construction unit 13 that constructs a trained model M that has learned the input/output relationship between the input time-series data A and the output chord string information B.
  • with this configuration, a trained model M that has learned the input/output relationship between the input time-series data A and the output chord string information B can be easily constructed.
  • the input time-series data A includes additional information, and the time-series data includes additional information, but the embodiment is not limited to this.
  • the input time-series data A only needs to include the reference note string, and does not have to include additional information.
  • the time-series data may include musical note sequences and may not include additional information.
  • the input time-series data A has "bar” and "beat” information as the metrical structure, but the embodiment is not limited to this.
  • the input time-series data A may not have a metrical structure.
  • FIG. 8 is a diagram showing an example of output chord string information B prepared for input time-series data A having no metrical structure. As shown in FIG. 8, this output chord string information B does not have a metrical structure consisting of "bar" and "beat" information.
  • the construction unit 13 may construct different trained models M according to the type of additional information, or may construct one trained model M.
  • the input time-series data A may include, as additional information, a plurality of information out of key information, genre information, and difficulty level information.
  • in the above embodiment, the chord estimation device 20 includes the generation unit 23, but the embodiment is not limited to this.
  • the performer can create a musical score with chords by transcribing the chord string information estimated by the estimation unit 22 onto a desired musical score. Therefore, the chord estimation device 20 does not have to include the generation unit 23.
  • in the above embodiment, the trained model M is trained using the training data D so as to estimate chord string information for a piano performance, but the embodiment is not limited to this.
  • the trained model M may be trained so as to estimate chord string information for a performance with another musical instrument, such as a guitar or drums.
  • the user of the chord estimation device 20 is a performer.
  • the machine learning by the training device 10 may be performed in advance by the staff of the musical score production company.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

In the present invention, a chord estimation device comprises: a reception unit for receiving time-series data including a note string composed of a plurality of notes; and an estimation unit for estimating, using a trained model and on the basis of the time-series data, chord string information indicating a chord string corresponding to the note string.

Description

Chord estimation device, training device, chord estimation method, and training method

 The present invention relates to a chord estimation device and method for estimating chords for playing a musical instrument, and to a training device and method for constructing the chord estimation device.

 There are musical scores with chords added. Performers can enjoy playing instruments such as the piano or guitar by playing the chords. When creating a musical score with chords, the producer assigns chords based on the melody and accompaniment indicated by the notes. Assigning chords requires musical knowledge and sense. Patent Document 1 below discloses a chord progression estimation device that estimates chords from performance information or acoustic signals.

 [Patent Document 1] Japanese Patent No. 6151121

 In Patent Document 1, a chord is estimated for each specific section. For example, one chord is estimated per bar. If chord estimation with a higher degree of freedom could be performed from given notes, the production of chorded musical scores could be supported more appropriately.

 An object of the present invention is to perform chord estimation with a high degree of freedom based on a note string.

 A chord estimation device according to one aspect of the present invention includes a reception unit that receives time-series data including a note string composed of a plurality of notes, and an estimation unit that uses a trained model to estimate, based on the time-series data, chord string information indicating a chord string corresponding to the note string.

 A training device according to another aspect of the present invention includes a first acquisition unit that acquires input time-series data including a reference note string composed of a plurality of notes, a second acquisition unit that acquires output chord string information indicating a chord string corresponding to the reference note string, and a construction unit that constructs a trained model that has learned the input/output relationship between the input time-series data and the output chord string information.

 A chord estimation method according to yet another aspect of the present invention is executed by a computer, and includes receiving time-series data including a note string composed of a plurality of notes, and estimating, using a trained model and based on the time-series data, chord string information indicating a chord string corresponding to the note string.

 A training method according to yet another aspect of the present invention is executed by a computer, and includes acquiring input time-series data including a reference note string composed of a plurality of notes, acquiring output chord string information indicating a chord string corresponding to the reference note string, and constructing a trained model that has learned the input/output relationship between the input time-series data and the output chord string information.

 According to the present invention, chord estimation with a high degree of freedom can be performed based on a note string.
図1は本発明の一実施の形態に係るコード推定装置および訓練装置を含む処理システムの構成を示すブロック図である。FIG. 1 is a block diagram showing the configuration of a processing system including a chord estimation device and a training device according to one embodiment of the present invention.
図2は訓練データに含まれる入力時系列データの一例を示す図である。FIG. 2 is a diagram showing an example of input time-series data included in training data.
図3は訓練データに含まれる出力コード列情報の一例を示す図である。FIG. 3 is a diagram showing an example of output chord string information included in training data.
図4は訓練装置およびコード推定装置の構成を示すブロック図である。FIG. 4 is a block diagram showing the configuration of the training device and the chord estimation device.
図5は表示部に表示されるアレンジ楽譜の一例を示す。FIG. 5 shows an example of an arranged musical score displayed on the display unit.
図6は訓練処理の一例を示すフローチャートである。FIG. 6 is a flowchart showing an example of the training process.
図7はコード推定処理の一例を示すフローチャートである。FIG. 7 is a flowchart showing an example of the chord estimation process.
図8は訓練データに含まれる出力コード列情報の変形例を示す図である。FIG. 8 is a diagram showing a modified example of output chord string information included in training data.
 (1)処理システムの構成
 以下、本発明の実施の形態に係るコード推定装置、訓練装置、コード推定方法および訓練方法について図面を用いて詳細に説明する。図1は、本発明の一実施の形態に係るコード推定装置および訓練装置を含む処理システムの構成を示すブロック図である。図1に示すように、処理システム100は、RAM(ランダムアクセスメモリ)110、ROM(リードオンリメモリ)120、CPU(中央演算処理装置)130、記憶部140、操作部150および表示部160を備える。
(1) Configuration of Processing System Hereinafter, a chord estimation device, a training device, a chord estimation method, and a training method according to embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a processing system including a chord estimation device and a training device according to one embodiment of the present invention. As shown in FIG. 1, the processing system 100 includes a RAM (Random Access Memory) 110, a ROM (Read Only Memory) 120, a CPU (Central Processing Unit) 130, a storage unit 140, an operation unit 150, and a display unit 160.
 処理システム100は、パーソナルコンピュータ、タブレット端末またはスマートフォン等のコンピュータにより実現される。あるいは、処理システム100は、イーサネット等の通信路により接続された複数のコンピュータの共同動作により実現されてもよいし、電子ピアノ等の演奏機能を備えた電子楽器により実現されてもよい。 The processing system 100 is implemented by a computer such as a personal computer, tablet terminal, or smart phone. Alternatively, the processing system 100 may be realized by cooperative operation of a plurality of computers connected by a communication path such as Ethernet, or may be realized by an electronic musical instrument such as an electronic piano having performance functions.
 RAM110、ROM120、CPU130、記憶部140、操作部150および表示部160は、バス170に接続される。RAM110、ROM120およびCPU130により訓練装置10およびコード推定装置20が構成される。本実施の形態では、訓練装置10とコード推定装置20とは共通の処理システム100により構成されるが、別個の処理システムにより構成されてもよい。 The RAM 110, ROM 120, CPU 130, storage unit 140, operation unit 150, and display unit 160 are connected to the bus 170. The RAM 110, ROM 120, and CPU 130 constitute the training device 10 and the chord estimation device 20. Although the training device 10 and the chord estimation device 20 are configured by the common processing system 100 in the present embodiment, they may be configured by separate processing systems.
 RAM110は、例えば揮発性メモリからなり、CPU130の作業領域として用いられる。ROM120は、例えば不揮発性メモリからなり、訓練プログラムおよびコード推定プログラムを記憶する。CPU130は、ROM120に記憶された訓練プログラムをRAM110上で実行することにより訓練処理を行う。また、CPU130は、ROM120に記憶されたコード推定プログラムをRAM110上で実行することによりコード推定処理を行う。訓練処理およびコード推定処理の詳細については後述する。 The RAM 110 is, for example, a volatile memory and is used as a work area for the CPU 130. The ROM 120 is, for example, a non-volatile memory and stores a training program and a chord estimation program. The CPU 130 performs the training process by executing the training program stored in the ROM 120 on the RAM 110. The CPU 130 also performs the chord estimation process by executing the chord estimation program stored in the ROM 120 on the RAM 110. Details of the training process and the chord estimation process will be described later.
 訓練プログラムまたはコード推定プログラムは、ROM120ではなく記憶部140に記憶されてもよい。あるいは、訓練プログラムまたはコード推定プログラムは、コンピュータが読み取り可能な記憶媒体に記憶された形態で提供され、ROM120または記憶部140にインストールされてもよい。あるいは、処理システム100がインターネット等のネットワークに接続されている場合には、当該ネットワーク上のサーバ(クラウドサーバを含む。)から配信された訓練プログラムまたはコード推定プログラムがROM120または記憶部140にインストールされてもよい。 The training program or the chord estimation program may be stored in the storage unit 140 instead of the ROM 120. Alternatively, the training program or the chord estimation program may be provided in a form stored in a computer-readable storage medium and installed in the ROM 120 or the storage unit 140. Alternatively, when the processing system 100 is connected to a network such as the Internet, a training program or chord estimation program distributed from a server (including a cloud server) on the network may be installed in the ROM 120 or the storage unit 140.
 記憶部140は、ハードディスク、光学ディスク、磁気ディスクまたはメモリカード等の記憶媒体を含み、訓練済モデルMおよび複数の訓練データDを記憶する。訓練済モデルMまたは各訓練データDは、記憶部140に記憶されず、コンピュータが読み取り可能な記憶媒体に記憶されていてもよい。あるいは、処理システム100がネットワークに接続されている場合には、訓練済モデルMまたは各訓練データDは、当該ネットワーク上のサーバに記憶されていてもよい。 The storage unit 140 includes a storage medium such as a hard disk, an optical disk, a magnetic disk, or a memory card, and stores a trained model M and a plurality of training data D. The trained model M or each piece of training data D may not be stored in the storage unit 140, but may be stored in a computer-readable storage medium. Alternatively, if the processing system 100 is connected to a network, the trained model M or respective training data D may be stored on a server on that network.
 (2)訓練データ
 訓練済モデルMは、コード推定装置20の使用者(以下、演奏者と呼ぶ。)が楽曲を演奏するときに参照するコード列を提示するために訓練された機械学習モデルである。訓練済モデルMは、複数の訓練データDを用いて構築される。訓練装置10の使用者は、操作部150を操作することにより、訓練データDを生成することができる。訓練データDは、参照演奏者の音楽的知識または音楽的センス等に基づいて作成されたデータである。参照演奏者は、楽曲の演奏に関して比較的高い技量を有する。参照演奏者は、楽曲の演奏における演奏者の指導者または師であってもよい。
(2) Training Data The trained model M is a machine learning model that has been trained to present a chord string to be referred to when the user of the chord estimation device 20 (hereinafter referred to as a performer) plays a piece of music. The trained model M is constructed using a plurality of training data D. A user of the training device 10 can generate the training data D by operating the operation unit 150. The training data D is data created based on the musical knowledge or musical sense of a reference performer. The reference performer has a relatively high level of skill in playing the piece of music. The reference performer may be the performer's instructor or teacher in the performance of musical pieces.
 訓練データDは、入力時系列データと出力コード列情報との組を示す。入力時系列データは、複数の音符からなる参照音符列を示す。例えば、入力時系列データは、複数の音符によってメロディや伴奏音を構成するデータである。入力時系列データは楽譜の画像を示す画像データであってもよい。出力コード列情報は、参照音符列に対応するコードが時系列に配置されたデータである。参照音符列に対応するコード列は、参照演奏者により付与される。 The training data D indicates a set of input time-series data and output chord string information. The input time-series data indicates a reference note string consisting of a plurality of notes. For example, the input time-series data is data in which a plurality of notes form a melody or accompaniment. The input time-series data may be image data representing an image of a musical score. The output chord string information is data in which chords corresponding to the reference note string are arranged in time series. The chord string corresponding to the reference note string is assigned by the reference performer.
 図2および図3は、各訓練データDの一例を示す図である。図2の例は、複数の音符からなる参照音符列を含む入力時系列データを示す。図3の例は、参照音符列に対応するコード列を示す出力コード列情報を示す。 FIGS. 2 and 3 are diagrams showing an example of each training data D. The example in FIG. 2 shows input time-series data including a reference note string consisting of a plurality of notes. The example in FIG. 3 shows output chord string information indicating a chord string corresponding to the reference note string.
 本実施の形態においては、入力時系列データは、参照音符列に加えて、拍節構造および付加情報を有する。図2に示す入力時系列データAは、曲の先頭の2小節分のデータを抜粋したデータである。入力時系列データAは、“bar”によって小節が区切られ、“beat”によって拍が区切られている。このように、入力時系列データAは、“bar”および“beat”情報により拍節構造を備える。要素A1~A37は、最初の1小節の参照音符列を示す。つまり、要素A1~A37は、要素A1の前の“bar”と要素A37の後の“bar”によって小節に区切られている。また、要素A8、A18、A26の後の“beat”によって拍に区切られている。 In this embodiment, the input time-series data has a metrical structure and additional information in addition to the reference note string. The input time-series data A shown in FIG. 2 is data obtained by extracting data for the first two bars of a song. In the input time-series data A, bars are separated by "bar", and beats are separated by "beat". In this way, the input time-series data A has a metrical structure with the "bar" and "beat" information. Elements A1 to A37 indicate the reference note string of the first bar. That is, the elements A1 to A37 are separated into bars by the "bar" before the element A1 and the "bar" after the element A37. In addition, it is divided into beats by "beat" after elements A8, A18, and A26.
 要素A0は、付加情報である。付加情報としては、例えば、調情報、ジャンル情報、難易度情報などが利用される。図2の例では、Key要素により調情報が付加されている。調情報は、参照音符列で表現される音楽の調を指定する情報である。Keyに続く数値は調を指定する数値である。付加情報として調情報が指定されることにより、参照音符列および調に応じたコード列が機械学習される。ジャンル情報は、参照音符列で表現される音楽のジャンルを指定する情報である。ジャンル情報としては、例えば、ロック、ポップス、ジャズなどのジャンルが指定される。付加情報としてジャンル情報が指定されることにより、参照音符列およびジャンルに応じたコード列が機械学習される。難易度情報は、参照音符列で示される楽譜の難易度を示す情報である。付加情報として難易度情報が指定されることにより、参照音符列および楽譜の難易度に応じたコード列が機械学習される。例えば、低難易度の楽譜であれば少ない音数から音符の補間を行いつつ機械学習が行われる。また、高難易度の楽譜であれば過剰な音数の中からコードを構成する音符を選択しつつ機械学習が行われる。 The element A0 is additional information. As the additional information, for example, key information, genre information, or difficulty information is used. In the example of FIG. 2, key information is added by the Key element. The key information is information specifying the key of the music represented by the reference note string; the numerical value following Key designates the key. When key information is specified as additional information, a chord string corresponding to the reference note string and the key is machine-learned. The genre information is information that designates the genre of the music represented by the reference note string, for example, rock, pop, or jazz. When genre information is specified as additional information, a chord string corresponding to the reference note string and the genre is machine-learned. The difficulty information is information indicating the difficulty level of the musical score indicated by the reference note string. When difficulty information is specified as additional information, a chord string corresponding to the reference note string and the difficulty level of the score is machine-learned. For example, for a low-difficulty score, machine learning is performed while interpolating notes from a small number of tones, and for a high-difficulty score, machine learning is performed while selecting the notes that constitute chords from an excessive number of tones.
入力時系列データAの要素のうち、要素A0、“bar”および“beat”以外の要素は、参照音符列に対応する。要素A1~A37は、1小節目の参照音符列を示す。本例では、要素A0は入力時系列データAにおける先頭、すなわち参照音符列(要素A1~A37)の前に配置されるが、入力時系列データAにおける任意の位置に配置されてもよい。 Among the elements of the input time-series data A, elements other than the element A0, "bar" and "beat" correspond to the reference note string. Elements A1 to A37 indicate the reference note string of the first measure. In this example, the element A0 is placed at the beginning of the input time-series data A, that is, before the reference note string (elements A1 to A37), but it may be placed at any position in the input time-series data A.
 要素A1~A37に例示するように、参照音符列において、“L”は左手を意味し、“R”は右手を意味し、“L”または“R”に続く数字は音階を意味する。また、“on”および“off”はそれぞれ押鍵および離鍵を意味する。また、“wait”は待機を意味し、“wait”に続く数字は時間の長さを意味する。したがって、要素A1~A5は、右手で音階77および音階74の鍵を押すと同時に、左手で音階53と音階46の鍵を同時に押した後、11単位時間だけ維持することを示す。そして、11単位時間だけ維持した後、要素A6~A8は、左手の音階53と音階46の鍵を同時に離した後、1単位時間だけ維持することを示す。そして、1単位時間だけ維持した後、要素A9~要素A11は、左手で音階53と音階46を再び押した後、5単位時間だけ待機することを示す。 As exemplified by elements A1 to A37, in the reference note string, "L" means the left hand, "R" means the right hand, and the number following "L" or "R" indicates a pitch. "on" and "off" mean key depression and key release, respectively, and "wait" means waiting, the number following "wait" indicating a length of time. Thus, elements A1 to A5 indicate that the keys of pitches 77 and 74 are pressed with the right hand and, at the same time, the keys of pitches 53 and 46 are pressed with the left hand, after which this state is held for 11 time units. Elements A6 to A8 then indicate that the left-hand keys of pitches 53 and 46 are released simultaneously and the state is held for 1 time unit. Elements A9 to A11 then indicate that pitches 53 and 46 are pressed again with the left hand, followed by a wait of 5 time units.
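As an illustration of how the element encoding described above can be interpreted, the following sketch parses a flat element list of this form into timed note events. The parser, its function name, and the exact token spellings such as "L53" are illustrative assumptions and are not part of this description:

```python
def parse_tokens(tokens):
    """Convert a flat element list ("bar"/"beat"/"wait n"/"L53 on" ...)
    into a list of (time, hand, pitch, on_or_off) note events."""
    events = []
    time = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("bar", "beat"):
            i += 1                       # metrical markers carry no duration
        elif tok == "wait":
            time += int(tokens[i + 1])   # advance time by the stated units
            i += 2
        else:                            # e.g. "L53" followed by "on"/"off"
            hand, pitch = tok[0], int(tok[1:])
            events.append((time, hand, pitch, tokens[i + 1]))
            i += 2
    return events

# The opening of the first bar in the example: press four keys, hold 11 units,
# release the two left-hand keys, hold 1 unit.
events = parse_tokens(
    ["bar", "R77", "on", "R74", "on", "L53", "on", "L46", "on",
     "wait", "11", "L53", "off", "L46", "off", "wait", "1"])
```

Under this reading, the four "on" events share time 0 and the two "off" events occur 11 time units later, matching the description of elements A1 to A8.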
 図3に示す出力コード列情報Bは、入力時系列データAに含まれる参照音符列に対応するコード列を示す。入力時系列データAの要素A1~A37に対応するコード列は、要素B1~B3および要素B4~B6で表されている。つまり、要素B1~B6は、入力時系列データAの1小節目に対応するコード列を示す。出力コード列情報Bにおいても、“bar”によって小節が区切られ、“beat”によって拍が区切られている。要素B1の前の“bar”および要素B6の後の“bar”によって区切られた範囲が1小節目に対応している。 The output chord string information B shown in FIG. 3 indicates a chord string corresponding to the reference note string included in the input time-series data A. The chord string corresponding to elements A1 to A37 of the input time-series data A is represented by elements B1 to B3 and elements B4 to B6. That is, elements B1 to B6 indicate the chord string corresponding to the first bar of the input time-series data A. In the output chord string information B as well, bars are separated by "bar" and beats by "beat". The range delimited by the "bar" before element B1 and the "bar" after element B6 corresponds to the first bar.
 出力コード列情報Bにおいて、1つのコードは3つの要素で示される。要素B1~B3において1小節目の1拍目のコードが規定される。要素B4~B6において1小節目の4拍目のコードが規定される。要素B7~B9において4小節目の1拍目のコードが規定される。コードを示す3つの要素のうち、1番目の要素(B1,B4,B7)は、基本コード情報を示す。基本コード情報(chord)は、12音(C,C#,D,D#,・・・A,A#,B)それぞれについてメジャーコードおよびマイナーコードの種別を指定する1~24の数値を示す。コードを示す3つの要素のうち、2番目の要素(B2,B5,B8)は、コードタイプ情報を示す。コードタイプ情報(type)は、テンションコードの種別を指定する数値を示す。コードを示す3つの要素のうち、3番目の要素(B3,B6,B9)は、コードルート情報を示す。コードルート情報(root)は、オンコードのルート音を指定する数値を示す。 In the output chord string information B, one chord is indicated by three elements. Elements B1 to B3 define the chord on the first beat of the first bar, elements B4 to B6 the chord on the fourth beat of the first bar, and elements B7 to B9 the chord on the first beat of the fourth bar. Of the three elements representing a chord, the first element (B1, B4, B7) indicates basic chord information. The basic chord information (chord) is a numerical value from 1 to 24 that designates a major or minor chord for each of the 12 tones (C, C#, D, D#, ... A, A#, B). The second element (B2, B5, B8) indicates chord type information; the chord type information (type) is a numerical value designating the type of tension chord. The third element (B3, B6, B9) indicates chord root information; the chord root information (root) is a numerical value designating the root note of an on-chord (slash chord).
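A decoder for these (chord, type, root) triplets might look as follows. The exact numeric assignments are not given in this description; the sketch assumes, purely for illustration, that chord values 1 to 12 denote the major chords C through B, values 13 to 24 the corresponding minor chords, and a couple of arbitrary tension-type codes:

```python
PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def decode_chord(chord, type_=0, root=0):
    """Turn a (chord, type, root) triplet into a chord name string.
    The numeric assignments are assumed, not taken from the description."""
    quality = "" if chord <= 12 else "m"        # assumed: 1-12 major, 13-24 minor
    name = PITCHES[(chord - 1) % 12] + quality
    if type_:                                   # tension chord type (assumed codes)
        name += {1: "7", 2: "maj7"}.get(type_, "(type%d)" % type_)
    if root:                                    # on-chord: append the bass note
        name += "/" + PITCHES[root - 1]
    return name
```

Under these assumptions, `decode_chord(1)` yields "C", `decode_chord(13)` yields "Cm", and `decode_chord(8, root=5)` yields the on-chord "G/E".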
 (3)訓練装置およびコード推定装置
 図4は、訓練装置10およびコード推定装置20の構成を示すブロック図である。図4に示すように、訓練装置10は、機能部として、第1の取得部11、第2の取得部12および構築部13を含む。図1のCPU130が訓練プログラムを実行することにより、訓練装置10の機能部が実現される。訓練装置10の機能部の少なくとも一部は、電子回路等のハードウエアにより実現されてもよい。
(3) Training Device and Chord Estimation Device FIG. 4 is a block diagram showing the configuration of the training device 10 and the chord estimation device 20. As shown in FIG. 4, the training device 10 includes a first acquisition unit 11, a second acquisition unit 12, and a construction unit 13 as functional units. The functional units of the training device 10 are implemented by the CPU 130 of FIG. 1 executing the training program. At least some of the functional units of the training device 10 may be realized by hardware such as electronic circuits.
 第1の取得部11は、記憶部140等に記憶された各訓練データDから入力時系列データAを取得する。第2の取得部12は、各訓練データDから出力コード列情報Bを取得する。構築部13は、各訓練データDについて、第1の取得部11により取得された入力時系列データAを入力要素とし、第2の取得部12により取得された出力コード列情報Bを出力要素とする機械学習を行う。複数の訓練データDについて機械学習を繰り返すことにより、構築部13は、入力時系列データAと出力コード列情報Bとの間の入出力関係を示す訓練済モデルMを構築する。 The first acquisition unit 11 acquires the input time-series data A from each training data D stored in the storage unit 140 or the like. The second acquisition unit 12 acquires the output chord string information B from each training data D. For each training data D, the construction unit 13 performs machine learning using the input time-series data A acquired by the first acquisition unit 11 as an input element and the output chord string information B acquired by the second acquisition unit 12 as an output element. By repeating machine learning for the plurality of training data D, the construction unit 13 constructs a trained model M representing the input/output relationship between the input time-series data A and the output chord string information B.
 本例では、構築部13はTransformerを訓練することにより訓練済モデルMを構築するが、実施の形態はこれに限定されない。構築部13は、時系列を扱う他の方式の機械学習モデルを訓練することにより訓練済モデルMを構築してもよい。構築部13により構築された訓練済モデルMは、例えば記憶部140に記憶される。構築部13により構築された訓練済モデルMは、ネットワーク上のサーバ等に記憶されてもよい。 In this example, the construction unit 13 constructs the trained model M by training a Transformer, but the embodiment is not limited to this. The construction unit 13 may construct the trained model M by training another type of machine learning model that handles time series. The trained model M constructed by the construction unit 13 is stored, for example, in the storage unit 140, or may be stored on a server or the like on a network.
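A sequence model such as a Transformer consumes integer token IDs rather than the textual elements shown in FIGS. 2 and 3, so a training pipeline of this kind implicitly needs a vocabulary mapping. The following is a minimal illustrative sketch; the special tokens and function names are assumptions, not part of this description:

```python
def build_vocab(sequences):
    """Assign an integer ID to every distinct element across the sequences."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(seq, vocab):
    """Wrap a sequence in begin/end markers and map it to integer IDs."""
    return [vocab["<bos>"]] + [vocab[t] for t in seq] + [vocab["<eos>"]]

# A fragment of input time-series data A in the element notation of FIG. 2.
src = ["bar", "Key", "3", "R77", "on", "wait", "11"]
vocab = build_vocab([src])
ids = encode(src, vocab)
```

The same vocabulary scheme would be applied separately to the output chord string elements ("chord", "type", "root", numbers) so that both sides of the input/output relationship are integer sequences.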
 コード推定装置20は、機能部として、受付部21、推定部22および生成部23を含む。図1のCPU130がコード推定プログラムを実行することにより、コード推定装置20の機能部が実現される。コード推定装置20の機能部の少なくとも一部は、電子回路等のハードウエアにより実現されてもよい。 The code estimation device 20 includes a reception unit 21, an estimation unit 22, and a generation unit 23 as functional units. The functional units of the code estimation device 20 are implemented by the CPU 130 of FIG. 1 executing the code estimation program. At least part of the functional units of the code estimation device 20 may be realized by hardware such as an electronic circuit.
 本実施の形態では、受付部21は、複数の音符からなる音符列を含む時系列データを受け付ける。演奏者は、楽譜の画像を示す画像データを時系列データとして受付部21に与えることができる。あるいは、演奏者は、操作部150を操作することにより時系列データを生成し、受付部21に与えることができる。本例では、時系列データは、図2の入力時系列データAと同様の構成を有する。つまり、時系列データは、音符列に加えて、拍節構造および付加情報を有する。 In the present embodiment, the reception unit 21 receives time-series data including a string of notes made up of a plurality of notes. The performer can give image data representing an image of the musical score to the reception unit 21 as time-series data. Alternatively, the performer can generate time-series data by operating the operation unit 150 and provide it to the reception unit 21 . In this example, the time-series data has the same configuration as the input time-series data A in FIG. In other words, time-series data has a metrical structure and additional information in addition to a string of musical notes.
 推定部22は、記憶部140等に記憶された訓練済モデルMを用いてコード列情報を推定する。コード列情報は、受付部21により受け付けられた音符列に対応するコード列を示し、音符列および付加情報に基づいて推定される。時系列データが、入力時系列データAと同様の構成を有することにより、コード列情報は出力コード列情報Bと同様の構成を有する。生成部23は、受付部21により受け付けられた時系列データの音符列と、推定部22により推定されたコード列情報とに基づいて楽譜情報を生成する。例えば、楽譜情報は、ピアノのアレンジ楽譜の情報であり、五線譜の上にコード情報が付記されたデータである。あるいは、楽譜情報は、コード列情報が付加されたMIDIデータである。 The estimation unit 22 estimates chord string information using the trained model M stored in the storage unit 140 or the like. The chord string information indicates a chord string corresponding to the note string received by the reception unit 21, and is estimated based on the note string and the additional information. Because the time-series data has the same configuration as the input time-series data A, the chord string information has the same configuration as the output chord string information B. The generation unit 23 generates musical score information based on the note string of the time-series data received by the reception unit 21 and the chord string information estimated by the estimation unit 22. For example, the musical score information is information on an arranged piano score, that is, data in which chord information is added above the staff notation. Alternatively, the musical score information is MIDI data to which the chord string information is added.
 表示部160には、生成部23により生成された楽譜情報に基づいてコード付き楽譜が表示される。図5は、表示部160に表示されるコード付き楽譜の一例を示す。図5に示すように、コード付き楽譜には、推定部22により推定されたコード列情報が受付部21により受け付けられた音符列の各音符に対応するように示される。 The display unit 160 displays a musical score with chords based on the musical score information generated by the generation unit 23. FIG. 5 shows an example of a musical score with chords displayed on the display unit 160. As shown in FIG. 5, in the musical score with chords, the chord string information estimated by the estimation unit 22 is shown so as to correspond to each note of the note string received by the reception unit 21.
 (4)訓練処理およびコード推定処理
 図6は、図4の訓練装置10による訓練処理の一例を示すフローチャートである。図6の訓練処理は、図1のCPU130が訓練プログラムを実行することにより行われる。まず、第1の取得部11は、各訓練データDから入力時系列データAを取得する(ステップS1)。また、第2の取得部12は、各訓練データDから出力コード列情報Bを取得する(ステップS2)。ステップS1,S2は、いずれが先に実行されてもよいし、同時に実行されてもよい。
(4) Training Processing and Chord Estimation Processing FIG. 6 is a flowchart showing an example of the training process performed by the training device 10 of FIG. 4. The training process in FIG. 6 is performed by the CPU 130 of FIG. 1 executing the training program. First, the first acquisition unit 11 acquires the input time-series data A from each training data D (step S1). The second acquisition unit 12 acquires the output chord string information B from each training data D (step S2). Steps S1 and S2 may be performed in either order, or simultaneously.
 次に、構築部13は、各訓練データDについて、ステップS1で取得された入力時系列データAを入力要素とし、ステップS2で取得された出力コード列情報Bを出力要素とする機械学習を行う(ステップS3)。続いて、構築部13は、十分な機械学習が実行されたか否かを判定する(ステップS4)。機械学習が不十分な場合、構築部13はステップS3に戻る。十分な機械学習が実行されるまで、パラメータが変化されつつステップS3,S4が繰り返される。機械学習の繰り返し回数は、構築される訓練済モデルMが満たすべき品質条件に応じて変化する。 Next, for each training data D, the construction unit 13 performs machine learning using the input time-series data A acquired in step S1 as an input element and the output chord string information B acquired in step S2 as an output element (step S3). Subsequently, the construction unit 13 determines whether or not sufficient machine learning has been performed (step S4). If the machine learning is insufficient, the construction unit 13 returns to step S3. Steps S3 and S4 are repeated, with the parameters being updated, until sufficient machine learning has been performed. The number of iterations of machine learning varies according to the quality conditions that the constructed trained model M should satisfy.
 十分な機械学習が実行された場合、構築部13は、ステップS3の機械学習により習得した入力時系列データAと出力コード列情報Bとの間の入出力関係を訓練済モデルMとして保存する(ステップS5)。これにより、訓練処理が終了する。 When sufficient machine learning has been performed, the construction unit 13 saves, as the trained model M, the input/output relationship between the input time-series data A and the output chord string information B learned through the machine learning in step S3 (step S5). This completes the training process.
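Steps S3 to S5 can be summarized schematically as the following loop. `train_step` and `quality_ok` are hypothetical stand-ins for one parameter update and for the sufficiency check of step S4; neither is specified in this description:

```python
def train(model, data, train_step, quality_ok, max_iters=1000):
    """Repeat machine learning (S3) until the quality condition holds (S4),
    then return the model to be saved as trained model M (S5)."""
    for _ in range(max_iters):
        for A, B in data:              # input time-series data / output chord info
            model = train_step(model, A, B)
        if quality_ok(model):
            break
    return model

# Toy illustration only: "training" counts updates until a threshold is reached.
trained = train(
    model=0,
    data=[("A1", "B1"), ("A2", "B2")],
    train_step=lambda m, A, B: m + 1,
    quality_ok=lambda m: m >= 6,
)
```

In a real pipeline `train_step` would be a gradient update of the Transformer's parameters and `quality_ok` a validation-loss check corresponding to the quality condition mentioned above.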
 図7は、図4のコード推定装置20によるコード推定処理の一例を示すフローチャートである。図7のコード推定処理は、図1のCPU130がコード推定プログラムを実行することにより行われる。まず、受付部21は、時系列データを受け付ける(ステップS11)。次に、推定部22は、訓練処理のステップS5で保存された訓練済モデルMを用いて、ステップS11で受け付けられた時系列データからコード列情報を推定する(ステップS12)。このとき、時系列データに含まれる音符列からは1つ、または複数のコード列を含むコード列情報が推定されるので、自由度の高いコード推定が行われる。また、時間的流れの中でコードチェンジのタイミングも推定されるので、より適切なコード推定が行われる。つまり、時系列データにはコードチェンジの区切りとなる情報は含まれていないが、推定部22は、コードチェンジのタイミングを含めたコード推定を行う。 FIG. 7 is a flowchart showing an example of the chord estimation process performed by the chord estimation device 20 of FIG. 4. The chord estimation process in FIG. 7 is performed by the CPU 130 of FIG. 1 executing the chord estimation program. First, the reception unit 21 receives time-series data (step S11). Next, the estimation unit 22 estimates chord string information from the time-series data received in step S11, using the trained model M saved in step S5 of the training process (step S12). At this time, chord string information including one or more chord strings is estimated from the note string included in the time-series data, so chord estimation with a high degree of freedom is performed. Moreover, because the timing of chord changes within the temporal flow is also estimated, more appropriate chord estimation is performed. That is, although the time-series data contains no information delimiting chord changes, the estimation unit 22 performs chord estimation including the timing of chord changes.
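The estimation in step S12 amounts to decoding an output token sequence from the trained model. Below is a hedged sketch of greedy autoregressive decoding, where `model.next_token(src, prefix)` is a hypothetical interface standing in for the trained model M; the stub model exists only to demonstrate the loop:

```python
def estimate_chords(model, src_tokens, max_len=64, eos="<eos>"):
    """Generate chord-string tokens one at a time until <eos> or max_len."""
    out = []
    while len(out) < max_len:
        tok = model.next_token(src_tokens, out)  # most likely next token
        if tok == eos:
            break
        out.append(tok)
    return out

class StubModel:
    """Stand-in that replays a fixed answer, for demonstration only."""
    def __init__(self, answer):
        self.answer = list(answer)
    def next_token(self, src, prefix):
        return self.answer[len(prefix)] if len(prefix) < len(self.answer) else "<eos>"

chords = estimate_chords(
    StubModel(["bar", "chord", "8", "type", "0", "root", "0"]),
    src_tokens=["bar", "R77", "on"])
```

Because the model emits the "bar"/"beat" markers and chord triplets itself, the positions at which new triplets appear in the output are exactly the chord-change timings mentioned above.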
 その後、生成部23は、ステップS11で受け付けられた時系列データの音符列およびステップS12で推定されたコード列情報に基づいて楽譜情報を生成する(ステップS13)。生成された楽譜情報に基づいて、コード付き楽譜が表示部160に表示されてもよい。これにより、コード推定処理が終了する。 After that, the generation unit 23 generates musical score information based on the note string of the time-series data received in step S11 and the chord string information estimated in step S12 (step S13). A musical score with chords may be displayed on the display unit 160 based on the generated musical score information. This completes the chord estimation process.
 (5)実施の形態の効果
 以上説明したように、本実施の形態に係るコード推定装置20は、複数の音符からなる音符列を含む時系列データを受け付ける受付部21と、訓練済モデルMを用いて、音符列に対応するコード列を示すコード列情報を推定する推定部22とを備える。この構成によれば、訓練済モデルMを用いて、時系列データにおける複数の音符の時間的流れから適切なコード列情報が推定される。これにより、音符列を含む時系列データに基づいてコード付き楽譜を提示することができる。音符列からは1つ、または複数のコード列が推定されるので、自由度の高いコード推定が行われる。
(5) Effects of the Embodiment As described above, the chord estimation device 20 according to the present embodiment includes the reception unit 21 that receives time-series data including a note string composed of a plurality of notes, and the estimation unit 22 that uses the trained model M to estimate chord string information indicating a chord string corresponding to the note string. With this configuration, appropriate chord string information is estimated from the temporal flow of the plurality of notes in the time-series data using the trained model M. This makes it possible to present a musical score with chords based on time-series data including a note string. Since one or more chord strings are estimated from the note string, chord estimation with a high degree of freedom is performed.
 訓練済モデルMは、複数の音符からなる参照音符列を含む入力時系列データAと、参照音符列の各音符に対応するコード列を示す出力コード列情報Bとの間の入出力関係を習得した機械学習モデルであってもよい。この場合、時系列データからコード列情報を容易に推定することができる。 The trained model M may be a machine learning model that has learned the input/output relationship between input time-series data A including a reference note string composed of a plurality of notes and output chord string information B indicating a chord string corresponding to each note of the reference note string. In this case, chord string information can be easily estimated from time-series data.
 推定部22は、コード列におけるコードチェンジのタイミングについても推定してもよい。これにより、音符列に対応した、より適切なコード推定が行われる。 The estimation unit 22 may also estimate the timing of chord changes in the chord string. As a result, more appropriate chord estimation corresponding to the note string is performed.
 入力時系列データAは、参照音符列で表現される音楽のジャンルを指定するジャンル情報を含んでもよい。また、時系列データは、音符列で表現される音楽のジャンルを指定するジャンル情報を含んでもよい。そして、推定部22は、ジャンル情報を含む時系列データに基づいて、コード列情報を推定してもよい。これにより、音楽のジャンルに適したコード推定が行われる。 The input time-series data A may include genre information specifying the genre of the music represented by the reference note string. The time-series data may likewise include genre information specifying the genre of the music represented by the note string. The estimation unit 22 may then estimate the chord string information based on the time-series data including the genre information. Thus, chord estimation suitable for the genre of the music is performed.
 入力時系列データAは、参照音符列で表現される音楽の調を指定する調情報を含んでもよい。また、時系列データは、音符列で表現される音楽の調を指定する調情報を含んでもよい。そして、推定部22は、調情報を含む時系列データに基づいて、コード列情報を推定してもよい。これにより、音楽の調に適したコード推定が行われる。 The input time-series data A may include key information specifying the key of the music represented by the reference note string. The time-series data may likewise include key information specifying the key of the music represented by the note string. The estimation unit 22 may then estimate the chord string information based on the time-series data including the key information. Thus, chord estimation suited to the key of the music is performed.
 入力時系列データAは、参照音符列で示される楽譜の難易度を指定する難易度情報を含んでもよい。また、時系列データは、音符列で示される楽譜の難易度を指定する難易度情報を含んでもよい。そして、推定部22は、難易度情報を含む時系列データに基づいて、コード列情報を推定してもよい。これにより、音符列で示される楽譜の難易度に応じて適切なコード推定が行われる。 The input time-series data A may include difficulty information specifying the difficulty level of the musical score indicated by the reference note string. The time-series data may likewise include difficulty information specifying the difficulty level of the musical score indicated by the note string. The estimation unit 22 may then estimate the chord string information based on the time-series data including the difficulty information. Thus, appropriate chord estimation is performed according to the difficulty level of the musical score indicated by the note string.
 コード推定装置20は、音符列の各音符に対応するようにコード列情報が付されたコード付き楽譜を示す楽譜情報を生成する生成部23をさらに備えてもよい。 The chord estimation device 20 may further include the generation unit 23 that generates musical score information indicating a musical score with chords, to which the chord string information is attached so as to correspond to each note of the note string.
 本実施の形態に係る訓練装置10は、複数の音符からなる参照音符列を含む入力時系列データAを取得する第1の取得部11と、参照音符列に対応するコード列を示す出力コード列情報Bを取得する第2の取得部12と、入力時系列データAと出力コード列情報Bとの間の入出力関係を習得した訓練済モデルMを構築する構築部13とを備える。この構成によれば、入力時系列データAと出力コード列情報Bとの間の入出力関係を習得した訓練済モデルMを容易に構築することができる。 The training device 10 according to the present embodiment includes the first acquisition unit 11 that acquires input time-series data A including a reference note string composed of a plurality of notes, the second acquisition unit 12 that acquires output chord string information B indicating a chord string corresponding to the reference note string, and the construction unit 13 that constructs a trained model M that has learned the input/output relationship between the input time-series data A and the output chord string information B. With this configuration, a trained model M that has learned the input/output relationship between the input time-series data A and the output chord string information B can be easily constructed.
 (6)他の実施の形態
 上記実施の形態において、入力時系列データAは付加情報を含み、時系列データは付加情報を含むが、実施の形態はこれに限定されない。入力時系列データAは、参照音符列を含めばよく、付加情報を含まなくてもよい。同様に、時系列データは、音符列を含めばよく、付加情報を含まなくてもよい。
(6) Other Embodiments In the above embodiment, the input time-series data A includes additional information, and so does the time-series data, but the embodiment is not limited to this. The input time-series data A only needs to include the reference note string and does not have to include additional information. Similarly, the time-series data only needs to include the note string and does not have to include additional information.
 上記実施の形態において、入力時系列データAは拍節構造として“bar”および“beat”情報を有するが、実施の形態はこれに限定されない。入力時系列データAは拍節構造を有していなくてもよい。図8は、拍節構造を有していない入力時系列データAに対応して準備された出力コード列情報Bの一例を示す図である。図8に示すように、出力コード列情報Bは、“bar”および“beat”情報からなる拍節構造を有していない。 In the above embodiment, the input time-series data A has "bar" and "beat" information as a metrical structure, but the embodiment is not limited to this. The input time-series data A does not have to have a metrical structure. FIG. 8 is a diagram showing an example of output chord string information B prepared for input time-series data A having no metrical structure. As shown in FIG. 8, this output chord string information B has no metrical structure consisting of "bar" and "beat" information.
 上記実施の形態において、入力時系列データAは付加情報として調情報、ジャンル情報および難易度情報を有する場合を例に説明した。構築部13は、付加情報の種類に応じて異なる訓練済モデルMを構築してもよいし、1つの訓練済モデルMを構築してもよい。あるいは、入力時系列データAは付加情報として、調情報、ジャンル情報および難易度情報のうち複数の情報を含めてもよい。 In the above embodiment, the case where the input time-series data A has key information, genre information, and difficulty information as additional information has been described as an example. The construction unit 13 may construct different trained models M according to the type of additional information, or may construct a single trained model M. Alternatively, the input time-series data A may include, as additional information, two or more of the key information, genre information, and difficulty information.
 Also, in the above embodiment, the chord estimation device 20 includes the generation unit 23, but the embodiments are not limited to this. A player can create a chord-annotated musical score by transcribing the chord string information estimated by the estimation unit 22 onto a desired score. Therefore, the chord estimation device 20 need not include the generation unit 23.
 In the above embodiment, the model is trained with the training data D to estimate chord string information for a performance on the piano, but the embodiments are not limited to this. The model may be trained with training data D to estimate chord string information for a performance on another instrument, such as a guitar or drums.
 In the above embodiment, the case where the user of the chord estimation device 20 is a performer has been described as an example, but the user of the chord estimation device 20 may be, for example, a staff member of a musical score production company. Also, the machine learning by the training device 10 may be performed in advance by staff of the musical score production company.

Claims (16)

  1. A chord estimation device comprising:
     a reception unit that receives time-series data including a note string made up of a plurality of notes; and
     an estimation unit that uses a trained model to estimate, based on the time-series data, chord string information indicating a chord string corresponding to the note string.
  2. The chord estimation device according to claim 1, wherein the trained model is a model that has learned an input/output relationship between input time-series data including a reference note string made up of a plurality of notes and output chord string information indicating a chord string corresponding to the reference note string.
  3. The chord estimation device according to claim 1 or 2, wherein the estimation unit also estimates timings of chord changes in the chord string.
  4. The chord estimation device according to claim 2, wherein
     the input time-series data includes genre information specifying a genre of music represented by the reference note string,
     the time-series data includes genre information specifying a genre of music represented by the note string, and
     the estimation unit estimates the chord string information based on the time-series data including the genre information.
  5. The chord estimation device according to claim 2, wherein
     the input time-series data includes key information specifying a key of music represented by the reference note string,
     the time-series data includes key information specifying a key of music represented by the note string, and
     the estimation unit estimates the chord string information based on the time-series data including the key information.
  6. The chord estimation device according to claim 2, wherein
     the input time-series data includes difficulty level information specifying a difficulty level of a musical score indicated by the reference note string,
     the time-series data includes difficulty level information specifying a difficulty level of a musical score indicated by the note string, and
     the estimation unit estimates the chord string information based on the time-series data including the difficulty level information.
  7. The chord estimation device according to any one of claims 1 to 6, further comprising a generation unit that generates score information indicating a chord-annotated musical score in which the chord string information is attached so as to correspond to each note of the note string.
  8. A training device comprising:
     a first acquisition unit that acquires input time-series data including a reference note string made up of a plurality of notes;
     a second acquisition unit that acquires output chord string information indicating a chord string corresponding to the reference note string; and
     a construction unit that constructs a trained model that has learned an input/output relationship between the input time-series data and the output chord string information.
  9. A computer-implemented chord estimation method comprising:
     receiving time-series data including a note string made up of a plurality of notes; and
     using a trained model to estimate, based on the time-series data, chord string information indicating a chord string corresponding to the note string.
  10. The computer-implemented chord estimation method according to claim 9, wherein the trained model is a model that has learned an input/output relationship between input time-series data including a reference note string made up of a plurality of notes and output chord string information indicating a chord string corresponding to the reference note string.
  11. The computer-implemented chord estimation method according to claim 9 or 10, wherein the estimating also estimates timings of chord changes in the chord string.
  12. The computer-implemented chord estimation method according to claim 10, wherein
     the input time-series data includes genre information specifying a genre of music represented by the reference note string,
     the time-series data includes genre information specifying a genre of music represented by the note string, and
     the estimating estimates the chord string information based on the time-series data including the genre information.
  13. The computer-implemented chord estimation method according to claim 10, wherein
     the input time-series data includes key information specifying a key of music represented by the reference note string,
     the time-series data includes key information specifying a key of music represented by the note string, and
     the estimating estimates the chord string information based on the time-series data including the key information.
  14. The computer-implemented chord estimation method according to claim 10, wherein
     the input time-series data includes difficulty level information specifying a difficulty level of a musical score indicated by the reference note string,
     the time-series data includes difficulty level information specifying a difficulty level of a musical score indicated by the note string, and
     the estimating estimates the chord string information based on the time-series data including the difficulty level information.
  15. The computer-implemented chord estimation method according to any one of claims 9 to 14, further comprising generating score information indicating a chord-annotated musical score in which the chord string information is attached so as to correspond to each note of the note string.
  16. A computer-implemented training method comprising:
     acquiring input time-series data including a reference note string made up of a plurality of notes;
     acquiring output chord string information indicating a chord string corresponding to the reference note string; and
     constructing a trained model that has learned an input/output relationship between the input time-series data and the output chord string information.
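The training flow of claim 16 and the estimation flow of claim 9 can be sketched end to end. A toy lookup table stands in for the trained model here; `train` and `estimate` are hypothetical stand-ins for the construction unit and estimation unit, and the chord labels are illustrative.

```python
def train(input_sequences, output_chord_strings):
    """Learn an input/output relationship between note strings and chord
    strings (here reduced to a lookup table instead of a real model)."""
    return {tuple(seq): chords
            for seq, chords in zip(input_sequences, output_chord_strings)}

def estimate(model, note_string):
    """Estimate chord string information for a received note string."""
    return model.get(tuple(note_string), ["N.C."])  # "no chord" fallback

# Training: pairs of reference note strings and corresponding chord strings.
model = train([[60, 64, 67]], [["C"]])

# Estimation: a note string taken from the received time-series data.
assert estimate(model, [60, 64, 67]) == ["C"]
```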
PCT/JP2022/009233 2021-03-26 2022-03-03 Code estimation device, training device, code estimation method, and training method WO2022202199A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023508892A JPWO2022202199A1 (en) 2021-03-26 2022-03-03
CN202280023333.9A CN117043852A (en) 2021-03-26 2022-03-03 Chord estimation device, training device, chord estimation method, and training method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-052532 2021-03-26
JP2021052532 2021-03-26

Publications (1)

Publication Number Publication Date
WO2022202199A1

Family

ID=83396894

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/009233 WO2022202199A1 (en) 2021-03-26 2022-03-03 Code estimation device, training device, code estimation method, and training method

Country Status (3)

Country Link
JP (1) JPWO2022202199A1 (en)
CN (1) CN117043852A (en)
WO (1) WO2022202199A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015031738A (en) * 2013-07-31 2015-02-16 株式会社河合楽器製作所 Chord progression estimation and detection device and chord progression estimation and detection program
WO2020145326A1 (en) * 2019-01-11 2020-07-16 ヤマハ株式会社 Acoustic analysis method and acoustic analysis device


Also Published As

Publication number Publication date
CN117043852A (en) 2023-11-10
JPWO2022202199A1 (en) 2022-09-29

Similar Documents

Publication Publication Date Title
CN112382257B (en) Audio processing method, device, equipment and medium
CN111630590B (en) Method for generating music data
KR101942814B1 (en) Method for providing accompaniment based on user humming melody and apparatus for the same
EP3489946A1 (en) Real-time jamming assistance for groups of musicians
US7411125B2 (en) Chord estimation apparatus and method
JP2019152716A (en) Information processing method and information processor
JP6760450B2 (en) Automatic arrangement method
JP2012506061A (en) Analysis method of digital music sound signal
JP6565528B2 (en) Automatic arrangement device and program
JP6693176B2 (en) Lyrics generation device and lyrics generation method
Jensen Evolutionary music composition: A quantitative approach
JP6645085B2 (en) Automatic arrangement device and program
WO2022202199A1 (en) Code estimation device, training device, code estimation method, and training method
JP7375302B2 (en) Acoustic analysis method, acoustic analysis device and program
US20220383843A1 (en) Arrangement generation method, arrangement generation device, and generation program
US6984781B2 (en) Music formulation
JP2019109357A (en) Feature analysis method for music information and its device
CN116710998A (en) Information processing system, electronic musical instrument, information processing method, and program
Vargas et al. Artificial musical pattern generation with genetic algorithms
Suthaphan et al. Music generator for elderly using deep learning
WO2022244403A1 (en) Musical score writing device, training device, musical score writing method and training method
WO2022190453A1 (en) Fingering presentation device, training device, fingering presentation method, and training method
KR102490769B1 (en) Method and device for evaluating ballet movements based on ai using musical elements
Akimoto et al. SketTune: Real-time input assistance for novices to compose music for self-expression
WO2020171035A1 (en) Sound signal synthesis method, generative model training method, sound signal synthesis system, and program

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 22774995; country of ref document: EP; kind code of ref document: A1)
WWE Wipo information: entry into national phase (ref document number: 202280023333.9, country of ref document: CN; ref document number: 2023508892, country of ref document: JP)
NENP Non-entry into the national phase (ref country code: DE)
122 Ep: PCT application non-entry in European phase (ref document number: 22774995; country of ref document: EP; kind code of ref document: A1)