WO2019176954A1 - Machine learning method, electronic apparatus, electronic musical instrument, model generator for part selection, and method of part determination - Google Patents

Machine learning method, electronic apparatus, electronic musical instrument, model generator for part selection, and method of part determination

Info

Publication number
WO2019176954A1
Authority
WO
WIPO (PCT)
Prior art keywords
musical instrument
feature value
musical
composition data
specific type
Prior art date
Application number
PCT/JP2019/010066
Other languages
French (fr)
Inventor
Daiki HIGURASHI
Original Assignee
Casio Computer Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Casio Computer Co., Ltd. filed Critical Casio Computer Co., Ltd.
Publication of WO2019176954A1 publication Critical patent/WO2019176954A1/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10GREPRESENTATION OF MUSIC; RECORDING MUSIC IN NOTATION FORM; ACCESSORIES FOR MUSIC OR MUSICAL INSTRUMENTS NOT OTHERWISE PROVIDED FOR, e.g. SUPPORTS
    • G10G1/00Means for the representation of music
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/0008Associated control or indicating means
    • G10H1/0016Means for indicating which keys, frets or strings are to be actuated, e.g. using lights or leds
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/0033Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H1/0041Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G10H1/0058Transmission between separate instruments or between individual components of a musical system
    • G10H1/0066Transmission between separate instruments or between individual components of a musical system using a MIDI interface
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2220/00Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H2220/005Non-interactive screen display of musical or status data
    • G10H2220/015Musical staff, tablature or score displays, e.g. for score reading during a performance
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/011Files or data streams containing coded musical information, e.g. for transmission
    • G10H2240/016File editing, i.e. modifying musical data files or streams as such
    • G10H2240/021File editing, i.e. modifying musical data files or streams as such for MIDI-like files or data streams
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/311Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation

Definitions

  • the present disclosure relates to a technique for selecting a part included in musical composition data.
  • Conventionally, techniques have been known that create arranged music to be performed by one piano (piano reduction), from music for respective musical instruments (respective parts) in a musical composition to be performed with multiple musical instruments other than a piano (see, for example, Non-Patent Document 1).
  • a machine learning method of causing a learning model to learn is executed by a processor.
  • the machine learning method includes extracting a feature value for each of a plurality of parts included in each of a plurality of items of musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument; and performing machine learning based on the extracted feature value and information representing a part to be performed with the specific type of musical instrument among a plurality of parts included in musical composition data, so as to cause the learning model to learn to be capable of selecting a part to be performed with the specific type of musical instrument from among a plurality of parts included in musical composition data different from any item of the plurality of items of musical composition data.
  • an electronic device includes a memory configured to store a learned model generated by machine learning; and a processor.
  • the processor is configured to execute extracting a feature value for each of a plurality of parts included in musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument; inputting the extracted feature value into the learned model; obtaining information for selecting a part to be performed with the specific type of musical instrument from among the plurality of parts included in the musical composition data; and determining, based on the obtained information, the part to be performed by the specific type of musical instrument from among the plurality of parts.
  • an electronic musical instrument includes an operation part configured to receive a performing operation; a sound generator configured to generate a sound corresponding to the performing operation performed on the operation part; a memory configured to store a learned model generated by machine learning; and a processor.
  • the processor is configured to execute extracting a feature value for each of a plurality of parts included in musical composition data, the feature value relating to suitability of a performance with the electronic musical instrument; inputting the extracted feature value into the learned model; obtaining information for selecting a part to be performed with the electronic musical instrument from among the plurality of parts included in the musical composition data; and determining, based on the obtained information, the part to be performed by the electronic musical instrument from among the plurality of parts.
  • a model generator for part selection includes a memory configured to store a learned model generated by machine learning; and a processor.
  • the processor is configured to execute extracting a feature value for each of a plurality of parts included in each of a plurality of items of musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument; and performing machine learning based on the extracted feature value and information representing a part to be performed with the specific type of musical instrument among a plurality of parts included in musical composition data, so as to generate the learned model that outputs information for selecting a part to be performed with the specific type of musical instrument from among a plurality of parts included in musical composition data different from any item of the plurality of items of musical composition data.
  • a method of part determination for determining a part to be performed with a specific type of musical instrument from among a plurality of parts included in musical composition data by using a learned model is executed by a processor.
  • the method includes extracting a feature value for each of a plurality of parts included in musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument; inputting the extracted feature value into the learned model; obtaining information for selecting a part to be performed with the specific type of musical instrument from among the plurality of parts included in the musical composition data; and determining, based on the obtained information, the part to be performed by the specific type of musical instrument from among the plurality of parts.
  • FIG. 1 is a diagram illustrating an example of a configuration of an information processing system according to an embodiment
  • FIG. 2 is a diagram illustrating an example of a hardware configuration of a server, a terminal, and an electronic musical instrument according to an embodiment
  • FIG. 3 is a diagram illustrating an example of a functional configuration of an information processing apparatus according to an embodiment
  • FIG. 4 is a sequence chart illustrating an example of a process of an information processing system according to an embodiment
  • FIG. 5 is a flowchart illustrating an example of a process of generating a learned model by machine learning according to an embodiment
  • FIG. 6 is a diagram illustrating an example of data for learning according to an embodiment
  • FIG. 7 is a diagram illustrating a feature value for each part used in machine learning;
  • FIG. 8 is a flowchart for illustrating an example of a process when machine learning is performed by using GBDT
  • FIG. 9 is a diagram illustrating an example of data of a learned model obtained by using a GBDT
  • FIG. 10 is a flowchart illustrating an example of a process of determining a part based on data of a learned model.
  • FIG. 1 is a diagram illustrating an example of a configuration of an information processing system 1 according to an embodiment.
  • the information processing system 1 includes a server 10, a terminal 20, and an electronic musical instrument 30. Note that the number of devices is not limited to the example in FIG. 1.
  • Communication is established between the server 10 and the terminal 20 through a network 50 such as a cellular phone network, a LAN (Local Area Network), a wireless LAN, the Internet, and the like.
  • the terminal 20 and the electronic musical instrument 30 are connected with each other by, for example, a USB cable, short-range wireless communication, or the like.
  • the server 10 is an information processing apparatus (a computer or electronic device) used as a server.
  • the server 10 performs machine learning for causing a part selection model (learning model) to learn based on data for learning, to generate a learned model for selecting a part to be performed with a predetermined instrument from among multiple parts included in musical composition data.
  • the terminal 20 is, for example, an information processing apparatus such as a tablet terminal, a smartphone, a desktop PC (Personal Computer), a notebook PC, or the like. Based on data of the part selection model (learned model) obtained from the server 10 and musical composition data specified by the user, the terminal 20 selects a part to be performed with the predetermined instrument.
  • the electronic musical instrument 30 is, for example, an electronic musical instrument such as an electronic keyboard, an electronic organ, an electronic piano, an electronic wind instrument, an electronic string instrument, a synthesizer, or the like.
  • the electronic musical instrument 30 outputs sound based on the musical composition data input from the terminal 20.
  • the electronic musical instrument 30 also outputs sound in response to an operation performed by the user.
  • FIG. 2 is a diagram illustrating an example of a hardware configuration of the server 10, the terminal 20, and the electronic musical instrument 30 according to the embodiment.
  • the server 10 in FIG. 2 includes a drive device 100, an auxiliary storage device 102, a memory device 103, a CPU (Central Processing Unit) 104, an interface device 105, a display device 106, an input device 107, and an output device 108, which are connected with each other through a bus B.
  • a program for implementing a process on the server 10 (an information processing program) is provided with a recording medium 101.
  • When the recording medium 101 storing the program is set in the drive device 100, the program is installed from the recording medium 101 into the auxiliary storage device 102 via the drive device 100.
  • the program may be downloaded from another computer via the network.
  • the auxiliary storage device 102 stores the installed programs, and stores necessary files, data, and the like.
  • the memory device 103 reads out the program from the auxiliary storage device 102 and stores the program in itself upon receiving a command to activate the program.
  • the CPU 104 implements functions relating to the server 10 according to the program stored in the memory device 103.
  • the interface device 105 is used as an interface for connecting to a network or the like.
  • the display device 106 displays a GUI (Graphical User Interface) or the like by the program.
  • the input device 107 is constituted with a keyboard, a mouse, a touch panel, buttons, and the like, to be used for inputting various operation commands.
  • the output device 108 is constituted with a speaker or the like, to be used for outputting various sounds.
  • the server 10 may further include a GPU (Graphics Processing Unit) so as to perform at least a part of the process relating to machine learning, which will be described later, by using the GPU.
  • an accelerator board on which the GPU is mounted may be connected to the bus B via the interface device 105 or the like.
  • the hardware configuration of the terminal 20 and the electronic musical instrument 30 may be substantially the same as the hardware configuration example of the server 10 illustrated in FIG. 2.
  • the input device 107 in the electronic musical instrument 30 is constituted with, for example, a keyboard to input a note to be performed.
  • the electronic musical instrument 30 also includes as the output device 108 a sound generator (a sound source, a speaker, etc.) to generate (reproduce) a performed note or the like.
  • the input device and the output device of the electronic musical instrument 30 are examples of an "operation part" and a "sound generator", respectively.
  • FIG. 3 is a diagram illustrating an example of functional configurations of the server 10 and the terminal 20 according to the embodiment.
  • the server 10 includes a storage 11.
  • the storage 11 is implemented by using, for example, the auxiliary storage device 102 and the like.
  • the storage 11 stores data such as data for learning 111.
  • In the data for learning 111, a data set is stored that includes pairs of data of notes of multiple parts included in an item of musical composition data (performance information), and information representing a part selected in advance, for a predetermined musical instrument, from among the multiple parts included in the item of musical composition data (training data).
  • the server 10 also includes an extractor 12, a generator 13, and an output unit 14. These units are implemented by processes which one or more programs installed in the server 10 cause the CPU 104 of the server 10 to execute.
  • the extractor 12 extracts a predetermined feature value for each part included in each of the multiple items of musical composition data ("first musical composition data") stored in the data for learning 111.
  • Based on the predetermined feature value extracted by the extractor 12 and the information representing the part for the predetermined musical instrument stored in the data for learning 111, the generator 13 generates a learned model for selecting a part to be performed with the predetermined musical instrument from among the multiple parts included in an item of musical composition data specified by the user or the like.
  • the output unit 14 outputs the data of the learned model generated by the generator 13 to the terminal 20.
  • the output unit 14 may attach the data of the learned model to an application installed in the terminal 20, to deliver the application to the terminal 20 via an external server or the like.
  • the terminal 20 includes a receiver 21, an obtainer 22, an extractor 23, a determiner 24, a controller 25, and an output unit 26. These units are implemented by processes which one or more programs installed in the terminal 20 cause the CPU of the terminal 20 to execute.
  • the receiver 21 receives various operations from the user of the terminal 20.
  • the obtainer 22 obtains the data of the learned model for selecting the part to be performed with the predetermined musical instrument from among the multiple parts included in the musical composition data from the server 10. Also, in response to a user operation or the like, the obtainer 22 obtains musical composition data ("second musical composition data") including multiple parts from an external server or the like.
  • the extractor 23 extracts a predetermined feature value for each of the multiple parts included in the musical composition data obtained by the obtainer 22.
  • the determiner 24 determines, based on the learned model obtained by the obtainer 22 and the predetermined feature value for each of the multiple parts extracted by the extractor 23, a part to be performed with a predetermined musical instrument, from among the multiple parts included in the musical composition data obtained by the obtainer 22.
  • the controller 25 displays music corresponding to the part determined by the determiner 24 on a screen.
  • the output unit 26 outputs the part information and the like determined by the determiner 24 to the electronic musical instrument 30.
  • For example, the output unit 26 outputs, to the electronic musical instrument 30, first performance data including the part determined by the determiner 24, and second performance data including the parts, among the parts included in the predetermined musical composition data, other than the part determined by the determiner 24.
  • the electronic musical instrument 30 includes an obtainer 31, a guide unit 32, and a reproducer 33. These units are implemented by processes which one or more programs installed in the electronic musical instrument 30 cause the CPU of the electronic musical instrument 30 to execute.
  • the obtainer 31 obtains the first performance data and the second performance data from the terminal 20.
  • the guide unit 32 guides a performance by the user based on the first performance data obtained by the obtainer 31.
  • the reproducer 33 reproduces sounds corresponding to notes included in the second performance data, to output the sounds from a speaker.
  • FIG. 4 is a sequence chart illustrating an example of a process of the information processing system 1 according to the embodiment.
  • the extractor 12 of the server 10 extracts a predetermined feature value for each part included in musical composition data stored in the data for learning 111.
  • Based on the feature value extracted by the extractor 12 and information representing a part for a predetermined musical instrument stored in the data for learning 111, the generator 13 of the server 10 generates a learned model for selecting a part to be performed with the predetermined musical instrument from among the multiple parts included in musical composition data specified by the user or the like (Step S2).
  • the obtainer 22 of the terminal 20 obtains the data of the learned model from the server 10 in response to a user operation or the like (Step S3).
  • the obtainer 22 of the terminal 20 obtains predetermined musical composition data in response to a user operation or the like (Step S4).
  • the predetermined musical composition data may be data (SMF file) generated in, for example, a format of the SMF (Standard MIDI File) standard.
  • the predetermined musical composition data includes data of multiple parts, in which musical instruments for respective parts may be specified with respect to tones to be output, in a format of the GM (General MIDI) standard.
  • the obtainer 22 of the terminal 20 may download the predetermined musical composition data from a server or the like on the Internet in response to, for example, a user operation.
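For concreteness, the following is a minimal sketch, not part of the embodiment, of how such an SMF file could be inspected to enumerate its parts and the GM program numbers assigned to them. It assumes the third-party mido library and a one-track-per-part file layout; the file name is hypothetical.

```python
# Sketch (not part of the embodiment): enumerate the parts of an SMF file
# and the GM program numbers specified for them, assuming the third-party
# "mido" library and one track per part. 'song.mid' is a hypothetical file.
import mido

def list_parts(path):
    midi = mido.MidiFile(path)
    for index, track in enumerate(midi.tracks):
        programs = sorted({msg.program for msg in track
                           if msg.type == 'program_change'})
        note_count = sum(1 for msg in track
                         if msg.type == 'note_on' and msg.velocity > 0)
        print(f"track {index}: name={track.name!r}, "
              f"GM programs={programs}, notes={note_count}")

list_parts('song.mid')
```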
  • the extractor 23 of the terminal 20 extracts a predetermined feature value for each part included in the predetermined musical composition data obtained by the obtainer 22 (Step S5).
  • the receiver 21 of the terminal 20 receives an operation to specify the degree of difficulty of performance (proficiency of the performer) from the user (Step S6).
  • the determiner 24 of the terminal 20 determines one or more parts suitable to be performed with the predetermined musical instrument, from among the multiple parts included in the predetermined musical composition data (Step S7).
  • the controller 25 of the terminal 20 generates first performance data including the part determined by the determiner 24 and second performance data including parts other than the part determined by the determiner 24 among the parts included in the predetermined musical composition data (Step S8).
  • the first performance data and the second performance data may be generated as an SMF file.
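As an illustration of Step S8, the sketch below splits one SMF file into first performance data and second performance data, again assuming mido, a one-track-per-part layout, and a shared conductor (tempo) track in track 0; the output file names are hypothetical.

```python
# Sketch of Step S8 under the assumptions stated above: the determined
# part(s) go into the first performance data, the rest into the second,
# and the conductor (tempo/meta) track is copied to both SMF files.
import mido

def split_performance_data(path, selected_track_indices):
    src = mido.MidiFile(path)
    first = mido.MidiFile(ticks_per_beat=src.ticks_per_beat)
    second = mido.MidiFile(ticks_per_beat=src.ticks_per_beat)
    for index, track in enumerate(src.tracks):
        if index == 0:
            first.tracks.append(track)   # shared tempo/meta track
            second.tracks.append(track)
        elif index in selected_track_indices:
            first.tracks.append(track)   # part(s) to be performed by the user
        else:
            second.tracks.append(track)  # parts reproduced automatically
    first.save('first_performance.mid')
    second.save('second_performance.mid')
```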
  • the controller 25 of the terminal 20 displays the music based on the first performance data on the screen (Step S9).
  • the controller 25 of the terminal 20 may display, for example, a chord progression, which is simpler music with chord names (note names).
  • the obtainer 31 of the electronic musical instrument 30 obtains the first performance data and the second performance data from the terminal 20 (Step S10).
  • the guide unit 32 of the electronic musical instrument 30 guides (navigates and supports) a performance by the user based on the first performance data (Step S11), and outputs sound based on the performance data from the speaker (Step S12).
  • the guide unit 32 of the electronic musical instrument 30 guides the performance, for example, by lighting the operation part such as a keyboard.
  • the guide unit 32 of the electronic musical instrument 30 determines the progress of the performance according to the performing operations by the user (sequentially updates the current position of the performance in the first performance data), and in accordance with the progress of the performance, causes the reproducer 33 to sequentially generate musical sounds corresponding to notes included in the second performance data. This enables the user to perform on the electronic musical instrument 30 in accordance with the music while watching the music displayed on the terminal 20, which is, for example, a tablet terminal.
  • the controller 25 of the terminal 20 may guide the performance by, for example, displaying a keyboard or the like on the screen and lighting the keyboard or the like.
  • the electronic musical instrument 30 outputs sounds corresponding to operations on the keyboard or the like performed by the user from the speaker.
  • the electronic musical instrument 30 may output the sounds corresponding to the operations with the tone of the musical instrument specified for the part in the first performance data in the SMF file of the predetermined musical composition data, or may output them with the tone of a musical instrument specified by a user operation. This makes it possible, for example, to perform a part determined as the first performance data for a musical instrument having no keyboard, such as a guitar, with the tone of a piano or the like through performance operations using the keyboard or the like of the electronic musical instrument 30.
  • This also makes it possible, for example, to perform a part determined as the first performance data for a musical instrument having a keyboard, such as a piano, or for a musical instrument having no keyboard, such as a guitar, with the tone of a guitar or the like through performance operations using the keyboard or the like of the electronic musical instrument 30.
  • FIG. 5 is a flowchart illustrating an example of a process of generating a learned model by machine learning according to the embodiment.
  • In the following, it is assumed that the electronic musical instrument 30 is a keyboard instrument such as a piano, and that two parts are determined: a part for the right hand and a part for the left hand.
  • the disclosed technique can be applied not only to the case of determining two parts for a keyboard instrument, but also to the case of determining only one part.
  • the user may select the type of the musical instrument for which the part is to be selected from among multiple types of musical instruments stored in the learning data (training data).
  • the extractor 12 obtains musical composition data including multiple parts, and a data set of combinations of a part for the right hand and a part for the left hand selected from among the parts included in the musical composition data from the data for learning 111.
  • the extractor 12 may obtain only musical composition data that includes three or more parts as data to be processed.
  • FIG. 6 is a diagram illustrating an example of the data for learning 111 according to the embodiment.
  • the musical composition data ID is information for identifying an item of musical composition data.
  • the parts included in the musical composition data are parts included in musical composition data identified by the musical composition data ID, which constitute example data in supervised learning.
  • the parts included in the musical composition data may be performance information in which information on the pitch of a note, the strength of a note, and the like are encoded according to, for example, MIDI (Musical Instrument Digital Interface) standard.
  • MIDI Musical Instrument Digital Interface
  • a part selected for the right hand and a part selected for the left hand are parts that are determined as suitable for performing with the right hand and the left hand with a predetermined musical instrument, respectively, among the multiple parts included in the musical composition data, which correspond to a correct answer in supervised learning.
  • the musical composition data having a musical composition data ID of "001" includes "part 1A, part 1B, part 1C, part 1D, part 1E, part 1F, part 1G, and so on”; the part selected for the right hand is “part 1C”; and the part selected for the left hand is “part 1E”.
  • the data stored in the data for learning 111 may be set in advance by, for example, a company operating the server 10 or the like.
  • the extractor 12 may generate musical composition data from the musical composition data stored in the data for learning 111, by raising or lowering notes of each of multiple parts by a predetermined pitch, to execute data augmentation. This makes it possible to improve the precision of the learned model even when the number of samples is relatively small. Note that, for example, if notes in a part are "do, re, mi, do, ...", raising the pitch by two halftones generates "re, mi, fa#, re, ...". The extractor 12 may successively change the value of the pitch to be raised or lowered, to generate multiple data items of data augmentation based on one data item stored in the data for learning 111. In this case, the extractor 12 may change, for example, the value of the pitch to be raised or lowered from -10 to -1 and from 1 to 10 one by one, to generate 20 data items as data augmentation based on the one data item.
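A minimal sketch of this transposition-based data augmentation is shown below, assuming an illustrative representation of a part as a list of (start, duration, pitch) tuples with MIDI pitch numbers.

```python
# Sketch of the data augmentation: a part is assumed to be a list of
# (start, duration, pitch) tuples with MIDI pitch numbers (an illustrative
# representation, not prescribed by the embodiment).
def transpose(part, halftones):
    return [(start, duration, pitch + halftones)
            for start, duration, pitch in part]

def augment(part):
    # Shift by -10..-1 and +1..+10 halftones, yielding 20 variants per part.
    return [transpose(part, shift) for shift in range(-10, 11) if shift != 0]

# Example: "do, re, mi, do" raised by two halftones becomes "re, mi, fa#, re".
do_re_mi_do = [(0.0, 1.0, 60), (1.0, 1.0, 62), (2.0, 1.0, 64), (3.0, 1.0, 60)]
print(transpose(do_re_mi_do, 2))  # pitches become 62, 64, 66, 62
```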
  • the extractor 12 extracts a predetermined feature value for each part included in the musical composition data (Step S102).
  • the extractor 12 may extract, as the predetermined feature value, at least one of the average or variance of the time length or the pitch with which the sound of each note included in the part is output; the average or variance of the sound length or the pitch of the highest note at each point in time in the case where there are multiple notes to be output at the same time; the average or variance of the number of notes output per unit time by notes included in the part; the ratios of monophony and polyphony in the part; and the ratios of occurrences of same pitch motion, conjunct motion, and disjunct motion in the part.
  • FIG. 7 is a diagram illustrating feature values for parts used for machine learning.
  • FIG. 7 illustrates notes 701 to 708 written in music, and times 701A to 708A during which the sounds of the notes 701 to 708 are output, respectively, specified in MIDI data.
  • the extractor 12 may use the average and variance of the time length (sound length or note value) and pitch with which the sound of each note included in the part is output, as the feature value of the part. In this case, the extractor 12 may calculate the sound length with the length of, for example, a quarter note treated as one unit. Also, for example, similarly to the numerical representation of the pitch in MIDI, the extractor 12 may assume the pitch value of C0 in International Pitch Notation to be 12, and calculate the pitch value by incrementing it by one every time the pitch is raised by a halftone from C0.
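The following sketch computes these length and pitch statistics under the same illustrative (start, duration, pitch) representation, treating a quarter note as a length of 1.0 and using MIDI pitch numbers (C0 = 12).

```python
# Sketch: average and variance of sound length (quarter note = 1.0) and of
# MIDI pitch number (C0 = 12) for one part, using the same illustrative
# (start, duration, pitch) representation.
from statistics import mean, pvariance

def length_and_pitch_features(part):
    durations = [duration for _, duration, _ in part]
    pitches = [pitch for _, _, pitch in part]
    return {
        'length_mean': mean(durations),
        'length_var': pvariance(durations),
        'pitch_mean': mean(pitches),
        'pitch_var': pvariance(pitches),
    }
```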
  • the extractor 12 may use the average and variance of the sound length and the pitch of the highest note at each point in time as the feature value of the part. This is because human ears tend to perceive a higher tone more easily, and the highest tones often form a melody line.
  • In this case, in the example in FIG. 7, the lengths of the sounds of the highest notes at the respective points in time are the times 701A to 703A, 705A, and 708A; the time 704B not overlapping with the time 705A in the time 704A during which the sound of the note 704 is output; and the time 707B not overlapping with the time 708A in the time 707A during which the sound of the note 707 is output.
  • the extractor 12 may use the average and variance of the number of notes output per unit time (e.g., one beat) by notes included in the part as the feature value of the part. This is because the number of notes output per unit time differs depending on the type of instrument; for example, instruments such as drums output a relatively greater number of notes per unit time.
  • the extractor 12 may use the ratio of monophony and polyphony in the part as the feature value of the part.
  • the monophony means, for example, that the number of notes output at the same time is one.
  • the polyphony means, for example, that the number of notes output at the same time is plural. This is because in a part for an instrument using a keyboard, such as a piano, multiple notes are often performed at the same time with one hand.
  • the extractor 12 may use the ratio of the monophonic time length and the ratio of the polyphonic time length in the time length (sound producing time) of the sounds output by the notes included in the part as the feature value of the part.
  • the extractor 12 may use the ratio of monophonic notes and the ratio of polyphonic notes among the notes included in the part as the feature value of the part.
  • Even when the output timings of multiple notes are slightly shifted from each other, if the shift is within a predetermined time, the extractor 12 may determine that the multiple notes are polyphonic. This is because even when multiple notes are written on the same time position in music as in the case of the notes 707 and 708 in FIG. 7, in the MIDI data, in order to reproduce a human performance, these notes are specified as shifted by several milliseconds to 100 milliseconds from each other as in the case of the time 707A and the time 708A.
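A sketch of the monophony/polyphony ratio under these assumptions follows; the 0.1-second grouping tolerance is an assumed value motivated by the up-to-100-millisecond shifts mentioned above.

```python
# Sketch: ratio of monophony and polyphony. Notes whose start times differ
# by no more than an assumed tolerance (0.1 s here) are grouped and treated
# as sounding at the same time; notes are (start_seconds, duration, pitch)
# tuples sorted by start time.
def polyphony_ratio(notes, tolerance=0.1):
    groups, current = [], [notes[0]]
    for note in notes[1:]:
        if note[0] - current[0][0] <= tolerance:
            current.append(note)      # part of the same chord
        else:
            groups.append(current)
            current = [note]
    groups.append(current)
    polyphonic = sum(1 for group in groups if len(group) > 1)
    return {'polyphony': polyphonic / len(groups),
            'monophony': (len(groups) - polyphonic) / len(groups)}
```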
  • the extractor 12 may use the ratios of the numbers of occurrences of the same pitch motion, conjunct motion, and disjunct motion in the part as the feature value of the part.
  • the same pitch motion means, for example, that the pitch of one note and the pitch of a note next to the one note are the same.
  • the conjunct motion means, for example, that the pitch of the note next to one note is higher or lower than the pitch of the one note by one unit.
  • the disjunct motion means, for example, that the pitch of the note next to one note is higher or lower than the pitch of the one note by two units or more. This is because it is often the case that the ratios of the same pitch motion and the others vary depending on the types of musical instruments.
  • the extractor 12 may set the ratios of the numbers of occurrences of the same pitch motion, conjunct motion, and disjunct motion with respect to the highest note among the multiple notes, as the feature value of the part.
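The motion-ratio features could be computed as in the following sketch, applied to a monophonic pitch sequence such as the highest note at each point in time; interpreting "one unit" as an interval of at most two halftones is an assumption made for illustration.

```python
# Sketch: ratios of same pitch motion, conjunct motion, and disjunct motion
# for a monophonic pitch sequence (e.g. the highest note at each point in
# time). Treating an interval of up to two halftones as conjunct is an
# assumed interpretation of "one unit".
def motion_ratios(pitches):
    same = conjunct = disjunct = 0
    for prev, nxt in zip(pitches, pitches[1:]):
        interval = abs(nxt - prev)
        if interval == 0:
            same += 1
        elif interval <= 2:
            conjunct += 1
        else:
            disjunct += 1
    total = max(len(pitches) - 1, 1)
    return {'same': same / total,
            'conjunct': conjunct / total,
            'disjunct': disjunct / total}
```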
  • The feature values described above are suitable for selecting a part to be performed, and include at least a feature value that influences the difficulty of performance and the quality of performed sounds in a manner common to various musical instruments such as keyboard instruments, wind instruments, string instruments, percussion instruments, and the like, and a feature value that influences the difficulty of performance and the quality of performed sounds in the case of performing with a specific type of musical instrument.
  • the generator 13 performs machine learning on the learning model, to generate data of a learned model (Step S103).
  • the generator 13 may use algorithms such as GBDT (gradient boosting decision tree), SVM (Support Vector Machine), neural network, deep learning, linear regression, logistic regression, and the like to perform machine learning.
  • the generator 13 may use another well-known algorithm to perform machine learning.
  • the learning model described above has a data structure such as a neural network on which learning can be performed by a learning program for a neural network or the like.
  • Although the learned model may have a data structure such as a neural network on which learning can be performed by a learning program for a neural network or the like, it may instead be provided with an equivalent function in a converted form, for example, as executable program code and data written in a general-purpose programming language such as the C language.
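As an illustration only, the sketch below trains a gradient-boosted model on per-part feature vectors using scikit-learn's GradientBoostingClassifier; the embodiment describes its own boosting and voting procedure (detailed with reference to FIG. 8 below), so this stand-in merely shows the general shape of the learning step. The label encoding is an assumption.

```python
# Illustration only: training a gradient-boosted classifier on per-part
# feature vectors with scikit-learn, as a stand-in for the GBDT procedure
# of FIG. 8. X has one row of feature values per part; the label encoding
# (0 = other, 1 = right hand, 2 = left hand) is an assumption.
from sklearn.ensemble import GradientBoostingClassifier

def train_part_selection_model(X, y, n_trees=300):
    model = GradientBoostingClassifier(n_estimators=n_trees)
    model.fit(X, y)
    return model

# In the execution phase, model.predict_proba(features) yields, for each
# part, probabilities corresponding to the "naturality" values used later.
```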
  • FIG. 8 is a flowchart illustrating an example of a process of performing machine learning by using GBDT. Note that, in the following, executing a series of steps from Step S201 to Step S205 once will be referred to as current learning.
  • the generator 13 determines data to be used in the current learning, among data of pairs of example data and correct answer data obtained from the data for learning 111.
  • the data used for the current learning may be determined randomly.
  • the generator 13 determines a feature value to be used for the current learning from among multiple feature values (Step S202).
  • the generator 13 may randomly determine the feature value used in the current learning. Even if a feature value not suitable for selecting a part to be performed is selected due to such a random determination, repeated learning automatically comes to select (give a higher weight to) a feature value suitable for selecting a part to be performed.
  • the generator 13 determines a decision tree based on the data used in the current learning and the feature value used in the current learning (Step S203).
  • the generator 13 calculates a branch condition for reducing the average amount of information (entropy) of a classified result, to generate a decision tree having the branch condition.
  • the generator 13 determines the number of votes for each leaf of the decision tree based on the data used in the current learning and the decision tree generated in the current learning (Step S204).
  • the generator 13 introduces differences among the numbers of votes based on classified results obtained by multiple decision trees generated up to the current learning, so as to raise the correct answer rate when a majority decision is made based on the multiple decision trees for the classified results obtained by the multiple decision trees. This increases the number of votes for a leaf (a node) of the decision tree having a relatively high correct answer rate, and decreases the number of votes for a leaf of the decision tree having a relatively low correct answer rate.
  • the generator 13 gives a weight to data misclassified by the decision tree generated in the current learning (Step S205).
  • the generator 13 gives the weight to the misclassified data so that the average amount of information is estimated relatively greater for the misclassified data. This enables the data misclassified by the decision tree generated in the current learning to tend to be classified correctly in a decision tree generated in the next learning.
  • The generator 13 determines whether or not a termination condition is met (Step S206).
  • the generator 13 determines that the termination condition is met. If the termination condition is not met (NO at Step S206), the process proceeds to Step S201. On the other hand, if the termination condition is met (YES at Step S206), the process ends.
  • FIG. 9 is a diagram illustrating an example of data of a learned model using GBDT.
  • data of a learned model includes data of multiple (e.g., several hundreds) decision trees 801 to 804 and so on generated by executing the series of steps from Step S201 to Step S205 in FIG. 8.
  • For each leaf of the decision tree 801, the number of votes determined at Step S204 is set.
  • For the decision trees 802 to 804 and so on, the number of votes is set similarly to the example of the decision tree 801. Thereby, as will be described later, in a process in the execution phase on the terminal 20, one of the classified results is selected from among the classified results by the decision trees by majority decision according to the number of votes for each leaf of each decision tree.
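A minimal sketch of this vote-based majority decision is given below; the per-tree evaluate() method returning a (label, votes) pair is a hypothetical interface used only for illustration.

```python
# Sketch: majority decision across the decision trees 801 to 804 and so on.
# Each tree classifies the feature vector and contributes the number of
# votes set on the reached leaf; tree.evaluate() is a hypothetical
# per-tree interface used only for illustration.
def classify_by_majority(feature_vector, trees):
    totals = {}
    for tree in trees:
        label, votes = tree.evaluate(feature_vector)
        totals[label] = totals.get(label, 0) + votes
    return max(totals, key=totals.get)
```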
  • FIG. 10 is a flowchart illustrating an example of a process of determining a part based on data of a learned model.
  • the average value of the pitch may be calculated for each of the two parts, to determine the part having the higher average value as the part for the right hand, and the other part having the lower average value as the part for the left hand.
  • the extractor 23 extracts a feature value from each part included in the predetermined musical composition data.
  • the extractor 23 extracts the same feature value as the feature value extracted in the process of Step S102 in FIG. 5.
  • the determiner 24 adjusts parameters in the data of the learned model according to the degree of difficulty of performance specified by the user (Step S302).
  • the determiner 24 may relatively increase the number of votes for a classified result that has been classified by a condition that the ratio of polyphony is relatively high (the ratio of monophony is relatively low).
  • the determiner 24 may relatively decrease the number of votes for a classified result that has been classified by a condition that the ratio of polyphony is relatively low.
  • the determiner 24 may relatively increase the number of votes for a classified result that has been classified by a condition that the average or variance of the number of notes output per unit time is relatively great in each decision tree.
  • the determiner 24 may relatively decrease the number of votes for a classified result that has been classified by a condition that the average or variance of the number of notes output per unit time is relatively small. This makes it possible to select, for example, for a user having a high level of proficiency in performance, a part with a high degree of difficulty of performance that includes a relatively great number of points at which a relatively great number of notes are output at the same time.
  • the server 10 may generate a learned model according to the degree of difficulty of performance based on the data for learning according to the degree of difficulty of performance, so that the determiner 24 uses a learned model specified by the user according to the degree of difficulty of performance.
  • the determiner 24 estimates the naturality of the part with respect to a predetermined musical instrument based on the data of the learned model (Step S303).
  • the predetermined musical instrument may be fixed in advance to one musical instrument such as a keyboard instrument or may be selected by the user from among multiple types of musical instruments such as keyboard musical instruments, wind instruments, and string instruments.
  • the determiner 24 calculates a probability value indicating the naturality as a part for the right hand, a probability value indicating the naturality as a part for the left hand, and a probability value indicating the naturality as another part.
  • the determiner 24 may convert a value voted as a part for the right hand into a probability value by using the softmax function or the like.
  • the determiner 24 may add a predetermined weight in accordance with the degree of difficulty of performance specified by the user to the probability value indicating the naturality of the part with respect to a predetermined musical instrument. In this case, as the degree of difficulty of performance specified by the user goes higher, for each part included in the predetermined musical composition data, the determiner 24 may adjust the probability value to be greater, for example, for a part in which the ratio of polyphony is relatively high (the ratio of monophony is relatively low).
  • the determiner 24 may adjust the probability value to be greater, for example, for a part that has a relatively great average or variance of the number of notes output per unit time. This makes it possible, for example, to adjust an estimation result of a learned model using GBDT, SVM, a neural network, or the like, and to select, for a user having a high level of proficiency in performance, a part with a high degree of difficulty of performance that includes a relatively great number of points at which a relatively great number of notes are output at the same time.
  • the determiner 24 determines a part whose probability of the naturality as a part for the right hand is the highest among the parts included in the predetermined musical composition data, as the part for the right hand (Step S304).
  • the determiner 24 determines a part whose probability of the naturality as a part for the left hand is the highest among the parts included in the predetermined musical composition data and other than the part determined as the part for the right hand, as the part for the left hand (Step S305), and ends the process.
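The following sketch illustrates Steps S303 to S305 under the assumption that each part has a vector of vote totals in the order (right hand, left hand, other): the totals are converted into probabilities with the softmax function, the right-hand part is the one with the highest right-hand probability, and the left-hand part is chosen from the remaining parts.

```python
# Sketch of Steps S303 to S305: convert per-part vote totals in the assumed
# order (right hand, left hand, other) into probabilities with softmax,
# pick the part with the highest right-hand probability, then pick the
# left-hand part from the remaining parts.
import numpy as np

def softmax(votes):
    exp = np.exp(votes - np.max(votes))
    return exp / exp.sum()

def assign_hands(vote_totals, part_names):
    probs = np.array([softmax(row) for row in vote_totals])
    right = int(np.argmax(probs[:, 0]))
    remaining = [i for i in range(len(part_names)) if i != right]
    left = max(remaining, key=lambda i: probs[i, 1])
    return part_names[right], part_names[left]
```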
  • This makes it possible to select music to be performed by one piano or the like, for example, from musical composition data in which parts for multiple musical instruments are to be performed together.
  • As the feature value, at least one of the variance of the length of sounds included in a part, the variance of the pitch of sounds included in a part, and the variance of the number of sounds to be output per unit time in a part may be extracted.
  • Also, at least one of the ratio of monophony and polyphony in a part, and the ratios of the numbers of occurrences of the same pitch motion, conjunct motion, and disjunct motion in a part, may be extracted as the feature value.
  • a part to be performed may be selected from among the multiple classified parts in accordance with the type of musical instrument selected by the user.
  • Although the devices used by the user for performance are divided into the terminal 20 and the electronic musical instrument 30 in the above description, their functions may be implemented in a single device.
  • the terminal 20 may be provided with a sound generation function and a performing operation function of the electronic musical instrument 30 (to emulate the function of a musical instrument by using a display screen with a touch panel on the terminal 20), or the electronic musical instrument 30 may be provided with a communication function and various processing functions of the terminal 20.
  • Each of the functional units of the server 10 and the terminal 20 may be implemented by, for example, cloud computing constituted with one or more computers. At least a part of the functional units of the terminal 20 may be provided on the server 10.
  • the obtainer 22, the extractor 23, the determiner 24, and the like may be provided on the server 10 so that the server 10 obtains predetermined musical composition data from the terminal 20 or the like, to generate the first performance data and the second performance data so as to deliver these data items to the terminal 20. Also, at least a part of the functional units of the server 10 may be provided on the terminal 20.
  • the server 10, the terminal 20, and the electronic musical instrument 30 may be configured as an integrated device.
  • the server 10 and the terminal 20 may be configured as an integrated device.
  • the terminal 20 and the electronic musical instrument 30 may be configured as an integrated device.
  • the terminal 20 may be built in the housing of the electronic musical instrument 30, or the operation part such as a keyboard of the electronic musical instrument 30 may be implemented with the touch panel or the like of the terminal 20.
  • the extractor 12 is an example of a "learning-phase extractor”.
  • the extractor 23 is an example of an "execution-phase extractor”.
  • Generating a model for selecting a part by machine learning makes it possible to generate a complex model (a highly precise model) that includes criteria which may be impossible (or nearly impossible) for a human being to determine by manual work.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

A machine learning method of causing a learning model to learn is executed by a processor. The machine learning method includes extracting a feature value for each of a plurality of parts included in each of a plurality of items of musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument; and performing machine learning based on the extracted feature value and information representing a part to be performed with the specific type of musical instrument among a plurality of parts included in musical composition data, so as to cause the learning model to learn to be capable of selecting a part to be performed with the specific type of musical instrument from among a plurality of parts included in musical composition data different from any item of the plurality of items of musical composition data.

Description

MACHINE LEARNING METHOD, ELECTRONIC APPARATUS, ELECTRONIC MUSICAL INSTRUMENT, MODEL GENERATOR FOR PART SELECTION, AND METHOD OF PART DETERMINATION
The present disclosure relates to a technique for selecting a part included in musical composition data.
Conventionally, techniques have been known that create arranged music to be performed by one piano (piano reduction), from music for respective musical instruments (respective parts) in a musical composition to be performed with multiple musical instruments other than a piano (see, for example, Non-Patent Document 1).

[NPL 1] "Automatic Piano Reduction from Ensemble Scores Based on Merged-Output Hidden Markov Model" [online] [searched on February 8, 2018] Internet <URL: http://eita-nakamura.github.io/articles/Nakamura-Sagayama_AutomaticPianoReduction_ICMC2015.pdf>
Since the music of each part is written on the assumption that it will be performed with one specific type of instrument, or with one of the left or right hands on a keyboard instrument, it may be relatively easy to perform even when performed with another instrument or the like. However, in the conventional techniques, since main notes are extracted from among the notes in the music of multiple parts, the ease of performing of the original music may be lost in some cases. Therefore, in one aspect, it is an object to provide a technique that is capable of selecting a part to be performed with a predetermined musical instrument from among multiple parts.
According to an aspect of the present inventive concept, a machine learning method of causing a learning model to learn is executed by a processor. The machine learning method includes extracting a feature value for each of a plurality of parts included in each of a plurality of items of musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument; and performing machine learning based on the extracted feature value and information representing a part to be performed with the specific type of musical instrument among a plurality of parts included in musical composition data, so as to cause the learning model to learn to be capable of selecting a part to be performed with the specific type of musical instrument from among a plurality of parts included in musical composition data different from any item of the plurality of items of musical composition data.
According to an aspect of the present inventive concept, an electronic device includes a memory configured to store a learned model generated by machine learning; and a processor. The processor is configured to execute extracting a feature value for each of a plurality of parts included in musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument; inputting the extracted feature value into the learned model; obtaining information for selecting a part to be performed with the specific type of musical instrument from among the plurality of parts included in the musical composition data; and determining, based on the obtained information, the part to be performed by the specific type of musical instrument from among the plurality of parts.
According to an aspect of the present inventive concept, an electronic musical instrument includes an operation part configured to receive a performing operation; a sound generator configured to generate a sound corresponding to the performing operation performed on the operation part; a memory configured to store a learned model generated by machine learning; and a processor. The processor is configured to execute extracting a feature value for each of a plurality of parts included in musical composition data, the feature value relating to suitability of a performance with the electronic musical instrument; inputting the extracted feature value into the learned model; obtaining information for selecting a part to be performed with the electronic musical instrument from among the plurality of parts included in the musical composition data; and determining, based on the obtained information, the part to be performed by the electronic musical instrument from among the plurality of parts.
According to an aspect of the present inventive concept, a model generator for part selection includes a memory configured to store a learned model generated by machine learning; and a processor. The processor is configured to execute extracting a feature value for each of a plurality of parts included in each of a plurality of items of musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument; and performing machine learning based on the extracted feature value and information representing a part to be performed with the specific type of musical instrument among a plurality of parts included in musical composition data, so as to generate the learned model that outputs information for selecting a part to be performed with the specific type of musical instrument from among a plurality of parts included in musical composition data different from any item of the plurality of items of musical composition data.
According to an aspect of the present inventive concept, a method of part determination for determining a part to be performed with a specific type of musical instrument from among a plurality of parts included in musical composition data by using a learned model is executed by a processor. The method includes extracting a feature value for each of a plurality of parts included in musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument; inputting the extracted feature value into the learned model; obtaining information for selecting a part to be performed with the specific type of musical instrument from among the plurality of parts included in the musical composition data; and determining, based on the obtained information, the part to be performed by the specific type of musical instrument from among the plurality of parts.

FIG. 1 is a diagram illustrating an example of a configuration of an information processing system according to an embodiment; FIG. 2 is a diagram illustrating an example of a hardware configuration of a server, a terminal, and an electronic musical instrument according to an embodiment; FIG. 3 is a diagram illustrating an example of a functional configuration of an information processing apparatus according to an embodiment; FIG. 4 is a sequence chart illustrating an example of a process of an information processing system according to an embodiment; FIG. 5 is a flowchart illustrating an example of a process of generating a learned model by machine learning according to an embodiment; FIG. 6 is a diagram illustrating an example of data for learning according to an embodiment; FIG. 7 is a diagram illustrating a feature value for each part used in machine learning; FIG. 8 is a flowchart for illustrating an example of a process when machine learning is performed by using GBDT; FIG. 9 is a diagram illustrating an example of data of a learned model obtained by using a GBDT; and FIG. 10 is a flowchart illustrating an example of a process of determining a part based on data of a learned model.
In the following, embodiments of the present inventive concept will be described with reference to the drawings.
<System configuration>
FIG. 1 is a diagram illustrating an example of a configuration of an information processing system 1 according to an embodiment. In FIG. 1, the information processing system 1 includes a server 10, a terminal 20, and an electronic musical instrument 30. Note that the number of devices is not limited to the example in FIG. 1.
Communication is established between the server 10 and the terminal 20 through a network 50 such as a cellular phone network, a LAN (Local Area Network), a wireless LAN, the Internet, and the like. The terminal 20 and the electronic musical instrument 30 are connected with each other by, for example, a USB cable, short-range wireless communication, or the like.
The server 10 is an information processing apparatus (a computer or electronic device) used as a server. The server 10 performs machine learning for causing a part selection model (learning model) to learn based on data for learning, to generate a learned model for selecting a part to be performed with a predetermined instrument from among multiple parts included in musical composition data.
The terminal 20 is, for example, an information processing apparatus such as a tablet terminal, a smartphone, a desktop PC (Personal Computer), a notebook PC, or the like. Based on data of the part selection model (learned model) obtained from the server 10 and musical composition data specified by the user, the terminal 20 selects a part to be performed with the predetermined instrument.
The electronic musical instrument 30 is, for example, an electronic musical instrument such as an electronic keyboard, an electronic organ, an electronic piano, an electronic wind instrument, an electronic string instrument, a synthesizer, or the like. The electronic musical instrument 30 outputs sound based on the musical composition data input from the terminal 20. The electronic musical instrument 30 also outputs sound in response to an operation performed by the user.
<Hardware configuration>
FIG. 2 is a diagram illustrating an example of a hardware configuration of the server 10, the terminal 20, and the electronic musical instrument 30 according to the embodiment. In the following, the configuration will be described taking the server 10 as an example. The server 10 in FIG. 2 includes a drive device 100, an auxiliary storage device 102, a memory device 103, a CPU (Central Processing Unit) 104, an interface device 105, a display device 106, an input device 107, and an output device 108, which are connected with each other through a bus B.
A program for implementing a process on the server 10 (an information processing program) is provided by way of a recording medium 101. When the recording medium 101 storing the program is set in the drive device 100, the program is installed from the recording medium 101 into the auxiliary storage device 102 via the drive device 100. However, it is not always necessary to install the program from the recording medium 101; the program may be downloaded from another computer via the network. The auxiliary storage device 102 stores the installed programs, and stores necessary files, data, and the like.
Upon receiving a command to activate the program, the memory device 103 reads out the program from the auxiliary storage device 102 and stores it. The CPU 104 implements functions relating to the server 10 according to the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network or the like. The display device 106 displays a GUI (Graphical User Interface) or the like according to the program. The input device 107 is constituted with a keyboard, a mouse, a touch panel, buttons, and the like, to be used for inputting various operation commands. The output device 108 is constituted with a speaker or the like, to be used for outputting various sounds.
Note that as an example of the recording medium 101, a portable recording medium such as CD-ROM, DVD disk, USB memory, or the like may be considered. Also, as an example of the auxiliary storage device 102, an HDD (Hard Disk Drive), a flash memory, or the like may be considered. Each of the recording medium 101 and the auxiliary storage device 102 corresponds to a computer-readable recording medium. The server 10 may further include a GPU (Graphics Processing Unit) so as to perform at least a part of the process relating to machine learning, which will be described later, by using the GPU. In this case, an accelerator board on which the GPU is mounted may be connected to the bus B via the interface device 105 or the like.
Note that the hardware configuration of the terminal 20 and the electronic musical instrument 30 may be substantially the same as the hardware configuration example of the server 10 illustrated in FIG. 2. The input device 107 in the electronic musical instrument 30 is constituted with, for example, a keyboard to input a note to be performed. The electronic musical instrument 30 also includes as the output device 108 a sound generator (a sound source, a speaker, etc.) to generate (reproduce) a performed note or the like. The input device and the output device of the electronic musical instrument 30 are examples of an "operation part" and a "sound generator", respectively.
<Functional configuration>
Next, with reference to FIG. 3, functional configurations of the server 10 and the terminal 20 according to the embodiment will be described. FIG. 3 is a diagram illustrating an example of functional configurations of the server 10 and the terminal 20 according to the embodiment.
<<Functional structure of server 10>>
The server 10 includes a storage 11. The storage 11 is implemented by using, for example, the auxiliary storage device 102 and the like. The storage 11 stores data such as data for learning 111. In the data for learning 111, a data set is stored that includes pairs of data of notes of multiple parts included in an item of musical composition data (performance information), and information representing a part selected in advance from among the multiple parts included in the item of musical composition data for a predetermined musical instrument (training data). A detailed example of the data for learning 111 will be described later.
The server 10 also includes an extractor 12, a generator 13, and an output unit 14. These units are implemented by processes which one or more programs installed in the server 10 cause the CPU 104 of the server 10 to execute.
The extractor 12 extracts a predetermined feature value for each part included in each of the multiple items of musical composition data ("first musical composition data") stored in the data for learning 111.
Based on the predetermined feature value extracted by the extractor 12 and the information representing the part for the predetermined musical instrument stored in the data for learning 111, the generator 13 generates a learned model for selecting a part to be performed with the predetermined musical instrument from among the multiple parts included in an item of musical composition data specified by the user or the like.
The output unit 14 outputs the data of the learned model generated by the generator 13 to the terminal 20. For example, the output unit 14 may attach the data of the learned model to an application installed in the terminal 20, to deliver the application to the terminal 20 via an external server or the like.
<<Functional configuration of terminal 20>>
The terminal 20 includes a receiver 21, an obtainer 22, an extractor 23, a determiner 24, a controller 25, and an output unit 26. These units are implemented by processes which one or more programs installed in the terminal 20 cause the CPU of the terminal 20 to execute.
The receiver 21 receives various operations from the user of the terminal 20.
The obtainer 22 obtains the data of the learned model for selecting the part to be performed with the predetermined musical instrument from among the multiple parts included in the musical composition data from the server 10. Also, in response to a user operation or the like, the obtainer 22 obtains musical composition data ("second musical composition data") including multiple parts from an external server or the like.
The extractor 23 extracts a predetermined feature value for each of the multiple parts included in the musical composition data obtained by the obtainer 22.
The determiner 24 determines, based on the learned model obtained by the obtainer 22 and the predetermined feature value for each of the multiple parts extracted by the extractor 23, a part to be performed with a predetermined musical instrument, from among the multiple parts included in the musical composition data obtained by the obtainer 22.
The controller 25 displays music corresponding to the part determined by the determiner 24 on a screen. The output unit 26 outputs the part information and the like determined by the determiner 24 to the electronic musical instrument 30. For example, the output unit 26 outputs, to the electronic musical instrument 30, first performance data including the part determined by the determiner 24 and second performance data including the parts of the predetermined musical composition data other than the determined part.
<<Functional configuration of electronic musical instrument 30>>
The electronic musical instrument 30 includes an obtainer 31, a guide unit 32, and a reproducer 33. These units are implemented by processes which one or more programs installed in the electronic musical instrument 30 cause the CPU of the electronic musical instrument 30 to execute.
The obtainer 31 obtains the first performance data and the second performance data from the terminal 20. The guide unit 32 guides a performance by the user based on the first performance data obtained by the obtainer 31. Following instructions of the guide unit 32, the reproducer 33 reproduces sounds corresponding to notes included in the second performance data, to output the sounds from a speaker.
<Process>
Next, with reference to FIG. 4, a process of the information processing system 1 according to the embodiment will be described. FIG. 4 is a sequence chart illustrating an example of a process of the information processing system 1 according to the embodiment.
At Step S1, the extractor 12 of the server 10 extracts a predetermined feature value for each part included in musical composition data stored in the data for learning 111.
Next, based on the feature value extracted by the extractor 12 and information representing a part for a predetermined musical instrument stored in the data for learning 111, the generator 13 of the server 10 generates a learned model for selecting a part to be performed with the predetermined musical instrument from among the multiple parts included in musical composition data specified by the user or the like (Step S2).
Next, the obtainer 22 of the terminal 20 obtains the data of the learned model from the server 10 in response to a user operation or the like (Step S3).
Next, the obtainer 22 of the terminal 20 obtains predetermined musical composition data in response to a user operation or the like (Step S4). Here, the predetermined musical composition data may be, for example, data (an SMF file) generated in a format of the SMF (Standard MIDI File) standard. The predetermined musical composition data includes data of multiple parts, and the musical instrument (the tone to be output) of each part may be specified in a format of the GM (General MIDI) standard. The obtainer 22 of the terminal 20 may download the predetermined musical composition data from a server or the like on the Internet in response to, for example, a user operation.
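For illustration only, the following is a minimal Python sketch of reading such an SMF file and listing its candidate parts, assuming the third-party mido library is available and that each MIDI track corresponds to one part; the function name list_parts and these assumptions are not taken from the embodiment.

```python
# A minimal sketch (not part of the embodiment) of reading an SMF file and
# listing candidate parts, assuming the third-party "mido" library and that
# each MIDI track corresponds to one part; list_parts() is an illustrative name.
import mido

def list_parts(smf_path):
    """Return (track index, GM program numbers, note count) for each track with notes."""
    midi = mido.MidiFile(smf_path)
    parts = []
    for index, track in enumerate(midi.tracks):
        programs = {msg.program for msg in track if msg.type == 'program_change'}
        note_count = sum(1 for msg in track if msg.type == 'note_on' and msg.velocity > 0)
        if note_count:                      # skip tempo/meta-only tracks
            parts.append((index, sorted(programs), note_count))
    return parts
```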
Next, the extractor 23 of the terminal 20 extracts a predetermined feature value for each part included in the predetermined musical composition data obtained by the obtainer 22 (Step S5). Next, the receiver 21 of the terminal 20 receives an operation to specify the degree of difficulty of performance (proficiency of the performer) from the user (Step S6).
Next, based on the degree of difficulty, the predetermined musical composition data, and the data of the learned model, the determiner 24 of the terminal 20 determines one or more parts suitable to be performed with the predetermined musical instrument, from among the multiple parts included in the predetermined musical composition data (Step S7).
Next, the controller 25 of the terminal 20 generates first performance data including the part determined by the determiner 24 and second performance data including parts other than the part determined by the determiner 24 among the parts included in the predetermined musical composition data (Step S8). Here, the first performance data and the second performance data may be generated as an SMF file.
Next, the controller 25 of the terminal 20 displays the music based on the first performance data on the screen (Step S9). Here, the controller 25 of the terminal 20 may display, for example, a chord progression, which is a simpler representation of the music using chord names (note names).
Next, the obtainer 31 of the electronic musical instrument 30 obtains the first performance data and the second performance data from the terminal 20 (Step S10). Next, in response to receiving a predetermined operation from the user, the guide unit 32 of the electronic musical instrument 30 guides (navigates and supports) a performance by the user based on the first performance data (Step S11), and outputs sound based on the performance data from the speaker (Step S12). Here, the guide unit 32 of the electronic musical instrument 30 guides the performance, for example, by lighting the operation part such as a keyboard. Also, the guide unit 32 of the electronic musical instrument 30 determines the progress of the performance according to the performing operations by the user (sequentially updates the current position of the performance in the first performance data), and in accordance with the progress of the performance, causes the reproducer 33 to sequentially generate musical sounds corresponding to notes included in the second performance data. This enables the user to perform on the electronic musical instrument 30 in accordance with the music while watching the music displayed on the terminal 20, which is, for example, a tablet terminal.
Note that at least a part of the process of guiding the performance at Step S11 and the process of outputting the sound at Step S12 may be executed by the controller 25 of the terminal 20. In this case, the controller 25 of the terminal 20 may guide the performance by, for example, displaying a keyboard or the like on the screen and lighting the keyboard or the like.
Also, the electronic musical instrument 30 outputs sounds corresponding to operations on the keyboard or the like performed by the user from the speaker. In this case, the electronic musical instrument 30 may output the sounds corresponding to the operations with the tone of the musical instrument specified for the part of the first performance data in the SMF file of the predetermined musical composition data, or with the tone of a musical instrument specified by a user operation. This makes it possible, for example, to perform a part determined as the first performance data for a musical instrument having no keyboard, such as a guitar, with the tone of a piano or the like through performance operations using a keyboard or the like of the electronic musical instrument 30. It also makes it possible, for example, to perform a part determined as the first performance data, whether for a musical instrument having a keyboard such as a piano or for a musical instrument having no keyboard such as a guitar, with the tone of a guitar or the like through performance operations using a keyboard or the like of the electronic musical instrument 30.
<<Process of generating learned model>>
Next, with reference to FIG. 5 to FIG. 7, a process of generating a learned model by machine learning on the server 10, which corresponds to Step S1 and Step S2 in FIG. 4, will be described. FIG. 5 is a flowchart illustrating an example of a process of generating a learned model by machine learning according to the embodiment. In the following, an example will be described in which the electronic musical instrument 30 is a keyboard instrument such as a piano and two parts are determined as a part for the right hand and a part for the left hand. Note that the disclosed technique can be applied not only to the case of determining two parts for a keyboard instrument, but also to the case of determining only one part. Also, the user may select the type of the musical instrument for which the part is to be selected from among multiple types of musical instruments stored in the learning data (training data).
At Step S101, the extractor 12 obtains musical composition data including multiple parts, and a data set of combinations of a part for the right hand and a part for the left hand selected from among the parts included in the musical composition data from the data for learning 111. Here, when the electronic musical instrument 30 is an electronic musical instrument performed, for example, with both hands by the user, such as a keyboard, the extractor 12 may obtain only musical composition data that includes three or more parts as data to be processed.
FIG. 6 is a diagram illustrating an example of the data for learning 111 according to the embodiment. In an example of the data for learning 111 illustrated in FIG. 6, for each of the multiple musical composition data IDs, parts included in the musical composition data, a part selected for the right hand, and a part selected for the left hand are associated with each other to be stored. The musical composition data ID is information for identifying an item of musical composition data. The parts included in the musical composition data are parts included in musical composition data identified by the musical composition data ID, which constitute example data in supervised learning. The parts included in the musical composition data may be performance information in which information on the pitch of a note, the strength of a note, and the like are encoded according to, for example, MIDI (Musical Instrument Digital Interface) standard. Here, although data for learning (training data) suitable for selecting parts to be performed with a keyboard instrument is used, in the case of selecting a part to be performed with a musical instrument other than a keyboard instrument, a different set of data for learning (training data) is used.
A part selected for the right hand and a part selected for the left hand are parts that are determined as suitable for performing with the right hand and the left hand with a predetermined musical instrument, respectively, among the multiple parts included in the musical composition data, which correspond to a correct answer in supervised learning.
The example in FIG. 6 illustrates that the musical composition data having a musical composition data ID of "001" includes "part 1A, part 1B, part 1C, part 1D, part 1E, part 1F, part 1G, and so on"; the part selected for the right hand is "part 1C"; and the part selected for the left hand is "part 1E". Note that the data stored in the data for learning 111 may be set in advance by, for example, a company operating the server 10 or the like.
Note that the extractor 12 may generate additional musical composition data from the musical composition data stored in the data for learning 111, by raising or lowering the notes of each of the multiple parts by a predetermined pitch, to execute data augmentation. This makes it possible to improve the precision of the learned model even when the number of samples is relatively small. Note that, for example, if the notes in a part are "do, re, mi, do, ...", raising the pitch by two halftones generates "re, mi, fa#, re, ...". The extractor 12 may successively change the value of the pitch to be raised or lowered, to generate multiple augmented data items based on one data item stored in the data for learning 111. In this case, the extractor 12 may change, for example, the value of the pitch to be raised or lowered from -10 to -1 and from 1 to 10 one by one, to generate 20 augmented data items based on the one data item.
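The transposition-based data augmentation described above can be sketched as follows, under the assumption that a part is represented simply as a list of MIDI note numbers; the function name and the representation are illustrative.

```python
# A minimal sketch of the transposition-based data augmentation described above,
# assuming a part is represented as a list of MIDI note numbers; the function
# name and the representation are assumptions for illustration.
def augment_by_transposition(part_notes, max_shift=10):
    """Return copies of the part shifted by -max_shift..-1 and 1..max_shift halftones."""
    augmented = []
    for shift in list(range(-max_shift, 0)) + list(range(1, max_shift + 1)):
        shifted = [note + shift for note in part_notes]
        if all(0 <= n <= 127 for n in shifted):   # keep only copies in the valid MIDI range
            augmented.append(shifted)
    return augmented

# Example: augment_by_transposition([60, 62, 64, 60]) yields up to 20 transposed parts.
```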
Next, for each of the obtained combinations, the extractor 12 extracts a predetermined feature value for each part included in the musical composition data (Step S102). Here, as will be described below, the extractor 12 may extract, as the predetermined feature value, at least one of the average or variance of the time length or the pitch with which the sound of each note included in the part is output; the average or variance of the sound length or the pitch of the highest note at each point in time in the case where there are multiple notes to be output at the same time; the average or variance of the number of notes output per unit time by notes included in the part; the ratios of monophony and polyphony in the part; and the ratios of occurrences of same pitch motion, conjunct motion, and disjunct motion in the part.
FIG. 7 is a diagram illustrating feature values for parts used for machine learning. FIG. 7 illustrates notes 701 to 708 written in music, and times 701A to 708A during which the sounds of the notes 701 to 708 are output, respectively, specified in MIDI data.
The extractor 12 may use the average and variance of the time length (sound length or note value) and the pitch with which the sound of each note included in the part is output, as the feature value of the part. In this case, the extractor 12 may calculate the sound length with, for example, a quarter note as one unit. Also, similarly to the numerical representation of pitch in MIDI, the extractor 12 may assign a pitch value of 12 to C0 in International Pitch Notation, and increment the pitch value by one every time the pitch is raised by a halftone from C0.
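As an illustration of these statistics, the following minimal sketch assumes that each note of a non-empty part is given as a (start, duration, pitch) tuple with the duration already expressed in quarter-note units and the pitch as a MIDI note number; this representation and the function name are assumptions for this example.

```python
# A minimal sketch of the length/pitch statistics described above, assuming each
# note of a non-empty part is a (start, duration, pitch) tuple whose duration is
# already expressed in quarter-note units and whose pitch is a MIDI note number.
from statistics import mean, pvariance

def length_pitch_features(notes):
    durations = [duration for _, duration, _ in notes]
    pitches = [pitch for _, _, pitch in notes]
    return {
        'duration_mean': mean(durations),
        'duration_variance': pvariance(durations),
        'pitch_mean': mean(pitches),
        'pitch_variance': pvariance(pitches),
    }
```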
Also, when there are multiple notes to be output at the same time among the notes included in the part, the extractor 12 may use the average and variance of the sound length and the pitch of the highest note at each point in time as the feature value of the part. This is because human ears tend to perceive a higher tone more easily, and the highest tones often form a melody line. In this case, in the example in FIG. 7, the lengths of the sounds of the highest notes at the respective points in time are the times 701A to 703A, 705A, and 708A; the time 704B, which is the portion of the time 704A during which the sound of the note 704 is output that does not overlap with the time 705A; and the time 707B, which is the portion of the time 707A during which the sound of the note 707 is output that does not overlap with the time 708A.
Also, the extractor 12 may use the average and variance of the number of notes output per unit time (e.g., one beat) by notes included in the part as the feature value of the part. This is because the number of notes output per unit time differs depending on the type of instrument; for example, instruments such as drums output a relatively greater number of notes per unit time.
Also, the extractor 12 may use the ratio of monophony and polyphony in the part as the feature value of the part. Here, the monophony means, for example, that the number of notes output at the same time is one. Also, the polyphony means, for example, that the number of notes output at the same time is plural. This is because in a part for an instrument using a keyboard, such as a piano, multiple notes are often performed at the same time with one hand. In this case, for example, the extractor 12 may use the ratio of the monophonic time length and the ratio of the polyphonic time length in the time length (sound producing time) of the sounds output by the notes included in the part as the feature value of the part. Alternatively, the extractor 12 may use the ratio of monophonic notes and the ratio of polyphonic notes among the notes included in the part as the feature value of the part. In this case, when the difference between the timings at which sounds of multiple notes are output in the MIDI data is within a predetermined threshold value (e.g., 0.1 seconds), the extractor 12 may determine that the multiple notes are polyphonic. This is because even when multiple notes are written on the same time position in music as in the case of the notes 707 and 708 in FIG. 7, in the MIDI data, in order to reproduce a human performance, these notes are specified as shifted by several milliseconds to 100 milliseconds from each other as in the case of the time 707A and the time 708A.
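The note-count-based monophony/polyphony ratios with the 0.1-second simultaneity threshold mentioned above might be computed as in the following sketch, which assumes notes are given as (onset time in seconds, pitch) tuples; the strategy of chaining onsets within the threshold is an illustrative choice, not taken from the embodiment.

```python
# A minimal sketch of the note-count-based monophony/polyphony ratios, assuming
# notes are (onset time in seconds, pitch) tuples and that onsets within the
# 0.1-second threshold belong to the same chord; the chaining strategy is an
# illustrative choice, not taken from the embodiment.
def mono_poly_ratio(notes, threshold=0.1):
    onsets = sorted(onset for onset, _ in notes)
    groups = []
    for onset in onsets:
        if groups and onset - groups[-1][-1] <= threshold:
            groups[-1].append(onset)        # treated as simultaneous (polyphonic)
        else:
            groups.append([onset])          # a new sound event
    total = len(groups) or 1
    mono = sum(1 for group in groups if len(group) == 1)
    return mono / total, (len(groups) - mono) / total
```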
Also, the extractor 12 may use the ratios of the numbers of occurrences of same pitch motion, conjunct motion, and disjunct motion in the part as the feature value of the part. Here, same pitch motion means, for example, that the pitch of one note and the pitch of the next note are the same. Conjunct motion means, for example, that the pitch of the next note is higher or lower than the pitch of the one note by one unit. Disjunct motion means, for example, that the pitch of the next note is higher or lower than the pitch of the one note by two units or more. This is because the ratios of same pitch motion and the other motions often vary depending on the type of musical instrument, and because a part corresponding to a melody is less likely to progress with disjunct motion. When there are multiple notes to be output at the same time among the notes included in the part, the extractor 12 may calculate the ratios of the numbers of occurrences of same pitch motion, conjunct motion, and disjunct motion with respect to the highest note among the multiple notes, as the feature value of the part.
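A minimal sketch of these motion ratios follows, assuming that the melody line is given as the time-ordered sequence of MIDI pitches of the highest notes and that "one unit" corresponds to one halftone; both are assumptions for illustration.

```python
# A minimal sketch of the motion ratios, assuming melody_pitches is the
# time-ordered sequence of MIDI pitches of the highest notes and that "one unit"
# corresponds to one halftone; both are assumptions for illustration.
def motion_ratios(melody_pitches):
    same = conjunct = disjunct = 0
    for previous, current in zip(melody_pitches, melody_pitches[1:]):
        interval = abs(current - previous)
        if interval == 0:
            same += 1
        elif interval == 1:
            conjunct += 1
        else:
            disjunct += 1
    total = max(len(melody_pitches) - 1, 1)
    return same / total, conjunct / total, disjunct / total
```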
These feature values are suitable for selecting a part to be performed, and include at least a feature value that influences the difficulty of performance and the quality of the performed sound commonly across various musical instruments such as keyboard instruments, wind instruments, string instruments, and percussion instruments, and a feature value that influences the difficulty of performance and the quality of the performed sound in the case of performing with a specific type of musical instrument. In other words, depending on the type of musical instrument, some instruments are relatively easy (or difficult) even for a beginner when quickly performing operations that sequentially specify pitches, some are relatively easy (or difficult) when performing an operation that specifies multiple pitches at the same time, and for some a rhythm-specifying operation is more important than a pitch-specifying operation; therefore, when parts to be performed need to be selected differently in accordance with the type of musical instrument performed by the user, a feature value corresponding to the specific type of musical instrument is required.
Next, based on the extracted feature value, the generator 13 performs machine learning on the learning model, to generate data of a learned model (Step S103). In this case, the generator 13 may use an algorithm such as GBDT (gradient boosting decision tree), SVM (Support Vector Machine), a neural network, deep learning, linear regression, logistic regression, or the like to perform the machine learning. Alternatively, the generator 13 may use another well-known algorithm. Note that the learning model described above has a data structure, such as a neural network, on which learning can be performed by a learning program for a neural network or the like. The learned model may likewise have such a data structure; alternatively, an equivalent function may be provided in a converted form, for example, as executable program code and data written in a general-purpose programming language such as C.
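As one possible illustration of this learning step, the sketch below uses the gradient boosting classifier of the third-party scikit-learn library; the feature matrix X (one row per part), the labels y, and the hyperparameters are placeholders, not values taken from the embodiment.

```python
# One possible illustration of the learning step, assuming the third-party
# scikit-learn library; X (one feature row per part) and y (e.g. 0 = other,
# 1 = right hand, 2 = left hand) are placeholders for values derived from the
# data for learning 111, and the hyperparameters are arbitrary.
from sklearn.ensemble import GradientBoostingClassifier

def train_part_selection_model(X, y):
    model = GradientBoostingClassifier(n_estimators=300, learning_rate=0.1)
    model.fit(X, y)        # supervised learning on (feature value, part label) pairs
    return model           # model.predict_proba() later yields per-class probabilities
```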
(Example of machine learning by using GBDT)
Next, with reference to FIG. 8 and FIG. 9, an example of a process of performing machine learning by using GBDT at Step S103 in FIG. 5 will be described. FIG. 8 is a flowchart illustrating an example of a process of performing machine learning by using GBDT. Note that, in the following, one execution of the series of steps from Step S201 to Step S205 is referred to as the current learning.
At Step S201, the generator 13 determines data to be used in the current learning, among data of pairs of example data and correct answer data obtained from the data for learning 111. Here, the data used for the current learning may be determined randomly.
Next, the generator 13 determines a feature value to be used for the current learning from among the multiple feature values (Step S202). Here, the generator 13 may randomly determine the feature value used in the current learning. In other words, even if a feature value not suitable for selecting a part to be performed is selected due to the random determination, repeated learning automatically comes to favor (give a higher weight to) a feature value suitable for selecting a part to be performed.
Next, the generator 13 determines a decision tree based on the data used in the current learning and the feature value used in the current learning (Step S203). Here, for example, the generator 13 calculates a branch condition for reducing the average amount of information (entropy) of a classified result, to generate a decision tree having the branch condition.
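A minimal sketch of evaluating a branch condition by its reduction of the average amount of information (entropy) follows; the helper names are illustrative, and a non-empty parent label list is assumed.

```python
# A minimal sketch of evaluating a branch condition by the reduction of the
# average amount of information (entropy); helper names are illustrative and a
# non-empty parent label list is assumed.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(parent_labels, left_labels, right_labels):
    weighted_child_entropy = (len(left_labels) * entropy(left_labels) +
                              len(right_labels) * entropy(right_labels)) / len(parent_labels)
    return entropy(parent_labels) - weighted_child_entropy   # larger gain = better branch condition
```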
Next, the generator 13 determines the number of votes for each leaf of the decision tree based on the data used in the current learning and the decision tree generated in the current learning (Step S204). Here, the generator 13 introduces differences among the numbers of votes based on the classified results obtained by the multiple decision trees generated up to the current learning, so as to raise the correct answer rate when a majority decision is made over those classified results. This increases the number of votes for a leaf (a node) of a decision tree having a relatively high correct answer rate, and decreases the number of votes for a leaf of a decision tree having a relatively low correct answer rate.
Next, the generator 13 gives a weight to data misclassified by the decision tree generated in the current learning (Step S205). Here, the generator 13 gives the weight to the misclassified data so that the average amount of information is estimated to be relatively greater for the misclassified data. This makes the data misclassified by the decision tree generated in the current learning more likely to be classified correctly by the decision tree generated in the next learning.
Next, the generator 13 determines whether or not a termination condition is met (Step S206). Here, for example, when the correct answer rate stops improving (converges), the generator 13 determines that the termination condition is met. If the termination condition is not met (NO at Step S206), the process proceeds to Step S201. On the other hand, if the termination condition is met (YES at Step S206), the process ends.
FIG. 9 is a diagram illustrating an example of data of a learned model using GBDT. In the example in FIG. 9, data of a learned model includes data of multiple (e.g., several hundreds) decision trees 801 to 804 and so on generated by executing the series of steps from Step S201 to Step S205 in FIG. 8. Also, for each of the leaves 801A to 801E in the decision tree 801, the number of votes determined at Step S204 is set. Also, for each leaf in the other decision trees 802 to 804 and so on, the number of votes is set similarly to the example of the decision tree 801. Thereby, as will be described later, in a process in the execution phase on the terminal 20, one of the classified results is selected from among the classified results by the decision trees by majority decision according to the number of votes for each leaf of each decision tree.
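The majority decision over the decision trees of FIG. 9 might look like the following sketch, under the assumption that each tree is represented as a callable returning the (predicted class, number of votes) pair of the leaf reached by a feature vector; this representation is an assumption for illustration.

```python
# A minimal sketch of the majority decision over the decision trees of FIG. 9,
# assuming each tree is represented as a callable that returns the
# (predicted class, number of votes) pair of the leaf reached by a feature
# vector; this representation is an assumption for illustration.
from collections import defaultdict

def classify_by_vote(trees, feature_vector):
    tally = defaultdict(float)
    for tree in trees:
        predicted_class, votes = tree(feature_vector)
        tally[predicted_class] += votes      # leaves with more votes weigh more heavily
    return max(tally, key=tally.get), dict(tally)
```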
<<Determination process using learned model>>
Next, referring to FIG. 10, a process of determining one or more parts suitable to be performed with a predetermined musical instrument from among multiple parts included in predetermined musical composition data, based on data of learned model on the terminal 20 at Steps S5 to S7 in FIG. 4, will be described. FIG. 10 is a flowchart illustrating an example of a process of determining a part based on data of a learned model.
In the following, a case where two or more parts are included in the predetermined musical composition data will be described. Note that if the number of parts included in the predetermined musical composition data is two, the average value of the pitch may be calculated for each of the two parts, to determine a part having the higher average value as the part for the right hand, and to determine the other part having the lower average value as the part for the left hand.
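This two-part fallback can be sketched as follows, assuming each part is given as a non-empty list of MIDI pitches; names are illustrative.

```python
# A minimal sketch of the two-part fallback, assuming each part is given as a
# non-empty list of MIDI pitches; names are illustrative.
from statistics import mean

def assign_two_parts(part_a, part_b):
    """Return (right-hand part, left-hand part) by average pitch."""
    return (part_a, part_b) if mean(part_a) >= mean(part_b) else (part_b, part_a)
```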
At Step S301, the extractor 23 extracts a feature value from each part included in the predetermined musical composition data. Here, the extractor 23 extracts the same feature value as the feature value extracted in the process of Step S102 in FIG. 5.
Next, the determiner 24 adjusts parameters in the data of the learned model according to the degree of difficulty of performance specified by the user (Step S302). In the case of using the above-described GBDT for generation of the learned model, for example, as the degree of difficulty of performance specified by the user increases, the determiner 24 may relatively increase the number of votes for a classified result that has been classified by a condition that the ratio of polyphony is relatively high (the ratio of monophony is relatively low). Alternatively, the determiner 24 may relatively decrease the number of votes for a classified result that has been classified by a condition that the ratio of polyphony is relatively low. Also, for example, as the degree of difficulty of performance specified by the user increases, the determiner 24 may relatively increase the number of votes for a classified result that has been classified by a condition that the average or variance of the number of notes output per unit time is relatively great in each decision tree. Alternatively, the determiner 24 may relatively decrease the number of votes for a classified result that has been classified by a condition that the average or variance of the number of notes output per unit time is relatively small. This makes it possible to select, for example, for a user having a high level of proficiency in performance, a part with a high degree of difficulty of performance that includes a relatively great number of points at which a relatively great number of notes are output at the same time.
Instead of the above, the server 10 may generate learned models for respective degrees of difficulty of performance based on data for learning corresponding to each degree of difficulty, so that the determiner 24 uses the learned model corresponding to the degree of difficulty specified by the user.
Next, for each part included in the predetermined musical composition data, the determiner 24 estimates the naturality of the part with respect to a predetermined musical instrument based on the data of the learned model (Step S303). Note that the predetermined musical instrument may be fixed in advance to one musical instrument such as a keyboard instrument or may be selected by the user from among multiple types of musical instruments such as keyboard musical instruments, wind instruments, and string instruments. Here, for each part included in the predetermined musical composition data, the determiner 24 calculates a probability value indicating the naturality as a part for the right hand, a probability value indicating the naturality as a part for the left hand, and a probability value indicating the naturality as another part. Here, for example, in the case of using the above-described GBDT for generation of the learned model, the determiner 24 may convert a value voted as a part for the right hand into a value of the probability by using the softmax function or the like.
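The conversion of vote totals into probability values with the softmax function, as mentioned above, might be sketched as follows; the ordering of the classes is an illustrative assumption.

```python
# A minimal sketch of converting per-class vote totals into probability values
# with the softmax function; the class ordering is an illustrative assumption.
import math

def softmax(votes):
    shifted = [v - max(votes) for v in votes]      # subtract the maximum for numerical stability
    exponentials = [math.exp(v) for v in shifted]
    total = sum(exponentials)
    return [e / total for e in exponentials]

# Example: softmax([right_hand_votes, left_hand_votes, other_votes])
```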
Also, for each part included in the predetermined musical composition data, the determiner 24 may add a predetermined weight, in accordance with the degree of difficulty of performance specified by the user, to the probability value indicating the naturality of the part with respect to a predetermined musical instrument. In this case, as the degree of difficulty of performance specified by the user increases, for each part included in the predetermined musical composition data, the determiner 24 may adjust the probability value to be greater, for example, for a part in which the ratio of polyphony is relatively high (the ratio of monophony is relatively low). Also, as the degree of difficulty of performance specified by the user increases, the determiner 24 may adjust the probability value to be greater, for example, for a part that has a relatively great average or variance of the number of notes output per unit time. This makes it possible, for example, to adjust an estimation result of a learned model using GBDT, SVM, a neural network, or the like, and to select, for a user having a high level of proficiency in performance, a part with a high degree of difficulty of performance that includes a relatively great number of points at which a relatively great number of notes are output at the same time.
Next, the determiner 24 determines a part whose probability of the naturality as a part for the right hand is the highest among the parts included in the predetermined musical composition data, as the part for the right hand (Step S304).
Next, the determiner 24 determines a part whose probability of the naturality as a part for the left hand is the highest among the parts included in the predetermined musical composition data other than the part determined as the part for the right hand, as the part for the left hand (Step S305), and ends the process. This makes it possible, for example, to select music to be performed on a single piano or the like from musical composition data in which parts for multiple musical instruments are performed together.
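Steps S304 and S305 can be sketched as follows, assuming that the probabilities computed at Step S303 are given as one (right hand, left hand, other) tuple per part and that the musical composition data contains at least two parts; names are illustrative.

```python
# A minimal sketch of Steps S304 and S305, assuming the probabilities from
# Step S303 are given as one (right hand, left hand, other) tuple per part and
# that the musical composition data contains at least two parts.
def determine_parts(probabilities):
    right = max(range(len(probabilities)), key=lambda i: probabilities[i][0])
    remaining = [i for i in range(len(probabilities)) if i != right]
    left = max(remaining, key=lambda i: probabilities[i][1])
    return right, left    # indices of the parts for the right hand and the left hand
```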
<Modified example>
In the embodiment described above, although various feature values are extracted, at least one of the variance of the length of sounds included in a part, the variance of the pitch of sounds included in a part, and the variance of the number of sounds to be output per unit time in a part may be extracted as the feature value.
In the embodiment described above, although various feature values are extracted, at least one of the ratios of monophony and polyphony in a part, and the ratios of the numbers of occurrences of same pitch motion, conjunct motion, and disjunct motion in a part may be extracted as the feature value.
In the embodiment described above, although various feature values are extracted, at least two of the average or variance of the time length or the pitch with which the sound of each note included in the part is output; the average or variance of the sound length or the pitch of the highest note at each point in time in the case where there are multiple notes to be output at the same time; the average or variance of the number of notes output per unit time by notes included in the part; the ratios of monophony and polyphony in the part; and the ratios of occurrences of same pitch motion, conjunct motion, and disjunct motion in the part may be extracted as the feature value.
In the embodiment described above, although learning is performed by inputting both performance data and training data (part selection information) as data for learning so that a part for the right hand and a part for the left hand can be correctly extracted, only the performance data may be input without inputting the training data (part selection information), to learn to classify (categorize) the parts included in the performance data based on a specified feature value, so as to allow the user to select a part to be performed from among the classified parts. Also, a part to be performed may be selected from among the multiple classified parts in accordance with the type of musical instrument selected by the user.
Also, in the embodiment described above, although the devices used by the user for performance are divided into the terminal 20 and the electronic musical instrument 30, their functions may be implemented in a single device. For example, the terminal 20 may be provided with a sound generation function and a performing operation function of the electronic musical instrument 30 (to emulate the function of a musical instrument by using a display screen with a touch panel on the terminal 20), or the electronic musical instrument 30 may be provided with a communication function and various processing functions of the terminal 20. Each of the functional units of the server 10 and the terminal 20 may be implemented by, for example, cloud computing constituted with one or more computers. At least a part of the functional units of the terminal 20 may be provided on the server 10. In this case, for example, the obtainer 22, the extractor 23, the determiner 24, and the like may be provided on the server 10 so that the server 10 obtains predetermined musical composition data from the terminal 20 or the like, to generate the first performance data and the second performance data so as to deliver these data items to the terminal 20. Also, at least a part of the functional units of the server 10 may be provided on the terminal 20.
The server 10, the terminal 20, and the electronic musical instrument 30 may be configured as an integrated device. Alternatively, the server 10 and the terminal 20 may be configured as an integrated device. Alternatively, the terminal 20 and the electronic musical instrument 30 may be configured as an integrated device. In this case, the terminal 20 may be built in the housing of the electronic musical instrument 30, or the operation part such as a keyboard of the electronic musical instrument 30 may be implemented with the touch panel or the like of the terminal 20.
As above, the embodiments of the present inventive concept have been described in detail; note that the present inventive concept is not limited to such specific embodiments, and various modifications and changes can be made within the scope of the subject matters of the present inventive concept described in the claims. Note that the extractor 12 is an example of a "learning-phase extractor". Also, the extractor 23 is an example of an "execution-phase extractor".
Also, in the embodiments described above, generating a model for selecting a part by machine learning makes it possible to generate a complex model (a highly precise model) that includes criteria which may be impossible (or nearly impossible) for a human being to determine by manual work. However, it is not always necessary to determine all criteria by machine learning; a person may partially intervene in determining criteria that use the various feature values affecting performance with a musical instrument as described above, or, if possible, a person may determine all the criteria by manual work. Even in such a case, using the various feature values affecting a performance with a musical instrument as described above makes it possible to generate a model for selecting a part more effectively.
The present application is based on and claims the benefit of priority of Japanese Priority Application No. 2018-046692 filed on March 14, 2018, with the Japanese Patent Office, the entire contents of which are hereby incorporated by reference.

Claims (17)

  1.     A machine learning method of causing a learning model to learn, executed by a processor, the method comprising:
        extracting a feature value for each of a plurality of parts included in each of a plurality of items of musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument; and
        performing machine learning based on the extracted feature value and information representing a part to be performed with the specific type of musical instrument among a plurality of parts included in musical composition data, so as to cause the learning model to learn to be capable of selecting a part to be performed with the specific type of musical instrument from among a plurality of parts included in musical composition data different from any item of the plurality of items of musical composition data.
  2.     The machine learning method as claimed in claim 1, wherein the feature value includes at least one of
        variance of lengths of sounds included in the part,
        variance of pitches of sounds included in the part, and
        variance of numbers of sounds output per unit time with respect to sounds included in the part.
  3.     The machine learning method as claimed in claim 1, wherein the feature value includes at least one of
        ratios of monophony and polyphony in the part, and
        ratios of occurrences of same pitch motion, conjunct motion, and disjunct motion in the part.
  4.     The machine learning method as claimed in claim 1, wherein the feature value includes at least two of
        an average or variance of time lengths or pitches with which sounds of notes included in the part are output,
        an average or variance of lengths or pitches of highest notes at respective points in time in a case where there are multiple notes to be output at a same time in the part,
        an average or variance of numbers of sounds output per unit time by notes included in the part,
        ratios of monophony and polyphony in the part, and
        ratios of occurrences of same pitch motion, conjunct motion, and disjunct motion in the part.
  5.     The machine learning method as claimed in any one of claims 1 to 4, the method further comprising:
        extracting feature values for parts in which each note of the parts included in the musical composition data is raised or lowered by a predetermined pitch.
  6.     The machine learning method as claimed in any one of claims 1 to 4, wherein learning of the learning model is performed by machine learning corresponding to GBDT (Gradient Boosting Decision Tree).
  7.     An electronic device comprising:
        a memory configured to store a learned model generated by machine learning; and
        a processor,
        wherein the processor is configured to execute
        extracting a feature value for each of a plurality of parts included in musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument,
        inputting the extracted feature value into the learned model,
        obtaining information for selecting a part to be performed with the specific type of musical instrument from among the plurality of parts included in the musical composition data, and
        determining, based on the obtained information, the part to be performed by the specific type of musical instrument from among the plurality of parts.
  8.     The electronic device as claimed in claim 7, wherein the learned model is obtained by causing a learning model to learn by using
        a feature value for each of a plurality of parts included in each of a plurality of items of musical composition data, the feature value relating to suitability of a performance with the specific type of musical instrument; and
        information representing a part to be performed with the specific type of musical instrument among the plurality of parts included in said each of the plurality of items of musical composition data.
  9.     The electronic device as claimed in claim 7, wherein the feature value includes at least one of
        variance of lengths of sounds included in the part,
        variance of pitches of sounds included in the part, and
        variance of numbers of sounds output per unit time with respect to sounds included in the part.
  10.     The electronic device as claimed in claim 7, wherein the feature value includes at least one of
        ratios of monophony and polyphony in the part, and
        ratios of occurrences of same pitch motion, conjunct motion, and disjunct motion in the part.
  11.     The electronic device as claimed in claim 7, wherein the feature value includes at least two of
        an average or variance of time lengths or pitches with which sounds of notes included in the part are output,
        an average or variance of lengths or pitches of highest notes at respective points in time in a case where there are multiple notes to be output at a same time in the part,
        an average or variance of numbers of sounds output per unit time by notes included in the part,
        ratios of monophony and polyphony in the part, and
        ratios of occurrences of same pitch motion, conjunct motion, and disjunct motion in the part.
  12.     The electronic device as claimed in any one of claims 7 to 11, wherein the processor adjusts, in accordance with a degree of difficulty of performance specified by a user, at least one of
        a parameter included in the learned model, and
        a probability value representing naturality of a part to be performed with a musical instrument, estimated by the learned model.
  13.     The electronic device according to any one of claims 7 to 11, wherein the processor further executes
        displaying music corresponding to the determined part on a screen, and
        outputting sounds corresponding to a part other than the determined part to be performed by the specific type of musical instrument among the plurality of parts included in the musical composition data.
  14.     The electronic device according to any one of claims 7 to 11, wherein the specific type of musical instrument is a musical instrument having a keyboard, and
        wherein the learned model is a model for selecting a part to be performed with a right hand of a user on the musical instrument, and a part to be performed with a left hand of the user on the musical instrument, from among the plurality of parts included in the musical composition data.
  15.     An electronic musical instrument comprising:
        an operation part configured to receive a performing operation;
        a sound generator configured to generate a sound corresponding to the performing operation performed on the operation part;
        a memory configured to store a learned model generated by machine learning; and
        a processor configured to execute
        extracting a feature value for each of a plurality of parts included in musical composition data, the feature value relating to suitability of a performance with the electronic musical instrument,
        inputting the extracted feature value into the learned model,
        obtaining information for selecting a part to be performed with the electronic musical instrument from among the plurality of parts included in the musical composition data, and
        determining, based on the obtained information, the part to be performed by the electronic musical instrument from among the plurality of parts.
  16.     A model generator for part selection comprising:
        a memory configured to store a learned model generated by machine learning; and
        a processor configured to execute
        extracting a feature value for each of a plurality of parts included in each of a plurality of items of musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument, and
        performing machine learning based on the extracted feature value and information representing a part to be performed with the specific type of musical instrument among a plurality of parts included in musical composition data, so as to generate the learned model that outputs information for selecting a part to be performed with the specific type of musical instrument from among a plurality of parts included in musical composition data different from any item of the plurality of items of musical composition data.
  17.     A method of part determination for determining a part to be performed with a specific type of musical instrument from among a plurality of parts included in musical composition data by using a learned model, executed by a processor, the method comprising:
        extracting a feature value for each of a plurality of parts included in musical composition data, the feature value relating to suitability of a performance with a specific type of musical instrument,
        inputting the extracted feature value into the learned model,
        obtaining information for selecting a part to be performed with the specific type of musical instrument from among the plurality of parts included in the musical composition data, and
        determining, based on the obtained information, the part to be performed by the specific type of musical instrument from among the plurality of parts.
PCT/JP2019/010066 2018-03-14 2019-03-12 Machine learning method, electronic apparatus, electronic musical instrument, model generator for part selection, and method of part determination WO2019176954A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-046692 2018-03-14
JP2018046692A JP6617784B2 (en) 2018-03-14 2018-03-14 Electronic device, information processing method, and program

Publications (1)

Publication Number Publication Date
WO2019176954A1 true WO2019176954A1 (en) 2019-09-19

Family

ID=67907943

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/010066 WO2019176954A1 (en) 2018-03-14 2019-03-12 Machine learning method, electronic apparatus, electronic musical instrument, model generator for part selection, and method of part determination

Country Status (2)

Country Link
JP (1) JP6617784B2 (en)
WO (1) WO2019176954A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114730550A (en) * 2019-11-26 2022-07-08 索尼集团公司 Information processing apparatus, information processing method, and information processing program
CN113780811B (en) * 2021-09-10 2023-12-26 平安科技(深圳)有限公司 Musical instrument performance evaluation method, device, equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0660949U (en) * 1993-01-29 1994-08-23 横河電機株式会社 Relay contact protection circuit
JPH10124078A (en) * 1996-10-24 1998-05-15 Yamaha Corp Method and device for playing data generation
JP2003223165A (en) * 2002-01-29 2003-08-08 Yamaha Corp Musical score display device and electronic instrument
JP2003280651A (en) * 2002-03-22 2003-10-02 Yamaha Corp Melody retrieving device
JP2005284076A (en) * 2004-03-30 2005-10-13 Kawai Musical Instr Mfg Co Ltd Electronic musical instrument
JP2012220653A (en) * 2011-04-07 2012-11-12 Panasonic Corp Change-responsive preference estimation device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KITAHARA, TETSURO ET AL.: "Instrument Identification in Polyphonic Music: Feature Weighting Based on Mixed-Sound Template and Use of Musical Context", THE IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, December 2006 (2006-12-01), pages 2721 - 2733, ISSN: 1880-4535 *
TANIGUCHI, TORU ET AL.: "Discrimination of speech, musical instruments and singing voices using the patterns of F0 and harmonics", THE 2004 SPRING MEETING OF THE ACOUSTIC SOCIETY OF JAPAN, March 2004 (2004-03-01), pages 589 - 590, XP055638550, ISSN: 1340-3168 *

Also Published As

Publication number Publication date
JP2019159146A (en) 2019-09-19
JP6617784B2 (en) 2019-12-11

Similar Documents

Publication Publication Date Title
JP2020003537A (en) Audio extraction device, learning device, karaoke device, audio extraction method, learning method and program
JP6617783B2 (en) Information processing method, electronic device, and program
JP6004358B1 (en) Speech synthesis apparatus and speech synthesis method
JP7367641B2 (en) Electronic musical instruments, methods and programs
JP7298115B2 (en) Program, information processing method, and electronic device
JP2022116335A (en) Electronic musical instrument, method, and program
US20220238088A1 (en) Electronic musical instrument, control method for electronic musical instrument, and storage medium
JP7180587B2 (en) Electronic musical instrument, method and program
WO2019022118A1 (en) Information processing method
JP2020003536A (en) Learning device, automatic music transcription device, learning method, automatic music transcription method and program
JP2022044938A (en) Electronic musical instrument, method, and program
WO2019167719A1 (en) Information processing method and device for processing music performance
US10298192B2 (en) Sound processing device and sound processing method
WO2019176954A1 (en) Machine learning method, electronic apparatus, electronic musical instrument, model generator for part selection, and method of part determination
JP7327497B2 (en) Performance analysis method, performance analysis device and program
US20230351989A1 (en) Information processing system, electronic musical instrument, and information processing method
JP2014174205A (en) Musical sound information processing device and program
JP6288197B2 (en) Evaluation apparatus and program
JP2016206496A (en) Controller, synthetic singing sound creation device and program
CN110959172B (en) Performance analysis method, performance analysis device, and storage medium
JP2020021098A (en) Information processing equipment, electronic apparatus, and program
CN112992110B (en) Audio processing method, device, computing equipment and medium
US20240087552A1 (en) Sound generation method and sound generation device using a machine learning model
TW201946681A (en) Method for generating customized hit-timing list of music game automatically, non-transitory computer readable medium, computer program product and system of music game
JP7552740B2 (en) Acoustic analysis system, electronic musical instrument, and acoustic analysis method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19767556

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19767556

Country of ref document: EP

Kind code of ref document: A1