WO2019232928A1 - Music model training and music creation method, device, terminal, and storage medium - Google Patents


Info

Publication number
WO2019232928A1
Authority
WO
WIPO (PCT)
Prior art keywords
midi
music
feature vector
score
music score
Prior art date
Application number
PCT/CN2018/100333
Other languages
English (en)
French (fr)
Inventor
王义文
刘奡智
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2019232928A1 publication Critical patent/WO2019232928A1/zh

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H7/00 Instruments in which the tones are synthesised from a data store, e.g. computer organs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 Music composition or musical creation; Tools or processes therefor
    • G10H2210/111 Automatic composing, i.e. using predefined musical rules
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/011 Files or data streams containing coded musical information, e.g. for transmission
    • G10H2240/016 File editing, i.e. modifying musical data files or streams as such
    • G10H2240/021 File editing, i.e. modifying musical data files or streams as such for MIDI-like files or data streams

Definitions

  • the present application relates to the field of music technology, and in particular to a music model training method, a music creation method, a device, a terminal, and a storage medium.
  • a range of signal processing tools are commonly used to process audio signals. This includes processing individual audio signals, such as the mixing performed by a mastering engineer, and processing and combining multiple audio signals created by different sound sources (e.g., the component instruments of an ensemble).
  • the goal of the processing is to improve the aesthetic characteristics of the resulting audio signal, such as creating a high-quality mix when combining multiple signals, or to adhere to functional constraints related to transmission, such as minimizing the degradation of data-compressed signals or mitigating the effects of background noise such as that in an aircraft cabin.
  • this work is done manually by audio technicians who typically specialize in specific areas, and it is very labor-intensive.
  • a first aspect of the present application provides a music model training method, which includes:
  • acquiring a MIDI music data set, where the MIDI music data set includes a plurality of MIDI music scores;
  • the feature vector is input to a structured support vector machine for training to obtain a music model, including: constructing a discriminant function f(x; w), where x is the feature vector and w is the parameter vector, and taking the output value that maximizes the discriminant function f(x; w) as the predicted value; calculating the loss between the predicted value and the true value according to a preset loss function, where P is the probability distribution of the data and the total loss is replaced by the empirical risk computed on the training sample data; solving for the unique parameter vector ω using the SVM optimization formula so that the empirical risk on the training sample data is zero; the solution is the discriminant function f(x; ω), which finally outputs a music time series.
  • a second aspect of the present application provides a music creation method, which includes:
  • collecting a MIDI music score composed of several MIDI notes created by the user as the MIDI music score to be created;
  • a third aspect of the present application provides a music model training device, where the device includes:
  • An acquisition module for acquiring a MIDI music data set including a plurality of MIDI music scores; an extraction module for extracting a feature vector of each MIDI music score; a training module for inputting the feature vector into a structured support vector machine for training to obtain a music model, including: constructing a discriminant function f(x; w), where x is the feature vector and w is the parameter vector, and taking the value that maximizes the discriminant function f(x; w) as the predicted output.
  • a fourth aspect of the present application provides a music creation device, where the device includes:
  • An acquisition module for collecting a MIDI music score containing several MIDI notes created by a user as the MIDI music score to be created; a first extraction module for extracting the pitch sequence of the MIDI music score to be created as a third feature vector; a second extraction module for extracting the time series of the MIDI music score to be created as a fourth feature vector; a connection module for connecting the third feature vector and the fourth feature vector to obtain a feature vector of the MIDI music score; a learning module for inputting the feature vector into a pre-trained music model for learning, where the music model is obtained by training using the music model training device; and an output module for outputting a corresponding MIDI music score.
  • a fifth aspect of the present application provides a terminal.
  • the terminal includes a processor and a memory.
  • the processor is configured to implement the music model training method and/or the music creation method when executing computer-readable instructions stored in the memory.
  • a sixth aspect of the present application provides a non-volatile readable storage medium storing computer-readable instructions which, when executed by a processor, implement the music model training method and/or the music creation method.
  • This application is among the first to use artificial intelligence to train music models.
  • the trained music models can significantly improve the feature extraction capabilities of MIDI music scores.
  • Using a trained music model, only a few MIDI notes need to be collected to create MIDI music, which greatly reduces the creation cost of MIDI music, saves the expense of large numbers of band performers, shortens working time in the recording studio, and improves work efficiency.
  • FIG. 1 is a flowchart of a music model training method provided in Embodiment 1 of the present application.
  • FIG. 2 is a flowchart of a music composition method provided in Embodiment 2 of the present application.
  • FIG. 3 is a functional module diagram of a music model training device provided in Embodiment 3 of the present application.
  • FIG. 4 is a functional module diagram of a music creation device according to a fourth embodiment of the present application.
  • FIG. 5 is a schematic diagram of a terminal provided in Embodiment 5 of the present application.
  • the music model training method and / or the music creation method in the embodiments of the present application are applied to one or more terminals.
  • the music model training method and / or music creation method may also be applied to a hardware environment composed of a terminal and a server connected to the terminal through a network.
  • the music model training method and / or the music creation method in the embodiments of the present application may be executed by a server or a terminal; and may also be executed jointly by the server and the terminal.
  • the music model training and/or music creation function provided by the method of the present application may be directly integrated on the terminal, or a client for implementing the method of the present application may be installed on the terminal.
  • the method provided in this application may also be run on a device such as a server in the form of a Software Development Kit (SDK), and provide a music model training method and / or an interface of a music creation function in the form of an SDK.
  • the terminal or other equipment can implement the training of the music model and / or create music through the provided interface.
  • FIG. 1 is a flowchart of a music model training method provided in Embodiment 1 of the present application. According to different requirements, the execution order in this flowchart can be changed, and some steps can be omitted.
  • MIDI (Musical Instrument Digital Interface) is the most widely used music standard format in the composition industry and can be called "a musical score that a computer can understand". It records music using digital control signals for notes. MIDI transmits not audio signals but instructions such as notes and control parameters, telling the MIDI device what to do and how: which note to play, how loud, when the tone ends, what accompaniment, and so on. That is, MIDI data includes a MIDI channel and information such as the time, pitch, velocity, volume, and reverberation of a given instrument, which is sent to a sounding device such as a MIDI synthesizer.
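  • The instruction-based nature of MIDI described above can be sketched with a small Python example. The message dictionaries below are a simplified stand-in for real MIDI channel events (a real project would typically parse files with a MIDI library); the stream contents and field names are illustrative assumptions, not the patent's data.

```python
# Sketch: MIDI carries instructions (note_on/note_off), not audio.
# We model a tiny message stream as dicts and recover (pitch, start, duration)
# tuples from it. This is an illustrative toy, not a real MIDI file parser.

def notes_from_messages(messages):
    """Convert note_on/note_off instruction pairs into (pitch, start, duration)."""
    active = {}   # pitch -> start time of the currently sounding note
    notes = []
    clock = 0
    for msg in messages:
        clock += msg["delta"]            # delta time since the previous message
        if msg["type"] == "note_on" and msg["velocity"] > 0:
            active[msg["note"]] = clock
        elif msg["type"] in ("note_off", "note_on"):  # note_on with velocity 0 = off
            start = active.pop(msg["note"], None)
            if start is not None:
                notes.append((msg["note"], start, clock - start))
    return notes

stream = [
    {"type": "note_on",  "note": 60, "velocity": 80, "delta": 0},
    {"type": "note_off", "note": 60, "velocity": 0,  "delta": 480},
    {"type": "note_on",  "note": 64, "velocity": 90, "delta": 0},
    {"type": "note_off", "note": 64, "velocity": 0,  "delta": 240},
]
print(notes_from_messages(stream))  # [(60, 0, 480), (64, 480, 240)]
```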
  • acquiring the MIDI music data set may include:
  • the NineS data set contains nine difficulty levels, each difficulty level has data for 50 MIDI scores.
  • the FourS data set contains 400 MIDI piano score files spanning four difficulty levels, with 100 MIDI scores per difficulty level.
  • the NineS data set and the FourS data set are both professional MIDI music data sets dedicated to MIDI music.
  • the extracting the feature vector of each MIDI score includes:
  • the piano has 88 black and white keys in total; each key represents a different pitch, so an 88-bit vector can be used to represent the pitch.
  • a first mark may be set in advance to mark the pitch when a certain key is pressed, and a second mark may be set in advance to mark the pitch when a certain key is not pressed.
  • the first identifier may be 1, and the second identifier may be 0, that is, the pitch sequence of the MIDI may be marked by using a 0-1 notation method.
  • for example, if the i-th key is pressed at a certain moment, the pitch sequence at that moment is the 88-bit vector whose i-th bit is 1 and whose remaining bits are 0.
  • the timing sequence refers to the time that a key is continuously pressed after a certain key is pressed at a certain moment, and can be represented by T.
  • for example, if the third key is held for two unit durations after being pressed, the timing value of the third key is 2.
  • the first feature vector and the second feature vector may be connected sequentially; for example, the resulting feature vector of the MIDI score is recorded as the pitch sequence followed by the timing sequence.
  • the first feature vector and the second feature vector may also be cross-connected, in which case the resulting feature vector of the MIDI score is recorded as the element-wise interleaving of the pitch sequence and the timing sequence.
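  • The two connection schemes, sequential connection and cross-connection, can be sketched as follows; the short pitch and timing vectors are toy values, not taken from the patent.

```python
# Sketch of the two connection schemes: sequential connection appends the
# timing vector T after the pitch vector P, while cross-connection interleaves
# them element by element.

def connect_sequential(p, t):
    """Concatenate: all pitch bits first, then all timing values."""
    return p + t

def connect_cross(p, t):
    """Interleave: pitch bit, timing value, pitch bit, timing value, ..."""
    out = []
    for pi, ti in zip(p, t):
        out.extend((pi, ti))
    return out

P = [1, 0, 1]   # pitch bits (toy length, not the full 88)
T = [2, 0, 1]   # timing values, in multiples of the unit time
print(connect_sequential(P, T))  # [1, 0, 1, 2, 0, 1]
print(connect_cross(P, T))       # [1, 2, 0, 0, 1, 1]
```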
  • the extracting the timing sequence of the MIDI music score as the second feature vector may further include:
  • the greatest common divisor of all durations is taken as the unit time. If the duration of a key press is K times the unit time, the timing value of that key is recorded as K, and the key's pitch is repeated K times during performance to represent the duration of that pitch value.
  • the unit duration may also be an arbitrary number set in advance, for example, 3 seconds.
  • the feature vector is input to a structured support vector machine for training to obtain a music model.
  • the structured support vector machine can construct a suitable structured feature function ⁇ (x, y) according to the structural features inside the music data.
  • the goal of the algorithm is to find a discriminant function f(x; ω) for prediction. After the discriminant function is determined, given an input value x of the music data, the output chosen is the y that maximizes an auxiliary function F(x, y; ω), as shown in (1-1):
  • f(x; ω) = argmax_{y ∈ Y} F(x, y; ω)    (1-1)
  • where ω is a parameter vector. Assuming that F is linear in the combined features of the input and output, as shown in (1-2):
  • F(x, y; ω) = ⟨ω, Ψ(x, y)⟩    (1-2)
  • the structured feature function ⁇ (x, y) is the feature representation of the input and output.
  • to measure prediction quality, a loss function Δ: Y × Y → ℝ needs to be designed: when the predicted output is close to the true output, the loss function is smaller; when the predicted output deviates from the true output, the loss function becomes larger.
  • the total loss function can be defined as shown in (1-3):
  • R^Δ_P(f) = ∫_{X×Y} Δ(y, f(x)) dP(x, y)    (1-3)
  • where P is the probability distribution of the data. Since P is unknown, the total loss is replaced by the empirical risk computed on the training sample data: R^Δ_S(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i)).
  • the performance of the discriminant function f(x; ω) can be measured by the loss function; different f(x; ω) correspond to different loss values, and during training, the smaller the empirical loss, the better.
  • ⁇ i (y) ⁇ (x i , y i ) - ⁇ (x i , y).
  • the method may further include:
  • the acquired MIDI music data set is divided into a first data set and a second data set.
  • the idea of Cross Validation can be used when training the music model: the obtained MIDI music data set is divided into a first data set and a second data set according to an appropriate ratio, for example 7:3.
  • the first data set is used to train the music model
  • the second data set is used to test the performance of the trained music model: higher test accuracy indicates better performance of the trained music model, and lower test accuracy indicates worse performance.
  • the method may further include: randomly selecting a first preset number of data sets in the generated first data set to participate in training of the music model.
  • a random number generation algorithm may be used for random selection.
  • the first preset number may be a preset fixed value, for example, 40, that is, 40 MIDI scores are randomly selected from the generated first data set to participate in the training of the music model.
  • the first preset number may also be a preset ratio, for example 1/10, that is, 1/10 of the samples are randomly selected from the generated first data set to participate in the training of the music model.
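  • The data handling described above, a 7:3 split into training and test sets followed by random selection of a fixed number of training scores, can be sketched as follows; the data set size and the stand-in integer "scores" are illustrative assumptions.

```python
# Sketch: shuffle the MIDI data set, split it 7:3 into training and test
# sets, then randomly pick a fixed number (e.g. 40) of training scores.
import random

def split_and_sample(dataset, train_parts=7, total_parts=10, sample_size=40, seed=None):
    rng = random.Random(seed)
    data = list(dataset)
    rng.shuffle(data)
    cut = len(data) * train_parts // total_parts   # integer 7:3 split point
    train, test = data[:cut], data[cut:]
    sample = rng.sample(train, min(sample_size, len(train)))
    return train, test, sample

scores = range(450)              # e.g. NineS: 9 levels x 50 scores
train, test, batch = split_and_sample(scores, seed=0)
print(len(train), len(test), len(batch))  # 315 135 40
```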
  • the method may further include:
  • the performance of the trained structured SVM music model is verified.
  • the verification of the performance of the structured SVM-based music model based on the selected MIDI music score and the output MIDI music score includes:
  • the method may further include: analyzing the music elements of the input MIDI music to obtain statistical analysis results based on the music elements of the MIDI music, and inputting the statistical analysis results into the structured support vector machine for training.
  • the analysis of the music elements includes: structural analysis, audio track analysis, timbre analysis, rhythm analysis, or tempo analysis.
  • the chord progression is obtained through methods such as chord recognition, so that the structural analysis of MIDI music can be performed.
  • the structural analysis includes: passage analysis, phrase analysis, chord analysis, bar analysis, and note analysis.
  • the passage consists of a number of bars that sound pleasant and smooth, in line with the theory of music chord progression; bars are composed of notes.
  • MIDI files include channel events, from which information such as pitch, timing, timbre, velocity, expression, pitch-bend or modulation wheel, breath controller, volume controller, and pan controller information can be obtained; this information can be used for audio track analysis and timbre analysis.
  • the audio track analysis includes: drum track analysis, background track analysis, accompaniment track analysis, and melody track analysis. Some information on the rhythm can be roughly obtained from the note distribution and volume distribution of the drum track for rhythm analysis.
  • the MIDI file also includes additional events, such as lyrics, markers, track names, key signatures, time signatures, and tempo values. From this event information, information such as tempo and key can be obtained for tempo analysis.
  • a statistical analysis result of the MIDI music based on the music element can be obtained.
  • This application analyzes the music elements of the input MIDI music, and obtains the statistical analysis results based on the music elements of the MIDI music, which will not be described in detail.
  • the music model training method described in this application is among the first to use artificial intelligence to train music models.
  • the structured SVM is used to train the music model.
  • the trained music model can significantly improve the feature extraction capability of MIDI scores.
  • FIG. 2 is a flowchart of a music composition method provided in Embodiment 2 of the present application. According to different requirements, the execution order in this flowchart can be changed, and some steps can be omitted.
  • the user can play a few notes on the piano at will and then stop playing. The several notes are then collected and input into a pre-trained music model, which can complete a full piece of music on its own.
  • consistent with the training stage, the sequential connection method may be used to connect the third feature vector and the fourth feature vector in order; alternatively, the cross-connection method may be used to interleave the third feature vector and the fourth feature vector.
  • the feature vector is input into a pre-trained music model for learning.
  • the trained structured-SVM-based music model has the ability to memorize MIDI scores: when the feature vector of a MIDI score containing several MIDI notes is input into the pre-trained music model, the model can automatically output the corresponding complete MIDI score.
  • the music creation method described in this application uses a pre-trained music model, which can greatly reduce the creation cost of MIDI music, save the expense of large numbers of band performers, shorten working time in the recording studio, and improve work efficiency.
  • FIG. 3 is a functional module diagram of a preferred embodiment of the music model training device of the present application.
  • the music model training device 30 runs in a terminal.
  • the music model training device 30 may include a plurality of functional modules composed of program code segments.
  • the program code of each program segment in the music model training device 30 may be stored in a memory and executed by at least one processor to perform training of a music model (see FIG. 1 and the related description for details).
  • the music model training device 30 of the terminal may be divided into a plurality of functional modules according to functions performed by the device.
  • the functional modules may include: an acquisition module 301, an extraction module 302, a training module 303, and a verification module 304.
  • the module referred to in the present application refers to a series of computer-readable instruction segments capable of being executed by at least one processor and capable of performing fixed functions, which are stored in a memory. In some embodiments, functions of each module will be described in detail in subsequent embodiments.
  • the obtaining module 301 is configured to obtain a MIDI music data set, where the MIDI music data set includes multiple MIDI music scores.
  • MIDI (Musical Instrument Digital Interface) is the most widely used music standard format in the composition industry and can be called "a musical score that a computer can understand". It records music using digital control signals for notes. MIDI transmits not audio signals but instructions such as notes and control parameters, telling the MIDI device what to do and how: which note to play, how loud, when the tone ends, what accompaniment, and so on. That is, MIDI data includes a MIDI channel and information such as the time, pitch, velocity, volume, and reverberation of a given instrument, which is sent to a sounding device such as a MIDI synthesizer.
  • acquiring the MIDI music data set may include:
  • the NineS data set contains nine difficulty levels, each difficulty level has data for 50 MIDI scores.
  • the FourS data set contains 400 MIDI piano score files spanning four difficulty levels, with 100 MIDI scores per difficulty level.
  • An extraction module 302 is configured to extract a feature vector of each MIDI music score.
  • the extraction, by the extraction module 302, of a feature vector of each MIDI score includes:
  • the piano has 88 black and white keys in total; each key represents a different pitch, so an 88-bit vector can be used to represent the pitch.
  • a first mark may be set in advance to mark the pitch when a certain key is pressed, and a second mark may be set in advance to mark the pitch when a certain key is not pressed.
  • the first identifier may be 1, and the second identifier may be 0, that is, the pitch sequence of the MIDI may be marked by using a 0-1 notation method.
  • for example, if the i-th key is pressed at a certain moment, the pitch sequence at that moment is the 88-bit vector whose i-th bit is 1 and whose remaining bits are 0.
  • the timing sequence refers to the time that a key is continuously pressed after a certain key is pressed at a certain moment, and can be represented by T.
  • for example, if the third key is held for two unit durations after being pressed, the timing value of the third key is 2.
  • the first feature vector and the second feature vector may be connected sequentially; for example, the resulting feature vector of the MIDI score is recorded as the pitch sequence followed by the timing sequence.
  • the first feature vector and the second feature vector may also be cross-connected, in which case the resulting feature vector of the MIDI score is recorded as the element-wise interleaving of the pitch sequence and the timing sequence.
  • the extracting the timing sequence of the MIDI music score as the second feature vector may further include:
  • the greatest common divisor of all durations is taken as the unit time. If the duration of a key press is K times the unit time, the timing value of that key is recorded as K, and the key's pitch is repeated K times during performance to represent the duration of that pitch value.
  • the unit duration may also be an arbitrary number set in advance, for example, 3 seconds.
  • a training module 303 is configured to input the feature vector into a structured support vector machine for training to obtain a music model.
  • the structured support vector machine can construct a suitable structured feature function ⁇ (x, y) according to the structural features inside the music data.
  • the goal of the algorithm is to find a discriminant function f(x; ω) for prediction. After the discriminant function is determined, given an input value x of the music data, the output chosen is the y that maximizes an auxiliary function F(x, y; ω), as shown in (1-1):
  • f(x; ω) = argmax_{y ∈ Y} F(x, y; ω)    (1-1)
  • where ω is a parameter vector. Assuming that F is linear in the combined features of the input and output, as shown in (1-2):
  • F(x, y; ω) = ⟨ω, Ψ(x, y)⟩    (1-2)
  • the structured feature function ⁇ (x, y) is the feature representation of the input and output.
  • to measure prediction quality, a loss function Δ: Y × Y → ℝ needs to be designed: when the predicted output is close to the true output, the loss function is smaller; when the predicted output deviates from the true output, the loss function becomes larger.
  • the total loss function can be defined as shown in (1-3):
  • R^Δ_P(f) = ∫_{X×Y} Δ(y, f(x)) dP(x, y)    (1-3)
  • where P is the probability distribution of the data. Since P is unknown, the total loss is replaced by the empirical risk computed on the training sample data: R^Δ_S(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i)).
  • the performance of the discriminant function f(x; ω) can be measured by the loss function; different f(x; ω) correspond to different loss values, and during training, the smaller the empirical loss, the better.
  • ⁇ i (y) ⁇ (x i , y i ) - ⁇ (x i , y).
  • the obtaining module 301 is further configured to:
  • the acquired MIDI music data set is divided into a first data set and a second data set.
  • the idea of Cross Validation can be used when training the music model: the obtained MIDI music data set is divided into a first data set and a second data set according to an appropriate ratio, for example 7:3.
  • the first data set is used to train the music model
  • the second data set is used to test the performance of the trained music model: higher test accuracy indicates better performance of the trained music model, and lower test accuracy indicates worse performance.
  • the obtaining module 301 is further configured to randomly select a first preset number of data sets from the generated first data set to participate in the training of the music model.
  • a random number generation algorithm may be used for random selection.
  • the first preset number may be a preset fixed value, for example, 40, that is, 40 MIDI scores are randomly selected from the generated first data set to participate in the training of the music model.
  • the first preset number may also be a preset ratio, for example 1/10, that is, 1/10 of the samples are randomly selected from the generated first data set to participate in the training of the music model.
  • the music model training device 30 may further include a verification module 304 for:
  • the performance of the trained structured SVM music model is verified.
  • the verification module 304 verifies the performance of the trained structured SVM music model including:
  • the music model training device 30 may further be configured to analyze the music elements of the input MIDI music to obtain statistical analysis results based on the music elements of the MIDI music, and to input the statistical analysis results into the structured support vector machine for training.
  • the analysis of the music elements includes: structural analysis, audio track analysis, timbre analysis, rhythm analysis, or tempo analysis.
  • the chord progression is obtained through methods such as chord recognition, so that the structural analysis of MIDI music can be performed.
  • the structural analysis includes: passage analysis, phrase analysis, chord analysis, bar analysis, and note analysis.
  • the passage consists of a number of bars that sound pleasant and smooth, in line with the theory of music chord progression; bars are composed of notes.
  • MIDI files include channel events, from which information such as pitch, timing, timbre, velocity, expression, pitch-bend or modulation wheel, breath controller, volume controller, and pan controller information can be obtained; this information can be used for audio track analysis and timbre analysis.
  • the audio track analysis includes: drum track analysis, background track analysis, accompaniment track analysis, and melody track analysis. Some information on the rhythm can be roughly obtained from the note distribution and volume distribution of the drum track for rhythm analysis.
  • the MIDI file also includes additional events, such as lyrics, markers, track names, key signatures, time signatures, and tempo values. From this event information, information such as tempo and key can be obtained for tempo analysis.
  • a statistical analysis result of the MIDI music based on the music element can be obtained.
  • This application analyzes the music elements of the input MIDI music, and obtains the statistical analysis results based on the music elements of the MIDI music, which will not be described in detail.
  • the music model training device described in the present application is among the first to use artificial intelligence to train music models.
  • the structured SVM is used to train the music model.
  • the trained music model can significantly improve the feature extraction capability of MIDI scores.
  • FIG. 4 is a functional module diagram of a preferred embodiment of the music creation device of the present application.
  • the music creation device 40 runs in a terminal.
  • the music creation device 40 may include a plurality of functional modules composed of program code segments.
  • the program code of each program segment in the music creation device 40 may be stored in a memory and executed by at least one processor to perform the creation of music (see FIG. 2 and the related description for details).
  • the music creation device 40 of the terminal may be divided into a plurality of function modules according to functions performed by the terminal.
  • the functional modules may include: a collection module 401, a first extraction module 402, a second extraction module 403, a connection module 404, a learning module 405, and an output module 406.
  • the module referred to in the present application refers to a series of computer-readable instruction segments capable of being executed by at least one processor and capable of performing fixed functions, which are stored in a memory. In some embodiments, functions of each module will be described in detail in subsequent embodiments.
  • the collection module 401 is configured to collect a MIDI music score composed of several MIDI notes created by a user, as a MIDI music score to be created.
  • the user can play a few notes on the piano at will and then stop playing. The several notes are then collected and input into a pre-trained music model, which can complete a full piece of music on its own.
  • the first extraction module 402 is configured to extract a pitch sequence of the MIDI music score to be created as a third feature vector.
  • the second extraction module 403 is configured to extract a time series of the MIDI music score to be created as a fourth feature vector.
  • a connecting module 404 is configured to connect the third feature vector and the fourth feature vector to obtain a feature vector of a MIDI music score.
  • consistent with the training stage, the sequential connection method may be used to connect the third feature vector and the fourth feature vector in order; alternatively, the cross-connection method may be used to interleave the third feature vector and the fourth feature vector.
  • a learning module 405 is configured to input the feature vector into a pre-trained music model for learning.
  • the trained structured-SVM-based music model has the ability to memorize MIDI scores: when the feature vector of a MIDI score containing several MIDI notes is input into the pre-trained music model, the model can automatically output the corresponding complete MIDI score.
  • An output module 406 is configured to output a corresponding MIDI music score.
  • the music creation device described in this application uses a pre-trained music model, which can greatly reduce the creation cost of MIDI music, save the expense of large numbers of band performers, shorten working time in the recording studio, and improve work efficiency.
  • the above integrated unit implemented in the form of a software functional module may be stored in a non-volatile readable storage medium.
  • the above software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a dual-screen device, or a network device) or a processor to execute parts of the methods described in the embodiments of this application.
  • FIG. 5 is a schematic diagram of a terminal provided in Embodiment 5 of the present application.
  • the terminal 5 includes: a memory 51, at least one processor 52, computer-readable instructions 53 stored in the memory 51 and executable on the at least one processor 52, and at least one communication bus 54.
  • when the at least one processor 52 executes the computer-readable instructions 53, the steps in the embodiments of the music model training method and/or the music creation method described above are implemented.
  • the computer-readable instructions 53 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 51 and executed by the at least one processor 52 to complete this application.
  • the one or more modules / units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 53 in the terminal 5.
  • the terminal 5 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • FIG. 5 is only an example of the terminal 5 and does not constitute a limitation on the terminal 5; the terminal may include more or fewer components than shown in the figure, or combine some components, or have different components.
  • the terminal 5 may further include an input / output device, a network access device, a bus, and the like.
  • the at least one processor 52 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the processor 52 may be a microprocessor or any conventional processor; the processor 52 is the control center of the terminal 5 and connects the various parts of the entire terminal 5 using various interfaces and lines.
  • the memory 51 may be configured to store the computer-readable instructions 53 and/or modules/units, and the processor 52 implements the various functions of the terminal 5 by running or executing the computer-readable instructions and/or modules/units stored in the memory 51 and calling the data stored in the memory 51.
  • the memory 51 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application required by at least one function (such as a sound playback function or an image playback function), etc.; the data storage area may store data created according to the use of the terminal 5 (such as audio data or a phonebook).
  • the memory 51 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another volatile solid-state storage device.
  • when the modules/units integrated in the terminal 5 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of this application can also be completed by computer-readable instructions instructing the related hardware; the computer-readable instructions can be stored in a non-volatile readable storage medium, and when executed by a processor, the steps of the foregoing method embodiments can be implemented.
  • the computer-readable instructions include computer-readable instruction codes, and the computer-readable instruction codes may be in a source code form, an object code form, an executable file, or some intermediate form.
  • the non-volatile readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media.
  • the content contained in the non-volatile readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, non-volatile readable media do not include electrical carrier signals and telecommunication signals.
  • each functional unit in each embodiment of the present application may be integrated in the same processing unit, or each unit may exist separately physically, or two or more units may be integrated in the same unit.
  • the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional modules.


Abstract

A music model training method, comprising: obtaining a MIDI music data set comprising a plurality of MIDI scores (S11); extracting a feature vector of each MIDI score (S12); inputting the feature vectors into a structured support vector machine for training to obtain a music model (S13), comprising: constructing a discriminant function f(x;w), where x is a feature vector and w is a parameter vector, and outputting as the predicted value the data value y* = argmax_{y∈Y} F(x,y;w) that maximizes the discriminant function f(x;w); computing the predicted value against the true value according to a preset loss function R_P^Δ(f) = ∫_{X×Y} Δ(y, f(x)) dP(x,y), where P is the probability distribution of the data and is replaced by the empirical risk R_S^Δ(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i)) computed from training sample data; solving, using the SVM optimization formulation min_w (1/2)||w||², s.t. ∀i, ∀y∈Y∖y_i: ⟨w, δΨ_i(y)⟩ ≥ 1, for the unique parameter vector ω such that the empirical risk over the training sample data is zero; obtaining the discriminant function f(x;ω), and finally outputting a music time series. A music creation method, apparatus, terminal and storage medium are also provided. This is among the first applications of artificial intelligence to training a music model, and the trained music model can improve the feature-extraction capability for MIDI scores.

Description

Music model training and music creation method, apparatus, terminal and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on June 5, 2018, with application number 201810570846.7 and the invention title "Music model training, music creation method, apparatus, terminal and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of music technology, and in particular to a music model training method, a music creation method, an apparatus, a terminal and a storage medium.
Background
In all fields of audio creation (for example, studio recording, live performance, broadcasting), a series of signal-processing tools is usually used to process audio signals. This includes processing individual audio signals, such as the mixing completed by mastering, as well as processing and combining multiple audio signals created by different sound sources (for example, the component instruments within an ensemble). The goal of the processing is to improve the aesthetic characteristics of the resulting audio signal, for example so as to create a high-quality mix when combining multiple signals, or to adhere to some functional constraints related to transmission, for example so as to minimize signal degradation caused by data compression such as mp3, or to mitigate the effect of background noise on an airplane. At present, this work is done manually by audio technicians who usually specialize in a particular area of creation, and it is very labor-intensive.
Summary
In view of the above, it is necessary to propose a music model training and/or music creation method, apparatus, terminal and storage medium, so that a user can play just a few notes on the piano and a complete, rich piece can be composed and performed, saving time and labor, requiring no technicians specializing in creation, and reducing costs.
A first aspect of this application provides a music model training method, the method comprising:
obtaining a MIDI music data set, the MIDI music data set comprising a plurality of MIDI scores;
extracting a feature vector of each MIDI score;
inputting the feature vectors into a structured support vector machine for training to obtain a music model, comprising: constructing a discriminant function f(x;w), where x is a feature vector and w is a parameter vector, and outputting as the predicted value the data value that maximizes the discriminant function f(x;w),
y* = argmax_{y∈Y} F(x,y;w);
computing the predicted value against the true value according to the preset loss function
R_P^Δ(f) = ∫_{X×Y} Δ(y, f(x)) dP(x,y),
where P is the probability distribution of the data, replaced by the empirical risk computed from the training sample data,
R_S^Δ(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i));
solving, using the SVM optimization formulation
min_w (1/2)||w||², s.t. ∀i, ∀y∈Y∖y_i: ⟨w, δΨ_i(y)⟩ ≥ 1,
for the unique parameter vector ω such that the empirical risk R_S^Δ(f) over the training sample data is zero; obtaining the discriminant function f(x;ω), and finally outputting a music time series.
A second aspect of this application provides a music creation method, the method comprising:
collecting a MIDI score containing several MIDI notes created by a user as a MIDI score to be created;
extracting the pitch sequence of the MIDI score to be created as a third feature vector;
extracting the timing sequence of the MIDI score to be created as a fourth feature vector;
connecting the third feature vector and the fourth feature vector to obtain a feature vector of the MIDI score;
inputting the feature vector into a pre-trained music model for learning, wherein the music model is trained by the music model training apparatus;
outputting a corresponding MIDI score.
A third aspect of this application provides a music model training apparatus, the apparatus comprising:
an obtaining module configured to obtain a MIDI music data set, the MIDI music data set comprising a plurality of MIDI scores; an extraction module configured to extract a feature vector of each MIDI score; a training module configured to input the feature vectors into a structured support vector machine for training to obtain a music model, comprising: constructing a discriminant function f(x;w), where x is a feature vector and w is a parameter vector, and outputting as the predicted value the data value that maximizes the discriminant function f(x;w),
y* = argmax_{y∈Y} F(x,y;w);
computing the predicted value against the true value according to the preset loss function
R_P^Δ(f) = ∫_{X×Y} Δ(y, f(x)) dP(x,y),
where P is the probability distribution of the data, replaced by the empirical risk computed from the training sample data,
R_S^Δ(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i));
solving, using the SVM optimization formulation
min_w (1/2)||w||², s.t. ∀i, ∀y∈Y∖y_i: ⟨w, δΨ_i(y)⟩ ≥ 1,
for the unique parameter vector ω such that the empirical risk R_S^Δ(f) over the training sample data is zero; obtaining the discriminant function f(x;ω), and finally outputting a music time series.
A fourth aspect of this application provides a music creation apparatus, the apparatus comprising:
a collection module configured to collect a MIDI score containing several MIDI notes created by a user as a MIDI score to be created; a first extraction module configured to extract the pitch sequence of the MIDI score to be created as a third feature vector; a second extraction module configured to extract the timing sequence of the MIDI score to be created as a fourth feature vector; a connection module configured to connect the third feature vector and the fourth feature vector to obtain a feature vector of the MIDI score; a learning module configured to input the feature vector into a pre-trained music model for learning, wherein the music model is trained by the music model training apparatus; and an output module configured to output a corresponding MIDI score.
A fifth aspect of this application provides a terminal, the terminal comprising a processor and a memory, the processor implementing the music model training method and/or music creation method when executing computer-readable instructions stored in the memory.
A sixth aspect of this application provides a non-volatile readable storage medium storing computer-readable instructions that, when executed by a processor, implement the music model training method and/or music creation method.
This application is among the first to apply artificial intelligence to training a music model, and the trained music model can significantly improve the feature-extraction capability for MIDI scores. With the trained music model, MIDI music can be created from just a few collected MIDI notes, which greatly reduces the creation cost of MIDI music, saves the expenses of many band performers, shortens working time in the recording studio, and improves work efficiency.
Brief description of the drawings
FIG. 1 is a flowchart of the music model training method provided in Embodiment 1 of this application.
FIG. 2 is a flowchart of the music creation method provided in Embodiment 2 of this application.
FIG. 3 is a functional module diagram of the music model training apparatus provided in Embodiment 3 of this application.
FIG. 4 is a functional module diagram of the music creation apparatus provided in Embodiment 4 of this application.
FIG. 5 is a schematic diagram of the terminal provided in Embodiment 5 of this application.
The following detailed description will further explain this application with reference to the above drawings.
Detailed description
The music model training method and/or music creation method of the embodiments of this application are applied in one or more terminals. The music model training method and/or music creation method can also be applied in a hardware environment composed of a terminal and a server connected to the terminal through a network. The music model training method and/or music creation method of the embodiments of this application may be executed by a server, by a terminal, or jointly by a server and a terminal.
For a terminal that needs the music model training method and/or music creation method, the music model training and/or music creation functions provided by the method of this application can be integrated directly on the terminal, or a client implementing the method of this application can be installed. Alternatively, the method provided by this application can also run on a device such as a server in the form of a software development kit (SDK), with the interfaces of the music model training and/or music creation functions provided in SDK form; the terminal or another device can then train the music model and/or create music through the provided interfaces.
Embodiment one
FIG. 1 is a flowchart of the music model training method provided in Embodiment 1 of this application. According to different needs, the execution order in the flowchart may be changed and some steps may be omitted.
S11. Obtain a MIDI music data set, the MIDI music data set comprising a plurality of MIDI scores.
MIDI (Musical Instrument Digital Interface) is the most widely used music standard format in arranging and can be called "a score that a computer can understand": it records music as digital control signals for notes. What MIDI transmits is not an audio signal but instructions such as notes and control parameters, which tell a MIDI device what to do and how to do it, for example which note to play, at what volume, when the tone ends, and what accompaniment to add. That is, MIDI data includes the MIDI channel and information such as the timing, pitch, velocity, volume and reverb of an instrument sent to a sound-producing device such as a MIDI synthesizer.
In this embodiment, obtaining the MIDI music data set may include:
1) Collecting from the NineS data set, which contains nine difficulty levels, each level with data for 50 MIDI scores.
2) Collecting from the FourS data set, which consists of 400 MIDI piano score files divided into four difficulty levels, each level with data for 100 MIDI scores.
Both the NineS data set and the FourS data set are professional MIDI music data sets dedicated to collecting MIDI music.
S12. Extract a feature vector of each MIDI score.
In this embodiment, extracting the feature vector of each MIDI score includes:
1) Extracting the pitch sequence of the MIDI score as a first feature vector.
A piano has 88 keys, black and white, each key representing a different pitch, so an 88-dimensional vector can be used to represent pitch.
A first identifier can be preset to mark the pitch when a key is pressed, and a second identifier to mark the pitch when a key is not pressed. The first identifier can be 1 and the second identifier 0; that is, 0-1 notation can be used to mark the MIDI pitch sequence, and the pitch sequence when the i-th key is pressed at a given moment is denoted P_i.
For example, when the 1st piano key is pressed at the first moment its pitch is 1 and the pitches of the other 87 keys are 0, so the music sequence at the first moment is denoted P1=(1,0,0,0,...,0,0); when the 5th piano key is pressed at the second moment its pitch is 1 and the pitches of the other 87 keys are 0, so the music sequence at the second moment is denoted P5=(0,0,0,0,1,...,0,0).
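The 0-1 pitch encoding just described can be sketched in Python (a minimal illustration; the function name and the `None` convention for "no key pressed" are ours, not from the application):

```python
def pitch_vector(pressed_key, num_keys=88):
    """Return the 88-dimensional 0-1 pitch vector for one moment.

    pressed_key is the 1-based index of the piano key held down,
    or None when no key is pressed at that moment.
    """
    vec = [0] * num_keys
    if pressed_key is not None:
        vec[pressed_key - 1] = 1  # first identifier (1) marks the pressed key
    return vec

# The two moments from the example: key 1 at the first moment, key 5 at the second.
p1 = pitch_vector(1)
p5 = pitch_vector(5)
```

With this encoding, each moment of the score is one sparse 88-dimensional vector, matching the P1 and P5 examples above.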
2) Extracting the timing sequence of the MIDI score as a second feature vector.
The timing sequence refers to how long a key keeps being pressed after it is pressed at a given moment, and can be denoted T.
For example, if the 3rd key is held down for 2 seconds at some moment, the timing sequence of the 3rd key is 2.
3) Connecting the first feature vector and the second feature vector to obtain the feature vector of the MIDI score.
The first feature vector and the second feature vector can be connected sequentially, e.g., the resulting feature vector of the MIDI score is denoted M=(P_1,...,P_n,T_1,...,T_n); they can also be cross-connected, e.g., the resulting feature vector of the MIDI score is denoted M=(P_1,T_1,...,P_n,T_n).
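The two connection orders can be sketched as follows (a hedged illustration: the application shows the resulting vectors only as figures, so the sequential "all pitches then all durations" and interleaved "pitch, duration per step" layouts are our reading):

```python
def connect_sequential(pitches, durations):
    # Sequential connection: (P1, ..., Pn, T1, ..., Tn)
    return list(pitches) + list(durations)

def connect_cross(pitches, durations):
    # Cross connection: interleave pitch and duration per time step
    # (P1, T1, P2, T2, ..., Pn, Tn)
    out = []
    for p, t in zip(pitches, durations):
        out.extend([p, t])
    return out

seq = connect_sequential(["P1", "P2"], ["T1", "T2"])
cross = connect_cross(["P1", "P2"], ["T1", "T2"])
```

Whichever order is chosen at training time must be reused at creation time, as noted later in Embodiment two.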
Preferably, to avoid the timing sequence of a key being too large, extracting the timing sequence of the MIDI score as the second feature vector may further include:
finding the greatest common divisor of all timing sequences as the unit duration;
computing each timing sequence as a multiple of the unit duration, and taking the multiple as the timing sequence corresponding to the key.
For example, the greatest common divisor of all timing sequences is the unit duration; if the duration a key is held is K times the unit duration, the key's timing sequence is recorded as K, and during performance the key's pitch is repeated K times to represent how long the pitch value lasts.
In other embodiments, the unit duration may also be a preset arbitrary number, for example, 3 seconds.
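The unit-duration quantization can be sketched with the Python standard library (the function name and the use of milliseconds are our own choices for illustration):

```python
from functools import reduce
from math import gcd

def quantize_durations(durations_ms):
    """Express each key's held time as a multiple K of the unit duration,
    where the unit is the greatest common divisor of all durations."""
    unit = reduce(gcd, durations_ms)
    return unit, [d // unit for d in durations_ms]

# Held times of 2 s, 3 s and 1 s (in ms) share a 1000 ms unit duration.
unit, multiples = quantize_durations([2000, 3000, 1000])
```

A key held for 2 s then gets timing sequence K = 2, i.e. its pitch would be repeated twice at playback.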
S13. Input the feature vectors into a structured support vector machine for training to obtain the music model.
Unlike a traditional support vector machine (SVM), a structured support vector machine can construct a suitable structured feature function Ψ(x,y) according to the structural characteristics inside the music data, and can thus effectively handle complex structured music data. The goal of the algorithm is to find a discriminant function f(x;ω) for prediction; once the discriminant function is determined, given a music data input value x, the data value y that maximizes the discriminant function f(x;ω) is selected as the output.
The concrete implementation process is as follows:
1) Input the feature vectors X={M_i | i=1,2,...,N}; each time point has two data vectors: pitch and duration.
2) Construct the discriminant function f(x;w), and output as the predicted value the data value y that maximizes the discriminant function f(x;w),
as shown in formula (1-1):
f(x;w) = argmax_{y∈Y} F(x,y;w)    (1-1)
where ω is the parameter vector. Assume that F has a linear relationship with the combined features of the input and output, as shown in (1-2):
F(x,y;w) = ⟨w, Ψ(x,y)⟩    (1-2)
where the structured feature function Ψ(x,y) is a feature representation of the input and output.
3) Compute the predicted value against the true value according to the preset loss function.
To quantify the accuracy of prediction, a loss function Δ: Y×Y→R needs to be designed: the closer the predicted value is to the true value, the smaller the loss function; when they differ greatly, the loss function becomes large. The total loss function can be defined as shown in (1-3):
R_P^Δ(f) = ∫_{X×Y} Δ(y, f(x)) dP(x,y)    (1-3)
where P is the probability distribution of the data and needs to be replaced by the empirical risk computed from the training sample data,
R_S^Δ(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i)).
The performance of the discriminant function f(x;ω) can be measured by the loss function; different f(x;ω) correspond to different loss functions, and during the training of this algorithm, the smaller the empirical loss function the better.
4) Compute the parameter vector ω.
Compute the parameter vector ω such that the empirical risk R_S^Δ(f) over the training sample data is zero, with the conditions shown in (1-4):
∀i: max_{y∈Y∖y_i} ⟨w, Ψ(x_i,y)⟩ < ⟨w, Ψ(x_i,y_i)⟩    (1-4)
Expanding the above n nonlinear formulas into n|Y|−n linear ones gives (1-5):
∀i, ∀y∈Y∖y_i: ⟨w, δΨ_i(y)⟩ > 0    (1-5)
where δΨ_i(y) = Ψ(x_i,y_i) − Ψ(x_i,y).
5) Solve for the unique parameter vector ω.
Under the above constraints, there may be multiple solutions for the parameter vector ω. To obtain a unique parameter vector ω, the maximum-margin principle of the SVM is then used to convert the problem into an SVM optimization problem, as shown in (1-6):
min_w (1/2)||w||², s.t. ∀i, ∀y∈Y∖y_i: ⟨w, δΨ_i(y)⟩ ≥ 1    (1-6)
At this point the optimization problem can be solved to obtain the unique parameter vector ω and the discriminant function f(x;ω), and finally the music time series is output.
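The prediction rule of (1-1) with the linear form (1-2) can be sketched as an argmax over a candidate output set (a toy illustration: the joint feature map and the two-label output space are invented for the example; the real output space of music time series is far larger and requires structured inference):

```python
def joint_features(x, y):
    # Toy structured feature map Psi(x, y): crosses the input
    # with each candidate output label via indicator features.
    return [x[0] * (y == "up"), x[1] * (y == "down")]

def predict(x, w, candidates):
    # f(x; w) = argmax_y <w, Psi(x, y)>  -- equations (1-1) and (1-2)
    def score(y):
        return sum(wi * fi for wi, fi in zip(w, joint_features(x, y)))
    return max(candidates, key=score)

w = [1.0, 1.0]            # a parameter vector found by the training step
label = predict([3.0, 1.0], w, ["up", "down"])
```

The training step described above would choose w so that, for every training pair, the true output scores higher than every other candidate by a margin of at least 1.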
Preferably, to verify the performance of the trained music model, the method may further include:
dividing the obtained MIDI music data set into a first data set and a second data set.
In this preferred embodiment, the idea of cross validation can be adopted when training the music model: the obtained MIDI music data set is divided into a first data set and a second data set in a suitable ratio, for example 7:3.
The first data set is used to train the music model, and the second data set is used to test the performance of the trained music model: the higher the test accuracy, the better the performance of the trained music model; a lower test accuracy indicates poorer performance.
Further, if the total amount of data in the divided first data set is still large, using all of the first data set to participate in training the music model will make finding the optimal parameter vector ω of the music model costly. Therefore, after dividing the obtained MIDI music data set, the method may further include: randomly selecting a first preset quantity of data from the generated first data set to participate in training the music model.
In this preferred embodiment, to increase the randomness of the first data set participating in training, a random-number generation algorithm can be used for the random selection.
In this preferred embodiment, the first preset quantity can be a preset fixed value, for example 40, i.e. 40 MIDI scores are randomly picked from the generated first data set to participate in training the music model. The first preset quantity can also be a preset proportion, for example 1/10, i.e. a 1/10 proportion of samples is randomly picked from the generated first data set to participate in training the music model.
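The 7:3 split and the random sub-selection can be sketched with the standard library (the ratio and the quantity 40 follow the text; the helper names and the fixed seed are ours):

```python
import random

def split_dataset(scores, train_ratio=0.7, seed=0):
    """Shuffle and split into a first (training) and second (test) set, 7:3."""
    rng = random.Random(seed)
    shuffled = scores[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def pick_training_subset(first_set, quantity=40, seed=0):
    """Randomly pick the first preset quantity of scores to actually train on."""
    rng = random.Random(seed)
    return rng.sample(first_set, min(quantity, len(first_set)))

scores = [f"score_{i}" for i in range(100)]
first, second = split_dataset(scores)
subset = pick_training_subset(first, quantity=40)
```

With 100 scores this yields 70 training and 30 test scores, of which 40 training scores are actually used.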
Further, to verify the performance of the trained structured-SVM-based music model, the method may further include:
randomly picking one MIDI score from the second data set;
extracting the feature vector of the score within a preset time period of the picked MIDI score;
inputting the feature vector of the score within the preset time period into the trained structured-SVM-based music model, and outputting a corresponding MIDI score;
verifying the performance of the trained structured-SVM-based music model according to the picked MIDI score and the output MIDI score.
Verifying the performance of the trained structured-SVM-based music model according to the picked MIDI score and the output MIDI score specifically includes:
extracting a first waveform of the picked MIDI score;
extracting a second waveform of the output MIDI score;
computing the similarity between the first waveform and the second waveform;
judging whether the similarity is greater than a preset similarity threshold;
if the similarity is greater than or equal to the preset similarity threshold, determining that the trained structured-SVM-based music model performs well;
if the similarity is less than the preset similarity threshold, determining that the trained structured-SVM-based music model performs poorly.
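The waveform-similarity check can be sketched with cosine similarity (one common choice for comparing waveforms; the application does not fix a specific similarity measure or threshold value, so both are assumptions here):

```python
import math

def cosine_similarity(a, b):
    # Similarity of two equal-length waveform sample sequences, in [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def model_performs_well(wave_picked, wave_output, threshold=0.9):
    # Performance is judged good when similarity >= the preset threshold.
    return cosine_similarity(wave_picked, wave_output) >= threshold

ok = model_performs_well([1.0, 2.0, 3.0], [1.0, 2.0, 3.1])
```

Two nearly identical waveforms score close to 1 and pass the threshold; orthogonal waveforms score 0 and fail it.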
In an alternative embodiment, besides extracting the pitch and timing of the MIDI score as feature vectors, the method may further include: performing music-element analysis on the input MIDI music to obtain statistical analysis results of the MIDI music based on music elements; and inputting the statistical analysis results into the structured support vector machine for training.
The music-element analysis includes: structure analysis, track analysis, timbre analysis, rhythm analysis or tempo analysis. In MIDI music, chord progressions are obtained through methods such as chord recognition, so that structure analysis of the MIDI music can be performed. The structure analysis includes: section analysis, phrase analysis, chord analysis, bar analysis and note analysis. A section consists of multiple bars that sound pleasant and smooth and conform to music chord-progression theory; bars consist of notes. A MIDI file includes channel events, from which information such as pitch, duration, timbre, velocity, expression, pitch-bend or modulation wheel, breath controller, volume controller and pan controller information can be obtained; track analysis and timbre analysis can be performed from this information. The track analysis includes: drum-track analysis, background-track analysis, accompaniment-track analysis and melody-track analysis. From the note distribution and volume distribution of the drum track, some rhythmic information can be roughly obtained for rhythm analysis. A MIDI file also includes additional events such as lyrics, markers, track names, key signatures, time signatures and tempo values, from which information such as tempo and tune can be obtained for tempo analysis. By analyzing the structure, tracks, timbre, rhythm or tempo of the MIDI music according to the above methods, statistical analysis results of the MIDI music based on those music elements can be obtained.
This application does not elaborate further on performing music-element analysis on the input MIDI music to obtain the statistical analysis results based on music elements.
The music model training method described in this application is among the first to apply artificial intelligence to training a music model; using structured-SVM-based training, the trained music model can significantly improve the feature-extraction capability for MIDI scores.
Embodiment two
FIG. 2 is a flowchart of the music creation method provided in Embodiment 2 of this application. According to different needs, the execution order in the flowchart may be changed and some steps may be omitted.
S21. Collect a MIDI score containing several MIDI notes created by a user as the MIDI score to be created.
The user can play a few notes at will on the piano and then stop playing; at this point, the several notes are collected and input into the pre-trained music model, which can then play a complete piece of music on its own.
S22. Extract the pitch sequence of the MIDI score to be created as the third feature vector.
S23. Extract the timing sequence of the MIDI score to be created as the fourth feature vector.
S24. Connect the third feature vector and the fourth feature vector to obtain the feature vector of the MIDI score.
If the sequential connection method was used when training the music model, the third feature vector and the fourth feature vector are likewise connected sequentially.
If the cross-connection method was used when training the music model, the third feature vector and the fourth feature vector are likewise cross-connected.
S25. Input the feature vector into the pre-trained music model for learning.
Because the trained structured-SVM-based music model has the function of memorizing MIDI scores, when the feature vector of a MIDI score containing several MIDI notes is input into the pre-trained music model, the model can automatically output the corresponding MIDI score.
S26. Output the corresponding MIDI score.
The music creation method described in this application uses a pre-trained music model, which can greatly reduce the creation cost of MIDI music, save the expenses of many band performers, shorten working time in the recording studio, and improve work efficiency.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto; improvements made by those of ordinary skill in the art without departing from the creative concept of this application also fall within the protection scope of this application.
The functional modules and hardware structure of the terminal implementing the above music model training method and music creation method are introduced below with reference to FIGS. 3 to 5.
Embodiment three
FIG. 3 is a functional module diagram of a preferred embodiment of the music model training apparatus of this application.
In some embodiments, the music model training apparatus 30 runs in a terminal. The music model training apparatus 30 may include a plurality of functional modules composed of program code segments. The program code of each program segment in the music model training apparatus 30 may be stored in a memory and executed by at least one processor to perform training of the music model (see FIG. 1 and its related description for details).
In this embodiment, the music model training apparatus 30 of the terminal may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: an obtaining module 301, an extraction module 302, a training module 303 and a verification module 304. A module referred to in this application is a series of computer-readable instruction segments that can be executed by at least one processor and can perform fixed functions, stored in a memory. In some embodiments, the functions of the modules will be detailed in subsequent embodiments.
The obtaining module 301 is configured to obtain a MIDI music data set, the MIDI music data set comprising a plurality of MIDI scores.
MIDI (Musical Instrument Digital Interface) is the most widely used music standard format in arranging and can be called "a score that a computer can understand": it records music as digital control signals for notes. What MIDI transmits is not an audio signal but instructions such as notes and control parameters, which tell a MIDI device what to do and how to do it, for example which note to play, at what volume, when the tone ends, and what accompaniment to add. That is, MIDI data includes the MIDI channel and information such as the timing, pitch, velocity, volume and reverb of an instrument sent to a sound-producing device such as a MIDI synthesizer.
In this embodiment, obtaining the MIDI music data set may include:
1) Collecting from the NineS data set, which contains nine difficulty levels, each level with data for 50 MIDI scores.
2) Collecting from the FourS data set, which consists of 400 MIDI piano score files divided into four difficulty levels, each level with data for 100 MIDI scores.
The extraction module 302 is configured to extract a feature vector of each MIDI score.
In this embodiment, the extraction module 302 extracting the feature vector of each MIDI score includes:
1) Extracting the pitch sequence of the MIDI score as a first feature vector.
A piano has 88 keys, black and white, each key representing a different pitch, so an 88-dimensional vector can be used to represent pitch.
A first identifier can be preset to mark the pitch when a key is pressed, and a second identifier to mark the pitch when a key is not pressed. The first identifier can be 1 and the second identifier 0; that is, 0-1 notation can be used to mark the MIDI pitch sequence, and the pitch sequence when the i-th key is pressed at a given moment is denoted P_i.
For example, when the 1st piano key is pressed at the first moment its pitch is 1 and the pitches of the other 87 keys are 0, so the music sequence at the first moment is denoted P1=(1,0,0,0,...,0,0); when the 5th piano key is pressed at the second moment its pitch is 1 and the pitches of the other 87 keys are 0, so the music sequence at the second moment is denoted P5=(0,0,0,0,1,...,0,0).
2) Extracting the timing sequence of the MIDI score as a second feature vector.
The timing sequence refers to how long a key keeps being pressed after it is pressed at a given moment, and can be denoted T.
For example, if the 3rd key is held down for 2 seconds at some moment, the timing sequence of the 3rd key is 2.
3) Connecting the first feature vector and the second feature vector to obtain the feature vector of the MIDI score.
The first feature vector and the second feature vector can be connected sequentially, e.g., the resulting feature vector of the MIDI score is denoted M=(P_1,...,P_n,T_1,...,T_n); they can also be cross-connected, e.g., the resulting feature vector of the MIDI score is denoted M=(P_1,T_1,...,P_n,T_n).
Preferably, to avoid the timing sequence of a key being too large, extracting the timing sequence of the MIDI score as the second feature vector may further include:
finding the greatest common divisor of all timing sequences as the unit duration;
computing each timing sequence as a multiple of the unit duration, and taking the multiple as the timing sequence corresponding to the key.
For example, the greatest common divisor of all timing sequences is the unit duration; if the duration a key is held is K times the unit duration, the key's timing sequence is recorded as K, and during performance the key's pitch is repeated K times to represent how long the pitch value lasts.
In other embodiments, the unit duration may also be a preset arbitrary number, for example, 3 seconds.
The training module 303 is configured to input the feature vectors into a structured support vector machine for training to obtain the music model.
Unlike a traditional support vector machine (SVM), a structured support vector machine can construct a suitable structured feature function Ψ(x,y) according to the structural characteristics inside the music data, and can thus effectively handle complex structured music data. The goal of the algorithm is to find a discriminant function f(x;ω) for prediction; once the discriminant function is determined, given a music data input value x, the data value y that maximizes the discriminant function f(x;ω) is selected as the output.
The concrete implementation process is as follows:
1) Input the feature vectors X={M_i | i=1,2,...,N}; each time point has two data vectors: pitch and duration.
2) Construct the discriminant function f(x;w), and output as the predicted value the data value y that maximizes the discriminant function f(x;w),
as shown in formula (1-1):
f(x;w) = argmax_{y∈Y} F(x,y;w)    (1-1)
where ω is the parameter vector. Assume that F has a linear relationship with the combined features of the input and output, as shown in (1-2):
F(x,y;w) = ⟨w, Ψ(x,y)⟩    (1-2)
where the structured feature function Ψ(x,y) is a feature representation of the input and output.
3) Compute the predicted value against the true value according to the preset loss function.
To quantify the accuracy of prediction, a loss function Δ: Y×Y→R needs to be designed: the closer the predicted value is to the true value, the smaller the loss function; when they differ greatly, the loss function becomes large. The total loss function can be defined as shown in (1-3):
R_P^Δ(f) = ∫_{X×Y} Δ(y, f(x)) dP(x,y)    (1-3)
where P is the probability distribution of the data and needs to be replaced by the empirical risk computed from the training sample data,
R_S^Δ(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i)).
The performance of the discriminant function f(x;ω) can be measured by the loss function; different f(x;ω) correspond to different loss functions, and during the training of this algorithm, the smaller the empirical loss function the better.
4) Compute the parameter vector ω.
Compute the parameter vector ω such that the empirical risk R_S^Δ(f) over the training sample data is zero, with the conditions shown in (1-4):
∀i: max_{y∈Y∖y_i} ⟨w, Ψ(x_i,y)⟩ < ⟨w, Ψ(x_i,y_i)⟩    (1-4)
Expanding the above n nonlinear formulas into n|Y|−n linear ones gives (1-5):
∀i, ∀y∈Y∖y_i: ⟨w, δΨ_i(y)⟩ > 0    (1-5)
where δΨ_i(y) = Ψ(x_i,y_i) − Ψ(x_i,y).
5) Solve for the unique parameter vector ω.
Under the above constraints, there may be multiple solutions for the parameter vector ω. To obtain a unique parameter vector ω, the maximum-margin principle of the SVM is then used to convert the problem into an SVM optimization problem, as shown in (1-6):
min_w (1/2)||w||², s.t. ∀i, ∀y∈Y∖y_i: ⟨w, δΨ_i(y)⟩ ≥ 1    (1-6)
At this point the optimization problem can be solved to obtain the unique parameter vector ω and the discriminant function f(x;ω), and finally the music time series is output.
Preferably, to verify the performance of the trained music model, the obtaining module 301 is further configured to:
divide the obtained MIDI music data set into a first data set and a second data set.
In this preferred embodiment, the idea of cross validation can be adopted when training the music model: the obtained MIDI music data set is divided into a first data set and a second data set in a suitable ratio, for example 7:3.
The first data set is used to train the music model, and the second data set is used to test the performance of the trained music model: the higher the test accuracy, the better the performance of the trained music model; a lower test accuracy indicates poorer performance.
Further, if the total amount of data in the divided first data set is still large, using all of the first data set to participate in training the music model will make finding the optimal parameter vector ω of the music model costly. Therefore, after dividing the obtained MIDI music data set, the obtaining module 301 is further configured to: randomly select a first preset quantity of data from the generated first data set to participate in training the music model.
In this preferred embodiment, to increase the randomness of the first data set participating in training, a random-number generation algorithm can be used for the random selection.
In this preferred embodiment, the first preset quantity can be a preset fixed value, for example 40, i.e. 40 MIDI scores are randomly picked from the generated first data set to participate in training the music model. The first preset quantity can also be a preset proportion, for example 1/10, i.e. a 1/10 proportion of samples is randomly picked from the generated first data set to participate in training the music model.
Further, to verify the performance of the trained structured-SVM-based music model, the music model training apparatus 30 may further include a verification module 304 configured to:
randomly pick one MIDI score from the second data set;
extract the feature vector of the score within a preset time period of the picked MIDI score;
input the feature vector of the score within the preset time period into the trained structured-SVM-based music model, and output a corresponding MIDI score;
verify the performance of the trained structured-SVM-based music model according to the picked MIDI score and the output MIDI score.
The verification module 304 verifying the performance of the trained structured-SVM-based music model according to the picked MIDI score and the output MIDI score specifically includes:
extracting a first waveform of the picked MIDI score;
extracting a second waveform of the output MIDI score;
computing the similarity between the first waveform and the second waveform;
judging whether the similarity is greater than a preset similarity threshold;
if the similarity is greater than or equal to the preset similarity threshold, determining that the trained structured-SVM-based music model performs well;
if the similarity is less than the preset similarity threshold, determining that the trained structured-SVM-based music model performs poorly.
In an alternative embodiment, besides extracting the pitch and timing of the MIDI score as feature vectors, the music model training apparatus 30 may further: perform music-element analysis on the input MIDI music to obtain statistical analysis results of the MIDI music based on music elements; and input the statistical analysis results into the structured support vector machine for training.
The music-element analysis includes: structure analysis, track analysis, timbre analysis, rhythm analysis or tempo analysis. In MIDI music, chord progressions are obtained through methods such as chord recognition, so that structure analysis of the MIDI music can be performed. The structure analysis includes: section analysis, phrase analysis, chord analysis, bar analysis and note analysis. A section consists of multiple bars that sound pleasant and smooth and conform to music chord-progression theory; bars consist of notes. A MIDI file includes channel events, from which information such as pitch, duration, timbre, velocity, expression, pitch-bend or modulation wheel, breath controller, volume controller and pan controller information can be obtained; track analysis and timbre analysis can be performed from this information. The track analysis includes: drum-track analysis, background-track analysis, accompaniment-track analysis and melody-track analysis. From the note distribution and volume distribution of the drum track, some rhythmic information can be roughly obtained for rhythm analysis. A MIDI file also includes additional events such as lyrics, markers, track names, key signatures, time signatures and tempo values, from which information such as tempo and tune can be obtained for tempo analysis. By analyzing the structure, tracks, timbre, rhythm or tempo of the MIDI music according to the above methods, statistical analysis results of the MIDI music based on those music elements can be obtained.
This application does not elaborate further on performing music-element analysis on the input MIDI music to obtain the statistical analysis results based on music elements.
The music model training apparatus described in this application is among the first to apply artificial intelligence to training a music model; using structured-SVM-based training, the trained music model can significantly improve the feature-extraction capability for MIDI scores.
Embodiment four
FIG. 4 is a functional module diagram of a preferred embodiment of the music creation apparatus of this application.
In some embodiments, the music creation apparatus 40 runs in a terminal. The music creation apparatus 40 may include a plurality of functional modules composed of program code segments. The program code of each program segment in the music creation apparatus 40 may be stored in a memory and executed by at least one processor to perform music creation (see FIG. 2 and its related description for details).
In this embodiment, the music creation apparatus 40 of the terminal may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a collection module 401, a first extraction module 402, a second extraction module 403, a connection module 404, a learning module 405 and an output module 406. A module referred to in this application is a series of computer-readable instruction segments that can be executed by at least one processor and can perform fixed functions, stored in a memory. In some embodiments, the functions of the modules will be detailed in subsequent embodiments.
The collection module 401 is configured to collect a MIDI score containing several MIDI notes created by a user as the MIDI score to be created.
The user can play a few notes at will on the piano and then stop playing; at this point, the several notes are collected and input into the pre-trained music model, which can then play a complete piece of music on its own.
The first extraction module 402 is configured to extract the pitch sequence of the MIDI score to be created as the third feature vector.
The second extraction module 403 is configured to extract the timing sequence of the MIDI score to be created as the fourth feature vector.
The connection module 404 is configured to connect the third feature vector and the fourth feature vector to obtain the feature vector of the MIDI score.
If the sequential connection method was used when training the music model, the third feature vector and the fourth feature vector are likewise connected sequentially.
If the cross-connection method was used when training the music model, the third feature vector and the fourth feature vector are likewise cross-connected.
The learning module 405 is configured to input the feature vector into the pre-trained music model for learning.
Because the trained structured-SVM-based music model has the function of memorizing MIDI scores, when the feature vector of a MIDI score containing several MIDI notes is input into the pre-trained music model, the model can automatically output the corresponding MIDI score.
The output module 406 is configured to output the corresponding MIDI score.
The music creation apparatus described in this application uses a pre-trained music model, which can greatly reduce the creation cost of MIDI music, save the expenses of many band performers, shorten working time in the recording studio, and improve work efficiency.
The above integrated unit implemented in the form of a software functional module may be stored in a non-volatile readable storage medium. The above software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a dual-screen device, or a network device) or a processor to execute parts of the methods described in the embodiments of this application.
Embodiment five
FIG. 5 is a schematic diagram of the terminal provided in Embodiment 5 of this application.
The terminal 5 includes: a memory 51, at least one processor 52, computer-readable instructions 53 stored in the memory 51 and executable on the at least one processor 52, and at least one communication bus 54.
When the at least one processor 52 executes the computer-readable instructions 53, the steps in the embodiments of the music model training method and/or the music creation method described above are implemented.
Exemplarily, the computer-readable instructions 53 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 51 and executed by the at least one processor 52 to complete this application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 53 in the terminal 5.
The terminal 5 may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. Those skilled in the art can understand that FIG. 5 is only an example of the terminal 5 and does not constitute a limitation on the terminal 5; the terminal may include more or fewer components than shown in the figure, or combine some components, or have different components; for example, the terminal 5 may further include input/output devices, network access devices, buses, etc.
The at least one processor 52 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The processor 52 may be a microprocessor or any conventional processor; the processor 52 is the control center of the terminal 5 and connects the various parts of the entire terminal 5 using various interfaces and lines.
The memory 51 may be configured to store the computer-readable instructions 53 and/or modules/units, and the processor 52 implements the various functions of the terminal 5 by running or executing the computer-readable instructions and/or modules/units stored in the memory 51 and calling the data stored in the memory 51. The memory 51 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application required by at least one function (such as a sound playback function or an image playback function), etc.; the data storage area may store data created according to the use of the terminal 5 (such as audio data or a phonebook). In addition, the memory 51 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another volatile solid-state storage device.
When the modules/units integrated in the terminal 5 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of this application can also be completed by computer-readable instructions instructing the related hardware; the computer-readable instructions can be stored in a non-volatile readable storage medium, and when executed by a processor, the steps of the foregoing method embodiments can be implemented. The computer-readable instructions include computer-readable instruction code, which may be in source code form, object code form, an executable file, or some intermediate form. The non-volatile readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media. It should be noted that the content contained in the non-volatile readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, non-volatile readable media do not include electrical carrier signals and telecommunication signals.
In addition, the functional units in the embodiments of this application may be integrated in the same processing unit, or each unit may exist physically separately, or two or more units may be integrated in the same unit. The above integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional modules.
For those skilled in the art, it is obvious that this application is not limited to the details of the above exemplary embodiments, and this application can be implemented in other specific forms without departing from the spirit or basic features of this application. Therefore, from whatever point of view, the embodiments should be regarded as exemplary and non-limiting, and the scope of this application is defined by the appended claims rather than by the above description; it is therefore intended that all changes falling within the meaning and scope of equivalents of the claims be included in this application. No reference sign in the claims should be regarded as limiting the claim concerned. Furthermore, it is clear that the word "comprise" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a system claim can also be implemented by one unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application and not to limit them; although this application has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application can be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of this application.

Claims (20)

  1. A music model training method, characterized in that the method comprises:
    obtaining a MIDI music data set, the MIDI music data set comprising a plurality of MIDI scores;
    extracting a feature vector of each MIDI score;
    inputting the feature vectors into a structured support vector machine for training to obtain a music model, comprising: constructing a discriminant function f(x;w), where x is a feature vector and w is a parameter vector, and outputting as the predicted value the data value that maximizes the discriminant function f(x;w),
    y* = argmax_{y∈Y} F(x,y;w);
    computing the predicted value against the true value according to the preset loss function
    R_P^Δ(f) = ∫_{X×Y} Δ(y, f(x)) dP(x,y),
    where P is the probability distribution of the data, replaced by the empirical risk computed from the training sample data,
    R_S^Δ(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i));
    solving, using the SVM optimization formulation
    min_w (1/2)||w||², s.t. ∀i, ∀y∈Y∖y_i: ⟨w, δΨ_i(y)⟩ ≥ 1,
    for the unique parameter vector ω such that the empirical risk R_S^Δ(f) over the training sample data is zero; obtaining the discriminant function f(x;ω), and finally outputting a music time series.
  2. The method according to claim 1, characterized in that extracting the feature vector of each MIDI score comprises:
    extracting the pitch sequence of the MIDI score as a first feature vector;
    extracting the timing sequence of the MIDI score as a second feature vector;
    connecting the first feature vector and the second feature vector to obtain the feature vector of the MIDI score.
  3. The method according to claim 2, characterized in that extracting the timing sequence of the MIDI score as the second feature vector further comprises:
    finding the greatest common divisor of all timing sequences as a unit duration; or
    computing each timing sequence as a multiple of the unit duration, and taking the multiple as the timing sequence corresponding to the key.
  4. The method according to claim 1, characterized in that the method further comprises:
    dividing the obtained MIDI music data set into a first data set and a second data set;
    randomly selecting a first preset quantity of data from the first data set to participate in training the music model;
    randomly picking one MIDI score from the second data set;
    extracting the feature vector of the score within a preset time period of the picked MIDI score;
    inputting the feature vector of the score within the preset time period into the trained music model, and outputting a corresponding MIDI score;
    verifying the performance of the trained music model according to the picked MIDI score and the output MIDI score.
  5. The method according to claim 4, characterized in that verifying the performance of the trained music model comprises:
    extracting a first waveform of the picked MIDI score;
    extracting a second waveform of the output MIDI score;
    computing the similarity between the first waveform and the second waveform;
    judging whether the similarity is greater than a preset similarity threshold;
    if the similarity is greater than or equal to the preset similarity threshold, determining that the trained music model performs well;
    if the similarity is less than the preset similarity threshold, determining that the trained music model performs poorly.
  6. A music creation method, characterized in that the method comprises:
    collecting a MIDI score containing several MIDI notes created by a user as a MIDI score to be created;
    extracting the pitch sequence of the MIDI score to be created as a third feature vector;
    extracting the timing sequence of the MIDI score to be created as a fourth feature vector;
    connecting the third feature vector and the fourth feature vector to obtain a feature vector of the MIDI score;
    inputting the feature vector into a pre-trained music model for learning, wherein the music model is trained using the method according to any one of claims 1-5;
    outputting a corresponding MIDI score.
  7. A music model training apparatus, characterized in that the apparatus comprises:
    an obtaining module configured to obtain a MIDI music data set, the MIDI music data set comprising a plurality of MIDI scores;
    an extraction module configured to extract a feature vector of each MIDI score;
    a training module configured to input the feature vectors into a structured support vector machine for training to obtain a music model, comprising: constructing a discriminant function f(x;w), where x is a feature vector and w is a parameter vector, and outputting as the predicted value the data value that maximizes the discriminant function f(x;w),
    y* = argmax_{y∈Y} F(x,y;w);
    computing the predicted value against the true value according to the preset loss function
    R_P^Δ(f) = ∫_{X×Y} Δ(y, f(x)) dP(x,y),
    where P is the probability distribution of the data, replaced by the empirical risk computed from the training sample data,
    R_S^Δ(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i));
    solving, using the SVM optimization formulation
    min_w (1/2)||w||², s.t. ∀i, ∀y∈Y∖y_i: ⟨w, δΨ_i(y)⟩ ≥ 1,
    for the unique parameter vector ω such that the empirical risk R_S^Δ(f) over the training sample data is zero; obtaining the discriminant function f(x;ω), and finally outputting a music time series.
  8. A music creation apparatus, characterized in that the apparatus comprises:
    a collection module configured to collect a MIDI score containing several MIDI notes created by a user as a MIDI score to be created;
    a first extraction module configured to extract the pitch sequence of the MIDI score to be created as a third feature vector;
    a second extraction module configured to extract the timing sequence of the MIDI score to be created as a fourth feature vector;
    a connection module configured to connect the third feature vector and the fourth feature vector to obtain a feature vector of the MIDI score;
    a learning module configured to input the feature vector into a pre-trained music model for learning, wherein the music model is trained by the apparatus according to claim 7;
    an output module configured to output a corresponding MIDI score.
  9. A terminal, characterized in that the terminal comprises a processor and a memory, the processor implementing the following steps when executing computer-readable instructions stored in the memory:
    obtaining a MIDI music data set, the MIDI music data set comprising a plurality of MIDI scores;
    extracting a feature vector of each MIDI score;
    inputting the feature vectors into a structured support vector machine for training to obtain a music model, comprising: constructing a discriminant function f(x;w), where x is a feature vector and w is a parameter vector, and outputting as the predicted value the data value that maximizes the discriminant function f(x;w),
    y* = argmax_{y∈Y} F(x,y;w);
    computing the predicted value against the true value according to the preset loss function
    R_P^Δ(f) = ∫_{X×Y} Δ(y, f(x)) dP(x,y),
    where P is the probability distribution of the data, replaced by the empirical risk computed from the training sample data,
    R_S^Δ(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i));
    solving, using the SVM optimization formulation
    min_w (1/2)||w||², s.t. ∀i, ∀y∈Y∖y_i: ⟨w, δΨ_i(y)⟩ ≥ 1,
    for the unique parameter vector ω such that the empirical risk R_S^Δ(f) over the training sample data is zero; obtaining the discriminant function f(x;ω), and finally outputting a music time series.
  10. The terminal according to claim 9, characterized in that extracting the feature vector of each MIDI score comprises:
    extracting the pitch sequence of the MIDI score as a first feature vector;
    extracting the timing sequence of the MIDI score as a second feature vector;
    connecting the first feature vector and the second feature vector to obtain the feature vector of the MIDI score.
  11. The terminal according to claim 10, characterized in that extracting the timing sequence of the MIDI score as the second feature vector further comprises:
    finding the greatest common divisor of all timing sequences as a unit duration; or
    computing each timing sequence as a multiple of the unit duration, and taking the multiple as the timing sequence corresponding to the key.
  12. The terminal according to claim 9, characterized in that the processor further implements the following steps when executing the computer-readable instructions:
    dividing the obtained MIDI music data set into a first data set and a second data set;
    randomly selecting a first preset quantity of data from the first data set to participate in training the music model;
    randomly picking one MIDI score from the second data set;
    extracting the feature vector of the score within a preset time period of the picked MIDI score;
    inputting the feature vector of the score within the preset time period into the trained music model, and outputting a corresponding MIDI score;
    verifying the performance of the trained music model according to the picked MIDI score and the output MIDI score.
  13. The terminal according to claim 12, characterized in that verifying the performance of the trained music model comprises:
    extracting a first waveform of the picked MIDI score;
    extracting a second waveform of the output MIDI score;
    computing the similarity between the first waveform and the second waveform;
    judging whether the similarity is greater than a preset similarity threshold;
    if the similarity is greater than or equal to the preset similarity threshold, determining that the trained music model performs well;
    if the similarity is less than the preset similarity threshold, determining that the trained music model performs poorly.
  14. A terminal, characterized in that the terminal comprises a processor and a memory, the processor implementing the following steps when executing computer-readable instructions stored in the memory:
    collecting a MIDI score containing several MIDI notes created by a user as a MIDI score to be created;
    extracting the pitch sequence of the MIDI score to be created as a third feature vector;
    extracting the timing sequence of the MIDI score to be created as a fourth feature vector;
    connecting the third feature vector and the fourth feature vector to obtain a feature vector of the MIDI score;
    inputting the feature vector into a pre-trained music model for learning, wherein the music model is trained using the method according to any one of claims 1-5;
    outputting a corresponding MIDI score.
  15. A non-volatile readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement the following steps:
    obtaining a MIDI music data set, the MIDI music data set comprising a plurality of MIDI scores;
    extracting a feature vector of each MIDI score;
    inputting the feature vectors into a structured support vector machine for training to obtain a music model, comprising: constructing a discriminant function f(x;w), where x is a feature vector and w is a parameter vector, and outputting as the predicted value the data value that maximizes the discriminant function f(x;w),
    y* = argmax_{y∈Y} F(x,y;w);
    computing the predicted value against the true value according to the preset loss function
    R_P^Δ(f) = ∫_{X×Y} Δ(y, f(x)) dP(x,y),
    where P is the probability distribution of the data, replaced by the empirical risk computed from the training sample data,
    R_S^Δ(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i));
    solving, using the SVM optimization formulation
    min_w (1/2)||w||², s.t. ∀i, ∀y∈Y∖y_i: ⟨w, δΨ_i(y)⟩ ≥ 1,
    for the unique parameter vector ω such that the empirical risk R_S^Δ(f) over the training sample data is zero; obtaining the discriminant function f(x;ω), and finally outputting a music time series.
  16. The storage medium according to claim 15, characterized in that extracting the feature vector of each MIDI score comprises:
    extracting the pitch sequence of the MIDI score as a first feature vector;
    extracting the timing sequence of the MIDI score as a second feature vector;
    connecting the first feature vector and the second feature vector to obtain the feature vector of the MIDI score.
  17. The storage medium according to claim 16, characterized in that extracting the timing sequence of the MIDI score as the second feature vector further comprises:
    finding the greatest common divisor of all timing sequences as a unit duration; or
    computing each timing sequence as a multiple of the unit duration, and taking the multiple as the timing sequence corresponding to the key.
  18. The storage medium according to claim 15, characterized in that the computer-readable instructions, when executed by the processor, further implement the following steps:
    dividing the obtained MIDI music data set into a first data set and a second data set;
    randomly selecting a first preset quantity of data from the first data set to participate in training the music model;
    randomly picking one MIDI score from the second data set;
    extracting the feature vector of the score within a preset time period of the picked MIDI score;
    inputting the feature vector of the score within the preset time period into the trained music model, and outputting a corresponding MIDI score;
    verifying the performance of the trained music model according to the picked MIDI score and the output MIDI score.
  19. The storage medium according to claim 18, characterized in that the computer-readable instructions, when executed by the processor, further implement the following steps:
    extracting a first waveform of the picked MIDI score;
    extracting a second waveform of the output MIDI score;
    computing the similarity between the first waveform and the second waveform;
    judging whether the similarity is greater than a preset similarity threshold;
    if the similarity is greater than or equal to the preset similarity threshold, determining that the trained music model performs well;
    if the similarity is less than the preset similarity threshold, determining that the trained music model performs poorly.
  20. A non-volatile readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement the following steps:
    collecting a MIDI score containing several MIDI notes created by a user as a MIDI score to be created;
    extracting the pitch sequence of the MIDI score to be created as a third feature vector;
    extracting the timing sequence of the MIDI score to be created as a fourth feature vector;
    connecting the third feature vector and the fourth feature vector to obtain a feature vector of the MIDI score;
    inputting the feature vector into a pre-trained music model for learning, wherein the music model is trained using the method according to any one of claims 1-5;
    outputting a corresponding MIDI score.
PCT/CN2018/100333 2018-06-05 2018-08-14 Music model training and music creation method, apparatus, terminal and storage medium WO2019232928A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810570846.7A 2018-06-05 2018-11-13 平安科技(深圳)有限公司 Music model training and music creation method, apparatus, terminal and storage medium
CN201810570846.7 2018-06-05

Publications (1)

Publication Number Publication Date
WO2019232928A1 true WO2019232928A1 (zh) 2019-12-12

Family

ID=64088744

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/100333 WO2019232928A1 (zh) Music model training and music creation method, apparatus, terminal and storage medium

Country Status (2)

Country Link
CN (1) CN108806657A (zh)
WO (1) WO2019232928A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113053336A (zh) * 2021-03-17 2021-06-29 平安科技(深圳)有限公司 Method, apparatus, device and storage medium for generating a musical work

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109771944B (zh) * 2018-12-19 2022-07-12 武汉西山艺创文化有限公司 Game sound effect generation method, apparatus, device and storage medium
CN109671416B (zh) * 2018-12-24 2023-07-21 成都潜在人工智能科技有限公司 Reinforcement-learning-based music melody generation method, apparatus and user terminal
CN109784006A (zh) * 2019-01-04 2019-05-21 平安科技(深圳)有限公司 Watermark embedding and extraction method and terminal device
JP7226709B2 (ja) * 2019-01-07 2023-02-21 ヤマハ株式会社 Video control system and video control method
CN110264984B (zh) * 2019-05-13 2021-07-06 北京奇艺世纪科技有限公司 Model training method, music generation method, apparatus and electronic device
CN111539576B (zh) * 2020-04-29 2022-04-22 支付宝(杭州)信息技术有限公司 Method and apparatus for optimizing a risk identification model
CN111627410B (zh) * 2020-05-12 2022-08-09 浙江大学 MIDI multi-track sequence representation method and application
CN111968452A (zh) * 2020-08-21 2020-11-20 江苏师范大学 Harmony learning method, apparatus and electronic device
CN112669796A (zh) * 2020-12-29 2021-04-16 西交利物浦大学 Artificial-intelligence-based method and apparatus for converting music into a musical score
CN113012665B (zh) * 2021-02-19 2024-04-19 腾讯音乐娱乐科技(深圳)有限公司 Music generation method and training method of a music generation model
CN116030777B (zh) * 2023-03-13 2023-08-18 南京邮电大学 Method and system for generating music with a specific emotion

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978974A (zh) * 2014-10-22 2015-10-14 腾讯科技(深圳)有限公司 Audio processing method and apparatus
CN107123415A (zh) * 2017-05-04 2017-09-01 吴振国 Automatic arranging method and system
CN107909090A (zh) * 2017-10-11 2018-04-13 天津大学 Semi-supervised piano score difficulty recognition method based on metric learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101702316B (zh) * 2009-11-20 2014-04-09 北京中星微电子有限公司 Method and system for converting MIDI music into color information
CN103186527B (zh) * 2011-12-27 2017-04-26 北京百度网讯科技有限公司 System for building a music classification model, system for recommending music, and corresponding methods
CN106847248B (zh) * 2017-01-05 2021-01-01 天津大学 Chord recognition method based on robust pitch class profile features and support vector machines


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YAN, ZHIYONG ET AL.: "Chord Recognition Based on Support Vector Machine and Enhanced PCP Feature", Artificial Intelligence and Recognition Technology, vol. 40, no. 7, 31 July 2014 (2014-07-31), pages 172
YU, CHUNYAN ET AL.: "Video Semantic Context Label Tree and Its Structural Analysis", Journal of Graphics, vol. 36, no. 5, 31 October 2015 (2015-10-31), pages 749


Also Published As

Publication number Publication date
CN108806657A (zh) 2018-11-13

Similar Documents

Publication Publication Date Title
WO2019232928A1 (zh) Music model training and music creation method, apparatus, terminal and storage medium
Pachet et al. Reflexive loopers for solo musical improvisation
CN104395953B (zh) 来自音乐音频信号的拍子、和弦和强拍的评估
US20160247496A1 (en) Device and method for generating a real time music accompaniment for multi-modal music
US20230402026A1 (en) Audio processing method and apparatus, and device and medium
Kwon et al. Audio-to-score alignment of piano music using RNN-based automatic music transcription
Chourdakis et al. A machine-learning approach to application of intelligent artificial reverberation
CN108257588B (zh) Music composition method and apparatus
Cogliati et al. Transcribing Human Piano Performances into Music Notation.
CN112289300B (zh) Audio processing method and apparatus, electronic device, and computer-readable storage medium
CN113010730A (zh) Music file generation method, apparatus, device and storage medium
WO2023040332A1 (zh) Music score generation method, electronic device and readable storage medium
CN109410972B (zh) Method, apparatus and storage medium for generating sound effect parameters
Schuller et al. Music theoretic and perception-based features for audio key determination
CN110134823B (zh) MIDI music genre classification method based on a normalized-note visible Markov model
US10431191B2 (en) Method and apparatus for analyzing characteristics of music information
Trochidis et al. CAMeL: Carnatic percussion music generation using n-gram models
CN110867174A (zh) Automatic audio mixing apparatus
CN112825244B (zh) Soundtrack audio generation method and apparatus
Lee et al. i-Ring: A system for humming transcription and chord generation
Ryynänen Automatic transcription of pitch content in music and selected applications
Li et al. Automatic Note Recognition and Generation of MDL and MML using FFT
Tauscher et al. Audio Resynthesis on the Dancefloor: A Music Structural Approach.
Makhmutov et al. Momos-mt: mobile monophonic system for music transcription: sheet music generation on mobile devices
Liao Analysis and trans-synthesis of acoustic bowed-string instrument recordings–a case study using Bach cello suites

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 18921721
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 18921721
    Country of ref document: EP
    Kind code of ref document: A1