WO2020029382A1 - Method, system and apparatus for building music composition model, and storage medium - Google Patents


Info

Publication number
WO2020029382A1
WO2020029382A1 · PCT/CN2018/106680
Authority
WO
WIPO (PCT)
Prior art keywords
model
track
generator
music
time
Prior art date
Application number
PCT/CN2018/106680
Other languages
French (fr)
Chinese (zh)
Inventor
张爽
王义文
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020029382A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval of audio data

Definitions

  • the present application relates to the field of information technology, and in particular, to a method, a system, a device, and a storage medium for establishing a composition model.
  • a method for establishing a composition model includes: acquiring a music data set in MIDI format and converting it into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that triggers the playback cycle of each note when read; performing data cleaning on the piano roll after the format conversion; using a generative adversarial network to establish an interference track model and a composing track model, and combining the two to establish a mixed track model; dividing the generator into a time structure generator G_temp and a music bar generator G_bar, and constructing the temporal correlation between music bars through G_temp and G_bar to form a time model; and combining the mixed track model and the time model to form a multi-track symphony composition model.
  • a system for establishing a composition model, including:
  • an acquisition module, configured to acquire a music data set in MIDI format and convert it into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that triggers the playback cycle of each note when read;
  • a cleaning module, configured to perform data cleaning on the piano roll after the format conversion;
  • an establishment module, configured to establish an interference track model and a composing track model using a generative adversarial network, and to combine the two into a mixed track model;
  • a construction module, configured to divide the generator into a time structure generator G_temp and a music bar generator G_bar, and to construct the temporal correlation between music bars through them to form a time model;
  • a combination module, configured to combine the mixed track model and the time model to form a multi-track symphony composition model.
  • a computer device includes a memory and a processor.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the processor, they cause the processor to perform the steps of the composition model building method.
  • a storage medium storing computer-readable instructions, when the computer-readable instructions are executed by one or more processors, cause the one or more processors to execute the steps of the composition model building method described above.
  • the above-mentioned method, system, computer device and storage medium establish a composition model by acquiring a MIDI-format music data set and converting it into a piano roll; the piano roll is a music storage medium used to reproduce a piano performance, and it triggers the playback cycle of each note when read. The piano roll is cleaned after the format conversion: the tracks of similar instruments are merged, and tracks outside the brass, string, wind, and percussion families are unified into the string-instrument track.
  • four music bars are regarded as one phrase, by which the piano roll is segmented, and long sections are trimmed to a suitable size.
  • the suitable size is the pitch range C1 to C8.
  • a generative adversarial network is used to establish an interference track model and a composing track model, and a mixed track model is established by combining the two; the generator is divided into a time structure generator G_temp and a music bar generator G_bar, through which the temporal correlation between music bars is constructed to form a time model; the mixed track model and the time model are then combined to form a multi-track symphony composition model that satisfies the demand for variation and diversity in music.
  • FIG. 1 is a flowchart of a method for establishing a composition model in an embodiment
  • FIG. 2 is a schematic diagram of music generating M audio tracks in an embodiment
  • FIG. 3 is a schematic diagram of a multi-channel piano roll created by a single generator in an embodiment;
  • FIG. 4 is a schematic diagram of establishing a mixed model by combining the interference model and the composition model in an embodiment;
  • FIG. 5 is a schematic diagram of constructing audio track correlation in an embodiment
  • FIG. 6 is a schematic diagram of a multi-track model in an embodiment
  • FIG. 8 is a flowchart of establishing an interference track model and a composing track model by using a generative adversarial network in an embodiment
  • FIG. 9 is a structural block diagram of a system for establishing a composition model in an embodiment
  • FIG. 10 is a structural block diagram of a cleaning module in an embodiment
  • FIG. 11 is a structural block diagram of an establishment module in an embodiment.
  • a method for establishing a composition model includes the following steps:
  • Step S101: obtain a music data set in MIDI format and convert it into a piano roll; the piano roll is a music storage medium for reproducing a piano performance, and it triggers the playback cycle of each note when read;
  • This technical solution uses a MIDI data set and converts each MIDI file into a multi-track piano-roll representation.
  • A piano roll is a binary representation: a matrix recording the presence of notes at each time step, where a pressed key is written as 1 and an unpressed key as 0. A piece has M tracks, each track has R time steps, and the number of candidate pitches per bar is S.
  • A bar X therefore has the data form X ∈ {0,1}^(R×S×M), and T bars are expressed as X̄ = {X^(t)}, t = 1, …, T, so the matrix size of each X is fixed.
  • MIDI, the Musical Instrument Digital Interface, is an industry-standard electronic communication protocol. It defines note and performance codes for electronic musical instruments and other performance equipment (such as synthesizers), allowing electronic instruments, computers, mobile phones and other stage equipment to connect, adjust and synchronize with one another and to exchange performance data in real time.
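As a concrete illustration of the binary piano-roll representation described above, the following sketch builds one bar tensor X of shape R×S×M from a handful of note events. The event format and the sizes R = 96, S = 84, M = 4 are assumptions chosen for the example, not values fixed by the application.

```python
import numpy as np

# Illustrative sketch of the piano-roll representation: a cell is 1 while
# a key is pressed and 0 otherwise. Sizes and event format are assumed.
R, S, M = 96, 84, 4   # time steps per bar, candidate pitches, tracks

def notes_to_piano_roll(notes, R=R, S=S, M=M):
    """Build a binary bar matrix X in {0,1}^(R x S x M).

    `notes` is a list of (track, pitch_index, start_step, end_step)
    tuples; the corresponding cells are set to 1 for the note duration.
    """
    X = np.zeros((R, S, M), dtype=np.uint8)
    for track, pitch, start, end in notes:
        X[start:end, pitch, track] = 1
    return X

bar = notes_to_piano_roll([(0, 40, 0, 48), (1, 52, 24, 96)])
print(bar.shape)        # (96, 84, 4)
print(int(bar.sum()))   # 48 + 72 = 120 active cells
```

Because every bar has the same fixed shape, T bars stack directly into the phrase tensor X̄ used by the model.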
  • Step S102: perform data cleaning on the piano roll after the format conversion.
  • The original data set is very noisy. Although cleaning causes a certain loss of source material (for example, the sound of some merged instruments), the richness and validity of the data are the key to training the model, so cleanup is a necessary operation and the following methods are used. Some instruments contribute too little data on their tracks, which is not conducive to training, so this technical solution merges them.
  • The piano roll is used to merge the tracks of similar instruments: tracks outside the brass, string, wind, and percussion families are unified into the string-instrument track, which enriches the amount of data. To train the time model, four bars are treated as one phrase, by which the piano roll is segmented, and longer sections are trimmed to a suitable size. The highest and lowest notes are not very common, so the pitch range is restricted to C1 to C8 (C8 being the rightmost key of the piano).
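The cleaning steps above (merging instrument families, trimming the pitch range, segmenting four-bar phrases) might be sketched as follows. The family mapping, the MIDI pitch bounds for C1 to C8, and the array layout are illustrative assumptions rather than the application's exact procedure.

```python
import numpy as np

# Hedged sketch of the data-cleaning step. FAMILIES, the fallback rule,
# and the pitch bounds are assumptions made for this example.
FAMILIES = {"brass": 0, "strings": 1, "winds": 2, "percussion": 3}

def merge_tracks(rolls_by_instrument):
    """Merge per-instrument rolls (time x pitch arrays) into 4 family
    tracks; instruments outside the known families fall back to strings."""
    shape = next(iter(rolls_by_instrument.values())).shape
    merged = np.zeros(shape + (4,), dtype=np.uint8)
    for (name, family), roll in rolls_by_instrument.items():
        idx = FAMILIES.get(family, FAMILIES["strings"])
        merged[..., idx] |= roll.astype(np.uint8)
    return merged

def trim_and_segment(roll, steps_per_bar=96, lo=24, hi=108):
    """Keep pitches in roughly the C1..C8 range (MIDI 24..107 here) and
    cut the roll into 4-bar phrases, dropping an incomplete tail."""
    roll = roll[:, lo:hi]                  # trim pitch range
    phrase_len = 4 * steps_per_bar         # four bars form one phrase
    n = roll.shape[0] // phrase_len
    return roll[: n * phrase_len].reshape(n, phrase_len, hi - lo)

rolls = {("trumpet", "brass"): np.ones((800, 128), np.uint8),
         ("harp", "other"): np.ones((800, 128), np.uint8)}
merged = merge_tracks(rolls)
phrases = trim_and_segment(merged[..., FAMILIES["strings"]])
print(merged.shape, phrases.shape)   # (800, 128, 4) (2, 384, 84)
```

The harp, not being brass, strings, winds or percussion, lands on the strings channel, matching the fallback rule in the text.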
  • Step S103: establish an interference track model and a composing track model using a generative adversarial network, and establish a mixed track model by combining the two;
  • Adversarial learning in the GAN is implemented through the contest between the generator and the discriminator, so that the generator gradually masters the data distribution of the music sequences to be generated.
  • the GAN implements adversarial learning by constructing two networks: a generator and a discriminator.
  • the generator captures the latent distribution of the real data samples and generates new samples.
  • the discriminator is a binary classifier that determines whether its input is real data or a generated sample. The two models "fight" each other, and in the process each becomes more powerful.
  • the generator tries to make its samples indistinguishable from the real data, so that the discriminator cannot tell whether a generated sample is real or fake; in this way it gradually masters the data distribution of the generated music sequences,
  • while the discriminator continuously improves its ability to identify data samples in this process.
  • a GAN, or generative adversarial network, thus comprises two networks: one that generates data, called the generator, and one that judges whether the generated data is close to real, called the discriminator.
  • the basic principle is: train a generator to produce realistic samples from random noise or latent variables, and train a discriminator to separate real data from generated data. Both are trained simultaneously until a Nash equilibrium is reached, at which point the generated data is indistinguishable from real samples and the discriminator can no longer tell them apart.
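The adversarial principle can be sketched with a deliberately tiny example: a linear generator and a logistic discriminator trained on one-dimensional Gaussian data with hand-derived gradient updates. This toy only illustrates the generator/discriminator contest; it stands in for, and is much smaller than, the networks the application would actually use.

```python
import numpy as np

# Toy GAN on 1-D data: G(z) = a*z + b, D(x) = sigmoid(w*x + c).
# Manual gradients, non-saturating generator loss. Purely illustrative.
rng = np.random.default_rng(0)
a, b = 1.0, 0.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters
lr, batch = 0.05, 64
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

for step in range(2000):
    x_real = rng.normal(4.0, 1.0, batch)      # "real data" around 4
    z = rng.normal(0.0, 1.0, batch)
    x_fake = a * z + b
    # Discriminator ascent: maximize log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w += lr * np.mean((1 - d_real) * x_real - d_fake * x_fake)
    c += lr * np.mean((1 - d_real) - d_fake)
    # Generator ascent: maximize log D(fake) (non-saturating form).
    d_fake = sigmoid(w * x_fake + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

print(round(b, 2))  # b drifts from 0 toward the data mean 4
```

As training proceeds the generator's offset b moves toward the real data's mean, which is exactly the "generator masters the data distribution" behavior the text describes.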
  • An interference model is established based on the GAN.
  • Each track has its own generator and discriminator, and an independent latent space variable z_i.
  • a composing model is also established based on the GAN. Since the composing model feeds multiple tracks of data into the generator, a single generator can be used to create a multi-channel piano roll in which each channel represents a specific track, as shown in Figure 3. In this model there is only one generator and one discriminator globally, and a common input z is used to generate all tracks, so regardless of the value of M only one generator and one discriminator are needed.
  • a hybrid model is created by combining the interference model and the composing model.
  • the hybrid model combines the two approaches above.
  • Each track has its own generator, which accepts an input formed from an independent vector z_i together with the global vector z,
  • while all tracks share a single discriminator that judges the generated tracks.
  • the mixed mode is more flexible: different parameters (such as the number of layers or the size of the convolution kernel) can be used in the generator to balance the independent generation of each track against global generation.
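A structural sketch of this hybrid wiring follows: each of the M tracks has its own generator fed the shared vector z concatenated with a private vector z_i, while one shared discriminator scores all tracks jointly. Linear maps stand in for the real networks, and all sizes and names are assumptions made for the example.

```python
import numpy as np

# Hybrid (mixed) track model wiring: M per-track generators, one shared
# discriminator. Linear stand-ins; not the application's actual networks.
rng = np.random.default_rng(1)
M, Z_DIM, BAR = 4, 32, 96 * 84          # tracks, latent size, flat bar size

G_i = [rng.normal(0, 0.01, (2 * Z_DIM, BAR)) for _ in range(M)]  # per-track G
D_w = rng.normal(0, 0.01, (M * BAR,))                            # shared D

def generate(z, z_locals):
    """One bar per track: track i uses concat(z, z_i) and generator i."""
    return np.stack([np.concatenate([z, zi]) @ G_i[i]
                     for i, zi in enumerate(z_locals)])   # (M, BAR)

def discriminate(bars):
    """The single shared discriminator scores all M tracks together."""
    return float(1.0 / (1.0 + np.exp(-bars.reshape(-1) @ D_w)))

z = rng.normal(size=Z_DIM)
bars = generate(z, [rng.normal(size=Z_DIM) for _ in range(M)])
print(bars.shape)   # (4, 8064): one generated bar per track
```

The key design point the text makes is visible in the shapes: z coordinates all tracks globally while each z_i individualizes one track, and only one discriminator is needed.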
  • Step S104: the generator is divided into a time structure generator G_temp and a music bar generator G_bar, and the temporal correlation between music bars is constructed through G_temp and G_bar to form a time model.
  • Constructing track correlation addresses how to generate a single bar across different tracks; the temporal relationship between successive bars needs an additional structure to generate.
  • the generator is therefore divided into two sub-networks: a time structure generator G_temp and a music bar generator G_bar.
  • G_temp maps the input vector z into a sequence of latent vectors {z̄^(t)}, t = 1, …, T, so that each z̄^(t) carries timing information; the sequence is then sent to G_bar to generate the piano roll bar by bar, i.e. G(z) = {G_bar(G_temp(z)^(t))}, t = 1, …, T.
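The two-stage split can be sketched as follows, again with linear maps standing in for G_temp and G_bar; the sizes are assumptions for the example.

```python
import numpy as np

# Sketch of the generator split: G_temp expands one latent z into T
# bar-level latents carrying timing information; G_bar turns each of
# them into one bar. Linear stand-ins for the real sub-networks.
rng = np.random.default_rng(2)
T, Z_DIM, R, S = 4, 32, 96, 84          # bars per phrase; sizes assumed

W_temp = rng.normal(0, 0.1, (Z_DIM, T * Z_DIM))   # stands in for G_temp
W_bar = rng.normal(0, 0.1, (Z_DIM, R * S))        # stands in for G_bar

def g_temp(z):
    """Map z to the sequence of T latent vectors z_bar^(1..T)."""
    return (z @ W_temp).reshape(T, Z_DIM)

def g_bar(z_t):
    """Map one bar latent to a (thresholded) binary bar."""
    return ((z_t @ W_bar).reshape(R, S) > 0).astype(np.uint8)

z = rng.normal(size=Z_DIM)
phrase = np.stack([g_bar(zt) for zt in g_temp(z)])  # bars generated in order
print(phrase.shape)   # (4, 96, 84): T bars of R x S piano roll
```

This mirrors the formula G(z) = {G_bar(G_temp(z)^(t))}: one draw of z fixes the whole phrase, and the bar-to-bar coherence comes from G_temp alone.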
  • Step S105: the mixed track model and the time model are combined to form a multi-track symphony composition model.
  • The multi-track model is the integration and extension of the above track model and time model.
  • the model input z̄ consists of four parts: a time-dependent global vector z_t shared across tracks, a time-independent global vector z shared across tracks, an independent time-independent vector z_i within each track, and an independent time-dependent vector z_{i,t} within each track.
  • the shared time structure generator G_temp and the per-track time structure generators G_temp,i take the time-varying random vectors z_t and z_{i,t} as input respectively, and output series of latent vectors containing inter-track and intra-track time information.
  • the output sequences (latent vectors) are sent to the music bar generator G_bar together with the time-independent random vectors z and z_i, and the piano roll is then generated bar by bar.
  • the generation process can be formulated as: G(z̄) = {G_bar^i(z, G_temp(z_t)^(t), z_i, G_temp,i(z_{i,t})^(t))}, i = 1, …, M, t = 1, …, T.
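The four-part input can be illustrated by assembling, for every (track, bar) pair, the vector handed to the bar generator. The temporal latents below are random stand-ins for the outputs of G_temp and G_temp,i, and all sizes are assumptions for the example.

```python
import numpy as np

# Sketch of the multi-track model's four-part input: for track i and bar
# t, G_bar^i receives the global time-independent z, the shared temporal
# latent, the private time-independent z_i, and the private temporal
# latent. Random arrays stand in for the temporal generators' outputs.
rng = np.random.default_rng(3)
M, T, D = 4, 4, 16                      # tracks, bars, latent size (assumed)

z = rng.normal(size=D)                  # global, time-independent
z_i = rng.normal(size=(M, D))           # per-track, time-independent
shared_temporal = rng.normal(size=(T, D))      # stands in for G_temp(z_t)
private_temporal = rng.normal(size=(M, T, D))  # stands in for G_temp,i(z_it)

def bar_input(i, t):
    """Concatenate the four latent parts fed to G_bar^i at bar t."""
    return np.concatenate([z, shared_temporal[t],
                           z_i[i], private_temporal[i, t]])

inputs = np.stack([[bar_input(i, t) for t in range(T)] for i in range(M)])
print(inputs.shape)   # (4, 4, 64): one 4*D input per (track, bar) pair
```

The shapes make the division of labor explicit: z and shared_temporal coordinate the tracks with each other, while z_i and private_temporal individualize each track over time.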
  • the data cleaning of the piano roll after the format conversion includes:
  • Step S201: merge the tracks of similar instruments through the piano roll, unifying tracks outside the brass, string, wind, and percussion families into the string-instrument track;
  • the original data set is very noisy, and although data cleaning causes a certain loss of source material (such as the sound of some merged instruments), the richness and validity of the data are the key to training the model, so cleanup is a necessary operation and the following methods are used.
  • Some instruments contribute too little data on their tracks, which is not conducive to training, so this technical solution merges them; merging the tracks of similar instruments through the piano roll enriches the data volume.
  • Step S202: treat four music bars as one phrase, segment the piano roll by phrase, and trim long sections to a suitable size, the suitable size being the pitch range C1 to C8.
  • the highest and lowest notes are not very common, so the pitch range is restricted to C1 to C8 (C8 being the rightmost key of the piano).
  • the use of a generative adversarial network to establish an interference track model and a composing track model, and the combination of the two into a mixed track model, include:
  • Step S301: establish an interference track model using a generative adversarial network.
  • Each track in the interference track model has its own generator and discriminator and an independent latent space variable, and the generators are independent of one another.
  • each generator receives feedback from its own discriminator; the contest between generator and discriminator implements the adversarial learning of the generative adversarial network, so that the generator masters the generation of its music.
  • a composing track model is then established using the generative adversarial network.
  • the composing track model feeds multiple tracks of data into the generator, and a multi-channel piano roll is created by a single generator, with each channel representing a specific track;
  • since the composing model feeds multiple tracks of data into one generator, a single generator can create a multi-channel piano roll, each channel representing a specific track, as shown in Figure 3.
  • in this model there is only one generator and one discriminator globally.
  • the common input z is used to generate all tracks, so regardless of the value of M only one generator and one discriminator are needed.
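The single-generator idea can be sketched as one map from a shared latent z to an M-channel piano roll; the linear map is an illustrative stand-in for the real network, and the sizes are assumptions.

```python
import numpy as np

# Sketch of the composing-track model: one generator maps one shared
# latent z to a multi-channel piano roll where channel i is track i, so
# one G and one D suffice for any M. Linear stand-in, sizes assumed.
rng = np.random.default_rng(4)
M, Z_DIM, R, S = 4, 32, 96, 84

W = rng.normal(0, 0.1, (Z_DIM, R * S * M))   # stands in for the generator

def composer_generator(z):
    """One forward pass: a single z yields all M track channels at once."""
    return ((z @ W).reshape(R, S, M) > 0).astype(np.uint8)

roll = composer_generator(rng.normal(size=Z_DIM))
print(roll.shape)   # (96, 84, 4): each of the 4 channels is one track
```

Contrast this with the interference model: here the coupling between tracks is learned inside the single generator instead of being negotiated between independent per-track generators.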
  • a mixed track model is established by combining the interference track model and the composing track model.
  • the mixed model combines the two approaches above.
  • Each track in the mixed track model has its own generator, which accepts an input formed from an independent vector z_i combined with the global vector z,
  • while all tracks share a single discriminator that judges the generated tracks.
  • the mixed mode is more flexible: different parameters, such as the number of layers or the size of the convolution kernel, can be used in the generator to balance the independent generation of each track against global generation in harmony.
  • the division of the generator into a time structure generator G_temp and a music bar generator G_bar, and the construction through them of the temporal correlation between music bars to form the time model, include:
  • the generator is divided into a time structure generator G_temp and a music bar generator G_bar.
  • the time structure generator G_temp maps the input vector z into a sequence of latent vectors {z̄^(t)}, t = 1, …, T.
  • each z̄^(t) carries timing information and is sent to the music bar generator G_bar to generate the piano roll bar by bar, which is defined as: G(z) = {G_bar(G_temp(z)^(t))}, t = 1, …, T.
  • the combining of the mixed track model and the time model to form a multi-track symphony composition model includes:
  • combining the mixed track model and the time model forms a multi-track symphony composition model whose input z̄ consists of four parts: a time-dependent global vector z_t shared across tracks, a time-independent global vector z shared across tracks, an independent time-independent vector z_i within each track, and an independent time-dependent vector z_{i,t} within each track;
  • the per-track input variables are combined with the global input variables to form the multi-track symphony.
  • the per-track input variables are divided into time-dependent and time-independent parts, and the global input variables are likewise divided into time-dependent and time-independent parts.
  • the multi-track model is the integration and extension of the above track model and time model.
  • the shared time structure generator G_temp and the per-track time structure generators G_temp,i take the time-varying random vectors z_t and z_{i,t} as input respectively, and output series of latent vectors containing inter-track and intra-track time information; the output sequences (latent vectors) are sent to the music bar generator G_bar together with the time-independent random vectors z and z_i, and the piano roll is then generated bar by bar.
  • the generation process can be formulated as: G(z̄) = {G_bar^i(z, G_temp(z_t)^(t), z_i, G_temp,i(z_{i,t})^(t))}, i = 1, …, M, t = 1, …, T.
  • a system for establishing a composition model includes:
  • an acquisition module, configured to acquire a music data set in MIDI format and convert it into a piano roll, where the piano roll is a music storage medium for reproducing a piano performance and triggers the playback cycle of each note when read;
  • a cleaning module, configured to perform data cleaning on the piano roll after the format conversion;
  • an establishment module, configured to use a generative adversarial network to establish an interference track model and a composing track model, and to combine the two into a mixed track model;
  • a construction module, configured to divide the generator into a time structure generator G_temp and a music bar generator G_bar and to construct the temporal correlation between music bars through them to form a time model;
  • a combination module, configured to combine the mixed track model and the time model to form a multi-track symphony composition model.
  • the cleaning module further includes:
  • a merging unit, configured to merge the tracks of similar instruments through the piano roll and to unify tracks outside the brass, string, wind, and percussion families into the string-instrument track;
  • a trimming unit, configured to treat four music bars as one phrase, segment the piano roll by phrase, and trim long sections to a suitable size, where the suitable size is the pitch range C1 to C8.
  • the establishment module further includes:
  • an interference model unit, configured to establish an interference track model using a generative adversarial network.
  • Each track in the interference track model has its own generator and discriminator, as well as an independent latent space variable.
  • the generators work independently of one another, each given a random vector with which to generate its own music track; each generator receives feedback from its own discriminator, and the contest between generator and discriminator implements the adversarial learning of the generative adversarial network, so that the generator masters the data distribution of the generated music sequences;
  • a composition model unit, configured to establish a composing track model using a generative adversarial network.
  • the composing track model feeds multiple tracks of data into the generator, and a single generator creates a multi-channel piano roll in which each channel represents a specific track;
  • a combining unit, configured to establish a mixed track model by combining the interference track model and the composing track model.
  • Each track in the mixed track model has its own generator, which accepts an input formed from an independent vector combined with the global vector, while a single shared discriminator judges the generated tracks.
  • the construction module includes:
  • a generator processing unit, configured to divide the generator into a time structure generator G_temp and a music bar generator G_bar, where G_temp maps the input vector z into a sequence of latent vectors {z̄^(t)}, t = 1, …, T;
  • each z̄^(t) carries timing information and is sent to the music bar generator G_bar to generate the piano roll bar by bar, which is defined as: G(z) = {G_bar(G_temp(z)^(t))}, t = 1, …, T.
  • the combination module includes:
  • a mixing processing unit, configured to combine the mixed track model and the time model into a multi-track symphony composition model, where the model input z̄ consists of four parts: a time-dependent global vector z_t shared across tracks, a time-independent global vector z shared across tracks, an independent time-independent vector z_i within each track, and an independent time-dependent vector z_{i,t} within each track;
  • the output sequences of latent vectors are sent to the music bar generator G_bar together with the time-independent random vectors z and z_i, and the piano roll is then generated bar by bar.
  • the generation process can be defined as: G(z̄) = {G_bar^i(z, G_temp(z_t)^(t), z_i, G_temp,i(z_{i,t})^(t))}, i = 1, …, M, t = 1, …, T.
  • the per-track input variables are combined with the global input variables to form the multi-track symphony.
  • the per-track input variables are divided into time-dependent and time-independent parts, and the global input variables are likewise divided into time-dependent and time-independent parts.
  • in one embodiment, a computer device includes a memory and a processor.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the processor, the processor is caused to execute the steps of the composition model building method in the foregoing embodiments.
  • a storage medium stores computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to execute the steps of the composition model building method described in the foregoing embodiments.
  • the storage medium may be a non-volatile storage medium.
  • the program may be stored in a computer-readable storage medium.
  • the storage medium may include read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and the like.

Abstract

A method and system for building a music composition model, a computer device, and a storage medium, the method comprising: acquiring a music data set in a MIDI format and converting the music data set in the MIDI format into a piano roll, wherein the piano roll is a music storage medium for reproducing a piano performance, and the piano roll triggers playing cycles of respective notes upon being read (S101); performing data cleanup on the piano roll after the format conversion (S102); using a generative adversarial network to build an interference track model and a music composition track model, and building a mixed track model by combining the interference track model with the music composition track model (S103); dividing a generator into a temporal structure generator Gtemp and a musical bar generator Gbar, forming a temporal model by establishing a temporal sequence relevance among the musical bars by means of the temporal structure generator Gtemp and the musical bar generator Gbar (S104); combining the mixed track model with the temporal model to form a multi-track symphonic music composition model (S105). The method satisfies requirements for variation and diversity of music.

Description

一种作曲模型的建立方法、系统、设备和存储介质Method, system, equipment and storage medium for establishing composition model
本申请要求于2018年08月08日提交中国专利局、申请号为201810894765.2、发明名称为“一种作曲模型的建立方法、系统、设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on August 08, 2018 with the application number 201810894765.2. The invention name is "A method, system, device, and storage medium for establishing a composition model", and its entire contents Incorporated by reference in this application.
技术领域Technical field
本申请涉及信息技术领域,尤其涉及一种作曲模型的建立方法、系统、设备和存储介质。The present application relates to the field of information technology, and in particular, to a method, a system, a device, and a storage medium for establishing a composition model.
背景技术Background technique
音乐生成的方式多种多样,最为常见的方式便是通过乐器产生,近年来,随着计算机应用技术的不断发展,通过算法的使用也可以产生音乐,在音乐生成算法研究领域,已有很多基于遗传算法、RNN和CNN的算法取得了良好效果,但是大部分生成的音乐是单音轨,即一种乐器,而真正的音乐是由多种乐器组成,例如现代交响乐团通常包括四个部分:黄铜,弦乐器,管乐器和打击乐。单音轨音乐形式比较单一,无法满足人们对音乐多样化的需求,并且目前还没有关于多音轨生成交响乐算法方面的研究。There are many ways to generate music. The most common way is through musical instruments. In recent years, with the continuous development of computer application technology, music can also be generated through the use of algorithms. In the field of music generation algorithm research, there have been many The algorithms of genetic algorithm, RNN and CNN have achieved good results, but most of the generated music is a single track, that is, an instrument, and the real music is composed of multiple instruments. For example, modern symphony orchestras usually include four parts: Brass, string instruments, wind instruments and percussion. The single-track music form is relatively single and cannot meet people's diverse needs for music. At present, there is no research on multi-track generation symphony algorithms.
发明内容Summary of the invention
基于此,有必要针对现行作曲方法的弊端,提供一种作曲模型的建立方法、系统、计算机设备和存储介质。Based on this, it is necessary to provide a method, system, computer equipment, and storage medium for establishing a composition model in view of the disadvantages of the current composition method.
一种作曲模型的建立方法,包括:获取MIDI格式的音乐数据集,并将所述MIDI格式的音乐数据集转换为钢琴键轴,所述钢琴键轴是音乐存储媒介,用于再现钢琴弹奏,当被读取时触发每个音符的播放循环;对格式转换后的钢琴键轴进行数据清理;采用生成式对抗网络建立干扰音轨模型和作曲音轨模型,并结合所述干扰音轨模型和作曲音轨模型建立混合音轨模型;将生成器分为时间结构生成器G temp和音乐小节生成器G bar,通过所述时间结构生成器G temp和所述音乐小节生成器G bar构建音乐小节之间的时序相关性形成时间模型;组合所述混合音轨模型和时间模型,以形成多音轨交响乐作曲模型。 A method for establishing a composition model includes: acquiring a music data set in a MIDI format and converting the music data set in a MIDI format into a piano key shaft, the piano key shaft being a music storage medium for reproducing piano playing , When it is read, trigger the playback cycle of each note; clean up the piano keys after format conversion; use the generative adversarial network to establish the interference track model and the composition track model and combine the interference track model Build a mixed track model with the composer track model; divide the generator into a time structure generator G temp and a music bar generator G bar , and build music through the time structure generator G temp and the music bar generator G bar The temporal correlation between the bars forms a time model; the mixed track model and the time model are combined to form a multi-track symphony composition model.
A system for establishing a composition model, including:
an acquisition module, configured to acquire a music data set in MIDI format and convert it into a piano-roll, the piano-roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback cycle of each note;
a cleaning module, configured to perform data cleaning on the converted piano-roll;
an establishing module, configured to establish an interference track model and a composition track model using a generative adversarial network, and to combine the two to establish a hybrid track model;
a construction module, configured to divide the generator into a time structure generator G temp and a bar generator G bar, and to use them to construct the temporal correlation between bars, forming a time model;
a combination module, configured to combine the hybrid track model and the time model to form a multi-track symphony composition model.
A computer device includes a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the above composition model establishment method.
A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above composition model establishment method.
In the above method, system, computer device, and storage medium for establishing a composition model, a music data set in MIDI format is acquired and converted into a piano-roll, the piano-roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback cycle of each note. Data cleaning is performed on the converted piano-roll: the tracks of similar instruments are merged, tracks other than brass, strings, woodwinds, and percussion are uniformly merged into the string track, four bars are treated as one phrase and the piano-roll is segmented accordingly, and long passages are trimmed to a suitable size, namely the pitch range C1 to C8. A generative adversarial network is used to establish an interference track model and a composition track model, which are combined to establish a hybrid track model. The generator is divided into a time structure generator G temp and a bar generator G bar, through which the temporal correlation between bars is constructed to form a time model. Finally, the hybrid track model and the time model are combined to form a multi-track symphony composition model, which satisfies people's demand for varied and natural-sounding music.
Brief Description of the Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are provided only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the present application.
FIG. 1 is a flowchart of a method for establishing a composition model in an embodiment;
FIG. 2 is a schematic diagram of generating music with M tracks in an embodiment;
FIG. 3 is a schematic diagram of a single generator creating a multi-channel piano-roll in an embodiment;
FIG. 4 is a schematic diagram of establishing a hybrid model by combining the interference model and the composition model in an embodiment;
FIG. 5 is a schematic diagram of constructing track correlation in an embodiment;
FIG. 6 is a schematic diagram of the multi-track model in an embodiment;
FIG. 7 is a flowchart of performing data cleaning on the converted piano-roll in an embodiment;
FIG. 8 is a flowchart of establishing the interference track model and the composition track model using a generative adversarial network in an embodiment;
FIG. 9 is a structural block diagram of a system for establishing a composition model in an embodiment;
FIG. 10 is a structural block diagram of the cleaning module in an embodiment;
FIG. 11 is a structural block diagram of the establishing module in an embodiment.
Detailed Description
To make the purpose, technical solution, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application and are not intended to limit it.
Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "the", and "said" used here may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present application refers to the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As a preferred embodiment, as shown in FIG. 1, a method for establishing a composition model includes the following steps:
Step S101: acquire a music data set in MIDI format and convert it into a piano-roll, the piano-roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback cycle of each note.
Since the music produced in the current field of music generation algorithms is mostly single-track and does not meet people's demand for musical diversity, this technical solution divides music into five levels — sections, phrases, bars, beats, and pixels — and generates them level by level, constructing the correlation between tracks based on a GAN (generative adversarial network). The tracks are relatively independent of one another yet must cooperate, so the GAN feeds each track two noise vectors simultaneously: one shared by all tracks and one generated separately for each track. The same method is used to handle the relationship between bars and thereby establish temporal correlation. Finally, all the tracks and bars are taken together as one sample and fed into the discriminator for training along with the samples in the training set, completing symphony composition based on a multi-track GAN.
This technical solution uses a MIDI data set and converts the MIDI files into a multi-track piano-roll representation. A piano-roll is a set of binary matrices indicating which notes are present at each time step: at each moment, a pressed key is represented as 1 and an unpressed key as 0. A bar with M tracks, R time steps per track, and S candidate note pitches is recorded as X, with data form X ∈ {0,1}^(R×S×M); a segment of T bars is then represented as X̄ = {X̄^(t)}, t = 1, …, T. The matrix size of each X is therefore fixed.
MIDI, also known as the Musical Instrument Digital Interface, is an industry-standard electronic communication protocol that defines notes and performance codes for electronic instruments and other performance equipment (such as synthesizers), allowing electronic instruments, computers, mobile phones, and other stage equipment to be connected, adjusted, and synchronized with one another so that performance data can be exchanged in real time.
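The binary piano-roll encoding described above can be illustrated with a minimal NumPy sketch. This is not the patent's implementation: `notes` stands in for parsed MIDI events as hypothetical (pitch, start_step, end_step) tuples, and the matrix sizes are illustrative assumptions.

```python
import numpy as np

def to_piano_roll(notes, n_steps=96, n_pitches=84):
    # Binary matrix: roll[t, p] == 1 while pitch p sounds at time step t,
    # 0 otherwise -- the "pressed key = 1, unpressed key = 0" encoding.
    roll = np.zeros((n_steps, n_pitches), dtype=np.uint8)
    for pitch, start, end in notes:
        roll[start:end, pitch] = 1
    return roll

# Two hypothetical notes: pitch 40 over steps 0-3, pitch 47 over steps 2-7.
roll = to_piano_roll([(40, 0, 4), (47, 2, 8)])
```

Stacking M such matrices along a third axis yields the fixed-size tensor X of shape R×S×M described above.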
Step S102: perform data cleaning on the converted piano-roll.
The original data set is very noisy. Although data cleaning causes some loss of sound sources (for example the sources of certain merged instruments), the richness and validity of the data are key to training the model, so cleaning is a necessary operation, and the following methods are used. Some instruments present too little data on their tracks, which is unfavorable for training, so this technical solution merges them: by summarizing the piano-rolls, the tracks of similar instruments are combined, and tracks other than brass, strings, woodwinds, and percussion are uniformly merged into the string track, enriching the amount of data. To train the time model, four bars are treated as one phrase and the piano-roll is segmented accordingly, so longer passages can be trimmed to a suitable size. The highest and lowest notes are uncommon, so the pitch range C1 to C8 (up to the rightmost piano key) is used.
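The phrase segmentation step (four bars per phrase, with the ragged tail dropped) can be sketched as follows; the 24 time steps per bar and the 84-pitch C1–C8 range are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def split_phrases(roll, steps_per_bar=24, bars_per_phrase=4):
    # Treat four bars as one phrase: cut the long piano-roll into
    # fixed-size phrase chunks and drop the incomplete remainder.
    phrase_steps = steps_per_bar * bars_per_phrase
    n = roll.shape[0] // phrase_steps
    return roll[: n * phrase_steps].reshape(n, phrase_steps, roll.shape[1])

song = np.zeros((500, 84), dtype=np.uint8)  # 500 steps over the C1-C8 range
phrases = split_phrases(song)               # 500 // 96 = 5 full phrases
```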
Step S103: use a generative adversarial network to establish an interference track model and a composition track model, and combine the two to establish a hybrid track model.
Adversarial learning in a GAN is realized through the competition between the generator and the discriminator, which gradually teaches the generator the data distribution of the music sequences to be generated. A GAN implements adversarial learning by constructing two networks: the generator captures the latent distribution of real data samples and produces new samples, while the discriminator is a binary classifier that judges whether its input is real data or a generated sample. The two models "fight" each other, and in the process each becomes stronger: the generator tries to make its samples indistinguishable from real data, so that the discriminator cannot tell whether a generated sample is real or fake and the generator gradually masters the data distribution of the generated music sequences, while the discriminator continuously improves its ability to identify samples. The GAN, or generative adversarial network, thus contains two networks: one that generates data, called the generator, and one that judges whether the generated data is close to real, called the discriminator. The basic principle is: train a generator to produce realistic samples from random noise or latent variables, while training a discriminator to distinguish real data from generated data; train both simultaneously until a Nash equilibrium is reached, at which point the generated data is indistinguishable from real samples and the discriminator can no longer correctly separate generated data from real data.
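The two adversarial objectives can be made concrete with toy linear "networks" in NumPy. This is only a sketch of the loss computation under assumed shapes; real generators and discriminators would be deep networks trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
w_d = rng.normal(size=4)          # toy discriminator weights
w_g = rng.normal(size=(4, 4))     # toy generator weights

def D(x):
    # Discriminator: probability that each input row is real data.
    return 1.0 / (1.0 + np.exp(-x @ w_d))

def G(z):
    # Generator: map noise z to a fake sample.
    return np.tanh(z @ w_g)

real = rng.normal(size=(8, 4))
fake = G(rng.normal(size=(8, 4)))

# D is trained to score real high and fake low; G is trained to fool D.
d_loss = -np.mean(np.log(D(real) + 1e-8) + np.log(1.0 - D(fake) + 1e-8))
g_loss = -np.mean(np.log(D(fake) + 1e-8))
```

At the Nash equilibrium described above, D(real) and D(fake) both approach 0.5 and neither loss can be improved further.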
An interference model is established based on the GAN: each track has its own generator–discriminator pair and an independent latent-space variable z_i. The generators work independently of one another, each producing its own music track from a given random vector z_i (i = 1, 2, …, M, where M is the number of generators, i.e. the number of tracks), and each generator receives feedback from a different discriminator. As shown in FIG. 2, generating music with M tracks therefore requires M generators and M discriminators.
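The interference arrangement — M independent generators, each consuming only its own z_i — can be sketched like this; the linear "generators" and the bar shape of 96 steps × 84 pitches are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, z_dim, steps, pitches = 4, 8, 96, 84

# One independent set of weights per track: generator i never sees
# the latent vector of any other track.
gen_w = [rng.normal(size=(z_dim, steps * pitches)) for _ in range(M)]

def generate_track(i, z_i):
    logits = z_i @ gen_w[i]
    return (logits > 0).astype(np.uint8).reshape(steps, pitches)

tracks = [generate_track(i, rng.normal(size=z_dim)) for i in range(M)]
```

In training, each tracks[i] would be scored by its own discriminator D_i, giving the M-generator, M-discriminator layout of FIG. 2.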
A composition model is also established based on the GAN. Since the composition model feeds the data of all tracks into a single generator, one generator can create a multi-channel piano-roll in which each channel represents a specific track, as shown in FIG. 3. In this model there is only one generator–discriminator pair globally, and a common input z is used to generate all the tracks, so regardless of the value of M only one generator and one discriminator are needed.
As shown in FIG. 4, a hybrid model is built by combining the interference model and the composition model. It combines the two approaches above: each track has its own generator, which takes as input the combination of an independent vector z_i and a global vector z, while a single discriminator is shared to generate the tracks. Compared with the composition model, the hybrid model is more flexible: different parameters (such as the number of layers, convolution kernel size, etc.) can be used in each generator, harmoniously combining the independent generation of tracks with global generation.
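The hybrid model's input wiring — each per-track generator consuming the shared z concatenated with its own z_i — can be sketched as follows; all shapes and the linear maps are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
M, z_dim, steps, pitches = 4, 8, 96, 84

z = rng.normal(size=z_dim)                        # shared by all tracks
z_i = [rng.normal(size=z_dim) for _ in range(M)]  # one per track
w = [rng.normal(size=(2 * z_dim, steps * pitches)) for _ in range(M)]

def hybrid_track(i):
    # Per-track generator input: [global z ; independent z_i].
    inp = np.concatenate([z, z_i[i]])
    return (inp @ w[i] > 0).astype(np.uint8).reshape(steps, pitches)

bars = np.stack([hybrid_track(i) for i in range(M)])
```

All M outputs would then be stacked and judged by the single shared discriminator.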
Step S104: divide the generator into a time structure generator G temp and a bar generator G bar, and use them to construct the temporal correlation between bars, forming a time model.
As shown in FIG. 5, constructing track correlation addresses how to generate individual bars in different tracks; the temporal relationship between bars requires an additional structure to generate. The generator is therefore divided into two sub-networks: the time structure generator G temp and the bar generator G bar. G temp maps the input vector z to a sequence of latent vectors Z̄ = {z̄^(t)}, t = 1, …, T, where T is the number of time steps; each z̄^(t) carries timing information and is then fed into G bar, which generates the piano-rolls sequentially. This is defined as: G(z) = {G bar(G temp(z)^(t))}, t = 1, …, T.
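The two-stage temporal generation G(z) = {G bar(G temp(z)^(t))} can be sketched with toy linear maps; the number of bars per phrase and the bar shape are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T, z_dim, steps, pitches = 4, 8, 96, 84
w_temp = rng.normal(size=(z_dim, T * z_dim))
w_bar = rng.normal(size=(z_dim, steps * pitches))

def G_temp(z):
    # Map one input vector z to a sequence of T latent vectors, each
    # carrying the timing information for one bar.
    return (z @ w_temp).reshape(T, z_dim)

def G_bar(z_bar_t):
    # Generate one bar of piano-roll from one latent vector.
    return (z_bar_t @ w_bar > 0).astype(np.uint8).reshape(steps, pitches)

phrase = np.stack([G_bar(z_bar_t) for z_bar_t in G_temp(rng.normal(size=z_dim))])
```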
Step S105: combine the hybrid track model and the time model to form a multi-track symphony composition model.
The multi-track model is the integration and extension of the track models and the time model above. As shown in FIG. 6, the input of the model is denoted z̄ = {z, z_t, z_i, z_{i,t}} and consists of four parts: the inter-track time-dependent vector z_t, the inter-track time-independent vector z, the intra-track time-independent vector z_i, and the intra-track time-dependent vector z_{i,t}.
For track i (i = 1, 2, …, M), the shared time structure generator G temp and the per-track time structure generator G temp,i take the time-varying random vectors z_t and z_{i,t} as input respectively, and each outputs a sequence of latent vectors carrying inter-track and intra-track timing information. These output sequences (latent vectors), together with the time-independent random vectors z and z_i, are fed into the bar generator G bar, which then generates the piano-rolls in order.
The generation process can be formulated as: G(z̄) = {G bar,i(z, G temp(z_t)^(t), z_i, G temp,i(z_{i,t})^(t))}, i = 1, …, M; t = 1, …, T.
It is clear from this generation formula that the per-track input variables (divided into time-dependent and time-independent) and the global input variables (likewise divided into time-dependent and time-independent) combine to form the multi-track symphony generation system.
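The full multi-track generation — four latent parts combined per track and per bar — can be sketched end to end; all shapes and the linear stand-ins for the generators are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
M, T, d, steps, pitches = 3, 4, 8, 96, 84

z = rng.normal(size=d)                  # inter-track, time-independent
z_t = rng.normal(size=(T, d))           # inter-track, time-dependent
z_i = rng.normal(size=(M, d))           # intra-track, time-independent
z_it = rng.normal(size=(M, T, d))       # intra-track, time-dependent

w_temp = rng.normal(size=(d, d))        # shared time structure generator
w_temp_i = rng.normal(size=(M, d, d))   # per-track time structure generators
w_bar = rng.normal(size=(M, 4 * d, steps * pitches))

def bar(i, t):
    # G_bar,i consumes [z, G_temp(z_t)^(t), z_i, G_temp,i(z_it)^(t)].
    inp = np.concatenate([z, z_t[t] @ w_temp, z_i[i], z_it[i, t] @ w_temp_i[i]])
    return (inp @ w_bar[i] > 0).astype(np.uint8).reshape(steps, pitches)

song = np.stack([[bar(i, t) for t in range(T)] for i in range(M)])
```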
As shown in FIG. 7, in one embodiment, performing data cleaning on the converted piano-roll includes:
Step S201: merge the tracks of similar instruments through the piano-roll, uniformly merging tracks other than brass, strings, woodwinds, and percussion into the string track.
The original data set is very noisy. Although data cleaning causes some loss of sound sources, for example the sources of certain merged instruments, the richness and validity of the data are key to training the model, so cleaning is a necessary operation, and the following methods are used. Some instruments present too little data on their tracks, which is unfavorable for training, so this technical solution merges them: by summarizing the piano-rolls, the tracks of similar instruments are combined, and tracks other than brass, strings, woodwinds, and percussion are uniformly merged into the string track, enriching the amount of data.
Step S202: treat four bars as one phrase, segment the piano-roll accordingly, and trim long passages to a suitable size, the suitable size being the pitch range C1 to C8.
Four bars are treated as one phrase and the piano-roll is segmented accordingly, so longer passages can be trimmed to a suitable size, namely the pitch range C1 to C8. This phrase segmentation is required to train the time model; the highest and lowest notes are uncommon, so the range C1 to C8 (up to the rightmost piano key) is used.
As shown in FIG. 8, in one embodiment, establishing the interference track model and the composition track model using a generative adversarial network and combining them to establish the hybrid track model includes:
Step S301: use a generative adversarial network to establish the interference track model, in which each track has its own generator–discriminator pair and an independent latent-space variable, the generators work independently of one another and produce their own music tracks from given random vectors, and each generator receives feedback from a different discriminator; the competition between generator and discriminator realizes the adversarial learning of the generative adversarial network, so that the generator masters the data distribution of the generated music sequences.
Adversarial learning in the GAN is realized through the competition between the generator and the discriminator, which gradually teaches the generator the data distribution of the music sequences to be generated. The GAN implements adversarial learning by constructing two networks: the generator captures the latent distribution of real data samples and produces new samples, while the discriminator is a binary classifier that judges whether its input is real data or a generated sample. The two models "fight" each other, and in the process each becomes stronger: the generator tries to make its samples indistinguishable from real data, so that the discriminator cannot tell whether a generated sample is real or fake and the generator gradually masters the data distribution of the generated music sequences, while the discriminator continuously improves its ability to identify samples. The GAN, or generative adversarial network, thus contains two networks: one that generates data, called the generator, and one that judges whether the generated data is close to real, called the discriminator. The basic principle is: train a generator to produce realistic samples from random noise or latent variables, while training a discriminator to distinguish real data from generated data; train both simultaneously until a Nash equilibrium is reached, at which point the generated data is indistinguishable from real samples and the discriminator can no longer correctly separate generated data from real data.
The interference model is established based on the GAN: each track has its own generator–discriminator pair and an independent latent-space variable z_i. The generators work independently of one another, each producing its own music track from a given random vector z_i (i = 1, 2, …, M, where M is the number of generators, i.e. the number of tracks), and each generator receives feedback from a different discriminator. As shown in FIG. 2, generating music with M tracks requires M generators and M discriminators.
Step S302: use a generative adversarial network to establish the composition track model, in which the data of multiple tracks is fed into the generator, so that a single generator creates a multi-channel piano-roll in which each channel represents a specific track.
Since the composition model feeds the data of all tracks into the generator, a single generator can create a multi-channel piano-roll in which each channel represents a specific track, as shown in FIG. 3. In this model there is only one generator–discriminator pair globally, and a common input z is used to generate all the tracks, so regardless of the value of M only one generator and one discriminator are needed.
Step S303: combine the interference track model and the composition track model to establish the hybrid track model, in which each track has its own generator that takes as input the combination of an independent vector and a global vector, while a single discriminator is shared to generate the tracks.
As shown in FIG. 4, the hybrid model is built by combining the interference model and the composition model. It combines the two approaches above: each track has its own generator, which takes as input the combination of an independent vector z_i and a global vector z, while a single discriminator is shared. Compared with the composition model, the hybrid model is more flexible: different parameters, such as the number of layers and the convolution kernel size, can be used in each generator, harmoniously combining the independent generation of tracks with global generation.
In one embodiment, dividing the generator into the time structure generator G temp and the bar generator G bar and using them to construct the temporal correlation between bars to form the time model includes:
dividing the generator into the time structure generator G temp and the bar generator G bar, where G temp maps the input vector z to a sequence of latent vectors Z̄ = {z̄^(t)}, t = 1, …, T, with T denoting time; each z̄^(t) carries timing information and is then fed into the bar generator G bar, which generates the piano-roll sequentially, defined as: G(z) = {G bar(G temp(z)^(t))}, t = 1, …, T.
In one embodiment, combining the hybrid track model and the time model to form the multi-track symphony composition model includes:
combining the hybrid track model and the time model to form the multi-track symphony composition model, where the input of the model is denoted z̄ = {z, z_t, z_i, z_{i,t}} and consists of the inter-track time-dependent vector z_t, the inter-track time-independent vector z, the intra-track time-independent vector z_i, and the intra-track time-dependent vector z_{i,t};
setting track i, i = 1, 2, …, M, where the shared time structure generator G temp and the per-track time structure generator G temp,i take the time-varying random vectors z_t and z_{i,t} as input respectively and output latent vectors containing inter-track and intra-track timing information; the output sequences of latent vectors, together with the time-independent random vectors z and z_i, are fed into the bar generator G bar, which then generates the piano-rolls in order, the generation process being defined as: G(z̄) = {G bar,i(z, G temp(z_t)^(t), z_i, G temp,i(z_{i,t})^(t))}, i = 1, …, M; t = 1, …, T; and
combining the per-track input variables and the global input variables to form the multi-track symphony, the per-track input variables being divided into time-dependent and time-independent, and the global input variables likewise being divided into time-dependent and time-independent.
多音轨模型,是上述音轨模型和时间模型的整合和扩展,如图6所示,模型的输入用
Figure PCTCN2018106680-appb-000012
表示,由四部分组成,轨道间全局时间相关向量Z t,轨道间全局时间无关向量Z,轨道内单独时间无关向量Z i,和轨道内单独时间相关向量Z i,t。
Multi-track model is the integration and extension of the above-mentioned track model and time model. As shown in Figure 6, the model input
Figure PCTCN2018106680-appb-000012
Represented by four parts, global time correlation between tracks vector Z t, the time between the track irrespective of the global vector Z, a separate time-independent vector Z i of the inner rail, the inner rail, and a separate time-dependent vector Z i, t.
对于音轨i(i=1,2……M),共用的时间结构生成器G temp和各自使用的时间结构生成器G temp,i,分别采取随时间变化的随机向量Z t和Z i,t作为输入,并且它们分别输出一系列包含音轨间和音轨内时间信息的潜在向量,输出序列(潜在向量)连同时间无关的随机向量Z和Z i一起被送入音乐小节生成器G bar,然后按顺序生成钢琴窗。生成过程可制定为: For the audio track i (i = 1,2 ... M), the shared time structure generator G temp and the respective time structure generator G temp, i respectively adopt random vectors Z t and Z i that change with time , t as input, and they output a series of latent vectors containing inter-track and intra-track time information, and the output sequence (potential vector) is sent to the music bar generator G bar together with the time-independent random vectors Z and Z i , And then generate piano windows in order. The generation process can be formulated as:
\hat{x}^{(i)}_t = G_{bar}\!\left(z,\; G_{temp}(z_t)^{(t)},\; z_i,\; G_{temp,i}(z_{i,t})^{(t)}\right), \qquad i = 1, \dots, M;\; t = 1, \dots, T
It is clear from this generation formula that the inter-track input variables (divided into time-dependent and time-independent) and the global input variables (likewise divided into time-dependent and time-independent) are combined to form the multi-track symphony generation system.
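The per-track generation process above can be sketched as follows. The toy functions, dimensions, and latent values are illustrative assumptions, not the patent's trained networks: a shared G temp and per-track G temp,i expand the time-dependent latents, which are then combined with the time-independent vectors z and z i and fed to the bar generator.

```python
# Illustrative sketch of multi-track generation: shared and per-track
# time-structure generators feed a bar generator. Toy math stand-ins only.
M, T, Z_DIM = 2, 3, 2   # tracks, bars, latent size (illustrative choices)

def g_temp(z_t):
    # Shared time-structure generator: one latent per bar (toy expansion).
    return [[v * (t + 1) for v in z_t] for t in range(T)]

def g_temp_i(z_it):
    # Per-track time-structure generator (toy expansion).
    return [[v + t for v in z_it] for t in range(T)]

def g_bar(z, zt_hat, z_i, zit_hat):
    # Bar generator: combines time-independent and time-dependent inputs
    # into one "bar" value per call (a real model would emit a piano roll).
    return sum(z) + sum(zt_hat) + sum(z_i) + sum(zit_hat)

z, z_t = [0.1] * Z_DIM, [0.2] * Z_DIM      # global time-indep. / time-dep.
symphony = []
for i in range(M):
    z_i, z_it = [0.1 * i] * Z_DIM, [0.01 * i] * Z_DIM  # track-specific
    shared, own = g_temp(z_t), g_temp_i(z_it)
    symphony.append([g_bar(z, shared[t], z_i, own[t]) for t in range(T)])
```

Each track's bar sequence thus mixes all four latent sources, mirroring the structure of the generation formula above.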
As shown in Fig. 9, in one embodiment, a system for establishing a composition model is provided. The system comprises:
an acquisition module, configured to acquire a music data set in MIDI format and convert the MIDI-format music data set into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback loop of each note;
a cleaning module, configured to perform data cleaning on the format-converted piano roll;
a building module, configured to use a generative adversarial network to establish an interference track model and a composer track model, and to combine the interference track model and the composer track model to establish a mixed track model;
a construction module, configured to divide the generator into a time-structure generator G temp and a music bar generator G bar, and to construct, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model; and
a combination module, configured to combine the mixed track model and the time model to form a multi-track symphony composition model.
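As a rough illustration of the acquisition module's MIDI-to-piano-roll conversion, the sketch below maps a parsed list of note events (pitch, start step, end step — a hypothetical intermediate form, not the patent's actual data layout) onto a binary piano-roll matrix:

```python
# Illustrative sketch only: converts parsed MIDI note events into a binary
# piano roll. The note-event tuples and the step resolution are assumptions,
# not the module's actual interface.

def notes_to_piano_roll(notes, n_steps, n_pitches=128):
    """notes: list of (pitch, start_step, end_step) tuples."""
    roll = [[0] * n_pitches for _ in range(n_steps)]
    for pitch, start, end in notes:
        for t in range(start, min(end, n_steps)):
            roll[t][pitch] = 1  # note is sounding at time step t
    return roll

# A C-major triad held for the first two steps, then a single E.
notes = [(60, 0, 2), (64, 0, 2), (67, 0, 2), (64, 2, 4)]
roll = notes_to_piano_roll(notes, n_steps=4)
```

Reading the resulting matrix row by row reproduces the performance, which is what the description means by triggering each note's playback loop when the piano roll is read.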
As shown in Fig. 10, in one embodiment, the cleaning module further comprises:
a merging unit, configured to merge the tracks of similar instruments on the piano roll, consolidating the tracks of instruments other than brass, strings, winds and percussion into the string track; and
a trimming unit, configured to treat four music bars as one phrase, segment the piano roll accordingly, and trim long segments to a suitable size, the suitable size being the pitch range from C1 to C8.
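The trimming unit's phrase segmentation and pitch-range trimming can be sketched as follows. The step resolution, and the mapping of C1–C8 to MIDI note numbers 24–107 (84 pitches), are illustrative assumptions; the patent names only the range, not exact note numbers:

```python
# Illustrative sketch of the cleaning step: trim a (time x 128) piano roll
# to the C1-C8 pitch range and cut it into four-bar phrases.

C1, C8 = 24, 108          # assumed half-open MIDI pitch range [C1, C8)
STEPS_PER_BAR = 16        # assumed time resolution per bar
BARS_PER_PHRASE = 4       # "four music bars are treated as one phrase"

def trim_pitch_range(roll):
    """Keep only the C1-C8 columns of each time step."""
    return [row[C1:C8] for row in roll]

def split_into_phrases(roll):
    """Cut a long piano roll into 4-bar phrases, dropping the remainder."""
    phrase_len = STEPS_PER_BAR * BARS_PER_PHRASE
    return [roll[i:i + phrase_len]
            for i in range(0, len(roll) - phrase_len + 1, phrase_len)]

roll = [[0] * 128 for _ in range(200)]   # 200 time steps (toy input)
phrases = split_into_phrases(trim_pitch_range(roll))
```

With 200 time steps and 64-step phrases, the remainder after the third phrase is discarded, which is the "trimming long segments" behavior described above.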
As shown in Fig. 11, in one embodiment, the building module further comprises:
an interference-model unit, configured to use a generative adversarial network to establish an interference track model, in which each track has its own generator and discriminator and its own independent latent-space variables; the generators work independently of one another, each producing its own music track from a given random vector; each generator receives feedback from a different discriminator, and the adversarial interplay between generators and discriminators realizes the adversarial learning of the generative adversarial network, so that the generators learn to control the data distribution of the generated music sequences;
a composer-model unit, configured to use a generative adversarial network to establish a composer track model, which feeds the data of multiple tracks into the generator so that a single generator creates a multi-channel piano roll in which each channel represents a specific track; and
a combining unit, configured to combine the interference track model and the composer track model to establish a mixed track model, in which each track has a generator that accepts an input vector formed by combining an independent vector with a global vector, while a single shared discriminator is used in generating the tracks.
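A minimal structural sketch of the mixed track model follows: each track's generator receives its own latent vector concatenated with a shared global vector, and a single shared discriminator scores the stacked tracks. The toy linear generators and the scoring function are placeholders, not the patent's networks:

```python
# Structural sketch of the mixed track model: per-track generators with
# independent + global latent inputs, one shared discriminator.
import random

def make_generator(in_dim, out_dim, seed):
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    def generate(vec):
        # Toy linear "network": one output bar per latent input vector.
        return [sum(wi * v for wi, v in zip(row, vec)) for row in w]
    return generate

M, Z_DIM, BAR_DIM = 3, 4, 8                  # tracks, latent size, bar size
generators = [make_generator(2 * Z_DIM, BAR_DIM, seed=i) for i in range(M)]

z_global = [0.5] * Z_DIM                     # shared across all tracks
tracks = []
for i, g in enumerate(generators):
    z_i = [random.Random(100 + i).random() for _ in range(Z_DIM)]  # per-track
    tracks.append(g(z_i + z_global))         # independent + global input

def discriminator(multi_track):
    # Single shared discriminator judging the stacked multi-track output
    # (here just a toy magnitude score).
    return sum(abs(x) for track in multi_track for x in track)

score = discriminator(tracks)
```

The design point is that the generators stay track-specific while the discriminator sees all tracks jointly, which pushes the tracks toward mutual coherence during adversarial training.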
In one embodiment, the construction module comprises:
a generator processing unit, configured to divide the generator into a time-structure generator G temp and a music bar generator G bar, the time-structure generator G temp mapping the input vector Z into a sequence of latent-space vectors
\hat{z} = \left\{\hat{z}^{(t)}\right\}_{t=1}^{T} = G_{temp}(z)
where T denotes time and each \hat{z}^{(t)} carries the timing information; the sequence is then fed into the music bar generator G bar, which generates the piano roll serially, defined as:

G(z) = \left\{ G_{bar}\!\left(\hat{z}^{(t)}\right) \right\}_{t=1}^{T}
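The two-stage time model can be sketched as below, with toy stand-ins for the learned G temp and G bar (the dimensions and functions are illustrative assumptions):

```python
# Sketch of the two-stage time model: G_temp expands one latent vector into
# T bar-level latents, and G_bar turns each latent into one bar, in order.

T, Z_DIM, BAR_DIM = 4, 3, 6

def g_temp(z):
    """Map a single latent z to T per-bar latents (here: shifted copies)."""
    return [[v + t for v in z] for t in range(T)]

def g_bar(z_hat_t):
    """Map one bar-level latent to a bar (here: a repeated toy pattern)."""
    return [sum(z_hat_t)] * BAR_DIM

z = [0.1, 0.2, 0.3]
piano_roll = [g_bar(z_hat) for z_hat in g_temp(z)]   # T bars, generated serially
```

Because every bar latent comes from the same G temp pass over z, the bars share a common temporal structure rather than being sampled independently.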
In one embodiment, the combination module comprises:
a mixing processing unit, configured to combine the mixed track model and the time model to form a multi-track symphony composition model, wherein the input of the model is denoted

\bar{z} = \left\{z,\; z_t,\; z_i,\; z_{i,t}\right\}

and consists of the inter-track global time-dependent vector Z t, the inter-track global time-independent vector Z, the intra-track individual time-independent vectors Z i, and the intra-track individual time-dependent vectors Z i,t; and
a music composition generating unit, configured to set a track i (i = 1, 2, …, M); the shared time-structure generator G temp and the per-track time-structure generators G temp,i take the time-varying random vectors Z t and Z i,t, respectively, as input, and each outputs latent vectors carrying inter-track and intra-track temporal information; these output sequences of latent vectors, together with the time-independent random vectors Z and Z i, are fed into the music bar generator G bar, which then generates the piano-roll bars in sequence, the generation process being defined as:
\hat{x}^{(i)}_t = G_{bar}\!\left(z,\; G_{temp}(z_t)^{(t)},\; z_i,\; G_{temp,i}(z_{i,t})^{(t)}\right), \qquad i = 1, \dots, M;\; t = 1, \dots, T
The inter-track input variables and the global input variables are combined to form the multi-track symphony; the inter-track input variables are divided into time-dependent and time-independent, as are the global input variables.
In one embodiment, a computer device is provided. The computer device comprises a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the method for establishing a composition model described in the embodiments above.
In one embodiment, a storage medium storing computer-readable instructions is provided. The computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the method for establishing a composition model described in the embodiments above. The storage medium may be a non-volatile storage medium.
A person of ordinary skill in the art will understand that all or part of the steps of the methods in the embodiments above may be implemented by a program instructing related hardware; the program may be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the embodiments above may be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; however, as long as a combination of technical features is not contradictory, it should be regarded as falling within the scope of this specification.
The embodiments above express only some exemplary embodiments of the present application; their description is relatively specific and detailed, but should not therefore be construed as limiting the patent scope of the present application. It should be noted that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (20)

  1. A method for establishing a composition model, comprising: acquiring a music data set in MIDI format and converting the MIDI-format music data set into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback loop of each note; performing data cleaning on the format-converted piano roll; using a generative adversarial network to establish an interference track model and a composer track model, and combining the interference track model and the composer track model to establish a mixed track model; dividing the generator into a time-structure generator G temp and a music bar generator G bar, and constructing, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model; and combining the mixed track model and the time model to form a multi-track symphony composition model.
  2. The method for establishing a composition model according to claim 1, wherein performing data cleaning on the format-converted piano roll comprises: merging the tracks of similar instruments on the piano roll, consolidating the tracks of instruments other than brass, strings, winds and percussion into the string track; and treating four music bars as one phrase, segmenting the piano roll accordingly, and trimming long segments to a suitable size, the suitable size being the pitch range from C1 to C8.
  3. The method for establishing a composition model according to claim 1, wherein using a generative adversarial network to establish an interference track model and a composer track model, and combining the interference track model and the composer track model to establish a mixed track model, comprises:
    using a generative adversarial network to establish an interference track model, in which each track has its own generator and discriminator and its own independent latent-space variables; the generators work independently of one another, each producing its own music track from a given random vector; each generator receives feedback from a different discriminator, and the adversarial interplay between generators and discriminators realizes the adversarial learning of the generative adversarial network, so that the generators learn to control the data distribution of the generated music sequences;
    using a generative adversarial network to establish a composer track model, which feeds the data of multiple tracks into the generator so that a single generator creates a multi-channel piano roll in which each channel represents a specific track; and
    combining the interference track model and the composer track model to establish a mixed track model, in which each track has a generator that accepts an input vector formed by combining an independent vector with a global vector, while a single shared discriminator is used in generating the tracks.
  4. The method for establishing a composition model according to claim 1, wherein dividing the generator into a time-structure generator G temp and a music bar generator G bar, and constructing, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model, comprises:
    dividing the generator into a time-structure generator G temp and a music bar generator G bar, the time-structure generator G temp mapping the input vector Z into a sequence of latent-space vectors
    \hat{z} = \left\{\hat{z}^{(t)}\right\}_{t=1}^{T} = G_{temp}(z)
    where T denotes time and each \hat{z}^{(t)} carries the timing information; the sequence is then fed into the music bar generator G bar, which generates the piano roll serially, defined as:

    G(z) = \left\{ G_{bar}\!\left(\hat{z}^{(t)}\right) \right\}_{t=1}^{T}
  5. The method for establishing a composition model according to claim 1, wherein combining the mixed track model and the time model to form a multi-track symphony composition model comprises:
    combining the mixed track model and the time model to form the multi-track symphony composition model, wherein the input of the model is denoted

    \bar{z} = \left\{z,\; z_t,\; z_i,\; z_{i,t}\right\}

    and consists of the inter-track global time-dependent vector Z t, the inter-track global time-independent vector Z, the intra-track individual time-independent vectors Z i, and the intra-track individual time-dependent vectors Z i,t;
    setting a track i (i = 1, 2, …, M); the shared time-structure generator G temp and the per-track time-structure generators G temp,i take the time-varying random vectors Z t and Z i,t, respectively, as input, and each outputs latent vectors carrying inter-track and intra-track temporal information; these output sequences of latent vectors, together with the time-independent random vectors Z and Z i, are fed into the music bar generator G bar, which then generates the piano-roll bars in sequence,
    the generation process being defined as:

    \hat{x}^{(i)}_t = G_{bar}\!\left(z,\; G_{temp}(z_t)^{(t)},\; z_i,\; G_{temp,i}(z_{i,t})^{(t)}\right), \qquad i = 1, \dots, M;\; t = 1, \dots, T
    the inter-track input variables and the global input variables being combined to form the multi-track symphony, the inter-track input variables being divided into time-dependent and time-independent, and the global input variables being divided into time-dependent and time-independent.
  6. A system for establishing a composition model, comprising:
    an acquisition module, configured to acquire a music data set in MIDI format and convert the MIDI-format music data set into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback loop of each note;
    a cleaning module, configured to perform data cleaning on the format-converted piano roll;
    a building module, configured to use a generative adversarial network to establish an interference track model and a composer track model, and to combine the interference track model and the composer track model to establish a mixed track model;
    a construction module, configured to divide the generator into a time-structure generator G temp and a music bar generator G bar, and to construct, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model; and
    a combination module, configured to combine the mixed track model and the time model to form a multi-track symphony composition model.
  7. The system for establishing a composition model according to claim 6, wherein the cleaning module further comprises:
    a merging unit, configured to merge the tracks of similar instruments on the piano roll, consolidating the tracks of instruments other than brass, strings, winds and percussion into the string track; and
    a trimming unit, configured to treat four music bars as one phrase, segment the piano roll accordingly, and trim long segments to a suitable size, the suitable size being the pitch range from C1 to C8.
  8. The system for establishing a composition model according to claim 6, wherein the building module further comprises:
    an interference-model unit, configured to use a generative adversarial network to establish an interference track model, in which each track has its own generator and discriminator and its own independent latent-space variables; the generators work independently of one another, each producing its own music track from a given random vector; each generator receives feedback from a different discriminator, and the adversarial interplay between generators and discriminators realizes the adversarial learning of the generative adversarial network, so that the generators learn to control the data distribution of the generated music sequences;
    a composer-model unit, configured to use a generative adversarial network to establish a composer track model, which feeds the data of multiple tracks into the generator so that a single generator creates a multi-channel piano roll in which each channel represents a specific track; and
    a combining unit, configured to combine the interference track model and the composer track model to establish a mixed track model, in which each track has a generator that accepts an input vector formed by combining an independent vector with a global vector, while a single shared discriminator is used in generating the tracks.
  9. The system for establishing a composition model according to claim 6, wherein the construction module comprises:
    a generator processing unit, configured to divide the generator into a time-structure generator G temp and a music bar generator G bar, the time-structure generator G temp mapping the input vector Z into a sequence of latent-space vectors
    \hat{z} = \left\{\hat{z}^{(t)}\right\}_{t=1}^{T} = G_{temp}(z)
    where T denotes time and each \hat{z}^{(t)} carries the timing information; the sequence is then fed into the music bar generator G bar, which generates the piano roll serially, defined as:

    G(z) = \left\{ G_{bar}\!\left(\hat{z}^{(t)}\right) \right\}_{t=1}^{T}
  10. The system for establishing a composition model according to claim 6, wherein the combination module comprises:
    a mixing processing unit, configured to combine the mixed track model and the time model to form a multi-track symphony composition model, wherein the input of the model is denoted

    \bar{z} = \left\{z,\; z_t,\; z_i,\; z_{i,t}\right\}

    and consists of the inter-track global time-dependent vector Z t, the inter-track global time-independent vector Z, the intra-track individual time-independent vectors Z i, and the intra-track individual time-dependent vectors Z i,t; and
    a music composition generating unit, configured to set a track i (i = 1, 2, …, M); the shared time-structure generator G temp and the per-track time-structure generators G temp,i take the time-varying random vectors Z t and Z i,t, respectively, as input, and each outputs latent vectors carrying inter-track and intra-track temporal information; these output sequences of latent vectors, together with the time-independent random vectors Z and Z i, are fed into the music bar generator G bar, which then generates the piano-roll bars in sequence, the generation process being defined as:
    \hat{x}^{(i)}_t = G_{bar}\!\left(z,\; G_{temp}(z_t)^{(t)},\; z_i,\; G_{temp,i}(z_{i,t})^{(t)}\right), \qquad i = 1, \dots, M;\; t = 1, \dots, T
    the inter-track input variables and the global input variables being combined to form the multi-track symphony, the inter-track input variables being divided into time-dependent and time-independent, and the global input variables being divided into time-dependent and time-independent.
  11. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
    acquiring a music data set in MIDI format and converting the MIDI-format music data set into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback loop of each note;
    performing data cleaning on the format-converted piano roll;
    using a generative adversarial network to establish an interference track model and a composer track model, and combining the interference track model and the composer track model to establish a mixed track model;
    dividing the generator into a time-structure generator G temp and a music bar generator G bar, and constructing, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model; and
    combining the mixed track model and the time model to form a multi-track symphony composition model.
  12. The computer device according to claim 11, wherein, when performing data cleaning on the format-converted piano roll, the processor is caused to perform the following steps:
    merging the tracks of similar instruments on the piano roll, consolidating the tracks of instruments other than brass, strings, winds and percussion into the string track; and
    treating four music bars as one phrase, segmenting the piano roll accordingly, and trimming long segments to a suitable size, the suitable size being the pitch range from C1 to C8.
  13. The computer device according to claim 11, wherein, when using a generative adversarial network to establish an interference track model and a composer track model and combining the interference track model and the composer track model to establish a mixed track model, the processor is caused to perform the following steps:
    using a generative adversarial network to establish an interference track model, in which each track has its own generator and discriminator and its own independent latent-space variables; the generators work independently of one another, each producing its own music track from a given random vector; each generator receives feedback from a different discriminator, and the adversarial interplay between generators and discriminators realizes the adversarial learning of the generative adversarial network, so that the generators learn to control the data distribution of the generated music sequences;
    using a generative adversarial network to establish a composer track model, which feeds the data of multiple tracks into the generator so that a single generator creates a multi-channel piano roll in which each channel represents a specific track; and
    combining the interference track model and the composer track model to establish a mixed track model, in which each track has a generator that accepts an input vector formed by combining an independent vector with a global vector, while a single shared discriminator is used in generating the tracks.
  14. The computer device according to claim 11, wherein, when dividing the generator into a time-structure generator G temp and a music bar generator G bar and constructing, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model, the processor is caused to perform the following steps:
    dividing the generator into a time-structure generator G temp and a music bar generator G bar, the time-structure generator G temp mapping the input vector Z into a sequence of latent-space vectors
    \hat{z} = \left\{\hat{z}^{(t)}\right\}_{t=1}^{T} = G_{temp}(z)
    where T denotes time and each \hat{z}^{(t)} carries the timing information; the sequence is then fed into the music bar generator G bar, which generates the piano roll serially, defined as:

    G(z) = \left\{ G_{bar}\!\left(\hat{z}^{(t)}\right) \right\}_{t=1}^{T}
  15. The computer device according to claim 11, wherein, when combining the mixed track model and the time model to form a multi-track symphony composition model, the processor is caused to perform the following steps:
    combining the mixed track model and the time model to form the multi-track symphony composition model, wherein the input of the model is denoted

    \bar{z} = \left\{z,\; z_t,\; z_i,\; z_{i,t}\right\}

    and consists of the inter-track global time-dependent vector Z t, the inter-track global time-independent vector Z, the intra-track individual time-independent vectors Z i, and the intra-track individual time-dependent vectors Z i,t;
    setting a track i (i = 1, 2, …, M); the shared time-structure generator G temp and the per-track time-structure generators G temp,i take the time-varying random vectors Z t and Z i,t, respectively, as input, and each outputs latent vectors carrying inter-track and intra-track temporal information; these output sequences of latent vectors, together with the time-independent random vectors Z and Z i, are fed into the music bar generator G bar, which then generates the piano-roll bars in sequence, the generation process being defined as:

    \hat{x}^{(i)}_t = G_{bar}\!\left(z,\; G_{temp}(z_t)^{(t)},\; z_i,\; G_{temp,i}(z_{i,t})^{(t)}\right), \qquad i = 1, \dots, M;\; t = 1, \dots, T
    the inter-track input variables and the global input variables being combined to form the multi-track symphony, the inter-track input variables being divided into time-dependent and time-independent, and the global input variables being divided into time-dependent and time-independent.
  16. A storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring a music data set in MIDI format and converting the MIDI-format music data set into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback loop of each note;
    performing data cleaning on the format-converted piano roll;
    采用生成式对抗网络建立干扰音轨模型和作曲音轨模型,并结合所述干扰音轨模型和作曲音轨模型建立混合音轨模型;Using a generative adversarial network to establish an interference track model and a composing track model, and combining the interference track model and the composing track model to establish a mixed track model;
    将生成器分为时间结构生成器G temp和音乐小节生成器G bar,通过所述时间结构生成器G temp和所述音乐小节生成器G bar构建音乐小节之间的时序相关性形成时间模型; Divide the generator into a time structure generator G temp and a music measure generator G bar , and use the time structure generator G temp and the music measure generator G bar to construct a temporal correlation between music measures to form a time model;
    组合所述混合音轨模型和时间模型,以形成多音轨交响乐作曲模型。The mixed audio track model and the time model are combined to form a multi-track symphony composition model.
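The MIDI-to-piano-roll conversion described in the claim can be sketched at a minimal level as a binary time-by-pitch matrix. This is an illustrative sketch only, not from the patent: the function name `notes_to_piano_roll`, the tuple-based note format, and the 24-steps-per-second time resolution are all assumptions.

```python
import numpy as np

def notes_to_piano_roll(notes, n_steps, n_pitches=128, steps_per_second=24):
    """Convert (pitch, start_sec, end_sec) note tuples into a binary
    piano-roll matrix of shape (n_steps, n_pitches).

    Assumed representation: rows are time steps, columns are MIDI pitches,
    and a 1 marks the note as sounding at that step.
    """
    roll = np.zeros((n_steps, n_pitches), dtype=np.uint8)
    for pitch, start, end in notes:
        a = int(round(start * steps_per_second))
        b = max(a + 1, int(round(end * steps_per_second)))  # at least one step
        roll[a:min(b, n_steps), pitch] = 1
    return roll

# A C-major triad (middle C, E, G) held for half a second from t = 0.
notes = [(60, 0.0, 0.5), (64, 0.0, 0.5), (67, 0.0, 0.5)]
roll = notes_to_piano_roll(notes, n_steps=48)
print(roll.shape)                 # (48, 128)
print(int(roll[:12, 60].sum()))   # 12 active steps for middle C
```

Reading such a matrix row by row is what "triggers the playback cycle of each note": each active cell re-arms the note for that time step.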
  17. The storage medium storing computer-readable instructions according to claim 16, wherein, when the data cleaning is performed on the format-converted piano roll, the processor is caused to perform the following steps:
    merging the tracks of similar instruments through the piano roll, and consolidating tracks that do not belong to the brass, string, wind, or percussion families into the string-instrument track;
    regarding four music bars as one phrase, segmenting through the piano roll, and trimming long segments to a suitable size, the suitable size being the pitch range from C1 to C8.
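The cropping and phrase-splitting step above can be sketched as follows. The C1 and C8 MIDI numbers (24 and 108) follow the common scientific-pitch convention; the 96-steps-per-bar resolution and the function name `clean_piano_roll` are assumptions, not values stated in the patent.

```python
import numpy as np

C1, C8 = 24, 108          # MIDI note numbers for the stated C1-C8 range
STEPS_PER_BAR = 96        # assumed time resolution per bar
PHRASE_BARS = 4           # "four music bars are regarded as one phrase"

def clean_piano_roll(roll):
    """Crop a (time, 128) piano roll to the C1-C8 pitch range and split
    it into non-overlapping 4-bar phrases, dropping any trailing remainder."""
    cropped = roll[:, C1:C8 + 1]                  # keep 85 pitch columns
    phrase_len = PHRASE_BARS * STEPS_PER_BAR
    n_phrases = cropped.shape[0] // phrase_len
    trimmed = cropped[:n_phrases * phrase_len]    # trim to whole phrases
    return trimmed.reshape(n_phrases, phrase_len, C8 - C1 + 1)

song = np.zeros((10 * STEPS_PER_BAR, 128), dtype=np.uint8)  # a 10-bar song
phrases = clean_piano_roll(song)
print(phrases.shape)   # (2, 384, 85): two full 4-bar phrases, rest dropped
```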
  18. The storage medium storing computer-readable instructions according to claim 16, wherein, when the generative adversarial network is used to establish the interference track model and the composition track model and the mixed track model is established by combining the interference track model and the composition track model, the processor is caused to perform the following steps:
    using a generative adversarial network to establish the interference track model, in which each track has its own set of generator and discriminator and an independent latent-space variable; the multiple generators work independently of one another, each producing its own music track from a given random vector; the generators receive feedback from their respective discriminators, and the adversarial learning of the generative adversarial network is realized through the competition between generators and discriminators, so that the generators learn to control the data distribution of the generated music sequences;
    using a generative adversarial network to establish the composition track model, which inputs the data of multiple tracks into the generator, so that a single generator creates a multi-channel piano roll in which each channel represents a specific track;
    combining the interference track model and the composition track model to establish the mixed track model, in which each track has a generator that accepts an input vector formed by combining an independent vector with a global vector, while a single shared discriminator is used across all tracks.
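The three architectures in the claim differ only in how latent vectors and generators are wired. The shape-level sketch below contrasts them with stand-in linear "generators"; the track count, bar shape, latent sizes, and the helper `make_generator` are all assumptions made for illustration, not the patent's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)
M, BAR_SHAPE = 4, (96, 85)       # assumed: 4 tracks, 96 steps x 85 pitches

def make_generator(in_dim):
    """A stand-in linear 'generator' mapping a latent vector to one bar."""
    W = rng.standard_normal((in_dim, np.prod(BAR_SHAPE)))
    return lambda z: np.tanh(z @ W).reshape(BAR_SHAPE)

# Interference (jamming) model: each track has its own generator and its
# own independent latent vector z_i; generators work independently.
jam_gens = [make_generator(64) for _ in range(M)]
jam_out = np.stack([g(rng.standard_normal(64)) for g in jam_gens])

# Composition model: one single generator emits all M channels at once
# from one shared latent vector.
composer_W = rng.standard_normal((64, M * np.prod(BAR_SHAPE)))
comp_out = np.tanh(rng.standard_normal(64) @ composer_W).reshape(M, *BAR_SHAPE)

# Mixed model: per-track generators, each fed [global z, private z_i],
# with (in the full model) one shared discriminator judging the result.
hyb_gens = [make_generator(128) for _ in range(M)]
z_shared = rng.standard_normal(64)
hyb_out = np.stack([g(np.concatenate([z_shared, rng.standard_normal(64)]))
                    for g in hyb_gens])

print(jam_out.shape, comp_out.shape, hyb_out.shape)  # all (4, 96, 85)
```

All three produce the same multi-track output shape; what changes is where coordination between tracks can come from: nowhere (jamming), a single network (composer), or a shared input vector plus a shared discriminator (mixed).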
  19. The storage medium storing computer-readable instructions according to claim 16, wherein, when the generator is divided into the time-structure generator G_temp and the bar generator G_bar and the temporal correlation between music bars is constructed through the time-structure generator G_temp and the bar generator G_bar to form the time model, the processor is caused to perform the following steps:
    dividing the generator into the time-structure generator G_temp and the bar generator G_bar, the time-structure generator G_temp mapping the input vector Z into a sequence of latent-space vectors
    Z̄ = { Z̄^(t) }, t = 1, 2, ..., T,
    where T denotes time; Z̄ carries the timing information and is subsequently fed into the bar generator G_bar, which generates the piano roll sequentially, defined as:
    G(Z) = { G_bar( G_temp(Z)^(t) ) }, t = 1, 2, ..., T.
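The two-stage generation G(Z) = {G_bar(G_temp(Z)^(t))} can be sketched with stand-in linear maps: G_temp expands one latent vector into T per-bar latents, and G_bar decodes each latent into one bar. The sizes and the tanh stand-ins are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
T, Z_DIM, BAR_SHAPE = 4, 32, (96, 85)    # assumed sizes

# G_temp: map one input vector Z to a sequence of T latent vectors,
# which is where the temporal correlation between bars is encoded.
W_temp = rng.standard_normal((Z_DIM, T * Z_DIM))
def g_temp(z):
    return np.tanh(z @ W_temp).reshape(T, Z_DIM)

# G_bar: decode each latent vector into one bar of piano roll.
W_bar = rng.standard_normal((Z_DIM, np.prod(BAR_SHAPE)))
def g_bar(latent):
    return np.tanh(latent @ W_bar).reshape(BAR_SHAPE)

z = rng.standard_normal(Z_DIM)
bars = np.stack([g_bar(h) for h in g_temp(z)])   # generate bars in sequence
print(bars.shape)   # (4, 96, 85): T bars sharing one temporal structure
```

Because every bar's latent is derived from the same Z through G_temp, consecutive bars are correlated rather than independently sampled, which is the point of the time model.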
  20. The storage medium storing computer-readable instructions according to claim 16, wherein, when the mixed track model and the time model are combined to form the multi-track symphony composition model, the processor is caused to perform the following steps:
    combining the mixed track model and the time model to form the multi-track symphony composition model, wherein the input of the model is represented by
    Z̄ = { Z, {Z_i}, Z_t, {Z_{i,t}} },
    which is composed of the inter-track, time-dependent global vector Z_t, the inter-track, time-independent global vector Z, the intra-track, time-independent vectors Z_i, and the intra-track, time-dependent vectors Z_{i,t};
    for each track i, where i = 1, 2, ..., M, the shared time-structure generator G_temp and the per-track time-structure generators G_temp,i take the time-varying random vectors Z_t and Z_{i,t}, respectively, as inputs, and output latent vectors carrying inter-track and intra-track temporal information; these latent-vector sequences, together with the time-independent random vectors Z and Z_i, are fed into the bar generator G_bar, which then generates piano rolls in sequence, a generation process that can be defined as:
    G(Z̄) = { G_bar,i ( Z, G_temp(Z_t)^(t), Z_i, G_temp,i(Z_{i,t})^(t) ) }, for i = 1, 2, ..., M and t = 1, 2, ..., T;
    the per-track input variables and the global input variables, each divided into time-dependent and time-independent components, are combined to form a multi-track symphony.
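The four-way split of the input (global vs. per-track, time-dependent vs. time-independent) can be sketched by assembling, for each track i and bar t, the concatenated vector [Z, G_temp(Z_t)^(t), Z_i, G_temp,i(Z_{i,t})^(t)]. The dimensions and the tanh stand-in for the temporal generators are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
M, T, D = 4, 4, 16     # assumed: 4 tracks, 4 bars per phrase, 16-dim latents

z    = rng.standard_normal(D)           # inter-track, time-independent
z_t  = rng.standard_normal((T, D))      # inter-track, time-dependent
z_i  = rng.standard_normal((M, D))      # intra-track, time-independent
z_it = rng.standard_normal((M, T, D))   # intra-track, time-dependent

def g_temp(seq):
    """Stand-in time-structure generator: here just a tanh over the sequence."""
    return np.tanh(seq)

shared = g_temp(z_t)   # one shared temporal structure for all tracks
inputs = np.stack([
    np.concatenate([z, shared[t], z_i[i], g_temp(z_it[i])[t]])
    for i in range(M) for t in range(T)
]).reshape(M, T, 4 * D)
print(inputs.shape)    # (4, 4, 64): one 64-dim input per track per bar
```

The shared components (z, shared[t]) coordinate the tracks into a coherent symphony, while the private components (z_i, z_it) let each track keep its own character.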
PCT/CN2018/106680 2018-08-08 2018-09-20 Method, system and apparatus for building music composition model, and storage medium WO2020029382A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810894765.2 2018-08-08
CN201810894765.2A CN109189974A (en) 2018-08-08 2018-08-08 Method, system, device, and storage medium for establishing a music composition model

Publications (1)

Publication Number Publication Date
WO2020029382A1 true WO2020029382A1 (en) 2020-02-13

Family

ID=64920431

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/106680 WO2020029382A1 (en) 2018-08-08 2018-09-20 Method, system and apparatus for building music composition model, and storage medium

Country Status (2)

Country Link
CN (1) CN109189974A (en)
WO (1) WO2020029382A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102459109B1 (en) * 2018-05-24 2022-10-27 에이미 인코퍼레이티드 music generator
CN109872708B (en) * 2019-01-23 2023-04-28 平安科技(深圳)有限公司 Music generation method and device based on DCGAN
CN110288965B (en) * 2019-05-21 2021-06-18 北京达佳互联信息技术有限公司 Music synthesis method and device, electronic equipment and storage medium
CN111477198B (en) * 2020-03-05 2023-07-14 支付宝(杭州)信息技术有限公司 Method and device for representing music bar and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090260507A1 (en) * 2008-04-22 2009-10-22 Peter Gannon Systems and methods for composing music
CN103902642A (en) * 2012-12-21 2014-07-02 香港科技大学 Music composition system using correlation between melody and lyrics
US20150269852A1 (en) * 2014-03-20 2015-09-24 Pearson Education, Inc. Sound assessment and remediation
CN106898341A (en) * 2017-01-04 2017-06-27 清华大学 A kind of individualized music generation method and device based on common semantic space

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106023969B (en) * 2011-07-29 2020-02-18 音乐策划公司 Method for applying audio effects to one or more tracks of a music compilation
US20160379611A1 (en) * 2015-06-23 2016-12-29 Medialab Solutions Corp. Systems and Method for Music Remixing
CN106652984B (en) * 2016-10-11 2020-06-02 张文铂 Method for automatically composing songs by using computer


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG, A. ET AL.: "Deep Learning for Music", 19 June 2016 (2016-06-19), pages 1 - 4, XP080708782, Retrieved from the Internet <URL:https://cs224d.stanford.edu/reports/allenh.pdf> [retrieved on 20160619] *

Also Published As

Publication number Publication date
CN109189974A (en) 2019-01-11

Similar Documents

Publication Publication Date Title
WO2020029382A1 (en) Method, system and apparatus for building music composition model, and storage medium
Herremans et al. MorpheuS: generating structured music with constrained patterns and tension
CN106023969B (en) Method for applying audio effects to one or more tracks of a music compilation
CN101916568B (en) Information processing apparatus and information processing method
US20160247496A1 (en) Device and method for generating a real time music accompaniment for multi-modal music
WO2020000751A1 (en) Automatic composition method and apparatus, and computer device and storage medium
CN112382257B (en) Audio processing method, device, equipment and medium
US11942071B2 (en) Information processing method and information processing system for sound synthesis utilizing identification data associated with sound source and performance styles
JP2023542431A (en) System and method for hierarchical sound source separation
JP4333700B2 (en) Chord estimation apparatus and method
Borin et al. Musical signal synthesis
Dadman et al. Toward interactive music generation: A position paper
Nadeem et al. Let's make some music
Dubnov et al. Creative improvised interaction with generative musical systems
Renault et al. DDSP-Piano: a Neural Sound Synthesizer Informed by Instrument Knowledge
Bourbon et al. The ecological approach to mixing audio: Agency, activity and environment in the process of audio staging
Braasch A cybernetic model approach for free jazz improvisations
Zhu et al. A Survey of AI Music Generation Tools and Models
CN115004294A (en) Composition creation method, composition creation device, and creation program
US11830463B1 (en) Automated original track generation engine
CN111061908A (en) Recommendation method and system for movie and television dubbing author
Wiggins et al. A Differentiable Acoustic Guitar Model for String-Specific Polyphonic Synthesis
US11790876B1 (en) Music technique responsible for versioning
Eigenfeldt The human fingerprint in machine generated music
JP2020038252A (en) Information processing method and information processing unit

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18929578

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18929578

Country of ref document: EP

Kind code of ref document: A1