WO2020029382A1 - Method, system and apparatus for building music composition model, and storage medium - Google Patents


Info

Publication number
WO2020029382A1
WO2020029382A1 · PCT/CN2018/106680
Authority
WO
WIPO (PCT)
Prior art keywords
model
track
generator
music
time
Prior art date
Application number
PCT/CN2018/106680
Other languages
French (fr)
Chinese (zh)
Inventor
张爽
王义文
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020029382A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval of audio data

Definitions

  • the present application relates to the field of information technology, and in particular, to a method, a system, a device, and a storage medium for establishing a composition model.
  • a method for establishing a composition model includes: acquiring a music data set in MIDI format and converting it into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that triggers the playback cycle of each note when read; performing data cleaning on the piano roll after the format conversion; using a generative adversarial network to establish an interference track model and a composing track model, and combining the two to establish a mixed track model; dividing the generator into a time structure generator G_temp and a music bar generator G_bar, and constructing the temporal correlation between music bars through G_temp and G_bar to form a time model; and combining the mixed track model and the time model to form a multi-track symphony composition model.
  • a system for establishing a composition model, including:
  • an acquisition module, configured to acquire a music data set in MIDI format and convert it into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that triggers the playback cycle of each note when read;
  • a cleaning module, configured to perform data cleaning on the piano roll after the format conversion;
  • an establishment module, configured to establish an interference track model and a composing track model using a generative adversarial network, and to combine the two into a mixed track model;
  • a construction module, configured to divide the generator into a time structure generator G_temp and a music bar generator G_bar, and to construct the temporal correlation between music bars through them to form a time model;
  • a combination module, configured to combine the mixed track model and the time model to form a multi-track symphony composition model.
  • a computer device includes a memory and a processor.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the processor, they cause the processor to perform the steps of the composition model building method.
  • a storage medium storing computer-readable instructions, when the computer-readable instructions are executed by one or more processors, cause the one or more processors to execute the steps of the composition model building method described above.
  • the above-mentioned method, system, computer device and storage medium establish a composition model by acquiring a MIDI-format music data set and converting it into a piano roll; the piano roll is a music storage medium used to reproduce a piano performance, and it triggers the playback cycle of each note when read. The piano roll is cleaned after the format conversion: the tracks of similar instruments are merged, and tracks outside the brass, string, wind, and percussion families are unified into the string-instrument track.
  • four music bars are regarded as one phrase, by which the piano roll is segmented, and long sections are trimmed to a suitable size.
  • the suitable size is the pitch range C1 to C8.
  • a generative adversarial network is used to establish an interference track model and a composing track model, and a mixed track model is established by combining the two; the generator is divided into a time structure generator G_temp and a music bar generator G_bar, through which the temporal correlation between music bars is constructed to form a time model; the mixed track model and the time model are then combined to form a multi-track symphony composition model that satisfies the demand for variation and diversity in music.
  • FIG. 1 is a flowchart of a method for establishing a composition model in an embodiment
  • FIG. 2 is a schematic diagram of music generating M audio tracks in an embodiment
  • FIG. 3 is a schematic diagram of a multi-channel piano roll created by a single generator in an embodiment;
  • FIG. 4 is a schematic diagram of establishing a mixed model by combining the interference model and the composition model in an embodiment;
  • FIG. 5 is a schematic diagram of constructing audio track correlation in an embodiment
  • FIG. 6 is a schematic diagram of a multi-track model in an embodiment
  • FIG. 8 is a flowchart of establishing an interference track model and a composing track model by using a generative adversarial network in an embodiment
  • FIG. 9 is a structural block diagram of a system for establishing a composition model in an embodiment
  • FIG. 10 is a structural block diagram of a cleaning module in an embodiment
  • FIG. 11 is a structural block diagram of an establishment module in an embodiment.
  • a method for establishing a composition model includes the following steps:
  • Step S101: obtain a music data set in MIDI format and convert it into a piano roll; the piano roll is a music storage medium for reproducing a piano performance, and it triggers the playback cycle of each note when read;
  • This technical solution uses a MIDI data set and converts each MIDI file into a multi-track piano-roll representation.
  • A piano roll is a binary representation: a matrix recording the presence of notes at each time step, where a pressed key is written as 1 and an unpressed key as 0. A piece has M tracks, each track has R time steps, and the number of candidate pitches per bar is S.
  • A bar X therefore has the data form X ∈ {0,1}^(R×S×M), and T bars are expressed as X̄ = {X^(t)}, t = 1, …, T, so the matrix size of each X is fixed.
  • MIDI, the Musical Instrument Digital Interface, is an industry-standard electronic communication protocol. It defines note and performance codes for electronic musical instruments and other performance equipment (such as synthesizers), allowing electronic instruments, computers, mobile phones and other stage equipment to connect, adjust and synchronize with one another and to exchange performance data in real time.
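As a concrete illustration of the binary piano-roll representation described above, the following sketch builds one bar tensor X of shape R×S×M from a handful of note events. The event format and the sizes R = 96, S = 84, M = 4 are assumptions chosen for the example, not values fixed by the application.

```python
import numpy as np

# Illustrative sketch of the piano-roll representation: a cell is 1 while
# a key is pressed and 0 otherwise. Sizes and event format are assumed.
R, S, M = 96, 84, 4   # time steps per bar, candidate pitches, tracks

def notes_to_piano_roll(notes, R=R, S=S, M=M):
    """Build a binary bar matrix X in {0,1}^(R x S x M).

    `notes` is a list of (track, pitch_index, start_step, end_step)
    tuples; the corresponding cells are set to 1 for the note duration.
    """
    X = np.zeros((R, S, M), dtype=np.uint8)
    for track, pitch, start, end in notes:
        X[start:end, pitch, track] = 1
    return X

bar = notes_to_piano_roll([(0, 40, 0, 48), (1, 52, 24, 96)])
print(bar.shape)        # (96, 84, 4)
print(int(bar.sum()))   # 48 + 72 = 120 active cells
```

Because every bar has the same fixed shape, T bars stack directly into the phrase tensor X̄ used by the model.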
  • Step S102: perform data cleaning on the piano roll after the format conversion.
  • The original data set is very noisy. Although cleaning causes a certain loss of source material (for example, the sound of some merged instruments), the richness and validity of the data are the key to training the model, so cleanup is a necessary operation and the following methods are used. Some instruments contribute too little data on their tracks, which is not conducive to training, so this technical solution merges them.
  • The piano roll is used to merge the tracks of similar instruments: tracks outside the brass, string, wind, and percussion families are unified into the string-instrument track, which enriches the amount of data. To train the time model, four bars are treated as one phrase, by which the piano roll is segmented, and longer sections are trimmed to a suitable size. The highest and lowest notes are not very common, so the pitch range is restricted to C1 to C8 (C8 being the rightmost key of the piano).
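The cleaning steps above (merging instrument families, trimming the pitch range, segmenting four-bar phrases) might be sketched as follows. The family mapping, the MIDI pitch bounds for C1 to C8, and the array layout are illustrative assumptions rather than the application's exact procedure.

```python
import numpy as np

# Hedged sketch of the data-cleaning step. FAMILIES, the fallback rule,
# and the pitch bounds are assumptions made for this example.
FAMILIES = {"brass": 0, "strings": 1, "winds": 2, "percussion": 3}

def merge_tracks(rolls_by_instrument):
    """Merge per-instrument rolls (time x pitch arrays) into 4 family
    tracks; instruments outside the known families fall back to strings."""
    shape = next(iter(rolls_by_instrument.values())).shape
    merged = np.zeros(shape + (4,), dtype=np.uint8)
    for (name, family), roll in rolls_by_instrument.items():
        idx = FAMILIES.get(family, FAMILIES["strings"])
        merged[..., idx] |= roll.astype(np.uint8)
    return merged

def trim_and_segment(roll, steps_per_bar=96, lo=24, hi=108):
    """Keep pitches in roughly the C1..C8 range (MIDI 24..107 here) and
    cut the roll into 4-bar phrases, dropping an incomplete tail."""
    roll = roll[:, lo:hi]                  # trim pitch range
    phrase_len = 4 * steps_per_bar         # four bars form one phrase
    n = roll.shape[0] // phrase_len
    return roll[: n * phrase_len].reshape(n, phrase_len, hi - lo)

rolls = {("trumpet", "brass"): np.ones((800, 128), np.uint8),
         ("harp", "other"): np.ones((800, 128), np.uint8)}
merged = merge_tracks(rolls)
phrases = trim_and_segment(merged[..., FAMILIES["strings"]])
print(merged.shape, phrases.shape)   # (800, 128, 4) (2, 384, 84)
```

The harp, not being brass, strings, winds or percussion, lands on the strings channel, matching the fallback rule in the text.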
  • Step S103: establish an interference track model and a composing track model using a generative adversarial network, and establish a mixed track model by combining the two;
  • Adversarial learning in the GAN is implemented through the contest between the generator and the discriminator, so that the generator gradually masters the data distribution of the music sequences to be generated.
  • the GAN implements adversarial learning by constructing two networks: a generator and a discriminator.
  • the generator captures the latent distribution of the real data samples and generates new samples.
  • the discriminator is a binary classifier that determines whether its input is real data or a generated sample. The two models "fight" each other, and in the process each becomes more powerful.
  • the generator tries to make its samples indistinguishable from the real data, so that the discriminator cannot tell whether a generated sample is real or fake; in this way it gradually masters the data distribution of the generated music sequences,
  • while the discriminator continuously improves its ability to identify data samples in this process.
  • a GAN, or generative adversarial network, thus comprises two networks: one that generates data, called the generator, and one that judges whether the generated data is close to real, called the discriminator.
  • the basic principle is: train a generator to produce realistic samples from random noise or latent variables, and train a discriminator to separate real data from generated data. Both are trained simultaneously until a Nash equilibrium is reached, at which point the generated data is indistinguishable from real samples and the discriminator can no longer tell them apart.
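The adversarial principle can be sketched with a deliberately tiny example: a linear generator and a logistic discriminator trained on one-dimensional Gaussian data with hand-derived gradient updates. This toy only illustrates the generator/discriminator contest; it stands in for, and is much smaller than, the networks the application would actually use.

```python
import numpy as np

# Toy GAN on 1-D data: G(z) = a*z + b, D(x) = sigmoid(w*x + c).
# Manual gradients, non-saturating generator loss. Purely illustrative.
rng = np.random.default_rng(0)
a, b = 1.0, 0.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters
lr, batch = 0.05, 64
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

for step in range(2000):
    x_real = rng.normal(4.0, 1.0, batch)      # "real data" around 4
    z = rng.normal(0.0, 1.0, batch)
    x_fake = a * z + b
    # Discriminator ascent: maximize log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w += lr * np.mean((1 - d_real) * x_real - d_fake * x_fake)
    c += lr * np.mean((1 - d_real) - d_fake)
    # Generator ascent: maximize log D(fake) (non-saturating form).
    d_fake = sigmoid(w * x_fake + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

print(round(b, 2))  # b drifts from 0 toward the data mean 4
```

As training proceeds the generator's offset b moves toward the real data's mean, which is exactly the "generator masters the data distribution" behavior the text describes.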
  • An interference model is established based on the GAN.
  • Each track has its own generator and discriminator, and an independent latent space variable z_i.
  • a composing model is also established based on the GAN. Since the composing model feeds multiple tracks of data into the generator, a single generator can be used to create a multi-channel piano roll in which each channel represents a specific track, as shown in Figure 3. In this model there is only one generator and one discriminator globally, and a common input z is used to generate all tracks, so regardless of the value of M only one generator and one discriminator are needed.
  • a hybrid model is created by combining the interference model and the composing model.
  • the hybrid model combines the two approaches above.
  • Each track has its own generator, which accepts an input formed from an independent vector z_i together with the global vector z,
  • while all tracks share a single discriminator that judges the generated tracks.
  • the mixed mode is more flexible: different parameters (such as the number of layers or the size of the convolution kernel) can be used in the generator to balance the independent generation of each track against global generation.
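A structural sketch of this hybrid wiring follows: each of the M tracks has its own generator fed the shared vector z concatenated with a private vector z_i, while one shared discriminator scores all tracks jointly. Linear maps stand in for the real networks, and all sizes and names are assumptions made for the example.

```python
import numpy as np

# Hybrid (mixed) track model wiring: M per-track generators, one shared
# discriminator. Linear stand-ins; not the application's actual networks.
rng = np.random.default_rng(1)
M, Z_DIM, BAR = 4, 32, 96 * 84          # tracks, latent size, flat bar size

G_i = [rng.normal(0, 0.01, (2 * Z_DIM, BAR)) for _ in range(M)]  # per-track G
D_w = rng.normal(0, 0.01, (M * BAR,))                            # shared D

def generate(z, z_locals):
    """One bar per track: track i uses concat(z, z_i) and generator i."""
    return np.stack([np.concatenate([z, zi]) @ G_i[i]
                     for i, zi in enumerate(z_locals)])   # (M, BAR)

def discriminate(bars):
    """The single shared discriminator scores all M tracks together."""
    return float(1.0 / (1.0 + np.exp(-bars.reshape(-1) @ D_w)))

z = rng.normal(size=Z_DIM)
bars = generate(z, [rng.normal(size=Z_DIM) for _ in range(M)])
print(bars.shape)   # (4, 8064): one generated bar per track
```

The key design point the text makes is visible in the shapes: z coordinates all tracks globally while each z_i individualizes one track, and only one discriminator is needed.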
  • Step S104: the generator is divided into a time structure generator G_temp and a music bar generator G_bar, and the temporal correlation between music bars is constructed through G_temp and G_bar to form a time model.
  • Constructing track correlation addresses how to generate a single bar across different tracks; the temporal relationship between successive bars needs an additional structure to generate.
  • the generator is therefore divided into two sub-networks: a time structure generator G_temp and a music bar generator G_bar.
  • G_temp maps the input vector z into a sequence of latent vectors {z̄^(t)}, t = 1, …, T, so that each z̄^(t) carries timing information; the sequence is then sent to G_bar to generate the piano roll bar by bar, i.e. G(z) = {G_bar(G_temp(z)^(t))}, t = 1, …, T.
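The two-stage split can be sketched as follows, again with linear maps standing in for G_temp and G_bar; the sizes are assumptions for the example.

```python
import numpy as np

# Sketch of the generator split: G_temp expands one latent z into T
# bar-level latents carrying timing information; G_bar turns each of
# them into one bar. Linear stand-ins for the real sub-networks.
rng = np.random.default_rng(2)
T, Z_DIM, R, S = 4, 32, 96, 84          # bars per phrase; sizes assumed

W_temp = rng.normal(0, 0.1, (Z_DIM, T * Z_DIM))   # stands in for G_temp
W_bar = rng.normal(0, 0.1, (Z_DIM, R * S))        # stands in for G_bar

def g_temp(z):
    """Map z to the sequence of T latent vectors z_bar^(1..T)."""
    return (z @ W_temp).reshape(T, Z_DIM)

def g_bar(z_t):
    """Map one bar latent to a (thresholded) binary bar."""
    return ((z_t @ W_bar).reshape(R, S) > 0).astype(np.uint8)

z = rng.normal(size=Z_DIM)
phrase = np.stack([g_bar(zt) for zt in g_temp(z)])  # bars generated in order
print(phrase.shape)   # (4, 96, 84): T bars of R x S piano roll
```

This mirrors the formula G(z) = {G_bar(G_temp(z)^(t))}: one draw of z fixes the whole phrase, and the bar-to-bar coherence comes from G_temp alone.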
  • Step S105: the mixed track model and the time model are combined to form a multi-track symphony composition model.
  • The multi-track model is the integration and extension of the above track model and time model.
  • the model input z̄ consists of four parts: a time-dependent global vector z_t shared across tracks, a time-independent global vector z shared across tracks, an independent time-independent vector z_i within each track, and an independent time-dependent vector z_{i,t} within each track.
  • the shared time structure generator G_temp and the per-track time structure generators G_temp,i take the time-varying random vectors z_t and z_{i,t} as input respectively, and output series of latent vectors containing inter-track and intra-track time information.
  • the output sequences (latent vectors) are sent to the music bar generator G_bar together with the time-independent random vectors z and z_i, and the piano roll is then generated bar by bar.
  • the generation process can be formulated as: G(z̄) = {G_bar^i(z, G_temp(z_t)^(t), z_i, G_temp,i(z_{i,t})^(t))}, i = 1, …, M, t = 1, …, T.
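The four-part input can be illustrated by assembling, for every (track, bar) pair, the vector handed to the bar generator. The temporal latents below are random stand-ins for the outputs of G_temp and G_temp,i, and all sizes are assumptions for the example.

```python
import numpy as np

# Sketch of the multi-track model's four-part input: for track i and bar
# t, G_bar^i receives the global time-independent z, the shared temporal
# latent, the private time-independent z_i, and the private temporal
# latent. Random arrays stand in for the temporal generators' outputs.
rng = np.random.default_rng(3)
M, T, D = 4, 4, 16                      # tracks, bars, latent size (assumed)

z = rng.normal(size=D)                  # global, time-independent
z_i = rng.normal(size=(M, D))           # per-track, time-independent
shared_temporal = rng.normal(size=(T, D))      # stands in for G_temp(z_t)
private_temporal = rng.normal(size=(M, T, D))  # stands in for G_temp,i(z_it)

def bar_input(i, t):
    """Concatenate the four latent parts fed to G_bar^i at bar t."""
    return np.concatenate([z, shared_temporal[t],
                           z_i[i], private_temporal[i, t]])

inputs = np.stack([[bar_input(i, t) for t in range(T)] for i in range(M)])
print(inputs.shape)   # (4, 4, 64): one 4*D input per (track, bar) pair
```

The shapes make the division of labor explicit: z and shared_temporal coordinate the tracks with each other, while z_i and private_temporal individualize each track over time.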
  • the data cleaning of the piano roll after the format conversion includes:
  • Step S201: merge the tracks of similar instruments through the piano roll, unifying tracks outside the brass, string, wind, and percussion families into the string-instrument track;
  • the original data set is very noisy, and although data cleaning causes a certain loss of source material (such as the sound of some merged instruments), the richness and validity of the data are the key to training the model, so cleanup is a necessary operation and the following methods are used.
  • Some instruments contribute too little data on their tracks, which is not conducive to training, so this technical solution merges them; merging the tracks of similar instruments through the piano roll enriches the data volume.
  • Step S202: treat four music bars as one phrase, segment the piano roll by phrase, and trim long sections to a suitable size, the suitable size being the pitch range C1 to C8.
  • the highest and lowest notes are not very common, so the pitch range is restricted to C1 to C8 (C8 being the rightmost key of the piano).
  • the use of a generative adversarial network to establish an interference track model and a composing track model, and the combination of the two into a mixed track model, include:
  • Step S301: establish an interference track model using a generative adversarial network.
  • Each track in the interference track model has its own generator and discriminator and an independent latent space variable, and the generators are independent of one another.
  • each generator receives feedback from its own discriminator; the contest between generator and discriminator implements the adversarial learning of the generative adversarial network, so that the generator masters the generation of its music.
  • a composing track model is then established using the generative adversarial network.
  • the composing track model feeds multiple tracks of data into the generator, and a multi-channel piano roll is created by a single generator, with each channel representing a specific track;
  • since the composing model feeds multiple tracks of data into one generator, a single generator can create a multi-channel piano roll, each channel representing a specific track, as shown in Figure 3.
  • in this model there is only one generator and one discriminator globally.
  • the common input z is used to generate all tracks, so regardless of the value of M only one generator and one discriminator are needed.
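The single-generator idea can be sketched as one map from a shared latent z to an M-channel piano roll; the linear map is an illustrative stand-in for the real network, and the sizes are assumptions.

```python
import numpy as np

# Sketch of the composing-track model: one generator maps one shared
# latent z to a multi-channel piano roll where channel i is track i, so
# one G and one D suffice for any M. Linear stand-in, sizes assumed.
rng = np.random.default_rng(4)
M, Z_DIM, R, S = 4, 32, 96, 84

W = rng.normal(0, 0.1, (Z_DIM, R * S * M))   # stands in for the generator

def composer_generator(z):
    """One forward pass: a single z yields all M track channels at once."""
    return ((z @ W).reshape(R, S, M) > 0).astype(np.uint8)

roll = composer_generator(rng.normal(size=Z_DIM))
print(roll.shape)   # (96, 84, 4): each of the 4 channels is one track
```

Contrast this with the interference model: here the coupling between tracks is learned inside the single generator instead of being negotiated between independent per-track generators.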
  • a mixed track model is established by combining the interference track model and the composing track model.
  • the mixed model combines the two approaches above.
  • Each track in the mixed track model has its own generator, which accepts an input formed from an independent vector z_i combined with the global vector z,
  • while all tracks share a single discriminator that judges the generated tracks.
  • the mixed mode is more flexible: different parameters, such as the number of layers or the size of the convolution kernel, can be used in the generator to balance the independent generation of each track against global generation in harmony.
  • the division of the generator into a time structure generator G_temp and a music bar generator G_bar, and the construction through them of the temporal correlation between music bars to form the time model, include:
  • the generator is divided into a time structure generator G_temp and a music bar generator G_bar.
  • the time structure generator G_temp maps the input vector z into a sequence of latent vectors {z̄^(t)}, t = 1, …, T.
  • each z̄^(t) carries timing information and is sent to the music bar generator G_bar to generate the piano roll bar by bar, which is defined as: G(z) = {G_bar(G_temp(z)^(t))}, t = 1, …, T.
  • the combining of the mixed track model and the time model to form a multi-track symphony composition model includes:
  • combining the mixed track model and the time model forms a multi-track symphony composition model whose input z̄ consists of four parts: a time-dependent global vector z_t shared across tracks, a time-independent global vector z shared across tracks, an independent time-independent vector z_i within each track, and an independent time-dependent vector z_{i,t} within each track;
  • the per-track input variables are combined with the global input variables to form the multi-track symphony.
  • the per-track input variables are divided into time-dependent and time-independent parts, and the global input variables are likewise divided into time-dependent and time-independent parts.
  • the multi-track model is the integration and extension of the above track model and time model.
  • the shared time structure generator G_temp and the per-track time structure generators G_temp,i take the time-varying random vectors z_t and z_{i,t} as input respectively, and output series of latent vectors containing inter-track and intra-track time information; the output sequences (latent vectors) are sent to the music bar generator G_bar together with the time-independent random vectors z and z_i, and the piano roll is then generated bar by bar.
  • the generation process can be formulated as: G(z̄) = {G_bar^i(z, G_temp(z_t)^(t), z_i, G_temp,i(z_{i,t})^(t))}, i = 1, …, M, t = 1, …, T.
  • a system for establishing a composition model includes:
  • an acquisition module, configured to acquire a music data set in MIDI format and convert it into a piano roll, where the piano roll is a music storage medium for reproducing a piano performance and triggers the playback cycle of each note when read;
  • a cleaning module, configured to perform data cleaning on the piano roll after the format conversion;
  • an establishment module, configured to use a generative adversarial network to establish an interference track model and a composing track model, and to combine the two into a mixed track model;
  • a construction module, configured to divide the generator into a time structure generator G_temp and a music bar generator G_bar and to construct the temporal correlation between music bars through them to form a time model;
  • a combination module, configured to combine the mixed track model and the time model to form a multi-track symphony composition model.
  • the cleaning module further includes:
  • a merging unit, configured to merge the tracks of similar instruments through the piano roll and to unify tracks outside the brass, string, wind, and percussion families into the string-instrument track;
  • a trimming unit, configured to treat four music bars as one phrase, segment the piano roll by phrase, and trim long sections to a suitable size, where the suitable size is the pitch range C1 to C8.
  • the establishment module further includes:
  • an interference model unit, configured to establish an interference track model using a generative adversarial network.
  • Each track in the interference track model has its own generator and discriminator, as well as an independent latent space variable.
  • the generators work independently of one another, each given a random vector with which to generate its own music track; each generator receives feedback from its own discriminator, and the contest between generator and discriminator implements the adversarial learning of the generative adversarial network, so that the generator masters the data distribution of the generated music sequences;
  • a composition model unit, configured to establish a composing track model using a generative adversarial network.
  • the composing track model feeds multiple tracks of data into the generator, and a single generator creates a multi-channel piano roll in which each channel represents a specific track;
  • a combining unit, configured to establish a mixed track model by combining the interference track model and the composing track model.
  • Each track in the mixed track model has its own generator, which accepts an input formed from an independent vector combined with the global vector, while a single shared discriminator judges the generated tracks.
  • the construction module includes:
  • a generator processing unit, configured to divide the generator into a time structure generator G_temp and a music bar generator G_bar, where G_temp maps the input vector z into a sequence of latent vectors {z̄^(t)}, t = 1, …, T;
  • each z̄^(t) carries timing information and is sent to the music bar generator G_bar to generate the piano roll bar by bar, which is defined as: G(z) = {G_bar(G_temp(z)^(t))}, t = 1, …, T.
  • the combination module includes:
  • a mixing processing unit, configured to combine the mixed track model and the time model into a multi-track symphony composition model, where the model input z̄ consists of four parts: a time-dependent global vector z_t shared across tracks, a time-independent global vector z shared across tracks, an independent time-independent vector z_i within each track, and an independent time-dependent vector z_{i,t} within each track;
  • the output sequences of latent vectors are sent to the music bar generator G_bar together with the time-independent random vectors z and z_i, and the piano roll is then generated bar by bar.
  • the generation process can be defined as: G(z̄) = {G_bar^i(z, G_temp(z_t)^(t), z_i, G_temp,i(z_{i,t})^(t))}, i = 1, …, M, t = 1, …, T.
  • the per-track input variables are combined with the global input variables to form the multi-track symphony.
  • the per-track input variables are divided into time-dependent and time-independent parts, and the global input variables are likewise divided into time-dependent and time-independent parts.
  • in one embodiment, a computer device includes a memory and a processor.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the processor, the processor is caused to execute the steps of the composition model building method in the foregoing embodiments.
  • a storage medium stores computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to execute the steps of the composition model building method described in the foregoing embodiments.
  • the storage medium may be a non-volatile storage medium.
  • the program may be stored in a computer-readable storage medium.
  • the storage medium may include read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and the like.

Abstract

A method and system for building a music composition model, a computer device, and a storage medium, the method comprising: acquiring a music data set in a MIDI format and converting the music data set in the MIDI format into a piano roll, wherein the piano roll is a music storage medium for reproducing a piano performance, and the piano roll triggers playing cycles of respective notes upon being read (S101); performing data cleanup on the piano roll after the format conversion (S102); using a generative adversarial network to build an interference track model and a music composition track model, and building a mixed track model by combining the interference track model with the music composition track model (S103); dividing a generator into a temporal structure generator Gtemp and a musical bar generator Gbar, forming a temporal model by establishing a temporal sequence relevance among the musical bars by means of the temporal structure generator Gtemp and the musical bar generator Gbar (S104); combining the mixed track model with the temporal model to form a multi-track symphonic music composition model (S105). The method satisfies requirements for variation and diversity of music.

Description

一种作曲模型的建立方法、系统、设备和存储介质Method, system, equipment and storage medium for establishing composition model
本申请要求于2018年08月08日提交中国专利局、申请号为201810894765.2、发明名称为“一种作曲模型的建立方法、系统、设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on August 08, 2018 with the application number 201810894765.2. The invention name is "A method, system, device, and storage medium for establishing a composition model", and its entire contents Incorporated by reference in this application.
技术领域Technical field
本申请涉及信息技术领域,尤其涉及一种作曲模型的建立方法、系统、设备和存储介质。The present application relates to the field of information technology, and in particular, to a method, a system, a device, and a storage medium for establishing a composition model.
背景技术Background technique
音乐生成的方式多种多样,最为常见的方式便是通过乐器产生,近年来,随着计算机应用技术的不断发展,通过算法的使用也可以产生音乐,在音乐生成算法研究领域,已有很多基于遗传算法、RNN和CNN的算法取得了良好效果,但是大部分生成的音乐是单音轨,即一种乐器,而真正的音乐是由多种乐器组成,例如现代交响乐团通常包括四个部分:黄铜,弦乐器,管乐器和打击乐。单音轨音乐形式比较单一,无法满足人们对音乐多样化的需求,并且目前还没有关于多音轨生成交响乐算法方面的研究。There are many ways to generate music. The most common way is through musical instruments. In recent years, with the continuous development of computer application technology, music can also be generated through the use of algorithms. In the field of music generation algorithm research, there have been many The algorithms of genetic algorithm, RNN and CNN have achieved good results, but most of the generated music is a single track, that is, an instrument, and the real music is composed of multiple instruments. For example, modern symphony orchestras usually include four parts: Brass, string instruments, wind instruments and percussion. The single-track music form is relatively single and cannot meet people's diverse needs for music. At present, there is no research on multi-track generation symphony algorithms.
发明内容Summary of the invention
基于此,有必要针对现行作曲方法的弊端,提供一种作曲模型的建立方法、系统、计算机设备和存储介质。Based on this, it is necessary to provide a method, system, computer equipment, and storage medium for establishing a composition model in view of the disadvantages of the current composition method.
一种作曲模型的建立方法,包括:获取MIDI格式的音乐数据集,并将所述MIDI格式的音乐数据集转换为钢琴键轴,所述钢琴键轴是音乐存储媒介,用于再现钢琴弹奏,当被读取时触发每个音符的播放循环;对格式转换后的钢琴键轴进行数据清理;采用生成式对抗网络建立干扰音轨模型和作曲音轨模型,并结合所述干扰音轨模型和作曲音轨模型建立混合音轨模型;将生成器分为时间结构生成器G temp和音乐小节生成器G bar,通过所述时间结构生成器G temp和所述音乐小节生成器G bar构建音乐小节之间的时序相关性形成时间模型;组合所述混合音轨模型和时间模型,以形成多音轨交响乐作曲模型。 A method for establishing a composition model includes: acquiring a music data set in a MIDI format and converting the music data set in a MIDI format into a piano key shaft, the piano key shaft being a music storage medium for reproducing piano playing , When it is read, trigger the playback cycle of each note; clean up the piano keys after format conversion; use the generative adversarial network to establish the interference track model and the composition track model and combine the interference track model Build a mixed track model with the composer track model; divide the generator into a time structure generator G temp and a music bar generator G bar , and build music through the time structure generator G temp and the music bar generator G bar The temporal correlation between the bars forms a time model; the mixed track model and the time model are combined to form a multi-track symphony composition model.
A system for establishing a composition model, including:
an acquisition module, configured to acquire a music data set in MIDI format and convert it into a piano-roll, the piano-roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback cycle of each note;
a cleaning module, configured to perform data cleaning on the converted piano-roll;
an establishing module, configured to establish an interference track model and a composition track model using a generative adversarial network, and to combine the two to establish a hybrid track model;
a construction module, configured to divide the generator into a time structure generator G temp and a bar generator G bar, and to use them to construct the temporal correlation between bars, forming a time model;
a combination module, configured to combine the hybrid track model and the time model to form a multi-track symphony composition model.
A computer device includes a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the above composition model establishment method.
A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above composition model establishment method.
In the above method, system, computer device, and storage medium for establishing a composition model, a music data set in MIDI format is acquired and converted into a piano-roll, the piano-roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback cycle of each note. Data cleaning is performed on the converted piano-roll: the tracks of similar instruments are merged, tracks other than brass, strings, woodwinds, and percussion are uniformly merged into the string track, four bars are treated as one phrase and the piano-roll is segmented accordingly, and long passages are trimmed to a suitable size, namely the pitch range C1 to C8. A generative adversarial network is used to establish an interference track model and a composition track model, which are combined to establish a hybrid track model. The generator is divided into a time structure generator G temp and a bar generator G bar, through which the temporal correlation between bars is constructed to form a time model. Finally, the hybrid track model and the time model are combined to form a multi-track symphony composition model, which satisfies people's demand for varied and natural-sounding music.
Brief Description of the Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are provided only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the present application.
FIG. 1 is a flowchart of a method for establishing a composition model in an embodiment;
FIG. 2 is a schematic diagram of generating music with M tracks in an embodiment;
FIG. 3 is a schematic diagram of a single generator creating a multi-channel piano-roll in an embodiment;
FIG. 4 is a schematic diagram of establishing a hybrid model by combining the interference model and the composition model in an embodiment;
FIG. 5 is a schematic diagram of constructing track correlation in an embodiment;
FIG. 6 is a schematic diagram of the multi-track model in an embodiment;
FIG. 7 is a flowchart of performing data cleaning on the converted piano-roll in an embodiment;
FIG. 8 is a flowchart of establishing the interference track model and the composition track model using a generative adversarial network in an embodiment;
FIG. 9 is a structural block diagram of a system for establishing a composition model in an embodiment;
FIG. 10 is a structural block diagram of the cleaning module in an embodiment;
FIG. 11 is a structural block diagram of the establishing module in an embodiment.
Detailed Description
To make the purpose, technical solution, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application and are not intended to limit it.
Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "the", and "said" used here may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present application refers to the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As a preferred embodiment, as shown in FIG. 1, a method for establishing a composition model includes the following steps:
Step S101: acquire a music data set in MIDI format and convert it into a piano-roll, the piano-roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback cycle of each note.
Since the music produced in the current field of music generation algorithms is mostly single-track and does not meet people's demand for musical diversity, this technical solution divides music into five levels — sections, phrases, bars, beats, and pixels — and generates them level by level, constructing the correlation between tracks based on a GAN (generative adversarial network). The tracks are relatively independent of one another yet must cooperate, so the GAN feeds each track two noise vectors simultaneously: one shared by all tracks and one generated separately for each track. The same method is used to handle the relationship between bars and thereby establish temporal correlation. Finally, all the tracks and bars are taken together as one sample and fed into the discriminator for training along with the samples in the training set, completing symphony composition based on a multi-track GAN.
This technical solution uses a MIDI data set and converts the MIDI files into a multi-track piano-roll representation. A piano-roll is a set of binary matrices indicating which notes are present at each time step: at each moment, a pressed key is represented as 1 and an unpressed key as 0. A bar with M tracks, R time steps per track, and S candidate note pitches is recorded as X, with data form X ∈ {0,1}^(R×S×M); a segment of T bars is then represented as X̄ = {X̄^(t)}, t = 1, …, T. The matrix size of each X is therefore fixed.
MIDI, also known as the Musical Instrument Digital Interface, is an industry-standard electronic communication protocol that defines notes and performance codes for electronic instruments and other performance equipment (such as synthesizers), allowing electronic instruments, computers, mobile phones, and other stage equipment to be connected, adjusted, and synchronized with one another so that performance data can be exchanged in real time.
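The binary piano-roll encoding described above can be illustrated with a minimal NumPy sketch. This is not the patent's implementation: `notes` stands in for parsed MIDI events as hypothetical (pitch, start_step, end_step) tuples, and the matrix sizes are illustrative assumptions.

```python
import numpy as np

def to_piano_roll(notes, n_steps=96, n_pitches=84):
    # Binary matrix: roll[t, p] == 1 while pitch p sounds at time step t,
    # 0 otherwise -- the "pressed key = 1, unpressed key = 0" encoding.
    roll = np.zeros((n_steps, n_pitches), dtype=np.uint8)
    for pitch, start, end in notes:
        roll[start:end, pitch] = 1
    return roll

# Two hypothetical notes: pitch 40 over steps 0-3, pitch 47 over steps 2-7.
roll = to_piano_roll([(40, 0, 4), (47, 2, 8)])
```

Stacking M such matrices along a third axis yields the fixed-size tensor X of shape R×S×M described above.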
Step S102: perform data cleaning on the converted piano-roll.
The original data set is very noisy. Although data cleaning causes some loss of sound sources (for example the sources of certain merged instruments), the richness and validity of the data are key to training the model, so cleaning is a necessary operation, and the following methods are used. Some instruments present too little data on their tracks, which is unfavorable for training, so this technical solution merges them: by summarizing the piano-rolls, the tracks of similar instruments are combined, and tracks other than brass, strings, woodwinds, and percussion are uniformly merged into the string track, enriching the amount of data. To train the time model, four bars are treated as one phrase and the piano-roll is segmented accordingly, so longer passages can be trimmed to a suitable size. The highest and lowest notes are uncommon, so the pitch range C1 to C8 (up to the rightmost piano key) is used.
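The phrase segmentation step (four bars per phrase, with the ragged tail dropped) can be sketched as follows; the 24 time steps per bar and the 84-pitch C1–C8 range are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def split_phrases(roll, steps_per_bar=24, bars_per_phrase=4):
    # Treat four bars as one phrase: cut the long piano-roll into
    # fixed-size phrase chunks and drop the incomplete remainder.
    phrase_steps = steps_per_bar * bars_per_phrase
    n = roll.shape[0] // phrase_steps
    return roll[: n * phrase_steps].reshape(n, phrase_steps, roll.shape[1])

song = np.zeros((500, 84), dtype=np.uint8)  # 500 steps over the C1-C8 range
phrases = split_phrases(song)               # 500 // 96 = 5 full phrases
```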
Step S103: use a generative adversarial network to establish an interference track model and a composition track model, and combine the two to establish a hybrid track model.
Adversarial learning in a GAN is realized through the competition between the generator and the discriminator, which gradually teaches the generator the data distribution of the music sequences to be generated. A GAN implements adversarial learning by constructing two networks: the generator captures the latent distribution of real data samples and produces new samples, while the discriminator is a binary classifier that judges whether its input is real data or a generated sample. The two models "fight" each other, and in the process each becomes stronger: the generator tries to make its samples indistinguishable from real data, so that the discriminator cannot tell whether a generated sample is real or fake and the generator gradually masters the data distribution of the generated music sequences, while the discriminator continuously improves its ability to identify samples. The GAN, or generative adversarial network, thus contains two networks: one that generates data, called the generator, and one that judges whether the generated data is close to real, called the discriminator. The basic principle is: train a generator to produce realistic samples from random noise or latent variables, while training a discriminator to distinguish real data from generated data; train both simultaneously until a Nash equilibrium is reached, at which point the generated data is indistinguishable from real samples and the discriminator can no longer correctly separate generated data from real data.
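The two adversarial objectives can be made concrete with toy linear "networks" in NumPy. This is only a sketch of the loss computation under assumed shapes; real generators and discriminators would be deep networks trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
w_d = rng.normal(size=4)          # toy discriminator weights
w_g = rng.normal(size=(4, 4))     # toy generator weights

def D(x):
    # Discriminator: probability that each input row is real data.
    return 1.0 / (1.0 + np.exp(-x @ w_d))

def G(z):
    # Generator: map noise z to a fake sample.
    return np.tanh(z @ w_g)

real = rng.normal(size=(8, 4))
fake = G(rng.normal(size=(8, 4)))

# D is trained to score real high and fake low; G is trained to fool D.
d_loss = -np.mean(np.log(D(real) + 1e-8) + np.log(1.0 - D(fake) + 1e-8))
g_loss = -np.mean(np.log(D(fake) + 1e-8))
```

At the Nash equilibrium described above, D(real) and D(fake) both approach 0.5 and neither loss can be improved further.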
An interference model is established based on the GAN: each track has its own generator–discriminator pair and an independent latent-space variable z_i. The generators work independently of one another, each producing its own music track from a given random vector z_i (i = 1, 2, …, M, where M is the number of generators, i.e. the number of tracks), and each generator receives feedback from a different discriminator. As shown in FIG. 2, generating music with M tracks therefore requires M generators and M discriminators.
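The interference arrangement — M independent generators, each consuming only its own z_i — can be sketched like this; the linear "generators" and the bar shape of 96 steps × 84 pitches are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, z_dim, steps, pitches = 4, 8, 96, 84

# One independent set of weights per track: generator i never sees
# the latent vector of any other track.
gen_w = [rng.normal(size=(z_dim, steps * pitches)) for _ in range(M)]

def generate_track(i, z_i):
    logits = z_i @ gen_w[i]
    return (logits > 0).astype(np.uint8).reshape(steps, pitches)

tracks = [generate_track(i, rng.normal(size=z_dim)) for i in range(M)]
```

In training, each tracks[i] would be scored by its own discriminator D_i, giving the M-generator, M-discriminator layout of FIG. 2.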
A composition model is also established based on the GAN. Since the composition model feeds the data of all tracks into a single generator, one generator can create a multi-channel piano-roll in which each channel represents a specific track, as shown in FIG. 3. In this model there is only one generator–discriminator pair globally, and a common input z is used to generate all the tracks, so regardless of the value of M only one generator and one discriminator are needed.
As shown in FIG. 4, a hybrid model is built by combining the interference model and the composition model. It combines the two approaches above: each track has its own generator, which takes as input the combination of an independent vector z_i and a global vector z, while a single discriminator is shared to generate the tracks. Compared with the composition model, the hybrid model is more flexible: different parameters (such as the number of layers, convolution kernel size, etc.) can be used in each generator, harmoniously combining the independent generation of tracks with global generation.
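The hybrid model's input wiring — each per-track generator consuming the shared z concatenated with its own z_i — can be sketched as follows; all shapes and the linear maps are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
M, z_dim, steps, pitches = 4, 8, 96, 84

z = rng.normal(size=z_dim)                        # shared by all tracks
z_i = [rng.normal(size=z_dim) for _ in range(M)]  # one per track
w = [rng.normal(size=(2 * z_dim, steps * pitches)) for _ in range(M)]

def hybrid_track(i):
    # Per-track generator input: [global z ; independent z_i].
    inp = np.concatenate([z, z_i[i]])
    return (inp @ w[i] > 0).astype(np.uint8).reshape(steps, pitches)

bars = np.stack([hybrid_track(i) for i in range(M)])
```

All M outputs would then be stacked and judged by the single shared discriminator.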
Step S104: divide the generator into a time structure generator G temp and a bar generator G bar, and use them to construct the temporal correlation between bars, forming a time model.
As shown in FIG. 5, constructing track correlation addresses how to generate individual bars in different tracks; the temporal relationship between bars requires an additional structure to generate. The generator is therefore divided into two sub-networks: the time structure generator G temp and the bar generator G bar. G temp maps the input vector z to a sequence of latent vectors Z̄ = {z̄^(t)}, t = 1, …, T, where T is the number of time steps; each z̄^(t) carries timing information and is then fed into G bar, which generates the piano-rolls sequentially. This is defined as: G(z) = {G bar(G temp(z)^(t))}, t = 1, …, T.
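The two-stage temporal generation G(z) = {G bar(G temp(z)^(t))} can be sketched with toy linear maps; the number of bars per phrase and the bar shape are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T, z_dim, steps, pitches = 4, 8, 96, 84
w_temp = rng.normal(size=(z_dim, T * z_dim))
w_bar = rng.normal(size=(z_dim, steps * pitches))

def G_temp(z):
    # Map one input vector z to a sequence of T latent vectors, each
    # carrying the timing information for one bar.
    return (z @ w_temp).reshape(T, z_dim)

def G_bar(z_bar_t):
    # Generate one bar of piano-roll from one latent vector.
    return (z_bar_t @ w_bar > 0).astype(np.uint8).reshape(steps, pitches)

phrase = np.stack([G_bar(z_bar_t) for z_bar_t in G_temp(rng.normal(size=z_dim))])
```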
Step S105: combine the hybrid track model and the time model to form a multi-track symphony composition model.
The multi-track model is the integration and extension of the track models and the time model above. As shown in FIG. 6, the input of the model is denoted z̄ = {z, z_t, z_i, z_{i,t}} and consists of four parts: the inter-track time-dependent vector z_t, the inter-track time-independent vector z, the intra-track time-independent vector z_i, and the intra-track time-dependent vector z_{i,t}.
For track i (i = 1, 2, …, M), the shared time structure generator G temp and the per-track time structure generator G temp,i take the time-varying random vectors z_t and z_{i,t} as input respectively, and each outputs a sequence of latent vectors carrying inter-track and intra-track timing information. These output sequences (latent vectors), together with the time-independent random vectors z and z_i, are fed into the bar generator G bar, which then generates the piano-rolls in order.
The generation process can be formulated as: G(z̄) = {G bar,i(z, G temp(z_t)^(t), z_i, G temp,i(z_{i,t})^(t))}, i = 1, …, M; t = 1, …, T.
It is clear from this generation formula that the per-track input variables (divided into time-dependent and time-independent) and the global input variables (likewise divided into time-dependent and time-independent) combine to form the multi-track symphony generation system.
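The full multi-track generation — four latent parts combined per track and per bar — can be sketched end to end; all shapes and the linear stand-ins for the generators are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
M, T, d, steps, pitches = 3, 4, 8, 96, 84

z = rng.normal(size=d)                  # inter-track, time-independent
z_t = rng.normal(size=(T, d))           # inter-track, time-dependent
z_i = rng.normal(size=(M, d))           # intra-track, time-independent
z_it = rng.normal(size=(M, T, d))       # intra-track, time-dependent

w_temp = rng.normal(size=(d, d))        # shared time structure generator
w_temp_i = rng.normal(size=(M, d, d))   # per-track time structure generators
w_bar = rng.normal(size=(M, 4 * d, steps * pitches))

def bar(i, t):
    # G_bar,i consumes [z, G_temp(z_t)^(t), z_i, G_temp,i(z_it)^(t)].
    inp = np.concatenate([z, z_t[t] @ w_temp, z_i[i], z_it[i, t] @ w_temp_i[i]])
    return (inp @ w_bar[i] > 0).astype(np.uint8).reshape(steps, pitches)

song = np.stack([[bar(i, t) for t in range(T)] for i in range(M)])
```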
As shown in FIG. 7, in one embodiment, performing data cleaning on the converted piano-roll includes:
Step S201: merge the tracks of similar instruments through the piano-roll, uniformly merging tracks other than brass, strings, woodwinds, and percussion into the string track.
The original data set is very noisy. Although data cleaning causes some loss of sound sources, for example the sources of certain merged instruments, the richness and validity of the data are key to training the model, so cleaning is a necessary operation, and the following methods are used. Some instruments present too little data on their tracks, which is unfavorable for training, so this technical solution merges them: by summarizing the piano-rolls, the tracks of similar instruments are combined, and tracks other than brass, strings, woodwinds, and percussion are uniformly merged into the string track, enriching the amount of data.
Step S202: treat four bars as one phrase, segment the piano-roll accordingly, and trim long passages to a suitable size, the suitable size being the pitch range C1 to C8.
Four bars are treated as one phrase and the piano-roll is segmented accordingly, so longer passages can be trimmed to a suitable size, namely the pitch range C1 to C8. This phrase segmentation is required to train the time model; the highest and lowest notes are uncommon, so the range C1 to C8 (up to the rightmost piano key) is used.
As shown in FIG. 8, in one embodiment, establishing the interference track model and the composition track model using a generative adversarial network and combining them to establish the hybrid track model includes:
Step S301: use a generative adversarial network to establish the interference track model, in which each track has its own generator–discriminator pair and an independent latent-space variable, the generators work independently of one another and produce their own music tracks from given random vectors, and each generator receives feedback from a different discriminator; the competition between generator and discriminator realizes the adversarial learning of the generative adversarial network, so that the generator masters the data distribution of the generated music sequences.
Adversarial learning in the GAN is realized through the competition between the generator and the discriminator, which gradually teaches the generator the data distribution of the music sequences to be generated. The GAN implements adversarial learning by constructing two networks: the generator captures the latent distribution of real data samples and produces new samples, while the discriminator is a binary classifier that judges whether its input is real data or a generated sample. The two models "fight" each other, and in the process each becomes stronger: the generator tries to make its samples indistinguishable from real data, so that the discriminator cannot tell whether a generated sample is real or fake and the generator gradually masters the data distribution of the generated music sequences, while the discriminator continuously improves its ability to identify samples. The GAN, or generative adversarial network, thus contains two networks: one that generates data, called the generator, and one that judges whether the generated data is close to real, called the discriminator. The basic principle is: train a generator to produce realistic samples from random noise or latent variables, while training a discriminator to distinguish real data from generated data; train both simultaneously until a Nash equilibrium is reached, at which point the generated data is indistinguishable from real samples and the discriminator can no longer correctly separate generated data from real data.
The interference model is established based on the GAN: each track has its own generator–discriminator pair and an independent latent-space variable z_i. The generators work independently of one another, each producing its own music track from a given random vector z_i (i = 1, 2, …, M, where M is the number of generators, i.e. the number of tracks), and each generator receives feedback from a different discriminator. As shown in FIG. 2, generating music with M tracks requires M generators and M discriminators.
Step S302: use a generative adversarial network to establish the composition track model, in which the data of multiple tracks is fed into the generator, so that a single generator creates a multi-channel piano-roll in which each channel represents a specific track.
Since the composition model feeds the data of all tracks into the generator, a single generator can create a multi-channel piano-roll in which each channel represents a specific track, as shown in FIG. 3. In this model there is only one generator–discriminator pair globally, and a common input z is used to generate all the tracks, so regardless of the value of M only one generator and one discriminator are needed.
Step S303: combine the interference track model and the composition track model to establish the hybrid track model, in which each track has its own generator that takes as input the combination of an independent vector and a global vector, while a single discriminator is shared to generate the tracks.
As shown in FIG. 4, the hybrid model is built by combining the interference model and the composition model. It combines the two approaches above: each track has its own generator, which takes as input the combination of an independent vector z_i and a global vector z, while a single discriminator is shared. Compared with the composition model, the hybrid model is more flexible: different parameters, such as the number of layers and the convolution kernel size, can be used in each generator, harmoniously combining the independent generation of tracks with global generation.
In one embodiment, dividing the generator into the time structure generator G temp and the bar generator G bar and using them to construct the temporal correlation between bars to form the time model includes:
dividing the generator into the time structure generator G temp and the bar generator G bar, where G temp maps the input vector z to a sequence of latent vectors Z̄ = {z̄^(t)}, t = 1, …, T, with T denoting time; each z̄^(t) carries timing information and is then fed into the bar generator G bar, which generates the piano-roll sequentially, defined as: G(z) = {G bar(G temp(z)^(t))}, t = 1, …, T.
In one embodiment, combining the hybrid track model and the time model to form the multi-track symphony composition model includes:
combining the hybrid track model and the time model to form the multi-track symphony composition model, where the input of the model is denoted z̄ = {z, z_t, z_i, z_{i,t}} and consists of the inter-track time-dependent vector z_t, the inter-track time-independent vector z, the intra-track time-independent vector z_i, and the intra-track time-dependent vector z_{i,t};
setting track i, i = 1, 2, …, M, where the shared time structure generator G temp and the per-track time structure generator G temp,i take the time-varying random vectors z_t and z_{i,t} as input respectively and output latent vectors containing inter-track and intra-track timing information; the output sequences of latent vectors, together with the time-independent random vectors z and z_i, are fed into the bar generator G bar, which then generates the piano-rolls in order, the generation process being defined as: G(z̄) = {G bar,i(z, G temp(z_t)^(t), z_i, G temp,i(z_{i,t})^(t))}, i = 1, …, M; t = 1, …, T; and
combining the per-track input variables and the global input variables to form the multi-track symphony, the per-track input variables being divided into time-dependent and time-independent, and the global input variables likewise being divided into time-dependent and time-independent.
多音轨模型,是上述音轨模型和时间模型的整合和扩展,如图6所示,模型的输入用
Figure PCTCN2018106680-appb-000012
表示,由四部分组成,轨道间全局时间相关向量Z t,轨道间全局时间无关向量Z,轨道内单独时间无关向量Z i,和轨道内单独时间相关向量Z i,t。
Multi-track model is the integration and extension of the above-mentioned track model and time model. As shown in Figure 6, the model input
Figure PCTCN2018106680-appb-000012
Represented by four parts, global time correlation between tracks vector Z t, the time between the track irrespective of the global vector Z, a separate time-independent vector Z i of the inner rail, the inner rail, and a separate time-dependent vector Z i, t.
对于音轨i(i=1,2……M),共用的时间结构生成器G temp和各自使用的时间结构生成器G temp,i,分别采取随时间变化的随机向量Z t和Z i,t作为输入,并且它们分别输出一系列包含音轨间和音轨内时间信息的潜在向量,输出序列(潜在向量)连同时间无关的随机向量Z和Z i一起被送入音乐小节生成器G bar,然后按顺序生成钢琴窗。生成过程可制定为: For the audio track i (i = 1,2 ... M), the shared time structure generator G temp and the respective time structure generator G temp, i respectively adopt random vectors Z t and Z i that change with time , t as input, and they output a series of latent vectors containing inter-track and intra-track time information, and the output sequence (potential vector) is sent to the music bar generator G bar together with the time-independent random vectors Z and Z i , And then generate piano windows in order. The generation process can be formulated as:
\hat{x}^{(i)}_t = G_{bar}\!\left(z,\; G_{temp}(z_t)^{(t)},\; z_i,\; G_{temp,i}(z_{i,t})^{(t)}\right), \qquad i = 1, \dots, M;\; t = 1, \dots, T
It is clear from this generation formula that the inter-track input variables (divided into time-dependent and time-independent) and the global input variables (likewise divided into time-dependent and time-independent) are combined to form the multi-track symphony generation system.
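The per-track generation process above can be sketched as follows. The toy functions, dimensions, and latent values are illustrative assumptions, not the patent's trained networks: a shared G temp and per-track G temp,i expand the time-dependent latents, which are then combined with the time-independent vectors z and z i and fed to the bar generator.

```python
# Illustrative sketch of multi-track generation: shared and per-track
# time-structure generators feed a bar generator. Toy math stand-ins only.
M, T, Z_DIM = 2, 3, 2   # tracks, bars, latent size (illustrative choices)

def g_temp(z_t):
    # Shared time-structure generator: one latent per bar (toy expansion).
    return [[v * (t + 1) for v in z_t] for t in range(T)]

def g_temp_i(z_it):
    # Per-track time-structure generator (toy expansion).
    return [[v + t for v in z_it] for t in range(T)]

def g_bar(z, zt_hat, z_i, zit_hat):
    # Bar generator: combines time-independent and time-dependent inputs
    # into one "bar" value per call (a real model would emit a piano roll).
    return sum(z) + sum(zt_hat) + sum(z_i) + sum(zit_hat)

z, z_t = [0.1] * Z_DIM, [0.2] * Z_DIM      # global time-indep. / time-dep.
symphony = []
for i in range(M):
    z_i, z_it = [0.1 * i] * Z_DIM, [0.01 * i] * Z_DIM  # track-specific
    shared, own = g_temp(z_t), g_temp_i(z_it)
    symphony.append([g_bar(z, shared[t], z_i, own[t]) for t in range(T)])
```

Each track's bar sequence thus mixes all four latent sources, mirroring the structure of the generation formula above.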
As shown in Fig. 9, in one embodiment, a system for establishing a composition model is provided. The system comprises:
an acquisition module, configured to acquire a music data set in MIDI format and convert the MIDI-format music data set into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback loop of each note;
a cleaning module, configured to perform data cleaning on the format-converted piano roll;
a building module, configured to use a generative adversarial network to establish an interference track model and a composer track model, and to combine the interference track model and the composer track model to establish a mixed track model;
a construction module, configured to divide the generator into a time-structure generator G temp and a music bar generator G bar, and to construct, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model; and
a combination module, configured to combine the mixed track model and the time model to form a multi-track symphony composition model.
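As a rough illustration of the acquisition module's MIDI-to-piano-roll conversion, the sketch below maps a parsed list of note events (pitch, start step, end step — a hypothetical intermediate form, not the patent's actual data layout) onto a binary piano-roll matrix:

```python
# Illustrative sketch only: converts parsed MIDI note events into a binary
# piano roll. The note-event tuples and the step resolution are assumptions,
# not the module's actual interface.

def notes_to_piano_roll(notes, n_steps, n_pitches=128):
    """notes: list of (pitch, start_step, end_step) tuples."""
    roll = [[0] * n_pitches for _ in range(n_steps)]
    for pitch, start, end in notes:
        for t in range(start, min(end, n_steps)):
            roll[t][pitch] = 1  # note is sounding at time step t
    return roll

# A C-major triad held for the first two steps, then a single E.
notes = [(60, 0, 2), (64, 0, 2), (67, 0, 2), (64, 2, 4)]
roll = notes_to_piano_roll(notes, n_steps=4)
```

Reading the resulting matrix row by row reproduces the performance, which is what the description means by triggering each note's playback loop when the piano roll is read.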
As shown in Fig. 10, in one embodiment, the cleaning module further comprises:
a merging unit, configured to merge the tracks of similar instruments on the piano roll, consolidating the tracks of instruments other than brass, strings, winds and percussion into the string track; and
a trimming unit, configured to treat four music bars as one phrase, segment the piano roll accordingly, and trim long segments to a suitable size, the suitable size being the pitch range from C1 to C8.
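The trimming unit's phrase segmentation and pitch-range trimming can be sketched as follows. The step resolution, and the mapping of C1–C8 to MIDI note numbers 24–107 (84 pitches), are illustrative assumptions; the patent names only the range, not exact note numbers:

```python
# Illustrative sketch of the cleaning step: trim a (time x 128) piano roll
# to the C1-C8 pitch range and cut it into four-bar phrases.

C1, C8 = 24, 108          # assumed half-open MIDI pitch range [C1, C8)
STEPS_PER_BAR = 16        # assumed time resolution per bar
BARS_PER_PHRASE = 4       # "four music bars are treated as one phrase"

def trim_pitch_range(roll):
    """Keep only the C1-C8 columns of each time step."""
    return [row[C1:C8] for row in roll]

def split_into_phrases(roll):
    """Cut a long piano roll into 4-bar phrases, dropping the remainder."""
    phrase_len = STEPS_PER_BAR * BARS_PER_PHRASE
    return [roll[i:i + phrase_len]
            for i in range(0, len(roll) - phrase_len + 1, phrase_len)]

roll = [[0] * 128 for _ in range(200)]   # 200 time steps (toy input)
phrases = split_into_phrases(trim_pitch_range(roll))
```

With 200 time steps and 64-step phrases, the remainder after the third phrase is discarded, which is the "trimming long segments" behavior described above.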
As shown in Fig. 11, in one embodiment, the building module further comprises:
an interference-model unit, configured to use a generative adversarial network to establish an interference track model, in which each track has its own generator and discriminator and its own independent latent-space variables; the generators work independently of one another, each producing its own music track from a given random vector; each generator receives feedback from a different discriminator, and the adversarial interplay between generators and discriminators realizes the adversarial learning of the generative adversarial network, so that the generators learn to control the data distribution of the generated music sequences;
a composer-model unit, configured to use a generative adversarial network to establish a composer track model, which feeds the data of multiple tracks into the generator so that a single generator creates a multi-channel piano roll in which each channel represents a specific track; and
a combining unit, configured to combine the interference track model and the composer track model to establish a mixed track model, in which each track has a generator that accepts an input vector formed by combining an independent vector with a global vector, while a single shared discriminator is used in generating the tracks.
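A minimal structural sketch of the mixed track model follows: each track's generator receives its own latent vector concatenated with a shared global vector, and a single shared discriminator scores the stacked tracks. The toy linear generators and the scoring function are placeholders, not the patent's networks:

```python
# Structural sketch of the mixed track model: per-track generators with
# independent + global latent inputs, one shared discriminator.
import random

def make_generator(in_dim, out_dim, seed):
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    def generate(vec):
        # Toy linear "network": one output bar per latent input vector.
        return [sum(wi * v for wi, v in zip(row, vec)) for row in w]
    return generate

M, Z_DIM, BAR_DIM = 3, 4, 8                  # tracks, latent size, bar size
generators = [make_generator(2 * Z_DIM, BAR_DIM, seed=i) for i in range(M)]

z_global = [0.5] * Z_DIM                     # shared across all tracks
tracks = []
for i, g in enumerate(generators):
    z_i = [random.Random(100 + i).random() for _ in range(Z_DIM)]  # per-track
    tracks.append(g(z_i + z_global))         # independent + global input

def discriminator(multi_track):
    # Single shared discriminator judging the stacked multi-track output
    # (here just a toy magnitude score).
    return sum(abs(x) for track in multi_track for x in track)

score = discriminator(tracks)
```

The design point is that the generators stay track-specific while the discriminator sees all tracks jointly, which pushes the tracks toward mutual coherence during adversarial training.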
In one embodiment, the construction module comprises:
a generator processing unit, configured to divide the generator into a time-structure generator G temp and a music bar generator G bar, the time-structure generator G temp mapping the input vector Z into a sequence of latent-space vectors
\hat{z} = \left\{\hat{z}^{(t)}\right\}_{t=1}^{T} = G_{temp}(z)
where T denotes time and each \hat{z}^{(t)} carries the timing information; the sequence is then fed into the music bar generator G bar, which generates the piano roll serially, defined as:

G(z) = \left\{ G_{bar}\!\left(\hat{z}^{(t)}\right) \right\}_{t=1}^{T}
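The two-stage time model can be sketched as below, with toy stand-ins for the learned G temp and G bar (the dimensions and functions are illustrative assumptions):

```python
# Sketch of the two-stage time model: G_temp expands one latent vector into
# T bar-level latents, and G_bar turns each latent into one bar, in order.

T, Z_DIM, BAR_DIM = 4, 3, 6

def g_temp(z):
    """Map a single latent z to T per-bar latents (here: shifted copies)."""
    return [[v + t for v in z] for t in range(T)]

def g_bar(z_hat_t):
    """Map one bar-level latent to a bar (here: a repeated toy pattern)."""
    return [sum(z_hat_t)] * BAR_DIM

z = [0.1, 0.2, 0.3]
piano_roll = [g_bar(z_hat) for z_hat in g_temp(z)]   # T bars, generated serially
```

Because every bar latent comes from the same G temp pass over z, the bars share a common temporal structure rather than being sampled independently.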
In one embodiment, the combination module comprises:
a mixing processing unit, configured to combine the mixed track model and the time model to form a multi-track symphony composition model, wherein the input of the model is denoted

\bar{z} = \left\{z,\; z_t,\; z_i,\; z_{i,t}\right\}

and consists of the inter-track global time-dependent vector Z t, the inter-track global time-independent vector Z, the intra-track individual time-independent vectors Z i, and the intra-track individual time-dependent vectors Z i,t; and
a music composition generating unit, configured to set a track i (i = 1, 2, …, M); the shared time-structure generator G temp and the per-track time-structure generators G temp,i take the time-varying random vectors Z t and Z i,t, respectively, as input, and each outputs latent vectors carrying inter-track and intra-track temporal information; these output sequences of latent vectors, together with the time-independent random vectors Z and Z i, are fed into the music bar generator G bar, which then generates the piano-roll bars in sequence, the generation process being defined as:
\hat{x}^{(i)}_t = G_{bar}\!\left(z,\; G_{temp}(z_t)^{(t)},\; z_i,\; G_{temp,i}(z_{i,t})^{(t)}\right), \qquad i = 1, \dots, M;\; t = 1, \dots, T
The inter-track input variables and the global input variables are combined to form the multi-track symphony; the inter-track input variables are divided into time-dependent and time-independent, as are the global input variables.
In one embodiment, a computer device is provided. The computer device comprises a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the method for establishing a composition model described in the embodiments above.
In one embodiment, a storage medium storing computer-readable instructions is provided. The computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the method for establishing a composition model described in the embodiments above. The storage medium may be a non-volatile storage medium.
A person of ordinary skill in the art will understand that all or part of the steps of the methods in the embodiments above may be implemented by a program instructing related hardware; the program may be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the embodiments above may be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; however, as long as a combination of technical features is not contradictory, it should be regarded as falling within the scope of this specification.
The embodiments above express only some exemplary embodiments of the present application; their description is relatively specific and detailed, but should not therefore be construed as limiting the patent scope of the present application. It should be noted that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (20)

  1. A method for establishing a composition model, comprising: acquiring a music data set in MIDI format and converting the MIDI-format music data set into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback loop of each note; performing data cleaning on the format-converted piano roll; using a generative adversarial network to establish an interference track model and a composer track model, and combining the interference track model and the composer track model to establish a mixed track model; dividing the generator into a time-structure generator G temp and a music bar generator G bar, and constructing, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model; and combining the mixed track model and the time model to form a multi-track symphony composition model.
  2. The method for establishing a composition model according to claim 1, wherein performing data cleaning on the format-converted piano roll comprises: merging the tracks of similar instruments on the piano roll, consolidating the tracks of instruments other than brass, strings, winds and percussion into the string track; and treating four music bars as one phrase, segmenting the piano roll accordingly, and trimming long segments to a suitable size, the suitable size being the pitch range from C1 to C8.
  3. The method for establishing a composition model according to claim 1, wherein using a generative adversarial network to establish an interference track model and a composer track model, and combining the interference track model and the composer track model to establish a mixed track model, comprises:
    using a generative adversarial network to establish an interference track model, in which each track has its own generator and discriminator and its own independent latent-space variables; the generators work independently of one another, each producing its own music track from a given random vector; each generator receives feedback from a different discriminator, and the adversarial interplay between generators and discriminators realizes the adversarial learning of the generative adversarial network, so that the generators learn to control the data distribution of the generated music sequences;
    using a generative adversarial network to establish a composer track model, which feeds the data of multiple tracks into the generator so that a single generator creates a multi-channel piano roll in which each channel represents a specific track; and
    combining the interference track model and the composer track model to establish a mixed track model, in which each track has a generator that accepts an input vector formed by combining an independent vector with a global vector, while a single shared discriminator is used in generating the tracks.
  4. The method for establishing a composition model according to claim 1, wherein dividing the generator into a time-structure generator G temp and a music bar generator G bar, and constructing, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model, comprises:
    dividing the generator into a time-structure generator G temp and a music bar generator G bar, the time-structure generator G temp mapping the input vector Z into a sequence of latent-space vectors
    \hat{z} = \left\{\hat{z}^{(t)}\right\}_{t=1}^{T} = G_{temp}(z)
    where T denotes time and each \hat{z}^{(t)} carries the timing information; the sequence is then fed into the music bar generator G bar, which generates the piano roll serially, defined as:

    G(z) = \left\{ G_{bar}\!\left(\hat{z}^{(t)}\right) \right\}_{t=1}^{T}
  5. The method for establishing a composition model according to claim 1, wherein combining the mixed track model and the time model to form a multi-track symphony composition model comprises:
    combining the mixed track model and the time model to form the multi-track symphony composition model, wherein the input of the model is denoted

    \bar{z} = \left\{z,\; z_t,\; z_i,\; z_{i,t}\right\}

    and consists of the inter-track global time-dependent vector Z t, the inter-track global time-independent vector Z, the intra-track individual time-independent vectors Z i, and the intra-track individual time-dependent vectors Z i,t;
    setting a track i (i = 1, 2, …, M); the shared time-structure generator G temp and the per-track time-structure generators G temp,i take the time-varying random vectors Z t and Z i,t, respectively, as input, and each outputs latent vectors carrying inter-track and intra-track temporal information; these output sequences of latent vectors, together with the time-independent random vectors Z and Z i, are fed into the music bar generator G bar, which then generates the piano-roll bars in sequence,
    the generation process being defined as:

    \hat{x}^{(i)}_t = G_{bar}\!\left(z,\; G_{temp}(z_t)^{(t)},\; z_i,\; G_{temp,i}(z_{i,t})^{(t)}\right), \qquad i = 1, \dots, M;\; t = 1, \dots, T
    the inter-track input variables and the global input variables being combined to form the multi-track symphony, the inter-track input variables being divided into time-dependent and time-independent, and the global input variables being divided into time-dependent and time-independent.
  6. A system for establishing a composition model, comprising:
    an acquisition module, configured to acquire a music data set in MIDI format and convert the MIDI-format music data set into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback loop of each note;
    a cleaning module, configured to perform data cleaning on the format-converted piano roll;
    a building module, configured to use a generative adversarial network to establish an interference track model and a composer track model, and to combine the interference track model and the composer track model to establish a mixed track model;
    a construction module, configured to divide the generator into a time-structure generator G temp and a music bar generator G bar, and to construct, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model; and
    a combination module, configured to combine the mixed track model and the time model to form a multi-track symphony composition model.
  7. The system for establishing a composition model according to claim 6, wherein the cleaning module further comprises:
    a merging unit, configured to merge the tracks of similar instruments on the piano roll, consolidating the tracks of instruments other than brass, strings, winds and percussion into the string track; and
    a trimming unit, configured to treat four music bars as one phrase, segment the piano roll accordingly, and trim long segments to a suitable size, the suitable size being the pitch range from C1 to C8.
  8. The system for establishing a composition model according to claim 6, wherein the building module further comprises:
    an interference-model unit, configured to use a generative adversarial network to establish an interference track model, in which each track has its own generator and discriminator and its own independent latent-space variables; the generators work independently of one another, each producing its own music track from a given random vector; each generator receives feedback from a different discriminator, and the adversarial interplay between generators and discriminators realizes the adversarial learning of the generative adversarial network, so that the generators learn to control the data distribution of the generated music sequences;
    a composer-model unit, configured to use a generative adversarial network to establish a composer track model, which feeds the data of multiple tracks into the generator so that a single generator creates a multi-channel piano roll in which each channel represents a specific track; and
    a combining unit, configured to combine the interference track model and the composer track model to establish a mixed track model, in which each track has a generator that accepts an input vector formed by combining an independent vector with a global vector, while a single shared discriminator is used in generating the tracks.
  9. The system for establishing a composition model according to claim 6, wherein the construction module comprises:
    a generator processing unit, configured to divide the generator into a time-structure generator G temp and a music bar generator G bar, the time-structure generator G temp mapping the input vector Z into a sequence of latent-space vectors
    \hat{z} = \left\{\hat{z}^{(t)}\right\}_{t=1}^{T} = G_{temp}(z)
    where T denotes time and each \hat{z}^{(t)} carries the timing information; the sequence is then fed into the music bar generator G bar, which generates the piano roll serially, defined as:

    G(z) = \left\{ G_{bar}\!\left(\hat{z}^{(t)}\right) \right\}_{t=1}^{T}
  10. The system for establishing a composition model according to claim 6, wherein the combination module comprises:
    a mixing processing unit, configured to combine the mixed track model and the time model to form a multi-track symphony composition model, wherein the input of the model is denoted

    \bar{z} = \left\{z,\; z_t,\; z_i,\; z_{i,t}\right\}

    and consists of the inter-track global time-dependent vector Z t, the inter-track global time-independent vector Z, the intra-track individual time-independent vectors Z i, and the intra-track individual time-dependent vectors Z i,t; and
    a music composition generating unit, configured to set a track i (i = 1, 2, …, M); the shared time-structure generator G temp and the per-track time-structure generators G temp,i take the time-varying random vectors Z t and Z i,t, respectively, as input, and each outputs latent vectors carrying inter-track and intra-track temporal information; these output sequences of latent vectors, together with the time-independent random vectors Z and Z i, are fed into the music bar generator G bar, which then generates the piano-roll bars in sequence, the generation process being defined as:
    \hat{x}^{(i)}_t = G_{bar}\!\left(z,\; G_{temp}(z_t)^{(t)},\; z_i,\; G_{temp,i}(z_{i,t})^{(t)}\right), \qquad i = 1, \dots, M;\; t = 1, \dots, T
    the inter-track input variables and the global input variables being combined to form the multi-track symphony, the inter-track input variables being divided into time-dependent and time-independent, and the global input variables being divided into time-dependent and time-independent.
  11. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
    acquiring a music data set in MIDI format and converting the MIDI-format music data set into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback loop of each note;
    performing data cleaning on the format-converted piano roll;
    using a generative adversarial network to establish an interference track model and a composer track model, and combining the interference track model and the composer track model to establish a mixed track model;
    dividing the generator into a time-structure generator G temp and a music bar generator G bar, and constructing, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model; and
    combining the mixed track model and the time model to form a multi-track symphony composition model.
  12. The computer device according to claim 11, wherein, when performing data cleaning on the format-converted piano roll, the processor is caused to perform the following steps:
    merging the tracks of similar instruments on the piano roll, consolidating the tracks of instruments other than brass, strings, winds and percussion into the string track; and
    treating four music bars as one phrase, segmenting the piano roll accordingly, and trimming long segments to a suitable size, the suitable size being the pitch range from C1 to C8.
  13. The computer device according to claim 11, wherein, when using a generative adversarial network to establish an interference track model and a composer track model and combining the interference track model and the composer track model to establish a mixed track model, the processor is caused to perform the following steps:
    using a generative adversarial network to establish an interference track model, in which each track has its own generator and discriminator and its own independent latent-space variables; the generators work independently of one another, each producing its own music track from a given random vector; each generator receives feedback from a different discriminator, and the adversarial interplay between generators and discriminators realizes the adversarial learning of the generative adversarial network, so that the generators learn to control the data distribution of the generated music sequences;
    using a generative adversarial network to establish a composer track model, which feeds the data of multiple tracks into the generator so that a single generator creates a multi-channel piano roll in which each channel represents a specific track; and
    combining the interference track model and the composer track model to establish a mixed track model, in which each track has a generator that accepts an input vector formed by combining an independent vector with a global vector, while a single shared discriminator is used in generating the tracks.
  14. The computer device according to claim 11, wherein, when dividing the generator into a time-structure generator G temp and a music bar generator G bar and constructing, through the time-structure generator G temp and the music bar generator G bar, the temporal correlation between music bars to form a time model, the processor is caused to perform the following steps:
    dividing the generator into a time-structure generator G temp and a music bar generator G bar, the time-structure generator G temp mapping the input vector Z into a sequence of latent-space vectors
    \hat{z} = \left\{\hat{z}^{(t)}\right\}_{t=1}^{T} = G_{temp}(z)
    where T denotes time and each \hat{z}^{(t)} carries the timing information; the sequence is then fed into the music bar generator G bar, which generates the piano roll serially, defined as:

    G(z) = \left\{ G_{bar}\!\left(\hat{z}^{(t)}\right) \right\}_{t=1}^{T}
  15. The computer device according to claim 11, wherein, when combining the mixed track model and the time model to form a multi-track symphony composition model, the processor is caused to perform the following steps:
    combining the mixed track model and the time model to form the multi-track symphony composition model, wherein the input of the model is denoted

    \bar{z} = \left\{z,\; z_t,\; z_i,\; z_{i,t}\right\}

    and consists of the inter-track global time-dependent vector Z t, the inter-track global time-independent vector Z, the intra-track individual time-independent vectors Z i, and the intra-track individual time-dependent vectors Z i,t;
    setting a track i (i = 1, 2, …, M); the shared time-structure generator G temp and the per-track time-structure generators G temp,i take the time-varying random vectors Z t and Z i,t, respectively, as input, and each outputs latent vectors carrying inter-track and intra-track temporal information; these output sequences of latent vectors, together with the time-independent random vectors Z and Z i, are fed into the music bar generator G bar, which then generates the piano-roll bars in sequence, the generation process being defined as:

    \hat{x}^{(i)}_t = G_{bar}\!\left(z,\; G_{temp}(z_t)^{(t)},\; z_i,\; G_{temp,i}(z_{i,t})^{(t)}\right), \qquad i = 1, \dots, M;\; t = 1, \dots, T
    the inter-track input variables and the global input variables being combined to form the multi-track symphony, the inter-track input variables being divided into time-dependent and time-independent, and the global input variables being divided into time-dependent and time-independent.
  16. A storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring a music data set in MIDI format and converting the MIDI-format music data set into a piano roll, the piano roll being a music storage medium for reproducing a piano performance that, when read, triggers the playback loop of each note;
    performing data cleaning on the format-converted piano roll;
    采用生成式对抗网络建立干扰音轨模型和作曲音轨模型,并结合所述干扰音轨模型和作曲音轨模型建立混合音轨模型;Using a generative adversarial network to establish an interference track model and a composing track model, and combining the interference track model and the composing track model to establish a mixed track model;
    将生成器分为时间结构生成器G temp和音乐小节生成器G bar,通过所述时间结构生成器G temp和所述音乐小节生成器G bar构建音乐小节之间的时序相关性形成时间模型; Divide the generator into a time structure generator G temp and a music measure generator G bar , and use the time structure generator G temp and the music measure generator G bar to construct a temporal correlation between music measures to form a time model;
    组合所述混合音轨模型和时间模型,以形成多音轨交响乐作曲模型。The mixed audio track model and the time model are combined to form a multi-track symphony composition model.
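The MIDI-to-piano-roll conversion described in the claim can be sketched at a minimal level as a binary time-by-pitch matrix. This is an illustrative sketch only, not from the patent: the function name `notes_to_piano_roll`, the tuple-based note format, and the 24-steps-per-second time resolution are all assumptions.

```python
import numpy as np

def notes_to_piano_roll(notes, n_steps, n_pitches=128, steps_per_second=24):
    """Convert (pitch, start_sec, end_sec) note tuples into a binary
    piano-roll matrix of shape (n_steps, n_pitches).

    Assumed representation: rows are time steps, columns are MIDI pitches,
    and a 1 marks the note as sounding at that step.
    """
    roll = np.zeros((n_steps, n_pitches), dtype=np.uint8)
    for pitch, start, end in notes:
        a = int(round(start * steps_per_second))
        b = max(a + 1, int(round(end * steps_per_second)))  # at least one step
        roll[a:min(b, n_steps), pitch] = 1
    return roll

# A C-major triad (middle C, E, G) held for half a second from t = 0.
notes = [(60, 0.0, 0.5), (64, 0.0, 0.5), (67, 0.0, 0.5)]
roll = notes_to_piano_roll(notes, n_steps=48)
print(roll.shape)                 # (48, 128)
print(int(roll[:12, 60].sum()))   # 12 active steps for middle C
```

Reading such a matrix row by row is what "triggers the playback cycle of each note": each active cell re-arms the note for that time step.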
  17. The storage medium storing computer-readable instructions according to claim 16, wherein, when the data cleaning is performed on the format-converted piano roll, the processor is caused to perform the following steps:
    merging the tracks of similar instruments through the piano roll, and consolidating tracks that do not belong to the brass, string, wind, or percussion families into the string-instrument track;
    regarding four music bars as one phrase, segmenting through the piano roll, and trimming long segments to a suitable size, the suitable size being the pitch range from C1 to C8.
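The cropping and phrase-splitting step above can be sketched as follows. The C1 and C8 MIDI numbers (24 and 108) follow the common scientific-pitch convention; the 96-steps-per-bar resolution and the function name `clean_piano_roll` are assumptions, not values stated in the patent.

```python
import numpy as np

C1, C8 = 24, 108          # MIDI note numbers for the stated C1-C8 range
STEPS_PER_BAR = 96        # assumed time resolution per bar
PHRASE_BARS = 4           # "four music bars are regarded as one phrase"

def clean_piano_roll(roll):
    """Crop a (time, 128) piano roll to the C1-C8 pitch range and split
    it into non-overlapping 4-bar phrases, dropping any trailing remainder."""
    cropped = roll[:, C1:C8 + 1]                  # keep 85 pitch columns
    phrase_len = PHRASE_BARS * STEPS_PER_BAR
    n_phrases = cropped.shape[0] // phrase_len
    trimmed = cropped[:n_phrases * phrase_len]    # trim to whole phrases
    return trimmed.reshape(n_phrases, phrase_len, C8 - C1 + 1)

song = np.zeros((10 * STEPS_PER_BAR, 128), dtype=np.uint8)  # a 10-bar song
phrases = clean_piano_roll(song)
print(phrases.shape)   # (2, 384, 85): two full 4-bar phrases, rest dropped
```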
  18. The storage medium storing computer-readable instructions according to claim 16, wherein, when the generative adversarial network is used to establish the interference track model and the composition track model and the mixed track model is established by combining the interference track model and the composition track model, the processor is caused to perform the following steps:
    using a generative adversarial network to establish the interference track model, in which each track has its own set of generator and discriminator and an independent latent-space variable; the multiple generators work independently of one another, each producing its own music track from a given random vector; the generators receive feedback from their respective discriminators, and the adversarial learning of the generative adversarial network is realized through the competition between generators and discriminators, so that the generators learn to control the data distribution of the generated music sequences;
    using a generative adversarial network to establish the composition track model, which inputs the data of multiple tracks into the generator, so that a single generator creates a multi-channel piano roll in which each channel represents a specific track;
    combining the interference track model and the composition track model to establish the mixed track model, in which each track has a generator that accepts an input vector formed by combining an independent vector with a global vector, while a single shared discriminator is used across all tracks.
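The three architectures in the claim differ only in how latent vectors and generators are wired. The shape-level sketch below contrasts them with stand-in linear "generators"; the track count, bar shape, latent sizes, and the helper `make_generator` are all assumptions made for illustration, not the patent's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)
M, BAR_SHAPE = 4, (96, 85)       # assumed: 4 tracks, 96 steps x 85 pitches

def make_generator(in_dim):
    """A stand-in linear 'generator' mapping a latent vector to one bar."""
    W = rng.standard_normal((in_dim, np.prod(BAR_SHAPE)))
    return lambda z: np.tanh(z @ W).reshape(BAR_SHAPE)

# Interference (jamming) model: each track has its own generator and its
# own independent latent vector z_i; generators work independently.
jam_gens = [make_generator(64) for _ in range(M)]
jam_out = np.stack([g(rng.standard_normal(64)) for g in jam_gens])

# Composition model: one single generator emits all M channels at once
# from one shared latent vector.
composer_W = rng.standard_normal((64, M * np.prod(BAR_SHAPE)))
comp_out = np.tanh(rng.standard_normal(64) @ composer_W).reshape(M, *BAR_SHAPE)

# Mixed model: per-track generators, each fed [global z, private z_i],
# with (in the full model) one shared discriminator judging the result.
hyb_gens = [make_generator(128) for _ in range(M)]
z_shared = rng.standard_normal(64)
hyb_out = np.stack([g(np.concatenate([z_shared, rng.standard_normal(64)]))
                    for g in hyb_gens])

print(jam_out.shape, comp_out.shape, hyb_out.shape)  # all (4, 96, 85)
```

All three produce the same multi-track output shape; what changes is where coordination between tracks can come from: nowhere (jamming), a single network (composer), or a shared input vector plus a shared discriminator (mixed).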
  19. The storage medium storing computer-readable instructions according to claim 16, wherein, when the generator is divided into the time-structure generator G_temp and the bar generator G_bar and the temporal correlation between music bars is constructed through the time-structure generator G_temp and the bar generator G_bar to form the time model, the processor is caused to perform the following steps:
    dividing the generator into the time-structure generator G_temp and the bar generator G_bar, the time-structure generator G_temp mapping the input vector Z into a sequence of latent-space vectors
    Z̄ = { Z̄^(t) }, t = 1, 2, ..., T,
    where T denotes time; Z̄ carries the timing information and is subsequently fed into the bar generator G_bar, which generates the piano roll sequentially, defined as:
    G(Z) = { G_bar( G_temp(Z)^(t) ) }, t = 1, 2, ..., T.
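The two-stage generation G(Z) = {G_bar(G_temp(Z)^(t))} can be sketched with stand-in linear maps: G_temp expands one latent vector into T per-bar latents, and G_bar decodes each latent into one bar. The sizes and the tanh stand-ins are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
T, Z_DIM, BAR_SHAPE = 4, 32, (96, 85)    # assumed sizes

# G_temp: map one input vector Z to a sequence of T latent vectors,
# which is where the temporal correlation between bars is encoded.
W_temp = rng.standard_normal((Z_DIM, T * Z_DIM))
def g_temp(z):
    return np.tanh(z @ W_temp).reshape(T, Z_DIM)

# G_bar: decode each latent vector into one bar of piano roll.
W_bar = rng.standard_normal((Z_DIM, np.prod(BAR_SHAPE)))
def g_bar(latent):
    return np.tanh(latent @ W_bar).reshape(BAR_SHAPE)

z = rng.standard_normal(Z_DIM)
bars = np.stack([g_bar(h) for h in g_temp(z)])   # generate bars in sequence
print(bars.shape)   # (4, 96, 85): T bars sharing one temporal structure
```

Because every bar's latent is derived from the same Z through G_temp, consecutive bars are correlated rather than independently sampled, which is the point of the time model.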
  20. The storage medium storing computer-readable instructions according to claim 16, wherein, when the mixed track model and the time model are combined to form the multi-track symphony composition model, the processor is caused to perform the following steps:
    combining the mixed track model and the time model to form the multi-track symphony composition model, wherein the input of the model is represented by
    Z̄ = { Z, {Z_i}, Z_t, {Z_{i,t}} },
    which is composed of the inter-track, time-dependent global vector Z_t, the inter-track, time-independent global vector Z, the intra-track, time-independent vectors Z_i, and the intra-track, time-dependent vectors Z_{i,t};
    for each track i, where i = 1, 2, ..., M, the shared time-structure generator G_temp and the per-track time-structure generators G_temp,i take the time-varying random vectors Z_t and Z_{i,t}, respectively, as inputs, and output latent vectors carrying inter-track and intra-track temporal information; these latent-vector sequences, together with the time-independent random vectors Z and Z_i, are fed into the bar generator G_bar, which then generates piano rolls in sequence, a generation process that can be defined as:
    G(Z̄) = { G_bar,i ( Z, G_temp(Z_t)^(t), Z_i, G_temp,i(Z_{i,t})^(t) ) }, for i = 1, 2, ..., M and t = 1, 2, ..., T;
    the per-track input variables and the global input variables, each divided into time-dependent and time-independent components, are combined to form a multi-track symphony.
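The four-way split of the input (global vs. per-track, time-dependent vs. time-independent) can be sketched by assembling, for each track i and bar t, the concatenated vector [Z, G_temp(Z_t)^(t), Z_i, G_temp,i(Z_{i,t})^(t)]. The dimensions and the tanh stand-in for the temporal generators are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
M, T, D = 4, 4, 16     # assumed: 4 tracks, 4 bars per phrase, 16-dim latents

z    = rng.standard_normal(D)           # inter-track, time-independent
z_t  = rng.standard_normal((T, D))      # inter-track, time-dependent
z_i  = rng.standard_normal((M, D))      # intra-track, time-independent
z_it = rng.standard_normal((M, T, D))   # intra-track, time-dependent

def g_temp(seq):
    """Stand-in time-structure generator: here just a tanh over the sequence."""
    return np.tanh(seq)

shared = g_temp(z_t)   # one shared temporal structure for all tracks
inputs = np.stack([
    np.concatenate([z, shared[t], z_i[i], g_temp(z_it[i])[t]])
    for i in range(M) for t in range(T)
]).reshape(M, T, 4 * D)
print(inputs.shape)    # (4, 4, 64): one 64-dim input per track per bar
```

The shared components (z, shared[t]) coordinate the tracks into a coherent symphony, while the private components (z_i, z_it) let each track keep its own character.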
PCT/CN2018/106680 2018-08-08 2018-09-20 Method, system and apparatus for building music composition model, and storage medium WO2020029382A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810894765.2 2018-08-08
CN201810894765.2A CN109189974A (en) 2018-08-08 2018-08-08 Method, system, device, and storage medium for establishing a music composition model

Publications (1)

Publication Number Publication Date
WO2020029382A1 true WO2020029382A1 (en) 2020-02-13

Family

ID=64920431

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/106680 WO2020029382A1 (en) 2018-08-08 2018-09-20 Method, system and apparatus for building music composition model, and storage medium

Country Status (2)

Country Link
CN (1) CN109189974A (en)
WO (1) WO2020029382A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102459109B1 (en) * 2018-05-24 2022-10-27 에이미 인코퍼레이티드 music generator
CN109872708B (en) * 2019-01-23 2023-04-28 平安科技(深圳)有限公司 Music generation method and device based on DCGAN
CN110288965B (en) * 2019-05-21 2021-06-18 北京达佳互联信息技术有限公司 Music synthesis method and device, electronic equipment and storage medium
CN111477198B (en) * 2020-03-05 2023-07-14 支付宝(杭州)信息技术有限公司 Method and device for representing music bar and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090260507A1 (en) * 2008-04-22 2009-10-22 Peter Gannon Systems and methods for composing music
CN103902642A (en) * 2012-12-21 2014-07-02 香港科技大学 Music composition system using correlation between melody and lyrics
US20150269852A1 (en) * 2014-03-20 2015-09-24 Pearson Education, Inc. Sound assessment and remediation
CN106898341A (en) * 2017-01-04 2017-06-27 清华大学 A kind of individualized music generation method and device based on common semantic space

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106023969B (en) * 2011-07-29 2020-02-18 音乐策划公司 Method for applying audio effects to one or more tracks of a music compilation
US20160379611A1 (en) * 2015-06-23 2016-12-29 Medialab Solutions Corp. Systems and Method for Music Remixing
CN106652984B (en) * 2016-10-11 2020-06-02 张文铂 Method for automatically composing songs by using computer


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG, A. ET AL.: "Deep Learning for Music", 19 June 2016 (2016-06-19), pages 1 - 4, XP080708782, Retrieved from the Internet <URL:https://cs224d.stanford.edu/reports/allenh.pdf> [retrieved on 20160619] *

Also Published As

Publication number Publication date
CN109189974A (en) 2019-01-11

Similar Documents

Publication Publication Date Title
WO2020029382A1 (en) Method, system and apparatus for building music composition model, and storage medium
Herremans et al. MorpheuS: generating structured music with constrained patterns and tension
CN106023969B (en) Method for applying audio effects to one or more tracks of a music compilation
CN101916568B (en) Information processing apparatus and information processing method
US20160247496A1 (en) Device and method for generating a real time music accompaniment for multi-modal music
WO2020000751A1 (en) Automatic composition method and apparatus, and computer device and storage medium
CN112382257B (en) Audio processing method, device, equipment and medium
US11942071B2 (en) Information processing method and information processing system for sound synthesis utilizing identification data associated with sound source and performance styles
JP2023542431A (en) System and method for hierarchical sound source separation
JP4333700B2 (en) Chord estimation apparatus and method
Borin et al. Musical signal synthesis
Dadman et al. Toward interactive music generation: A position paper
Nadeem et al. Let's make some music
Dubnov et al. Creative improvised interaction with generative musical systems
Renault et al. DDSP-Piano: a Neural Sound Synthesizer Informed by Instrument Knowledge
Bourbon et al. The ecological approach to mixing audio: Agency, activity and environment in the process of audio staging
Braasch A cybernetic model approach for free jazz improvisations
Zhu et al. A Survey of AI Music Generation Tools and Models
CN115004294A (en) Composition creation method, composition creation device, and creation program
US11830463B1 (en) Automated original track generation engine
CN111061908A (en) Recommendation method and system for movie and television dubbing author
Wiggins et al. A Differentiable Acoustic Guitar Model for String-Specific Polyphonic Synthesis
US11790876B1 (en) Music technique responsible for versioning
Eigenfeldt The human fingerprint in machine generated music
JP2020038252A (en) Information processing method and information processing unit

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18929578

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18929578

Country of ref document: EP

Kind code of ref document: A1